@tricoteuses/senat
v2.22.3
Handle French Sénat's open data
Tricoteuses-Senat
Retrieve, clean up & handle French Sénat's open data
Requirements
- Node >= 22
Installation
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
cd tricoteuses-senat/
Create a .env file to set the PostgreSQL database information and other configuration variables (you can use example.env as a template). Then run:
npm install
Database creation (not needed if downloading with Docker image)
Using Docker
docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres
Download data
Basic usage
Create a folder where the data will be downloaded and run the following command to download the data and convert it into JSON files.
mkdir ../senat-data/
npm run data:download ../senat-data
Available Commands
- npm run data:download <dir>: Download the data and convert it to JSON
- npm run data:retrieve_documents <dir>: Retrieval of textes and rapports from Sénat's website
- npm run data:retrieve_agenda <dir>: Retrieval of agenda from Sénat's website
- npm run data:retrieve_cr_seance <dir>: Retrieval of comptes-rendus de séance from Sénat's data
- npm run data:retrieve_cr_commission <dir>: Retrieval of comptes-rendus de commissions from Sénat's website
- npm run data:retrieve_senateurs_photos <dir>: Retrieval of sénateurs' pictures from Sénat's website
Filtering Options
Downloading all the data takes a long time and a lot of disk space. You can restrict the download to the types of data you need to reduce the load.
Examples:
# Only download amendments
npm run data:download ../senat-data -- -k Ameli
# Only process data from session 2023 onwards
npm run data:download ../senat-data -- --fromSession 2023
Common Options
- --categories or -k <name>: Filter by dataset categories (available options: All, Ameli, Debats, DosLeg, Questions, Sens)
- --fromSession <year>: Specify the session year to retrieve data from (default: 2022)
- --dataDir <path> (mandatory): Path to the working directory where all data is stored
- --silent or -s: Disable logging
- --verbose or -v: Enable verbose logging
- --commit or -c: Automatically commit converted data
- --pull or -p: Pull repositories before starting
- --clone or -C <url>: Clone Git repositories from a remote group or organization
- --remote or -r <name>: Push commits to specified Git remote(s)
- --keepDir: Keep directories when cleaning data
- --only-recent <days>: Retrieve only documents created within the last N days
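When the download is launched from a Node script rather than typed by hand, the argument list has to keep the bare `--` that separates npm's own arguments from the flags forwarded to the script. The sketch below assembles that list for `data:download`; `DownloadOptions` and `buildDownloadArgs` are hypothetical names used for illustration and are not part of this package, and only flags documented above are emitted.

```typescript
// Hypothetical options object; each field mirrors a flag documented in this README.
interface DownloadOptions {
  dataDir: string
  category?: string // value for -k, e.g. "Ameli"
  fromSession?: number // value for --fromSession, e.g. 2023
  silent?: boolean // adds --silent when true
}

// Hypothetical helper: returns the arguments to pass to the `npm` executable.
function buildDownloadArgs(options: DownloadOptions): string[] {
  const flags: string[] = []
  if (options.category !== undefined) flags.push("-k", options.category)
  if (options.fromSession !== undefined) flags.push("--fromSession", String(options.fromSession))
  if (options.silent) flags.push("--silent")

  const args = ["run", "data:download", options.dataDir]
  // npm forwards flags to the script only when they follow a bare "--"
  return flags.length > 0 ? [...args, "--", ...flags] : args
}

const args = buildDownloadArgs({ dataDir: "../senat-data", category: "Ameli", fromSession: 2023 })
console.log(["npm", ...args].join(" "))
// prints: npm run data:download ../senat-data -- -k Ameli --fromSession 2023
```

The returned array can be handed to a process launcher such as Node's `child_process.spawn("npm", args)`, which avoids shell quoting issues in paths.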
Options for Retrieving Documents
- --formats <format>: Specify document formats to retrieve (options: xml, html, pdf)
- --types <type>: Specify document types to retrieve (options: textes, rapports)
- --parseDocuments: Parse documents after retrieval
- --parseAgenda: Parse agenda after retrieval
- --parseDebats: Parse comptes-rendus after retrieval
Examples
# Retrieval of textes and rapports in specific formats
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --formats xml pdf --types textes
# Retrieval & parsing (textes in xml format only for now)
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments
# Retrieval & parsing of agenda
npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 --parseAgenda
# Retrieval & parsing of comptes-rendus de séance
npm run data:retrieve_cr_seance ../senat-data -- --parseDebats --keepDir
# Retrieval & parsing of comptes-rendus de commissions
npm run data:retrieve_cr_commission ../senat-data -- --parseDebats --keepDir
Data download using Docker
A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry.
Use the environment variables FROM_SESSION and CATEGORIES if needed.
docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
Using the data
Once the data is downloaded, you can use loaders to retrieve it. To use loaders in your project, you can install the @tricoteuses/senat package, and import the iterator functions that you need.
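Because the loaders are iterator functions, their results can be consumed lazily and the loop can stop early without reading the whole dataset. The following is a minimal sketch of that pattern, not the package's own API: `takeMatching`, `Loaded` and `fakeQuestions` are hypothetical names, the `{ item }` wrapper shape is assumed from the destructuring used in this README's loader example, and the records are made up so the snippet runs without any downloaded data.

```typescript
// Assumed shape of each yielded value: the record is wrapped in an `item` property.
interface Loaded<T> {
  item: T
}

// Hypothetical helper (not part of @tricoteuses/senat): walks any iterable of
// `{ item }` wrappers and collects up to `limit` items accepted by `keep`.
function takeMatching<T>(
  source: Iterable<Loaded<T>>,
  keep: (item: T) => boolean,
  limit: number,
): T[] {
  const kept: T[] = []
  if (limit <= 0) return kept
  for (const { item } of source) {
    if (keep(item)) {
      kept.push(item)
      if (kept.length >= limit) break // stop early; the rest is never read
    }
  }
  return kept
}

// Stand-in generator with made-up records, used here instead of a real loader.
function* fakeQuestions(): Generator<Loaded<{ id: string }>> {
  yield { item: { id: "q-1" } }
  yield { item: { id: "q-2" } }
  yield { item: { id: "q-3" } }
}

const firstTwo = takeMatching(fakeQuestions(), (question) => question.id !== "q-1", 2)
console.log(firstTwo.map((question) => question.id))
```

If a real loader such as `iterLoadSenatQuestions` yields values of the same shape, it could be passed as `source` in place of `fakeQuestions()`.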
npm install @tricoteuses/senat
import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"
// Pass data directory and legislature as arguments
for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
  console.log(question.id)
}
Generation of raw types from SQL schema (for contributors only)
npm run data:generate_schemas ../senat-data
Publishing
To publish a new version of this package to npm, bump the package version and publish.
# Increment version and create a new Git tag automatically
npm version patch # +0.0.1 → small fixes
npm version minor # +0.1.0 → new features
npm version major # +1.0.0 → breaking changes
npx tsc
npm publish
The Docker image is built automatically by a CI workflow when you push the tag to the remote repository.
git push --tags