@tricoteuses/senat

v2.22.7

Handle French Sénat's open data

Keywords: France, open data, Parliament, Sénat

Tricoteuses-Senat

Retrieve, clean up & handle French Sénat's open data

Requirements

Node >= 22

Installation

git clone https://git.tricoteuses.fr/logiciels/tricoteuses-senat
cd tricoteuses-senat/

Create a .env file to set the PostgreSQL database information and other configuration variables (you can use example.env as a template). Then install the dependencies:

npm install

Database creation (not needed if you download the data with the Docker image)

Using Docker

docker run --name local-postgres -d -p 5432:5432 -e POSTGRES_PASSWORD=$YOUR_CUSTOM_DB_PASSWORD postgres

Download data

Basic usage

Create a folder where the data will be downloaded and run the following command to download the data and convert it into JSON files.

mkdir ../senat-data/

npm run data:download ../senat-data

Available Commands

npm run data:download <dir>: Download the data and convert it to JSON
npm run data:retrieve_documents <dir>: Retrieval of textes and rapports from Sénat's website
npm run data:retrieve_agenda <dir>: Retrieval of agenda from Sénat's website
npm run data:retrieve_cr_seance <dir>: Retrieval of comptes-rendus de séance from Sénat's data
npm run data:retrieve_cr_commission <dir>: Retrieval of comptes-rendus de commissions from Sénat's website
npm run data:retrieve_senateurs_photos <dir>: Retrieval of sénateurs' pictures from Sénat's website

Filtering Options

Downloading all the data takes a long time and a lot of disk space. To reduce the load, you can choose which types of data to retrieve.

Examples:

# Only download amendments
npm run data:download ../senat-data -- -k Ameli

# Only process data from session 2023 onwards
npm run data:download ../senat-data -- --fromSession 2023

Common Options

--categories or -k <name>: Filter by dataset categories (Available options: All, Ameli, Debats, DosLeg, Questions, Sens)
--fromSession <year>: Specify the session year to retrieve data from (default: 2022)
--dataDir <path> (mandatory): Path to the working directory where all data is stored
--silent or -s: Disable logging
--verbose or -v: Enable verbose logging
--commit or -c: Automatically commit converted data
--pull or -p: Pull repositories before starting
--clone or -C <url>: Clone Git repositories from a remote group or organization
--remote or -r <name>: Push commits to specified Git remote(s)
--keepDir: Keep directories when cleaning data
--only-recent <days>: Retrieve only documents created within the last N days
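
As a rough illustration of how these options combine, the TypeScript sketch below assembles the arguments for `npm run data:download` from a typed object. The `DownloadOptions` interface and the `buildDownloadArgs` function are hypothetical helpers, not part of the package; only the flag names and category values are taken from the list above, and the sketch assumes a single category per call.

```typescript
// Category names as listed under "Common Options".
type Category = "All" | "Ameli" | "Debats" | "DosLeg" | "Questions" | "Sens"

// Hypothetical options object; its fields mirror the documented flags.
interface DownloadOptions {
  dataDir: string
  category?: Category
  fromSession?: number
  verbose?: boolean
}

// Build the argument list for the `npm` executable.
function buildDownloadArgs(options: DownloadOptions): string[] {
  const flags: string[] = []
  if (options.category !== undefined) flags.push("--categories", options.category)
  if (options.fromSession !== undefined) flags.push("--fromSession", String(options.fromSession))
  if (options.verbose) flags.push("--verbose")
  const args = ["run", "data:download", options.dataDir]
  // Flags meant for the script go after a bare "--", as in the examples above.
  return flags.length > 0 ? [...args, "--", ...flags] : args
}

const downloadArgs = buildDownloadArgs({
  dataDir: "../senat-data",
  category: "Ameli",
  fromSession: 2023,
})
console.log(["npm", ...downloadArgs].join(" "))
```

The resulting array can be handed to a process launcher such as Node's `child_process.spawn("npm", downloadArgs)`, which avoids quoting issues that come with building a single command string.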

Options for Retrieving Documents

--formats <format>: Specify document formats to retrieve (options: xml, html, pdf)
--types <type>: Specify document types to retrieve (options: textes, rapports)
--parseDocuments: Parse documents after retrieval
--parseAgenda: Parse agenda after retrieval
--parseDebats: Parse comptes-rendus after retrieval

Examples

# Retrieval of textes and rapports in specific formats
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --formats xml pdf --types textes

# Retrieval & parsing (textes in xml format only for now)
npm run data:retrieve_documents ../senat-data -- --fromSession 2022 --parseDocuments

# Retrieval & parsing of agenda
npm run data:retrieve_agenda ../senat-data -- --fromSession 2022 --parseAgenda

# Retrieval & parsing of comptes-rendus de séance
npm run data:retrieve_cr_seance ../senat-data -- --parseDebats --keepDir

# Retrieval & parsing of comptes-rendus de commissions
npm run data:retrieve_cr_commission ../senat-data -- --parseDebats --keepDir

Data download using Docker

A Docker image that downloads and converts the data all at once is available. Build it locally or run it from the container registry. Use the environment variables FROM_SESSION and CATEGORIES if needed.

docker run --pull always --name tricoteuses-senat -v ../senat-data:/app/senat-data -d git.tricoteuses.fr/logiciels/tricoteuses-senat:latest
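
The FROM_SESSION and CATEGORIES variables can be passed to the container with Docker's `-e` flag. The sketch below builds such a `docker run` argument list in TypeScript. `buildDockerRunArgs` and its options are hypothetical, and the sketch assumes that CATEGORIES accepts the same names as the `--categories` option (for example Ameli), which is not confirmed here.

```typescript
// Hypothetical helper options; not part of the package.
interface DockerDownloadOptions {
  hostDataDir: string
  fromSession?: number
  // Assumption: same values as the --categories CLI option.
  categories?: string
}

// Assemble the arguments for `docker run`, adding -e pairs only when set.
function buildDockerRunArgs(options: DockerDownloadOptions): string[] {
  const env: string[] = []
  if (options.fromSession !== undefined) env.push("-e", `FROM_SESSION=${options.fromSession}`)
  if (options.categories !== undefined) env.push("-e", `CATEGORIES=${options.categories}`)
  return [
    "run", "--pull", "always",
    "--name", "tricoteuses-senat",
    "-v", `${options.hostDataDir}:/app/senat-data`,
    ...env,
    "-d", "git.tricoteuses.fr/logiciels/tricoteuses-senat:latest",
  ]
}

const dockerArgs = buildDockerRunArgs({
  hostDataDir: "../senat-data",
  fromSession: 2023,
  categories: "Ameli",
})
console.log(["docker", ...dockerArgs].join(" "))
```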

Using the data

Once the data is downloaded, you can read it with loaders. To use them in your project, install the @tricoteuses/senat package and import the iterator functions you need.

npm install @tricoteuses/senat

import { iterLoadSenatQuestions } from "@tricoteuses/senat/loaders"

// Pass data directory and legislature as arguments
for (const { item: question } of iterLoadSenatQuestions("../senat-data", 17)) {
  console.log(question.id)
}
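
Because the loader in the example above is consumed with `for...of` and yields objects with an `item` field, it can be combined with small generic helpers. The sketch below shows one way to read only the first few items and stop early. `fakeLoadQuestions` is a hypothetical stand-in that makes the sketch runnable without any downloaded data, and string identifiers are an assumption; in a real project the iterable would come from a loader such as `iterLoadSenatQuestions`.

```typescript
// Shape yielded by the loader in the example above: an object with an `item` field.
interface Loaded<T> {
  item: T
}

// Hypothetical stand-in for a real loader, so the sketch runs without data.
function* fakeLoadQuestions(): Generator<Loaded<{ id: string }>> {
  for (const id of ["q1", "q2", "q3"]) {
    yield { item: { id } }
  }
}

// Collect at most `limit` items from any loader-like iterable, stopping early.
function takeItems<T>(loader: Iterable<Loaded<T>>, limit: number): T[] {
  const items: T[] = []
  if (limit <= 0) return items
  for (const { item } of loader) {
    items.push(item)
    if (items.length >= limit) break
  }
  return items
}

const firstTwo = takeItems(fakeLoadQuestions(), 2)
console.log(firstTwo.map((question) => question.id))
```

Breaking out of the loop ends the iteration, so a helper like this avoids reading the whole dataset when only a sample is needed.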

Generation of raw types from SQL schema (for contributors only)

npm run data:generate_schemas ../senat-data

Publishing

To publish a new version of this package to npm, bump the package version, compile the sources, and publish.

# Increment the version and create a new Git tag automatically (run only ONE of these)
npm version patch   # +0.0.1 → small fixes
npm version minor   # +0.1.0 → new features
npm version major   # +1.0.0 → breaking changes
# Compile the TypeScript sources, then publish to npm
npx tsc
npm publish

The Docker image is built automatically by a CI workflow when you push the tag to the remote repository.

git push --tags
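
For reference, the three `npm version` levels used under Publishing follow semantic versioning: a minor bump resets the patch number to 0, and a major bump resets both minor and patch. The TypeScript sketch below illustrates that arithmetic for plain `major.minor.patch` strings. `bumpVersion` is a hypothetical helper written for this illustration; it is not how npm is implemented and it ignores pre-release suffixes.

```typescript
type BumpLevel = "patch" | "minor" | "major"

// Compute the next version for a plain "major.minor.patch" string.
// Pre-release and build suffixes are deliberately not handled in this sketch.
function bumpVersion(version: string, level: BumpLevel): string {
  const parts = version.split(".").map(Number)
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version string: ${version}`)
  }
  const [major, minor, patch] = parts
  switch (level) {
    case "patch":
      return `${major}.${minor}.${patch + 1}`
    case "minor":
      return `${major}.${minor + 1}.0`
    case "major":
      return `${major + 1}.0.0`
  }
}

console.log(bumpVersion("2.22.7", "patch")) // 2.22.8
console.log(bumpVersion("2.22.7", "minor")) // 2.23.0
console.log(bumpVersion("2.22.7", "major")) // 3.0.0
```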