@tricoteuses/assemblee
v2.5.16
Retrieve, clean up & handle French Assemblée nationale's open data
Tricoteuses-Assemblee
Tricoteuses-Assemblee is free and open source software.
Documentation
- Architecture
- TypeScript API
- Main interfaces:
  - Acteur: natural person elected or appointed to one or more bodies (organes)
  - Amendement
  - CompteRendu: report of a parliamentary debate
  - Document: text of a government bill (projet de loi), a member's bill (proposition de loi), a report, etc.
  - DossierParlementaire: file tracking a government bill, a member's bill, a resolution, etc.
  - Organe: committee, political group, study group, friendship group, etc.
  - Question: question to the Government
  - Reunion: public sitting, committee meeting, study group meeting, etc.
  - Scrutin: vote of each député in a public ballot
- JSON Schemas
Requirements
- Node >= 18
Installation
git clone https://git.tricoteuses.fr/logiciels/tricoteuses-assemblee
cd tricoteuses-assemblee/
npm install
Download and clean data
Basic usage
Create a directory to store the data, then run the following command to download, reorganize and clean the data.
mkdir ../assemblee-data/
npm run data:download ../assemblee-data
Available Commands
- npm run data:download <dir>: Download, reorganize, and clean data
- npm run data:retrieve_open_data <dir>: Download raw data files.
- npm run data:reorganize_data <dir>: Reorganize raw files by entity.
- npm run data:clean_data <dir>: Clean and validate reorganized files.
- npm run data:retrieve_deputes_photos <dir>: Retrieval of députés' pictures from Assemblée nationale's website
- npm run data:retrieve_senateurs_photos <dir>: Retrieval of sénateurs' pictures from Assemblée nationale's website
- npm run data:retrieve_documents <dir>: Retrieval of legislative documents from Assemblée nationale's website
- npm run data:retrieve_pending_amendements <dir>: Retrieval of pending amendments from Assemblée nationale's website (waiting to be processed by Assemblée services)
Notes:
- Reorganized files (generated by the data:reorganize_data command) are also available in Tricoteuses / Data / Données brutes de l'Assemblée. They are updated on a regular basis.
- Split & cleaned files (generated by the data:clean_data command) are also available in Tricoteuses / Data / Données nettoyées de l'Assemblée with the _nettoye suffix. They are updated on a regular basis.
Filtering Options
Downloading and cleaning all the data takes a long time and uses a lot of disk space. To reduce the load, you can restrict the download to the types of data you need.
Examples:
# Only download amendments
npm run data:download ../assemblee-data -- -k Amendements
# Only process 16th and 17th legislatures
npm run data:download ../assemblee-data -- -l 16 -l 17
Common Options
- --categories or -k <name>: Filter by dataset categories (available options: ActeursEtOrganes, Agendas, Amendements, DossiersLegislatifs, Photos, Scrutins, Questions, ComptesRendusSeances)
- --legislature or -l <number>: Specify one or more legislatures to process (e.g., -l 15 -l 16)
- --dataDir <path> (mandatory): Path to the working directory where all data is stored
- --silent or -s: Disable logging
- --verbose or -v: Enable verbose logging
- --fetch or -f: Force re-download of data even if already present
- --commit or -c: Automatically commit cleaned data
- --pull or -p: Pull repositories before starting
- --clone or -C <url>: Clone Git repositories from a remote group or organization
- --remote or -r <name>: Push commits to specified Git remote(s)
- --keepDir: Keep Dir (Implement before cleaning data)
If you use such options, use them in all subsequent commands too (data:reorganize_data and data:clean_data).
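Because the same filters have to be repeated for every step, it can help to build the flag list once and reuse it. The TypeScript sketch below only assembles command strings and does not run anything. The helpers `filterFlags` and `stepCommands` are hypothetical and not part of the package; the script names and the `-k` / `-l` flags are the ones listed above. Repeating `-k` once per category is an assumption modelled on the documented `-l 16 -l 17` form.

```typescript
// Hypothetical helpers (not part of @tricoteuses/assemblee): build the
// shared filter flags once so that every step receives the same selection.
interface Filters {
  categories?: string[] // values for -k, e.g. "Amendements"
  legislatures?: number[] // values for -l, e.g. 16, 17
}

function filterFlags(filters: Filters): string[] {
  const flags: string[] = []
  // Assumption: -k may be repeated like -l; only the -l form is documented.
  for (const name of filters.categories ?? []) flags.push("-k", name)
  for (const n of filters.legislatures ?? []) flags.push("-l", String(n))
  return flags
}

// The individual steps from the command list, in pipeline order.
const STEPS = [
  "data:retrieve_open_data",
  "data:reorganize_data",
  "data:clean_data",
]

function stepCommands(dataDir: string, filters: Filters): string[] {
  const flags = filterFlags(filters)
  // npm forwards everything after "--" to the underlying script.
  const suffix = flags.length > 0 ? ` -- ${flags.join(" ")}` : ""
  return STEPS.map((step) => `npm run ${step} ${dataDir}${suffix}`)
}

const commands = stepCommands("../assemblee-data", {
  categories: ["Amendements"],
  legislatures: [16, 17],
})
for (const command of commands) {
  console.log(command)
}
```

Generating all three commands from one `Filters` value avoids the situation where one step is filtered and a later one is not.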
Options for Cleaning Data
- --dataset or -d <name>: Clean a specific dataset only
- --no-reset-after-commit: Skip Git reset after committing (useful to preserve local changes)
- --no-validate or -V: Skip schema validation during cleaning
- --fetchDocuments: Specify to retrieve documents
- --parseDocuments: Specify to parse documents into cleaned json
- --fetchVideos: Retrieve videos
Options for Retrieving Documents
- --full or -f: Retrieve all documents, even those already downloaded
- --document-type or -T <type>: Restrict to specific document types (e.g., PION)
Download using Docker
A Docker image that downloads and cleans the data all at once is available. Build it locally or run it from the container registry.
Use the environment variables LEGISLATURE and CATEGORIES if needed.
docker run --pull always --name tricoteuses-assemblee -v ../assemblee-data:/app/assemblee-data -e LEGISLATURE=17 -d git.tricoteuses.fr/logiciels/tricoteuses-assemblee:latest
Using the data
Once the data is downloaded and cleaned, you can use loaders to retrieve it. To use loaders in your project, you can install the @tricoteuses/assemblee package, and import the iterator functions that you need.
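Since the loaders are iterator functions, records can be consumed one at a time and iteration can stop early. The sketch below illustrates that pattern with a stand-in generator, `fakeActeurs`, so that it runs without any downloaded data. `fakeActeurs`, `ActeurLike` and `firstUids` are hypothetical names, the uids are placeholders, and the only shape assumed is the `{ acteur }` object with a `uid` field used in this section's usage example.

```typescript
// Stand-in type and generator: assumptions for illustration only,
// not the package's real Acteur interface or loader.
interface ActeurLike {
  uid: string
}

function* fakeActeurs(): Generator<{ acteur: ActeurLike }> {
  for (const uid of ["uid-1", "uid-2", "uid-3"]) {
    yield { acteur: { uid } }
  }
}

// Collects at most `limit` uids and then stops iterating, so the
// remaining records are never requested from the iterator.
function firstUids(
  items: Iterable<{ acteur: ActeurLike }>,
  limit: number,
): string[] {
  const uids: string[] = []
  if (limit <= 0) return uids
  for (const { acteur } of items) {
    uids.push(acteur.uid)
    if (uids.length >= limit) break
  }
  return uids
}

const sample = firstUids(fakeActeurs(), 2)
console.log(sample)
```

With the real package, the same helper would presumably accept the result of `iterLoadAssembleeActeurs`, provided it yields objects with an `acteur` property as the usage example in this section suggests.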
npm install @tricoteuses/assemblee
import {
  iterLoadAssembleeActeurs,
  iterLoadAssembleeOrganes,
  iterLoadAssembleeReunions,
  iterLoadAssembleeScrutins,
  iterLoadAssembleeDocuments,
  iterLoadAssembleeDossiersParlementaires,
  iterLoadAssembleeAmendements,
  iterLoadAssembleeQuestions,
  iterLoadAssembleeComptesRendus,
} from "@tricoteuses/assemblee/loaders"
// Pass data directory and legislature as arguments
for (const { acteur } of iterLoadAssembleeActeurs("../assemblee-data", 17)) {
  console.log(acteur.uid)
}
Generating schemas and documentation (for contributors only)
View instructions here
