marcattacks
v2.3.0
A powerful streaming MARC21 to RDF converter with JSONata transformation and S3 support.
marcattacks!
Turn your MARC exports into something else.
Build
npm install
npm run build:ts
npm link
Run
Generate JSON:
marcattacks --to json ./data/sample.xml
This also works for tarred and/or gzipped files:
marcattacks --to json ./data/sample.tar.gz
Generate Aleph sequential:
marcattacks --to alephseq ./data/sample.xml
Generate RDF:
marcattacks --to rdf --map marc2rdf ./data/sample.xml
Generate XML:
marcattacks --from alephseq --to xml ./data/one.alephseq
Transform the MARC input using a JSONata expression or file:
marcattacks --param fix=./demo/demo.jsonata ./data/sample.xml
Stdin
Use the pseudo URL stdin:// to read from standard input.
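As a minimal sketch, the pseudo URL can sit at the end of a pipeline. The wrapper name to_json_from_stdin is hypothetical; the flags and the stdin:// URL are the ones shown elsewhere in this README.

```shell
# Hypothetical helper: convert MARC data arriving on standard input to JSON.
# $1 is the input format (one of the --from values listed under Options).
to_json_from_stdin() {
    marcattacks --from "$1" --to json stdin://
}
```

For example: cat ./data/one.alephseq | to_json_from_stdin alephseq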
Remote files
A remote SFTP path:
marcattacks --key ~/.ssh/privatekey sftp://username@hostname:port/remote/path
The latest XML file on a remote SFTP server:
marcattacks --key ~/.ssh/privatekey sftp://username@hostname:port/remote/path/@latest:xml
An HTTP path:
marcattacks http://somewhere.org/data.xml
An S3 path:
marcattacks s3://accessKey:secretKey@hostname:port/bucket/key
Use s3s://... to connect over SSL.
Options
Input (--from)
- alephseq (Aleph sequential)
- json
- jsonl
- marc (ISO2709)
- rdf
- csv
- tsv
- xml (MARCXML)
Output (--to)
- alephseq (Aleph sequential)
- json
- jsonl
- parquet
- rdf
- csv
- tsv
- xml (MARCXML)
Transform (--map)
- avram : A mapper from MARC to Avram
- jsonata : A jsonata fixer (default)
- marcids : A mapper from MARC to a list of record ids
- marc2rdf : A mapper from MARC to RDF (demonstrator)
Or provide your own transformers as JavaScript plugins. See ./plugin/demo.js for an example.
Param (--param)
Provide parameters to the mapper, input, and output. See the examples:
npm run demo:jsonld
npm run demo:n3
npm run biblio:one
Writable (--out)
- default: stdout
- file path
- sftp://username@host:port/path
- s3://accessKey:secretKey@host:port/bucket/key (or s3s://)
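A small sketch of writing to a file path instead of stdout. It assumes that --out takes the target as the argument that follows it; the helper name xml_to_jsonl_file is hypothetical.

```shell
# Hypothetical helper: convert one MARCXML file to JSON Lines and write the
# result next to it, e.g. ./data/sample.xml -> ./data/sample.jsonl
# Assumption: --out is followed by the output target.
xml_to_jsonl_file() {
    input="$1"
    output="${input%.xml}.jsonl"   # swap the .xml suffix for .jsonl
    marcattacks --from xml --to jsonl --out "$output" "$input"
}
```

For example: xml_to_jsonl_file ./data/sample.xml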
Logging (--info,--debug,--trace,--log)
Logging messages can be provided with the --info, --debug and --trace options.
By default, log messages are written as plain text to stderr. Both the format and the output stream can be changed with the --log option:
- --log json: write logs in JSON format
- --log stdout: write logs to stdout
- --log json+stdout: write logs in JSON format to stdout
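Because logs go to stderr unless stdout is requested, converted records and log messages can be kept apart with ordinary shell redirection. The sketch below uses only the flags described above; the helper name convert_with_json_log is hypothetical.

```shell
# Hypothetical helper: keep converted records and log messages apart.
# Records go to stdout as usual; JSON-formatted log lines stay on stderr
# and are appended to the file named by $2.
convert_with_json_log() {
    marcattacks --info --log json --to json "$1" 2>>"$2"
}
```

For example: convert_with_json_log ./data/sample.xml ./run.log > ./sample.json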
Compression (--z,--tar)
Gzip compression and tar archiving of input files are detected automatically from the file name extension. If the file name has no such extension, set the following flags to force decompression:
- --z: the input file is gzipped
- --tar: the input file is tarred
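When a file name gives no hint, a script can inspect the content and add --z itself. This is a sketch of one way to do that on the calling side, not a description of how marcattacks detects compression; both helper names are hypothetical.

```shell
# Hypothetical helper: succeed when the file starts with the gzip magic
# bytes 1f 8b, regardless of its name.
is_gzip() {
    magic="$(od -An -tx1 -N2 "$1" | tr -d ' \n')"
    [ "$magic" = "1f8b" ]
}

# Convert a file to JSON, adding --z only when the content is gzipped.
to_json_autodetect() {
    if is_gzip "$1"; then
        marcattacks --z --to json "$1"
    else
        marcattacks --to json "$1"
    fi
}
```

For example: to_json_autodetect ./data/export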
Environment Variables
SFTP and S3 credentials can be set using environment variables or a local .env file.
Available variables:
- SFTP_USERNAME
- SFTP_PASSWORD
- S3_ACCESS_KEY
- S3_SECRET_KEY
An SFTP private key can be provided using the --key-env command-line option. For example, --key-env PRIVATE_KEY reads the key from the PRIVATE_KEY environment variable.
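A possible way to combine --key-env with a key stored on disk is sketched below. It assumes the environment variable holds the key contents rather than a path, which this README does not state; the helper name fetch_with_key_env is hypothetical.

```shell
# Hypothetical helper: load a private key file into the PRIVATE_KEY
# environment variable for a single marcattacks call.
# Assumption: the variable holds the key contents, not a file path.
fetch_with_key_env() {
    PRIVATE_KEY="$(cat "$1")" marcattacks --key-env PRIVATE_KEY "$2"
}
```

For example: fetch_with_key_env ~/.ssh/privatekey sftp://username@hostname:port/remote/path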
Discover files at a (remote) endpoint
Find all files that end with xml on an SFTP site:
npx globtrotr --key ~/.ssh/mykey sftp://username@hostname:port/remote/path/@glob:xml
Or, for an S3 site:
npx globtrotr s3s://accessKey:privateKey@hostname:port/bucket/@glob:xml
Concatenate files
Some formats, such as jsonl, allow outputs to be concatenated. Using Bash command grouping, marcattacks can then combine several input files into one output:
#!/bin/bash
# Example how to process files in sequence and concatenate the output
{
npx marcattacks --from alephseq --to jsonl data/one.alephseq
npx marcattacks --from xml --to jsonl data/sample.tar
npx marcattacks --from xml --to jsonl data/sample.tar.gz
npx marcattacks --from xml --to jsonl data/sample.xml.gz
npx marcattacks --from xml --to jsonl data/sample.xml
} | npx marcattacks --from jsonl --to xml stdin://
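The same pattern can be written as a loop when the list of files is not fixed. The sketch below derives --from from the file name and covers only the extensions used in the script above; it assumes that tar archives contain MARCXML, as data/sample.tar does. Both helper names are hypothetical.

```shell
# Hypothetical helper: pick the --from value from the file name.
# Assumption: tarred and gzipped inputs contain MARCXML.
input_format() {
    case "$1" in
        *.alephseq) echo alephseq ;;
        *.xml|*.xml.gz|*.tar|*.tar.gz) echo xml ;;
        *) return 1 ;;
    esac
}

# Convert every file given as an argument to jsonl on stdout,
# skipping files whose format cannot be derived from the name.
concat_as_jsonl() {
    for f in "$@"; do
        if fmt="$(input_format "$f")"; then
            npx marcattacks --from "$fmt" --to jsonl "$f"
        else
            echo "skipping $f: unknown format" >&2
        fi
    done
}
```

For example: concat_as_jsonl data/* | npx marcattacks --from jsonl --to xml stdin://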