pdb-images
v2.7.0
Generates images from mmCIF/BCIF files
PDBImages
PDBImages is a command-line tool, based on Mol*, for generating images of macromolecular structures from mmCIF or binary CIF structure files.
Installing as a command-line tool
PDBImages is available in the npm registry. You can install it globally on your machine (requires Node.js >= 18):
npm install -g pdb-images
Usage
NOTE: The following examples assume you installed PDBImages globally with npm install -g pdb-images. If you installed locally in the current directory (npm install pdb-images), use npx pdb-images instead of pdb-images. If you cloned the git repository and built it, use node ./lib/cli/pdb-images.js instead of pdb-images.
Print help:
pdb-images --help
Generate all images for PDB entry 1ad5 and save them in directory data/output_1ad5/, with default settings:
pdb-images 1ad5 data/output_1ad5/
Generate images for AlphaFold entry AF-Q5VSL9-F1-model_v4 and save them in directory data/output_Q5VSL9/, with default settings:
pdb-images AF-Q5VSL9-F1-model_v4 data/output_Q5VSL9/ --mode alphafold
Another example, with most command-line arguments given:
pdb-images 1hda data/output_1hda/ \
--input test_data/structures/1hda.cif \
--input-public https://www.ebi.ac.uk/pdbe/entry-files/download/1hda.bcif \
--mode pdb \
--api-url https://www.ebi.ac.uk/pdbe/api \
--api-retry \
--no-api \
--format webp \
--size 500x500 300x200 \
--render-each-size \
--type entry assembly \
--view front \
--opaque-background \
--no-axes \
--show-hydrogens \
--show-branched-sticks \
--ensemble-shades \
--allow-lowest-quality \
--date 2023-04-20 \
--clear \
--log DEBUG
Input
Input is a structure file in mmCIF (.cif) or binary CIF (.bcif) format. The input file can also be compressed by GZIP (.cif.gz, .bcif.gz). If the --input option is not given, the input file will be retrieved from a public source (https://www.ebi.ac.uk/pdbe/entry-files/download/{id}.bcif for PDB mode, https://alphafold.ebi.ac.uk/files/{id}.bcif for AlphaFold mode). However, for this to work in AlphaFold mode, the user has to specify the full identifier of a model in AlphaFold DB, e.g. "AF-Q5VSL9-F1-model_v4", not only "Q5VSL9".
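The input rules above can be sketched in TypeScript as follows. The helper names defaultInputUrl and inputFormat are hypothetical (they are not part of the pdb-images API), and the check that rejects a bare UniProt accession in AlphaFold mode is an illustrative assumption about how one might guard against that mistake, not a description of the tool's own validation.

```typescript
// Hypothetical helpers illustrating the input rules; not part of the pdb-images API.
type Mode = "pdb" | "alphafold";

/** Public URL the structure would be fetched from when --input is not given. */
function defaultInputUrl(entryId: string, mode: Mode): string {
  if (mode === "alphafold") {
    // Assumption for illustration: require the full AlphaFold DB model ID,
    // e.g. "AF-Q5VSL9-F1-model_v4", and reject a bare accession like "Q5VSL9".
    if (!/^AF-.+-model_v\d+$/.test(entryId)) {
      throw new Error(`Expected a full AlphaFold DB model ID, got "${entryId}"`);
    }
    return `https://alphafold.ebi.ac.uk/files/${entryId}.bcif`;
  }
  return `https://www.ebi.ac.uk/pdbe/entry-files/download/${entryId}.bcif`;
}

/** Classify an input path by extension: mmCIF or binary CIF, optionally GZIP-compressed. */
function inputFormat(path: string): { format: "cif" | "bcif"; gzipped: boolean } | undefined {
  const m = /\.(cif|bcif)(\.gz)?$/i.exec(path);
  if (!m) return undefined; // e.g. legacy .pdb files are not supported directly
  return { format: m[1].toLowerCase() as "cif" | "bcif", gzipped: m[2] !== undefined };
}

console.log(defaultInputUrl("1ad5", "pdb"));
// https://www.ebi.ac.uk/pdbe/entry-files/download/1ad5.bcif
console.log(inputFormat("test_data/structures/1hda.bcif.gz"));
// { format: 'bcif', gzipped: true }
```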
Supplementary input data will be retrieved from the PDBe API. The default API URL is https://www.ebi.ac.uk/pdbe/api but can be changed by the --api-url option. The URL can use http:, https:, or file: protocol; using file: protocol allows the user to "plug in" custom data from a local directory, e.g. --api-url 'file://path-to-this-repository/test_data/api'. When using this approach, the organization of the files in the directory and the format of these files must imitate the PDBe API endpoints; see the test_data/api/ directory for a demonstration. If the program cannot find a specific file in the directory, it will print a warning and proceed as if the API returned an empty JSON response ({}).
Overview of the accessed API endpoints (each path is prefixed by the API URL, and {id} is replaced by the entry ID, i.e. the first command-line argument):
- /pdb/entry/molecules/{id} – for entity names in the image captions (not essential)
- /pdb/entry/summary/{id} – for preferred assembly information (not essential)
- /pdb/entry/modified_AA_or_NA/{id} – for modified residue data (essential for modres images)
- /mappings/{id}, /nucleic_mappings/{id} – for SIFTS domain mappings (essential for domain images)
- /validation/residuewise_outlier_summary/entry/{id} – for validation report data (essential for validation images)
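A minimal sketch of how the endpoint paths above expand for a given --api-url and entry ID. endpointUrls is a hypothetical helper, not a pdb-images function. With a file: API URL the same paths point into a local directory, but the exact on-disk file naming is best checked against test_data/api/.

```typescript
// Endpoint templates from the list above; "{id}" is replaced by the entry ID.
const ENDPOINTS: string[] = [
  "/pdb/entry/molecules/{id}",
  "/pdb/entry/summary/{id}",
  "/pdb/entry/modified_AA_or_NA/{id}",
  "/mappings/{id}",
  "/nucleic_mappings/{id}",
  "/validation/residuewise_outlier_summary/entry/{id}",
];

// Hypothetical helper: build full URLs for one entry.
function endpointUrls(apiUrl: string, entryId: string): string[] {
  const base = apiUrl.replace(/\/+$/, ""); // drop trailing slashes so paths join cleanly
  return ENDPOINTS.map(path => base + path.replace("{id}", entryId));
}

for (const url of endpointUrls("https://www.ebi.ac.uk/pdbe/api", "1ad5")) {
  console.log(url);
}
```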
With the --no-api option, the API will not be used at all. Running without the API affects the program's behavior as follows:
- the image types that vitally depend on the API data (i.e. domain, modres, validation) will not be generated;
- some features can behave slightly differently (entity names for captions will be retrieved from the structure file instead of the API data; entity images will be rendered using the first assembly instead of the preferred assembly);
- the final self-check, whether all expected images have been generated, will be skipped.
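The first point above can be expressed as a simple filter over the requested image types. This is an illustrative sketch (typesWithoutApi is a hypothetical name), not code taken from pdb-images, and it models only which types are dropped.

```typescript
type ImageType =
  | "entry" | "assembly" | "entity" | "domain" | "ligand"
  | "modres" | "bfactor" | "validation" | "plddt";

// Image types that vitally depend on API data, per the list above.
const API_DEPENDENT = new Set<ImageType>(["domain", "modres", "validation"]);

// Hypothetical helper: split requested types into those still generated and those skipped under --no-api.
function typesWithoutApi(requested: ImageType[]): { kept: ImageType[]; skipped: ImageType[] } {
  const kept = requested.filter(t => !API_DEPENDENT.has(t));
  const skipped = requested.filter(t => API_DEPENDENT.has(t));
  return { kept, skipped };
}

const { kept, skipped } = typesWithoutApi(["entry", "domain", "ligand", "validation"]);
console.log(kept);    // [ 'entry', 'ligand' ]
console.log(skipped); // [ 'domain', 'validation' ]
```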
The legacy PDB file format is not directly supported by pdb-images. For convenience, this package provides a script for conversion of PDB files to mmCIF, which can then be passed to pdb-images. However, correct behavior with the converted files cannot be guaranteed, as the internal logic of the PDB format is fundamentally different from mmCIF, and this conversion should not be relied on. Use original mmCIF files whenever possible. Usage:
pdb2cif data/1ad5.pdb data/1ad5.cif
Output
Image files
The program creates images of several types. Each scene can be rendered in different views (front, side, top; --view option) and in different resolutions (--size option).
The rendered images can be saved in a variety of file formats (PNG (default), WEBP, JPEG, GIF, TIFF, AVIF; --format option).
Besides the images, the program also saves .molj files (Mol* plugin states, aka snapshots, which can be loaded in Mol*) and .caption.json files (image captions).
(Names of the individual files may be a bit confusing, as they were inherited from an older image generation process. See section Generated image types for explanation of the filenames.)
(Note that output image files in WEBP format tend to be much smaller when rendered with --opaque-background).
Summary files
After generating all images, two summary files are created:
- {pdb}_filelist contains the list of created images
- {pdb}.json contains the structured list of created images, including their captions and some other metadata.
These summary files contain filenames without suffixes, e.g. 1ad5_deposited_chain_front instead of the full filename 1ad5_deposited_chain_front_image-800x800.png. To get full filenames, combine the filenames in the "image" sections with the suffixes in the "image_suffix" section of the JSON summary file (e.g. 1ad5_deposited_chain_front + _image-800x800.png -> 1ad5_deposited_chain_front_image-800x800.png).
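A minimal sketch of this reconstruction, assuming you have already read the base names from the "image" sections and the suffixes from the "image_suffix" section of {pdb}.json, and assuming every suffix applies to every base name. fullFilenames is a hypothetical helper, and the exact JSON layout is not reproduced here.

```typescript
// Hypothetical helper: cross every base name with every suffix.
function fullFilenames(baseNames: string[], suffixes: string[]): string[] {
  const result: string[] = [];
  for (const base of baseNames) {
    for (const suffix of suffixes) {
      result.push(base + suffix);
    }
  }
  return result;
}

console.log(fullFilenames(["1ad5_deposited_chain_front"], ["_image-800x800.png", "_image-200x200.png"]));
// [ '1ad5_deposited_chain_front_image-800x800.png', '1ad5_deposited_chain_front_image-200x200.png' ]
```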
If the output directory contains older files from previous runs, these will also be included in the summary files (run with --clear to remove any older files instead). If you only want to update the summary files based on the current contents of the output directory without generating any new images, run with --type (without specifying any type).
After creating all output files, the program performs a self-check, i.e. it compares the expected list of output files (based purely on API data, agnostic to the structure file) with the actual list of generated output files. If any expected file is missing, the program prints an error message, saves the expected file list to {id}_expected_files.txt, and terminates with a non-zero exit code. This self-check is skipped when using --no-api.
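Conceptually, the self-check is a set difference between the expected and the actual file lists. A sketch with a hypothetical missingFiles helper and made-up file lists (pdb-images derives the expected list from API data internally):

```typescript
// Hypothetical helper: expected files that are absent from the actual output.
function missingFiles(expected: string[], actual: string[]): string[] {
  const present = new Set(actual);
  return expected.filter(name => !present.has(name));
}

// Made-up example lists.
const expectedList = ["1ad5_filelist", "1ad5.json", "1ad5_deposited_chain_front_image-800x800.png"];
const actualList = ["1ad5_filelist", "1ad5.json"];
const missing = missingFiles(expectedList, actualList);
if (missing.length > 0) {
  // pdb-images would report an error and exit with a non-zero code at this point.
  console.error(`Missing ${missing.length} expected file(s):`, missing);
}
```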
Generated image types
PDBImages generates many types of images. By default, it will create all image types that make sense for the selected mode (pdb/alphafold) and entry. Alternatively, the user can select a subset of image types by the --type option. These are all the available types:
- entry – Create images of the whole deposited structure, colored by chains and colored by entities (i.e. chemically distinct molecules).
  → {pdb}_deposited_chain_{view}_image-{size}.png
  → {pdb}_deposited_chemically_distinct_molecules_{view}_image-{size}.png
- assembly – For each assembly listed in the mmCIF file, create images of the whole assembly, colored by chains and colored by entities.
  → {pdb}_assembly_{assembly}_chain_{view}_image-{size}.png
  → {pdb}_assembly_{assembly}_chemically_distinct_molecules_{view}_image-{size}.png
- entity – For each distinct entity, create an image of the preferred assembly with this entity highlighted. This excludes the water entity. If an entity is not present in the preferred assembly, the program will instead use the first assembly where this entity is present (e.g. entity 5 in 7nys). If an entity is not present in any assembly, the deposited model will be used instead (e.g. entity 3 in 6ml1).
  → {pdb}_entity_{entity}_{view}_image-{size}.png
- domain – Create images for SIFTS mappings (CATH, SCOP, Pfam, Rfam). Namely, for each combination of SIFTS family and entity, select a chain belonging to that entity and create an image of the chain with highlighted SIFTS domain(s). If there are domains from the same family in different entities, process each of them separately. If there are multiple domains from the same family in the same entity but in different chains, process just one of the chains. If there are multiple domains from the same family within one chain, render this chain with each domain highlighted in a different color (in such cases, choose the chain with the most domains). Requires API.
  → {pdb}_{entity}_{chain}_{source}_{family}_image-{size}.png
- ligand – For each distinct non-polymer entity in the structure (except water), create an image of this molecule, highlighted, together with its surroundings. If there are multiple instances of the same entity, only one of them is processed.
  → {pdb}_ligand_{ligand}_image-{size}.png
- modres – For each distinct modified residue in the structure, create an image of the preferred assembly with all instances of this modified residue highlighted. Requires API.
  → {pdb}_modres_{modres}_{view}_image-{size}.png
- bfactor – Create an image of the deposited structure in putty representation with color-coded B-factors. Skip if the structure is not from a diffraction method (thus B-factors are not available).
  → {pdb}_bfactor_image-{size}.png
- validation – Create an image of the deposited structure with color-coded validation data. Requires API.
  → {pdb}_validation_geometry_deposited_image-{size}.png
- plddt – Create an image of the deposited structure with color-coded pLDDT values. This is only for --mode alphafold.
  → {pdb}_plddt_{view}_image-{size}.png
- all – A shortcut to create all meaningful image types (i.e. all but plddt in pdb mode, plddt in alphafold mode).
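The fallback order described for the entity image type can be sketched as below. The Assembly data model and targetForEntity are hypothetical simplifications (in reality the assemblies come from the mmCIF file and the preferred assembly from the PDBe API); the example data is made up and the water exclusion is not modeled.

```typescript
// Hypothetical, simplified data model: each assembly lists the entity IDs it contains.
interface Assembly {
  id: string;
  entityIds: string[];
}

type RenderTarget = { kind: "assembly"; assemblyId: string } | { kind: "deposited" };

// Fallback order: preferred assembly -> first assembly containing the entity -> deposited model.
function targetForEntity(entityId: string, preferredId: string, assemblies: Assembly[]): RenderTarget {
  const preferred = assemblies.find(a => a.id === preferredId);
  if (preferred && preferred.entityIds.includes(entityId)) {
    return { kind: "assembly", assemblyId: preferred.id };
  }
  const firstWithEntity = assemblies.find(a => a.entityIds.includes(entityId));
  if (firstWithEntity) {
    return { kind: "assembly", assemblyId: firstWithEntity.id };
  }
  return { kind: "deposited" };
}

// Made-up example data (not taken from any real PDB entry).
const assemblies: Assembly[] = [
  { id: "1", entityIds: ["1", "2"] },
  { id: "2", entityIds: ["1", "3"] },
];
console.log(targetForEntity("1", "1", assemblies)); // { kind: 'assembly', assemblyId: '1' }
console.log(targetForEntity("3", "1", assemblies)); // { kind: 'assembly', assemblyId: '2' }
console.log(targetForEntity("4", "1", assemblies)); // { kind: 'deposited' }
```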
By default, some image types are rendered in three views (front, side, top view) with axis arrows shown in the left bottom corner, while other image types are only rendered in front view without axis arrows. This can be changed by the --view and --no-axes options.
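How --view and --no-axes combine can be sketched as follows. The list of multi-view types is taken from the --view entry in the command-line overview; viewsFor and showAxes are hypothetical names, not pdb-images functions.

```typescript
type ViewOption = "front" | "all" | "auto";
type View = "front" | "side" | "top";

// Image types that get all three views under --view auto (per the --view help text).
const MULTI_VIEW_TYPES = new Set<string>(["entry", "assembly", "entity", "modres", "plddt"]);
const ALL_VIEWS: View[] = ["front", "side", "top"];

// Hypothetical helper: which views are rendered for an image type.
function viewsFor(imageType: string, option: ViewOption = "auto"): View[] {
  if (option === "front") return ["front"];
  if (option === "all") return [...ALL_VIEWS];
  return MULTI_VIEW_TYPES.has(imageType) ? [...ALL_VIEWS] : ["front"];
}

// Axis arrows are drawn by default only when a scene is rendered from several angles.
function showAxes(views: View[], noAxes: boolean): boolean {
  return !noAxes && views.length > 1;
}

console.log(viewsFor("entry"));                  // [ 'front', 'side', 'top' ]
console.log(viewsFor("ligand"));                 // [ 'front' ]
console.log(showAxes(viewsFor("entry"), false)); // true
```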
By default, the images are rendered in one resolution, 800x800. This can be changed by the --size option. If multiple sizes are provided (e.g. --size 100x100 800x800 1600x1600), only the largest size (measured by area) will be rendered and the others will be obtained by resizing (use --render-each-size to render each size separately).
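The default size handling (render the largest size by area, derive the rest by resizing) can be sketched like this. parseSize and planSizes are hypothetical helpers; an empty --size (no rendering at all) is not modeled.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: parse "WIDTHxHEIGHT", e.g. "800x800".
function parseSize(text: string): Size {
  const m = /^(\d+)x(\d+)$/.exec(text);
  if (!m) throw new Error(`Invalid size "${text}", expected WIDTHxHEIGHT`);
  return { width: Number(m[1]), height: Number(m[2]) };
}

// Hypothetical helper: which size is rendered and which are derived by resizing
// (the default behavior, i.e. without --render-each-size).
function planSizes(sizes: Size[]): { rendered: Size; resized: Size[] } {
  if (sizes.length === 0) throw new Error("No sizes given");
  let largest = sizes[0];
  for (const s of sizes) {
    if (s.width * s.height > largest.width * largest.height) largest = s;
  }
  return { rendered: largest, resized: sizes.filter(s => s !== largest) };
}

const plan = planSizes(["100x100", "800x800", "1600x1600"].map(parseSize));
console.log(plan.rendered); // { width: 1600, height: 1600 }
```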
If you use --size without any value, no images will be rendered but captions (.caption.json) and state files (.molj) will still be created.
Overview of the command-line arguments
positional arguments:
entry_id Entry identifier (PDB ID or AlphaFoldDB ID).
output_dir Output directory.
optional arguments:
-h, --help show this help message and exit
-v, --version Print version info and exit.
--input INPUT Input structure file path or URL (.cif, .bcif,
.cif.gz, .bcif.gz).
--input-public INPUT_PUBLIC
Input structure URL to use in saved Mol* states (.molj
files) (cif or bcif format).
--mode {pdb,alphafold}
Mode.
--api-url API_URL PDBe API URL (can use http:, https:, or file: protocol).
Default: https://www.ebi.ac.uk/pdbe/api.
--api-retry Retry any failed API call up to 5 times, waiting
random time (up to 30 seconds) before each retry.
--no-api Do not use PDBe API at all (some images will be
skipped, some entity names will be different in
captions, etc.).
--format [{png,webp,jpg,jpeg,gif,tif,tiff,avif} ...]
One or more output image formats. Default: png.
--size [SIZE ...] One or more output image sizes, e.g. 800x800 200x200.
Default: 800x800. Only the largest size is rendered,
others are obtained by resizing unless
--render-each-size is used. Use without any value to
disable image rendering (only create captions and MOLJ
files).
--render-each-size Render image for each size listed in --size, instead
of rendering only the largest size and resampling to the
other sizes.
--type [{entry,assembly,entity,domain,ligand,modres,bfactor,validation,plddt,all} ...]
One or more image types to be created. Use "all" as a
shortcut for all types. See README.md for details on
image types. Default: all. Use without any value to
skip all types (only create summary files from
existing outputs).
--view {front,all,auto}
Select which views should be created for each image
type (front view / all views (front, side, top) / auto
(creates all views only for these image types: entry,
assembly, entity, modres, plddt)). Default: auto.
--opaque-background Render opaque background in images (default:
transparent background).
--no-axes Do not render axis indicators aka PCA arrows (default:
render axes when rendering the same scene from
multiple view angles (front, side, top)).
--show-hydrogens Show hydrogen atoms in ball-and-stick visuals
(default: always ignore hydrogen atoms).
--show-branched-sticks
Show semi-transparent ball-and-stick visuals for
branched entities (i.e. carbohydrates) in addition to
the default 3D-SNFG visuals.
--ensemble-shades Show individual models within an ensemble in different
shades of the base color (lighter and darker),
default: use the same colors for all models.
--allow-lowest-quality
Allow any quality level for visuals, including
"lowest", which is really ugly (default: allow only
"lower" quality level and better).
--force-bfactor Force outputting "bfactor" image type even if the structure is
not from X-ray (this might be necessary for custom mmCIF files
with missing information about experimental methods).
--date DATE Date to use as "last_modification" in the caption JSON
(default: today's date formatted as YYYY-MM-DD).
--clear Remove all contents of the output directory before
running.
--log {ALL,TRACE,DEBUG,INFO,WARN,ERROR,FATAL,MARK,OFF}
Set logging level. Default: INFO.
Run in Docker
NOTE: Docker image for PDBImages uses Xvfb, which results in much worse performance compared to running it directly on a machine with GPU (see FAQ).
Get image from repository and run
docker run -v ~/data/output_1ad5:/out pdbegroup/pdb-images 1ad5 /out
Build and run
docker build . -t pdb-images # if you run it on the same architecture as build
docker build . -t pdb-images --platform linux/amd64 # if you need it for a different architecture
docker run -v ~/data/output_1ad5:/out pdb-images 1ad5 /out
Run in Singularity
singularity build ./pdb-images docker://pdbegroup/pdb-images
singularity run --env XVFB_DIR=~/data/xvfb ./pdb-images 1ad5 ~/data/output_1ad5
It is important to set the XVFB_DIR variable to an existing mounted directory (use --bind if paths are not mounted automatically). When running multiple jobs in parallel, set a separate XVFB_DIR for each job.
Including as a dependency
PDBImages is available in the npm registry. You can add it as a dependency to your own package (requires Node.js >= 18):
npm install pdb-images
Then you can call the asynchronous main function (and other functions) in your code. This example shows how to call main from TypeScript code:
import { createArgs } from 'pdb-images/lib/args';
import { main } from 'pdb-images/lib/main';
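// main is asynchronous; report failures instead of leaving the promise rejection unhandled
main(createArgs('1ad5', 'data/output_1ad5/', { size: [{ width: 1600, height: 1200 }], view: 'front', clear: true }))
    .catch(err => { console.error(err); process.exitCode = 1; });
In the TypeScript configuration (tsconfig.json), use "module": "CommonJS".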
Development
Install dependencies
npm install
Requires Node.js >= 18. See FAQ if installation fails on the gl package.
Build
rm -rf ./lib/ # For a clean build
npm run build
Build automatically on file save:
npm run watch
Test
npm run lint
npm run jest
Release
To release a new version of this package:
- Change version in package.json
- Change version in src/main.ts (export const VERSION = ...)
- Run tests (will check if the versions match)
- Update CHANGELOG.md
- Commit and push to main branch (use the version as the commit message, e.g. 2.0.0)
- Create a git tag using semantic versioning (e.g. 2.0.0); do not start the tag with "v" (e.g. v2.0.0)
- GitHub workflow will automatically publish npm package (https://www.npmjs.com/package/pdb-images)
- GitHub workflow will automatically publish Docker images (https://hub.docker.com/r/pdbegroup/pdb-images and dockerhub.ebi.ac.uk/pdbe/packages/pdb-images)
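The release steps rely on the version in package.json, the VERSION constant in src/main.ts, and the git tag all agreeing. A hypothetical pre-release check along these lines is sketched below. The repository's own tests may do this differently, the requirement that the tag equal the package version is an assumption, and the semantic-version regex is simplified.

```typescript
// Hypothetical helper: extract the VERSION constant from the text of src/main.ts.
function versionFromMainTs(source: string): string | undefined {
  const m = /export\s+const\s+VERSION\s*=\s*['"]([^'"]+)['"]/.exec(source);
  return m?.[1];
}

// Simplified semver check: MAJOR.MINOR.PATCH with optional suffix, no leading "v".
function isValidReleaseTag(tag: string): boolean {
  return /^\d+\.\d+\.\d+([-+][0-9A-Za-z.-]+)?$/.test(tag);
}

// Hypothetical pre-release check; returns a list of problems (empty if all is well).
function checkRelease(packageJson: string, mainTs: string, tag: string): string[] {
  const errors: string[] = [];
  const pkgVersion: string = JSON.parse(packageJson).version;
  const srcVersion = versionFromMainTs(mainTs);
  if (pkgVersion !== srcVersion) {
    errors.push(`package.json has ${pkgVersion}, src/main.ts has ${srcVersion}`);
  }
  if (!isValidReleaseTag(tag)) {
    errors.push(`Tag "${tag}" is not a plain semantic version (no "v" prefix)`);
  }
  if (tag !== pkgVersion) {
    errors.push(`Tag "${tag}" does not match version ${pkgVersion}`);
  }
  return errors;
}

console.log(checkRelease('{"version": "2.7.0"}', "export const VERSION = '2.7.0';", "v2.7.0"));
```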
Citing
If you found PDBImages helpful, please cite:
Midlik A, Nair S, Anyango S, Deshpande M, Sehnal D, Varadi M, Velankar S (2023) PDBImages: a command-line tool for automated macromolecular structure visualization. Bioinformatics, 39(12), btad744. https://doi.org/10.1093/bioinformatics/btad744
FAQ
npm install fails on the gl package, printing something like:
...
npm ERR! gyp ERR! not ok
...
This is probably because some dependencies needed to build the gl package are missing and/or the Python path is not set correctly. Try this:
sudo apt-get install -y build-essential libxi-dev libglu1-mesa-dev libglew-dev pkg-config
export NODE_GYP_FORCE_PYTHON=$(which python3)
or follow the instructions here: https://www.npmjs.com/package/gl#system-dependencies
Installation completed successfully and running pdb-images --help works fine, but trying to run image generation gives an error like this:
var ext = gl.getExtension('ANGLE_instanced_arrays');
TypeError: Cannot read properties of null (reading 'getExtension')
This error is thrown when an X server is not available on the machine, which is a common situation in large computing infrastructures or cloud environments.
The easiest solution is to use the Xvfb X server:
sudo apt-get install xvfb
xvfb-run --auto-servernum pdb-images 1ad5 data/output_1ad5/
This approach is used for the GitHub testing workflow (sudo apt-get install xvfb && xvfb-run --auto-servernum npm run jest). It is also used in the enclosed Dockerfile.
The downside of this approach is that Xvfb is a purely software implementation and cannot use the GPU (this is not stated in any official source, but several people on StackOverflow report it), so PDBImages cannot reach its full performance potential.
Installation completed successfully and running pdb-images --help works fine, but trying to run image generation gives an error like this:
ReferenceError: fetch is not defined
This is probably because you are using an older version of Node.js. Version 18 or higher is required to run PDBImages.
When you update Node.js, make sure to uninstall the PDBImages package and then install it again:
npm uninstall -g pdb-images
npm install -g pdb-images
(use -g only if you install globally)
