genehood-cli

v0.2.9-1

Published

4 years ago

Command-line interface to generate GeneHood datasets.

daviortega

bioinformatics mist3 comparative genomics gene neighborhood

Dependencies

GeneHood requires Node.js version 10+ and ncbi-tools+ version 2.6+ to run.
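Both requirements are minimum versions. As a rough pre-flight check, the hypothetical helpers below (parseVersion and meetsMinimum are illustrations, not part of genehood-cli) compare a version string against a minimum. They assume strings shaped like Node.js's process.version (e.g. "v10.24.1") or the first line printed by blastp -version (assumed here to look like "blastp: 2.9.0+").

```typescript
// Hypothetical pre-flight helpers; not part of genehood-cli.
// Assumes version strings such as "v10.24.1" (Node.js) or "blastp: 2.9.0+" (BLAST+).

// Extract the first dotted number, e.g. "blastp: 2.9.0+" -> [2, 9, 0].
function parseVersion(raw: string): number[] {
  const match = raw.match(/\d+(?:\.\d+)*/);
  if (match === null) {
    throw new Error(`Cannot find a version number in: ${raw}`);
  }
  return match[0].split(".").map(Number);
}

// True when `raw` is at least `minimum`; missing components count as 0.
function meetsMinimum(raw: string, minimum: string): boolean {
  const actual = parseVersion(raw);
  const required = parseVersion(minimum);
  const length = Math.max(actual.length, required.length);
  for (let i = 0; i < length; i++) {
    const a = actual[i] ?? 0;
    const r = required[i] ?? 0;
    if (a !== r) {
      return a > r; // the first differing component decides
    }
  }
  return true; // equal versions satisfy the minimum
}

console.log(meetsMinimum("v10.24.1", "10"));        // true
console.log(meetsMinimum("blastp: 2.5.0+", "2.6")); // false
```

In Node.js, meetsMinimum(process.version, "10") checks the running interpreter.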

Install

npm install -g genehood-cli

Usage

GeneHood uses the MiST3 API to collect the information needed for the analysis. Thus, the only inputs required from the user are:

a list of reference genes;
the number of upstream and downstream genes to include in the analysis;
a phylogenetic tree in Newick format (optional).

Reference Genes

GeneHood reads a list of reference genes from the user and retrieves, from MiST3, information about the genes upstream and downstream of each one.
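Conceptually, each neighborhood is a window of genes around a reference gene. The sketch below illustrates that idea only; it is not GeneHood's implementation. The Gene record and the neighborhood function are hypothetical, and the sketch assumes that "upstream" follows the reference gene's strand (so on the minus strand the upstream genes lie at higher coordinates) and ignores wrap-around on circular replicons.

```typescript
// Hypothetical gene record: the real MiST3 gene data has more fields than this.
interface Gene {
  stableId: string;
  start: number; // genomic start coordinate on the replicon
  strand: "+" | "-";
}

// Return the reference gene and its neighbors, in coordinate order.
// Assumption: upstream/downstream follow the reference gene's strand.
function neighborhood(
  genes: Gene[],
  refId: string,
  upstream: number,
  downstream: number
): Gene[] {
  const sorted = genes.slice().sort((a, b) => a.start - b.start);
  let i = -1;
  for (let k = 0; k < sorted.length; k++) {
    if (sorted[k].stableId === refId) {
      i = k;
      break;
    }
  }
  if (i === -1) {
    throw new Error(`Reference gene not found: ${refId}`);
  }
  const onPlus = sorted[i].strand === "+";
  const before = onPlus ? upstream : downstream; // genes at lower coordinates
  const after = onPlus ? downstream : upstream;  // genes at higher coordinates
  // slice() clamps at the array ends, so windows near an edge simply shrink.
  return sorted.slice(Math.max(0, i - before), i + after + 1);
}

const genes: Gene[] = [
  { stableId: "g1", start: 100, strand: "+" },
  { stableId: "g2", start: 200, strand: "+" },
  { stableId: "g3", start: 300, strand: "+" },
  { stableId: "g4", start: 400, strand: "+" },
  { stableId: "g5", start: 500, strand: "+" },
];
console.log(neighborhood(genes, "g3", 1, 2).map((g) => g.stableId).join(",")); // g2,g3,g4,g5
```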

For this reason, GeneHood identifies genes using the MiST3 gene identifier standard, called the stable ID.

A stable ID is a composite of the NCBI genome version and the locus number of the gene, joined by a hyphen.
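Because the two parts are joined by a hyphen, a stable ID can be split back into them. The parser below is a hypothetical helper (parseStableId is not part of genehood-cli); it assumes the genome version itself never contains a hyphen, as in GCF_000006745.1, and keeps everything after the first hyphen as the locus.

```typescript
// Hypothetical helper (not part of genehood-cli): split a MiST3 stable ID
// into the NCBI genome version and the gene's locus.
interface StableIdParts {
  genomeVersion: string; // e.g. "GCF_000006745.1"
  locus: string;         // e.g. "VC2063"
}

function parseStableId(id: string): StableIdParts {
  // Split at the first hyphen; assumes the genome version contains none.
  const dash = id.indexOf("-");
  if (dash <= 0 || dash === id.length - 1) {
    throw new Error(`Not a stable ID: ${id}`);
  }
  return { genomeVersion: id.slice(0, dash), locus: id.slice(dash + 1) };
}

const parts = parseStableId("GCF_000006745.1-VC2063");
console.log(parts.genomeVersion, parts.locus); // GCF_000006745.1 VC2063
```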

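Here are some examples, from the Vibrio cholerae genome used later in this guide:

GCF_000006745.1-VC2063
GCF_000006745.1-VCA1095
GCF_000006745.1-VC1397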

Performing GeneHood analysis

Once genehood-cli is installed globally (with the -g option), npm creates an executable called genehood.

genehood takes one argument, the name of the project (myProject in the examples below), and a mandatory --action flag with four possible values:

Step 1: Initialize the project

To start a new analysis, we must initialize a new project.

genehood myProject --action init

This command will generate two files:

myProject.geneHood.config.json
myProject.geneHood.data.json.gz
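All of these file names are built from the project name. The helper below only illustrates that pattern (projectFiles is hypothetical, not a genehood-cli function); it assumes the data file ends in .json.gz and also includes the pack file that GeneHood produces after a successful run.

```typescript
// Hypothetical illustration of GeneHood's file-naming pattern.
interface ProjectFiles {
  config: string; // created by --action init
  data: string;   // created by --action init (assumed to end in .json.gz)
  pack: string;   // produced by --action run
}

function projectFiles(project: string): ProjectFiles {
  const prefix = `${project}.geneHood`;
  return {
    config: `${prefix}.config.json`,
    data: `${prefix}.data.json.gz`,
    pack: `${prefix}.pack.json.gz`,
  };
}

console.log(projectFiles("myProject").config); // myProject.geneHood.config.json
```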

Now we must edit the config file to tell GeneHood for which genes it should collect gene neighborhood information.

Step 2: Edit the config file to set initial parameters

As of version 0.2.8, genehood-cli also has flags to simplify this process; see below.

The GeneHood config file has several sections, but the one that matters here is user. It contains three sub-sections:

Let's focus on the settings sub-section first. It has three entries that need user input:

For example, let us add as reference genes the cheA genes from the three chemosensory systems of Vibrio cholerae:

| System | Stable ID |
|:-:|-|
| F6 | GCF_000006745.1-VC2063 |
| F7 | GCF_000006745.1-VCA1095 |
| F9 | GCF_000006745.1-VC1397 |

Let us also include 15 genes upstream and 15 genes downstream of each reference gene.

To do that, we can edit the config file using any text editor.

The user section of the config file will be something like this:

"user": {
 "newickTree": "",
 "settings": {
  "downstream": 15,
  "geneHoodPrefix": "vibrio",
  "stableIds": [
   "GCF_000006745.1-VC1397",
   "GCF_000006745.1-VC2063",
   "GCF_000006745.1-VCA1095"
  ],
  "upstream": 15
 },
 "startingStep": "fetchData",
 "stopStep": ""
}

Save the file and proceed to the next step.
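If we prefer to script this step instead of using a text editor, the hypothetical function below (withSettings is not part of genehood-cli) returns an updated copy of a parsed config. The interfaces only model the user fields shown above; any other top-level sections are passed through untouched. De-duplicating and sorting the stable IDs is a choice of this sketch, not a documented GeneHood requirement.

```typescript
// Hypothetical shape of the fields shown under "user" in the config file.
interface UserSettings {
  downstream: number;
  geneHoodPrefix: string;
  stableIds: string[];
  upstream: number;
}

interface GeneHoodConfig {
  user: {
    newickTree: string;
    settings: UserSettings;
    startingStep: string;
    stopStep: string;
  };
  [section: string]: unknown; // other sections are copied as-is
}

// Return a new config with the given reference genes and gene range.
function withSettings(
  config: GeneHoodConfig,
  stableIds: string[],
  upstream: number,
  downstream: number
): GeneHoodConfig {
  for (const n of [upstream, downstream]) {
    if (Math.floor(n) !== n || n < 0) {
      throw new Error("upstream and downstream must be non-negative integers");
    }
  }
  // Drop duplicate IDs, then sort them.
  const ids = stableIds.filter((id, i) => stableIds.indexOf(id) === i).sort();
  return {
    ...config,
    user: {
      ...config.user,
      settings: { ...config.user.settings, stableIds: ids, upstream, downstream },
    },
  };
}
```

In Node.js, the file could be read with JSON.parse(fs.readFileSync(path, "utf8")), passed through withSettings, and written back with fs.writeFileSync(path, JSON.stringify(updated, null, 2)).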

Step 2 (alternative): Set parameters using flags

We can set the number of upstream and downstream genes using --addRange.

We can add the identifiers to a text file (one identifier per line) and pass it to genehood using the --addStableIds flag.

If we put the identifiers into a file named vibrioIds.txt, we can accomplish the same setup as before by typing:

genehood myProject --addRange 15 15 --addStableIds vibrioIds.txt
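For the Vibrio cholerae example, vibrioIds.txt would contain the three stable IDs from the table above, one per line. The sketch below shows one tolerant way to read such a file: it ignores blank lines, trims surrounding whitespace (including Windows line endings) and drops duplicates. parseIdsFile is hypothetical; how genehood-cli itself parses the file is not documented here, so it is safest to keep the file clean.

```typescript
// Hypothetical reader for an IDs file with one stable ID per line.
function parseIdsFile(text: string): string[] {
  const ids: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const id = line.trim(); // remove stray spaces and tabs
    if (id !== "" && ids.indexOf(id) === -1) {
      ids.push(id); // keep first occurrence, preserve file order
    }
  }
  return ids;
}

const fileText =
  "GCF_000006745.1-VC2063\r\nGCF_000006745.1-VCA1095\n\nGCF_000006745.1-VC1397\n";
console.log(parseIdsFile(fileText).length); // 3
```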

Step 3: Running GeneHood

Make sure we have an Internet connection and that the blastp and makeblastdb executables are installed and available on our PATH.

Then run:

genehood myProject --action run

That is it. GeneHood should do all the rest.

Step 4: Clean up

If everything goes as expected, we should have a file called myProject.geneHood.pack.json.gz in our directory. The directory will probably also contain a number of temporary files that GeneHood used along the way.

We can safely remove these temporary files with genehood's cleanUp action:

genehood myProject --action cleanUp

GeneHood removes all files except two: the config file and the pack file. This is a little redundant, since the pack file also contains the config. We made it this way so that users can easily see how they ran the analysis, or re-run it with a few changes to the config file if needed.
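The keep-two-files rule can be expressed as a filter. The sketch below is hypothetical (filesToRemove is not a genehood-cli function) and assumes, for safety, that only files whose names start with the project name are candidates for removal; GeneHood's actual temporary-file names are not listed in this README.

```typescript
// Hypothetical sketch of the cleanUp rule: of the files that belong to the
// project (assumed: names start with "<project>."), keep only config and pack.
function filesToRemove(project: string, files: string[]): string[] {
  const keep = [
    `${project}.geneHood.config.json`,
    `${project}.geneHood.pack.json.gz`,
  ];
  return files.filter(
    (name) => name.indexOf(`${project}.`) === 0 && keep.indexOf(name) === -1
  );
}

const listing = [
  "myProject.geneHood.config.json",
  "myProject.geneHood.pack.json.gz",
  "myProject.geneHood.data.json.gz",
  "notes.txt",
];
console.log(filesToRemove("myProject", listing).join(", ")); // myProject.geneHood.data.json.gz
```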

Now we just need to visualize the data.

Optional step 4.5: Add Phylogeny

We can add a phylogeny (in Newick format) to the config file at any moment, and genehood-cli has a helper option for this: --addPhylogeny. If we add the phylogeny after the pack has been built, genehood-cli will repack the file for us.

Adding a phylogeny lets the viewer order the gene clusters following the order of the leaves in the phylogenetic tree. The tree can be built in any way: from a single gene, from multiple concatenated genes, etc. However, for the viewer to work, the leaf names must be exactly the same as the identifiers of the reference genes.

To add a new phylogeny:

genehood myProject --addPhylogeny myPhylogeny.nwk
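Because the viewer needs the leaf names to match the reference identifiers exactly, it can be worth checking the tree before running --addPhylogeny. The sketch below (newickLeafNames and checkLeaves are hypothetical helpers) extracts leaf labels with a regular expression; it assumes unquoted labels without spaces and a tree without [comments], so it is not a full Newick parser.

```typescript
// Hypothetical helpers for checking Newick leaf names against stable IDs.
// A leaf label follows "(" or "," and ends at "(", ")", ":", ",", ";" or whitespace.
function newickLeafNames(newick: string): string[] {
  const leaves: string[] = [];
  const re = /[(,]\s*([^():,;\s]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(newick)) !== null) {
    leaves.push(m[1]);
  }
  return leaves;
}

// Report reference IDs absent from the tree and leaves that match no ID.
function checkLeaves(
  newick: string,
  stableIds: string[]
): { missingInTree: string[]; unknownLeaves: string[] } {
  const leaves = newickLeafNames(newick);
  return {
    missingInTree: stableIds.filter((id) => leaves.indexOf(id) === -1),
    unknownLeaves: leaves.filter((leaf) => stableIds.indexOf(leaf) === -1),
  };
}

const tree =
  "((GCF_000006745.1-VC2063:0.1,GCF_000006745.1-VCA1095:0.2):0.05,GCF_000006745.1-VC1397:0.3);";
const refIds = ["GCF_000006745.1-VC1397", "GCF_000006745.1-VC2063", "GCF_000006745.1-VCA1095"];
console.log(JSON.stringify(checkLeaves(tree, refIds))); // {"missingInTree":[],"unknownLeaves":[]}
```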

Step 5: Load the data on genehood.io

Open GeneHood (genehood.io) in a web browser and load myProject.geneHood.pack.json.gz.

Now just explore the data.

To learn more about the GeneHood viewer, go to genehood.io and click Demo.

Developers Documentation

... to be continued.

Written with ❤ in TypeScript.
