Container

ncbi-taxonomist comes with a Docker container and Singularity image. Both include jq to facilitate JSON handling. Both containers have the /dbs mountpoint to mount host directories, e.g. to use local databases.

Note

The commands shown here assume a current Linux system. Please adjust the commands to your system accordingly.

Docker

The Docker container can be found at https://gitlab.com/janpb/ncbi-taxonomist/container_registry/. Please check the Docker Docs if some commands are unclear.

  • The Docker image creates a user named user, which runs all commands in the container
  • The container has the mountpoint /dbs to bind host paths

Install

The latest ncbi-taxonomist Docker image can be pulled from registry.gitlab.com/janpb/ncbi-taxonomist:latest and run with the command docker run registry.gitlab.com/janpb/ncbi-taxonomist.

If desired, the image can be given a more concise tag using docker tag registry.gitlab.com/janpb/ncbi-taxonomist ncbi-taxonomist.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
$: docker pull registry.gitlab.com/janpb/ncbi-taxonomist:latest
latest: Pulling from janpb/ncbi-taxonomist
cbdbe7a5bc2a: Pull complete
50d9a3e26028: Pull complete
a0e2567dead0: Pull complete
#cut
$: docker tag registry.gitlab.com/janpb/ncbi-taxonomist:latest ncbi-taxonomist
$: docker images
ncbi-taxonomist                             latest              f957b80d1034        22 hours ago        68.3MB
registry.gitlab.com/janpb/ncbi-taxonomist   latest              f957b80d1034        22 hours ago        68.3MB

Line 6 indicates cut output. The output on lines 2-5 and 9-10 will likely look different on your system.
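Because every call is prefixed with docker run, a small shell function can make the container behave like a locally installed program. The following is a minimal sketch, assuming Bash and an image tagged ncbi-taxonomist as above. The function name ntx and the DOCKER override variable are hypothetical conveniences, not part of ncbi-taxonomist.

```shell
# Hypothetical wrapper: run ncbi-taxonomist from the Docker image tagged above.
# -i keeps STDIN open so the function also works inside pipelines;
# -t is omitted on purpose because a TTY interferes with piped input.
# DOCKER is a hypothetical override: DOCKER=echo prints the command instead of running it.
ntx() {
  "${DOCKER:-docker}" run --rm -i ncbi-taxonomist "$@"
}

# Usage (requires Docker):
#   ntx map -t 9606
#   ntx map -edb bioproject -a PRJNA604394 | ntx resolve -m
```

Setting DOCKER=echo is a quick way to check the assembled command without pulling or running the image.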

Test

Assuming the image is tagged ncbi-taxonomist, the following command should print the basic usage:

$: docker run --rm -it ncbi-taxonomist
usage: ncbi-taxonomist [--version] [-v] [--apikey APIKEY] {map,resolve,import,collect,subtree,group} ...

commands:
  {map,resolve,import,collect,subtree,group}
    map                 Map taxid to names and vice-versa
#cut

Basic usage

The examples assume the image has been tagged ncbi-taxonomist and show representative commands.

Mapping

$: docker run --rm -it  ncbi-taxonomist map -t 9606
{"mode":"mapping","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}

Resolving

$: docker run --rm -it  ncbi-taxonomist resolve -t 2 -n 'Arabidopsis'
{"mode":"resolve","query":"Arabidopsis","cast":"taxon","taxon":{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},"lineage":[{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},{"taxid":980083,"rank":"tribe","names":{"Camelineae":"scientific_name"},"parentid":3700,"name":"Camelineae"},{"taxid":3700,"rank":"family","names":{"Brassicaceae":"scientific_name"},"parentid":3699,"name":"Brassicaceae"},{"taxid":3699,"rank":"order","names":{"Brassicales":"scientific_name"},"parentid":91836,"name":"Brassicales"},{"taxid":91836,"rank":"clade","names":{"malvids":"scientific_name"},"parentid":71275,"name":"malvids"},{"taxid":71275,"rank":"clade","names":{"rosids":"scientific_name"},"parentid":1437201,"name":"rosids"},{"taxid":1437201,"rank":"clade","names":{"Pentapetalae":"scientific_name"},"parentid":91827,"name":"Pentapetalae"},{"taxid":91827,"rank":"clade","names":{"Gunneridae":"scientific_name"},"parentid":71240,"name":"Gunneridae"},{"taxid":71240,"rank":"clade","names":{"eudicotyledons":"scientific_name"},"parentid":1437183,"name":"eudicotyledons"},{"taxid":1437183,"rank":"clade","names":{"Mesangiospermae":"scientific_name"},"parentid":3398,"name":"Mesangiospermae"},{"taxid":3398,"rank":"class","names":{"Magnoliopsida":"scientific_name"},"parentid":58024,"name":"Magnoliopsida"},{"taxid":58024,"rank":"clade","names":{"Spermatophyta":"scientific_name"},"parentid":78536,"name":"Spermatophyta"},{"taxid":78536,"rank":"clade","names":{"Euphyllophyta":"scientific_name"},"parentid":58023,"name":"Euphyllophyta"},{"taxid":58023,"rank":"clade","names":{"Tracheophyta":"scientific_name"},"parentid":3193,"name":"Tracheophyta"},{"taxid":3193,"rank":"clade","names":{"Embryophyta":"scientific_name"},"parentid":131221,"name":"Embryophyta"},{"taxid":131221,"rank":"subphylum","names":{"Streptophytina":"scientific_name"},"parentid":35493,"name":"Streptophytina"},{"taxid":35493,"rank":"phylum","names":{"Streptophyta":"scientific_name"},"parentid":33090,"name":"Streptophyta"},{"taxid":33090,"rank":"kingdom","names":{"Viridiplantae":"scientific_name"},"parentid":2759,"name":"Viridiplantae"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}

Pipelines

$: docker run --rm  -i ncbi-taxonomist map -edb bioproject -a PRJNA604394 | \
   docker run --rm  -i ncbi-taxonomist resolve -m
{"mode":"resolve","query":"PRJNA604394","cast":"accs","accs":{"taxid":573,"accessions":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella pneumoniae strain:S01"},"db":"bioproject","uid":604394},"lineage":[{"taxid":573,"rank":"species","names":{"Klebsiella pneumoniae":"scientific_name","'Klebsiella aerogenes' (Kruse) Taylor et al. 1956":"Synonym","Bacillus pneumoniae":"Synonym","Bacterium pneumoniae crouposae":"Synonym","Hyalococcus pneumoniae":"Synonym","Klebsiella pneumoniae aerogenes":"Synonym","Klebsiella sp. 2N3":"Includes","Klebsiella sp. C1(2016)":"Includes","Klebsiella sp. M-AI-2":"Includes","Klebsiella sp. PB12":"Includes","Klebsiella sp. RCE-7":"Includes","ATCC 13883":"type material","ATCC:13883":"type material","BCCM/LMG:2095":"type material","CCUG 225":"type material","CCUG:225":"type material","CDC 298-53":"type material","CDC:298-53":"type material","CIP 82.91":"type material","CIP:82.91":"type material","DSM 30104":"type material","DSM:30104":"type material","HAMBI 450":"type material","HAMBI:450":"type material","IAM 14200":"type material","IAM:14200":"type material","IFO 14940":"type material","IFO:14940":"type material","JCM 1662":"type material","JCM:1662":"type material","LMG 2095":"type material","LMG:2095":"type material","NBRC 14940":"type material","NBRC:14940":"type material","NCTC 9633":"type material","NCTC:9633":"type material"},"parentid":570,"name":"Klebsiella pneumoniae"},{"taxid":570,"rank":"genus","names":{"Klebsiella":"scientific_name"},"parentid":543,"name":"Klebsiella"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}

Local database

To use local databases with the ncbi-taxonomist Docker container, the path on the host machine needs to be bound to the container's internal mountpoint /dbs with the -v option. To write to a local database with the proper file permissions, the container must be run as the current user via the --user argument. On Linux, the current user and group IDs can be obtained with the id command (Listing 2).

Listing 2 Populating a local database using the ncbi-taxonomist Docker container. Line 4 shows how to run the container as the current user.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
$: ls ${PWD}
#empty
$: docker run --rm -i ncbi-taxonomist collect -t 9606 | \
   docker run --rm -i --user $(id -u):$(id -g) -v ${PWD}:/dbs ncbi-taxonomist import -db /dbs/dockertaxa.db
{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"}
{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"}
{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"}
{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"}
{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"}
{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"}
{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}
{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"}
#cut
$: ls ${PWD}
dockertaxa.db
$: docker run --rm -i -v ${PWD}:/dbs ncbi-taxonomist resolve -t 9606 -db /dbs/dockertaxa.db
{"mode":"resolve","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}

Docker ncbi-taxonomist and jq

To use the included jq, Docker’s run command has to be adjusted with the --entrypoint argument (Listing 3).

Listing 3 ncbi-taxonomist and jq together in the Docker container. Line 3 shows how to modify the Docker run command for jq.
1
2
3
4
5
$: docker run --rm -i ncbi-taxonomist map -a QZWG01000002.1 MG831203 | \
   docker run --rm -i ncbi-taxonomist resolve --mapping              | \
   docker run --rm -i --entrypoint 'jq' ncbi-taxonomist  -r  '[.query, .lineage[].name]|@tsv'
MG831203        Deformed wing virus     Iflavirus       Iflaviridae     Picornavirales  Pisoniviricetes Pisuviricota    Orthornavirae   Riboviria      Viruses
QZWG01000002.1  Glycine soja    Glycine subgen. Soja    Glycine Phaseoleae      indigoferoid/millettioid clade  NPAAA clade     50 kb inversion clade  Papilionoideae  Fabaceae        Fabales fabids  rosids  Pentapetalae    Gunneridae      eudicotyledons  Mesangiospermae Magnoliopsida Spermatophyta    Euphyllophyta   Tracheophyta    Embryophyta     Streptophytina  Streptophyta    Viridiplantae   Eukaryota       cellular organisms
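The --entrypoint override in Listing 3 can be wrapped the same way. In this minimal sketch, assuming Bash, --entrypoint replaces the image's default entrypoint (which, judging from the commands above, runs ncbi-taxonomist) with jq, and all remaining arguments go to jq. The function name ntx_jq and the DOCKER override variable are hypothetical.

```shell
# Hypothetical wrapper: run the jq bundled in the ncbi-taxonomist Docker image.
# --entrypoint jq replaces the image's default entrypoint; "$@" is passed to jq.
# DOCKER is a hypothetical override: DOCKER=echo prints the command instead of running it.
ntx_jq() {
  "${DOCKER:-docker}" run --rm -i --entrypoint jq ncbi-taxonomist "$@"
}

# Usage (requires Docker), replacing the last step of Listing 3:
#   ... | ntx_jq -r '[.query, .lineage[].name]|@tsv'
```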

Singularity

The Singularity container can be found at https://cloud.sylabs.io/library/jpb/ncbi-taxonomist/ncbi-taxonomist. Please check the Singularity Docs if some commands are unclear.

  • The Singularity image creates a user named user, which runs all commands in the container
  • The container has the mountpoint /dbs to bind host paths

Install

The latest ncbi-taxonomist Singularity image can be pulled from https://cloud.sylabs.io/library/jpb/ncbi-taxonomist/ncbi-taxonomist using the command singularity pull library://jpb/ncbi-taxonomist/ncbi-taxonomist.

If desired, the image can be renamed to a more concise name.

1
2
3
4
$: singularity pull library://jpb/ncbi-taxonomist/ncbi-taxonomist
INFO:    Downloading library image
23.7MiB / 23.7MiB [==============================================================================] 100 % 545.9 KiB/s 0s
$: mv ncbi-taxonomist_latest.sif ncbi-taxonomist.sif

Line 3 will likely look different.

Build

The Singularity container can be built using the definition file container/SINGULARITY.def present in the repository.

For more Singularity build options, check the corresponding man page (man singularity build) or the Singularity documentation.

Building locally requires root permissions. Alternatively, the --remote option of the build command builds the image on a remote build service and does not require root permissions (Listing 4):

Listing 4 Building the ncbi-taxonomist Singularity container. The command on line 1 requires root permissions, while the command on line 2 uses the --remote build option without root permissions.
1
2
$: singularity build ncbi-taxonomist.sif SINGULARITY.def
$: singularity build --remote  ncbi-taxonomist.sif SINGULARITY.def
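The choice between the two commands in Listing 4 can be automated by checking whether the current user is root. This is a minimal sketch, assuming Bash and that SINGULARITY.def is in the current directory; the function name build_ntx_sif and the SINGULARITY override variable are hypothetical. Note that remote builds typically require a Sylabs Cloud access token to be configured beforehand.

```shell
# Hypothetical helper: build the ncbi-taxonomist image from its definition file.
# Uses a local build when running as root (user ID 0), otherwise falls back to --remote.
# SINGULARITY is a hypothetical override: SINGULARITY=echo prints the command instead of running it.
build_ntx_sif() {
  local def="${1:-SINGULARITY.def}" image="${2:-ncbi-taxonomist.sif}"
  if [ "$(id -u)" -eq 0 ]; then
    "${SINGULARITY:-singularity}" build "$image" "$def"
  else
    "${SINGULARITY:-singularity}" build --remote "$image" "$def"
  fi
}

# Usage:
#   build_ntx_sif                          # uses SINGULARITY.def -> ncbi-taxonomist.sif
#   build_ntx_sif path/to/SINGULARITY.def  # custom definition file location
```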

Test

Assuming the image is named ncbi-taxonomist.sif, invoking it without arguments should print the basic usage, indicating a successful install (Listing 5):

Listing 5 ncbi-taxonomist usage
$: ./ncbi-taxonomist.sif
usage: ncbi-taxonomist [--version] [-v] [--apikey APIKEY] {map,resolve,import,collect,subtree,group} ...

commands:
  {map,resolve,import,collect,subtree,group}
    map                 Map taxid to names and vice-versa
#cut

Basic usage

The examples assume the image is named ncbi-taxonomist.sif and show representative commands. The image can be used as an executable, i.e. it can be invoked as ./ncbi-taxonomist.sif, which corresponds to the command singularity run ncbi-taxonomist.sif. Listing 6 shows how to use both forms.
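Since both forms are equivalent, a wrapper can pick whichever is available. This minimal sketch, assuming Bash, calls the image directly when the file is executable and otherwise falls back to singularity run. The function name ntx_sif and the override variables NTX_SIF (image path) and SINGULARITY (command) are hypothetical.

```shell
# Hypothetical wrapper: run the ncbi-taxonomist Singularity image.
# If the image file is executable, invoke it directly; otherwise use the
# equivalent `singularity run <image>`.
# NTX_SIF and SINGULARITY are hypothetical overrides for the image path and command.
ntx_sif() {
  local image="${NTX_SIF:-./ncbi-taxonomist.sif}"
  if [ -x "$image" ]; then
    "$image" "$@"
  else
    "${SINGULARITY:-singularity}" run "$image" "$@"
  fi
}

# Usage (requires Singularity):
#   ntx_sif map -t 9606
#   ntx_sif map -edb bioproject -a PRJNA604394 | ntx_sif resolve -m
```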

Mapping

$: ./ncbi-taxonomist.sif map -t 9606
{"mode":"mapping","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}

Resolving

$: ./ncbi-taxonomist.sif resolve -t 2 -n 'Arabidopsis'
{"mode":"resolve","query":"Arabidopsis","cast":"taxon","taxon":{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},"lineage":[{"taxid":3701,"rank":"genus","names":{"Arabidopsis":"scientific_name","Cardaminopsis":"Synonym"},"parentid":980083,"name":"Arabidopsis"},{"taxid":980083,"rank":"tribe","names":{"Camelineae":"scientific_name"},"parentid":3700,"name":"Camelineae"},{"taxid":3700,"rank":"family","names":{"Brassicaceae":"scientific_name"},"parentid":3699,"name":"Brassicaceae"},{"taxid":3699,"rank":"order","names":{"Brassicales":"scientific_name"},"parentid":91836,"name":"Brassicales"},{"taxid":91836,"rank":"clade","names":{"malvids":"scientific_name"},"parentid":71275,"name":"malvids"},{"taxid":71275,"rank":"clade","names":{"rosids":"scientific_name"},"parentid":1437201,"name":"rosids"},{"taxid":1437201,"rank":"clade","names":{"Pentapetalae":"scientific_name"},"parentid":91827,"name":"Pentapetalae"},{"taxid":91827,"rank":"clade","names":{"Gunneridae":"scientific_name"},"parentid":71240,"name":"Gunneridae"},{"taxid":71240,"rank":"clade","names":{"eudicotyledons":"scientific_name"},"parentid":1437183,"name":"eudicotyledons"},{"taxid":1437183,"rank":"clade","names":{"Mesangiospermae":"scientific_name"},"parentid":3398,"name":"Mesangiospermae"},{"taxid":3398,"rank":"class","names":{"Magnoliopsida":"scientific_name"},"parentid":58024,"name":"Magnoliopsida"},{"taxid":58024,"rank":"clade","names":{"Spermatophyta":"scientific_name"},"parentid":78536,"name":"Spermatophyta"},{"taxid":78536,"rank":"clade","names":{"Euphyllophyta":"scientific_name"},"parentid":58023,"name":"Euphyllophyta"},{"taxid":58023,"rank":"clade","names":{"Tracheophyta":"scientific_name"},"parentid":3193,"name":"Tracheophyta"},{"taxid":3193,"rank":"clade","names":{"Embryophyta":"scientific_name"},"parentid":131221,"name":"Embryophyta"},{"taxid":131221,"rank":"subphylum","names":{"Streptophytina":"scientific_name"},"parentid":35493,"name":"Streptophytina"},{"taxid":35493,"rank":"phylum","names":{"Streptophyta":"scientific_name"},"parentid":33090,"name":"Streptophyta"},{"taxid":33090,"rank":"kingdom","names":{"Viridiplantae":"scientific_name"},"parentid":2759,"name":"Viridiplantae"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"2","cast":"taxon","taxon":{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},"lineage":[{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name","eubacteria":"GenbankCommonName","bacteria":"BlastName","Monera":"Inpart","Procaryotae":"Inpart","Prokaryota":"Inpart","Prokaryotae":"Inpart","prokaryote":"Inpart","prokaryotes":"Inpart"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}

Pipelines

$: ./ncbi-taxonomist.sif map -edb bioproject -a PRJNA604394 | \
   ./ncbi-taxonomist.sif resolve -m
{"mode":"resolve","query":"PRJNA604394","cast":"accs","accs":{"taxid":573,"accessions":{"project_id":604394,"project_acc":"PRJNA604394","project_name":"Klebsiella pneumoniae strain:S01"},"db":"bioproject","uid":604394},"lineage":[{"taxid":573,"rank":"species","names":{"Klebsiella pneumoniae":"scientific_name","'Klebsiella aerogenes' (Kruse) Taylor et al. 1956":"Synonym","Bacillus pneumoniae":"Synonym","Bacterium pneumoniae crouposae":"Synonym","Hyalococcus pneumoniae":"Synonym","Klebsiella pneumoniae aerogenes":"Synonym","Klebsiella sp. 2N3":"Includes","Klebsiella sp. C1(2016)":"Includes","Klebsiella sp. M-AI-2":"Includes","Klebsiella sp. PB12":"Includes","Klebsiella sp. RCE-7":"Includes","ATCC 13883":"type material","ATCC:13883":"type material","BCCM/LMG:2095":"type material","CCUG 225":"type material","CCUG:225":"type material","CDC 298-53":"type material","CDC:298-53":"type material","CIP 82.91":"type material","CIP:82.91":"type material","DSM 30104":"type material","DSM:30104":"type material","HAMBI 450":"type material","HAMBI:450":"type material","IAM 14200":"type material","IAM:14200":"type material","IFO 14940":"type material","IFO:14940":"type material","JCM 1662":"type material","JCM:1662":"type material","LMG 2095":"type material","LMG:2095":"type material","NBRC 14940":"type material","NBRC:14940":"type material","NCTC 9633":"type material","NCTC:9633":"type material"},"parentid":570,"name":"Klebsiella pneumoniae"},{"taxid":570,"rank":"genus","names":{"Klebsiella":"scientific_name"},"parentid":543,"name":"Klebsiella"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}

Local database

To use local databases with the ncbi-taxonomist Singularity container, the path on the host machine needs to be bound to the container's internal mountpoint /dbs via the --bind option of singularity run, which cannot be passed when the image is invoked as an executable (Listing 6). Alternatively, the bind paths can be set in the environment variable SINGULARITY_BIND, which also applies to the executable form (Listing 7).

Listing 6 Populating a local database using the ncbi-taxonomist Singularity container. Lines 4 and 17 show how to bind the current working directory to the container. #cut indicates shortened output.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
$: ls ${PWD}
#empty
$: ./ncbi-taxonomist.sif collect -t 9606 | \
   singularity run --bind ${PWD}:/dbs ncbi-taxonomist.sif import -db /dbs/simgtaxa.db
{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}
{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"}
{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"}
{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"}
{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"}
{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"}
{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"}
{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}
{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"}
#cut
$: ls ${PWD}
simgtaxa.db
$: singularity run --bind ${PWD}:/dbs ncbi-taxonomist.sif resolve -t 9606 -db /dbs/simgtaxa.db
{"mode":"resolve","query":"9606","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
Listing 7 Populating a local database using the ncbi-taxonomist Singularity container and the SINGULARITY_BIND environment variable. Line 1 shows how to set the environment variable, and the echo command on line 2 should show the binding of your current working directory to /dbs. #result indicates the same output as the corresponding commands in Listing 6.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
$: export SINGULARITY_BIND="${PWD}:/dbs"
$: echo $SINGULARITY_BIND
/path/to/your/current/working/directory:/dbs
$: ./ncbi-taxonomist.sif collect -t 9606  | \
   ./ncbi-taxonomist.sif import -db /dbs/simgtaxa.db
#result
$: ls ${PWD}
simgtaxa.db
$: ./ncbi-taxonomist.sif resolve -t 9606 -db /dbs/simgtaxa.db
#result
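Singularity reads SINGULARITY_BIND as a comma-separated list of bind specifications, so a plain export as in line 1 of Listing 7 overwrites any binds already set. This minimal sketch, assuming Bash, appends a bind instead. The function name add_singularity_bind is hypothetical, and the container path defaults to /dbs because that is the mountpoint the image provides.

```shell
# Hypothetical helper: append a host:container bind to SINGULARITY_BIND
# without clobbering existing entries. Entries are separated by commas.
# The container path defaults to /dbs, the mountpoint provided by the image.
add_singularity_bind() {
  local host_path="$1" container_path="${2:-/dbs}"
  SINGULARITY_BIND="${SINGULARITY_BIND:+${SINGULARITY_BIND},}${host_path}:${container_path}"
  export SINGULARITY_BIND
}

# Usage (requires Singularity), equivalent to line 1 of Listing 7:
#   add_singularity_bind "${PWD}"
#   ./ncbi-taxonomist.sif resolve -t 9606 -db /dbs/simgtaxa.db
```

The ${VAR:+...} expansion only inserts the separating comma when SINGULARITY_BIND is already non-empty.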

Singularity ncbi-taxonomist and jq

To use the included jq with the Singularity container, the run command has to be used in conjunction with the --app option (Listing 8).

Listing 8 Using ncbi-taxonomist and jq together in the Singularity container. Line 1 shows how to invoke jq to print its usage (cut for clarity). Line 5 shows the use of jq in an ncbi-taxonomist Singularity pipeline.
1
2
3
4
5
6
7
$: singularity run --app jq ncbi-taxonomist.sif
#jq usage
$: ./ncbi-taxonomist.sif map -a QZWG01000002.1 MG831203 | \
   ./ncbi-taxonomist.sif resolve --mapping              | \
   singularity run --app jq ncbi-taxonomist.sif  -r  '[.query, .lineage[].name]|@tsv'
  MG831203        Deformed wing virus     Iflavirus       Iflaviridae     Picornavirales  Pisoniviricetes Pisuviricota    Orthornavirae   Riboviria      Viruses
  QZWG01000002.1  Glycine soja    Glycine subgen. Soja    Glycine Phaseoleae      indigoferoid/millettioid clade  NPAAA clade     50 kb inversion clade  Papilionoideae  Fabaceae        Fabales fabids  rosids  Pentapetalae    Gunneridae      eudicotyledons  Mesangiospermae Magnoliopsida Spermatophyta    Euphyllophyta   Tracheophyta    Embryophyta     Streptophytina  Streptophyta    Viridiplantae   Eukaryota       cellular organisms