Cookbook

Reformatting results

Examples of how to use jq to reformat JSON output. For more help with jq, please refer to the jq documentation.

Convert accession lineages into TSV

Convert the lineages of several nucleotide accessions into tab-separated output. The queried accession is printed in the first field.

Substituting @tsv with @csv in the example will result in CSV output.

ncbi-taxonomist map -a QZWG01000002.1 MG831203 | ncbi-taxonomist resolve --mapping | \
jq -r '[.query, .lineage[].name]|@tsv'
MG831203	Deformed wing virus	Iflavirus	Iflaviridae	Picornavirales	Pisoniviricetes	Pisuviricota	Orthornavirae	Riboviria	Viruses
QZWG01000002.1	Glycine soja	Glycine subgen. Soja	Glycine	Phaseoleae	indigoferoid/millettioid clade	NPAAA clade	50 kb inversion clade	Papilionoideae	Fabaceae	Fabales	fabids	rosids	Pentapetalae	Gunneridae	eudicotyledons	Mesangiospermae	Magnoliopsida	Spermatophyta	Euphyllophyta	Tracheophyta	Embryophyta	Streptophytina	Streptophyta	Viridiplantae	Eukaryota	cellular organisms
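
The @csv substitution mentioned above can be tried offline with a small sketch. The sample record is hypothetical and truncated: it assumes only the `query` and `lineage[].name` fields used by the filter in this example, and real ncbi-taxonomist output contains more fields. The helper name `to_csv` is likewise made up for illustration.

```shell
# Hypothetical, shortened record; only the fields used by the filter are present.
sample='{"query":"MG831203","lineage":[{"name":"Deformed wing virus"},{"name":"Iflavirus"}]}'

# Same filter as in the example, with @tsv replaced by @csv.
to_csv() {
  jq -r '[.query, .lineage[].name]|@csv'
}

printf '%s\n' "$sample" | to_csv
# prints: "MG831203","Deformed wing virus","Iflavirus"
```

With @csv, jq wraps every string field in double quotes, so names containing spaces or commas stay within one field.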

Convert a lineage into a table

Convert the lineage into a table with the tab-separated columns taxid, name, rank, and parentid.

ncbi-taxonomist resolve -t 9606 | \
jq -r '.lin[]|"\(.taxon_id)\t\(.name)\t\(.rank)\t\(.parent_id)"'
9606     Homo sapiens          species       9605
9605     Homo                  genus         207598
207598   Homininae             subfamily     9604
9604     Hominidae             family        314295
314295   Hominoidea            superfamily   9526
9526     Catarrhini            parvorder     314293
314293   Simiiformes           infraorder    376913
376913   Haplorrhini           suborder      9443
9443     Primates              order         314146
314146   Euarchontoglires      superorder    1437010
1437010  Boreoeutheria         clade         9347
9347     Eutheria              clade         32525
32525    Theria                clade         40674
40674    Mammalia              class         32524
32524    Amniota               clade         32523
32523    Tetrapoda             clade         1338369
1338369  Dipnotetrapodomorpha  clade         8287
8287     Sarcopterygii         superclass    117571
117571   Euteleostomi          clade         117570
117570   Teleostomi            clade         7776
7776     Gnathostomata         clade         7742
7742     Vertebrata            clade         89593
89593    Craniata              subphylum     7711
7711     Chordata              phylum        33511
33511    Deuterostomia         clade         33213
33213    Bilateria             clade         6072
6072     Eumetazoa             clade         33208
33208    Metazoa               kingdom       33154
33154    Opisthokonta          clade         2759
2759     Eukaryota             superkingdom  131567
131567   cellular organisms    no rank       null
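
As a variation, the table can be given a header row by building arrays and formatting them with @tsv. This is a minimal sketch on a hypothetical, shortened record: it assumes the `lin`, `taxon_id`, `name`, `rank`, and `parent_id` fields used by the filter in this example, and the helper name `lineage_table` is made up for illustration.

```shell
# Hypothetical, shortened record with the fields assumed by the filter above.
sample='{"lin":[{"taxon_id":9606,"name":"Homo sapiens","rank":"species","parent_id":9605},{"taxon_id":9605,"name":"Homo","rank":"genus","parent_id":207598}]}'

# Emit one header array, then one array per lineage entry, each rendered as a TSV line.
lineage_table() {
  jq -r '["taxid","name","rank","parentid"],
         (.lin[] | [.taxon_id, .name, .rank, .parent_id])
         | @tsv'
}

printf '%s\n' "$sample" | lineage_table
```

The header is printed once per input record, so this form suits a single resolved taxid; for several records the header line would repeat.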

Importing accessions

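Mapping accessions fetches only the corresponding taxids, not the complete metadata of the taxa. To store the taxa as well, collect them using the mapped taxids and import them in a second step.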

Map accessions and collect corresponding taxa

ncbi-taxonomist map --entrezdb protein --accessions  AFR11853 AIA66128.1 | \
ncbi-taxonomist import -db taxa.db                                       | \
jq '.accession.taxid'                                                    | \
ncbi-taxonomist collect -t                                               | \
ncbi-taxonomist import -db taxa.db
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}
{"taxid":2732507,"rank":"class","names":{"Stelpaviricetes":"scientific_name"},"parentid":2732408,"name":"Stelpaviricetes"}
{"taxid":2732551,"rank":"order","names":{"Stellavirales":"scientific_name"},"parentid":2732507,"name":"Stellavirales"}
{"taxid":39733,"rank":"family","names":{"Astroviridae":"scientific_name"},"parentid":2732551,"name":"Astroviridae"}
{"taxid":249588,"rank":"genus","names":{"Mamastrovirus":"scientific_name"},"parentid":39733,"name":"Mamastrovirus"}
{"taxid":1239567,"rank":"species","names":{"Mamastrovirus 3":"scientific_name","Porcine astrovirus":"EquivalentName"},"parentid":249588,"name":"Mamastrovirus 3"}
{"taxid":2585030,"rank":"no rank","names":{"unclassified Riboviria":"scientific_name"},"parentid":2559587,"name":"unclassified Riboviria"}
{"taxid":439490,"rank":"no rank","names":{"unclassified ssRNA viruses":"scientific_name"},"parentid":2585030,"name":"unclassified ssRNA viruses"}
{"taxid":35278,"rank":"clade","names":{"unclassified ssRNA positive-strand viruses":"scientific_name"},"parentid":439490,"name":"unclassified ssRNA positive-strand viruses"}
{"taxid":1224525,"rank":"species","names":{"Cadicistrovirus":"scientific_name"},"parentid":35278,"name":"Cadicistrovirus"}

Creating a valid XML file from line-based XML output

To create a valid XML document from the line-based output, the output has to be enclosed between the opening and closing tags of a single root element. On Linux, this can be achieved by grouping commands in a subshell, as shown in Listing 1.

Listing 1 Creating valid XML from line-based output. Line 3 wraps the output in root tags to create a valid XML document. The xmllint command on line 4 is not required; it only demonstrates that the created XML is valid.
$: ncbi-taxonomist map  --accessions QZWG01000002.1 MG831203 | \
   ncbi-taxonomist resolve --xml  --mapping                  | \
   (echo "<root>" && cat && echo "</root>")                  | \
   xmllint --pretty 1 -
<?xml version="1.0"?>
<root>
  <resolve>
    <query value="MG831203" cast="accession">
      <accession>
        <taxid>198112</taxid>
        <!-- skip -->
  </resolve>
</root>
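
The wrapping step can be tried without network access. The following is a minimal sketch: the function name `wrap_in_root` and the two input lines are hypothetical placeholders standing in for the line-based XML output, not actual ncbi-taxonomist output.

```shell
# Wrap line-based XML read from stdin in a single root element.
wrap_in_root() {
  echo "<root>"
  cat            # pass the line-based XML through unchanged
  echo "</root>"
}

# Placeholder input lines (hypothetical), one XML element per line.
printf '%s\n' \
  '<resolve>first</resolve>' \
  '<resolve>second</resolve>' |
  wrap_in_root
```

Because `cat` inherits the standard input of the function (or of the subshell in Listing 1), everything arriving through the pipe ends up between the two echoed tags.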