Cookbook
Reformatting results
Examples of how to use jq to reformat JSON output. For more jq help, please refer to:
Convert accession lineages into TSV
Convert the lineages of several nucleotide accessions into tab-separated output. The queried accession is printed in the first field. Substituting @tsv with @csv in the example results in CSV output.
ncbi-taxonomist map -a QZWG01000002.1 MG831203 | ncbi-taxonomist resolve --mapping | \
jq -r '[.query, .lineage[].name]|@tsv'
MG831203 Deformed wing virus Iflavirus Iflaviridae Picornavirales Pisoniviricetes Pisuviricota Orthornavirae Riboviria Viruses
QZWG01000002.1 Glycine soja Glycine subgen. Soja Glycine Phaseoleae indigoferoid/millettioid clade NPAAA clade 50 kb inversion clade Papilionoideae Fabaceae Fabales fabids rosids Pentapetalae Gunneridae eudicotyledons Mesangiospermae Magnoliopsida Spermatophyta Euphyllophyta Tracheophyta Embryophyta Streptophytina Streptophyta Viridiplantae Eukaryota cellular organisms
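Where jq is not available, the same reformatting can be sketched with Python's standard library. This is a minimal sketch, not part of ncbi-taxonomist. It assumes each line of the `resolve --mapping` output is one JSON object with the `query` and `lineage[].name` keys used by the jq filter above. The function names `lineages_to_rows`, `to_delimited`, and `main` are hypothetical.

```python
import csv
import io
import json
import sys
from typing import Iterable, TextIO


def lineages_to_rows(lines: Iterable[str]) -> list[list[str]]:
    """Turn `resolve --mapping` JSON lines into rows: query first, then lineage names.

    Mirrors the jq filter [.query, .lineage[].name]; the key names are
    assumed from that filter (hypothetical helper, not an ncbi-taxonomist API).
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        names = [taxon["name"] for taxon in record["lineage"]]
        rows.append([str(record["query"])] + names)
    return rows


def to_delimited(rows: list[list[str]], delimiter: str = "\t") -> str:
    """Serialize rows as TSV (default) or as CSV when delimiter=","."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def main(stream: TextIO = sys.stdin) -> None:
    """Read resolve output from `stream` and write TSV to stdout."""
    sys.stdout.write(to_delimited(lineages_to_rows(stream)))
```

To use it as a filter at the end of the pipeline, call `main()` from a script. Note that the csv module quotes fields only when needed, whereas jq's @csv quotes every string and @tsv backslash-escapes special characters. Output for unusual names can therefore differ slightly.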
Convert a lineage into a table
Convert the lineage into a table with the tab-separated columns taxid, name, rank, and parentid.
ncbi-taxonomist resolve -t 9606 | \
jq -r '.lin[]|"\(.taxon_id)\t\(.name)\t\(.rank)\t\(.parent_id)"'
9606 Homo sapiens species 9605
9605 Homo genus 207598
207598 Homininae subfamily 9604
9604 Hominidae family 314295
314295 Hominoidea superfamily 9526
9526 Catarrhini parvorder 314293
314293 Simiiformes infraorder 376913
376913 Haplorrhini suborder 9443
9443 Primates order 314146
314146 Euarchontoglires superorder 1437010
1437010 Boreoeutheria clade 9347
9347 Eutheria clade 32525
32525 Theria clade 40674
40674 Mammalia class 32524
32524 Amniota clade 32523
32523 Tetrapoda clade 1338369
1338369 Dipnotetrapodomorpha clade 8287
8287 Sarcopterygii superclass 117571
117571 Euteleostomi clade 117570
117570 Teleostomi clade 7776
7776 Gnathostomata clade 7742
7742 Vertebrata clade 89593
89593 Craniata subphylum 7711
7711 Chordata phylum 33511
33511 Deuterostomia clade 33213
33213 Bilateria clade 6072
6072 Eumetazoa clade 33208
33208 Metazoa kingdom 33154
33154 Opisthokonta clade 2759
2759 Eukaryota superkingdom 131567
131567 cellular organisms no rank null
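As with the previous recipe, the table can also be produced in Python, sketched here without jq. The key names `lin`, `taxon_id`, `name`, `rank`, and `parent_id` are copied from the jq filter in this recipe and are assumptions about the output format of your ncbi-taxonomist version. `format_field` and `lineage_table` are hypothetical helpers. A missing parent is rendered as `null`, as jq string interpolation does for the root taxon.

```python
import json


def format_field(value: object) -> str:
    """Render a value the way jq string interpolation does: JSON null -> 'null'."""
    return "null" if value is None else str(value)


def lineage_table(line: str) -> list[str]:
    """Convert one `resolve` JSON line into tab-separated rows.

    Keys (lin, taxon_id, name, rank, parent_id) are taken from the jq filter
    in this recipe; adjust them if your version emits different names.
    """
    record = json.loads(line)
    rows = []
    for taxon in record["lin"]:
        fields = (taxon["taxon_id"], taxon["name"], taxon["rank"], taxon.get("parent_id"))
        rows.append("\t".join(format_field(field) for field in fields))
    return rows
```

Joining with a tab keeps multi-word names such as "Homo sapiens" in a single column. With space-separated output, the column boundaries would be ambiguous.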
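Importing accessions
Mapping accessions fetches only the corresponding taxids, not the full metadata of the corresponding taxa. To store the taxa as well, extract the taxids from the mapping output, collect them, and import the collected taxa.
Map accessions and collect corresponding taxa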
ncbi-taxonomist map --entrezdb protein --accessions AFR11853 AIA66128.1 | \
ncbi-taxonomist import -db taxa.db | \
jq '.accession.taxid' | \
ncbi-taxonomist collect -t | \
ncbi-taxonomist import -db taxa.db
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}
{"taxid":2732507,"rank":"class","names":{"Stelpaviricetes":"scientific_name"},"parentid":2732408,"name":"Stelpaviricetes"}
{"taxid":2732551,"rank":"order","names":{"Stellavirales":"scientific_name"},"parentid":2732507,"name":"Stellavirales"}
{"taxid":39733,"rank":"family","names":{"Astroviridae":"scientific_name"},"parentid":2732551,"name":"Astroviridae"}
{"taxid":249588,"rank":"genus","names":{"Mamastrovirus":"scientific_name"},"parentid":39733,"name":"Mamastrovirus"}
{"taxid":1239567,"rank":"species","names":{"Mamastrovirus 3":"scientific_name","Porcine astrovirus":"EquivalentName"},"parentid":249588,"name":"Mamastrovirus 3"}
{"taxid":2585030,"rank":"no rank","names":{"unclassified Riboviria":"scientific_name"},"parentid":2559587,"name":"unclassified Riboviria"}
{"taxid":439490,"rank":"no rank","names":{"unclassified ssRNA viruses":"scientific_name"},"parentid":2585030,"name":"unclassified ssRNA viruses"}
{"taxid":35278,"rank":"clade","names":{"unclassified ssRNA positive-strand viruses":"scientific_name"},"parentid":439490,"name":"unclassified ssRNA positive-strand viruses"}
{"taxid":1224525,"rank":"species","names":{"Cadicistrovirus":"scientific_name"},"parentid":35278,"name":"Cadicistrovirus"}
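Each imported taxon record shown above carries its own `taxid` and a `parentid` pointing to its parent, with `null` at the root. The collected taxa can therefore be reassembled into lineages. The following is a minimal Python sketch, assuming the records look like the output above. `index_taxa` and `lineage` are hypothetical helpers, not ncbi-taxonomist functions.

```python
import json
from typing import Iterable


def index_taxa(lines: Iterable[str]) -> dict[int, dict]:
    """Index taxon records (one JSON object per line) by their taxid."""
    taxa = {}
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            taxa[record["taxid"]] = record
    return taxa


def lineage(taxid: int, taxa: dict[int, dict]) -> list[str]:
    """Follow parentid links from taxid up to the root and return the names.

    Stops at a record whose parentid is null (None) or at a parent that was
    not collected, so an incomplete set of taxa yields a truncated lineage.
    """
    names = []
    seen = set()
    current = taxid
    while current is not None and current in taxa and current not in seen:
        seen.add(current)  # guard against accidental cycles in the input
        record = taxa[current]
        names.append(record["name"])
        current = record["parentid"]
    return names
```

Applied to the records above, `lineage(1239567, taxa)` walks from Mamastrovirus 3 up to Viruses.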
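Creating a valid XML file from line-based XML output
To create a valid XML document from the line-based output, the output has to be enclosed in a single root element, i.e. between an opening and a closing root tag. On Linux, this can be achieved with a subshell that prints the opening tag, copies the piped input with cat, and then prints the closing tag, as shown in Listing 1.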
ncbi-taxonomist map --accessions QZWG01000002.1 MG831203 | \
ncbi-taxonomist resolve --xml --mapping | \
(echo "<root>" && cat && echo "</root>") | \
xmllint --pretty 1 -
<?xml version="1.0"?>
<root>
<resolve>
<query value="MG831203" cast="accession">
<accession>
<taxid>198112</taxid>
<!-- skip -->
</resolve>
</root>
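A well-formed XML document has exactly one root element. Several top-level `<resolve>` elements, one per line, therefore only parse after they are wrapped. The following Python sketch does the same wrapping as the subshell and checks the result with the standard library parser. It assumes every line is a complete XML element without its own XML declaration. `wrap_xml_lines` and `parse_xml_lines` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def wrap_xml_lines(lines: Iterable[str], root_tag: str = "root") -> str:
    """Enclose line-based XML fragments in a single root element."""
    body = "\n".join(line.rstrip("\n") for line in lines)
    return f"<{root_tag}>\n{body}\n</{root_tag}>"


def parse_xml_lines(lines: Iterable[str], root_tag: str = "root") -> ET.Element:
    """Wrap the fragments and parse them; raises ET.ParseError if they are malformed."""
    return ET.fromstring(wrap_xml_lines(lines, root_tag))
```

On Python 3.9 and later, calling `ET.indent(root)` before `ET.tostring(root, encoding="unicode")` gives indented output comparable to `xmllint --pretty 1`.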