.. include:: global.inc

.. _cookbook:

**********************
Cookbook
**********************

.. contents:: Contents
  :local:

.. _resolvelineage:

Reformatting results
====================

Examples of how to use ``jq`` to reformat ``JSON`` output. For more ``jq``
help, please refer to:

- `jq manual `_
- `Reshaping JSON with jq `_

Convert accession lineages into TSV
-----------------------------------

Convert the lineages of several nucleotide accessions into tab separated
output. The queried accession is printed in the first field. Replacing ``@tsv``
with ``@csv`` in the example produces CSV output.

.. code-block:: shell
  :linenos:

  ncbi-taxonomist map -a QZWG01000002.1 MG831203 | ncbi-taxonomist resolve --mapping \
  | jq -r '[.query, .lineage[].name]|@tsv'
  MG831203 Deformed wing virus Iflavirus Iflaviridae Picornavirales Pisoniviricetes Pisuviricota Orthornavirae Riboviria Viruses
  QZWG01000002.1 Glycine soja Glycine subgen. Soja Glycine Phaseoleae indigoferoid/millettioid clade NPAAA clade 50 kb inversion clade Papilionoideae Fabaceae Fabales fabids rosids Pentapetalae Gunneridae eudicotyledons Mesangiospermae Magnoliopsida Spermatophyta Euphyllophyta Tracheophyta Embryophyta Streptophytina Streptophyta Viridiplantae Eukaryota cellular organisms

Convert a lineage into a table
------------------------------

Convert the lineage into a table with the tab separated columns ``taxid``,
``name``, ``rank``, and ``parentid``.

.. code-block:: shell
  :linenos:

  ncbi-taxonomist resolve -t 9606 \
  | jq -r '.lin[]|"\(.taxon_id) \(.name) \(.rank) \(.parent_id)"'
  9606 Homo sapiens species 9605
  9605 Homo genus 207598
  207598 Homininae subfamily 9604
  9604 Hominidae family 314295
  314295 Hominoidea superfamily 9526
  9526 Catarrhini parvorder 314293
  314293 Simiiformes infraorder 376913
  376913 Haplorrhini suborder 9443
  9443 Primates order 314146
  314146 Euarchontoglires superorder 1437010
  1437010 Boreoeutheria clade 9347
  9347 Eutheria clade 32525
  32525 Theria clade 40674
  40674 Mammalia class 32524
  32524 Amniota clade 32523
  32523 Tetrapoda clade 1338369
  1338369 Dipnotetrapodomorpha clade 8287
  8287 Sarcopterygii superclass 117571
  117571 Euteleostomi clade 117570
  117570 Teleostomi clade 7776
  7776 Gnathostomata clade 7742
  7742 Vertebrata clade 89593
  89593 Craniata subphylum 7711
  7711 Chordata phylum 33511
  33511 Deuterostomia clade 33213
  33213 Bilateria clade 6072
  6072 Eumetazoa clade 33208
  33208 Metazoa kingdom 33154
  33154 Opisthokonta clade 2759
  2759 Eukaryota superkingdom 131567
  131567 cellular organisms no rank null

.. _importaccs:

Importing accessions
====================

Mapping accessions fetches only the corresponding taxid, not the full metadata
of the corresponding taxa.

Map accessions and ``collect`` corresponding taxa
-------------------------------------------------

.. code-block:: shell
  :linenos:

  ncbi-taxonomist map --entrezdb protein --accessions AFR11853 AIA66128.1 | \
  ncbi-taxonomist import -db taxa.db | \
  jq '.accession.taxid' | \
  ncbi-taxonomist collect -t | \
  ncbi-taxonomist import -db taxa.db
  {"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
  {"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
  {"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
  {"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}
  {"taxid":2732507,"rank":"class","names":{"Stelpaviricetes":"scientific_name"},"parentid":2732408,"name":"Stelpaviricetes"}
  {"taxid":2732551,"rank":"order","names":{"Stellavirales":"scientific_name"},"parentid":2732507,"name":"Stellavirales"}
  {"taxid":39733,"rank":"family","names":{"Astroviridae":"scientific_name"},"parentid":2732551,"name":"Astroviridae"}
  {"taxid":249588,"rank":"genus","names":{"Mamastrovirus":"scientific_name"},"parentid":39733,"name":"Mamastrovirus"}
  {"taxid":1239567,"rank":"species","names":{"Mamastrovirus 3":"scientific_name","Porcine astrovirus":"EquivalentName"},"parentid":249588,"name":"Mamastrovirus 3"}
  {"taxid":2585030,"rank":"no rank","names":{"unclassified Riboviria":"scientific_name"},"parentid":2559587,"name":"unclassified Riboviria"}
  {"taxid":439490,"rank":"no rank","names":{"unclassified ssRNA viruses":"scientific_name"},"parentid":2585030,"name":"unclassified ssRNA viruses"}
  {"taxid":35278,"rank":"clade","names":{"unclassified ssRNA positive-strand viruses":"scientific_name"},"parentid":439490,"name":"unclassified ssRNA positive-strand viruses"}
  {"taxid":1224525,"rank":"species","names":{"Cadicistrovirus":"scientific_name"},"parentid":35278,"name":"Cadicistrovirus"}

Creating a valid XML file from line based XML output
====================================================

To create a valid XML document from the line based output, the output has to be
enclosed between an opening and a closing root XML tag. On Linux, this can be
achieved by grouping the commands in a subshell, as shown in :numref:`validxml`.

.. code-block:: shell
  :linenos:
  :name: validxml
  :emphasize-lines: 3
  :caption: Creating valid XML from line based output. Line 3 shows the command to create a valid XML output. The ``xmllint`` command on line 4 is not required but demonstrates the validity of the created XML output.

  $: ncbi-taxonomist map --accessions QZWG01000002.1 MG831203 | \
     ncbi-taxonomist resolve --xml --mapping | \
     (echo "" && cat && echo "") | \
     xmllint --pretty 1 -
  198112
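
The wrapping step can also be written as a small shell function. The following is a minimal sketch, not part of ``ncbi-taxonomist``: the function name ``wrap_xml``, the default root element name ``records``, and the ``item`` elements in the stand-in input are hypothetical and chosen for illustration only.

```shell
#!/bin/sh
# wrap_xml: hypothetical helper (not part of ncbi-taxonomist).
# Reads line based XML from stdin and encloses it in one root element,
# so the result is a single well-formed XML document.
# The default root name "records" is an assumption made for this sketch.
wrap_xml() {
  root="${1:-records}"
  printf '<%s>\n' "$root"   # opening root tag
  cat                       # pass the line based XML through unchanged
  printf '</%s>\n' "$root"  # closing root tag
}

# Stand-in input: two made-up XML fragments, one per line.
printf '<item id="1"/>\n<item id="2"/>\n' | wrap_xml records
```

In a real pipeline, the ``printf`` stand-in would be replaced by the ``ncbi-taxonomist resolve --xml`` output, and the result could be piped into ``xmllint`` to check that it is well-formed.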