Ontology-Based Analyses

Warning

In the future the analysis methods may migrate from the AssociationSet class to dedicated analysis engine classes.

Enrichment

See the Notebook example

OntoBio allows for generalized gene set enrichment: given a set of annotations that map genes to descriptor terms, an input set of genes, and a background set, find which terms are enriched in the input set compared to the background.
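To make the idea concrete, here is a minimal sketch of an enrichment test using a one-sided hypergeometric test. It is an illustration under stated assumptions, not necessarily how OntoBio computes its results: the function names (hypergeom_sf, enrich) and the toy gene and term IDs are hypothetical, no multiple-testing correction is applied, and the ontology hierarchy is ignored (each gene only counts towards the terms it is directly annotated to).

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def enrich(annotations, sample, background):
    """Return (term, p_value) pairs sorted by ascending p-value.

    annotations: dict mapping gene ID -> set of term IDs (hypothetical input shape).
    Only sample genes that are also in the background are counted.
    """
    background = set(background)
    sample = set(sample) & background
    bg_counts = {}      # term -> number of background genes annotated to it
    sample_counts = {}  # term -> number of sample genes annotated to it
    for gene in background:
        for term in annotations.get(gene, ()):
            bg_counts[term] = bg_counts.get(term, 0) + 1
            if gene in sample:
                sample_counts[term] = sample_counts.get(term, 0) + 1
    results = [
        (term, hypergeom_sf(k, len(background), bg_counts[term], len(sample)))
        for term, k in sample_counts.items()
    ]
    results.sort(key=lambda r: r[1])
    return results


# Toy data: 10 background genes, 3 of which carry term T1 and 3 carry T2.
toy_annotations = {
    "g1": {"T1", "T2"}, "g2": {"T1"}, "g3": {"T1"},
    "g4": {"T2"}, "g6": {"T2"},
}
toy_background = [f"g{i}" for i in range(1, 11)]
toy_sample = ["g1", "g2", "g3"]
toy_results = enrich(toy_annotations, toy_sample, toy_background)
for term, p in toy_results:
    print(term, p)
```

In this toy data all three sample genes carry T1, so T1 ranks first. T2 is carried by only one sample gene and is not enriched.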

With OntoBio, enrichment tests work for any annotation corpus, not just gene-oriented ones; for example, disease-phenotype associations. However, take care with the underlying statistical assumptions when working with non-gene sets.

Before running an enrichment analysis, you need to fetch both an Ontology object and an AssociationSet object. These can come from any mix of local files and remote services or databases. See Inputs for details.

Assume that we are using a remote ontology and local GAF:

from ontobio import OntologyFactory
from ontobio import AssociationSetFactory
ofactory = OntologyFactory()
afactory = AssociationSetFactory()
ont = ofactory.create('go')
aset = afactory.create_from_gaf('my.gaf', ontology=ont)

Assume also that we have a set of sample gene IDs and a set of background gene IDs. The test is then:

enr = aset.enrichment_test(subjects=gene_ids, background=background_gene_ids, threshold=0.00005, labels=True)

This returns a list of dicts. (TODO: decide if we want to make this an object and follow a standard class model.)

NOTE: the input gene IDs must be the same ones used in the AssociationSet. If you load from a GAF, these are the IDs formed by combining col1 and col2, separated by a “:”. E.g. UniProtKB:P123456

What if you have different IDs? Or what if you just have a list of gene symbols? In this case you will need to map these names or IDs, which is the subject of the next section.
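The ID convention described above can be sketched in a few lines of Python. This is an illustration only: the helper names (gaf_subject_ids, symbol_index) and the sample rows are hypothetical, and it does not use the OntoBio API. It relies only on the GAF column layout, where col1 is the database, col2 the object ID and col3 the object symbol, to build the combined IDs and a simple symbol-to-ID lookup.

```python
def gaf_subject_ids(lines):
    """Yield (subject_id, symbol) for each annotation row of GAF text.

    subject_id is col1 and col2 joined by ':'; symbol is col3.
    Comment/header lines (starting with '!') and blank lines are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # not a well-formed annotation row
        yield f"{cols[0]}:{cols[1]}", cols[2]


def symbol_index(lines):
    """Map each symbol to the set of subject IDs it appears with."""
    index = {}
    for subject_id, symbol in gaf_subject_ids(lines):
        index.setdefault(symbol, set()).add(subject_id)
    return index


# Made-up rows for illustration; only the first few columns matter here.
sample_gaf = [
    "!gaf-version: 2.1",
    "UniProtKB\tP12345\tGENE1\t\tGO:0005739",
    "UniProtKB\tQ67890\tGENE2\t\tGO:0005634",
    "UniProtKB\tP12345\tGENE1\t\tGO:0006520",
]

index = symbol_index(sample_gaf)
gene_ids = sorted(set().union(*(index.get(s, set()) for s in ["GENE1", "GENE2"])))
print(gene_ids)
```

A symbol can map to more than one ID, which is why the index stores sets. How to resolve such ambiguity depends on the analysis.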

Reproducibility

For reproducible analyses, use a versioned PURL for the ontology, so that every run uses the same ontology release.

Command line wrapper

You can use the ontobio-assoc.py command to run enrichment analyses. Some examples:

Create a gene set for all mouse genes with the “abnormal synaptic transmission” phenotype (MP:0003635), then find the GO terms for which this gene set is enriched:

# find all mouse genes that have 'abnormal synaptic transmission' phenotype
# (using remote SPARQL service for MP, and default (Monarch) for associations)
ontobio-assoc.py -v -r mp -T NCBITaxon:10090 -C gene phenotype query -q MP:0003635 > genes.txt

# get IDs
cut -f1 -d ' ' genes.txt > genes.ids

# enrichment, using GO
ontobio-assoc.py -r go -T NCBITaxon:10090 -C gene function enrichment -s genes.ids

# resulting GO terms are not very surprising...
2.48e-12 GO:0045202 synapse
2.87e-11 GO:0044456 synapse part
3.66e-08 GO:0007270 neuron-neuron synaptic transmission
3.95e-08 GO:0098793 presynapse
1.65e-07 GO:0099537 trans-synaptic signaling
1.65e-07 GO:0007268 chemical synaptic transmission
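If you want to post-process this output in Python, a small parser along the following lines may help. It is a sketch that assumes the output has the whitespace-separated layout shown above (p-value, term ID, label). The function name parse_enrichment_output and the dict keys are hypothetical and not part of OntoBio.

```python
def parse_enrichment_output(text, max_p=1.0):
    """Parse 'p-value term-ID label' lines into dicts, keeping rows with p <= max_p."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        parts = line.split(None, 2)  # label may itself contain spaces
        if len(parts) < 2:
            continue
        try:
            p_value = float(parts[0])
        except ValueError:
            continue  # first column is not a number, so not a result row
        label = parts[2] if len(parts) > 2 else ""
        if p_value <= max_p:
            rows.append({"p": p_value, "id": parts[1], "label": label})
    return rows


output = """\
2.48e-12 GO:0045202 synapse
2.87e-11 GO:0044456 synapse part
3.66e-08 GO:0007270 neuron-neuron synaptic transmission
3.95e-08 GO:0098793 presynapse
1.65e-07 GO:0099537 trans-synaptic signaling
1.65e-07 GO:0007268 chemical synaptic transmission
"""

strongest = parse_enrichment_output(output, max_p=1e-8)
for row in strongest:
    print(row["id"], row["label"], row["p"])
```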

Further reading

For API docs, see enrichment_test in AssociationSet model

Identifier Mapping

TODO

Semantic Similarity

TODO

To follow progress, see this PR

Slimming

TODO

Graph Reduction

TODO

Lexical Analyses

See the lexmap API docs

You can also use the command line:

ontobio-lexmap.py ont1.json ont2.json > mappings.tsv

The inputs can be any kind of handle - a local ontology file or a remote ontology accessed via services.

For example, this will work:

ontobio-lexmap.py mp hp wbphenotype > mappings.tsv

See Inputs for more details.

For examples of lexical mapping pipelines, see:

These have examples of customizing configuration using a yaml file.