API

Ontology Access

Factory

The OntologyFactory class provides a means of creating an ontology object backed by either local files or remote services. See Inputs for more details.

class ontobio.ontol_factory.OntologyFactory(handle=None)[source]

Implements a factory for generating Ontology objects.

You should use a factory object rather than initializing Ontology directly. See Inputs for more details.

initializes based on an ontology name

Parameters:handle (str) – see create
create(handle=None, handle_type=None, **args)[source]

Creates an ontology based on a handle

Handle is one of the following

  • FILENAME.json : creates an ontology from an obographs json file
  • obo:ONTID : E.g. obo:pato - creates an ontology from obolibrary PURL (requires owltools)
  • ONTID : E.g. ‘pato’ - creates an ontology from a remote SPARQL query
Parameters:handle (str) – specifies how to retrieve the ontology info
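
The three handle forms listed above can be told apart by simple string inspection. The sketch below is only an illustration of that dispatch: classify_handle and its return labels are hypothetical names, not part of ontobio, and this section does not show how OntologyFactory resolves a handle internally.

```python
def classify_handle(handle: str) -> str:
    """Hypothetical helper: name the documented handle form a string matches."""
    if handle.endswith(".json"):
        # FILENAME.json : obographs JSON file on disk
        return "obographs-json-file"
    if handle.startswith("obo:"):
        # obo:ONTID : resolved through an obolibrary PURL
        return "obolibrary-purl"
    # Bare ONTID : resolved through a remote SPARQL query
    return "remote-sparql"


for example in ("pato.json", "obo:pato", "pato"):
    print(example, "->", classify_handle(example))
```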

Ontology Object Model

class ontobio.ontol.Ontology(handle=None, id=None, graph=None, xref_graph=None, meta=None, payload=None, graphdoc=None)[source]

An object that represents a basic graph-oriented view over an ontology.

The ontology may be represented in memory, or it may be located remotely. See subclasses for details.

The default implementation is an in-memory wrapper onto the python networkx library

initializes based on an ontology name.

Note: do not call this directly, use OntologyFactory instead

add_node(id, label=None, type='CLASS', meta=None)[source]

Add a new node to the ontology

add_parent(id, pid, relation='subClassOf')[source]

Add a new edge to the ontology

add_synonym(syn)[source]

Adds a synonym for a node

add_text_definition(textdef)[source]

Add a new text definition to the ontology

add_to_subset(id, s)[source]

Adds a node to a subset

add_xref(id, xref)[source]

Adds an xref to the xref graph

all_obsoletes()[source]

Returns all obsolete nodes

all_synonyms(include_label=False)[source]

Retrieves all synonyms

Parameters:include_label (bool) – If True, include label/names as Synonym objects
Returns:Synonym objects
Return type:list[Synonym]
ancestors(node, relations=None, reflexive=False)[source]

Return all ancestors of specified node.

The default implementation is to use networkx, but some implementations of the Ontology class may use a database or service backed implementation, for large graphs.

Parameters:
  • node (str) – identifier for node in ontology
  • reflexive (bool) – if true, return query node in graph
  • relations (list) – relation (object property) IDs used to filter
Returns:

ancestor node IDs

Return type:

list[str]
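
To make the roles of relations and reflexive concrete, here is a minimal standalone sketch of an ancestor traversal over a list of (child, relation, parent) triples. It uses only the standard library and is not ontobio's implementation (which wraps networkx by default); the edge data and relation labels are made up for illustration.

```python
from collections import defaultdict


def ancestors(edges, node, relations=None, reflexive=False):
    """Return sorted ancestor IDs of node.

    edges: iterable of (child, relation, parent) triples (illustrative format).
    relations: if given, only edges with these relation IDs are followed.
    reflexive: if True, the query node itself is included.
    """
    parents = defaultdict(set)
    for child, rel, parent in edges:
        if relations is None or rel in relations:
            parents[child].add(parent)

    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    if reflexive:
        seen.add(node)
    return sorted(seen)


EDGES = [
    ("finger", "part_of", "hand"),
    ("hand", "part_of", "arm"),
    ("finger", "subClassOf", "digit"),
    ("digit", "subClassOf", "body part"),
]

print(ancestors(EDGES, "finger"))
print(ancestors(EDGES, "finger", relations=["subClassOf"]))
print(ancestors(EDGES, "finger", relations=["subClassOf"], reflexive=True))
```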

child_parent_relations(subj, obj, graph=None)[source]

Get all relationship type ids between a subject and a parent.

Typically only one relation ID is returned, but in some cases there may be more than one.

Parameters:
  • subj (string) – Child (subject) id
  • obj (string) – Parent (object) id
Returns:

Return type:

list

children(node, relations=None)[source]

Return all direct children of specified node.

Wraps networkx by default.

Parameters:
  • node (string) – identifier for node in ontology
  • relations (list of strings) – list of relation (object property) IDs used to filter
create_slim_mapping(subset=None, subset_nodes=None, relations=None, disable_checks=False)[source]

Create a dictionary that maps between all nodes in an ontology to a subset

Parameters:
  • ont (Ontology) – Complete ontology to be mapped. Assumed pre-filtered for relationship types
  • subset (str) – Name of subset to map to, e.g. goslim_generic
  • subset_nodes (list) – If no named subset is provided, the subset is passed in as a list of node IDs
  • relations (list) – List of relations to filter on
  • disable_checks (bool) – Unless this is set, a mapping that uses non-standard relations will not be generated. The motivation is that the ontology graph may include relations over which it is inappropriate to propagate gene products, e.g. transports, has-part
Returns:

maps all nodes in ont to one or more non-redundant nodes in subset

Return type:

dict

Raises:

ValueError – if the subset is empty
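
The idea of mapping every node to "one or more non-redundant nodes in subset" can be sketched as follows: collect each node's reflexive ancestors that are in the subset, then drop any candidate that is itself a proper ancestor of another candidate. This is a standalone illustration under those assumptions, not ontobio's code; slim_mapping, reflexive_ancestors and the toy graph are hypothetical.

```python
def reflexive_ancestors(parents, node):
    """parents: dict mapping a node ID to the set of its direct parents."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_mapping(parents, subset_nodes):
    """Map every node to its most specific (non-redundant) subset ancestors."""
    if not subset_nodes:
        raise ValueError("subset is empty")
    subset = set(subset_nodes)
    all_nodes = set(parents) | {p for ps in parents.values() for p in ps}

    mapping = {}
    for node in all_nodes:
        candidates = reflexive_ancestors(parents, node) & subset
        # A candidate is redundant if it is a proper ancestor of another one.
        redundant = set()
        for candidate in candidates:
            redundant |= reflexive_ancestors(parents, candidate) - {candidate}
        mapping[node] = sorted(candidates - redundant)
    return mapping


# Toy graph: D -> C -> B -> A and E -> A (child -> parent)
PARENTS = {"D": {"C"}, "C": {"B"}, "B": {"A"}, "E": {"A"}}
print(slim_mapping(PARENTS, ["A", "C"]))
```

In this toy graph, D maps only to C even though A is also a subset ancestor, because A is implied by C.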

descendants(node, relations=None, reflexive=False)[source]

Returns all descendants of specified node.

The default implementation is to use networkx, but some implementations of the Ontology class may use a database or service backed implementation, for large graphs.

Parameters:
  • node (str) – identifier for node in ontology
  • reflexive (bool) – if true, return query node in graph
  • relations (list) – relation (object property) IDs used to filter
Returns:

descendant node IDs

Return type:

list[str]

equiv_graph()[source]
Returns:bidirectional networkx graph of all equivalency relations
Return type:graph
extract_subset(subset, contract=True)[source]

Return all nodes in a subset.

We assume the oboInOwl encoding of subsets, and subset IDs are IRIs, or IRI fragments

filter_redundant(ids)[source]

Return all non-redundant ids from a list

get_filtered_graph(relations=None, prefix=None)[source]

Returns a networkx graph for the whole ontology, for a subset of relations

Only implemented for eager methods.

Implementation notes: currently this is not cached

Parameters:
  • relations (-) – list of object property IDs, e.g. subClassOf, BFO:0000050. If empty, uses all.
  • prefix (-) – if specified, create a subgraph using only classes with this prefix, e.g. ENVO, PATO, GO
Returns:

A networkx MultiDiGraph object representing the filtered ontology

Return type:

nx.MultiDiGraph

get_graph()[source]

Return a networkx graph for the whole ontology.

Note: Only implemented for eager implementations

Returns:A networkx MultiDiGraph object representing the complete ontology
Return type:nx.MultiDiGraph
get_level(level, relations=None, **args)[source]

Get all nodes at a particular level

Parameters:relations (list[str]) – list of relations used to filter
get_property_chain_axioms(nid)[source]

Retrieves property chain axioms for a class id

Parameters:nid (str) – Node identifier for relation to be queried
Returns:
Return type:PropertyChainAxiom
get_roots(relations=None, prefix=None)[source]

Get all nodes that lack parents

Parameters:
  • relations (list[str]) – list of relations used to filter
  • prefix (str) – E.g. GO. Exclude nodes that lack this prefix when testing parentage
has_node(id)[source]

True if id identifies a node in the ontology graph

inline_xref_graph()[source]

Copy contents of xref_graph to inlined meta object for each node

is_obsolete(nid)[source]

True if node is obsolete

Parameters:nid (str) – Node identifier for entity to be queried
label(nid, id_if_null=False)[source]

Fetches label for a node

Parameters:
  • nid (str) – Node identifier for entity to be queried
  • id_if_null (bool) – If True and node has no label return id as label
Returns:

Return type:

str

logical_definitions(nid)[source]

Retrieves logical definitions for a class id

Parameters:nid (str) – Node identifier for entity to be queried
Returns:
Return type:LogicalDefinition
merge(ontologies)[source]

Merges specified ontology into current ontology

node(id)[source]

Return a node with a given ID. If the node with the ID exists the Node object is returned, otherwise None is returned.

Wraps networkx by default

node_type(id)[source]

If stated, either CLASS, PROPERTY or INDIVIDUAL

nodes()[source]

Return all nodes in ontology

Wraps networkx by default

parent_index(relations=None)[source]

Returns a mapping of nodes to all direct parents

Parameters:relations (list[str]) – list of relations used to filter
Returns:list of lists [[CLASS_1, PARENT_1,1, …, PARENT_1,N], [CLASS_2, PARENT_2,1, PARENT_2,2, … ] … ]
Return type:list
parents(node, relations=None)[source]

Return all direct ‘parents’ of specified node.

Note that in the context of ontobio, ‘parent’ means any node that is traversed in a single hop along an edge from a subject to object. For example, if the ontology has an edge “finger part-of some hand”, then “hand” is the parent of finger. This can sometimes be counter-intuitive, for example, if the ontology contains has-part axioms. If the ontology has an edge “X receptor activity has-part some X binding”, then “X binding” is the ‘parent’ of “X receptor activity” over a has-part edge.

Wraps networkx by default.

Parameters:
  • node (string) – identifier for node in ontology
  • relations (list of strings) – list of relation (object property) IDs used to filter
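
The subject-to-object meaning of ‘parent’ described above can be illustrated with a small standalone sketch. The triples and relation labels are invented for the example, and direct_parents is a hypothetical helper rather than an ontobio function.

```python
def direct_parents(edges, node, relations=None):
    """Return the objects of all edges whose subject is node.

    edges: iterable of (subject, relation, object) triples (illustrative format).
    relations: optional list of relation IDs used to filter edges.
    """
    return sorted({
        obj
        for subj, rel, obj in edges
        if subj == node and (relations is None or rel in relations)
    })


EDGES = [
    ("finger", "part_of", "hand"),
    ("X receptor activity", "has_part", "X binding"),
    ("X receptor activity", "subClassOf", "receptor activity"),
]

# "hand" is the parent of "finger" over a part-of edge
print(direct_parents(EDGES, "finger"))
# A has-part edge also yields a 'parent', which can look counter-intuitive
print(direct_parents(EDGES, "X receptor activity"))
# Filtering by relation removes the has-part 'parent'
print(direct_parents(EDGES, "X receptor activity", relations=["subClassOf"]))
```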
prefix(nid)[source]

Return prefix for a node

prefix_fragment(nid)[source]

Return prefix and fragment/localid for a node

prefixes()[source]

list all prefixes used

relations_used()[source]

Return list of all relations used to connect edges

replaced_by(nid, strict=True)[source]

Returns value of ‘replaced by’ (IAO_0100001) property for obsolete nodes

Parameters:
  • nid (str) – Node identifier for entity to be queried
  • strict (bool) – If true, raise error if cardinality>1. If false, return list if cardinality>1
Returns:

Return type:

None if no value set, otherwise returns node id (or list if multiple values, see strict setting)

resolve_names(names, synonyms=False, **args)[source]

returns a list of identifiers based on an input list of labels and identifiers.

Parameters:
  • names (list) – search terms. ‘%’ treated as wildcard
  • synonyms (bool) – if true, search on synonyms in addition to labels
  • is_regex (bool) – if true, treats each name as a regular expression
  • is_partial_match (bool) – if true, treats each name as a regular expression .*name.*
search(searchterm, **args)[source]

Simple search. Returns list of IDs.

Parameters:
  • searchterm (list) – search term. ‘%’ treated as wildcard
  • synonyms (bool) – if true, search on synonyms in addition to labels
  • is_regex (bool) – if true, treats each name as a regular expression
  • is_partial_match (bool) – if true, treats each name as a regular expression .*name.*
Returns:

match node IDs

Return type:

list
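
As a rough illustration of the matching options, the sketch below turns a ‘%’ wildcard term into a regular expression and matches it against node labels. It is a standalone approximation: search_labels, the label table and the choice of whole-label matching are assumptions, and ontobio's own matching rules may differ in detail.

```python
import re


def search_labels(labels, searchterm, is_regex=False, is_partial_match=False):
    """Return sorted IDs whose label matches searchterm.

    labels: dict mapping node ID to label (illustrative data structure).
    """
    if is_regex:
        pattern = searchterm
    else:
        # Escape literal text, then let each '%' match any run of characters.
        pattern = ".*".join(re.escape(part) for part in searchterm.split("%"))
    if is_partial_match:
        pattern = ".*" + pattern + ".*"
    compiled = re.compile(pattern)
    # Assumption: the pattern must cover the whole label.
    return sorted(nid for nid, label in labels.items() if compiled.fullmatch(label))


LABELS = {"X:1": "nucleus", "X:2": "nuclear membrane", "X:3": "membrane"}

print(search_labels(LABELS, "nucle%"))
print(search_labels(LABELS, "membrane"))
print(search_labels(LABELS, "membrane", is_partial_match=True))
```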

sorted_nodes()[source]

Returns all nodes in ontology, after topological sort

subgraph(nodes=None)[source]

Return an induced subgraph

By default this wraps networkx subgraph, but this may be overridden in specific implementations

subontology(nodes=None, minimal=False, relations=None)[source]

Return a new ontology that is an extract of this one

Parameters:
  • nodes (-) – list of node IDs to include in subontology. If None, all are used
  • relations (-) – list of relation IDs to include in subontology. If None, all are used
subsets(nid, contract=True)[source]

Retrieves subset ids for a class or ontology object

synonyms(nid, include_label=False)[source]

Retrieves synonym objects for a class

Parameters:
  • nid (str) – Node identifier for entity to be queried
  • include_label (bool) – If True, include label/names as Synonym objects
Returns:

Synonym objects

Return type:

list[Synonym]

text_definition(nid)[source]

Retrieves the text definition for a class or relation id

Parameters:nid (str) – Node identifier for entity to be queried
Returns:
Return type:TextDefinition
traverse_nodes(qids, up=True, down=False, **args)[source]

Traverse (optionally) up and (optionally) down from an input set of nodes

Parameters:
  • qids (list[str]) – list of seed node IDs to start from
  • up (bool) – if True, include ancestors
  • down (bool) – if True, include descendants
  • relations (list[str]) – list of relations used to filter
Returns:

nodes reachable from qids

Return type:

list[str]

xrefs(nid, bidirectional=False, prefix=None)[source]

Fetches xrefs for a node

Parameters:
  • nid (str) – Node identifier for entity to be queried
  • bidirectional (bool) – If True, include nodes xreffed to nid
Returns:

Return type:

list[str]

class ontobio.ontol.Synonym(class_id, val=None, pred='hasRelatedSynonym', lextype=None, xrefs=None, ontology=None, confidence=1.0, synonymType=None)[source]

Represents a synonym using the OBO model

Parameters:
  • class_id (-) – the class that is being defined
  • val (-) – the synonym itself
  • pred (-) – oboInOwl predicate used to model scope. One of: has{Exact,Narrow,Related,Broad}Synonym - may also be ‘label’
  • lextype (-) – From an open ended set of types
  • xrefs (-) – Provenance or cross-references to same usage
as_dict()[source]

Returns Synonym as obograph dict

class ontobio.ontol.LogicalDefinition(class_id, genus_ids, restrictions)[source]

A simple OWL logical definition conforming to the pattern:

class_id = (genus_id_1 AND ... genus_id_n) AND (P_1 some FILLER_1) AND ... (P_m some FILLER_m)

See obographs docs for more details

Parameters:
  • class_id (string) – the class that is being defined
  • genus_ids (list) – a list of named classes (typically length 1)
  • restrictions (list) – a list of (PROPERTY_ID, FILLER_CLASS_ID) tuples
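
The genus-differentia pattern above can be made concrete with a small rendering sketch. LogicalDef and render are hypothetical stand-ins that mirror the three documented constructor arguments; they do not use or reproduce the ontobio class, and the IDs are placeholders.

```python
from typing import List, NamedTuple, Tuple


class LogicalDef(NamedTuple):
    """Illustrative record with the same three fields as the documented class."""
    class_id: str
    genus_ids: List[str]
    restrictions: List[Tuple[str, str]]  # (PROPERTY_ID, FILLER_CLASS_ID)


def render(ld: LogicalDef) -> str:
    """Write the definition in the 'genus AND (P some FILLER)' pattern."""
    genus = "(" + " AND ".join(ld.genus_ids) + ")"
    parts = [genus] + [f"({prop} some {filler})" for prop, filler in ld.restrictions]
    return f"{ld.class_id} = " + " AND ".join(parts)


example = LogicalDef("X:3", ["X:1"], [("BFO:0000050", "X:2")])
print(render(example))
```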

Association Access

Factory

class ontobio.assoc_factory.AssociationSetFactory[source]

Factory for creating AssociationSets

Currently support for golr (GO and Monarch) is provided, but other stores are possible

initializes based on an ontology name

create(ontology=None, subject_category=None, object_category=None, evidence=None, taxon=None, relation=None, file=None, fmt=None, skim=True)[source]

creates an AssociationSet

Currently, this uses an eager binding to an ontobio.golr instance. All compact associations for the particular combination of parameters are fetched.

Parameters:
  • ontology (an Ontology object) –
  • subject_category (string representing category of subjects (e.g. gene, disease, variant)) –
  • object_category (string representing category of objects (e.g. function, phenotype, disease)) –
  • taxon (string holding NCBITaxon:nnnn ID) –
create_from_assocs(assocs, **args)[source]

Creates from a list of association objects

create_from_file(file=None, fmt='gaf', skim=True, **args)[source]

Creates from a file. If fmt is set to None then the file suffixes will be used to choose a parser.

Parameters:
  • file (str or file) – input file or filename
  • fmt (str) – name of format e.g. gaf
create_from_gaf(file, **args)[source]

Creates from a GAF file

create_from_phenopacket(file)[source]

Creates from a phenopacket file

create_from_remote_file(group, snapshot=True, **args)[source]

Creates from remote GAF

create_from_simple_json(file)[source]

Creates from a simple json rendering

create_from_tuples(tuples, **args)[source]

Creates from a list of (subj,subj_name,obj) tuples

Association Object Model

class ontobio.assocmodel.AssociationSet(ontology=None, association_map=None, subject_label_map=None, meta=None)[source]

An object that represents a collection of associations

NOTE: the intention is that this class can be subclassed to provide either high-efficiency implementations, or implementations backed by services or external stores. The default implementation is in-memory.

NOTE: in general you do not need to call this yourself. See assoc_factory

initializes an association set, which minimally consists of:

  • an ontology (e.g. GO, HP)
  • a map between subjects (e.g genes) and sets/lists of term IDs
annotations(subject_id)[source]

Returns a list of classes used to describe a subject

@Deprecated: use objects_for_subject

as_dataframe(fillna=True, subjects=None)[source]

Return association set as pandas DataFrame

Each row is a subject (e.g. gene). Each column is the inferred class used to describe the subject.

associations(subject, object=None)[source]

Given a subject-object pair (e.g. gene id to ontology class id), return all association objects that match.

enrichment_test(subjects=None, background=None, hypotheses=None, threshold=0.05, labels=False, direction='greater')[source]

Performs term enrichment analysis.

Parameters:
  • subjects (string list) – Sample set. Typically a gene ID list. These are assumed to have associations
  • background (string list) – Background set. If not set, uses full set of known subject IDs in the association set
  • threshold (float) – p values above this are filtered out
  • labels (boolean) – if true, labels for enriched classes are included in result objects
  • direction ('greater', 'less' or 'two-sided') – default is greater - i.e. enrichment test. Use ‘less’ for depletion test.
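
This section does not say which statistical test enrichment_test uses. Purely as an illustration of what a one-sided ("greater") term enrichment test can look like, the sketch below assumes a hypergeometric upper-tail test with no multiple-testing correction. All function names and data are hypothetical, and the sample is assumed to be a subset of the background.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when drawing N subjects from M, of which n carry the term."""
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1)) / total


def enrichment(sample, background, term_subjects, threshold=0.05):
    """Return (term, p) pairs with p <= threshold, most significant first.

    term_subjects: dict mapping a class ID to the set of subjects annotated to it
    (directly or by inference); an illustrative stand-in for an association set.
    """
    sample, background = set(sample), set(background)
    results = []
    for term, subjects in term_subjects.items():
        n = len(subjects & background)   # annotated subjects in background
        k = len(subjects & sample)       # annotated subjects in sample
        p = hypergeom_upper_tail(k, len(background), n, len(sample))
        if p <= threshold:
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


BACKGROUND = {f"g{i}" for i in range(1, 11)}
SAMPLE = {"g1", "g2", "g3"}
TERM_SUBJECTS = {
    "T1": {"g1", "g2", "g3"},
    "T2": {"g1", "g4", "g5", "g6", "g7"},
}
print(enrichment(SAMPLE, BACKGROUND, TERM_SUBJECTS))
```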
index()[source]

Creates indexes based on inferred terms.

You do not need to call this yourself; called on initialization

inferred_types(subj)[source]

Returns: set of reflexive inferred types for a subject.

E.g. if a gene is directly associated with terms A and B, and these terms have ancestors C, D and E then the set returned will be {A,B,C,D,E}

Parameters:subj (str) – subject ID

Returns: set of class IDs
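
The {A,B,C,D,E} example can be reproduced with a short standalone sketch: start from the directly associated terms and add every ancestor. The parent table is invented so that C, D and E are ancestors of A and B; this is an illustration, not ontobio's implementation.

```python
def inferred_types(direct_terms, parents):
    """Return the direct terms plus all of their ancestors (reflexive closure).

    parents: dict mapping a class ID to the set of its direct parents.
    """
    seen = set(direct_terms)
    stack = list(direct_terms)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# A -> C, B -> D -> E (child -> parent), chosen to match the example above
PARENTS = {"A": {"C"}, "B": {"D"}, "D": {"E"}}
print(sorted(inferred_types({"A", "B"}, PARENTS)))
```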

static intersectionlist_to_matrix(ilist, xterms, yterms)[source]

WILL BE DEPRECATED

Replace with method to return pandas dataframe

jaccard_similarity(s1, s2)[source]

Calculate jaccard index of inferred associations of two subjects

|ancs(s1) /\ ancs(s2)| / |ancs(s1) \/ ancs(s2)|
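
A Jaccard index is the size of the intersection of two sets divided by the size of their union. A minimal sketch over two already-computed sets of inferred classes follows; returning 0.0 when both sets are empty is an assumption made here, not documented behavior.

```python
def jaccard(a, b):
    """Jaccard index of two collections of class IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(a & b) / len(union)


# Two of the four distinct classes are shared
print(jaccard({"A", "B", "C"}, {"B", "C", "D"}))
```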

label(id)[source]

return label for a subject id

Will make use of both the ontology and the association set

objects_for_subject(subject_id)[source]

Returns a list of classes used to describe a subject

query(terms=None, negated_terms=None)[source]

Basic boolean query, using inference.

Parameters:
  • terms (list) – list of class ids. Returns the set of subjects that have at least one inferred annotation to each of the specified classes.
  • negated_terms (list) – list of class ids. Filters the set of subjects so that there are no inferred annotations to any of the specified classes.
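
The boolean semantics described above (all of terms, none of negated_terms) can be sketched with plain set operations. The inferred-annotation table is hypothetical, and the function is an illustration rather than the library's code.

```python
def query(inferred, terms=None, negated_terms=None):
    """Return sorted subjects annotated to every term and to no negated term.

    inferred: dict mapping a subject ID to its set of inferred class IDs.
    """
    required = set(terms or [])
    excluded = set(negated_terms or [])
    return sorted(
        subject
        for subject, classes in inferred.items()
        if required <= classes and not (excluded & classes)
    )


INFERRED = {"g1": {"A", "B"}, "g2": {"A", "C"}, "g3": {"B"}}

print(query(INFERRED, terms=["A"]))
print(query(INFERRED, terms=["A"], negated_terms=["C"]))
```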

query_associations(subjects=None, infer_subjects=True, include_xrefs=True)[source]

Query for a set of associations.

Note: only a minimal association model is stored, so all results are returned as (subject_id,class_id) tuples

Parameters:
  • subjects (list) – list of subjects (e.g. genes, diseases) used to query associations. Any association to one of these subjects, or to a descendant of these subjects (assuming infer_subjects=True), is returned.
  • infer_subjects (boolean, default true) – See above.
  • include_xrefs (boolean, default true) – If true, then expand the inferred subject set to include all xrefs of those subjects.

Example: if a high-level disease node (e.g. DOID:14330 Parkinson disease) is specified with the default behavior (infer_subjects=True, include_xrefs=True) and the ontology includes DO, results will include associations from both descendant DOID classes and all xrefs (e.g. OMIM).

query_intersections(x_terms=None, y_terms=None, symmetric=False)[source]

Query for intersections of terms in two lists

Return a list of intersection result objects with keys:
  • x : term from x
  • y : term from y
  • c : count of intersection
  • j : jaccard score
similarity_matrix(x_subjects=None, y_subjects=None, symmetric=False)[source]

Query for similarity matrix between groups of subjects

Return a list of intersection result objects with keys:
  • x : term from x
  • y : term from y
  • c : count of intersection
  • j : jaccard score
subontology(minimal=False)[source]

Generates a sub-ontology based on associations

termset_ancestors(terms)[source]

reflexive ancestors

Parameters:terms (set or list) – class IDs

Returns: set of class IDs

TODO - detailed association modeling

Association File Parsers

class ontobio.io.gafparser.GafParser(config=None, group='unknown', dataset='unknown', bio_entities=None)[source]

Parser for GO GAF format

config : an AssocParserConfig object

association_generator(file, skipheader=False, outfile=None) → Dict[KT, VT]

Returns a generator that yields successive associations from file

Yields:association
map_to_subset(file, outfile=None, ontology=None, subset=None, class_map=None, relations=None)

Map a file to a subset, writing out results

You can pass either a subset name (e.g. goslim_generic) or a dictionary with ready-made mappings

Parameters:
  • file (file) – Name or file object for input assoc file
  • outfile (file) – Name or file object for output (mapped) assoc file; writes to stdout if not set
  • subset (str) – Optional name of subset to map to, e.g. goslim_generic
  • class_map (dict) – Mapping between asserted class ids and ids to map to. Many to many
  • ontology (Ontology) – Ontology to extract subset from
parse(file, skipheader=False, outfile=None)

Parse a line-oriented association file into a list of association dict objects

Note the returned list is of dict objects. TODO: These will later be specified using marshmallow and it should be possible to generate objects

Parameters:
  • file (file or string) – The file is parsed into association objects. Can be a http URL, filename or file-like-object, for input assoc file
  • outfile (file) – Optional output file in which processed lines are written. This is a file or file-like-object
Returns:

Associations generated from the file

Return type:

list

parse_line(line)[source]

Parses a single line of a GAF

Return a tuple (processed_line, associations). Typically there will be a single association, but in some cases there may be none (invalid line) or multiple (disjunctive clause in annotation extensions)

Note: most applications will only need to call this directly if they require fine-grained control of parsing. For most purposes, parse can be used over the whole file

Parameters:line (str) – A single tab-separated line from a GAF file
skim(file)[source]

Lightweight parse of a file into tuples.

Note this discards metadata such as evidence.

Return a list of tuples (subject_id, subject_label, object_id)

upgrade_empty_qualifier(assoc: ontobio.model.association.GoAssociation) → ontobio.model.association.GoAssociation[source]

From https://github.com/geneontology/go-site/issues/1558

For GAF 2.1 we will apply an algorithm to find a best fit relation if the qualifier column is empty. If the qualifiers field is empty, then:

  • If the GO Term is exactly GO:0008150 Biological Process, then the qualifier should be involved_in
  • If the GO Term is exactly GO:0008372 Cellular Component, then the qualifier should be is_active_in
  • If the GO Term is a Molecular Function, then the new qualifier should be enables
  • If the GO Term is a Biological Process, then the new qualifier should be acts_upstream_or_within
  • Otherwise for Cellular Component, if it’s a subclass of anatomical structure, then use located_in, and if it’s a protein-containing complex, use part_of
Parameters:assoc – GoAssociation
Returns:the possibly upgraded GoAssociation
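
The decision rules above can be read as an ordered chain of checks. The sketch below encodes that order in a hypothetical helper. It takes the GAF aspect code ('F', 'P' or 'C') and two boolean flags in place of real ontology subclass checks, so it shows the rule order only and is not the ontobio implementation.

```python
BP_ROOT = "GO:0008150"  # biological_process root
CC_ROOT = "GO:0008372"  # cellular component ID as given in the rules above


def best_fit_qualifier(term_id, aspect, is_anatomical=False, is_complex=False):
    """Pick a qualifier for an annotation whose qualifier column is empty.

    aspect: GAF aspect code, 'F', 'P' or 'C'.
    is_anatomical / is_complex: hypothetical flags standing in for subclass
    checks against an ontology.
    """
    if term_id == BP_ROOT:
        return "involved_in"
    if term_id == CC_ROOT:
        return "is_active_in"
    if aspect == "F":
        return "enables"
    if aspect == "P":
        return "acts_upstream_or_within"
    if aspect == "C":
        if is_anatomical:
            return "located_in"
        if is_complex:
            return "part_of"
    return None  # no rule applies


print(best_fit_qualifier("GO:0008150", "P"))
print(best_fit_qualifier("GO:0006412", "P"))
print(best_fit_qualifier("GO:0005634", "C", is_anatomical=True))
```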

Go Rules

class ontobio.io.qc.FailMode

An enumeration.

class ontobio.io.qc.GoRules

An enumeration.

class ontobio.io.qc.GoRulesResults(all_results, annotation)

Create new instance of GoRulesResults(all_results, annotation)

all_results

Alias for field number 0

annotation

Alias for field number 1

class ontobio.io.qc.RepairState

An enumeration.

ontobio.io.qc.ResultType

alias of ontobio.io.qc.Result

class ontobio.io.qc.TestResult(result_type: ontobio.io.qc.Result, message: str, result)[source]

Represents the result of a single association.GoAssociation being validated on some rule

Create a new TestResult

Parameters:
  • result_type (ResultType) – enum of PASS, WARNING, ERROR. Both WARNINGs and ERRORs are reported, but ERROR will filter the offending GoAssociation
  • message (str) – Description of the failure of GoAssociation to pass a rule. This is usually just the rule title
  • result – True if the GoAssociation passes, False if not. If it’s repaired, this is the updated, repaired GoAssociation
ontobio.io.qc.repair_result(repair_state: ontobio.io.qc.RepairState, fail_mode: ontobio.io.qc.FailMode) → ontobio.io.qc.Result[source]

Returns ResultType.PASS if the repair_state is OKAY, and WARNING if REPAIRED.

This is used by RepairRule implementations.

Parameters:
  • repair_state (RepairState) – If the GoAssociation was repaired during a rule, then this should be RepairState.REPAIRED, otherwise RepairState.OKAY
  • fail_mode (FailMode) – [description]
Returns:

[description]

Return type:

ResultType

ontobio.io.qc.result(passes: bool, fail_mode: ontobio.io.qc.FailMode) → ontobio.io.qc.Result[source]

Send True for passes, and this returns the PASS ResultType, and if False, then depending on the fail mode it returns either WARNING or ERROR ResultType.
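
The mapping from a pass/fail boolean to a result type can be sketched with two small enums. The members PASS, WARNING and ERROR come from the text above. The FailMode member names SOFT and HARD, and which of them yields WARNING versus ERROR, are assumptions made for this illustration, since the members are not listed in this section.

```python
from enum import Enum


class Result(Enum):
    PASS = "Pass"
    WARNING = "Warning"
    ERROR = "Error"


class FailMode(Enum):
    # Assumed member names; the real enumeration is not shown in this section
    SOFT = "soft"
    HARD = "hard"


def result(passes: bool, fail_mode: FailMode) -> Result:
    """Return PASS on success, otherwise WARNING or ERROR by fail mode."""
    if passes:
        return Result.PASS
    # Assumption: a soft failure is reported only, a hard failure is an error
    return Result.WARNING if fail_mode is FailMode.SOFT else Result.ERROR


print(result(True, FailMode.HARD))
print(result(False, FailMode.SOFT))
print(result(False, FailMode.HARD))
```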

GoAssociation internal Model

This contains the data model for parsing annotations from GAF and GPAD.

The idea is to make it easy to parse text lines of any source into a GoAssociation object and then give the GoAssociation object the ability to convert itself into GPAD or GAF of any version. Or any other format that is required.

class ontobio.model.association.ConjunctiveSet(elements: List[T])[source]

This represents a comma separated list of objects which can be turned into strings.

This is used for the with/from and extensions fields in the GoAssociation.

The field elements can be a list of Curie or ExtensionUnit. Curie for with/from, and ExtensionUnit for extensions field.

display(conjunct_to_str=<function ConjunctiveSet.<lambda>>) → str[source]

Convert this ConjunctiveSet to a string separated by commas.

This calls conjunct_to_str (which defaults to str) on each element before joining. To use a different string representation of each element, pass in a different function. This functionality is used to differentiate between GPAD 1.2 and GPAD 2.0, where relations are written differently per version.

classmethod list_to_str(conjunctions: List[T], conjunct_to_str=<function ConjunctiveSet.<lambda>>) → str[source]

The input should be a list of ConjunctiveSet. Given [ConjunctiveSet, ConjunctiveSet], this will call ConjunctiveSet.display() using the conjunct_to_str function (which defaults to str) and join them with a pipe.

To have elements of the ConjunctiveSet displayed differently, use a different conjunct_to_str function. This functionality is used to differentiate between GPAD 1.2 and GPAD 2.0, where relations are written differently per version.

classmethod str_to_conjunctions(entity: str, conjunct_element_builder: Union[C, ontobio.model.association.Error] = <function ConjunctiveSet.<lambda>>) → Union[List[C], ontobio.model.association.Error][source]

Takes a field that conforms to the pipe (|) and comma (,) separator type. The parsed version is a list of pipe separated values which are themselves a comma separated list.

If the elements inside the comma separated list should not just be strings, but be converted into a value of a type, conjunct_element_builder can be provided which should take a string and return a parsed value or an instance of an Error type (defined above).

If there is an error in producing the values of the conjunctions, then this function will return early with the error.

This function will return a List of ConjunctiveSet
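
The pipe-and-comma field layout, including the early return on a builder error, can be sketched without the library as follows. ParseError, parse_conjunctions, format_conjunctions and need_colon are hypothetical names; plain nested lists stand in for ConjunctiveSet objects.

```python
class ParseError:
    """Illustrative error value, returned instead of raised."""
    def __init__(self, info, entity=""):
        self.info = info
        self.entity = entity


def parse_conjunctions(field, build=lambda text: text):
    """Split 'a,b|c' into [[a, b], [c]], building each element with build."""
    groups = []
    for group in field.split("|"):
        items = []
        for raw in group.split(","):
            value = build(raw)
            if isinstance(value, ParseError):
                return value  # return early with the first error
            items.append(value)
        groups.append(items)
    return groups


def format_conjunctions(groups, to_str=str):
    """Inverse of parse_conjunctions: commas inside a group, pipes between."""
    return "|".join(",".join(to_str(item) for item in group) for group in groups)


def need_colon(text):
    """Example builder that rejects elements without a ':' separator."""
    return text if ":" in text else ParseError("missing ':'", text)


parsed = parse_conjunctions("A:1,B:2|C:3")
print(parsed)
print(format_conjunctions(parsed))
print(type(parse_conjunctions("A:1|bad", need_colon)).__name__)
```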

class ontobio.model.association.Curie(namespace: str, identity: str)[source]

Object representing a Compact URI, with a namespace identifier along with an ID, like GO:1234567.

Use from_str to parse a string like “GO:1234567” into a Curie. The result should be checked for errors with is_error
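
The "parse, then check for an error" pattern described above can be illustrated with a standalone sketch. CurieSketch, CurieError and curie_from_str are hypothetical stand-ins, not the ontobio classes, and the validation rule (non-empty text on both sides of the first colon) is an assumption.

```python
from typing import NamedTuple, Union


class CurieSketch(NamedTuple):
    namespace: str
    identity: str


class CurieError(NamedTuple):
    info: str
    entity: str


def curie_from_str(entity: str) -> Union[CurieSketch, CurieError]:
    """Split 'NAMESPACE:ID' at the first colon, or return an error value."""
    namespace, sep, identity = entity.partition(":")
    if not sep or not namespace or not identity:
        return CurieError("not of the form NAMESPACE:ID", entity)
    return CurieSketch(namespace, identity)


parsed = curie_from_str("GO:1234567")
if isinstance(parsed, CurieError):
    print("error:", parsed.info)
else:
    print(parsed.namespace, parsed.identity)
```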

class ontobio.model.association.Date(year, month, day, time)

Create new instance of Date(year, month, day, time)

day

Alias for field number 2

month

Alias for field number 1

time

Alias for field number 3

year

Alias for field number 0

class ontobio.model.association.Error(info: str, entity: str = '')[source]
class ontobio.model.association.Evidence(type: ontobio.model.association.Curie, has_supporting_reference: List[ontobio.model.association.Curie], with_support_from: List[ontobio.model.association.ConjunctiveSet])[source]
class ontobio.model.association.ExtensionUnit(relation: ontobio.model.association.Curie, term: ontobio.model.association.Curie)[source]

An ExtensionUnit is a single element of the extensions field of GAF or GPAD. This consists of a relation and a term.

Create an ExtensionUnit with from_str or from_curie_str. If there is an error in parsing then Error is returned. Results from these functions should be checked for Error.

The string representation will depend on the format, and so the display method should be used. By default this will write the relation using the label with underscores (example: part_of) as defined in ontobio.rdfgen.relations.py. To write the relation as a CURIE (as in gpad 2.0), set parameter use_rel_label to True.

display(use_rel_label=False)[source]

Turns the ExtensionUnit into a string. By default this uses the ontobio.rdfgen.relations module to lookup the relation label. To use the CURIE instead, pass use_rel_label=True.

classmethod from_curie_str(entity: str) → Union[source]

Attempts to parse string entity as an ExtensionUnit. If the relation(term) is not formatted correctly, an Error is returned. relation is a Curie, so any errors in formatting are delegated to Curie.from_str()

classmethod from_str(entity: str) → Union[source]

Attempts to parse string entity as an ExtensionUnit. If the relation(term) is not formatted correctly, an Error is returned. If the relation cannot be found in the relations dictionary then an error is also returned.

class ontobio.model.association.GoAssociation(source_line: Optional[str], subject: ontobio.model.association.Subject, relation: ontobio.model.association.Curie, object: ontobio.model.association.Term, negated: bool, qualifiers: List[ontobio.model.association.Curie], aspect: Optional[NewType.<locals>.new_type], interacting_taxon: Optional[ontobio.model.association.Curie], evidence: ontobio.model.association.Evidence, subject_extensions: List[ontobio.model.association.ExtensionUnit], object_extensions: List[ontobio.model.association.ConjunctiveSet], provided_by: NewType.<locals>.new_type, date: ontobio.model.association.Date, properties: List[Tuple[str, str]])[source]

The internal model used by the parsers and qc Rules engine that all annotations are parsed into.

If an annotation textual line cannot be parsed into a GoAssociation then it is not a well formed line.

This class provides several methods to convert this GoAssociation into other representations, like GAF and GPAD of each version, as well as the old style dictionary Association that this class replaced (for compatibility if needed).

Each parser has its own function or functions that converts an annotation line into a GoAssociation, and this is the first phase of parsing. In general, GoAssociations are only created by the parsers.

to_gaf_2_1_tsv() → List[T][source]

Converts the GoAssociation into a “TSV” columnar GAF 2.1 row as a list of strings.

to_gaf_2_2_tsv() → List[T][source]

Converts the GoAssociation into a “TSV” columnar GAF 2.2 row as a list of strings.

to_gpad_1_2_tsv() → List[T][source]

Converts the GoAssociation into a “TSV” columnar GPAD 1.2 row as a list of strings.

to_gpad_2_0_tsv() → List[T][source]

Converts the GoAssociation into a “TSV” columnar GPAD 2.0 row as a list of strings.

to_hash_assoc() → dict[source]

Converts the GoAssociation into the old style dictionary association for backwards compatibility

class ontobio.model.association.Header(souce_line: Union[str, NoneType])[source]
class ontobio.model.association.Subject(id: ontobio.model.association.Curie, label: str, fullname: List[str], synonyms: List[str], type: Union[List[str], List[ontobio.model.association.Curie]], taxon: ontobio.model.association.Curie, encoded_by: List[ontobio.model.association.Curie] = None, parents: List[ontobio.model.association.Curie] = None, contained_complex_members: List[ontobio.model.association.Curie] = None, db_xrefs: List[ontobio.model.association.Curie] = None, properties: Dict = None)[source]
contained_complex_members = None

Optional, or cardinality 0+

db_xrefs = None

Optional, or cardinality 0+

encoded_by = None

Optional, or cardinality 0+

fullname = None

fullname is also DB_Object_Name in the GPI spec, cardinality 0+

fullname_field(max=None) → str[source]

Converts the fullname or DB_Object_Name into the field text string used in files

label = None

label is also DB_Object_Symbol in the GPI spec

parents = None

Optional, or cardinality 0+

properties = None

Optional, or cardinality 0+

synonyms = None

Cardinality 0+

taxon = None

Type:Should be NCBITaxon
type = None

In GPI 1.2, this was a string, corresponding to labels of the Sequence Ontology gene, protein_complex; protein; transcript; ncRNA; rRNA; tRNA; snRNA; snoRNA, any subclass of ncRNA. If the specific type is unknown, use gene_product.

When reading gpi 1.2, these labels should be mapped to the 2.0 spec, stating that the type must be a Curie in the Sequence Ontology OR Protein Ontology OR Gene Ontology

In GPI 1.2, exactly one value is required. In GPI 2.0 there is a minimum of one value, but there may be more.

If writing out to GPI 1.2/GAF just take the first value in the list.

class ontobio.model.association.Term(id: ontobio.model.association.Curie, taxon: ontobio.model.association.Curie)[source]

Represents a Gene Ontology term

ontobio.model.association.TwoTupleStr(items: List[str]) → tuple[source]

Create a tuple of str that is guaranteed to be of length two from a list

If the list is longer, then only the first two elements will be used. If the list is shorter, then the empty string will be used for the missing elements
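
The truncate-or-pad behavior described above can be sketched in plain Python. This is a minimal illustration, not ontobio's implementation; the name two_tuple_str is a hypothetical stand-in for TwoTupleStr.

```python
from typing import List, Tuple


def two_tuple_str(items: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in: always return exactly two strings."""
    head = list(items[:2])                 # keep at most the first two elements
    head += [""] * (2 - len(head))         # pad missing positions with ""
    return (head[0], head[1])


print(two_tuple_str(["a", "b", "c"]))      # extra elements are dropped
print(two_tuple_str(["a"]))                # second position is padded
print(two_tuple_str([]))                   # both positions are padded
```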

ontobio.model.association.gp_type_label_to_curie(type: ontobio.model.association.Curie) → str[source]

This is the reverse of map_gp_type_label_to_curie

ontobio.model.association.map_gp_type_label_to_curie(type_label: str) → ontobio.model.association.Curie[source]

Map entity types in GAF or GPI 1.2 into CURIEs in Sequence Ontology (SO), Protein Ontology (PRO), or Gene Ontology (GO).

This is a measure to upgrade the pseudo-labels into proper Curies. Present here are the existing set of labels in current use, and how they should be mapped into CURIEs.
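
As a rough sketch of the idea, a label-to-CURIE mapping and its reverse can be modeled as a pair of dictionaries. The table below is a small illustrative subset chosen for this example (one entry each from SO, PRO, and GO); it is an assumption, not the table ontobio ships, and the helper names label_to_curie and curie_to_label are hypothetical.

```python
from typing import Optional

# Illustrative subset only (assumption): one label per target ontology.
LABEL_TO_CURIE = {
    "gene": "SO:0000704",             # Sequence Ontology
    "protein": "PR:000000001",        # Protein Ontology
    "protein_complex": "GO:0032991",  # Gene Ontology
}

# Reverse lookup, built once from the forward table.
CURIE_TO_LABEL = {curie: label for label, curie in LABEL_TO_CURIE.items()}


def label_to_curie(label: str) -> Optional[str]:
    """Hypothetical helper: upgrade a pseudo-label to a CURIE, or None if unknown."""
    return LABEL_TO_CURIE.get(label)


def curie_to_label(curie: str) -> Optional[str]:
    """Hypothetical helper: map a CURIE back to its legacy label, or None if unknown."""
    return CURIE_TO_LABEL.get(curie)


print(label_to_curie("gene"))
print(curie_to_label("GO:0032991"))
```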

GOlr Queries

class ontobio.golr.golr_query.GolrAssociationQuery(subject_category=None, object_category=None, relation=None, relationship_type=None, subject_or_object_ids=None, subject_or_object_category=None, subject=None, subjects=None, object=None, objects=None, subject_direct=False, object_direct=False, subject_taxon=None, subject_taxon_direct=False, object_taxon=None, object_taxon_direct=False, invert_subject_object=None, evidence=None, exclude_automatic_assertions=False, q=None, id=None, use_compact_associations=False, include_raw=False, field_mapping=None, solr=None, config=None, url=None, select_fields=None, fetch_objects=False, fetch_subjects=False, fq=None, slim=None, json_facet=None, iterate=False, map_identifiers=None, facet_fields=None, facet_field_limits=None, facet_limit=25, facet_mincount=1, facet_pivot_fields=None, stats=False, stats_field=None, facet=True, pivot_subject_object=False, unselect_evidence=False, rows=10, start=None, homology_type=None, non_null_fields=None, user_agent=None, association_type=None, sort=None, **kwargs)[source]

A Query object providing a higher level of abstraction over either GO or Monarch Solr indexes

All of these can be set when creating a new object

fetch_objects : bool

We frequently want a list of distinct association objects (in the RDF sense). For example, when querying for all phenotype associations for a gene, it is convenient to get a list of distinct phenotype terms. Although this list can be obtained by iterating over the associations, fetching all associations can be expensive.

Results are in the ‘objects’ field

fetch_subjects : bool

This is the analog of the fetch_objects field. Note that due to an inherent asymmetry by which the list of subjects can be very large (e.g. all genes in all species for “metabolic process” or “metabolic phenotype”) it’s necessary to combine this with subject_category and subject_taxon filters

Results are in the ‘subjects’ field

slim : List

a list of either class ids (or in future subset ids), used to map up (slim) objects in associations. This will populate an additional ‘slim’ field in each association object corresponding to the slimmed-up value(s) from the direct objects. If fetch_objects is passed, this will be populated with slimmed IDs.

evidence: String

Evidence class from ECO. Inference is used.

exclude_automatic_assertions : bool

If true, then any annotations with ECO evidence code for IEA or subclasses will be excluded.

use_compact_associations : bool

If true, then the associations list will be false; instead, compact_associations contains a more compact representation consisting of objects with (subject, relation and objects)

config : Config

See Config for details. The config object can be used to set values for the solr instance to be queried

TODO - Extract params into their own object

Fetch a set of association objects based on a query.

exec(**kwargs)[source]

Execute solr query

Result object is a dict with the following keys:

  • raw
  • associations : list
  • compact_associations : list
  • facet_counts
  • facet_pivot
infer_category(id)[source]

heuristic to infer a category from an id, e.g. DOID:nnn –> disease

make_canonical_identifier(id)[source]

E.g. MGI:MGI:nnnn –> MGI:nnnn

make_gostyle_identifier(id)[source]

E.g. MGI:nnnn –> MGI:MGI:nnnn
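
The two identifier styles shown for make_canonical_identifier and make_gostyle_identifier can be illustrated with a pair of standalone functions. This is a sketch under the assumption that the only difference is a doubled prefix; the function names and the prefix parameter are hypothetical, and the real methods may handle more cases.

```python
def to_canonical(identifier: str, prefix: str = "MGI") -> str:
    """Hypothetical helper: collapse a doubled prefix, e.g. MGI:MGI:1 -> MGI:1."""
    doubled = f"{prefix}:{prefix}:"
    if identifier.startswith(doubled):
        return f"{prefix}:" + identifier[len(doubled):]
    return identifier


def to_gostyle(identifier: str, prefix: str = "MGI") -> str:
    """Hypothetical helper: double the prefix, e.g. MGI:1 -> MGI:MGI:1."""
    single = f"{prefix}:"
    doubled = f"{prefix}:{prefix}:"
    # Leave identifiers that are already doubled, or that use another prefix.
    if identifier.startswith(single) and not identifier.startswith(doubled):
        return doubled + identifier[len(single):]
    return identifier


print(to_canonical("MGI:MGI:1234"))
print(to_gostyle("MGI:1234"))
```

Under these assumptions the two functions are inverses of each other for identifiers carrying the given prefix, and both leave other identifiers unchanged.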

map_id(id, prefix, closure_list)[source]

Map identifiers based on an equivalence closure list.

solr_params()[source]

Generate HTTP parameters for passing to Solr.

In general you should not need to call this directly, calling exec() on a query object will transparently perform this step for you.

translate_doc(d, field_mapping=None, map_identifiers=None, **kwargs)[source]

Translate a solr document (i.e. a single result row)

translate_docs(ds, **kwargs)[source]

Translate a set of solr results

translate_docs_compact(ds, field_mapping=None, slim=None, map_identifiers=None, invert_subject_object=False, **kwargs)[source]

Translate golr association documents to a compact representation
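
To give a feel for the compact form described under use_compact_associations, with objects of (subject, relation and objects), the sketch below groups flat association records by subject and relation. It is a plain-Python illustration and not the library's code; the compact function, the input record keys, and the identifiers are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, List


def compact(associations: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Hypothetical helper: group flat records into (subject, relation, objects)."""
    grouped = defaultdict(list)
    for assoc in associations:
        key = (assoc["subject"], assoc["relation"])
        if assoc["object"] not in grouped[key]:   # keep each object once
            grouped[key].append(assoc["object"])
    return [
        {"subject": subject, "relation": relation, "objects": objects}
        for (subject, relation), objects in grouped.items()
    ]


# Placeholder identifiers, not real database entries.
flat = [
    {"subject": "GENE:1", "relation": "has_phenotype", "object": "TERM:a"},
    {"subject": "GENE:1", "relation": "has_phenotype", "object": "TERM:b"},
    {"subject": "GENE:2", "relation": "has_phenotype", "object": "TERM:a"},
    {"subject": "GENE:1", "relation": "has_phenotype", "object": "TERM:a"},
]
for row in compact(flat):
    print(row)
```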

translate_obj(d, fname)[source]

Translate a field value from a solr document.

This includes special logic for when the field value denotes an object, here we nest it

translate_objs(d, fname, default=None)[source]

Translate a field whose value is expected to be a list

class ontobio.golr.golr_query.GolrSearchQuery(term=None, category=None, is_go=False, url=None, solr=None, config=None, fq=None, fq_string=None, hl=True, facet_fields=None, facet=True, search_fields=None, taxon_map=True, rows=100, start=None, prefix=None, boost_fx=None, boost_q=None, highlight_class=None, taxon=None, min_match=None, minimal_tokenizer=False, include_eqs=False, exclude_groups=False, user_agent=None)[source]

Controller for Monarch and GO Solr search cores. Queries over a search document.

autocomplete()[source]

Execute solr autocomplete

search()[source]

Execute solr search query

Lexmap

class ontobio.lexmap.LexicalMapEngine(wsmap={'': '', 'a': '', 'i': '1', 'ii': '2', 'iii': '3', 'iv': '4', 'ix': '9', 'of': '', 'the': '', 'v': '5', 'vi': '6', 'vii': '7', 'viii': '8', 'x': '10', 'xi': '11', 'xii': '12', 'xiii': '13', 'xiv': '14', 'xix': '19', 'xv': '15', 'xvi': '16', 'xvii': '17', 'xviii': '18', 'xx': '20'}, config=None)[source]

generates lexical matches between pairs of ontology classes

Parameters:
  • wsmap (dict) – maps words to normalized synonyms.
  • config (dict) – A configuration conforming to LexicalMapConfigSchema
assign_best_matches(xg)[source]

For each node in the xref graph, tag best match edges

cliques(xg)[source]

Return all equivalence set cliques, assuming each edge in the xref graph is treated as equivalent, and all edges in ontology are subClassOf

Parameters:xg (Graph) – an xref graph
Returns:
Return type:list of sets
compare_to_xrefs(xg1, xg2)[source]

Compares a base xref graph with another one

get_xref_graph()[source]

Generate mappings based on lexical properties and return as nx graph.

  • A dictionary is stored between Synonym values and synonyms. See index_synonym. Note that Synonyms include the primary label
  • Each key in the dictionary is examined to determine if there exist two Synonyms from different ontology classes

This avoids N^2 pairwise comparisons: instead the time taken is linear

After initial mapping is made, additional scoring is performed on each mapping

The return object is a nx graph, connecting pairs of ontology classes.

Edges are annotated with metadata about how the match was found:

syns: pair
pair of Synonym objects, corresponding to the synonyms for the two nodes
score: int
score indicating strength of mapping, between 0 and 100
Returns:nx graph (bidirectional)
Return type:Graph
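
The index-then-scan strategy described for get_xref_graph can be sketched without any ontology library. The example below is a simplified illustration, assuming lowercase and whitespace normalization only and skipping scoring; the function names, the input shape, and the class IDs are hypothetical, and plain dictionaries stand in for the nx graph.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def lexical_matches(labels_by_class: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Hypothetical helper: pair classes that share a normalized label or synonym."""
    # Pass 1: index every label by its normalized value.
    index = defaultdict(set)
    for class_id, labels in labels_by_class.items():
        for label in labels:
            index[normalize(label)].add(class_id)

    # Pass 2: only keys shared by two or more classes produce candidate edges.
    edges: Dict[Tuple[str, str], List[str]] = {}
    for key, class_ids in index.items():
        for a, b in combinations(sorted(class_ids), 2):
            edges.setdefault((a, b), []).append(key)
    return edges


# Placeholder class IDs and labels for illustration.
labels = {
    "X:1": ["Heart"],
    "Y:1": ["heart", "cardiac organ"],
    "Y:2": ["lung"],
}
print(lexical_matches(labels))
```

Building the index takes one pass over all labels, so classes are compared only when they land under the same key rather than in every pairwise combination.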
grouped_mappings(id)[source]

return all mappings for a node, grouped by ID prefix

index_ontology(ont)[source]

Adds an ontology to the index

This iterates through all labels and synonyms in the ontology, creating an index

index_synonym(syn, ont)[source]

Index a synonym

Typically not called from outside this object; called by index_ontology

score_xrefs_by_semsim(xg, ont=None)[source]

Given an xref graph (see get_xref_graph), this will adjust scores based on the semantic similarity of matches.

weighted_axioms(x, y, xg)[source]

return a tuple (sub,sup,equiv,other) indicating estimated prior probabilities for an interpretation of a mapping between x and y.

See kboom paper