API
Ontology Access
Factory
The OntologyFactory class provides a means of creating an ontology object backed by either local files or remote services. See Inputs for more details.
-
class
ontobio.ontol_factory.
OntologyFactory
(handle=None)[source]¶ Implements a factory for generating
Ontology
objects. You should use a factory object rather than initializing Ontology directly. See Inputs for more details.
initializes based on an ontology name
Parameters: handle (str) – see create
-
create
(handle=None, handle_type=None, **args)[source]¶ Creates an ontology based on a handle
Handle is one of the following
- FILENAME.json : creates an ontology from an obographs json file
- obo:ONTID : E.g. obo:pato - creates an ontology from obolibrary PURL (requires owltools)
- ONTID : E.g. ‘pato’ - creates an ontology from a remote SPARQL query
Parameters: handle (str) – specifies how to retrieve the ontology info
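The three handle forms listed above can be told apart by simple string inspection. The following is only a sketch of that idea: `classify_handle` and its return labels are hypothetical names invented for illustration, and the real factory may dispatch differently.

```python
def classify_handle(handle: str) -> str:
    """Hypothetical helper: guess which kind of source a handle refers to."""
    if handle.endswith(".json"):
        return "obographs-json-file"   # FILENAME.json
    if handle.startswith("obo:"):
        return "obolibrary-purl"       # obo:ONTID (requires owltools)
    return "remote-sparql"             # bare ONTID


for example in ("pato.json", "obo:pato", "pato"):
    print(example, "->", classify_handle(example))

# With ontobio installed, the documented entry point would be used like this
# (not executed here, since it needs the library and possibly network access):
#   from ontobio.ontol_factory import OntologyFactory
#   ont = OntologyFactory().create("pato")
```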
-
Ontology Object Model
-
class
ontobio.ontol.
Ontology
(handle=None, id=None, graph=None, xref_graph=None, meta=None, payload=None, graphdoc=None)[source]¶ An object that represents a basic graph-oriented view over an ontology.
The ontology may be represented in memory, or it may be located remotely. See subclasses for details.
The default implementation is an in-memory wrapper onto the python networkx library
initializes based on an ontology name.
Note: do not call this directly, use OntologyFactory instead
-
all_synonyms
(include_label=False)[source]¶ Retrieves all synonyms
Parameters: include_label (bool) – If True, include label/names as Synonym objects
Returns: Synonym objects
Return type: list[Synonym]
-
ancestors
(node, relations=None, reflexive=False)[source]¶ Return all ancestors of specified node.
The default implementation is to use networkx, but some implementations of the Ontology class may use a database or service backed implementation, for large graphs.
Parameters: - node (str) – identifier for node in ontology
- reflexive (bool) – if true, return query node in graph
- relations (list) – relation (object property) IDs used to filter
Returns: ancestor node IDs
Return type: list[str]
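To make the meaning of `relations` and `reflexive` concrete, here is a minimal pure-Python sketch of an ancestor traversal over a toy edge list. It does not use networkx or ontobio; the `EDGES` data and the `ancestors_sketch` function are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical toy edges as (subject, relation, object); the object is the parent.
EDGES = [
    ("finger", "BFO:0000050", "hand"),
    ("hand", "BFO:0000050", "limb"),
    ("hand", "subClassOf", "anatomical structure"),
]


def ancestors_sketch(node, edges, relations=None, reflexive=False):
    """Collect every node reachable by following subject -> object edges."""
    parents = defaultdict(set)
    for subj, rel, obj in edges:
        # relations=None means "do not filter by relation"
        if relations is None or rel in relations:
            parents[subj].add(obj)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    if reflexive:
        seen.add(node)  # include the query node itself
    return sorted(seen)


print(ancestors_sketch("finger", EDGES))
print(ancestors_sketch("hand", EDGES, relations=["subClassOf"]))
```

Filtering happens before traversal, so an ancestor is only reported when it is reachable entirely over the selected relations.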
-
child_parent_relations
(subj, obj, graph=None)[source]¶ Get all relationship type ids between a subject and a parent.
Typically only one relation ID returned, but in some cases there may be more than one
Parameters: - subj (string) – Child (subject) id
- obj (string) – Parent (object) id
Returns: relation IDs holding between subj and obj
Return type: list
-
children
(node, relations=None)[source]¶ Return all direct children of specified node.
Wraps networkx by default.
Parameters: - node (string) – identifier for node in ontology
- relations (list of strings) – list of relation (object property) IDs used to filter
-
create_slim_mapping
(subset=None, subset_nodes=None, relations=None, disable_checks=False)[source]¶ Create a dictionary that maps between all nodes in an ontology to a subset
Parameters: - ont (Ontology) – Complete ontology to be mapped. Assumed pre-filtered for relationship types
- subset (str) – Name of subset to map to, e.g. goslim_generic
- subset_nodes (list) – If no named subset provided, subset is passed in as list of node ids
- relations (list) – List of relations to filter on
- disable_checks (bool) – Unless this is set, this will prevent a mapping being generated with non-standard relations. The motivation here is that the ontology graph may include relations that it is inappropriate to propagate gene products over, e.g. transports, has-part
Returns: maps all nodes in ont to one or more non-redundant nodes in subset
Return type: dict
Raises: ValueError
– if the subset is empty
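One way to read "one or more non-redundant nodes in subset" is: take every subset node among a node's reflexive ancestors, then drop any that is itself an ancestor of another hit. The sketch below implements that reading on a toy parent map; `slim_mapping_sketch` and `PARENTS` are hypothetical and the library's algorithm may differ in detail.

```python
def reflexive_ancestors(node, parents):
    """Return the node plus everything reachable through the parent map."""
    seen, stack = {node}, [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_mapping_sketch(parents, subset_nodes):
    """Map each node to its most specific (non-redundant) subset ancestors."""
    if not subset_nodes:
        raise ValueError("subset is empty")
    subset = set(subset_nodes)
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    mapping = {}
    for node in nodes:
        hits = reflexive_ancestors(node, parents) & subset
        # A hit is redundant if it is a proper ancestor of another hit.
        redundant = set()
        for hit in hits:
            redundant |= reflexive_ancestors(hit, parents) - {hit}
        mapping[node] = sorted(hits - redundant)
    return mapping


# Hypothetical toy hierarchy: d -> c -> b -> a, and e -> a
PARENTS = {"d": ["c"], "c": ["b"], "b": ["a"], "e": ["a"]}
print(slim_mapping_sketch(PARENTS, ["a", "b"]))
```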
-
descendants
(node, relations=None, reflexive=False)[source]¶ Returns all descendants of specified node.
The default implementation is to use networkx, but some implementations of the Ontology class may use a database or service backed implementation, for large graphs.
Parameters: - node (str) – identifier for node in ontology
- reflexive (bool) – if true, return query node in graph
- relations (list) – relation (object property) IDs used to filter
Returns: descendant node IDs
Return type: list[str]
-
equiv_graph
()[source]¶ Returns: bidirectional networkx graph of all equivalency relations
Return type: graph
-
extract_subset
(subset, contract=True)[source]¶ Return all nodes in a subset.
We assume the oboInOwl encoding of subsets, and subset IDs are IRIs, or IRI fragments
-
get_filtered_graph
(relations=None, prefix=None)[source]¶ Returns a networkx graph for the whole ontology, for a subset of relations
Only implemented for eager methods.
Implementation notes: currently this is not cached
Parameters: - relations (-) – list of object property IDs, e.g. subClassOf, BFO:0000050. If empty, uses all.
- prefix (-) – if specified, create a subgraph using only classes with this prefix, e.g. ENVO, PATO, GO
Returns: A networkx MultiDiGraph object representing the filtered ontology
Return type: nx.MultiDiGraph
-
get_graph
()[source]¶ Return a networkx graph for the whole ontology.
Note: Only implemented for eager implementations
Returns: A networkx MultiDiGraph object representing the complete ontology
Return type: nx.MultiDiGraph
-
get_level
(level, relations=None, **args)[source]¶ Get all nodes at a particular level
Parameters: relations (list[str]) – list of relations used to filter
-
get_property_chain_axioms
(nid)[source]¶ Retrieves property chain axioms for a relation id
Parameters: nid (str) – Node identifier for relation to be queried
Return type: PropertyChainAxiom
-
get_roots
(relations=None, prefix=None)[source]¶ Get all nodes that lack parents
Parameters: - relations (list[str]) – list of relations used to filter
- prefix (str) – E.g. GO. Exclude nodes that lack this prefix when testing parentage
-
is_obsolete
(nid)[source]¶ True if node is obsolete
Parameters: nid (str) – Node identifier for entity to be queried
-
label
(nid, id_if_null=False)[source]¶ Fetches label for a node
Parameters: - nid (str) – Node identifier for entity to be queried
- id_if_null (bool) – If True and node has no label return id as label
Returns: Return type: str
-
logical_definitions
(nid)[source]¶ Retrieves logical definitions for a class id
Parameters: nid (str) – Node identifier for entity to be queried
Return type: LogicalDefinition
-
node
(id)[source]¶ Return a node with a given ID. If the node with the ID exists the Node object is returned, otherwise None is returned.
Wraps networkx by default
-
parent_index
(relations=None)[source]¶ Returns a mapping of nodes to all direct parents
Parameters: relations (list[str]) – list of relations used to filter
Returns: list of lists [[CLASS_1, PARENT_1,1, …, PARENT_1,N], [CLASS_2, PARENT_2,1, PARENT_2,2, … ] … ]
Return type: list
-
parents
(node, relations=None)[source]¶ Return all direct ‘parents’ of specified node.
Note that in the context of ontobio, ‘parent’ means any node that is traversed in a single hop along an edge from a subject to object. For example, if the ontology has an edge “finger part-of some hand”, then “hand” is the parent of finger. This can sometimes be counterintuitive, for example, if the ontology contains has-part axioms. If the ontology has an edge “X receptor activity has-part some X binding”, then “X binding” is the ‘parent’ of “X receptor activity” over a has-part edge.
Wraps networkx by default.
Parameters: - node (string) – identifier for node in ontology
- relations (list of strings) – list of relation (object property) IDs used to filter
-
replaced_by
(nid, strict=True)[source]¶ Returns value of ‘replaced by’ (IAO_0100001) property for obsolete nodes
Parameters: - nid (str) – Node identifier for entity to be queried
- strict (bool) – If true, raise error if cardinality>1. If false, return list if cardinality>1
Returns: None if no value set, otherwise the node id (or a list if multiple values, see strict setting)
-
resolve_names
(names, synonyms=False, **args)[source]¶ returns a list of identifiers based on an input list of labels and identifiers.
Parameters: - names (list) – search terms. ‘%’ treated as wildcard
- synonyms (bool) – if true, search on synonyms in addition to labels
- is_regex (bool) – if true, treats each name as a regular expression
- is_partial_match (bool) – if true, treats each name as a regular expression .*name.*
-
search
(searchterm, **args)[source]¶ Simple search. Returns list of IDs.
Parameters: - searchterm (list) – search term. ‘%’ treated as wildcard
- synonyms (bool) – if true, search on synonyms in addition to labels
- is_regex (bool) – if true, treats each name as a regular expression
- is_partial_match (bool) – if true, treats each name as a regular expression .*name.*
Returns: match node IDs
Return type: list
-
subgraph
(nodes=None)[source]¶ Return an induced subgraph
By default this wraps networkx subgraph, but this may be overridden in specific implementations
-
subontology
(nodes=None, minimal=False, relations=None)[source]¶ Return a new ontology that is an extract of this one
Parameters: - nodes (-) – list of node IDs to include in subontology. If None, all are used
- relations (-) – list of relation IDs to include in subontology. If None, all are used
-
synonyms
(nid, include_label=False)[source]¶ Retrieves synonym objects for a class
Parameters: - nid (str) – Node identifier for entity to be queried
- include_label (bool) – If True, include label/names as Synonym objects
Returns: Synonym objects
Return type: list[Synonym]
-
text_definition
(nid)[source]¶ Retrieves the text definition for a class or relation id
Parameters: nid (str) – Node identifier for entity to be queried
Return type: TextDefinition
-
traverse_nodes
(qids, up=True, down=False, **args)[source]¶ Traverse (optionally) up and (optionally) down from an input set of nodes
Parameters: - qids (list[str]) – list of seed node IDs to start from
- up (bool) – if True, include ancestors
- down (bool) – if True, include descendants
- relations (list[str]) – list of relations used to filter
Returns: nodes reachable from qids
Return type: list[str]
-
-
class
ontobio.ontol.
Synonym
(class_id, val=None, pred='hasRelatedSynonym', lextype=None, xrefs=None, ontology=None, confidence=1.0, synonymType=None)[source]¶ Represents a synonym using the OBO model
Parameters: - class_id (-) – the class that is being defined
- val (-) – the synonym itself
- pred (-) – oboInOwl predicate used to model scope. One of: has{Exact,Narrow,Related,Broad}Synonym - may also be ‘label’
- lextype (-) – From an open ended set of types
- xrefs (-) – Provenance or cross-references to same usage
-
class
ontobio.ontol.
LogicalDefinition
(class_id, genus_ids, restrictions)[source]¶ A simple OWL logical definition conforming to the pattern:
class_id = (genus_id_1 AND ... genus_id_n) AND (P_1 some FILLER_1) AND ... (P_m some FILLER_m)
See obographs docs for more details
Parameters: - class_id (string) – the class that is being defined
- genus_ids (list) – a list of named classes (typically length 1)
- restrictions (list) – a list of (PROPERTY_ID, FILLER_CLASS_ID) tuples
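The genus-differentia pattern above can be illustrated with a small stand-in class that mirrors the three documented parameters. This is a sketch only: `LogicalDefinitionSketch`, its `render` method, and the `X:` identifiers are hypothetical and are not part of ontobio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogicalDefinitionSketch:
    """Stand-in with the same three fields as the documented constructor."""
    class_id: str
    genus_ids: List[str]
    restrictions: List[Tuple[str, str]]  # (PROPERTY_ID, FILLER_CLASS_ID)

    def render(self) -> str:
        # genus classes first, then one "(P some FILLER)" clause per restriction
        parts = list(self.genus_ids)
        parts += [f"({prop} some {filler})" for prop, filler in self.restrictions]
        return f"{self.class_id} = " + " AND ".join(parts)


ldef = LogicalDefinitionSketch(
    class_id="X:1",
    genus_ids=["X:2"],
    restrictions=[("BFO:0000050", "X:3")],
)
print(ldef.render())
```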
Association Access
Factory
-
class
ontobio.assoc_factory.
AssociationSetFactory
[source]¶ Factory for creating AssociationSets
Currently support for golr (GO and Monarch) is provided but other stores possible
initializes based on an ontology name
-
create
(ontology=None, subject_category=None, object_category=None, evidence=None, taxon=None, relation=None, file=None, fmt=None, skim=True)[source]¶ creates an AssociationSet
Currently, this uses an eager binding to an ontobio.golr instance. All compact associations for the particular combination of parameters are fetched.
Parameters: - ontology (an Ontology object) –
- subject_category (string representing category of subjects (e.g. gene, disease, variant)) –
- object_category (string representing category of objects (e.g. function, phenotype, disease)) –
- taxon (string holding NCBITaxon:nnnn ID) –
-
Association Object Model
-
class
ontobio.assocmodel.
AssociationSet
(ontology=None, association_map=None, subject_label_map=None, meta=None)[source]¶ An object that represents a collection of associations
NOTE: the intention is that this class can be subclassed to provide either high-efficiency implementations, or implementations backed by services or external stores. The default implementation is in-memory.
NOTE: in general you do not need to call this yourself. See assoc_factory
initializes an association set, which minimally consists of:
- an ontology (e.g. GO, HP)
- a map between subjects (e.g genes) and sets/lists of term IDs
-
annotations
(subject_id)[source]¶ Returns a list of classes used to describe a subject
@Deprecated: use objects_for_subject
-
as_dataframe
(fillna=True, subjects=None)[source]¶ Return association set as pandas DataFrame
Each row is a subject (e.g. gene) Each column is the inferred class used to describe the subject
-
associations
(subject, object=None)[source]¶ Given a subject-object pair (e.g. gene id to ontology class id), return all association objects that match.
-
enrichment_test
(subjects=None, background=None, hypotheses=None, threshold=0.05, labels=False, direction='greater')[source]¶ Performs term enrichment analysis.
Parameters: - subjects (string list) – Sample set. Typically a gene ID list. These are assumed to have associations
- background (string list) – Background set. If not set, uses full set of known subject IDs in the association set
- threshold (float) – p values above this are filtered out
- labels (boolean) – if true, labels for enriched classes are included in result objects
- direction ('greater', 'less' or 'two-sided') – default is greater - i.e. enrichment test. Use ‘less’ for depletion test.
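For intuition about the 'greater' direction, the sketch below computes a one-sided hypergeometric tail probability per term using only the standard library. It is an assumption-laden illustration: the function names and toy data are invented here, no multiple-testing correction is applied, and the statistics used by the library itself may differ.

```python
from math import comb


def upper_tail_p(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N in population, K annotated, n sampled)."""
    total = comb(N, n)
    hits = range(k, min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / total


def enrichment_sketch(subjects, background, term_to_subjects, threshold=0.05):
    """Return (term, p) pairs with p <= threshold, most significant first."""
    background = set(background)
    sample = set(subjects) & background
    results = []
    for term, annotated in term_to_subjects.items():
        annotated = set(annotated) & background
        p = upper_tail_p(
            len(annotated & sample), len(background), len(annotated), len(sample)
        )
        if p <= threshold:  # p values above the threshold are filtered out
            results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


# Hypothetical data: ten background genes, two terms with (inferred) annotations.
BACKGROUND = [f"g{i}" for i in range(1, 11)]
TERMS = {
    "T:1": {"g1", "g2", "g3"},
    "T:2": {"g1", "g4", "g5", "g6", "g7", "g8"},
}
print(enrichment_sketch(["g1", "g2", "g3"], BACKGROUND, TERMS))
```

A depletion test ('less') would sum the lower tail instead.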
-
index
()[source]¶ Creates indexes based on inferred terms.
You do not need to call this yourself; called on initialization
-
inferred_types
(subj)[source]¶ Returns: set of reflexive inferred types for a subject.
E.g. if a gene is directly associated with terms A and B, and these terms have ancestors C, D and E then the set returned will be {A,B,C,D,E}
Parameters: subj (str) – subject ID
Returns: set of class IDs
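The A, B, C, D, E example can be reproduced with a short closure computation. This is a sketch under the assumption that the ontology is available as a plain child-to-parents dictionary; `inferred_types_sketch` and `PARENTS` are hypothetical names.

```python
def inferred_types_sketch(direct_terms, parents):
    """Reflexive closure: the direct terms plus all of their ancestors."""
    seen = set(direct_terms)
    stack = list(direct_terms)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical hierarchy in which A and B have ancestors C, D and E.
PARENTS = {"A": ["C"], "B": ["D"], "D": ["E"]}
print(sorted(inferred_types_sketch({"A", "B"}, PARENTS)))
```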
-
static
intersectionlist_to_matrix
(ilist, xterms, yterms)[source]¶ WILL BE DEPRECATED
Replace with method to return pandas dataframe
-
jaccard_similarity
(s1, s2)[source]¶ Calculate jaccard index of inferred associations of two subjects
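The Jaccard index of two sets is the size of their intersection divided by the size of their union. A minimal sketch, assuming the inferred term sets for the two subjects have already been computed (the `jaccard_sketch` name and the convention of returning 0.0 for two empty sets are choices made for this example):

```python
def jaccard_sketch(terms_a, terms_b):
    """|A intersect B| / |A union B|, with 0.0 when both sets are empty."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard_sketch({"A", "B", "C"}, {"B", "C", "D"}))
```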
-
label
(id)[source]¶ return label for a subject id
Will make use of both the ontology and the association set
-
query
(terms=None, negated_terms=None)[source]¶ Basic boolean query, using inference.
Parameters: - terms (list) – list of class ids. Returns the set of subjects that have at least one inferred annotation to each of the specified classes.
- negated_terms (list) – list of class ids. Filters the set of subjects so that there are no inferred annotations to any of the specified classes
-
query_associations
(subjects=None, infer_subjects=True, include_xrefs=True)[source]¶ Query for a set of associations.
Note: only a minimal association model is stored, so all results are returned as (subject_id,class_id) tuples
Parameters: - subjects (list) – list of subjects (e.g. genes, diseases) used to query associations. Any association to one of these subjects or a descendant of these subjects (assuming infer_subjects=True) is returned.
- infer_subjects (boolean, default true) – See above
- include_xrefs (boolean, default true) – If true, then expand inferred subject set to include all xrefs of those subjects.
Example: if a high level disease node (e.g. DOID:14330 Parkinson disease) is specified with the default behavior (infer_subjects=True, include_xrefs=True) and the ontology includes DO, results will include associations from both descendant DOID classes, and all xrefs (e.g. OMIM)
-
query_intersections
(x_terms=None, y_terms=None, symmetric=False)[source]¶ Query for intersections of terms in two lists
Returns a list of intersection result objects with keys:
- x : term from x
- y : term from y
- c : count of intersection
- j : jaccard score
TODO - detailed association modeling
Association File Parsers
-
class
ontobio.io.gafparser.
GafParser
(config=None, group='unknown', dataset='unknown', bio_entities=None)[source]¶ Parser for GO GAF format
config : an AssocParserConfig object
-
association_generator
(file, skipheader=False, outfile=None) → Dict[KT, VT]¶ Returns a generator that yields successive associations from file
Yields: association
-
map_to_subset
(file, outfile=None, ontology=None, subset=None, class_map=None, relations=None)¶ Map a file to a subset, writing out results
You can pass either a subset name (e.g. goslim_generic) or a dictionary with ready-made mappings
Parameters: - file (file) – Name or file object for input assoc file
- outfile (file) – Name or file object for output (mapped) assoc file; writes to stdout if not set
- subset (str) – Optional name of subset to map to, e.g. goslim_generic
- class_map (dict) – Mapping between asserted class ids and ids to map to. Many to many
- ontology (Ontology) – Ontology to extract subset from
-
parse
(file, skipheader=False, outfile=None)¶ Parse a line-oriented association file into a list of association dict objects
Note the returned list is of dict objects. TODO: These will later be specified using marshmallow and it should be possible to generate objects
Parameters: - file (file or string) – The file is parsed into association objects. Can be a http URL, filename or file-like-object, for input assoc file
- outfile (file) – Optional output file in which processed lines are written. This is a file or file-like-object
Returns: Associations generated from the file
Return type: list
-
parse_line
(line)[source]¶ Parses a single line of a GAF
Return a tuple (processed_line, associations). Typically there will be a single association, but in some cases there may be none (invalid line) or multiple (disjunctive clause in annotation extensions)
Note: most applications will only need to call this directly if they require fine-grained control of parsing. For most purposes, :method:`parse_file` can be used over the whole file
Parameters: line (str) – A single tab-separated line from a GAF file
-
skim
(file)[source]¶ Lightweight parse of a file into tuples.
Note this discards metadata such as evidence.
Return a list of tuples (subject_id, subject_label, object_id)
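A lightweight skim of this kind can be sketched directly against the GAF column layout, where columns 1 and 2 hold the database and object ID, column 3 the object symbol, and column 5 the GO ID. The `skim_gaf_sketch` function and the sample line are hypothetical; the real method may treat malformed lines differently.

```python
def skim_gaf_sketch(lines):
    """Reduce GAF lines to (subject_id, subject_label, object_id) tuples."""
    tuples = []
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # too short to hold a GO ID
        subject_id = f"{cols[0]}:{cols[1]}"  # DB + DB Object ID
        tuples.append((subject_id, cols[2], cols[4]))
    return tuples


# Hypothetical two-line input: one header line and one annotation line.
SAMPLE = [
    "!gaf-version: 2.1\n",
    "UniProtKB\tP12345\tGENE1\t\tGO:0005634\tPMID:1\tIDA\t\tC\n",
]
print(skim_gaf_sketch(SAMPLE))
```

Evidence (column 7 in the sample) is read but never kept, which mirrors the note that metadata is discarded.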
-
upgrade_empty_qualifier
(assoc: ontobio.model.association.GoAssociation) → ontobio.model.association.GoAssociation[source]¶ From https://github.com/geneontology/go-site/issues/1558
For GAF 2.1 we will apply an algorithm to find a best fit relation if the qualifier column is empty. If the qualifiers field is empty, then:
- If the GO Term is exactly GO:0008150 Biological Process, then the qualifier should be involved_in
- If the GO Term is exactly GO:0008372 Cellular Component, then the qualifier should be is_active_in
- If the GO Term is a Molecular Function, then the new qualifier should be enables
- If the GO Term is a Biological Process, then the new qualifier should be acts_upstream_or_within
- Otherwise for Cellular Component, if it’s a subclass of anatomical structure, then use located_in, and if it’s a protein-containing complex, use part_of
Parameters: assoc – GoAssociation
Returns: the possibly upgraded GoAssociation
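The qualifier rules can be written as an ordered decision function. In this sketch the ontology lookups are replaced by arguments supplied by the caller: `aspect` is the one-letter GAF aspect code, and the two boolean flags stand in for subclass checks. The function name, the flags, and the `None` fallback are assumptions; the root IDs are taken from the description above.

```python
from typing import Optional

BP_ROOT = "GO:0008150"  # Biological Process, as named in the text above
CC_ROOT = "GO:0008372"  # Cellular Component ID as given in the text above


def pick_qualifier_sketch(
    term_id: str,
    aspect: str,
    is_anatomical_structure: bool = False,
    is_protein_complex: bool = False,
) -> Optional[str]:
    """Choose a relation label for an annotation whose qualifier is empty."""
    if term_id == BP_ROOT:
        return "involved_in"
    if term_id == CC_ROOT:
        return "is_active_in"
    if aspect == "F":
        return "enables"
    if aspect == "P":
        return "acts_upstream_or_within"
    if aspect == "C":
        if is_anatomical_structure:
            return "located_in"
        if is_protein_complex:
            return "part_of"
    return None  # the text does not say what to do in any other case


print(pick_qualifier_sketch("GO:0008150", "P"))
print(pick_qualifier_sketch("GO:0005634", "C", is_anatomical_structure=True))
```

The exact-root checks come first so that the root terms never fall through to the more general aspect rules.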
-
Go Rules
-
class
ontobio.io.qc.
FailMode
¶ An enumeration.
-
class
ontobio.io.qc.
GoRules
¶ An enumeration.
-
class
ontobio.io.qc.
GoRulesResults
(all_results, annotation)¶ Create new instance of GoRulesResults(all_results, annotation)
-
all_results
¶ Alias for field number 0
-
annotation
¶ Alias for field number 1
-
-
class
ontobio.io.qc.
RepairState
¶ An enumeration.
-
ontobio.io.qc.
ResultType
¶ alias of
ontobio.io.qc.Result
-
class
ontobio.io.qc.
TestResult
(result_type: ontobio.io.qc.Result, message: str, result)[source]¶ Represents the result of a single association.GoAssociation being validated on some rule
Create a new TestResult
Parameters: - result_type (ResultType) – enum of PASS, WARNING, ERROR. Both WARNINGs and ERRORs are reported, but ERROR will filter the offending GoAssociation
- message (str) – Description of the failure of GoAssociation to pass a rule. This is usually just the rule title
- result – [description] True if the GoAssociation passes, False if not. If it’s repaired, this is the updated, repaired, GoAssociation
-
ontobio.io.qc.
repair_result
(repair_state: ontobio.io.qc.RepairState, fail_mode: ontobio.io.qc.FailMode) → ontobio.io.qc.Result[source]¶ Returns ResultType.PASS if the repair_state is OKAY, and WARNING if REPAIRED.
This is used by RepairRule implementations.
Parameters: - repair_state (RepairState) – If the GoAssociation was repaired during a rule, then this should be RepairState.REPAIRED, otherwise RepairState.OKAY
- fail_mode (FailMode) – [description]
Returns: [description]
Return type: ResultType
GoAssociation internal Model
This contains the data model for parsing annotations from GAF and GPAD.
The idea is to make it easy to parse text lines of any source into a GoAssociation object and then give the GoAssociation object the ability to convert itself into GPAD or GAF of any version. Or any other format that is required.
-
class
ontobio.model.association.
ConjunctiveSet
(elements: List[T])[source]¶ This represents a comma-separated list of objects which can be turned into strings.
This is used for the with/from and extensions fields in the GoAssociation.
The field elements can be a list of Curie or ExtensionUnit. Curie for with/from, and ExtensionUnit for extensions field.
-
display
(conjunct_to_str=<function ConjunctiveSet.<lambda>>) → str[source]¶ Convert this ConjunctiveSet to a string separated by commas.
This calls conjunct_to_str (which defaults to str) on each element before joining. To use a different string representation of each element, pass in a different function. This functionality is used to differentiate between GPAD 1.2 and GPAD 2.0, where relations are written differently per version.
-
classmethod
list_to_str
(conjunctions: List[T], conjunct_to_str=<function ConjunctiveSet.<lambda>>) → str[source]¶ List should be a list of ConjunctiveSet Given [ConjunctiveSet, ConjunctiveSet], this will call ConjunctiveSet.display() using the conjunct_to_str function (which defaults to str) and join them with a pipe.
To have elements of the ConjunctiveSet displayed differently, use a different conjunct_to_str function. This functionality is used to differentiate between GPAD 1.2 and GPAD 2.0, where relations are written differently per version.
-
classmethod
str_to_conjunctions
(entity: str, conjunct_element_builder: Union[C, ontobio.model.association.Error] = <function ConjunctiveSet.<lambda>>) → Union[List[C], ontobio.model.association.Error][source]¶ Takes a field that conforms to the pipe (|) and comma (,) separator type. The parsed version is a list of pipe separated values which are themselves a comma separated list.
If the elements inside the comma separated list should not just be strings, but be converted into a value of a type, conjunct_element_builder can be provided which should take a string and return a parsed value or an instance of an Error type (defined above).
If there is an error in producing the values of the conjunctions, then this function will return early with the error.
This function will return a List of ConjunctiveSet
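The pipe-and-comma convention can be illustrated with plain lists standing in for ConjunctiveSet. This sketch omits the early return on Error described above, and both function names are hypothetical rather than the library's own.

```python
def str_to_conjunctions_sketch(entity, element_builder=str):
    """Split 'a,b|c' into [[a, b], [c]], building each element with element_builder."""
    if not entity:
        return []
    return [
        [element_builder(element) for element in group.split(",")]
        for group in entity.split("|")
    ]


def list_to_str_sketch(conjunctions, conjunct_to_str=str):
    """Inverse operation: join elements with commas and groups with pipes."""
    return "|".join(
        ",".join(conjunct_to_str(element) for element in group)
        for group in conjunctions
    )


parsed = str_to_conjunctions_sketch("A:1,B:2|C:3")
print(parsed)
print(list_to_str_sketch(parsed))
```

Passing a different `conjunct_to_str` is what allows the same parsed structure to be written out in more than one textual style.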
-
-
class
ontobio.model.association.
Curie
(namespace: str, identity: str)[source]¶ Object representing a Compact URI, with a namespace identifier along with an ID, like GO:1234567.
Use from_str to parse a string like “GO:1234567” into a Curie. The result should be checked for errors with is_error
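Parsing a CURIE amounts to splitting on the first colon and rejecting strings that lack either part. The sketch below is a stand-in, not the library class: `CurieSketch` and `parse_curie_sketch` are hypothetical, and it signals failure with `None`, whereas the documented class reports errors that are checked with is_error.

```python
from typing import NamedTuple, Optional


class CurieSketch(NamedTuple):
    namespace: str
    identity: str

    def __str__(self) -> str:
        return f"{self.namespace}:{self.identity}"


def parse_curie_sketch(text: str) -> Optional[CurieSketch]:
    """Split on the first colon; return None when either half is missing."""
    namespace, sep, identity = text.partition(":")
    if not sep or not namespace or not identity:
        return None
    return CurieSketch(namespace, identity)


curie = parse_curie_sketch("GO:1234567")
print(curie.namespace, curie.identity, str(curie))
```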
-
class
ontobio.model.association.
Date
(year, month, day, time)¶ Create new instance of Date(year, month, day, time)
-
day
¶ Alias for field number 2
-
month
¶ Alias for field number 1
-
time
¶ Alias for field number 3
-
year
¶ Alias for field number 0
-
-
class
ontobio.model.association.
Evidence
(type: ontobio.model.association.Curie, has_supporting_reference: List[ontobio.model.association.Curie], with_support_from: List[ontobio.model.association.ConjunctiveSet])[source]¶
-
class
ontobio.model.association.
ExtensionUnit
(relation: ontobio.model.association.Curie, term: ontobio.model.association.Curie)[source]¶ An ExtensionUnit is a single element of the extensions field of GAF or GPAD. This consists of a relation and a term.
Create an ExtensionUnit with from_str or from_curie_str. If there is an error in parsing then Error is returned. Results from these functions should be checked for Error.
The string representation will depend on the format, and so the display method should be used. By default this will write the relation using the label with underscores (example: part_of) as defined in ontobio.rdfgen.relations.py. To write the relation as a CURIE (as in gpad 2.0), set parameter use_rel_label to True.
-
display
(use_rel_label=False)[source]¶ Turns the ExtensionUnit into a string. By default this uses the ontobio.rdfgen.relations module to lookup the relation label. To use the CURIE instead, pass use_rel_label=True.
-
-
class
ontobio.model.association.
GoAssociation
(source_line: Optional[str], subject: ontobio.model.association.Subject, relation: ontobio.model.association.Curie, object: ontobio.model.association.Term, negated: bool, qualifiers: List[ontobio.model.association.Curie], aspect: Optional[NewType.<locals>.new_type], interacting_taxon: Optional[ontobio.model.association.Curie], evidence: ontobio.model.association.Evidence, subject_extensions: List[ontobio.model.association.ExtensionUnit], object_extensions: List[ontobio.model.association.ConjunctiveSet], provided_by: NewType.<locals>.new_type, date: ontobio.model.association.Date, properties: List[Tuple[str, str]])[source]¶ The internal model used by the parsers and qc Rules engine that all annotations are parsed into.
If an annotation textual line cannot be parsed into a GoAssociation then it is not a well formed line.
This class provides several methods to convert this GoAssociation into other representations, like GAF and GPAD of each version, as well as the old style dictionary Association that this class replaced (for compatibility if needed).
Each parser has its own function or functions that converts an annotation line into a GoAssociation, and this is the first phase of parsing. In general, GoAssociations are only created by the parsers.
-
to_gaf_2_1_tsv
() → List[T][source]¶ Converts the GoAssociation into a “TSV” columnar GAF 2.1 row as a list of strings.
-
to_gaf_2_2_tsv
() → List[T][source]¶ Converts the GoAssociation into a “TSV” columnar GAF 2.2 row as a list of strings.
-
to_gpad_1_2_tsv
() → List[T][source]¶ Converts the GoAssociation into a “TSV” columnar GPAD 1.2 row as a list of strings.
-
-
class
ontobio.model.association.
Subject
(id: ontobio.model.association.Curie, label: str, fullname: List[str], synonyms: List[str], type: Union[List[str], List[ontobio.model.association.Curie]], taxon: ontobio.model.association.Curie, encoded_by: List[ontobio.model.association.Curie] = None, parents: List[ontobio.model.association.Curie] = None, contained_complex_members: List[ontobio.model.association.Curie] = None, db_xrefs: List[ontobio.model.association.Curie] = None, properties: Dict = None)[source]¶ -
contained_complex_members
= None¶ Optional, or cardinality 0+
-
db_xrefs
= None¶ Optional, or cardinality 0+
-
encoded_by
= None¶ Optional, or cardinality 0+
-
fullname
= None¶ fullname is also DB_Object_Name in the GPI spec, cardinality 0+
-
fullname_field
(max=None) → str[source]¶ Converts the fullname or DB_Object_Name into the field text string used in files
-
label
= None¶ label is also DB_Object_Symbol in the GPI spec
-
parents
= None¶ Optional, or cardinality 0+
-
properties
= None¶ Optional, or cardinality 0+
-
synonyms
= None¶ Cardinality 0+
-
taxon
= None¶ …
Type: Should be NCBITaxon
-
type
= None¶ In GPI 1.2, this was a string, corresponding to labels of the Sequence Ontology gene, protein_complex; protein; transcript; ncRNA; rRNA; tRNA; snRNA; snoRNA, any subclass of ncRNA. If the specific type is unknown, use gene_product.
When reading gpi 1.2, these labels should be mapped to the 2.0 spec, stating that the type must be a Curie in the Sequence Ontology OR Protein Ontology OR Gene Ontology
In GPI 1.2, there is only 1 value, and is required. In GPI 2.0 there is a minimum of 1, but maybe more.
If writing out to GPI 1.2/GAF just take the first value in the list.
-
-
class
ontobio.model.association.
Term
(id: ontobio.model.association.Curie, taxon: ontobio.model.association.Curie)[source]¶ Represents a Gene Ontology term
-
ontobio.model.association.
TwoTupleStr
(items: List[str]) → tuple[source]¶ Create a tuple of str that is guaranteed to be of length two from a list
If the list is larger, then only the first two elements will be used. If the list is smaller, then the empty string will be used
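The truncate-or-pad behavior described here is small enough to sketch in a few lines. `two_tuple_str_sketch` is a hypothetical re-implementation written from the description, not the library function itself.

```python
from typing import List, Tuple


def two_tuple_str_sketch(items: List[str]) -> Tuple[str, str]:
    """Keep the first two items, padding with empty strings when fewer exist."""
    head = list(items[:2])
    head += [""] * (2 - len(head))
    return (head[0], head[1])


print(two_tuple_str_sketch(["a", "b", "c"]))
print(two_tuple_str_sketch(["a"]))
```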
-
ontobio.model.association.
gp_type_label_to_curie
(type: ontobio.model.association.Curie) → str[source]¶ This is the reverse of map_gp_type_label_to_curie
-
ontobio.model.association.
map_gp_type_label_to_curie
(type_label: str) → ontobio.model.association.Curie[source]¶ Map entity types in GAF or GPI 1.2 into CURIEs in Sequence Ontology (SO), Protein Ontology (PRO), or Gene Ontology (GO).
This is a measure to upgrade the pseudo-labels into proper Curies. Present here are the existing set of labels in current use, and how they should be mapped into CURIEs.
GOlr Queries
-
class
ontobio.golr.golr_query.
GolrAssociationQuery
(subject_category=None, object_category=None, relation=None, relationship_type=None, subject_or_object_ids=None, subject_or_object_category=None, subject=None, subjects=None, object=None, objects=None, subject_direct=False, object_direct=False, subject_taxon=None, subject_taxon_direct=False, object_taxon=None, object_taxon_direct=False, invert_subject_object=None, evidence=None, exclude_automatic_assertions=False, q=None, id=None, use_compact_associations=False, include_raw=False, field_mapping=None, solr=None, config=None, url=None, select_fields=None, fetch_objects=False, fetch_subjects=False, fq=None, slim=None, json_facet=None, iterate=False, map_identifiers=None, facet_fields=None, facet_field_limits=None, facet_limit=25, facet_mincount=1, facet_pivot_fields=None, stats=False, stats_field=None, facet=True, pivot_subject_object=False, unselect_evidence=False, rows=10, start=None, homology_type=None, non_null_fields=None, user_agent=None, association_type=None, sort=None, **kwargs)[source]¶ A Query object providing a higher level of abstraction over either GO or Monarch Solr indexes
All of these can be set when creating a new object
fetch_objects : bool
we frequently want a list of distinct association objects (in the RDF sense). for example, when querying for all phenotype associations for a gene, it is convenient to get a list of distinct phenotype terms. Although this can be obtained by iterating over the list of associations, it can be expensive to obtain all associations.
Results are in the ‘objects’ field
fetch_subjects : bool
This is the analog of the fetch_objects field. Note that due to an inherent asymmetry by which the list of subjects can be very large (e.g. all genes in all species for “metabolic process” or “metabolic phenotype”) it’s necessary to combine this with subject_category and subject_taxon filters
Results are in the ‘subjects’ field
slim : List
a list of either class ids (or in future subset ids), used to map up (slim) objects in associations. This will populate an additional ‘slim’ field in each association object corresponding to the slimmed-up value(s) from the direct objects. If fetch_objects is passed, this will be populated with slimmed IDs.
evidence : String
Evidence class from ECO. Inference is used.
exclude_automatic_assertions : bool
If true, then any annotations with ECO evidence code for IEA or subclasses will be excluded.
use_compact_associations : bool
If true, then the associations list will be false, instead compact_associations contains a more compact representation consisting of objects with (subject, relation and objects)
config : Config
See Config for details. The config object can be used to set values for the solr instance to be queried
TODO - Extract params into their own object
Fetch a set of association objects based on a query.
-
exec
(**kwargs)[source]¶ Execute solr query
Result object is a dict with the following keys:
- raw
- associations : list
- compact_associations : list
- facet_counts
- facet_pivot
-
solr_params
()[source]¶ Generate HTTP parameters for passing to Solr.
In general you should not need to call this directly, calling exec() on a query object will transparently perform this step for you.
-
translate_doc
(d, field_mapping=None, map_identifiers=None, **kwargs)[source]¶ Translate a solr document (i.e. a single result row)
-
translate_docs_compact
(ds, field_mapping=None, slim=None, map_identifiers=None, invert_subject_object=False, **kwargs)[source]¶ Translate golr association documents to a compact representation
-
-
class
ontobio.golr.golr_query.
GolrSearchQuery
(term=None, category=None, is_go=False, url=None, solr=None, config=None, fq=None, fq_string=None, hl=True, facet_fields=None, facet=True, search_fields=None, taxon_map=True, rows=100, start=None, prefix=None, boost_fx=None, boost_q=None, highlight_class=None, taxon=None, min_match=None, minimal_tokenizer=False, include_eqs=False, exclude_groups=False, user_agent=None)[source]¶ Controller for monarch and go solr search cores Queries over a search document
Lexmap
-
class
ontobio.lexmap.
LexicalMapEngine
(wsmap={'': '', 'a': '', 'i': '1', 'ii': '2', 'iii': '3', 'iv': '4', 'ix': '9', 'of': '', 'the': '', 'v': '5', 'vi': '6', 'vii': '7', 'viii': '8', 'x': '10', 'xi': '11', 'xii': '12', 'xiii': '13', 'xiv': '14', 'xix': '19', 'xv': '15', 'xvi': '16', 'xvii': '17', 'xviii': '18', 'xx': '20'}, config=None)[source]¶ generates lexical matches between pairs of ontology classes
Parameters: - wsmap (dict) – maps words to normalized synonyms.
- config (dict) – A configuration conforming to LexicalMapConfigSchema
-
cliques
(xg)[source]¶ Return all equivalence set cliques, assuming each edge in the xref graph is treated as equivalent, and all edges in ontology are subClassOf
Parameters: xg (Graph) – an xref graph
Return type: list of sets
-
get_xref_graph
()[source]¶ Generate mappings based on lexical properties and return as nx graph.
- A dictionary is stored between ref:Synonym values and synonyms. See ref:index_synonym. Note that Synonyms include the primary label
- Each key in the dictionary is examined to determine if there exist two Synonyms from different ontology classes
This avoids N^2 pairwise comparisons: instead the time taken is linear
After initial mapping is made, additional scoring is performed on each mapping
The return object is a nx graph, connecting pairs of ontology classes.
Edges are annotated with metadata about how the match was found:
- syns: pair
- pair of Synonym objects, corresponding to the synonyms for the two nodes
- score: int
- score indicating strength of mapping, between 0 and 100
Returns: nx graph (bidirectional) Return type: Graph
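The indexing idea can be sketched without networkx: normalize each synonym, bucket classes by the normalized string, and only compare classes that share a bucket. Everything here is an assumption made for illustration, including whitespace tokenization, the reduced `WSMAP` (a subset of the default shown in the constructor signature), and the toy ontologies; scoring is left out.

```python
from collections import defaultdict
from itertools import combinations

# Subset of the default wsmap from the constructor signature above.
WSMAP = {"a": "", "of": "", "the": "", "i": "1", "ii": "2", "iii": "3"}


def normalize(label, wsmap=WSMAP):
    """Lowercase, map each word through wsmap, and drop words mapped to ''."""
    words = [wsmap.get(word, word) for word in label.lower().split()]
    return " ".join(word for word in words if word)


def lexical_matches_sketch(ontologies):
    """Return sorted pairs of class IDs that share a normalized synonym."""
    index = defaultdict(set)
    for labels in ontologies.values():
        for class_id, synonyms in labels.items():
            for synonym in synonyms:
                index[normalize(synonym)].add(class_id)
    pairs = set()
    for class_ids in index.values():
        # Only classes within the same bucket are ever compared.
        for left, right in combinations(sorted(class_ids), 2):
            pairs.add((left, right))
    return sorted(pairs)


# Hypothetical ontologies: {ontology name: {class id: [labels and synonyms]}}
ONTS = {
    "X": {"X:1": ["Type II cell"], "X:2": ["unrelated thing"]},
    "Y": {"Y:9": ["type 2 cell", "the type 2 cell"]},
}
print(lexical_matches_sketch(ONTS))
```

Building the index takes one pass over all synonyms; pairwise work is confined to each bucket, which stays small when few classes share a normalized string.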
-
index_ontology
(ont)[source]¶ Adds an ontology to the index
This iterates through all labels and synonyms in the ontology, creating an index
-
index_synonym
(syn, ont)[source]¶ Index a synonym
Typically not called from outside this object; called by index_ontology