Species Naming Conventions
This page defines the standardized naming conventions for species in SPARCED, designed for compatibility with PEtab, SBML, and PCRE2.
General Principles
Uniqueness: Each species name must be unique across the model.
Readability: Names should be human-readable and concise, while avoiding ambiguity.
Compatibility: Naming conventions comply with PEtab, SBML, and PCRE2 standards.
Minimal Redundancy: Compartments are specified only if the species resides outside its "home" compartment (e.g., TGFb is extracellular by default).
Naming Structure
The general structure for species names is as follows:
[Modifiers][Residue&Position]_[BaseSpecies][__[AdditionalSpecies]][_Compartment]
Where:
Modifiers: Prefixes indicating the modification state (e.g., p for phosphorylation, u for ubiquitination).
Residue&Position: For species with specific post-translational modification (PTM) sites, denote the residue and its position (e.g., S15 for serine 15).
BaseSpecies: The core species name (e.g., CyclinD, Cdk4).
AdditionalSpecies: For complexes, additional species are concatenated using __ (double underscores).
Compartment: The compartment is specified only when the species is outside its default "home" compartment.
Examples
Single Species with PTM:
pS15_TGFb_cytoplasm: TGFb phosphorylated at serine 15, located in the cytoplasm.
Two-Component Complex:
pS15_CyclinD__pT298_Cdk4_nucleus: CyclinD phosphorylated at serine 15 and Cdk4 phosphorylated at threonine 298, in the nucleus.
Multi-Component Complex:
pS15_CyclinD__pT298_Cdk4__pY104_EGFR_membrane: CyclinD, Cdk4, and EGFR phosphorylated at specific residues, forming a complex at the membrane.
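Reading the examples above in the other direction, a name can be split back into its parts. The following is a minimal sketch, not part of SPARCED: `parse_species_name`, `Component`, and the `COMPARTMENTS` set are hypothetical names, `membrane` is included only because it appears in the example above, and base species names are assumed to contain no underscores.

```python
import re
from typing import NamedTuple, Optional

# Assumption: compartment names seen on this page ("membrane" appears only in an example).
COMPARTMENTS = {"extracellular", "cytoplasm", "nucleus", "endosome", "mitochondria", "membrane"}

# Optional "<modifier><Residue><position>_" prefix followed by a base name.
# Assumption: base species names contain no underscores.
COMPONENT_RE = re.compile(
    r"^(?:(?P<modifier>[a-z])(?:(?P<residue>[A-Z])(?P<position>\d+))?_)?"
    r"(?P<base>[A-Za-z0-9]+)$"
)


class Component(NamedTuple):
    modifier: Optional[str]
    residue: Optional[str]
    position: Optional[int]
    base: str


def parse_species_name(name):
    """Split a species name into (list of Component, compartment or None)."""
    compartment = None
    head, sep, tail = name.rpartition("_")
    # The compartment suffix is introduced by a single underscore, never "__".
    if sep and tail in COMPARTMENTS and not head.endswith("_"):
        name, compartment = head, tail
    components = []
    for part in name.split("__"):
        match = COMPONENT_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse component {part!r}")
        position = match["position"]
        components.append(
            Component(
                match["modifier"],
                match["residue"],
                int(position) if position else None,
                match["base"],
            )
        )
    return components, compartment
```

For `pS15_CyclinD__pT298_Cdk4_nucleus`, this sketch yields two components (CyclinD with `p`, `S`, `15` and Cdk4 with `p`, `T`, `298`) and the compartment `nucleus`.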
Prefixes
The following table summarizes standard prefixes for species modifiers:
| Prefix | Description | Example |
|---|---|---|
| p | Phosphorylation | pS15_TGFb |
| u | Ubiquitination | uK48_TGFbR1 |
| m | Methylation | mK9_HistoneH3 |
| a | Acetylation | aK27_HistoneH3 |
| g | Glycosylation | gN100_EGFR |
| c | Cysteinylation | cC100_EGFR |
| i | Inactivated state | i_Cdk4 |
| m | mRNA species of gene names | m_CCND1 |
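The prefix table can be turned into a small lookup. The sketch below is illustrative only: `describe_prefix` and both dictionaries are hypothetical names, and it assumes, based on the examples `mK9_HistoneH3` and `m_CCND1`, that `m` followed by a residue means methylation while a bare `m` means mRNA.

```python
import re

# PTM prefixes from the table; each is followed by a residue letter and a position.
PTM_PREFIXES = {
    "p": "Phosphorylation",
    "u": "Ubiquitination",
    "m": "Methylation",
    "a": "Acetylation",
    "g": "Glycosylation",
    "c": "Cysteinylation",
}

# Prefixes that the table's examples show without a residue (i_Cdk4, m_CCND1).
# Assumption: a bare "m" is read as mRNA, not methylation.
STATE_PREFIXES = {"i": "Inactivated state", "m": "mRNA"}


def describe_prefix(token):
    """Describe a prefix token such as 'pS15', 'i', or 'm'."""
    match = re.fullmatch(r"([a-z])(?:([A-Z])(\d+))?", token)
    if match is None:
        raise ValueError(f"not a prefix token: {token!r}")
    prefix, residue, position = match.groups()
    table = STATE_PREFIXES if residue is None else PTM_PREFIXES
    if prefix not in table:
        raise ValueError(f"unknown prefix: {prefix!r}")
    if residue is None:
        return table[prefix]
    return f"{table[prefix]} at {residue}{position}"


print(describe_prefix("uK48"))  # Ubiquitination at K48
print(describe_prefix("m"))     # mRNA
```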
Suffixes
The following table summarizes compartment suffixes and their use cases:
| Suffix | Description |
|---|---|
| _extracellular | Species outside the cell. |
| _cytoplasm | Species in the cytoplasm. |
| _nucleus | Species in the nucleus. |
| _endosome | Species bound to a membrane. |
| _mitochondria | Species in the mitochondria. |
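Because a suffix is added only when a species sits outside its home compartment, the decision can be expressed as a simple lookup. This is a sketch under stated assumptions: `with_compartment` and `HOME_COMPARTMENT` are hypothetical names, only the TGFb entry is taken from this page, and the `default_home` fallback of `cytoplasm` is an illustrative placeholder, not a SPARCED rule.

```python
# Hypothetical home-compartment table; only the TGFb entry comes from this page.
HOME_COMPARTMENT = {"TGFb": "extracellular"}


def with_compartment(name, base, compartment, home=HOME_COMPARTMENT, default_home="cytoplasm"):
    """Append "_<compartment>" only when it differs from the species' home compartment.

    default_home is an assumption used for species missing from the table.
    """
    if compartment == home.get(base, default_home):
        return name
    return f"{name}_{compartment}"


print(with_compartment("pS15_TGFb", "TGFb", "extracellular"))  # pS15_TGFb
print(with_compartment("pS15_TGFb", "TGFb", "cytoplasm"))      # pS15_TGFb_cytoplasm
```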
Guidelines for Complexes
For complexes, each species is listed in order, separated by double underscores (__). If a component species has PTMs, these are specified as part of its name. Compartments are only appended to the entire complex name, not individual components.
Examples:
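The complex guideline can be sketched as a small helper that joins already-named components and appends the compartment once, to the whole complex. `build_complex_name` is a hypothetical name used for illustration; it is not part of SPARCED.

```python
def build_complex_name(components, compartment=None):
    """Join component names with "__" and append a single compartment suffix.

    components: names such as "pS15_CyclinD", each without a compartment suffix.
    """
    if not components:
        raise ValueError("a complex needs at least one component")
    for component in components:
        # "__" is reserved as the separator between components.
        if "__" in component:
            raise ValueError(f"component {component!r} must not contain '__'")
    name = "__".join(components)
    if compartment:
        name = f"{name}_{compartment}"
    return name


print(build_complex_name(["pS15_CyclinD", "pT298_Cdk4"], "nucleus"))
# pS15_CyclinD__pT298_Cdk4_nucleus
```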
Potential Pitfalls
Ambiguity in Residue Position:
Always specify residues and positions for clarity in PTM names.
Example: Avoid p_CyclinD if pS15_CyclinD is more precise.
Complex Names Becoming Too Long:
Avoid redundancy in component names.
Use short, standard prefixes for PTMs. Look at synonyms on UniProt for inspiration.
Tool Compatibility:
Avoid special characters like parentheses () or symbols like & that might break tools.
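Two of these pitfalls lend themselves to an automated check. The sketch below uses a hypothetical helper, `lint_species_name`, that flags PTM prefixes written without a residue and position, and any character outside letters, digits, and underscores. It assumes that `i_` and `m_` are exempt from the residue check because the prefix table shows `i_Cdk4` and `m_CCND1` without residues.

```python
import re

# A PTM prefix directly followed by "_" at the start of a component has no residue.
# Assumption: i_ and m_ are deliberately excluded (see i_Cdk4 and m_CCND1).
RESIDUE_LESS_PTM = re.compile(r"(?:^|__)([puagc])_")
FORBIDDEN = re.compile(r"[^A-Za-z0-9_]")


def lint_species_name(name):
    """Return a list of warning strings; an empty list means no pitfall was found."""
    warnings = []
    for match in RESIDUE_LESS_PTM.finditer(name):
        warnings.append(f"PTM prefix '{match.group(1)}' has no residue and position")
    bad = sorted(set(FORBIDDEN.findall(name)))
    if bad:
        warnings.append("unsupported characters: " + "".join(bad))
    return warnings


print(lint_species_name("p_CyclinD"))     # one warning about the missing residue
print(lint_species_name("pS15_CyclinD"))  # []
```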
Regex Validation
The following regular expression ensures compatibility with PEtab, SBML, and PCRE2:
^[a-zA-Z0-9_]+(__[a-zA-Z0-9_]+)*(_[a-zA-Z0-9_]+)?$
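This regex accepts:
- Alphanumeric names with underscores.
- Double underscores (__) for complex species.
- Optional compartment suffixes.
Because the character class [a-zA-Z0-9_] itself contains the underscore, the pattern is equivalent to ^[a-zA-Z0-9_]+$. It restricts the character set but does not verify the prefix, complex, or suffix structure. It also accepts names that begin with a digit, which SBML identifiers (SId) do not allow.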
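As a quick illustration, the pattern can be applied with Python's standard `re` module. The helper name `is_valid_species_name` is hypothetical; the pattern is copied unchanged from this page. The last call shows that the pattern checks the character set only, since a string of underscores also passes.

```python
import re

# Pattern copied verbatim from this page.
SPECIES_RE = re.compile(r"^[a-zA-Z0-9_]+(__[a-zA-Z0-9_]+)*(_[a-zA-Z0-9_]+)?$")


def is_valid_species_name(name):
    """Return True if the whole name matches the page's pattern."""
    return SPECIES_RE.fullmatch(name) is not None


print(is_valid_species_name("pS15_CyclinD__pT298_Cdk4_nucleus"))  # True
print(is_valid_species_name("pS15_CyclinD(nucleus)"))             # False
print(is_valid_species_name("EGFR&Grb2"))                         # False
print(is_valid_species_name("___"))                               # True
```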
Automating Naming
To minimize errors and maintain consistency, consider automating the naming process with a script. Below is an example Python snippet for generating valid species names:
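def generate_species_name(base, ptms=None, compartment=None):
    """Build a single-species name from a base name, PTMs, and a compartment.

    ptms is a list of (modifier + residue, position) pairs, e.g. [("pS", 15)].
    """
    name = base
    if ptms:
        # Note: multiple PTMs are joined with "__", the same separator
        # that the convention uses between the components of a complex.
        ptm_str = "__".join(f"{m}{r}" for m, r in ptms)
        name = f"{ptm_str}_{name}"
    if compartment:
        # Pass a compartment only when the species is outside its home compartment.
        name = f"{name}_{compartment}"
    return name


# Example usage
print(generate_species_name("CyclinD", ptms=[("pS", 15)], compartment="nucleus"))
# Output: pS15_CyclinD_nucleus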