SPARCED input file format specification¶
This page describes the SPARCED input file format, a standardized, machine-readable format for encoding biological models. SPARCED is designed to be compatible with PEtab, SBML, and PCRE2, and is intended to facilitate the exchange of biological models between different software tools.
The SPARCED file format does not try to re-invent an already well-defined and established format (i.e. SBML). It simply provides systems biologists with a standardized format for merging various disparate pathway models into a single kinetic model. Constructing a SPARCED model relies on such established tools: the tabular format specified here is converted to Antimony, then to SBML, and finally to AMICI so that the model can be simulated.
General Principles¶
Model-Merging: SPARCED files are intended to be easily merged with other SPARCED files, allowing users to combine models from different sources.
Compatibility: SPARCED is designed to be converted from the tabular format specified here (inspired by PEtab) to Antimony, SBML, and AMICI models.
Human-Readable: SPARCED files are designed to be human-readable, with clear and concise syntax that is easy to understand.
Machine-Readable: SPARCED files are also machine-readable, with a well-defined structure that can be parsed by software tools.
File Organization¶
A SPARCED model is organized into 7 tabular files, each containing a specific type of information needed to build a systems biology model:
Species: Defines the species in the model, including their names, initial concentrations, compartment locations, and UniProt annotations.
RateLaws: Defines the rate laws for the model, including the reaction names, home compartments, equations, kinetic parameters, and references.
Compartments: Defines the compartments in the model, including their names, sizes, and GO annotations.
Observables: Defines the model observables, i.e. the compartment-corrected summations of all formats of a protein.
OmicsData (optional): Defines the omics data used to constrain the model, including the data type, values, and references.
GeneReg (optional): Defines transcriptional activation and inhibition relationships between genes and transcription factors.
Initializer (optional): Defines information for model initialization.
Remarks - all model entities, column names, and row names are case-sensitive.
Species table¶
The Species table defines the species in the deterministic module, including their names, initial concentrations, compartment locations, and UniProt annotations. Each row corresponds to one species (protein, protein complex, post-transcriptionally modified species). Transcripts (in nM) are also included in this file because they are treated as species whose concentrations are updated in the stochastic module every 30 s and are used in translation rate laws.
| speciesId | compartment | initialConcentration | UniProt |
|---|---|---|---|
| STRING | STRING | FLOAT | STRING |
| e.g. | | | |
| Cd__Cdk4__p27 | cytoplasm | 0.787522147 | P24385, P30279, P30281, P11802, Q00534, P46527 |
Detailed field descriptions¶
- speciesId [STRING, NOT NULL]: The unique identifier for the species, following the SPARCED species naming convention.
  - Strict adherence to the naming convention is required for compatibility with PEtab, SBML, and PCRE2.
  - See Species-Nomenclature <https://sparced.readthedocs.io/en/latest/tutorials/Building-SPARCED-Input-Files/Species-Nomenclature.html> for more information.
- compartment [STRING, NOT NULL]: The compartment in which the species resides.
  - Must be one of the compartments defined in the Compartments table.
  - Must be consistent with the compartment specified in the species name.
- initialConcentration [FLOAT, NOT NULL]: The initial concentration of the species.
  - Must be a non-negative number.
- UniProt [STRING, OPTIONAL]: The UniProt identifiers for the species.
  - Multiple UniProt identifiers should be separated by commas.
  - Must be consistent with the species name.
  - Preferably, in the same order as the species name.
  - For species identifiers representing multiple proteins (e.g. Cd here represents CyclinD1, CyclinD2, and CyclinD3), the UniProt identifiers should be listed in alphanumeric order.
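The constraints above can be checked mechanically before a model is built. The following is a minimal sketch, not part of the SPARCED tooling: `validate_species` and `split_uniprot` are hypothetical helper names, and the rows are assumed to have already been read into dictionaries keyed by the column names of the Species table.

```python
from typing import Dict, List, Set


def split_uniprot(field: str) -> List[str]:
    """Split a comma-separated UniProt field, ignoring stray whitespace."""
    return [part.strip() for part in field.split(",") if part.strip()]


def validate_species(rows: List[Dict[str, str]], compartments: Set[str]) -> List[str]:
    """Return a list of problems found in Species rows (empty if none).

    Hypothetical helper: checks only the rules stated in the field descriptions.
    """
    problems: List[str] = []
    seen: Set[str] = set()
    for i, row in enumerate(rows, start=1):
        species_id = row.get("speciesId", "").strip()
        if not species_id:
            problems.append(f"row {i}: speciesId is empty")
        elif species_id in seen:
            problems.append(f"row {i}: duplicate speciesId {species_id!r}")
        seen.add(species_id)

        # Names are case-sensitive, so 'Cytoplasm' does not match 'cytoplasm'.
        compartment = row.get("compartment", "").strip()
        if compartment not in compartments:
            problems.append(f"row {i}: unknown compartment {compartment!r}")

        try:
            if float(row.get("initialConcentration", "")) < 0:
                problems.append(f"row {i}: initialConcentration is negative")
        except ValueError:
            problems.append(f"row {i}: initialConcentration is not a number")
    return problems
```

Calling `validate_species(rows, {"cytoplasm"})` on the example row from the table above returns an empty list, while a row with compartment `Cytoplasm` is reported because identifiers are case-sensitive.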
Ratelaws Table¶
The Ratelaws table defines the reactions in the deterministic module, including reaction names, home compartments, rate laws, and parameters. Each row corresponds to a single reaction, and the order of rows must align with the columns in the StoichiometricMatrix input file. Reactions can follow either a simple mass-action law or a complex rate law formula with parameters defined explicitly.
| Column | Description | Example |
|---|---|---|
| reactionId | STRING | vC23 |
| compartment | STRING | Nucleus |
| rateLaw | STRING or FLOAT | kC23_1*(Cd__Cdk4/(kC23_2+Cd__Cdk4)) |
| parameter_1 | FLOAT (OPTIONAL) | 0.09444444 |
| parameter_2 | FLOAT (OPTIONAL) | 10 |
Detailed Field Descriptions¶
- `reactionId` [STRING, NOT NULL]: The unique identifier for the reaction.
  - Must be unique in the file and typically follows a naming convention indicating the sub-module (e.g., vA1 for Apoptosis, vC1 for Cell Cycle).
- `compartment` [STRING, NOT NULL]: The home compartment where the reaction occurs.
  - Must match a defined compartment in the Compartments table.
  - Defines the effective search volume for reactants and products, and volumetric corrections may apply for species in different compartments.
- `rateLaw` [STRING or FLOAT, NOT NULL]: Specifies the rate law for the reaction.
  - If a FLOAT is provided, the reaction follows a mass-action law, and the value is the rate constant (units: nM/s).
  - If a STRING is provided, the reaction follows the specified formula, which must include species names and parameter names consistent with the model.
- `parameter_n` [FLOAT, OPTIONAL]: Parameters used in the rate law formula.
  - Parameter names must start with k and be unique within the formula (e.g., k1, k2).
  - Values are provided in appropriate units (e.g., nM or seconds).
  - Parameters are automatically renamed during model generation for consistency.
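As an illustration of the two `rateLaw` cases, the sketch below distinguishes a FLOAT entry from a formula and pairs formula parameters with the `parameter_n` values. The function names are hypothetical, and the pairing rule (parameters are matched to `parameter_1`, `parameter_2`, ... in order of first appearance in the formula) is an assumption made for this example, not a documented SPARCED behavior.

```python
import re
from typing import Dict, List, Set, Tuple, Union

# Identifiers such as kC23_1 or Cd__Cdk4.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_rate_law(rate_law: str) -> Tuple[str, Union[float, str]]:
    """Return ('mass_action', k) for a FLOAT entry, else ('formula', text)."""
    try:
        return ("mass_action", float(rate_law))
    except ValueError:
        return ("formula", rate_law.strip())


def formula_parameters(formula: str, species: Set[str]) -> List[str]:
    """List parameter names (starting with 'k', not a species) in order of appearance."""
    names: List[str] = []
    for name in IDENTIFIER.findall(formula):
        if name.startswith("k") and name not in species and name not in names:
            names.append(name)
    return names


def bind_parameters(formula: str, species: Set[str], values: List[float]) -> Dict[str, float]:
    """Pair each formula parameter with the matching parameter_n value (assumed ordering)."""
    names = formula_parameters(formula, species)
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} parameter values, got {len(values)}")
    return dict(zip(names, values))
```

For the example row above, `formula_parameters` finds `kC23_1` and `kC23_2`, and `bind_parameters` maps them to the values given in `parameter_1` and `parameter_2`.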
Notes for Users¶
- The compartments for reactions must be defined in the Compartments table and are used to rescale concentrations when reactants and products reside in different volumes.
- Parameters in rate law formulas are extracted and renamed in the ParamsAll output file for reference.
- Ensure the number and order of rows in this file match the columns in the StoichiometricMatrix.
Compartments Table¶
The Compartments table specifies the cellular compartments in the model, including their names, volumes, and corresponding Gene Ontology (GO) terms. These compartments define the spatial context for species and reactions, ensuring consistency across the model’s input files.
| Column | Description | Example |
|---|---|---|
| compartmentId | STRING | cytoplasm |
| volume | FLOAT (LITERS) | 2.1e-12 |
| goTerm | STRING | GO:0005737 |
Detailed Field Descriptions¶
- `compartmentId` [STRING, NOT NULL]: The unique identifier for the compartment.
  - Must match the compartment names listed in the Species and Ratelaws input files.
- `volume` [FLOAT, NOT NULL]: The volume of the compartment in liters.
  - Must be a non-negative value.
  - Defines the physical size of the compartment for scaling concentrations and reactions.
- `goTerm` [STRING, OPTIONAL]: The Gene Ontology (GO) term associated with the compartment.
  - Provides a standardized identifier for the compartment’s biological context.
  - Example: GO:0005737 for cytoplasm.
Notes for Users¶
- The compartment names must be consistent across all input files, including Species and Ratelaws.
- Volumes are used to calculate concentration-based scaling factors when species and reactions involve multiple compartments.
- GO terms are optional but recommended for better integration with external databases and annotation tools.
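One way to picture a concentration-based scaling factor is through conservation of molecule number: if the same amount of a species is expressed relative to a different volume, its concentration changes by the ratio of the two volumes. The sketch below assumes exactly that relation; it is not taken from the SPARCED source, the function names are hypothetical, and the nucleus volume is a made-up value used only for illustration.

```python
from typing import Dict


def volume_scale(volumes: Dict[str, float], from_compartment: str, to_compartment: str) -> float:
    """Factor that converts a concentration in one compartment to another.

    Assumes conservation of amount: c_from * V_from == c_to * V_to.
    """
    v_from = volumes[from_compartment]
    v_to = volumes[to_compartment]
    if v_from <= 0 or v_to <= 0:
        raise ValueError("compartment volumes must be positive to compute a ratio")
    return v_from / v_to


def rescale_concentration(conc_nM: float, volumes: Dict[str, float],
                          from_compartment: str, to_compartment: str) -> float:
    """Express a concentration (nM) relative to another compartment's volume."""
    return conc_nM * volume_scale(volumes, from_compartment, to_compartment)


# Cytoplasm volume is the example from the table; the nucleus volume is hypothetical.
example_volumes = {"cytoplasm": 2.1e-12, "nucleus": 7.0e-13}
```

With these example volumes, a species at 1 nM in the cytoplasm corresponds to roughly 3 nM when referred to the smaller nucleus volume.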
Observables table¶
The Observables table defines the mapping of model species to the measurable quantities (observables) used for simulations and analysis. Each observable corresponds to the compartmental-volume-corrected summation of all formats of a protein. Entries in this table indicate whether a specific species contributes to a given observable.
| Column | Description | Example |
|---|---|---|
| observableId | STRING: The unique identifier for each observable. | pEGFR |
| speciesId | STRING: The unique identifier for each species, following SPARCED species naming conventions. | EGFR_Y1068 |
| compartment | STRING: The compartment associated with the species. Must align with the compartments listed in the Compartments table. | cytoplasm |
| inObservable | INTEGER: Binary indicator (1 if the species contributes to the observable, 0 otherwise). | 1 |
Detailed field descriptions¶
- observableId [STRING, NOT NULL]: The unique identifier for each observable, which typically corresponds to a measurable feature of interest, such as the total phosphorylation of a receptor.
  - Observables are defined based on experimental data and model requirements.
  - Ensure unique naming conventions to prevent conflicts during model generation.
- speciesId [STRING, NOT NULL]: The unique identifier for the species that may contribute to the observable.
  - Matches species defined in the Species table.
- compartment [STRING, OPTIONAL]: The compartment in which the species resides.
  - Should match the compartment associated with the species in the Species and Compartments tables.
- inObservable [INTEGER, NOT NULL]: Indicates whether the species contributes to the observable.
  - Must be either 1 (species contributes) or 0 (species does not contribute).
  - This binary mapping allows summation of species concentrations to compute observable values.
Usage¶
The Observables table is used as input to the AMICI model compiler during the simulation process. It defines how species concentrations are aggregated into observables, enabling the calculation of measurable outputs for model validation and comparison with experimental data.
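The aggregation can be sketched as a weighted sum over the rows flagged with `inObservable = 1`. The example below is illustrative only: `observable_value` is a hypothetical function, every row is assumed to carry a `compartment` entry, and the volume correction is assumed to be the ratio of the species' compartment volume to a chosen reference compartment, which the specification does not spell out.

```python
from typing import Dict, List


def observable_value(rows: List[Dict[str, str]], observable_id: str,
                     concentrations: Dict[str, float], volumes: Dict[str, float],
                     reference_compartment: str) -> float:
    """Sum volume-corrected concentrations of all species contributing to one observable."""
    total = 0.0
    for row in rows:
        # Skip rows for other observables and rows flagged as non-contributing.
        if row["observableId"] != observable_id or int(row["inObservable"]) != 1:
            continue
        # Assumed correction: scale by V_species / V_reference.
        scale = volumes[row["compartment"]] / volumes[reference_compartment]
        total += concentrations[row["speciesId"]] * scale
    return total
```

A species in the reference compartment contributes its concentration unchanged, while a species in a compartment with half the reference volume contributes half its concentration under this assumption.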
OmicsData Table¶
The OmicsData table contains information about gene-level parameters, mRNA levels, and protein levels used in the model. It serves as a central repository for integrating gene copy numbers, mRNA molecule counts, and protein abundance with rate constants for transcription, translation, and degradation. Each row corresponds to one gene, identified by its HGNC name, and includes various biological and kinetic parameters essential for the deterministic and stochastic modules.
Detailed field descriptions¶
- `geneId` [STRING, NOT NULL]: The HGNC identifier of the gene.
  - Must be unique in the file.
- `geneCopyNumber` [INTEGER, NOT NULL]: The number of copies of the gene present in the cell.
  - Represents genomic-level data.
- `mRNA_copyNumber` [FLOAT, NOT NULL]: The number of mRNA molecules per cell (mpc).
  - Represents transcript-level abundance.
- `rateConstant_inactivation` [FLOAT (s⁻¹), NOT NULL]: The rate constant for gene inactivation.
- `rateConstant_activation` [FLOAT (s⁻¹), NOT NULL]: The rate constant for gene activation.
- `constitutiveTranscription` [FLOAT (molecules/s), OPTIONAL]: Baseline transcription rate for the gene.
- `maximalTranscription` [FLOAT (molecules/s), OPTIONAL]: Maximal transcription rate for the gene under activation conditions.
- `mRNA_degradation` [FLOAT (s⁻¹), NOT NULL]: The degradation rate constant of the mRNA.
- `protein_copyNumber` [INTEGER, OPTIONAL]: The number of protein molecules per cell (mpc).
  - Represents proteomic-level abundance.
- `protein_halfLife` [FLOAT (s), OPTIONAL]: The half-life of the protein in seconds.
- `translationRate` [FLOAT (s⁻¹), OPTIONAL]: The rate constant for mRNA translation.
Notes for Users¶
All rate constants are based on data from the Bouhaddou2018 model and literature.
Users can add new genes (rows) using RNA-seq data for mRNA estimation.
For missing rate constants, median values from the existing dataset can provide a reasonable starting point.
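Because this table uses molecules per cell (mpc) while the Species table uses nM, it can help to see how the two units relate. The sketch below uses the standard relations between molecule count, volume, and molar concentration, and between half-life and a first-order rate constant; the function names are hypothetical and this is not a description of how SPARCED performs these conversions internally.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def mpc_to_nM(molecules_per_cell: float, volume_liters: float) -> float:
    """Convert a molecule count in a compartment of given volume to nM."""
    moles = molecules_per_cell / AVOGADRO
    return moles / volume_liters * 1e9


def nM_to_mpc(conc_nM: float, volume_liters: float) -> float:
    """Convert a concentration in nM to molecules in a compartment of given volume."""
    return conc_nM * 1e-9 * volume_liters * AVOGADRO


def half_life_to_rate(half_life_s: float) -> float:
    """First-order degradation rate constant (1/s) from a half-life in seconds."""
    if half_life_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_s
```

For example, `nM_to_mpc(1.0, 2.1e-12)` gives the number of molecules that corresponds to 1 nM in the example cytoplasm volume from the Compartments table, and `half_life_to_rate` turns a `protein_halfLife` entry into a rate constant in s⁻¹.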
GeneReg table¶
The GeneReg table defines transcriptional activation and inhibition interactions in the SPARCED model. Each row corresponds to a gene, and each column corresponds to a species that acts as an activator or repressor of transcriptional activity.
Detailed field descriptions¶
- geneId [STRING, NOT NULL]: The name of the gene being regulated, written in HGNC format.
  - Gene names must match the names in the Species table for consistency.
- speciesId_* [STRING, OPTIONAL]: The species acting as regulators of the gene.
  - Defined in columns corresponding to regulatory species.
  - Regulatory species must exist in the Species table and can act as activators or repressors.
- regulation [STRING, OPTIONAL]: Describes the regulatory effect of a species on gene transcription.
  - Format: A; B, where:
    - A is the Hill coefficient, with positive values for activation and negative values for repression.
    - B is the half-maximal concentration of the regulatory effect.
  - If no regulation exists, this field is set to 0.
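The `A; B` entries can be parsed as in the sketch below. `parse_regulation` and `hill_term` are hypothetical names. The Hill expressions used here are the common textbook forms for activation and repression and are shown only to illustrate how the sign of A could be interpreted; the specification does not state the exact expression SPARCED evaluates.

```python
from typing import Optional, Tuple


def parse_regulation(entry: str) -> Optional[Tuple[float, float]]:
    """Parse an 'A; B' cell into (hill_coefficient, half_maximal_conc).

    Returns None for an empty cell or a cell set to 0 (no regulation).
    """
    text = str(entry).strip()
    if not text:
        return None
    if ";" not in text:
        if float(text) == 0:
            return None
        raise ValueError(f"expected 'A; B' or 0, got {entry!r}")
    hill_text, half_max_text = text.split(";")
    return float(hill_text), float(half_max_text)


def hill_term(conc: float, hill: float, half_max: float) -> float:
    """Textbook Hill term: activating if hill > 0, repressing if hill < 0."""
    n = abs(hill)
    x = conc ** n
    k = half_max ** n
    return x / (k + x) if hill > 0 else k / (k + x)
```

With this reading, an entry of `2; 50` describes an activator whose effect is half-maximal at a concentration of 50, and `-2; 50` describes a repressor with the same half-maximal concentration.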
Usage¶
The GeneReg table is utilized by the stochastic module of the SPARCED model to update mRNA levels during simulations. Non-zero entries define the quantitative parameters of transcriptional regulation, which determine how species influence gene expression.
Guidelines for Extension¶
To include additional transcriptional regulators in the SPARCED model:
1. Add new columns for each additional regulatory species.
2. Populate the columns with the appropriate rate constants in the A; B format.
3. Ensure consistency with the Species and Genes tables for naming and structure.
Initializer file (Optional)¶
The Initializer file provides optional information for model initialization. It defines species concentrations, mRNA level adjustments, parameter values, observable exclusions, and parameter scan ranges. This file is used to establish a starting point for deterministic simulations, such as serum-starved MCF10A cells under specific experimental conditions.
| Column(s) | Description | Example |
|---|---|---|
| speciesId, initialConcentration | STRING, FLOAT: Specifies species and their starting concentrations. This information initializes species concentrations. | EGFR, 0.5 |
| mRNAId, initialLevel | STRING, FLOAT: Defines adjustments to mRNA levels for specific genes. | CDKN1A, 2.0 |
| parameterId, initialValue, units | STRING, FLOAT, STRING: Specifies parameter names, their initial values, and associated units. Used for initializing specific model parameters. | k_deg, 0.01, 1/s |
| observableId_excluded | STRING: Defines observables excluded from translation rate adjustments. Ensures these observables are not modified during initialization. | Cyclin_D1 |
| parameterId, minValue, maxValue | STRING, FLOAT, FLOAT: Describes the parameter scan range for a single parameter. Used for sensitivity analysis or optimization during initialization. | k_on, 0.1, 10.0 |
Detailed field descriptions¶
Species Concentrations
- speciesId [STRING, OPTIONAL]: Name of the species being initialized.
  - Must match species names in the Species table.
- initialConcentration [FLOAT, OPTIONAL]: Initial concentration of the species (in model-defined units).

mRNA Level Adjustments
- mRNAId [STRING, OPTIONAL]: Name of the mRNA species being adjusted.
- initialLevel [FLOAT, OPTIONAL]: Initial mRNA level for the species.

Parameter Values
- parameterId [STRING, OPTIONAL]: Name of the parameter being initialized.
  - Parameter names should match those defined in the ParamsAll or Ratelaws file.
- initialValue [FLOAT, OPTIONAL]: Starting value of the parameter.
- units [STRING, OPTIONAL]: Units of the parameter, consistent with the model specification.

Observable Exclusions
- observableId_excluded [STRING, OPTIONAL]: Name of the observable excluded from translation rate adjustments.

Parameter Scan Range
- parameterId [STRING, OPTIONAL]: Name of the parameter for single-parameter scans.
- minValue [FLOAT, OPTIONAL]: Minimum value in the scan range.
- maxValue [FLOAT, OPTIONAL]: Maximum value in the scan range.
Usage¶
The Initializer file is especially useful for setting up simulations where specific biological conditions must be reflected, such as:
- Serum-starved MCF10A cells that remain quiescent without external growth factor stimulation.
- Customizing initial species levels or parameter values to match experimental data.
- Running sensitivity analyses by scanning parameter ranges.
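A parameter scan range such as `k_on, 0.1, 10.0` still has to be turned into concrete values to simulate. The sketch below shows one possible way to do that. `scan_values` is a hypothetical helper, and both the number of points and the choice of logarithmic spacing are assumptions made for this example, since the Initializer file only records the minimum and maximum.

```python
from typing import List


def scan_values(min_value: float, max_value: float, n_points: int,
                log_scale: bool = True) -> List[float]:
    """Generate n_points values from min_value to max_value, inclusive.

    Log spacing (assumed default) suits rate constants spanning orders of magnitude.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if min_value >= max_value:
        raise ValueError("min_value must be smaller than max_value")
    if log_scale and min_value <= 0:
        raise ValueError("log spacing requires a positive min_value")

    values: List[float] = []
    for i in range(n_points):
        fraction = i / (n_points - 1)
        if log_scale:
            values.append(min_value * (max_value / min_value) ** fraction)
        else:
            values.append(min_value + (max_value - min_value) * fraction)
    return values
```

For the example row, `scan_values(0.1, 10.0, 3)` yields three values spaced one decade apart between the stated minimum and maximum; passing `log_scale=False` gives evenly spaced values instead.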