`.long`

proteopy.read.long(intensities, level=None, *, sample_annotation=None, var_annotation=None, column_map=None, sep=None, fill_na=None, zero_to_na=False, sort_obs_by_annotation=False, verbose=False)

Return type:

AnnData

Parameters:

intensities (str | Path | DataFrame)
level (Literal['peptide', 'protein'] | None)
sample_annotation (str | Path | DataFrame | None)
var_annotation (str | Path | DataFrame | None)
column_map (dict[str, str] | None)
sep (str | None)
fill_na (float | None)
zero_to_na (bool)
sort_obs_by_annotation (bool)
verbose (bool)

Read long-format peptide or protein tabular data into an AnnData container.

The intensities table must be in long format with one row per (sample, feature) measurement. Required columns differ by level:

Peptide level: sample_id, intensity, and peptide_id must be present. protein_id may come from the intensities table or from var_annotation; see below.
Protein level: sample_id, intensity, and protein_id must all be present.
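The column requirements above can be sketched with plain pandas. This is an illustration only, not proteopy's implementation: the REQUIRED table and the missing_columns helper are hypothetical names, and the pivot merely shows why one row per (sample, feature) maps onto a samples × features matrix.

```python
import pandas as pd

# Required columns per level, as listed above. At peptide level protein_id
# is left out because it may instead come from var_annotation.
REQUIRED = {
    "peptide": {"sample_id", "intensity", "peptide_id"},
    "protein": {"sample_id", "intensity", "protein_id"},
}


def missing_columns(df: pd.DataFrame, level: str) -> set:
    """Hypothetical helper: return the required columns absent from df."""
    return REQUIRED[level] - set(df.columns)


long_df = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})

peptide_missing = missing_columns(long_df, "peptide")  # nothing missing
protein_missing = missing_columns(long_df, "protein")  # protein_id missing

# One row per (sample, feature) pivots to a samples x features matrix.
wide = long_df.pivot(
    index="sample_id", columns="peptide_id", values="intensity",
)
```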

At peptide level, protein_id is resolved in two steps. If the intensities table already contains protein_id, it is used directly. Otherwise, var_annotation must be supplied and contain both peptide_id and protein_id.
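The two-step rule can be written out as a small pandas sketch. The helper name resolve_protein_id, the error message, and the use of a left join are assumptions made for illustration; the library's internal logic may differ.

```python
import pandas as pd


def resolve_protein_id(intensities, var_annotation=None):
    """Hypothetical helper mirroring the two-step rule described above."""
    # Step 1: use protein_id from the intensities table if it is there.
    if "protein_id" in intensities.columns:
        return intensities
    # Step 2: otherwise var_annotation must map peptide_id to protein_id.
    needed = {"peptide_id", "protein_id"}
    if var_annotation is None or not needed <= set(var_annotation.columns):
        raise ValueError(
            "protein_id is missing: supply var_annotation with "
            "peptide_id and protein_id columns"
        )
    mapping = var_annotation[["peptide_id", "protein_id"]].drop_duplicates()
    return intensities.merge(mapping, on="peptide_id", how="left")


long_df = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2"],
    "peptide_id": ["PEP1", "PEP2", "PEP1", "PEP2"],
    "intensity": [12450.0, 8730.0, 15320.0, 6890.0],
})
var_ann = pd.DataFrame({
    "peptide_id": ["PEP1", "PEP2"],
    "protein_id": ["PROT1", "PROT1"],
})

resolved = resolve_protein_id(long_df, var_ann)
```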

sample_annotation, when supplied, must contain a sample_id column and is merged into adata.obs.

var_annotation, when supplied, must contain a peptide_id column (peptide level) or a protein_id column (protein level) and is merged into adata.var.
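As a rough picture of what merging an annotation table on its key column looks like, the following pandas sketch joins a sample annotation onto a frame of sample IDs. The condition column is a made-up annotation, and the choice of a left join is an assumption rather than documented behaviour.

```python
import pandas as pd

# Stand-in for the observation table derived from the intensities.
obs = pd.DataFrame({"sample_id": ["S1", "S2"]})

sample_annotation = pd.DataFrame({
    "sample_id": ["S2", "S1"],
    "condition": ["treated", "control"],  # hypothetical annotation column
})

# A left join keeps every measured sample, in the order of the left table.
obs_annotated = obs.merge(sample_annotation, on="sample_id", how="left")
```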

Column names that differ from the defaults above can be mapped to the canonical names via column_map.
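Judging from Example 4 below, column_map is keyed by the canonical name with the custom column name as the value. A pandas equivalent of that renaming, given here only as a sketch of the idea, inverts the mapping before calling DataFrame.rename:

```python
import pandas as pd

# Canonical name -> custom column name, as in Example 4.
column_map = {
    "sample_id": "run",
    "protein_id": "prot",
    "intensity": "quant",
}

raw = pd.DataFrame({
    "run": ["S1", "S2"],
    "prot": ["PROT1", "PROT1"],
    "quant": [12450.0, 15320.0],
})

# DataFrame.rename expects old -> new, so invert the mapping first.
renamed = raw.rename(
    columns={custom: canonical for canonical, custom in column_map.items()}
)
```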

intensities (str | Path | pd.DataFrame): Long-form intensities data. Accepts a file path (str or Path) or a pandas.DataFrame.
level ({"peptide", "protein"}, default None): Selects peptide- or protein-level processing. Although the signature default is None, this argument is required.
sample_annotation (str | Path | pd.DataFrame, optional): Optional obs annotations. Accepts a file path or DataFrame.
var_annotation (str | Path | pd.DataFrame, optional): Optional var annotations. Accepts a file path or DataFrame. Interpreted as peptide annotations when level="peptide" and as protein annotations when level="protein".
column_map (dict, optional): Optional mapping that specifies custom column names for the expected keys: peptide_id, protein_id, sample_id, intensity.
sep (str, optional): Delimiter passed to pandas.read_csv. If None (the default), the separator is auto-detected from the file extension. Ignored when the input is a DataFrame.
fill_na (float, optional): Optional replacement value for missing intensity entries.
zero_to_na (bool, optional): If True, zeros in the AnnData X matrix are replaced with np.nan.
sort_obs_by_annotation (bool, default False): When True, reorder observations to match the order of samples in the annotation (if supplied) or the original intensity table.
verbose (bool, optional): If True, print status messages.
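The effect of zero_to_na and fill_na on an intensity matrix can be illustrated with NumPy. The sketch below applies each option on its own to a toy matrix; how the library orders or combines the two when both are set is not stated here, so no combined behaviour is shown.

```python
import numpy as np

# Toy samples x features matrix with one zero and one missing value.
X = np.array([
    [12450.0, 0.0],
    [np.nan, 6890.0],
])

# zero_to_na=True: zeros become missing values.
X_zero_to_na = np.where(X == 0, np.nan, X)

# fill_na=0.0: missing values are replaced with the given number.
X_filled = np.where(np.isnan(X), 0.0, X)
```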

Returns:

AnnData: Structured representation of the long-form intensities ready for downstream analysis.

Example 1: Minimal peptide-level read with protein_id in the intensities DataFrame.

>>> import pandas as pd
>>> import proteopy as pr
>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "protein_id": [
...         "PROT1", "PROT1", "PROT1", "PROT1",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities, level="peptide",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'

Example 2: Peptide-level read with protein_id supplied via var_annotation instead of the intensities DataFrame.

>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> var_ann = pd.DataFrame({
...     "peptide_id": ["PEP1", "PEP2"],
...     "protein_id": ["PROT1", "PROT1"],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="peptide",
...     var_annotation=var_ann,
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'

Example 3: Minimal protein-level read from a CSV file.

>>> import tempfile
>>> from pathlib import Path
>>> csv_text = (
...     "sample_id,protein_id,intensity\n"
...     "S1,PROT1,12450.0\n"
...     "S1,PROT2,8730.0\n"
...     "S2,PROT1,15320.0\n"
...     "S2,PROT2,6890.0\n"
... )
>>> tmp = tempfile.NamedTemporaryFile(
...     suffix=".csv", delete=False, mode="w",
... )
>>> _ = tmp.write(csv_text)
>>> tmp.close()
>>> adata = pr.read.long(
...     Path(tmp.name), level="protein",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'
>>> Path(tmp.name).unlink()

Example 4: Protein-level read with non-standard column names remapped via column_map.

>>> intensities = pd.DataFrame({
...     "run": ["S1", "S1", "S2", "S2"],
...     "prot": [
...         "PROT1", "PROT2", "PROT1", "PROT2",
...     ],
...     "quant": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="protein",
...     column_map={
...         "sample_id": "run",
...         "protein_id": "prot",
...         "intensity": "quant",
...     },
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'
