.long

proteopy.read.long(intensities, level=None, *, sample_annotation=None, var_annotation=None, column_map=None, sep=None, fill_na=None, zero_to_na=False, sort_obs_by_annotation=False, verbose=False)
Return type:

AnnData

Parameters:
  • intensities (str | Path | DataFrame)

  • level (Literal['peptide', 'protein'] | None)

  • sample_annotation (str | Path | DataFrame | None)

  • var_annotation (str | Path | DataFrame | None)

  • column_map (dict[str, str] | None)

  • sep (str | None)

  • fill_na (float | None)

  • zero_to_na (bool)

  • sort_obs_by_annotation (bool)

  • verbose (bool)

Read long-format peptide or protein tabular data into an AnnData container.

The intensities table must be in long format with one row per (sample, feature) measurement. Required columns differ by level:

  • Peptide level: sample_id, intensity, and peptide_id must be present. protein_id may come from the intensities table or from var_annotation; see below.

  • Protein level: sample_id, intensity, and protein_id must all be present.

At peptide level, protein_id is resolved in two steps. If the intensities table already contains protein_id, it is used directly. Otherwise, var_annotation must be supplied and contain both peptide_id and protein_id.
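The column requirements above can be summarised as a small check. This is only an illustrative sketch: `missing_columns` and `REQUIRED` are hypothetical names, not part of proteopy, and the library's own validation and error messages may differ.

```python
# Hypothetical helper (not part of proteopy): mirrors the documented
# column requirements for each level.
REQUIRED = {
    "peptide": {"sample_id", "intensity", "peptide_id"},
    "protein": {"sample_id", "intensity", "protein_id"},
}


def missing_columns(columns, level, var_annotation_columns=None):
    """Return the sorted required columns that are not available."""
    cols = set(columns)
    missing = REQUIRED[level] - cols
    if level == "peptide" and "protein_id" not in cols:
        # protein_id must then come from var_annotation, which has to
        # contain both peptide_id and protein_id.
        ann = set(var_annotation_columns or ())
        if not {"peptide_id", "protein_id"} <= ann:
            missing.add("protein_id")
    return sorted(missing)


intensity_cols = ["sample_id", "peptide_id", "intensity"]
print(missing_columns(intensity_cols, "peptide"))
# ['protein_id']
print(missing_columns(intensity_cols, "peptide", ["peptide_id", "protein_id"]))
# []
```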

sample_annotation, when supplied, must contain a sample_id column and is merged into adata.obs.

var_annotation, when supplied, must contain a peptide_id column (peptide level) or a protein_id column (protein level) and is merged into adata.var.

Column names that differ from the defaults above can be mapped to the canonical names via column_map.
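The direction of the mapping is easy to get backwards: keys are the canonical names and values are the names used in your table. The sketch below shows the renaming this implies on a plain list of column names. `to_canonical` is a hypothetical helper for illustration, and the extra `batch` column is made up; how proteopy applies the mapping internally is not shown here.

```python
# Canonical name -> name used in the user's table (same direction as column_map).
column_map = {"sample_id": "run", "protein_id": "prot", "intensity": "quant"}


def to_canonical(columns, column_map):
    """Rename custom column names to canonical ones; leave others untouched."""
    reverse = {custom: canonical for canonical, custom in column_map.items()}
    return [reverse.get(name, name) for name in columns]


print(to_canonical(["run", "prot", "quant", "batch"], column_map))
# ['sample_id', 'protein_id', 'intensity', 'batch']
```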

intensities : str | Path | pd.DataFrame

Long-form intensities data. Accepts a file path (str or Path) or a pandas.DataFrame.

level : {"peptide", "protein"}, default None

Select whether to process peptide- or protein-level inputs. Although the signature default is None, this argument is required and must be set explicitly.

sample_annotation : str | Path | pd.DataFrame, optional

Optional obs annotations. Accepts a file path or DataFrame.

var_annotation : str | Path | pd.DataFrame, optional

Optional var annotations. Accepts a file path or DataFrame. Interpreted as peptide annotations when level="peptide" and as protein annotations when level="protein".

column_map : dict, optional

Optional mapping that specifies custom column names for the expected keys: peptide_id, protein_id, sample_id, intensity.

sep : str, optional

Delimiter passed to pandas.read_csv. If None (the default), the separator is auto-detected from the file extension. Ignored when input is a DataFrame.

fill_na : float, optional

Optional replacement value for missing intensity entries.

zero_to_na : bool, default False

If True, zeros in the AnnData X matrix will be replaced with np.nan.

sort_obs_by_annotation : bool, default False

When True, reorder observations to follow the sample order in sample_annotation if it is supplied, otherwise the sample order of the original intensities table.

verbose : bool, default False

If True, print status messages.
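The effect of the zero_to_na and fill_na options described above can be pictured on a single row of intensities. The sketch uses plain Python lists and `math.nan` in place of the AnnData X matrix and `np.nan`. `zeros_to_nan` and `fill_nan` are hypothetical helpers, and the order in which proteopy applies the two options when both are set is not stated here, so each is shown on its own.

```python
import math


def zeros_to_nan(row):
    """Replace exact zeros with NaN (what zero_to_na=True is documented to do)."""
    return [math.nan if v == 0 else v for v in row]


def fill_nan(row, value):
    """Replace missing (NaN) entries with a fixed value (cf. fill_na)."""
    return [value if math.isnan(v) else v for v in row]


row = [12450.0, 0.0, math.nan]
print(zeros_to_nan(row))   # [12450.0, nan, nan]
print(fill_nan(row, 0.0))  # [12450.0, 0.0, 0.0]
```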

Returns:

AnnData

Structured representation of the long-form intensities ready for downstream analysis.
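Conceptually, the long table (one row per sample and feature) becomes a samples × features matrix, which is why the four-row inputs in the examples produce a 2 × 2 object. The following is a minimal pure-Python sketch of that reshaping, not proteopy's implementation: `long_to_wide` is a hypothetical name, the sorted row and column order is chosen for the sketch only, and filling unobserved (sample, feature) pairs with NaN is an assumption.

```python
import math

# (sample_id, peptide_id, intensity) rows, as in a long-format table.
rows = [
    ("S1", "PEP1", 12450.0),
    ("S1", "PEP2", 8730.0),
    ("S2", "PEP1", 15320.0),
    ("S2", "PEP2", 6890.0),
]


def long_to_wide(rows):
    """Pivot long rows into (samples, features, matrix)."""
    samples = sorted({s for s, _, _ in rows})
    features = sorted({f for _, f, _ in rows})
    s_idx = {s: i for i, s in enumerate(samples)}
    f_idx = {f: j for j, f in enumerate(features)}
    # Start from an all-NaN matrix so unobserved pairs stay missing (assumption).
    matrix = [[math.nan] * len(features) for _ in samples]
    for s, f, value in rows:
        matrix[s_idx[s]][f_idx[f]] = value
    return samples, features, matrix


samples, features, matrix = long_to_wide(rows)
print(samples)   # ['S1', 'S2']
print(features)  # ['PEP1', 'PEP2']
print(matrix)    # [[12450.0, 8730.0], [15320.0, 6890.0]]
```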

Example 1: Minimal peptide-level read with protein_id in the intensities DataFrame.

>>> import pandas as pd
>>> import proteopy as pr
>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "protein_id": [
...         "PROT1", "PROT1", "PROT1", "PROT1",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities, level="peptide",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'

Example 2: Peptide-level read with protein_id supplied via var_annotation instead of the intensities DataFrame.

>>> intensities = pd.DataFrame({
...     "sample_id": [
...         "S1", "S1", "S2", "S2",
...     ],
...     "peptide_id": [
...         "PEP1", "PEP2", "PEP1", "PEP2",
...     ],
...     "intensity": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> var_ann = pd.DataFrame({
...     "peptide_id": ["PEP1", "PEP2"],
...     "protein_id": ["PROT1", "PROT1"],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="peptide",
...     var_annotation=var_ann,
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'peptide_id', 'protein_id'

Example 3: Minimal protein-level read from a CSV file.

>>> import tempfile
>>> from pathlib import Path
>>> csv_text = (
...     "sample_id,protein_id,intensity\n"
...     "S1,PROT1,12450.0\n"
...     "S1,PROT2,8730.0\n"
...     "S2,PROT1,15320.0\n"
...     "S2,PROT2,6890.0\n"
... )
>>> tmp = tempfile.NamedTemporaryFile(
...     suffix=".csv", delete=False, mode="w",
... )
>>> _ = tmp.write(csv_text)
>>> tmp.close()
>>> adata = pr.read.long(
...     Path(tmp.name), level="protein",
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'
>>> Path(tmp.name).unlink()

Example 4: Protein-level read with non-standard column names remapped via column_map.

>>> intensities = pd.DataFrame({
...     "run": ["S1", "S1", "S2", "S2"],
...     "prot": [
...         "PROT1", "PROT2", "PROT1", "PROT2",
...     ],
...     "quant": [
...         12450.0, 8730.0, 15320.0, 6890.0,
...     ],
... })
>>> adata = pr.read.long(
...     intensities,
...     level="protein",
...     column_map={
...         "sample_id": "run",
...         "protein_id": "prot",
...         "intensity": "quant",
...     },
... )
>>> adata
AnnData object with n_obs × n_vars = 2 × 2
    obs: 'sample_id'
    var: 'protein_id'