.diann
- proteopy.read.diann(diann_output_path, aggr_level, version='1.0.0', **kwargs)
Read a DIA-NN report into an AnnData object.
- Parameters:
diann_output_path (str | Path) – Path to the DIA-NN output file: a TSV report for version "1.0.0", a parquet report for version "1.9.1".
aggr_level (str) –
Peptide aggregation level. Accepted values (matched case-insensitively as regular expressions):
- "Precursor.Id": one row per precursor, i.e. per (modified sequence, charge) pair; intensities are not summed across precursors.
- "Modified.Sequence": precursor quantities are summed per modified peptide sequence.
- "Stripped.Sequence": precursor quantities are summed per unmodified peptide sequence.
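version (str, optional) – DIA-NN version string used to select the parsing handler. Floor-matched against the supported versions: the handler of the highest supported version that does not exceed version is used. Defaults to "1.0.0".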
**kwargs –
Additional keyword arguments forwarded to the version-specific handler. Common options:
- v1.0.0 handler (_read_diann_v1):
  - precursor_pval_max (float): maximum Q.Value to retain.
  - gene_pval_max (float): maximum Protein.Q.Value to retain.
  - global_precursor_pval_max (float): maximum Global.Q.Value to retain.
  - show_input_stats (bool): print Q-value distributions and proteotypicity fractions before and after filtering.
  - run_parser (callable | None): function applied to each Run value to transform sample identifiers.
  - fill_na (float | int | None): value used to replace NaN entries in the intensity matrix.
- v1.9.1 handler (_read_diann_v1_9_1):
  - max_precursor_q (float | None): maximum Q.Value to retain.
  - max_protein_q (float | None): maximum Protein.Q.Value to retain.
  - max_global_precursor_q (float | None): maximum Global.Q.Value to retain.
  - normalized (bool): use Precursor.Normalised instead of Precursor.Quantity as the intensity column.
  - run_parser (callable | None): function applied to each Run value to transform sample identifiers.
  - fill_na (float | int | None): value used to replace NaN entries in the intensity matrix.
  - zero_to_na (bool): replace zeros with np.nan before returning. Mutually exclusive with fill_na.
  - verbose (bool): print row counts at each filtering step.
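A minimal sketch of what the Q-value thresholds and aggr_level do to the long-format report, using plain dictionaries in place of TSV or parquet rows. This is not proteopy's code: the helper names, the regex patterns, the inclusive <= comparison and the row layout are assumptions for illustration, and the parameter names merely mirror the v1.9.1 keyword arguments. Only the column names come from this page. Proteotypicity filtering is not shown.

```python
import re
from collections import defaultdict

# Hypothetical case-insensitive patterns for the three documented levels;
# proteopy's real patterns are not documented here.
AGGR_PATTERNS = {
    "Precursor.Id": r"^precursor[._]?id$",
    "Modified.Sequence": r"^modified[._]?sequence$",
    "Stripped.Sequence": r"^stripped[._]?sequence$",
}


def resolve_aggr_level(aggr_level):
    """Map a user-supplied level onto one of the documented column names."""
    for column, pattern in AGGR_PATTERNS.items():
        if re.match(pattern, aggr_level, flags=re.IGNORECASE):
            return column
    raise ValueError(f"aggr_level {aggr_level!r} does not match any recognised pattern")


def filter_by_q(rows, max_precursor_q=None, max_protein_q=None,
                max_global_precursor_q=None):
    """Keep rows whose q-values are <= each threshold; None disables that filter.

    The inclusive comparison is an assumption of this sketch.
    """
    limits = {
        "Q.Value": max_precursor_q,
        "Protein.Q.Value": max_protein_q,
        "Global.Q.Value": max_global_precursor_q,
    }
    kept = [
        row for row in rows
        if all(limit is None or row[col] <= limit for col, limit in limits.items())
    ]
    if not kept:
        raise ValueError("No rows remain after Q-value filtering")
    return kept


def aggregate(rows, aggr_level):
    """Sum Precursor.Quantity per (Run, peptide key).

    Assuming one row per precursor and run, the "Precursor.Id" level
    leaves the quantities unchanged.
    """
    key_col = resolve_aggr_level(aggr_level)
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Run"], row[key_col])] += row["Precursor.Quantity"]
    return dict(totals)
```

With two charge states of PEPM(UniMod:35)K and one unmodified PEPMK precursor in the same run, "Modified.Sequence" yields two entries per run, while "Stripped.Sequence" yields a single PEPMK entry holding the sum of all three quantities.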
- Returns:
AnnData with shape (n_samples, n_peptides). Observations (.obs) contain sample_id; variables (.var) contain peptide_id and protein_id.
- Return type:
ad.AnnData
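To make the returned shape concrete, the sketch below pivots per-sample quantities into an (n_samples, n_peptides) layout with stdlib lists. It is illustrative only: to_matrix, the sorted ordering of samples and peptides, and the protein_of lookup are hypothetical, and a real AnnData object stores the matrix in .X and the records in the .obs and .var data frames. The fill_na and zero_to_na handling follows the descriptions of those keyword arguments; raising ValueError when both are given is an assumption.

```python
import math


def to_matrix(quantities, protein_of, fill_na=None, zero_to_na=False):
    """Pivot {(sample_id, peptide_id): intensity} into a samples x peptides layout.

    Returns (X, obs, var): X[i][j] is peptide j in sample i; obs and var are
    per-row and per-column records mirroring the documented .obs and .var fields.
    """
    if fill_na is not None and zero_to_na:
        # The options are documented as mutually exclusive; the error type is assumed.
        raise ValueError("fill_na and zero_to_na are mutually exclusive")
    samples = sorted({s for s, _ in quantities})
    peptides = sorted({p for _, p in quantities})
    row = {s: i for i, s in enumerate(samples)}
    col = {p: j for j, p in enumerate(peptides)}
    # Cells with no measurement start as NaN.
    X = [[math.nan] * len(peptides) for _ in samples]
    for (sample, peptide), value in quantities.items():
        X[row[sample]][col[peptide]] = value
    # Post-process cells according to zero_to_na / fill_na.
    for values in X:
        for j, value in enumerate(values):
            if zero_to_na and value == 0:
                values[j] = math.nan
            elif fill_na is not None and math.isnan(value):
                values[j] = fill_na
    obs = [{"sample_id": s} for s in samples]
    var = [{"peptide_id": p, "protein_id": protein_of[p]} for p in peptides]
    return X, obs, var
```

Starting every cell as NaN keeps "not quantified" distinct from a measured zero, which is why zero_to_na and fill_na are separate, opposite-direction options.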
- Raises:
ValueError – If version is below the minimum supported version.
ValueError – If aggr_level does not match any recognised pattern.
ValueError – If required columns are absent from the input file (v1.0.0).
ValueError – If no rows remain after Q-value and proteotypicity filtering.
NotImplementedError – If a protein-level aggr_level is requested for DIA-NN >= 1.9.1.
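Floor-matching of version, together with the "below the minimum supported version" error, can be sketched as follows. The registry holds only the two versions named on this page, floor_match is a hypothetical helper rather than part of proteopy's documented API, and dotted numeric version strings are assumed.

```python
import bisect

# Hypothetical registry: the two handler versions named in this documentation.
SUPPORTED_VERSIONS = ["1.0.0", "1.9.1"]


def _parse(version: str) -> tuple:
    """Turn "1.9.1" into (1, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def floor_match(version: str, supported=SUPPORTED_VERSIONS) -> str:
    """Return the highest supported version that is <= version."""
    ordered = sorted(supported, key=_parse)
    keys = [_parse(v) for v in ordered]
    # bisect_right finds the insertion point after any equal key,
    # so the entry just before it is the floor.
    idx = bisect.bisect_right(keys, _parse(version)) - 1
    if idx < 0:
        raise ValueError(
            f"DIA-NN version {version} is below the minimum supported "
            f"version {ordered[0]}"
        )
    return ordered[idx]
```

Comparing integer tuples rather than raw strings avoids the lexicographic trap in which "1.10" sorts before "1.9".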
Examples
Read a DIA-NN v1.0.0 TSV report at stripped-sequence level:
>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.tsv",
...     aggr_level="Stripped.Sequence",
...     version="1.0.0",
...     precursor_pval_max=0.01,
...     gene_pval_max=0.01,
...     global_precursor_pval_max=0.01,
... )
Read a DIA-NN v1.9.1 parquet report at precursor level with a custom run-name parser:
>>> import proteopy as pr
>>> adata = pr.read.diann(
...     "report.parquet",
...     aggr_level="Precursor.Id",
...     version="1.9.1",
...     max_precursor_q=0.01,
...     run_parser=lambda s: s.split("/")[-1].split(".")[0],
...     verbose=True,
... )
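The run_parser lambda in the last example keeps the last path component of each Run value and cuts it at the first dot. Written as a named function, its behaviour is easy to check; the run names below are made-up inputs.

```python
def run_parser(run: str) -> str:
    """Same logic as the lambda above: last "/"-separated component, cut at the first dot."""
    return run.split("/")[-1].split(".")[0]


for run in ["/data/diann/batch1/sample_A.raw", "sample_B.mzML", "sample_C", "sample.D.raw"]:
    print(run, "->", run_parser(run))
```

Because it splits on "/" only, Windows-style paths with backslashes are not split into components, and run names that themselves contain dots are truncated (sample.D.raw becomes sample).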