Actions Reference

BioMapper provides 25+ self-registering actions for biological data processing. All actions follow the 2025 standardizations for parameter naming, context handling, and type safety.

Data Operations

Protein Actions

Metabolite Actions

Chemistry Actions

CHEMISTRY_FUZZY_TEST_MATCH

Quick Reference

Data Operations

Action	Description
`LOAD_DATASET_IDENTIFIERS`	Load biological identifiers from CSV/TSV files
`MERGE_DATASETS`	Combine multiple datasets with deduplication
`FILTER_DATASET`	Apply filtering criteria to datasets
`EXPORT_DATASET`	Export results to various formats including CSV, JSON, and Excel
`CUSTOM_TRANSFORM`	Apply Python expressions to transform data columns
`CUSTOM_TRANSFORM_EXPRESSION`	Enhanced expression-based data transformation
`PARSE_COMPOSITE_IDENTIFIERS`	Parse and extract identifiers from composite fields

Protein Actions

Action	Description
`PROTEIN_EXTRACT_UNIPROT_FROM_XREFS`	Extract UniProt IDs from compound reference fields
`PROTEIN_NORMALIZE_ACCESSIONS`	Standardize protein accession formats

Metabolite Actions

Action	Description
`NIGHTINGALE_NMR_MATCH`	Nightingale NMR platform matching
`SEMANTIC_METABOLITE_MATCH`	AI-powered semantic matching
`HMDB_VECTOR_MATCH`	Vector-based metabolite matching using HMDB embeddings
`METABOLITE_FUZZY_STRING_MATCH`	Fuzzy string matching for metabolite names
`METABOLITE_RAMPDB_BRIDGE`	RaMP database integration for metabolite mapping
`PROGRESSIVE_SEMANTIC_MATCH`	Progressive multi-stage semantic metabolite matching

Chemistry Actions

Action	Description
`CHEMISTRY_FUZZY_TEST_MATCH`	Match clinical test names using fuzzy string matching

Integration Actions

Action	Description
`SYNC_TO_GOOGLE_DRIVE_V2`	Upload and sync results to Google Drive with chunked transfer
`SYNC_TO_GOOGLE_DRIVE_V3`	Enhanced Google Drive sync with improved error handling

Usage Example

name: protein_processing_workflow
description: Complete protein data processing and export

steps:
  - name: load_source
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "/data/ukbb_proteins.csv"
        identifier_column: "uniprot"
        output_key: "source_data"

  - name: extract_uniprot_ids
    action:
      type: PROTEIN_EXTRACT_UNIPROT_FROM_XREFS
      params:
        input_key: "source_data"
        xref_column: "protein_refs"
        output_key: "extracted_data"

  - name: normalize_accessions
    action:
      type: PROTEIN_NORMALIZE_ACCESSIONS
      params:
        input_key: "extracted_data"
        accession_column: "uniprot_id"
        output_key: "normalized_data"

  - name: filter_results
    action:
      type: FILTER_DATASET
      params:
        input_key: "normalized_data"
        conditions:
          - "confidence > 0.8"
        output_key: "filtered_data"

  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: "filtered_data"
        file_path: "/results/processed_proteins.csv"

2025 Standardizations

Parameter Naming

All actions use standardized parameter names:

input_key (not dataset_key, data_key)
output_key (not result_key, target_key)
file_path (not filepath, input_file)

Context Handling

Actions use UniversalContext wrapper for robust context handling across different execution environments.

Type Safety

Actions inherit from TypedStrategyAction with Pydantic parameter validation and structured results.

Performance

All actions are audited for algorithmic complexity to prevent O(n²)+ performance issues.

File Loading

Uses BiologicalFileLoader for robust parsing with automatic encoding detection and biological data optimization.

Strategy File Locations

Strategy YAML files are located in src/biomapper/configs/strategies/ and organized by:

Entity Type: prot_*, met_*, chem_*, multi_*
Source-Target: ukb_to_hpa, arv_to_kg2c
Approach: uniprot_v1_base, semantic_v1_enhanced

Example: prot_ukb_to_hpa_uniprot_v1_base.yaml

—

## Verification Sources Last verified: 2025-08-22

This documentation was verified against the following project resources:

/biomapper/src/actions/ (comprehensive action implementations - verified 25+ registered actions)
/biomapper/src/actions/registry.py (global ACTION_REGISTRY and @register_action decorator system)
/biomapper/src/actions/typed_base.py (TypedStrategyAction base class with Pydantic models)
/biomapper/CLAUDE.md (2025 standardization requirements and architectural patterns)
/biomapper/src/configs/strategies/ (YAML strategy file organization and examples)
/biomapper/pyproject.toml (project dependencies and package structure)