Action System Architecture
The action system provides the core functionality for biological data processing in BioMapper through a self-registering, type-safe architecture.
Core Data Operations
Fundamental actions for data loading and analysis:
- LOAD_DATASET_IDENTIFIERS
Generic data loader supporting CSV/TSV files with intelligent identifier handling, automatic format detection, prefix stripping, and regex-based filtering.
- MERGE_DATASETS
Combine multiple datasets with intelligent deduplication and conflict resolution strategies.
- FILTER_DATASET
Apply complex filtering criteria using Python expressions for data subsetting.
- EXPORT_DATASET
Export results to CSV, TSV, or JSON formats with comprehensive metadata preservation.
- CUSTOM_TRANSFORM_EXPRESSION
Apply Python expressions to transform data columns dynamically without code changes.
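The prefix stripping and regex-based filtering described for LOAD_DATASET_IDENTIFIERS can be pictured with a small, dependency-free sketch. The function names (`load_identifier_column`, `clean_identifiers`) and the delimiter heuristic are illustrative assumptions, not BioMapper's actual implementation.

```python
import csv
import io
import re
from typing import Iterable, List, Optional


def load_identifier_column(text: str, column: str) -> List[str]:
    """Read one column from CSV or TSV text, guessing the delimiter from the header."""
    header = text.splitlines()[0]
    delimiter = "\t" if header.count("\t") > header.count(",") else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row[column] for row in reader]


def clean_identifiers(
    values: Iterable[str],
    strip_prefix: Optional[str] = None,
    filter_pattern: Optional[str] = None,
) -> List[str]:
    """Strip an optional prefix, then keep only values that fully match the pattern."""
    pattern = re.compile(filter_pattern) if filter_pattern else None
    cleaned = []
    for raw in values:
        value = raw.strip()
        if strip_prefix and value.startswith(strip_prefix):
            value = value[len(strip_prefix):]
        if not value:
            continue  # skip empty cells
        if pattern and not pattern.fullmatch(value):
            continue  # drop identifiers that fail the regex filter
        cleaned.append(value)
    return cleaned


# Hypothetical TSV input with a prefixed identifier column.
tsv_text = (
    "id\tname\n"
    "UniProtKB:P12345\talpha\n"
    "UniProtKB:Q9Y6K9\tbeta\n"
    "UniProtKB:not_an_id\tgamma\n"
)
raw_ids = load_identifier_column(tsv_text, "id")
ids = clean_identifiers(
    raw_ids,
    strip_prefix="UniProtKB:",
    filter_pattern=r"[A-Z][0-9][A-Z0-9]{3}[0-9]",
)
print(ids)
```

Stripping the prefix before applying the filter means the regex only has to describe the bare identifier.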
Action Registry System
Actions self-register at import time using the @register_action decorator:
from actions.registry import register_action
from actions.typed_base import TypedStrategyAction

@register_action("ACTION_NAME")
class MyAction(TypedStrategyAction[ParamsModel, ActionResult]):
    pass
The registry (ACTION_REGISTRY) is a global dictionary that enables dynamic action lookup based on YAML strategy configurations. No manual registration is required.
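To show how a decorator can populate a global dictionary at import time, here is a minimal sketch of the pattern. It reuses the names `ACTION_REGISTRY` and `register_action` from the text, but the body, the duplicate-name check, the `UppercaseIds` action, and the shape of the `step` dictionary are assumptions made for illustration; the real code in `actions/registry.py` may differ.

```python
from typing import Any, Callable, Dict, List, Type

# Global lookup table: action name -> action class.
ACTION_REGISTRY: Dict[str, Type[Any]] = {}


def register_action(name: str) -> Callable[[Type[Any]], Type[Any]]:
    """Return a class decorator that stores the class under `name`."""

    def decorator(cls: Type[Any]) -> Type[Any]:
        # Rejecting duplicates is a choice made for this sketch.
        if name in ACTION_REGISTRY:
            raise ValueError(f"Action {name!r} is already registered")
        ACTION_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged

    return decorator


@register_action("UPPERCASE_IDS")
class UppercaseIds:
    """Hypothetical action used only to demonstrate registration."""

    def run(self, values: List[str]) -> List[str]:
        return [value.upper() for value in values]


# A strategy step as it might look after parsing YAML (structure assumed).
step = {"type": "UPPERCASE_IDS"}

action_cls = ACTION_REGISTRY[step["type"]]  # dynamic lookup by name
result = action_cls().run(["p12345"])
print(result)
```

Because the decorator runs when the class body is evaluated, importing the module that defines an action is enough to make it available to the lookup.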
Type Safety
- Pydantic Models
All action parameters and results use Pydantic models for validation.
- TypedStrategyAction Base
New base class provides type-safe parameter handling.
- Backward Compatibility
Legacy dict-based interface maintained during migration.
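One way a typed entry point and a legacy dict-based entry point can coexist is sketched below. To stay dependency-free it uses a dataclass as a stand-in for a Pydantic model, and every name in it (`HeadParams`, `HeadAction`, the `execute` wrapper) is hypothetical; it is not a description of how `TypedStrategyAction` is actually implemented.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class HeadParams:
    """Stand-in for a Pydantic parameter model (dataclass used to avoid dependencies)."""

    input_key: str
    output_key: str
    max_rows: int = 10

    def __post_init__(self) -> None:
        # Mimics the kind of range check a Pydantic Field(ge=1) would perform.
        if self.max_rows < 1:
            raise ValueError("max_rows must be at least 1")


class HeadAction:
    async def execute_typed(self, params: HeadParams, context: Dict[str, Any]) -> Dict[str, Any]:
        """Typed entry point: parameters arrive already validated."""
        rows = context.get("datasets", {}).get(params.input_key, [])
        context.setdefault("datasets", {})[params.output_key] = rows[: params.max_rows]
        return {"success": True, "rows": min(len(rows), params.max_rows)}

    async def execute(self, raw_params: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
        """Legacy entry point: accepts a plain dict, validates it, then delegates."""
        return await self.execute_typed(HeadParams(**raw_params), context)


context = {"datasets": {"ids": ["P12345", "Q9Y6K9", "O00533"]}}
outcome = asyncio.run(
    HeadAction().execute(
        {"input_key": "ids", "output_key": "first_two", "max_rows": 2}, context
    )
)
print(outcome)
```

Callers that still pass dictionaries keep working, while new code can construct the parameter object directly and benefit from validation at the boundary.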
Execution Context
- Shared Dictionary
Actions communicate through a shared context object.
- Data Storage
Results stored with descriptive keys like “ukbb_proteins”.
- Metadata Tracking
Automatic collection of execution statistics and timing.
- Error Handling
Comprehensive error reporting with context preservation.
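The following sketch illustrates how steps might share data through a context dictionary while timing and errors are recorded alongside it. The `datasets` key appears in the development pattern in this document; the `statistics` and `errors` keys, the `run_step` helper, and the three step functions are assumptions introduced for the example.

```python
import time
from typing import Any, Callable, Dict


def run_step(name: str, func: Callable[[Dict[str, Any]], None], context: Dict[str, Any]) -> bool:
    """Run one step against the shared context, recording timing and any error."""
    stats = context.setdefault("statistics", {})
    started = time.perf_counter()
    error = None
    try:
        func(context)
    except Exception as exc:
        # Keep the failing step's name with the message so context is preserved.
        error = f"{type(exc).__name__}: {exc}"
        context.setdefault("errors", []).append({"step": name, "error": error})
    stats[name] = {"seconds": time.perf_counter() - started, "ok": error is None}
    return error is None


def load_proteins(context: Dict[str, Any]) -> None:
    # Store results under a descriptive key.
    context.setdefault("datasets", {})["ukbb_proteins"] = ["P12345", "Q9Y6K9"]


def count_proteins(context: Dict[str, Any]) -> None:
    # Read what an earlier step stored.
    context["protein_count"] = len(context["datasets"]["ukbb_proteins"])


def broken_step(context: Dict[str, Any]) -> None:
    context["datasets"]["missing_key"]  # raises KeyError


context: Dict[str, Any] = {}
for step_name, step in [("load", load_proteins), ("count", count_proteins), ("broken", broken_step)]:
    run_step(step_name, step, context)

print(context["protein_count"])
print(context["errors"])
```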
Action Development Pattern
Follow Test-Driven Development (TDD) when creating new actions:
from typing import Any, Dict

import pandas as pd
from pydantic import BaseModel, Field

from actions.registry import register_action
from actions.typed_base import TypedStrategyAction, StandardActionResult


class MyActionParams(BaseModel):
    """Parameters for custom action with validation."""

    input_key: str = Field(..., description="Input dataset key")
    threshold: float = Field(0.8, ge=0.0, le=1.0, description="Processing threshold")
    output_key: str = Field(..., description="Output dataset key")


@register_action("MY_ACTION")
class MyAction(TypedStrategyAction[MyActionParams, StandardActionResult]):
    """Process biological data with threshold filtering."""

    def get_params_model(self) -> type[MyActionParams]:
        return MyActionParams

    async def execute_typed(
        self,
        params: MyActionParams,
        context: Dict[str, Any]
    ) -> StandardActionResult:
        # Access input data from context datasets (empty DataFrame if the key is missing)
        input_data = context.get("datasets", {}).get(params.input_key, pd.DataFrame())

        # Keep rows whose "score" column meets the threshold
        if not input_data.empty:
            processed = input_data[input_data["score"] >= params.threshold]
        else:
            processed = pd.DataFrame()

        # Store results in context
        if "datasets" not in context:
            context["datasets"] = {}
        context["datasets"][params.output_key] = processed

        return StandardActionResult(
            success=True,
            message=f"Processed {len(processed)} items from {len(input_data)} total",
            data={"filtered_count": len(input_data) - len(processed)}
        )
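Since the pattern above recommends TDD, a test-first sketch for the threshold behavior may help. It assumes pytest-style test functions and exercises a dependency-free stand-in, `filter_by_threshold` (a hypothetical helper operating on a list of dicts), rather than the real `MyAction`, so it can run without pandas or the `actions` package.

```python
from typing import Any, Dict, List


def filter_by_threshold(rows: List[Dict[str, Any]], threshold: float) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the filtering step inside MyAction.execute_typed."""
    return [row for row in rows if row["score"] >= threshold]


# In a TDD workflow these tests are written before the function body above;
# each one pins down a single behavior.
def test_keeps_rows_at_or_above_threshold() -> None:
    rows = [{"id": "a", "score": 0.8}, {"id": "b", "score": 0.79}]
    assert filter_by_threshold(rows, 0.8) == [{"id": "a", "score": 0.8}]


def test_empty_input_returns_empty_output() -> None:
    assert filter_by_threshold([], 0.8) == []


def test_does_not_mutate_input() -> None:
    rows = [{"id": "a", "score": 0.1}]
    filter_by_threshold(rows, 0.8)
    assert rows == [{"id": "a", "score": 0.1}]
```

The boundary case (a score exactly equal to the threshold) is worth a dedicated test because the action uses `>=` rather than `>`.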
Entity-Specific Actions
Actions are organized by biological entity type:
- Protein Actions (entities/proteins/)
  - PROTEIN_EXTRACT_UNIPROT_FROM_XREFS - Extract UniProt IDs from compound fields
  - PROTEIN_NORMALIZE_ACCESSIONS - Standardize protein identifier formats
  - PROTEIN_MULTI_BRIDGE - Multi-source protein resolution
  - MERGE_WITH_UNIPROT_RESOLUTION - Historical UniProt ID mapping
- Metabolite Actions (entities/metabolites/)
  - METABOLITE_CTS_BRIDGE - Chemical Translation Service integration
  - METABOLITE_EXTRACT_IDENTIFIERS - Extract metabolite IDs from text
  - METABOLITE_NORMALIZE_HMDB - Standardize HMDB formats
  - METABOLITE_MULTI_BRIDGE - Multi-database metabolite resolution
  - NIGHTINGALE_NMR_MATCH - Nightingale NMR platform matching
  - SEMANTIC_METABOLITE_MATCH - AI-powered semantic matching
  - VECTOR_ENHANCED_MATCH - Vector embedding similarity
  - METABOLITE_API_ENRICHMENT - External API enrichment
  - COMBINE_METABOLITE_MATCHES - Merge multiple matching strategies
- Chemistry Actions (entities/chemistry/)
  - CHEMISTRY_EXTRACT_LOINC - Extract LOINC codes from clinical data
  - CHEMISTRY_FUZZY_TEST_MATCH - Fuzzy matching for clinical tests
  - CHEMISTRY_VENDOR_HARMONIZATION - Harmonize vendor-specific codes
  - CHEMISTRY_TO_PHENOTYPE_BRIDGE - Link chemistry to phenotypes
- Report Actions (reports/)
  - GENERATE_MAPPING_VISUALIZATIONS - Create visualization reports for mapping results
  - GENERATE_LLM_ANALYSIS - Generate AI-powered analysis reports using LLM providers
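To give a feel for what an entity-specific normalization step such as PROTEIN_NORMALIZE_ACCESSIONS might involve, the sketch below cleans UniProt accessions. The regular expression is the accession pattern published in the UniProt help pages; the `normalize_accession` helper and its handling of prefixes, isoform suffixes, and version suffixes are assumptions, not the action's actual rules.

```python
import re
from typing import Optional

# UniProt accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def normalize_accession(raw: str) -> Optional[str]:
    """Return a bare UniProt accession, or None if the value does not look like one."""
    value = raw.strip().upper()
    # Drop a database prefix such as "UniProtKB:" or "uniprot:".
    if ":" in value:
        value = value.split(":", 1)[1]
    # Drop an isoform suffix ("P12345-2") or a sequence version ("P12345.3").
    value = re.split(r"[-.]", value, maxsplit=1)[0]
    return value if UNIPROT_ACCESSION.fullmatch(value) else None


for candidate in ["UniProtKB:p12345-2", "Q9Y6K9.1", "A0A024R161", "ENSG00000139618"]:
    print(candidate, "->", normalize_accession(candidate))
```

Returning `None` for values that fail the pattern lets a caller separate unmappable identifiers from valid ones instead of passing them downstream.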
Benefits
Modularity: Each action is self-contained and independently testable
Reusability: Actions work in any strategy combination
Type Safety: Runtime validation of parameters and results with Pydantic models, backed by static type hints
Extensibility: Simple to add new action types without modifying core
Discoverability: Entity-based organization improves navigation
Error Handling: Comprehensive validation and error reporting
- Infrastructure Actions (io/ and utils/)
  - SYNC_TO_GOOGLE_DRIVE_V2 - Upload results to Google Drive with chunked transfer
  - PARSE_COMPOSITE_IDENTIFIERS - Parse complex identifier formats from compound fields
  - CUSTOM_TRANSFORM - Apply custom Python expressions to transform data columns
Verification Sources
Last verified: 2025-01-18
This documentation was verified against the following project resources:
/biomapper/src/actions/registry.py (Global ACTION_REGISTRY dictionary with @register_action decorator)
/biomapper/src/actions/typed_base.py (TypedStrategyAction base class with execute_typed method)
/biomapper/src/actions/load_dataset_identifiers.py (LOAD_DATASET_IDENTIFIERS action implementation)
/biomapper/src/actions/merge_datasets.py (MERGE_DATASETS action with deduplication logic)
/biomapper/src/actions/semantic_metabolite_match.py (SEMANTIC_METABOLITE_MATCH AI-powered matching)
/biomapper/src/actions/reports/generate_mapping_visualizations.py (GENERATE_MAPPING_VISUALIZATIONS action)
/biomapper/src/actions/reports/generate_llm_analysis.py (GENERATE_LLM_ANALYSIS action)
/biomapper/src/actions/utils/data_processing/filter_dataset.py (FILTER_DATASET action implementation)
/biomapper/src/actions/utils/data_processing/custom_transform_expression.py (CUSTOM_TRANSFORM and CUSTOM_TRANSFORM_EXPRESSION actions)
/biomapper/src/actions/io/sync_to_google_drive_v2.py (SYNC_TO_GOOGLE_DRIVE_V2 implementation)
/biomapper/CLAUDE.md (2025 standardizations and TDD development patterns)