Creating New Actions ==================== This guide walks through creating new actions for BioMapper using Test-Driven Development (TDD). Overview -------- BioMapper actions are self-registering components that process biological data. Each action: * Inherits from ``TypedStrategyAction`` * Uses Pydantic models for parameters * Self-registers via ``@register_action`` decorator * Modifies a shared execution context Step 1: Write Tests First (TDD) -------------------------------- Always start by writing tests: .. code-block:: python # tests/unit/core/strategy_actions/test_my_action.py import pytest from biomapper.core.strategy_actions.my_action import ( MyAction, MyActionParams ) @pytest.mark.asyncio async def test_my_action_basic(): """Test basic functionality.""" # Arrange params = MyActionParams( input_key="test_data", threshold=0.8, output_key="filtered" ) context = { "datasets": { "test_data": [ {"id": "1", "score": 0.9}, {"id": "2", "score": 0.7}, {"id": "3", "score": 0.85} ] } } # Act action = MyAction() result = await action.execute_typed(params, context) # Assert assert result.success assert "filtered" in context["datasets"] assert len(context["datasets"]["filtered"]) == 2 @pytest.mark.asyncio async def test_my_action_validation(): """Test parameter validation.""" with pytest.raises(ValidationError): MyActionParams( input_key="", # Empty key should fail threshold=1.5 # Out of range ) Step 2: Define Parameters ------------------------- Create Pydantic models for type-safe parameters: .. code-block:: python # biomapper/core/strategy_actions/my_action.py from pydantic import BaseModel, Field, field_validator from typing import Optional, List, Dict, Any class MyActionParams(BaseModel): """Parameters for MyAction.""" input_key: str = Field( ..., description="Key to input dataset in context" ) threshold: float = Field( 0.8, ge=0.0, le=1.0, description="Score threshold for filtering" ) output_key: str = Field( "filtered_output", description="Key for output dataset" ) include_metadata: bool = Field( True, description="Include metadata in output" ) @field_validator("input_key") @classmethod def validate_input_key(cls, v: str) -> str: if not v or not v.strip(): raise ValueError("Input key cannot be empty") return v.strip() Step 3: Implement the Action ----------------------------- .. code-block:: python from biomapper.actions.typed_base import TypedStrategyAction from biomapper.actions.registry import register_action from biomapper.core.models.action_results import ActionResult from biomapper.core.models.execution_context import StrategyExecutionContext from typing import Dict, Any, List, Type import logging logger = logging.getLogger(__name__) @register_action("MY_ACTION") class MyAction(TypedStrategyAction[MyActionParams, ActionResult]): """ Filter biological data based on score threshold. This action filters items from an input dataset based on a configurable score threshold and stores results in the context. Example: Input: [{"id": "A", "score": 0.9}, {"id": "B", "score": 0.6}] Threshold: 0.8 Output: [{"id": "A", "score": 0.9}] """ def get_params_model(self) -> Type[MyActionParams]: """Return the parameters model class.""" return MyActionParams def get_result_model(self) -> Type[ActionResult]: """Return the result model class.""" return ActionResult async def execute_typed( self, current_identifiers: List[str], current_ontology_type: str, params: MyActionParams, source_endpoint: Any, target_endpoint: Any, context: StrategyExecutionContext ) -> ActionResult: """Execute the filtering action.""" try: # Get input data if params.input_key not in context.get("datasets", {}): return ActionResult( success=False, message=f"Input key '{params.input_key}' not found" ) input_data = context["datasets"][params.input_key] logger.info(f"Processing {len(input_data)} items") # Apply filtering filtered = [ item for item in input_data if item.get("score", 0) >= params.threshold ] # Add metadata if requested if params.include_metadata: for item in filtered: item["_metadata"] = { "filtered_by": "score", "threshold": params.threshold } # Store results if "datasets" not in context: context["datasets"] = {} context["datasets"][params.output_key] = filtered # Update statistics if "statistics" not in context: context["statistics"] = {} context["statistics"][params.output_key] = { "total_input": len(input_data), "total_output": len(filtered), "filter_rate": len(filtered) / len(input_data) } logger.info(f"Filtered {len(input_data)} to {len(filtered)} items") return ActionResult( success=True, message=f"Filtered {len(filtered)} items with threshold {params.threshold}", data={ "input_count": len(input_data), "output_count": len(filtered), "removed_count": len(input_data) - len(filtered) } ) except Exception as e: logger.error(f"Error in MyAction: {str(e)}") return ActionResult( success=False, message=f"Action failed: {str(e)}" ) Step 4: Choose Action Location ------------------------------- Place your action in the appropriate directory: .. code-block:: text actions/ ├── entities/ # Entity-specific actions │ ├── proteins/ # Protein processing │ ├── metabolites/ # Metabolite processing │ └── chemistry/ # Clinical chemistry ├── utils/ # General utilities │ └── data_processing/ ├── io/ # Input/output actions └── algorithms/ # Reusable algorithms Step 5: Register the Action ---------------------------- The ``@register_action`` decorator automatically registers your action. No manual registration needed! Step 6: Use in YAML Strategy ----------------------------- .. code-block:: yaml name: filter_example description: Example using custom filter action parameters: input_file: "/data/scores.csv" score_threshold: 0.75 steps: - name: load_data action: type: LOAD_DATASET_IDENTIFIERS params: file_path: "${parameters.input_file}" output_key: "raw_data" - name: filter_high_scores action: type: MY_ACTION params: input_key: "raw_data" threshold: "${parameters.score_threshold}" output_key: "high_scores" include_metadata: true - name: export_results action: type: EXPORT_DATASET_V2 params: input_key: "high_scores" output_file: "/results/filtered.csv" Best Practices -------------- **1. Always Use TDD** - Write tests first - Test edge cases - Test error conditions **2. Parameter Validation** - Use Pydantic Field constraints - Add custom validators for complex logic - Provide clear descriptions **3. Error Handling** - Return ActionResult with success=False on errors - Log errors with context - Don't raise exceptions **4. Documentation** - Add docstrings with examples - Document parameters clearly - Include usage in docstring **5. Performance** - Process large datasets in chunks - Use efficient data structures - Consider memory usage **6. Testing Checklist** - ✅ Unit tests pass - ✅ Parameter validation tested - ✅ Error cases handled - ✅ Integration with context tested - ✅ Performance acceptable Common Patterns --------------- **Reading from Context:** .. code-block:: python # Safe context access datasets = context.get("datasets", {}) input_data = datasets.get(params.input_key, []) **Writing to Context:** .. code-block:: python # Ensure datasets exists if "datasets" not in context: context["datasets"] = {} context["datasets"][params.output_key] = result **Updating Statistics:** .. code-block:: python # Track metrics if "statistics" not in context: context["statistics"] = {} context["statistics"][self.__class__.__name__] = { "processed": len(data), "runtime": elapsed_time } **Chunked Processing:** .. code-block:: python from biomapper.core.utils import chunk_list CHUNK_SIZE = 10000 results = [] for chunk in chunk_list(input_data, CHUNK_SIZE): chunk_result = process_chunk(chunk) results.extend(chunk_result) Debugging Tips -------------- 1. **Enable Debug Logging:** .. code-block:: python import logging logging.basicConfig(level=logging.DEBUG) 2. **Test Locally First:** .. code-block:: bash poetry run pytest tests/unit/core/strategy_actions/test_my_action.py -xvs 3. **Use Print Debugging in Tests:** .. code-block:: python print(f"Context after action: {context}") assert result.success 4. **Check Action Registration:** .. code-block:: python from biomapper.actions.registry import ACTION_REGISTRY print(ACTION_REGISTRY.keys()) Need Help? ---------- * Check existing actions in ``biomapper/actions/`` * Review tests in ``tests/unit/actions/`` * See ``CLAUDE.md`` for AI assistance with development --- Verification Sources -------------------- *Last verified: 2025-08-17* This documentation was verified against the following project resources: - ``/biomapper/src/actions/typed_base.py`` (TypedStrategyAction base class with execute_typed signature requiring StrategyExecutionContext) - ``/biomapper/src/actions/registry.py`` (self-registering action system with @register_action decorator) - ``/biomapper/src/core/models/action_results.py`` (ActionResult model for return values) - ``/biomapper/src/core/models/execution_context.py`` (StrategyExecutionContext for typed context) - ``/biomapper/src/actions/`` (current action directory structure under src/) - ``/biomapper/CLAUDE.md`` (action organization and development patterns) - ``/biomapper/src/configs/strategies/`` (YAML strategy examples and usage)