Creating New Actions

This guide walks through creating new actions for BioMapper using Test-Driven Development (TDD).

Overview

BioMapper actions are self-registering components that process biological data. Each action:

Inherits from TypedStrategyAction
Uses Pydantic models for parameters
Self-registers via the @register_action decorator
Modifies a shared execution context

Step 1: Write Tests First (TDD)

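Always start by writing tests that describe the behavior you want. They will fail until the parameters model (Step 2) and the action (Step 3) exist: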

# tests/unit/core/strategy_actions/test_my_action.py
import pytest
from pydantic import ValidationError  # raised when parameters fail validation

from biomapper.core.strategy_actions.my_action import (
    MyAction,
    MyActionParams
)

@pytest.mark.asyncio
async def test_my_action_basic():
    """Test basic functionality."""
    # Arrange
    params = MyActionParams(
        input_key="test_data",
        threshold=0.8,
        output_key="filtered"
    )

    context = {
        "datasets": {
            "test_data": [
                {"id": "1", "score": 0.9},
                {"id": "2", "score": 0.7},
                {"id": "3", "score": 0.85}
            ]
        }
    }

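    # Act (pass every argument that execute_typed declares in Step 3)
    action = MyAction()
    result = await action.execute_typed(
        current_identifiers=[],
        current_ontology_type="",
        params=params,
        source_endpoint=None,
        target_endpoint=None,
        context=context
    )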

    # Assert
    assert result.success
    assert "filtered" in context["datasets"]
    assert len(context["datasets"]["filtered"]) == 2

@pytest.mark.asyncio
async def test_my_action_validation():
    """Test parameter validation."""
    with pytest.raises(ValidationError):
        MyActionParams(
            input_key="",  # Empty key should fail
            threshold=1.5  # Out of range
        )

Step 2: Define Parameters

Create Pydantic models for type-safe parameters:

# biomapper/core/strategy_actions/my_action.py
from pydantic import BaseModel, Field, field_validator
from typing import Optional, List, Dict, Any

class MyActionParams(BaseModel):
    """Parameters for MyAction."""

    input_key: str = Field(
        ...,
        description="Key to input dataset in context"
    )

    threshold: float = Field(
        0.8,
        ge=0.0,
        le=1.0,
        description="Score threshold for filtering"
    )

    output_key: str = Field(
        "filtered_output",
        description="Key for output dataset"
    )

    include_metadata: bool = Field(
        True,
        description="Include metadata in output"
    )

    @field_validator("input_key")
    @classmethod
    def validate_input_key(cls, v: str) -> str:
        if not v or not v.strip():
            raise ValueError("Input key cannot be empty")
        return v.strip()

Step 3: Implement the Action

Subclass TypedStrategyAction in the same module as MyActionParams and implement execute_typed:

from biomapper.actions.typed_base import TypedStrategyAction
from biomapper.actions.registry import register_action
from biomapper.core.models.action_results import ActionResult
from biomapper.core.models.execution_context import StrategyExecutionContext
from typing import Dict, Any, List, Type
import logging

logger = logging.getLogger(__name__)

@register_action("MY_ACTION")
class MyAction(TypedStrategyAction[MyActionParams, ActionResult]):
    """
    Filter biological data based on score threshold.

    This action filters items from an input dataset based on a
    configurable score threshold and stores results in the context.

    Example:
        Input: [{"id": "A", "score": 0.9}, {"id": "B", "score": 0.6}]
        Threshold: 0.8
        Output: [{"id": "A", "score": 0.9}]
    """

    def get_params_model(self) -> Type[MyActionParams]:
        """Return the parameters model class."""
        return MyActionParams

    def get_result_model(self) -> Type[ActionResult]:
        """Return the result model class."""
        return ActionResult

    async def execute_typed(
        self,
        current_identifiers: List[str],
        current_ontology_type: str,
        params: MyActionParams,
        source_endpoint: Any,
        target_endpoint: Any,
        context: StrategyExecutionContext
    ) -> ActionResult:
        """Execute the filtering action."""
        try:
            # Get input data
            if params.input_key not in context.get("datasets", {}):
                return ActionResult(
                    success=False,
                    message=f"Input key '{params.input_key}' not found"
                )

            input_data = context["datasets"][params.input_key]
            logger.info(f"Processing {len(input_data)} items")

            # Apply filtering
            filtered = [
                item for item in input_data
                if item.get("score", 0) >= params.threshold
            ]

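            # Add metadata if requested. Each item is copied so that the
            # input dataset stored in the context is not mutated.
            if params.include_metadata:
                filtered = [
                    {
                        **item,
                        "_metadata": {
                            "filtered_by": "score",
                            "threshold": params.threshold
                        }
                    }
                    for item in filtered
                ]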

            # Store results
            if "datasets" not in context:
                context["datasets"] = {}
            context["datasets"][params.output_key] = filtered

            # Update statistics
            if "statistics" not in context:
                context["statistics"] = {}
            context["statistics"][params.output_key] = {
                "total_input": len(input_data),
                "total_output": len(filtered),
                # Guard against ZeroDivisionError on an empty input dataset
                "filter_rate": len(filtered) / len(input_data) if input_data else 0.0
            }

            logger.info(f"Filtered {len(input_data)} to {len(filtered)} items")

            return ActionResult(
                success=True,
                message=f"Filtered {len(filtered)} items with threshold {params.threshold}",
                data={
                    "input_count": len(input_data),
                    "output_count": len(filtered),
                    "removed_count": len(input_data) - len(filtered)
                }
            )

        except Exception as e:
            # logger.exception logs at ERROR level and includes the traceback
            logger.exception("Error in MyAction: %s", e)
            return ActionResult(
                success=False,
                message=f"Action failed: {str(e)}"
            )

Step 4: Choose Action Location

Place your action in the appropriate directory:

actions/
├── entities/           # Entity-specific actions
│   ├── proteins/      # Protein processing
│   ├── metabolites/   # Metabolite processing
│   └── chemistry/     # Clinical chemistry
├── utils/             # General utilities
│   └── data_processing/
├── io/                # Input/output actions
└── algorithms/        # Reusable algorithms

Step 5: Register the Action

The @register_action decorator registers your action under the given name (MY_ACTION in this guide) when the module that defines it is imported. No manual registration is needed.
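
The idea behind self-registration can be sketched with a plain dictionary and a decorator factory. This is a simplified stand-in, not BioMapper's actual registry: the ACTION_REGISTRY dict and the register_action function below are reimplemented locally for illustration, and the duplicate-name check is an assumption.

```python
from typing import Callable, Dict, Type

# Hypothetical stand-in for the real registry: a plain name -> class mapping.
ACTION_REGISTRY: Dict[str, Type] = {}


def register_action(name: str) -> Callable[[Type], Type]:
    """Return a class decorator that stores the decorated class under `name`."""
    def decorator(cls: Type) -> Type:
        # Assumed behavior: refuse to silently overwrite an existing entry.
        if name in ACTION_REGISTRY:
            raise ValueError(f"Action '{name}' is already registered")
        ACTION_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_action("MY_ACTION")
class MyAction:
    """Placeholder class used only to demonstrate registration."""


# The decorator ran when the class statement executed, so the lookup succeeds.
action_cls = ACTION_REGISTRY["MY_ACTION"]
print(action_cls.__name__)  # MyAction
```

Because registration is a side effect of executing the class statement, an action whose module is never imported will not appear in the registry.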

Step 6: Use in YAML Strategy

name: filter_example
description: Example using custom filter action

parameters:
  input_file: "/data/scores.csv"
  score_threshold: 0.75

steps:
  - name: load_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "${parameters.input_file}"
        output_key: "raw_data"

  - name: filter_high_scores
    action:
      type: MY_ACTION
      params:
        input_key: "raw_data"
        threshold: "${parameters.score_threshold}"
        output_key: "high_scores"
        include_metadata: true

  - name: export_results
    action:
      type: EXPORT_DATASET_V2
      params:
        input_key: "high_scores"
        output_file: "/results/filtered.csv"
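
The strategy above refers to its own parameters with ${parameters.name} placeholders. How BioMapper resolves them is not shown in this guide; the following is only a sketch of one way such substitution could work, with a hypothetical resolve function and regular expression. It may help when reasoning about what value an action's params model finally receives.

```python
import re
from typing import Any, Dict

# Hypothetical pattern for placeholders such as ${parameters.score_threshold}.
_PLACEHOLDER = re.compile(r"\$\{parameters\.(\w+)\}")


def resolve(value: Any, parameters: Dict[str, Any]) -> Any:
    """Recursively replace placeholders in nested dicts, lists, and strings."""
    if isinstance(value, dict):
        return {key: resolve(item, parameters) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(item, parameters) for item in value]
    if not isinstance(value, str):
        return value
    whole = _PLACEHOLDER.fullmatch(value)
    if whole:
        # The string is exactly one placeholder: keep the parameter's own type.
        return parameters[whole.group(1)]
    # Placeholder embedded in a longer string: substitute its text form.
    return _PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), value)


step_params = {"input_key": "raw_data", "threshold": "${parameters.score_threshold}"}
resolved = resolve(step_params, {"score_threshold": 0.75})
print(resolved)  # {'input_key': 'raw_data', 'threshold': 0.75}
```

In this sketch an unknown parameter name raises KeyError, which surfaces typos in the strategy file early.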

Best Practices

1. Always Use TDD

Write tests first
Test edge cases
Test error conditions
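
Edge-case tests are easiest to write against a small pure function. The sketch below uses a hypothetical filter_by_score helper that applies the same comparison as the filter in Step 3; it is not part of BioMapper. The tests run under pytest or as plain function calls.

```python
from typing import Any, Dict, List


def filter_by_score(items: List[Dict[str, Any]], threshold: float) -> List[Dict[str, Any]]:
    """Hypothetical helper applying the same rule as the filter in Step 3."""
    return [item for item in items if item.get("score", 0) >= threshold]


def test_empty_input_returns_empty_list():
    assert filter_by_score([], 0.8) == []


def test_score_equal_to_threshold_is_kept():
    # The comparison is >=, so a score exactly at the threshold passes.
    item = {"id": "1", "score": 0.8}
    assert filter_by_score([item], 0.8) == [item]


def test_item_without_score_is_dropped():
    # A missing score defaults to 0, which is below any positive threshold.
    assert filter_by_score([{"id": "2"}], 0.8) == []


def test_all_items_below_threshold():
    items = [{"id": "3", "score": 0.1}, {"id": "4", "score": 0.2}]
    assert filter_by_score(items, 0.8) == []
```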

2. Parameter Validation

Use Pydantic Field constraints
Add custom validators for complex logic
Provide clear descriptions

3. Error Handling

Return ActionResult with success=False on errors
Log errors with context
Don’t let exceptions propagate out of execute_typed; catch them and return a failed ActionResult
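
The error-handling rules above can be sketched as follows. The Result dataclass is a hypothetical stand-in for ActionResult, and filter_scores is an illustrative function rather than a BioMapper API. Expected failures get a specific message, and unexpected ones are logged with a traceback.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class Result:
    """Hypothetical stand-in for ActionResult."""
    success: bool
    message: str = ""
    data: Dict[str, Any] = field(default_factory=dict)


def filter_scores(context: Dict[str, Any], input_key: str, threshold: float) -> Result:
    """Filter a dataset and report every failure through the return value."""
    try:
        items: List[Dict[str, Any]] = context["datasets"][input_key]
        kept = [item for item in items if item.get("score", 0) >= threshold]
        return Result(True, f"Kept {len(kept)} items", {"output_count": len(kept)})
    except KeyError as exc:
        # Expected failure: name the missing key so the strategy can be fixed.
        logger.error("Missing key %s while filtering '%s'", exc, input_key)
        return Result(False, f"Missing key: {exc}")
    except Exception as exc:
        # Unexpected failure: log the traceback, but still return a result.
        logger.exception("Unexpected error while filtering '%s'", input_key)
        return Result(False, f"Action failed: {exc}")


ok = filter_scores({"datasets": {"a": [{"score": 0.9}, {"score": 0.1}]}}, "a", 0.5)
missing = filter_scores({"datasets": {}}, "a", 0.5)
print(ok.success, missing.success)  # True False
```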

4. Documentation

Add docstrings with examples
Document parameters clearly
Include usage in docstring

5. Performance

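Process large datasets in chunks rather than all at once
Use efficient data structures, such as a set or dict for repeated lookups
Consider memory usage; avoid keeping several full copies of a dataset in the context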

6. Testing Checklist

✅ Unit tests pass
✅ Parameter validation tested
✅ Error cases handled
✅ Integration with context tested
✅ Performance acceptable

Common Patterns

Reading from Context:

# Safe context access
datasets = context.get("datasets", {})
input_data = datasets.get(params.input_key, [])

Writing to Context:

# Ensure datasets exists
if "datasets" not in context:
    context["datasets"] = {}
context["datasets"][params.output_key] = result
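
The read and write patterns above can be wrapped in two small helpers. This is a minimal sketch that assumes the context behaves like a plain dict; read_dataset and write_dataset are hypothetical names, and the real StrategyExecutionContext may expose a different interface.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def read_dataset(context: Dict[str, Any], key: str) -> List[Row]:
    """Return the dataset stored under `key`, or an empty list if absent."""
    return context.get("datasets", {}).get(key, [])


def write_dataset(context: Dict[str, Any], key: str, rows: List[Row]) -> None:
    """Store `rows` under `key`, creating the "datasets" mapping if needed."""
    context.setdefault("datasets", {})[key] = rows


context: Dict[str, Any] = {}
print(read_dataset(context, "missing"))   # []
write_dataset(context, "filtered", [{"id": "1"}])
print(read_dataset(context, "filtered"))  # [{'id': '1'}]
```

Returning an empty list hides a missing key. When a missing input should count as an error, check for the key explicitly, as the action in Step 3 does.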

Updating Statistics:

# Track metrics
if "statistics" not in context:
    context["statistics"] = {}
context["statistics"][self.__class__.__name__] = {
    "processed": len(data),
    "runtime": elapsed_time
}
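
The snippet above leaves elapsed_time undefined. One way to obtain it is to wrap the work in time.perf_counter calls, as in this sketch. The run_with_statistics function and its parameters are hypothetical, and the context is assumed to be a plain dict.

```python
import time
from typing import Any, Callable, Dict, List


def run_with_statistics(
    context: Dict[str, Any],
    name: str,
    data: List[Any],
    process: Callable[[Any], Any],
) -> List[Any]:
    """Apply `process` to each item and record count and runtime under `name`."""
    start = time.perf_counter()
    results = [process(item) for item in data]
    elapsed_time = time.perf_counter() - start  # seconds, as a float

    context.setdefault("statistics", {})[name] = {
        "processed": len(results),
        "runtime": elapsed_time,
    }
    return results


context: Dict[str, Any] = {}
doubled = run_with_statistics(context, "DoubleAction", [1, 2, 3], lambda x: x * 2)
print(doubled)                                             # [2, 4, 6]
print(context["statistics"]["DoubleAction"]["processed"])  # 3
```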

Chunked Processing:

from biomapper.core.utils import chunk_list

CHUNK_SIZE = 10000
results = []

for chunk in chunk_list(input_data, CHUNK_SIZE):
    # process_chunk is a placeholder for your own per-chunk logic
    chunk_result = process_chunk(chunk)
    results.extend(chunk_result)
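
For readers without the source at hand, a helper like chunk_list could look like the sketch below. This is a stand-in written for illustration; the signature and behavior of the real biomapper.core.utils.chunk_list are assumed, not confirmed.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_list(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


chunks = list(chunk_list(list(range(7)), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the helper is a generator, only one chunk is materialized at a time, although the results list in the pattern above still grows to the full output size.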

Debugging Tips

Enable Debug Logging:

import logging
logging.basicConfig(level=logging.DEBUG)

Test Locally First:

poetry run pytest tests/unit/core/strategy_actions/test_my_action.py -xvs

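Use Print Debugging in Tests (pytest shows print output from passing tests only when capture is disabled with -s, as in the command above):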

print(f"Context after action: {context}")
assert result.success

Check Action Registration:

from biomapper.actions.registry import ACTION_REGISTRY
print(ACTION_REGISTRY.keys())

Need Help?

Check existing actions in biomapper/actions/
Review tests in tests/unit/actions/
See CLAUDE.md for AI assistance with development

—

Verification Sources

Last verified: 2025-08-17

This documentation was verified against the following project resources:

/biomapper/src/actions/typed_base.py (TypedStrategyAction base class with execute_typed signature requiring StrategyExecutionContext)
/biomapper/src/actions/registry.py (self-registering action system with @register_action decorator)
/biomapper/src/core/models/action_results.py (ActionResult model for return values)
/biomapper/src/core/models/execution_context.py (StrategyExecutionContext for typed context)
/biomapper/src/actions/ (current action directory structure under src/)
/biomapper/CLAUDE.md (action organization and development patterns)
/biomapper/src/configs/strategies/ (YAML strategy examples and usage)