BioMapper Documentation

BioMapper is a general-purpose plugin- and strategy-based orchestration framework, with its first application in biological data harmonization. Architecturally, it blends elements of workflow engines (Nextflow, Snakemake, Kedro, Dagster) with a lightweight service-oriented API and a plugin registry backed by a unified UniversalContext. Its standout differentiator is an AI-native developer experience: CLAUDE.md, .claude/ scaffolding, custom slash commands, and the BioSherpa guide. This potentially makes it the first open-source bioinformatics orchestration platform with built-in LLM-assisted contributor workflows.

The result is a platform that is modular, extensible, and uniquely AI-augmented, well-positioned for long-term ecosystem growth. Built on a self-registering action system and YAML-based workflow definitions, it features a modern src-layout architecture with comprehensive test coverage and 2025 standardizations for production reliability.

🎯 Key Features

  • Self-registering action system - Actions automatically register via decorators

  • Type-safe parameters - Pydantic models provide validation and IDE support

  • YAML workflow definition - Declarative strategies without coding

  • Real-time progress tracking - SSE events for long-running jobs

  • Extensible architecture - Easy to add new actions and entity types

  • AI-ready design - Built for integration with Claude Code and LLM assistance

πŸš€ Quick Start

# Install with Poetry
poetry install --with dev,docs,api
poetry shell

# Start the API server
cd biomapper-api && poetry run uvicorn app.main:app --reload --port 8000

# Or use the CLI (from root directory)
poetry run biomapper --help
poetry run biomapper health
# Python client usage
from src.client.client_v2 import BiomapperClient

client = BiomapperClient(base_url="http://localhost:8000")
result = client.run("test_metabolite_simple", parameters={
    "input_file": "/data/metabolites.csv",
    "output_dir": "/tmp/results"
})
print(f"Success: {result.success}")  # StrategyResult object

πŸ—οΈ Architecture

BioMapper follows a modern microservices architecture with clear separation of concerns:

Core Design:

  • YAML Strategies - Declarative configs defining pipelines of actions

  • Action Registry - Self-registering via decorators; plug-and-play extensibility

  • UniversalContext - Normalizes state access across heterogeneous action types

  • Pydantic Models (v2) - Typed parameter models per action category

  • Progressive Mapping - Iterative enrichment stages (65% β†’ 80% coverage)

Comparison to Known Patterns:

  • Similar to: Nextflow & Snakemake (declarative pipelines), Kedro (typed configs + reproducibility), Dagster (observability and orchestration)

  • Different from: Heavy orchestrators (Airflow, Beam) β€” BioMapper is lighter, service/API-first, domain-agnostic, and tailored for interactive workflows

  • Unique: Combines API service with strategy-based pipeline engine; domain-specific operations first (bio), but extensible beyond

Three-Layer Design:

  1. Client Layer - Python client library (src.client.client_v2) for programmatic access

  2. API Layer - FastAPI service with SQLite job persistence and SSE progress tracking

  3. Core Layer - Self-registering actions with strategy execution engine

The system uses a registry pattern where actions self-register via @register_action decorators, a strategy pattern for YAML-based workflow configuration, and a pipeline pattern for data flow through shared execution context. Actions are organized by biological entity (proteins, metabolites, chemistry) and automatically discovered at runtime.

Actions Reference

πŸ€– AI Integration

BioMapper features an AI-native developer experience that sets it apart from traditional orchestration frameworks:

Current AI Features:

  • CLAUDE.md - Project β€œconstitution” providing role-defining guidance for AI agents

  • .claude/ folder - Structured agent configs and scaffolding

  • BiOMapper Framework Triad - Three automatic isolation frameworks for safe development

  • Hook System - Automatic TDD enforcement and validation

  • Type-safe actions - Enable better code completion and error detection

  • Self-documenting - Pydantic models include descriptions

BiOMapper Framework Triad:

The system includes three complementary frameworks that automatically activate based on natural language:

  • πŸ”’ Surgical: Fix internal action logic while preserving all external interfaces. Automatically activates when you describe counting, calculation, or statistics issues.

  • πŸ”„ Circuitous: Repair pipeline orchestration and parameter flow. Automatically activates when you describe parameters not passing between steps or substitution failures.

  • πŸ”— Interstitial: Ensure 100% backward compatibility during interface evolution. Automatically activates when you describe compatibility issues or parameter changes breaking existing code.

Automatic Activation: You don’t need to know framework names - just describe the problem naturally and the appropriate framework activates automatically. See Framework Triggering Mechanics for details on how this works.

Development Discipline: Separate from the frameworks, a hook system enforces:

  • TDD Requirements - Tests must exist before implementation

  • Parameter Validation - All ${parameters.x} must resolve

  • Import Verification - All modules must load cleanly

  • Quality Gates - Blocks premature success declarations

Comparisons:

  • Copilot/Cody: Offer IDE assistance but don’t ship with per-project scaffolding

  • Claude-Orchestrator/Flow frameworks: Orchestrate multiple Claude agents, but not tied to strategy orchestration

  • BioMapper: First to embed LLM-native scaffolding inside an orchestration framework repo, making the AI β€œpart of the project contract”

πŸ“š Available Actions

BioMapper includes actions across multiple categories:

  • Data Operations: Load, merge, filter, export, transform

  • Protein Actions: UniProt extraction, accession normalization, multi-bridge resolution

  • Metabolite Actions: CTS bridge, Nightingale NMR matching, semantic matching, vector matching, API enrichment

  • Chemistry Actions: LOINC extraction, fuzzy test matching, vendor harmonization

  • Analysis & Reporting: Set overlap, mapping quality, comprehensive reports

  • Integration Actions: Google Drive sync with chunked transfer

βœ… 2025 Standardizations

Production-Ready Architecture Achieved:

  • Barebones Architecture: Client β†’ API β†’ MinimalStrategyService β†’ Self-Registering Actions

  • Comprehensive Test Suite: 1,217 passing tests with 79.69% coverage

  • Type Safety: Comprehensive Pydantic v2 migration

  • Standards Compliance: All 10 biomapper 2025 standardizations implemented

  • Biological Data Testing: Real-world protein, metabolite, and chemistry data patterns

Architectural Strengths:

  • Clean modularity (strategy vs action vs context)

  • Low barrier for extension (just register a new action)

  • Declarative configuration approachable to non-programmers

  • Pragmatic service orientation (FastAPI, Poetry, pytest, Pydantic)

Gaps & Opportunities:

  • No DAG/conditional execution in YAML

  • Limited provenance/lineage tracking

  • Potential performance bottlenecks at scale (10K–1M records)

  • Observability/logging not yet first-class

  • Single-agent AI model; opportunity for multi-agent orchestration

Indices and tables

β€”

β€”

Verification Sources

Last verified: 2025-08-22

This documentation was verified against the following project resources:

  • /biomapper/README.md (Project overview and architectural analysis with 1,217 passing tests)

  • /biomapper/CLAUDE.md (Commands, patterns, and 2025 standardizations)

  • /biomapper/src/actions/registry.py (Self-registering action system implementation)

  • /biomapper/src/client/client_v2.py (BiomapperClient class with correct import path and methods)

  • /biomapper/src/api/main.py (FastAPI server configuration and endpoint routing)

  • /biomapper/pyproject.toml (Project configuration, Python 3.11+ requirement, src-layout packages)