API Reference
=============

BioMapper provides a REST API for executing biological data harmonization workflows. Request and response bodies are JSON; strategies are defined in YAML for readability and maintainability.

.. toctree::
   :maxdepth: 2
   :caption: API Documentation
   
   rest_endpoints
   strategy_execution
   client_reference

Quick Start
-----------

**Start the API Server:**

.. code-block:: bash

   # Run from the repository root (adjust the path to your checkout)
   cd /home/ubuntu/biomapper
   poetry run uvicorn src.biomapper.api.main:app --reload --port 8000

**Access API Documentation:**

* Interactive docs: http://localhost:8000/api/docs
* OpenAPI schema: http://localhost:8000/api/openapi.json
* Root endpoint: http://localhost:8000/

Core Endpoints
--------------

Health Check
~~~~~~~~~~~~

.. code-block:: bash

   GET /api/health
   
   # Response
   {
     "status": "healthy",
     "version": "0.5.2",
     "services": {
       "database": "connected",
       "mapper_service": "initialized",
       "resource_manager": "running"
     }
   }
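
A client may want to check this response programmatically before submitting work. The following is a minimal sketch using only the standard library; the helper names ``fetch_health``, ``is_healthy``, and ``degraded_services`` are illustrative and not part of BioMapper, and the payload shape is assumed to match the example above.

.. code-block:: python

   import json
   from urllib.request import urlopen


   def fetch_health(base_url="http://localhost:8000"):
       """GET /api/health and return the decoded JSON body (needs a running server)."""
       with urlopen(f"{base_url}/api/health", timeout=5) as response:
           return json.load(response)


   def is_healthy(payload):
       """Return True when the top-level status field reports 'healthy'."""
       return payload.get("status") == "healthy"


   def degraded_services(payload, ok_states=("connected", "initialized", "running")):
       """List services whose state is not one of the states shown in the example.

       The set of acceptable states is an assumption taken from the sample response.
       """
       services = payload.get("services", {})
       return [name for name, state in services.items() if state not in ok_states]

Calling ``is_healthy(fetch_health())`` against a running server would give a simple readiness check for scripts or CI jobs.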

Execute Strategy
~~~~~~~~~~~~~~~~

**How it works:**

- Request and response bodies are JSON, as is standard for REST APIs.
- Strategies are defined in YAML, either stored as files or embedded in the JSON request.
- A request can reference a pre-defined YAML strategy by name or supply the strategy content inline.

.. code-block:: bash

   POST /api/strategies/v2/execute
   Content-Type: application/json
   
   # Option 1: Execute pre-defined YAML strategy by name
   {
     "strategy": "protein_harmonization",  # References a .yaml file
     "parameters": {
       "input_file": "/data/proteins.csv",
       "output_dir": "/results"
     }
   }
   
   # Option 2: Submit strategy content directly (as a dict in JSON)
   {
     "strategy": {
       "name": "custom_workflow",
       "steps": [
         {
           "action": {
             "type": "LOAD_DATASET_IDENTIFIERS",
             "params": {"file_path": "/data/input.csv"}
           }
         }
       ]
     },
     "parameters": {}
   }
   
   # Response
   {
     "job_id": "job_123",
     "status": "running",
     "created_at": "2024-08-13T10:00:00Z"
   }
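
The two request shapes above can be produced from Python without the client library. This is a sketch under stated assumptions: ``build_execute_request`` and ``submit`` are hypothetical helpers, not BioMapper functions. Note that the ``#`` comments in the example above are annotations only; a real JSON body cannot contain comments.

.. code-block:: python

   import json
   from urllib.request import Request, urlopen


   def build_execute_request(strategy, parameters=None, base_url="http://localhost:8000"):
       """Build a POST request for /api/strategies/v2/execute.

       `strategy` is either a strategy name (str) or an inline definition (dict).
       """
       if not isinstance(strategy, (str, dict)):
           raise TypeError("strategy must be a name or a dict definition")
       body = {"strategy": strategy, "parameters": parameters or {}}
       return Request(
           f"{base_url}/api/strategies/v2/execute",
           data=json.dumps(body).encode("utf-8"),
           headers={"Content-Type": "application/json"},
           method="POST",
       )


   def submit(request):
       """Send the request and return the decoded JSON response (needs a running server)."""
       with urlopen(request, timeout=30) as response:
           return json.load(response)

For example, ``submit(build_execute_request("protein_harmonization", {"input_file": "/data/proteins.csv"}))`` would correspond to Option 1.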

Get Job Status
~~~~~~~~~~~~~~

.. code-block:: bash

   GET /api/jobs/{job_id}/status
   
   # Response
   {
     "job_id": "550e8400-e29b-41d4-a716-446655440000",
     "status": "completed",
     "progress": 100,
     "current_step": "export_results",
     "total_steps": 5,
     "started_at": "2024-08-13T10:00:00Z",
     "completed_at": "2024-08-13T10:01:00Z"
   }
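
Because execution is asynchronous, a caller typically polls this endpoint until the job finishes. The sketch below shows one way to do that; ``wait_for_job`` is a hypothetical helper, and the ``"failed"`` state name is an assumption (only ``"running"`` and ``"completed"`` appear in the examples here).

.. code-block:: python

   import time

   # "completed" appears in the example response; "failed" is an assumed state name.
   TERMINAL_STATES = {"completed", "failed"}


   def wait_for_job(fetch_status, job_id, interval=2.0, timeout=600.0, sleep=time.sleep):
       """Poll until the job reaches a terminal state or the timeout expires.

       `fetch_status(job_id)` must return the decoded status payload, for
       example by calling GET /api/jobs/{job_id}/status.
       """
       deadline = time.monotonic() + timeout
       while True:
           status = fetch_status(job_id)
           if status.get("status") in TERMINAL_STATES:
               return status
           if time.monotonic() >= deadline:
               raise TimeoutError(f"job {job_id} still {status.get('status')!r}")
           sleep(interval)

Passing the fetch function in as an argument keeps the polling logic independent of the HTTP library and makes it easy to exercise without a server.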

Python Client Usage
-------------------

Synchronous
~~~~~~~~~~~

.. code-block:: python

   from biomapper.client import BiomapperClient
   
   client = BiomapperClient(base_url="http://localhost:8000")
   
   # Execute strategy
   result = client.run("protein_harmonization", parameters={
       "input_file": "/data/proteins.csv",
       "output_dir": "/results"
   })
   
   print(f"Success: {result.success}")
   print(f"Records processed: {result.results['statistics']['total_records']}")

Asynchronous
~~~~~~~~~~~~

.. code-block:: python

   import asyncio
   from biomapper.client import BiomapperClient
   
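   async def run_strategy():
       last_event = None
       async with BiomapperClient() as client:
           # Stream progress events as the strategy executes
           async for event in client.execute_with_progress(
               "protein_harmonization",
               parameters={"input_file": "/data/proteins.csv"}
           ):
               print(f"Progress: {event['progress']}%")
               last_event = event

           # The final event carries the result; None if no events arrived
           return last_event["result"] if last_event else None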
   
   result = asyncio.run(run_strategy())

Authentication
--------------

The BioMapper API does not currently require authentication for local deployments. Optional API key authentication is available through the ``BIOMAPPER_API_KEY`` environment variable. For production deployments, consider implementing:

* API key authentication (partially supported)
* OAuth2 with JWT tokens (future)
* Basic authentication with HTTPS (future)
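
A client that wants to send the configured key could build its headers as sketched below. The ``BIOMAPPER_API_KEY`` variable comes from this page, but the header name ``X-API-Key`` is an assumption, as is the ``auth_headers`` helper itself; check the server configuration for the header it actually expects.

.. code-block:: python

   import os


   def auth_headers(header_name="X-API-Key", env=os.environ):
       """Return request headers carrying the API key, or {} when none is set.

       NOTE: the header name is an assumption, not confirmed BioMapper behavior.
       """
       api_key = env.get("BIOMAPPER_API_KEY")
       return {header_name: api_key} if api_key else {}

Returning an empty dict when no key is set lets the same client code run unchanged against local deployments that need no authentication.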

Rate Limiting
-------------

Default rate limits:

* 100 requests per minute per IP
* 10 concurrent strategy executions
* 1 GB maximum file upload size
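
Clients that submit many strategies at once can stay under the concurrency limit by bounding their own in-flight work. The following is a minimal sketch using ``asyncio.Semaphore``; ``run_bounded`` is a hypothetical helper, and each job is assumed to be a zero-argument callable that returns a coroutine.

.. code-block:: python

   import asyncio


   async def run_bounded(jobs, limit=10):
       """Run coroutine factories with at most `limit` in flight at once.

       Results are returned in the same order as `jobs`.
       """
       semaphore = asyncio.Semaphore(limit)

       async def guarded(job):
           # Wait for a free slot before starting the job
           async with semaphore:
               return await job()

       return await asyncio.gather(*(guarded(job) for job in jobs))

This only limits one client process; several clients sharing a server would still need the server-side limits to protect it.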

Error Handling
--------------

The API returns standard HTTP status codes:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Status Code
     - Description
   * - 200
     - Success
   * - 201
     - Created (job submitted)
   * - 400
     - Bad request (invalid parameters)
   * - 404
     - Resource not found
   * - 422
     - Validation error
   * - 500
     - Internal server error

Error responses use the following format:

.. code-block:: json

   {
     "detail": "Validation error in strategy parameters",
     "errors": [
       {
         "field": "input_file",
         "message": "File not found: /data/missing.csv"
       }
     ]
   }
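
On the client side, an error body in this format can be turned into a Python exception. This is a sketch only: ``BiomapperAPIError`` and ``raise_for_error`` are hypothetical names, not part of the BioMapper client, and the payload is assumed to follow the format above.

.. code-block:: python

   class BiomapperAPIError(Exception):
       """Hypothetical exception wrapping an error payload from the API."""

       def __init__(self, status_code, payload):
           self.status_code = status_code
           self.detail = payload.get("detail", "Unknown error")
           # Map each reported field to its message for easy lookup
           self.field_errors = {
               item.get("field", "?"): item.get("message", "")
               for item in payload.get("errors", [])
           }
           super().__init__(f"HTTP {status_code}: {self.detail}")


   def raise_for_error(status_code, payload):
       """Raise BiomapperAPIError for 4xx/5xx responses; return payload otherwise."""
       if status_code >= 400:
           raise BiomapperAPIError(status_code, payload)
       return payload

Callers can then catch one exception type and inspect ``field_errors`` to report which parameters were rejected.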

Real-time Updates Support
-------------------------

Progress updates are available via Server-Sent Events (SSE) and WebSocket connections. The example below consumes the SSE stream:

.. code-block:: python

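   import json

   import requests

   job_id = "job_123"  # ID returned by the execute endpoint

   # SSE endpoint for real-time progress
   response = requests.get(
       f"http://localhost:8000/api/jobs/{job_id}/events",
       stream=True
   )

   for line in response.iter_lines():
       if not line:
           continue  # blank lines separate SSE frames
       text = line.decode("utf-8")
       # Standard SSE frames prefix the payload with "data:"; skip other fields
       if text.startswith("data:"):
           text = text[len("data:"):]
       elif text.startswith((":", "event:", "id:", "retry:")):
           continue
       event = json.loads(text)
       print(f"Progress: {event['progress']}%")
       print(f"Current step: {event.get('current_step', 'N/A')}")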

----

Verification Sources
~~~~~~~~~~~~~~~~~~~~
*Last verified: 2025-08-17*

This documentation was verified against the following project resources:

- ``/biomapper/src/biomapper/api/main.py`` (API initialization, routers, and startup events)
- ``/biomapper/src/biomapper/api/api/routes/strategies_v2_simple.py`` (V2 strategy execution endpoint implementation)
- ``/biomapper/src/biomapper/api/api/routes/jobs.py`` (Job management and persistence endpoints)
- ``/biomapper/src/biomapper/api/api/routes/health.py`` (Health check endpoint)
- ``/biomapper/pyproject.toml`` (API dependencies and version)
- ``/biomapper/src/biomapper/client/client_v2.py`` (BiomapperClient implementation)
- ``/biomapper/CLAUDE.md`` (Project conventions, commands, and architecture)