RampDB Integration

Overview

RampDB (RaMP-DB, the Relational database of Metabolomic Pathways) is a metabolite database that cross-references multiple metabolomics databases, including HMDB, KEGG, ChEBI, and PubChem. Biomapper integrates with RampDB through its REST API to improve metabolite identifier resolution.

The RampDB integration is primarily used in Stage 3 of the progressive metabolomics pipeline via the METABOLITE_RAMPDB_BRIDGE action, which is implemented as RAMPDB_BRIDGE in the current codebase.

Key Features

  • Cross-Database Mapping: Maps identifiers across 15+ metabolite databases

  • Real-Time API Access: Query RampDB service in real-time

  • Batch Processing: Optimized batch queries for better performance

  • Comprehensive Coverage: Access to pathway and reaction information

  • Standardized Outputs: Consistent identifier formats across databases

Supported Databases

Primary Metabolite Databases

| Database | Description |
|----------|-------------|
| HMDB | Human Metabolome Database - comprehensive human metabolite data |
| KEGG | Kyoto Encyclopedia of Genes and Genomes - metabolic pathways |
| ChEBI | Chemical Entities of Biological Interest - chemical compounds |
| PubChem | PubChem Compound Database - chemical information |
| BioCyc | Metabolic pathway database collection |
| LIPID MAPS | Lipidomics database and tools |

Secondary Databases

  • MetaCyc - Metabolic pathway database

  • Reactome - Pathway database

  • WikiPathways - Community pathway database

  • CAS Registry - Chemical abstracts service

  • InChI/InChIKey - International chemical identifier

Setup and Configuration

API Access Setup

  1. API Access Configuration

    The RampDB client uses the modern RaMP database API for metabolite matching:

    # RampDB integration is available without API key registration
    # The client uses async HTTP requests with rate limiting
    # No manual API key setup required
    
  2. Configure Rate Limiting (Optional)

    Set up environment variables for performance tuning:

    # Optional: Configure rate limiting (defaults are built-in)
    export RAMPDB_RATE_LIMIT="5"   # requests per second (default)
    export RAMPDB_TIMEOUT="30"     # seconds (default)
    export RAMPDB_BATCH_SIZE="50" # batch processing size
    
  3. Install Required Dependencies

    # Install all dependencies (includes aiohttp for RampDB client)
    poetry install --with dev,docs,api
    
    # No separate RampDB verification script needed
    # Integration testing handled through biomapper test suite
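
The environment variables from step 2 could be read in Python roughly as follows. This is a minimal sketch, not Biomapper's actual loader: `RampDBSettings` and `load_rampdb_settings` are hypothetical names, and only the three variable names and their example values (5, 30, 50) come from the text above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RampDBSettings:
    """Hypothetical container for the tuning values from step 2."""
    rate_limit: float  # requests per second
    timeout: float     # seconds
    batch_size: int    # names per batch


def load_rampdb_settings(env=None):
    """Read the RAMPDB_* variables, falling back to the values shown above."""
    env = os.environ if env is None else env
    settings = RampDBSettings(
        rate_limit=float(env.get("RAMPDB_RATE_LIMIT", "5")),
        timeout=float(env.get("RAMPDB_TIMEOUT", "30")),
        batch_size=int(env.get("RAMPDB_BATCH_SIZE", "50")),
    )
    if settings.rate_limit <= 0 or settings.timeout <= 0 or settings.batch_size <= 0:
        raise ValueError("RAMPDB_* values must be positive")
    return settings
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real environment.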
    

Environment Configuration

Create a configuration file for RampDB settings:

# config/rampdb_config.yaml
rampdb:
  api:
    base_url: "https://rampdb.nih.gov/api/v1/"
    timeout: 30
    max_retries: 3
    rate_limit: 60  # requests per minute (RAMPDB_RATE_LIMIT above is per second)

  databases:
    primary: ["hmdb", "kegg", "chebi", "pubchem"]
    secondary: ["biocyc", "lipidmaps", "metacyc"]

  batch_processing:
    default_batch_size: 50
    max_batch_size: 100
    batch_delay: 1.0  # seconds between batches
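
A configuration like this is worth sanity-checking before use. The sketch below assumes the YAML has already been parsed into a dict (for example with a YAML library's safe loader). The function name and the specific rules, such as requiring an https URL, are illustrative assumptions rather than checks Biomapper is known to perform.

```python
def validate_rampdb_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    rampdb = config.get("rampdb", {})
    api = rampdb.get("api", {})
    batch = rampdb.get("batch_processing", {})
    databases = rampdb.get("databases", {})

    # Assumed rule: the API should be reached over HTTPS.
    if not str(api.get("base_url", "")).startswith("https://"):
        problems.append("api.base_url should be an https:// URL")
    for key in ("timeout", "rate_limit"):
        if not api.get(key, 0) > 0:
            problems.append(f"api.{key} must be positive")
    if api.get("max_retries", 0) < 0:
        problems.append("api.max_retries must not be negative")

    # The default batch size has to fit under the configured maximum.
    default_size = batch.get("default_batch_size", 0)
    max_size = batch.get("max_batch_size", 0)
    if not 0 < default_size <= max_size:
        problems.append("default_batch_size must be between 1 and max_batch_size")

    # A database should not be listed in both tiers.
    overlap = set(databases.get("primary", [])) & set(databases.get("secondary", []))
    if overlap:
        problems.append(f"databases in both primary and secondary: {sorted(overlap)}")
    return problems
```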

API Usage Patterns

Basic Query

from actions.entities.metabolites.external.ramp_client_modern import RaMPClientModern, create_ramp_client

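# The client is async: run these snippets inside an async function,
# or in a REPL with top-level await (python -m asyncio).

# Initialize client with default configuration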
client = create_ramp_client()

# Single metabolite query
result = await client.search_metabolite_by_name("glucose")
print(f"Found {len(result)} matches")

# Access cross-references
for match in result:
    print(f"Common Name: {match.common_name}")
    print(f"Source ID: {match.source_id}")
    print(f"Database: {match.id_type}")
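
Once matches are available, a common next step is to collect the identifiers per source database. The sketch below uses a stand-in `Match` dataclass with only the three fields shown above (`common_name`, `source_id`, `id_type`); the real match type may carry more. The example records are made up and are not RampDB output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Match:
    """Stand-in for a client match; only these three fields appear above."""
    common_name: str
    source_id: str
    id_type: str


def group_ids_by_database(matches):
    """Map each id_type to the sorted, de-duplicated source IDs found for it."""
    grouped = defaultdict(set)
    for match in matches:
        grouped[match.id_type].add(match.source_id)
    return {id_type: sorted(ids) for id_type, ids in grouped.items()}


# Made-up example records, not real RampDB output
example = [
    Match("example compound", "ID-001", "hmdb"),
    Match("example compound", "ID-002", "kegg"),
    Match("example compound", "ID-001", "hmdb"),  # duplicate is collapsed
]
print(group_ids_by_database(example))
```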

Batch Query

# Batch query for multiple metabolites
metabolite_names = ["glucose", "fructose", "galactose"]

batch_results = await client.batch_metabolite_search(
    metabolite_names=metabolite_names,
    batch_size=25
)

# Process batch results
for metabolite, matches in batch_results.items():
    print(f"{metabolite}: {len(matches)} matches found")
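
The `batch_size` argument controls how many names go into one request group. How the client splits the list internally is not shown here; a generic way to do it, given as an assumption-level sketch with a hypothetical `chunked` helper, is:

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


names = ["glucose", "fructose", "galactose"]
for batch in chunked(names, 2):
    print(batch)
```

The last batch is simply shorter when the list length is not a multiple of `batch_size`.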

Advanced Queries

# Search by name, restricted to a specific analyte type
result = await client.search_metabolite_by_name(
    name="citric acid",
    analyte_type="metabolite"
)

# Get pathway information (if available)
pathway_info = await client.get_pathways_from_analytes(
    analytes=["HMDB0000094"]
)

Integration with Biomapper Actions

YAML Strategy Configuration

steps:
  - name: rampdb_metabolite_mapping
    action:
      type: RAMPDB_BRIDGE
      params:
        unmapped_key: unmatched_metabolites
        output_key: rampdb_matches
        final_unmapped_key: rampdb_unmapped
        identifier_column: metabolite_name
        batch_size: 40
        timeout: 45
        max_retries: 3

Python Client Usage

from src.client.client_v2 import BiomapperClient

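client = BiomapperClient(base_url="http://localhost:8000")

# metabolite_df is assumed to be a pandas DataFrame loaded elsewhere,
# with the names to map in its "compound_name" column.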

result = await client.run_action(
    action_type="RAMPDB_BRIDGE",
    params={
        "unmapped_key": "metabolites",
        "output_key": "rampdb_results",
        "final_unmapped_key": "still_unmapped",
        "identifier_column": "compound_name",
        "batch_size": 30
    },
    context={"datasets": {"metabolites": metabolite_df}}
)

Performance Optimization

Batch Size Optimization

Optimize batch sizes based on your use case:

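| Dataset Size | Recommended Batch Size | Expected Time | Memory Usage |
|--------------|------------------------|---------------|--------------|
| < 100 metabolites | 25 | < 30 seconds | Low |
| 100-1,000 metabolites | 50 | 1-5 minutes | Medium |
| 1,000-10,000 metabolites | 75 | 10-30 minutes | High |
| > 10,000 metabolites | 100 | 30+ minutes | Very High |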
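
The recommendations above can be encoded as a small helper. The function name is hypothetical, and because the table's ranges share their endpoints (1,000 and 10,000), this sketch assigns each shared endpoint to the smaller batch size; that choice is an assumption.

```python
def recommended_batch_size(n_metabolites):
    """Return the batch size recommended for a dataset of the given size."""
    if n_metabolites < 0:
        raise ValueError("n_metabolites must not be negative")
    if n_metabolites < 100:
        return 25
    if n_metabolites <= 1_000:
        return 50
    if n_metabolites <= 10_000:
        return 75
    return 100
```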

Rate Limiting Configuration

# Three alternative profiles; keep only one rate_limiting block per config file.

# Conservative rate limiting (high reliability)
rate_limiting:
  requests_per_minute: 30
  batch_delay: 2.0
  exponential_backoff: true

# Aggressive rate limiting (faster processing)
rate_limiting:
  requests_per_minute: 100
  batch_delay: 0.5
  exponential_backoff: false

# Balanced rate limiting (recommended)
rate_limiting:
  requests_per_minute: 60
  batch_delay: 1.0
  exponential_backoff: true
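
A `requests_per_minute` value translates into a minimum spacing between requests: 60 per minute means at least one second apart. The class below is a minimal sketch of that idea using asyncio. It is not the limiter built into the RampDB client, and `IntervalRateLimiter` is a hypothetical name.

```python
import asyncio
import time


class IntervalRateLimiter:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=asyncio.sleep):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call made yet

    async def acquire(self):
        """Wait until the next free slot, then reserve the one after it."""
        now = self._clock()
        slot = now if self._next_allowed is None else max(now, self._next_allowed)
        # Reserve before sleeping so concurrent callers get distinct slots.
        self._next_allowed = slot + self.min_interval
        if slot > now:
            await self._sleep(slot - now)
```

Call `await limiter.acquire()` immediately before each request. The `clock` and `sleep` parameters exist only so the timing can be tested without real waiting.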

Caching Strategy

# Enable caching for repeated queries
client_config = {
    "cache_enabled": True,
    "cache_ttl": 3600,  # 1 hour
    "cache_backend": "redis",  # or "memory"
    "cache_key_prefix": "rampdb_"
}

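# RampDBClient stands in for the project's client class here; the snippets
# above obtain a client from create_ramp_client(), so check
# ramp_client_modern.py for the caching options it actually supports.
client = RampDBClient(config=client_config)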
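
To make the `cache_ttl` and `cache_key_prefix` settings concrete, here is a minimal sketch of what a "memory" backend could look like. `TTLCache` is a hypothetical class written for illustration; it is not the cache used by the client.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, ttl=3600, key_prefix="rampdb_", clock=time.monotonic):
        self.ttl = ttl
        self.key_prefix = key_prefix
        self._clock = clock
        self._store = {}  # full key -> (expiry time, value)

    def get(self, key, default=None):
        """Return the cached value, or default if missing or expired."""
        full_key = self.key_prefix + key
        entry = self._store.get(full_key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[full_key]  # drop the stale entry
            return default
        return value

    def set(self, key, value):
        """Store value under the prefixed key with a fresh expiry time."""
        self._store[self.key_prefix + key] = (self._clock() + self.ttl, value)
```

A typical pattern is to call `get` with the query name first and only hit the API, followed by `set`, when it returns the default.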

Error Handling and Monitoring

Common API Errors

| Error Type | Handling Strategy |
|------------|-------------------|
| 401 Unauthorized | Not expected, because no API key is required (see API Access Setup); check the base URL and any proxy in between |
| 429 Rate Limited | Implement exponential backoff and reduce request rate |
| 500 Server Error | Retry with exponential backoff, consider service status |
| 503 Service Unavailable | Wait and retry, check RampDB service status |
| Timeout | Increase timeout values or reduce batch sizes |
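
The table can be turned into a small dispatch function so that calling code handles each case consistently. The function name and the returned labels are hypothetical, and treating every 5xx status like 500/503 is an assumption that extends the two rows listed above.

```python
def handling_strategy(status_code=None, timed_out=False):
    """Map an HTTP status (or a timeout) to a coarse handling label."""
    if timed_out:
        return "adjust_timeout_or_batch_size"
    if status_code is None:
        raise ValueError("status_code is required unless timed_out is True")
    if status_code < 400:
        return "ok"
    if status_code == 401:
        return "check_configuration"
    if status_code == 429:
        return "backoff_and_reduce_rate"
    if 500 <= status_code < 600:
        return "retry_with_backoff"
    return "do_not_retry"  # remaining client errors will not succeed on retry
```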

Monitoring Implementation

import logging
from src.integrations.rampdb_monitor import RampDBMonitor

# Setup monitoring
monitor = RampDBMonitor()

# Track API metrics
@monitor.track_api_call
async def query_rampdb(query):
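    # Same search call as in Basic Query. That call returns a list of matches,
    # so the success, response_time and quota_used attributes used below
    # assume a response wrapper that is not shown here.
    result = await client.search_metabolite_by_name(query)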

    # Log metrics
    monitor.log_success_rate(result.success)
    monitor.log_response_time(result.response_time)
    monitor.log_quota_usage(result.quota_used)

    return result

Error Recovery Patterns

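import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential

# RampDBAPIError is assumed to be the client's exception type, exposing
# status_code and retry_after; import it from wherever your client defines it.

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10)
)
async def robust_rampdb_query(identifiers):
    try:
        return await client.batch_metabolite_search(metabolite_names=identifiers)
    except RampDBAPIError as e:
        if e.status_code == 429:  # Rate limited: wait as instructed, then let tenacity retry
            await asyncio.sleep(e.retry_after or 60)
            raise
        elif e.status_code >= 500:  # Server error: re-raise so tenacity retries
            raise
        else:  # Client error - don't retry
            return None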
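
For readers who want to see the retry mechanics without the tenacity dependency, the following standard-library sketch shows capped exponential backoff around an async call. The helper names are hypothetical, and the delays (base times a power of two, capped) illustrate the general technique rather than reproducing tenacity's exact timing.

```python
import asyncio


def backoff_delays(max_attempts, base=1.0, cap=10.0):
    """Delays before each retry: base * 2**n, never more than cap."""
    return [min(cap, base * 2 ** n) for n in range(max_attempts - 1)]


async def retry_async(operation, max_attempts=3, base=1.0, cap=10.0,
                      retry_on=(Exception,), sleep=asyncio.sleep):
    """Call operation() until it succeeds or attempts run out."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(delays[attempt])
```

Narrow `retry_on` to the exception types that are worth retrying, so that client errors fail fast instead of consuming attempts.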

Data Quality and Validation

Result Validation

def validate_rampdb_results(results):
    """Validate RampDB API results for quality"""
    validation_report = {
        "total_queries": len(results),
        "successful_matches": 0,
        "failed_queries": 0,
        "quality_issues": []
    }

    for query, result in results.items():
        if result.success:
            validation_report["successful_matches"] += 1

            # Check for quality issues
            if not result.hmdb_id and not result.kegg_id:
                validation_report["quality_issues"].append(
                    f"No primary database IDs for {query}"
                )
        else:
            validation_report["failed_queries"] += 1

    return validation_report

Cross-Reference Consistency

def check_cross_reference_consistency(rampdb_result):
    """Verify cross-references are consistent"""
    issues = []

    # Check if HMDB and KEGG refer to same compound
    if rampdb_result.hmdb_id and rampdb_result.kegg_id:
        hmdb_name = get_compound_name_from_hmdb(rampdb_result.hmdb_id)
        kegg_name = get_compound_name_from_kegg(rampdb_result.kegg_id)

        if not compounds_are_equivalent(hmdb_name, kegg_name):
            issues.append("HMDB and KEGG cross-references inconsistent")

    return issues
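
The helper `compounds_are_equivalent` used above is not defined on this page. One naive way to implement it, shown purely as an assumption, is to compare normalized names. This sketch ignores case, spacing, and punctuation; it does not recognize synonyms (for example an acid and its salt name), and because non-ASCII characters are dropped, names that differ only by a Greek-letter prefix would be treated as equal.

```python
import re
import unicodedata


def normalize_compound_name(name):
    """Lowercase, strip accents, and drop everything except letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "", ascii_only.lower())


def compounds_are_equivalent(name_a, name_b):
    """Treat two names as equivalent when their normalized forms match."""
    norm_a = normalize_compound_name(name_a)
    norm_b = normalize_compound_name(name_b)
    # Empty names never count as a match.
    return bool(norm_a) and norm_a == norm_b
```

For production use, comparing structure-based identifiers such as InChIKeys is generally more reliable than comparing names.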

Best Practices

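  1. API Key Management

    RampDB itself needs no API key (see API Access Setup); these points apply to any other credentials your deployment uses.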

    • Store API keys securely in environment variables

    • Use different keys for development and production

    • Monitor API key usage and quotas

    • Rotate keys regularly for security

  2. Rate Limiting

    • Respect RampDB rate limits to maintain service availability

    • Implement exponential backoff for rate limit errors

    • Monitor quota usage to avoid service interruption

    • Use batch queries to maximize efficiency

  3. Error Handling

    • Implement comprehensive error handling for all API calls

    • Log errors with sufficient context for debugging

    • Use retry logic with exponential backoff

    • Have fallback strategies for service unavailability

  4. Data Quality

    • Validate API responses before using results

    • Check for cross-reference consistency

    • Monitor match rates and quality metrics

    • Flag low-confidence matches for manual review

  5. Performance

    • Use appropriate batch sizes for your dataset

    • Cache results to avoid repeated API calls

    • Process in parallel where possible

    • Monitor response times and optimize accordingly
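
Parallel processing and rate limiting pull in opposite directions, so parallelism should be bounded. The sketch below caps the number of in-flight coroutines with `asyncio.Semaphore`. `gather_bounded` is a hypothetical helper, and `worker` stands for whatever async call you make per item.

```python
import asyncio


async def gather_bounded(items, worker, max_concurrent=5):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(guarded(item) for item in items))
```

Keep `max_concurrent` low enough that the combined request rate stays within the configured rate limit.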

Troubleshooting Guide

Connection Issues

# Test basic connectivity (no Authorization header is needed; see API Access Setup)
curl -X GET "https://rampdb.nih.gov/api/v1/status"

# Check DNS resolution
nslookup rampdb.nih.gov

# Test from Python
python -c "
import requests
response = requests.get('https://rampdb.nih.gov/api/v1/status')
print(f'Status: {response.status_code}')
"

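Authentication Issues

The RampDB client needs no API key (see API Access Setup), so an authentication error may originate from a proxy or gateway rather than from RampDB. If your deployment does use credentials: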

  1. Verify API key is correctly set in environment

  2. Check API key has not expired

  3. Confirm registration is active and approved

  4. Test with simple API call to verify credentials

Performance Issues

  1. Slow API Responses

    • Reduce batch sizes

    • Increase timeout values

    • Check network connectivity

    • Monitor RampDB service status

  2. Rate Limiting

    • Implement longer delays between requests

    • Use exponential backoff

    • Reduce concurrent requests

    • Monitor quota usage patterns

Data Quality Issues

  1. Low Match Rates

    • Verify input data quality and formatting

    • Check metabolite name normalization

    • Try different database targets

    • Review confidence thresholds

  2. Inconsistent Results

    • Check RampDB service version and updates

    • Validate cross-references manually

    • Compare with alternative data sources

    • Report data quality issues to RampDB team

Verification Sources

Last verified: August 22, 2025

This documentation was verified against the following project resources:

  • /biomapper/src/actions/entities/metabolites/matching/rampdb_bridge.py (RampDB bridge action implementation for Stage 3 progressive metabolite mapping)

  • /biomapper/src/actions/entities/metabolites/external/ramp_client_modern.py (Modern RaMP-DB API client with async support, rate limiting, and comprehensive error handling)

  • /biomapper/src/actions/registry.py (Action registration system showing RAMPDB_BRIDGE action registration)

  • /biomapper/pyproject.toml (Project dependencies including aiohttp for async HTTP requests)

  • /biomapper/src/core/standards/base_models.py (Standardized parameter models and validation)

  • /biomapper/src/client/client_v2.py (Main BiomapperClient for action execution)

  • /biomapper/CLAUDE.md (Development standards and parameter naming conventions)