RampDB Integration
Overview
RampDB (RaMP-DB, the Relational database of Metabolomics Pathways) is a metabolite database that cross-references identifiers from multiple metabolomics resources, including HMDB, KEGG, ChEBI, and PubChem. Biomapper integrates with RampDB through its REST API to improve metabolite identifier resolution.
The integration is used primarily in Stage 3 of the progressive metabolomics pipeline through the METABOLITE_RAMPDB_BRIDGE action. In the current codebase this action is registered as RAMPDB_BRIDGE, which is the name to use in strategy files and client calls.
Key Features
Cross-Database Mapping: Maps identifiers across 15+ metabolite databases
Real-Time API Access: Queries the RampDB service in real time
Batch Processing: Optimized batch queries for better performance
Comprehensive Coverage: Access to pathway and reaction information
Standardized Outputs: Consistent identifier formats across databases
Supported Databases
Primary Metabolite Databases
| Database | Description |
|---|---|
| HMDB | Human Metabolome Database - comprehensive human metabolite data |
| KEGG | Kyoto Encyclopedia of Genes and Genomes - metabolic pathways |
| ChEBI | Chemical Entities of Biological Interest - chemical compounds |
| PubChem | PubChem Compound Database - chemical information |
| BioCyc | Metabolic pathway database collection |
| LIPID MAPS | Lipidomics database and tools |
Secondary Databases
MetaCyc - Metabolic pathway database
Reactome - Pathway database
WikiPathways - Community pathway database
CAS Registry - Chemical abstracts service
InChI/InChIKey - International chemical identifier
Setup and Configuration
API Access Setup
API Access Configuration
The RampDB client uses the modern RaMP database API for metabolite matching:
# RampDB integration is available without API key registration
# The client uses async HTTP requests with rate limiting
# No manual API key setup required
Configure Rate Limiting (Optional)
Set up environment variables for performance tuning:
# Optional: Configure rate limiting (defaults are built-in)
export RAMPDB_RATE_LIMIT="5"    # requests per second (default)
export RAMPDB_TIMEOUT="30"      # seconds (default)
export RAMPDB_BATCH_SIZE="50"   # batch processing size
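How the client reads these variables is not shown here. The following is a minimal sketch, assuming only the three variable names and defaults listed above; the helper name `load_rampdb_settings` and the returned dictionary keys are hypothetical and not part of Biomapper.

```python
import os
from typing import Mapping, Optional

# Defaults mirror the values shown in the export lines above.
DEFAULTS = {
    "RAMPDB_RATE_LIMIT": 5.0,   # requests per second
    "RAMPDB_TIMEOUT": 30.0,     # seconds
    "RAMPDB_BATCH_SIZE": 50,    # identifiers per batch
}


def load_rampdb_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read RampDB tuning values from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        # Convert to the same type as the default; fall back when unset.
        value = type(default)(raw) if raw is not None else default
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value!r}")
        # "RAMPDB_RATE_LIMIT" becomes the key "rate_limit", and so on.
        settings[name[len("RAMPDB_"):].lower()] = value
    return settings
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.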
Install Required Dependencies
# Install all dependencies (includes aiohttp for RampDB client)
poetry install --with dev,docs,api
# No separate RampDB verification script needed
# Integration testing handled through biomapper test suite
Environment Configuration
Create a configuration file for RampDB settings:
# config/rampdb_config.yaml
rampdb:
  api:
    base_url: "https://rampdb.nih.gov/api/v1/"
    timeout: 30
    max_retries: 3
    rate_limit: 60  # requests per minute
  databases:
    primary: ["hmdb", "kegg", "chebi", "pubchem"]
    secondary: ["biocyc", "lipidmaps", "metacyc"]
  batch_processing:
    default_batch_size: 50
    max_batch_size: 100
    batch_delay: 1.0  # seconds between batches
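As a worked example of what the `batch_processing` values imply, the sketch below computes how many batches a run needs and how much time is spent purely on the delay between batches. The `BatchSettings` class is a hypothetical mirror of the keys above, not a Biomapper model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchSettings:
    """Hypothetical mirror of the `batch_processing` keys above."""
    default_batch_size: int = 50
    max_batch_size: int = 100
    batch_delay: float = 1.0  # seconds between batches

    def __post_init__(self) -> None:
        if not 0 < self.default_batch_size <= self.max_batch_size:
            raise ValueError("default_batch_size must be in 1..max_batch_size")
        if self.batch_delay < 0:
            raise ValueError("batch_delay must not be negative")

    def batch_count(self, n_identifiers: int) -> int:
        """Number of batches needed for n_identifiers."""
        return math.ceil(n_identifiers / self.default_batch_size)

    def pacing_overhead(self, n_identifiers: int) -> float:
        """Seconds spent only on delays between batches (none after the last)."""
        return max(0, self.batch_count(n_identifiers) - 1) * self.batch_delay
```

With the defaults, 1,000 identifiers need 20 batches and 19 seconds of inter-batch delay, before any network time is counted.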
API Usage Patterns
Basic Query
from actions.entities.metabolites.external.ramp_client_modern import RaMPClientModern, create_ramp_client

# Initialize client with default configuration
client = create_ramp_client()

# Single metabolite query
# (await must run inside an async function or an async-aware REPL)
result = await client.search_metabolite_by_name("glucose")
print(f"Found {len(result)} matches")

# Access cross-references
for match in result:
    print(f"Common Name: {match.common_name}")
    print(f"Source ID: {match.source_id}")
    print(f"Database: {match.id_type}")
Batch Query
# Batch query for multiple metabolites
metabolite_names = ["glucose", "fructose", "galactose"]
batch_results = await client.batch_metabolite_search(
    metabolite_names=metabolite_names,
    batch_size=25
)

# Process batch results
for metabolite, matches in batch_results.items():
    print(f"{metabolite}: {len(matches)} matches found")
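How `batch_metabolite_search` splits its input internally is not documented here. The sketch below only illustrates the general idea behind a `batch_size` parameter: the input list is cut into consecutive slices of at most that many names. The `chunked` helper is hypothetical.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


names = ["glucose", "fructose", "galactose"]
for batch in chunked(names, batch_size=2):
    print(batch)
```

The last slice may be shorter than `batch_size`, which is why the total number of requests is the ceiling of the list length divided by the batch size.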
Advanced Queries
# Search with specific options
result = await client.search_metabolite_by_name(
    name="citric acid",
    analyte_type="metabolite"
)

# Get pathway information (if available)
pathway_info = await client.get_pathways_from_analytes(
    analytes=["HMDB0000094"]
)
Integration with Biomapper Actions
YAML Strategy Configuration
steps:
  - name: rampdb_metabolite_mapping
    action:
      type: RAMPDB_BRIDGE
      params:
        unmapped_key: unmatched_metabolites
        output_key: rampdb_matches
        final_unmapped_key: rampdb_unmapped
        identifier_column: metabolite_name
        batch_size: 40
        timeout: 45
        max_retries: 3
Python Client Usage
from src.client.client_v2 import BiomapperClient

client = BiomapperClient(base_url="http://localhost:8000")

result = await client.run_action(
    action_type="RAMPDB_BRIDGE",
    params={
        "unmapped_key": "metabolites",
        "output_key": "rampdb_results",
        "final_unmapped_key": "still_unmapped",
        "identifier_column": "compound_name",
        "batch_size": 30
    },
    context={"datasets": {"metabolites": metabolite_df}}
)
Performance Optimization
Batch Size Optimization
Optimize batch sizes based on your use case:
| Dataset Size | Recommended Batch Size | Expected Time | Memory Usage |
|---|---|---|---|
| < 100 metabolites | 25 | < 30 seconds | Low |
| 100-1,000 metabolites | 50 | 1-5 minutes | Medium |
| 1,000-10,000 metabolites | 75 | 10-30 minutes | High |
| > 10,000 metabolites | 100 | 30+ minutes | Very High |
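The table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and because the table's ranges overlap at exactly 1,000 and 10,000, the code assumes those boundary values fall into the lower row.

```python
def recommended_batch_size(n_metabolites: int) -> int:
    """Map a dataset size to the batch size suggested in the table above."""
    if n_metabolites < 0:
        raise ValueError("n_metabolites must not be negative")
    if n_metabolites < 100:
        return 25
    if n_metabolites <= 1_000:   # assumption: 1,000 belongs to this row
        return 50
    if n_metabolites <= 10_000:  # assumption: 10,000 belongs to this row
        return 75
    return 100


for size in (50, 500, 5_000, 50_000):
    print(size, recommended_batch_size(size))
```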
Rate Limiting Configuration
# Conservative rate limiting (high reliability)
rate_limiting:
  requests_per_minute: 30
  batch_delay: 2.0
  exponential_backoff: true

# Aggressive rate limiting (faster processing)
rate_limiting:
  requests_per_minute: 100
  batch_delay: 0.5
  exponential_backoff: false

# Balanced rate limiting (recommended)
rate_limiting:
  requests_per_minute: 60
  batch_delay: 1.0
  exponential_backoff: true
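To show how a `requests_per_minute` value might translate into request pacing, here is a minimal sketch of a minimum-interval limiter. `MinIntervalLimiter` is a hypothetical class, not the limiter used by the Biomapper client, and the injectable clock exists only so the behavior can be checked deterministically.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Space requests at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        self.interval = 60.0 / requests_per_minute
        self._clock = clock
        self._next_allowed = 0.0  # earliest time the next request may start

    def reserve(self) -> float:
        """Book the next request slot and return how long to wait for it."""
        now = self._clock()
        wait = max(0.0, self._next_allowed - now)
        # The following slot opens one interval after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return wait
```

In async code, a caller would run `await asyncio.sleep(limiter.reserve())` before each request. With the balanced profile (60 requests per minute) the interval is one second.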
Caching Strategy
# Enable caching for repeated queries
client_config = {
    "cache_enabled": True,
    "cache_ttl": 3600,  # 1 hour
    "cache_backend": "redis",  # or "memory"
    "cache_key_prefix": "rampdb_"
}
client = RampDBClient(config=client_config)
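The options above describe a time-limited cache with prefixed keys. As an illustration of the `"memory"` backend idea only, the sketch below implements a small in-process TTL cache; the `TTLCache` class and its method names are hypothetical and do not describe the client's actual cache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float = 3600.0, key_prefix: str = "rampdb_",
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.key_prefix = key_prefix
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def _key(self, query: str) -> str:
        # Trim and lowercase so "Glucose" and " glucose " share one entry.
        return f"{self.key_prefix}{query.strip().lower()}"

    def get(self, query: str) -> Optional[Any]:
        key = self._key(query)
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, query: str, value: Any) -> None:
        self._store[self._key(query)] = (self._clock() + self.ttl, value)
```

A caller would check `cache.get(name)` before querying the API and call `cache.set(name, result)` afterwards, so repeated names within the TTL cost no request.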
Error Handling and Monitoring
Common API Errors
| Error Type | Handling Strategy |
|---|---|
| | Check API key configuration and registration status |
| | Implement exponential backoff and reduce request rate |
| | Retry with exponential backoff, consider service status |
| | Wait and retry, check RampDB service status |
| | Increase timeout values or reduce batch sizes |
Monitoring Implementation
import logging
from src.integrations.rampdb_monitor import RampDBMonitor

# Setup monitoring
monitor = RampDBMonitor()

# Track API metrics
@monitor.track_api_call
async def query_rampdb(query):
    result = await client.search_metabolite(query)
    # Log metrics
    monitor.log_success_rate(result.success)
    monitor.log_response_time(result.response_time)
    monitor.log_quota_usage(result.quota_used)
    return result
Error Recovery Patterns
import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential

# client and RampDBAPIError are assumed to come from the RampDB client module
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=4, max=10)
)
async def robust_rampdb_query(identifiers):
    try:
        return await client.batch_search(identifiers)
    except RampDBAPIError as e:
        if e.status_code == 429:  # Rate limited: wait, then let tenacity retry
            await asyncio.sleep(e.retry_after or 60)
            raise
        elif e.status_code >= 500:  # Server error: let tenacity retry
            raise
        else:  # Client error - don't retry
            return None
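The retry policy in the block above can be isolated from the network call as a pure function, which makes it easy to test. This is a sketch under the same rules as that block (429 waits and retries, 5xx retries, other 4xx does not retry); the function name and return labels are hypothetical.

```python
def classify_status(status_code: int) -> str:
    """Decide what to do with an HTTP status, mirroring the retry rules above."""
    if status_code == 429:
        return "retry_after_wait"  # rate limited: honor the wait, then retry
    if status_code >= 500:
        return "retry"             # server error: retry with backoff
    if 400 <= status_code < 500:
        return "no_retry"          # client error: retrying will not help
    return "ok"


for code in (200, 404, 429, 503):
    print(code, classify_status(code))
```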
Data Quality and Validation
Result Validation
def validate_rampdb_results(results):
    """Validate RampDB API results for quality"""
    validation_report = {
        "total_queries": len(results),
        "successful_matches": 0,
        "failed_queries": 0,
        "quality_issues": []
    }
    for query, result in results.items():
        if result.success:
            validation_report["successful_matches"] += 1
            # Check for quality issues
            if not result.hmdb_id and not result.kegg_id:
                validation_report["quality_issues"].append(
                    f"No primary database IDs for {query}"
                )
        else:
            validation_report["failed_queries"] += 1
    return validation_report
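A report in the shape returned above can be reduced to a couple of rates for monitoring. The sketch below assumes only the four report keys shown in `validate_rampdb_results`; the `summarize_report` helper and its output keys are hypothetical.

```python
def summarize_report(report: dict) -> dict:
    """Turn validation counts into match and quality-issue rates."""
    total = report["total_queries"]
    if total == 0:
        # Avoid dividing by zero when nothing was queried.
        return {"match_rate": 0.0, "issue_rate": 0.0}
    return {
        "match_rate": report["successful_matches"] / total,
        "issue_rate": len(report["quality_issues"]) / total,
    }


example = {
    "total_queries": 4,
    "successful_matches": 3,
    "failed_queries": 1,
    "quality_issues": ["No primary database IDs for compound X"],
}
print(summarize_report(example))
```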
Cross-Reference Consistency
# get_compound_name_from_hmdb, get_compound_name_from_kegg and
# compounds_are_equivalent are helper functions that are not shown here
def check_cross_reference_consistency(rampdb_result):
    """Verify cross-references are consistent"""
    issues = []
    # Check if HMDB and KEGG refer to same compound
    if rampdb_result.hmdb_id and rampdb_result.kegg_id:
        hmdb_name = get_compound_name_from_hmdb(rampdb_result.hmdb_id)
        kegg_name = get_compound_name_from_kegg(rampdb_result.kegg_id)
        if not compounds_are_equivalent(hmdb_name, kegg_name):
            issues.append("HMDB and KEGG cross-references inconsistent")
    return issues
Best Practices
API Key Management
Store API keys securely in environment variables
Use different keys for development and production
Monitor API key usage and quotas
Rotate keys regularly for security
Rate Limiting
Respect RampDB rate limits to maintain service availability
Implement exponential backoff for rate limit errors
Monitor quota usage to avoid service interruption
Use batch queries to maximize efficiency
Error Handling
Implement comprehensive error handling for all API calls
Log errors with sufficient context for debugging
Use retry logic with exponential backoff
Have fallback strategies for service unavailability
Data Quality
Validate API responses before using results
Check for cross-reference consistency
Monitor match rates and quality metrics
Flag low-confidence matches for manual review
Performance
Use appropriate batch sizes for your dataset
Cache results to avoid repeated API calls
Process in parallel where possible
Monitor response times and optimize accordingly
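One way to process in parallel without overwhelming the service is to cap the number of in-flight requests with a semaphore. The sketch below uses only `asyncio`; `gather_bounded` and `fake_lookup` are hypothetical, and `fake_lookup` stands in for a real client call so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def gather_bounded(
    names: List[str],
    lookup: Callable[[str], Awaitable[list]],
    max_concurrency: int = 5,
) -> Dict[str, list]:
    """Run `lookup` for every name with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(name: str) -> list:
        async with semaphore:  # wait here if too many lookups are running
            return await lookup(name)

    # gather preserves input order, so results line up with names.
    results = await asyncio.gather(*(guarded(name) for name in names))
    return dict(zip(names, results))


async def fake_lookup(name: str) -> list:
    """Stand-in for a real client call; returns a canned result."""
    await asyncio.sleep(0)
    return [f"{name}-match"]
```

A run would look like `asyncio.run(gather_bounded(["glucose", "fructose"], fake_lookup))`. Keeping `max_concurrency` low combines well with the rate limits described earlier.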
Troubleshooting Guide
Connection Issues
# Test basic connectivity
curl -X GET "https://rampdb.nih.gov/api/v1/status" \
-H "Authorization: Bearer YOUR_API_KEY"
# Check DNS resolution
nslookup rampdb.nih.gov
# Test from Python
python -c "
import requests
response = requests.get('https://rampdb.nih.gov/api/v1/status')
print(f'Status: {response.status_code}')
"
Authentication Issues
Verify API key is correctly set in environment
Check API key has not expired
Confirm registration is active and approved
Test with simple API call to verify credentials
Performance Issues
Slow API Responses
Reduce batch sizes
Increase timeout values
Check network connectivity
Monitor RampDB service status
Rate Limiting
Implement longer delays between requests
Use exponential backoff
Reduce concurrent requests
Monitor quota usage patterns
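For the exponential backoff mentioned in this list, the delay schedule is simply a capped geometric series. The sketch below computes it; the function name and default values are illustrative assumptions, not settings taken from the client.

```python
from typing import List


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_delay: float = 60.0, attempts: int = 5) -> List[float]:
    """Return the wait before each retry: base * factor**i, capped at max_delay."""
    if base <= 0 or factor < 1 or attempts < 0:
        raise ValueError("base must be > 0, factor >= 1, attempts >= 0")
    return [min(max_delay, base * factor ** i) for i in range(attempts)]


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The cap matters for long outages: without it, the eighth retry in this schedule would wait 128 seconds instead of 60.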
Data Quality Issues
Low Match Rates
Verify input data quality and formatting
Check metabolite name normalization
Try different database targets
Review confidence thresholds
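For the name normalization check above, a light cleanup pass before querying can remove differences that are invisible to the eye. This is a minimal sketch; whether RampDB name matching is sensitive to case or spacing is not stated here, so treat the lowercasing step as an assumption, and note that `normalize_metabolite_name` is a hypothetical helper.

```python
import re
import unicodedata


def normalize_metabolite_name(name: str) -> str:
    """Normalize Unicode form, case, and whitespace in a metabolite name."""
    text = unicodedata.normalize("NFKC", name)  # fold compatibility characters
    text = text.strip().lower()                 # assumption: case-insensitive match
    return re.sub(r"\s+", " ", text)            # collapse runs of whitespace


print(normalize_metabolite_name("  Citric   Acid "))  # citric acid
```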
Inconsistent Results
Check RampDB service version and updates
Validate cross-references manually
Compare with alternative data sources
Report data quality issues to RampDB team
See Also
Metabolite RampDB Bridge - RampDB action documentation
Metabolomics Progressive Pipeline - Pipeline integration examples
../examples/api_error_handling - Error handling patterns
../performance/api_optimization - API performance optimization
---
## Verification Sources
Last verified: August 22, 2025
This documentation was verified against the following project resources:
/biomapper/src/actions/entities/metabolites/matching/rampdb_bridge.py (RampDB bridge action implementation for Stage 3 progressive metabolite mapping)
/biomapper/src/actions/entities/metabolites/external/ramp_client_modern.py (Modern RaMP-DB API client with async support, rate limiting, and comprehensive error handling)
/biomapper/src/actions/registry.py (Action registration system showing RAMPDB_BRIDGE action registration)
/biomapper/pyproject.toml (Project dependencies including aiohttp for async HTTP requests)
/biomapper/src/core/standards/base_models.py (Standardized parameter models and validation)
/biomapper/src/client/client_v2.py (Main BiomapperClient for action execution)
/biomapper/CLAUDE.md (Development standards and parameter naming conventions)