NIGHTINGALE_NMR_MATCH
Match Nightingale NMR biomarkers to standard identifiers (HMDB/LOINC) with specialized platform knowledge.
Purpose
This action provides specialized matching for UK Biobank NMR metabolomics data from the Nightingale Health platform. It offers:
* Exact matching for known Nightingale biomarkers
* Fuzzy matching for naming variations
* Lipoprotein particle pattern recognition
* Abbreviation expansion and standardization
* Category classification (lipids, amino acids, etc.)
* Unit standardization
* Integration with external reference files
Parameters
Required Parameters
- input_key (string)
Dataset key from context containing Nightingale biomarker data.
- output_key (string)
Key where matched results will be stored in context.
Optional Parameters
- biomarker_column (string)
Column containing Nightingale biomarker names. Default: “biomarker”
- unit_column (string)
Column containing measurement units (optional). Default: None
- reference_file (string)
Path to Nightingale reference mapping file. Default: “/procedure/data/local_data/references/nightingale_nmr_reference.csv”
- use_cached_reference (boolean)
Cache reference file in memory for performance. Default: true
- target_format (string)
Target identifier format: ‘hmdb’, ‘loinc’, or ‘both’. Default: “hmdb”
- match_threshold (float)
Fuzzy match threshold for biomarker names (0.0-1.0). Default: 0.85
- use_abbreviations (boolean)
Expand and match common abbreviations. Default: true
- case_sensitive (boolean)
Case-sensitive matching. Default: false
- add_metadata (boolean)
Add Nightingale metadata columns to output. Default: true
- include_units (boolean)
Include standardized units in output. Default: true
- include_categories (boolean)
Include biomarker categories in output. Default: true
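To give a feel for what match_threshold and case_sensitive control, the minimal sketch below scores name variants with Python's standard difflib. The similarity metric the action actually uses is not stated here, so the `similarity` and `passes` helpers and the choice of `SequenceMatcher` are illustrative assumptions only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str, case_sensitive: bool = False) -> float:
    """Return a 0.0-1.0 similarity score (hypothetical stand-in metric)."""
    if not case_sensitive:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio()


def passes(a: str, b: str, threshold: float = 0.85, case_sensitive: bool = False) -> bool:
    """Accept a candidate only if its score reaches the threshold."""
    return similarity(a, b, case_sensitive) >= threshold


# Case-insensitive comparison: the two names are treated as identical.
print(passes("total_c", "Total_C"))
# Case-sensitive comparison: the differing letters lower the score.
print(passes("total_c", "Total_C", case_sensitive=True))
# A threshold of 1.0 accepts only identical strings.
print(passes("Total_C", "Total_C", threshold=1.0))
```

Under this stand-in metric, raising the threshold trades coverage for precision, which is the same trade-off the match_threshold parameter exposes.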
Built-in Biomarker Patterns
The action includes built-in patterns for common Nightingale biomarkers:
Lipids and Lipoproteins
* Total_C (Total cholesterol) → HMDB0000067, LOINC 2093-3
* LDL_C (LDL cholesterol) → HMDB0000067, LOINC 13457-7
* HDL_C (HDL cholesterol) → HMDB0000067, LOINC 2085-9
* Triglycerides → HMDB0000827, LOINC 2571-8
Apolipoproteins
* ApoA1 (Apolipoprotein A1) → LOINC 1869-7
* ApoB (Apolipoprotein B) → LOINC 1884-6
Amino Acids
* Ala (Alanine) → HMDB0000161, LOINC 1916-6
* Gln (Glutamine) → HMDB0000641, LOINC 14681-2
Metabolic Markers
* Glucose → HMDB0000122, LOINC 2345-7
* Lactate → HMDB0000190, LOINC 2524-7
* bOHbutyrate (Beta-hydroxybutyrate) → HMDB0000357, LOINC 53060-9
Inflammation
* GlycA (Glycoprotein acetyls) → No standard IDs (Nightingale-specific)
Lipoprotein Particle Patterns
The action recognizes complex lipoprotein particle naming patterns:
VLDL particles: XXL_VLDL_*, XL_VLDL_*, L_VLDL_*, etc.
LDL particles: L_LDL_*, M_LDL_*, S_LDL_*
HDL particles: XL_HDL_*, L_HDL_*, M_HDL_*, S_HDL_*
Each pattern includes appropriate units (nmol/L for particles, mmol/L for concentrations).
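The naming patterns above can be recognized with a single regular expression. The sketch below is one possible reading, not the action's actual implementation: `PARTICLE_RE` and `parse_particle` are hypothetical names, only the size prefixes named in this document are included, and the regex is more permissive than the lists above (it does not restrict which sizes go with which class).

```python
import re
from typing import Optional

# Size prefix, lipoprotein class, then a measure suffix such as P, C, TG or PL.
PARTICLE_RE = re.compile(r"^(XXL|XL|L|M|S)_(VLDL|LDL|HDL)_([A-Za-z]+)$")


def parse_particle(name: str) -> Optional[dict]:
    """Split a lipoprotein name into its parts, or return None if it does not fit."""
    match = PARTICLE_RE.match(name)
    if match is None:
        return None
    size, lipoprotein, measure = match.groups()
    # Unit rule from the text: nmol/L for particles, mmol/L for concentrations.
    unit = "nmol/L" if measure == "P" else "mmol/L"
    return {"size": size, "class": lipoprotein, "measure": measure, "unit": unit}


print(parse_particle("XXL_VLDL_P"))
print(parse_particle("S_HDL_C"))
print(parse_particle("Total_C"))  # not a particle name, so None
```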
Example Usage
Basic HMDB Matching
- name: match_nmr_biomarkers
  action:
    type: NIGHTINGALE_NMR_MATCH
    params:
      input_key: "ukbb_nmr_data"
      output_key: "matched_biomarkers"
      biomarker_column: "biomarker_name"
      target_format: "hmdb"
      match_threshold: 0.85
LOINC Code Mapping
- name: map_to_loinc
  action:
    type: NIGHTINGALE_NMR_MATCH
    params:
      input_key: "clinical_metabolites"
      output_key: "loinc_mapped"
      biomarker_column: "test_name"
      target_format: "loinc"
      include_units: true
      include_categories: true
Both HMDB and LOINC
- name: comprehensive_mapping
  action:
    type: NIGHTINGALE_NMR_MATCH
    params:
      input_key: "nmr_metabolomics"
      output_key: "fully_mapped"
      target_format: "both"
      add_metadata: true
      use_abbreviations: true
Custom Reference File
- name: custom_nightingale_match
  action:
    type: NIGHTINGALE_NMR_MATCH
    params:
      input_key: "biomarker_data"
      output_key: "custom_matched"
      reference_file: "/data/custom_nightingale_reference.csv"
      use_cached_reference: false
      match_threshold: 0.90
Strict Matching
- name: exact_matches_only
  action:
    type: NIGHTINGALE_NMR_MATCH
    params:
      input_key: "quality_controlled_data"
      output_key: "exact_matches"
      match_threshold: 1.0  # Only exact matches
      use_abbreviations: false
      case_sensitive: true
Input Data Format
Expected biomarker data structure:

.. code-block:: python

   [
       {"biomarker": "Total_C", "value": 5.2, "unit": "mmol/L", "sample_id": "UKB_001"},
       {"biomarker": "Ala", "value": 0.45, "unit": "mmol/L", "sample_id": "UKB_002"},
       {"biomarker": "XXL_VLDL_P", "value": 1.8, "unit": "nmol/L", "sample_id": "UKB_003"},
   ]
Output Format
HMDB format output:

.. code-block:: python

   [
       {
           "original_biomarker": "Total_C",
           "matched_name": "Total_C",
           "hmdb_id": "HMDB0000067",
           "description": "Total cholesterol",
           "category": "lipids",
           "unit": "mmol/L",
           "confidence": 1.0,
           "value": 5.2,
           "sample_id": "UKB_001",
       },
       {
           "original_biomarker": "Ala",
           "matched_name": "Ala",
           "hmdb_id": "HMDB0000161",
           "description": "Alanine",
           "category": "amino_acids",
           "unit": "mmol/L",
           "confidence": 1.0,
           "value": 0.45,
           "sample_id": "UKB_002",
       },
   ]
Both HMDB and LOINC format:

.. code-block:: python

   [
       {
           "original_biomarker": "Total_C",
           "matched_name": "Total_C",
           "hmdb_id": "HMDB0000067",
           "loinc_code": "2093-3",
           "description": "Total cholesterol",
           "category": "lipids",
           "unit": "mmol/L",
           "confidence": 1.0,
           "value": 5.2,
           "sample_id": "UKB_001",
       },
   ]
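Because every output record carries a confidence field, downstream code can separate confident matches from ones worth a manual look. The following is a small sketch under stated assumptions: `split_by_confidence` and the 0.95 cutoff are hypothetical, and the second record is invented purely for illustration.

```python
def split_by_confidence(records, cutoff=0.95):
    """Return (accepted, review) lists based on each record's confidence."""
    accepted = [r for r in records if r.get("confidence", 0.0) >= cutoff]
    review = [r for r in records if r.get("confidence", 0.0) < cutoff]
    return accepted, review


records = [
    {"original_biomarker": "Total_C", "matched_name": "Total_C",
     "hmdb_id": "HMDB0000067", "confidence": 1.0},
    # Hypothetical fuzzy match, invented for illustration only.
    {"original_biomarker": "Alanin", "matched_name": "Ala",
     "hmdb_id": "HMDB0000161", "confidence": 0.88},
]

accepted, review = split_by_confidence(records)
print([r["original_biomarker"] for r in accepted])
print([r["original_biomarker"] for r in review])
```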
Reference File Format
If using a custom reference file, it should follow this CSV structure:
nightingale_name,hmdb_id,loinc_code,description,category,unit
Total_C,HMDB0000067,2093-3,Total cholesterol,lipids,mmol/L
LDL_C,HMDB0000067,13457-7,LDL cholesterol,lipids,mmol/L
Ala,HMDB0000161,1916-6,Alanine,amino_acids,mmol/L
GlycA,,,"Glycoprotein acetyls",inflammation,mmol/L
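Before pointing reference_file at a custom CSV, it can help to check that the file parses as expected. The sketch below reads the structure shown above with Python's standard csv module; `load_reference` is a hypothetical helper, not part of the action, and it treats empty ID cells as missing values.

```python
import csv
import io

SAMPLE = """nightingale_name,hmdb_id,loinc_code,description,category,unit
Total_C,HMDB0000067,2093-3,Total cholesterol,lipids,mmol/L
GlycA,,,"Glycoprotein acetyls",inflammation,mmol/L
"""


def load_reference(handle) -> dict:
    """Index reference rows by nightingale_name; empty cells become None."""
    reference = {}
    for row in csv.DictReader(handle):
        reference[row["nightingale_name"]] = {
            key: (value or None)
            for key, value in row.items()
            if key != "nightingale_name"
        }
    return reference


# io.StringIO stands in for an open file; with a real file use open(path, newline="").
reference = load_reference(io.StringIO(SAMPLE))
print(reference["Total_C"]["hmdb_id"])
print(reference["GlycA"]["hmdb_id"])  # None, since GlycA has no standard IDs
```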
Matching Algorithm
The action uses a multi-step matching approach:
1. Exact match against reference file or built-in patterns
2. Lipoprotein pattern matching for particle measurements
3. Fuzzy matching with abbreviation expansion
4. Confidence scoring based on match quality
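The order of these steps can be sketched as a simple cascade. This is an illustration under stated assumptions rather than the action's code: `REFERENCE`, `PARTICLE_RE`, and `match_biomarker` are hypothetical, difflib stands in for the unspecified fuzzy metric, abbreviation expansion is left out for brevity, and giving pattern matches a confidence of 1.0 is a guess.

```python
import re
from difflib import SequenceMatcher

# Tiny illustrative reference; the IDs are the ones listed in this document.
REFERENCE = {
    "Total_C": "HMDB0000067",
    "Ala": "HMDB0000161",
    "Glucose": "HMDB0000122",
}
PARTICLE_RE = re.compile(r"^(XXL|XL|L|M|S)_(VLDL|LDL|HDL)_\w+$")


def match_biomarker(name: str, threshold: float = 0.85) -> dict:
    # Step 1: exact match against the reference.
    if name in REFERENCE:
        return {"matched_name": name, "hmdb_id": REFERENCE[name],
                "method": "exact", "confidence": 1.0}
    # Step 2: lipoprotein particle naming pattern (no HMDB ID assumed).
    if PARTICLE_RE.match(name):
        return {"matched_name": name, "hmdb_id": None,
                "method": "pattern", "confidence": 1.0}
    # Step 3: case-insensitive fuzzy match, keeping the best-scoring candidate.
    best_name, best_score = None, 0.0
    for candidate in REFERENCE:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    # Step 4: the similarity score doubles as the confidence.
    if best_name is not None and best_score >= threshold:
        return {"matched_name": best_name, "hmdb_id": REFERENCE[best_name],
                "method": "fuzzy", "confidence": round(best_score, 3)}
    return {"matched_name": None, "hmdb_id": None,
            "method": "unmatched", "confidence": 0.0}


for example in ("Total_C", "M_HDL_P", "glucos", "Creatinine"):
    print(example, match_biomarker(example)["method"])
```

Running the cheap exact and pattern checks first means the more expensive fuzzy comparison is only paid for names that nothing else recognizes.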
Abbreviation Expansion
Common abbreviations are automatically expanded:
* C → cholesterol
* TG → triglycerides
* PL → phospholipids
* P → particles
* XXL/XL/L/M/S → size descriptors
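One way to apply these expansions is token by token, using position to tell a leading size prefix from a trailing measure suffix. The sketch below is an assumption about how this could work, not the action's implementation: `expand` is a hypothetical helper, and the wording chosen for each size descriptor is illustrative.

```python
# Measure suffixes, taken from the list above.
MEASURES = {"C": "cholesterol", "TG": "triglycerides",
            "PL": "phospholipids", "P": "particles"}
# Size prefixes; the exact wording of each descriptor is an assumption.
SIZES = {"XXL": "extremely large", "XL": "very large",
         "L": "large", "M": "medium", "S": "small"}


def expand(name: str) -> str:
    """Expand a Nightingale-style name into a readable phrase."""
    tokens = name.split("_")
    last = len(tokens) - 1
    expanded = []
    for index, token in enumerate(tokens):
        if index == 0 and last > 0 and token in SIZES:
            expanded.append(SIZES[token])      # leading size prefix
        elif index == last and last > 0 and token in MEASURES:
            expanded.append(MEASURES[token])   # trailing measure suffix
        else:
            expanded.append(token)             # leave everything else as-is
    return " ".join(expanded)


print(expand("Total_C"))
print(expand("XXL_VLDL_P"))
print(expand("Glucose"))
```

Checking position matters because single letters are ambiguous: in this sketch "L" is only read as a size when it leads a multi-part name.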
Statistics and Metadata
The action provides comprehensive matching statistics:
{
  "statistics": {
    "nightingale_nmr_match": {
      "total_biomarkers": 150,
      "matched_biomarkers": 142,
      "match_rate": 0.947,
      "category_breakdown": {
        "lipids": 65,
        "amino_acids": 22,
        "glycolysis": 18,
        "lipoproteins": 25,
        "inflammation": 8,
        "unknown": 4
      }
    }
  }
}
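These statistics make it easy to flag runs with unexpectedly low coverage. The sketch below recomputes the match rate from the counts shown above; `summarize` and the 0.8 minimum are hypothetical choices for illustration, not values defined by the action.

```python
def summarize(stats: dict, minimum_rate: float = 0.8) -> dict:
    """Recompute the match rate and flag runs that fall below a minimum."""
    total = stats["total_biomarkers"]
    matched = stats["matched_biomarkers"]
    rate = matched / total if total else 0.0
    return {
        "match_rate": round(rate, 3),
        "unmatched": total - matched,
        "acceptable": rate >= minimum_rate,
    }


# Counts copied from the statistics example above.
STATS = {"total_biomarkers": 150, "matched_biomarkers": 142}
summary = summarize(STATS)
print(summary)
```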
Error Handling
- Dataset not found
Error: Dataset 'missing_data' not found in context
Solution: Verify input_key exists in context datasets.
- Missing biomarker column
Error: Column 'biomarker' not found in dataset
Solution: Check biomarker_column parameter matches dataset structure.
- Reference file issues
Warning: Reference file not found, using built-in patterns only
Solution: Verify reference file path or rely on built-in patterns.
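The first two errors can be caught early with a small pre-flight check. The sketch below assumes a context shaped as a dictionary with a "datasets" mapping of list-of-dict records, as in the input format shown earlier; that shape and the `validate_inputs` helper are assumptions for illustration, not the action's actual API.

```python
def validate_inputs(context: dict, input_key: str, biomarker_column: str = "biomarker") -> list:
    """Return the dataset rows, or raise with a message like the ones above."""
    datasets = context.get("datasets", {})
    if input_key not in datasets:
        raise ValueError(f"Dataset '{input_key}' not found in context")
    rows = datasets[input_key]
    # Only the first row is inspected; rows are assumed to share the same keys.
    if rows and biomarker_column not in rows[0]:
        raise ValueError(f"Column '{biomarker_column}' not found in dataset")
    return rows


context = {"datasets": {"raw_nmr": [{"biomarker": "Total_C", "value": 5.2}]}}
print(len(validate_inputs(context, "raw_nmr")))

try:
    validate_inputs(context, "missing_data")
except ValueError as error:
    print(error)
```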
Best Practices
1. Use the appropriate target format: HMDB for metabolomics work, LOINC for clinical data
2. Adjust the match threshold to data quality: use a higher threshold for clean, consistently named data
3. Enable abbreviation expansion when naming conventions vary
4. Include metadata for comprehensive biomarker annotation
5. Cache reference files when a strategy is executed repeatedly
6. Validate match rates, since low rates may indicate data format issues
Performance Notes
* Built-in patterns provide fast exact matching
* Fuzzy matching adds computational overhead but improves coverage
* Reference file caching avoids re-reading the file and speeds up repeated executions
* Memory usage scales with dataset size and reference complexity
Common Use Cases
- UK Biobank NMR Processing
Map Nightingale biomarker names to standard metabolomics identifiers
- Clinical Data Integration
Convert platform-specific names to standardized clinical codes
- Multi-Platform Studies
Harmonize biomarker names across different NMR platforms
- Metabolomics Database Mapping
Prepare data for integration with metabolomics databases
Integration
This action typically follows data loading and precedes metabolomics analysis:
steps:
  # 1. Load Nightingale NMR data
  - name: load_nmr_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "/data/ukbb_nmr_biomarkers.csv"
        identifier_column: "biomarker"
        output_key: "raw_nmr"
  # 2. Match to standard identifiers
  - name: standardize_biomarkers
    action:
      type: NIGHTINGALE_NMR_MATCH
      params:
        input_key: "raw_nmr"
        output_key: "standardized_nmr"
        target_format: "both"
        match_threshold: 0.85
  # 3. Continue with metabolomics analysis
  - name: analyze_metabolites
    action:
      type: SEMANTIC_METABOLITE_MATCH
      params:
        input_key: "standardized_nmr"
        target_database: "hmdb"