CHEMISTRY_FUZZY_TEST_MATCH
Fuzzy matching for clinical chemistry test names and descriptions.
Purpose
This action matches clinical chemistry test names against a reference test database by:
- Matching test names using multiple algorithms (Levenshtein, Jaro-Winkler, token-based)
- Handling abbreviations and synonyms in clinical test nomenclature
- Normalizing units and reference ranges
- Resolving ambiguous test mappings
- Supporting LOINC code cross-referencing
Parameters
Required Parameters
- input_key (string)
Context key containing source chemistry test data.
- target_key (string)
Context key containing target reference test database.
- output_key (string)
Context key to store matched results.
Optional Parameters
- test_name_column (string)
Column containing test names in source data. Default: "test_name"
- match_threshold (float)
Minimum similarity score for matches (0.0-1.0). Default: 0.85
- matching_strategy (string)
Strategy: 'best_match', 'all_above_threshold', or 'top_n'. Default: 'best_match'
- top_n (integer)
Number of top matches to return when matching_strategy is 'top_n'. Default: 3
- use_abbreviations (boolean)
Enable abbreviation expansion (e.g., 'Hgb' → 'Hemoglobin'). Default: true
- use_synonyms (boolean)
Enable synonym matching from clinical dictionaries. Default: true
- normalize_units (boolean)
Standardize measurement units before matching. Default: true
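The parameters above can be gathered into a small validated structure. The following is a minimal sketch, not the action's actual implementation: the class name `FuzzyMatchParams`, the `VALID_STRATEGIES` set, and the validation rules are assumptions made for illustration. Only the field names and default values come from the parameter list.

```python
from dataclasses import dataclass

# Hypothetical helper: the strategy names are taken from the parameter list above.
VALID_STRATEGIES = {"best_match", "all_above_threshold", "top_n"}


@dataclass
class FuzzyMatchParams:
    # Required parameters
    input_key: str
    target_key: str
    output_key: str
    # Optional parameters with their documented defaults
    test_name_column: str = "test_name"
    match_threshold: float = 0.85
    matching_strategy: str = "best_match"
    top_n: int = 3
    use_abbreviations: bool = True
    use_synonyms: bool = True
    normalize_units: bool = True

    def __post_init__(self) -> None:
        # Assumed validation: the documented range for match_threshold is 0.0-1.0.
        if not 0.0 <= self.match_threshold <= 1.0:
            raise ValueError("match_threshold must be between 0.0 and 1.0")
        if self.matching_strategy not in VALID_STRATEGIES:
            raise ValueError(f"unknown matching_strategy: {self.matching_strategy!r}")
        if self.top_n < 1:
            raise ValueError("top_n must be at least 1")


# Only the three required keys are supplied; everything else uses the defaults.
params = FuzzyMatchParams(
    input_key="lab_tests",
    target_key="reference_tests",
    output_key="matched_tests",
)
print(params)
```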
Example Usage
Basic Fuzzy Matching
- name: match_chemistry_tests
  action:
    type: CHEMISTRY_FUZZY_TEST_MATCH
    params:
      input_key: "lab_tests"
      target_key: "reference_tests"
      output_key: "matched_tests"
      match_threshold: 0.85
Advanced Configuration
- name: comprehensive_matching
  action:
    type: CHEMISTRY_FUZZY_TEST_MATCH
    params:
      input_key: "clinical_chemistry"
      target_key: "loinc_database"
      output_key: "matched_chemistry"
      test_name_column: "assay_name"
      match_threshold: 0.80
      matching_strategy: "top_n"
      top_n: 5
      use_abbreviations: true
      use_synonyms: true
      normalize_units: true
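To illustrate how the three matching_strategy values differ, here is a rough sketch of the selection step. The function `select_matches` and the candidate scores are hypothetical. The sketch also assumes that match_threshold is inclusive and is applied before every strategy, including 'top_n', which this document does not state explicitly.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (reference test name, similarity score)


def select_matches(
    candidates: List[Candidate],
    strategy: str = "best_match",
    threshold: float = 0.85,
    top_n: int = 3,
) -> List[Candidate]:
    """Filter scored candidates according to the chosen strategy."""
    # Keep only candidates at or above the threshold, highest score first.
    kept = sorted(
        (c for c in candidates if c[1] >= threshold),
        key=lambda c: c[1],
        reverse=True,
    )
    if strategy == "best_match":
        return kept[:1]
    if strategy == "all_above_threshold":
        return kept
    if strategy == "top_n":
        return kept[:top_n]
    raise ValueError(f"unknown matching_strategy: {strategy!r}")


# Illustrative scores only, not output from the real action.
scored = [
    ("Glucose in Serum or Plasma", 0.92),
    ("Glucose in Urine", 0.86),
    ("Glucagon", 0.61),
]
print(select_matches(scored, "best_match"))
print(select_matches(scored, "all_above_threshold", threshold=0.80))
print(select_matches(scored, "top_n", threshold=0.60, top_n=2))
```

Under these assumptions, 'best_match' returns at most one candidate, while 'top_n' can return fewer than top_n entries when few candidates clear the threshold.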
Input Format
Source Test Data
[
  {
    "test_name": "Glucose, Serum",
    "value": "95",
    "units": "mg/dL",
    "reference_range": "70-100"
  },
  {
    "test_name": "Hgb",  # Abbreviation
    "value": "14.5",
    "units": "g/dl",
    "reference_range": "13.5-17.5"
  }
]
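The second record above reports its units as "g/dl", whereas the target reference database spells the same unit "g/dL". With normalize_units enabled, such variants are standardized before matching. One possible approach is a lookup of canonical spellings, sketched below; the table `CANONICAL_UNITS` and the function `normalize_unit` are hypothetical, and a real table would need many more entries.

```python
# Hypothetical lookup of canonical unit spellings, keyed by lowercase form.
CANONICAL_UNITS = {
    "mg/dl": "mg/dL",
    "g/dl": "g/dL",
    "mmol/l": "mmol/L",
    "u/l": "U/L",
}


def normalize_unit(unit: str) -> str:
    """Return the canonical spelling of a unit, or the cleaned input if unknown."""
    cleaned = unit.strip().replace(" ", "")
    return CANONICAL_UNITS.get(cleaned.lower(), cleaned)


print(normalize_unit("g/dl"))     # g/dL
print(normalize_unit(" MG/DL "))  # mg/dL
```

Case-insensitive lookup is a simplification: some unit prefixes differ only by case, so a production table should list accepted variants explicitly rather than lowercase everything.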
Target Reference Database
[
  {
    "standard_name": "Glucose in Serum or Plasma",
    "loinc_code": "2345-7",
    "units": "mg/dL",
    "synonyms": ["Blood Glucose", "Serum Glucose"]
  },
  {
    "standard_name": "Hemoglobin",
    "loinc_code": "718-7",
    "units": "g/dL",
    "abbreviations": ["Hgb", "Hb"]
  }
]
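Because each reference record may carry "synonyms", "abbreviations", both, or neither, a lookup keyed on every known alias makes the exact, abbreviation, and synonym stages cheap. The sketch below shows one way to build such an index from the sample records; `build_alias_index` is a hypothetical helper, and lowercasing the keys is an assumption.

```python
from typing import Dict, List

# Sample records copied from the Target Reference Database example above.
reference_tests: List[dict] = [
    {
        "standard_name": "Glucose in Serum or Plasma",
        "loinc_code": "2345-7",
        "units": "mg/dL",
        "synonyms": ["Blood Glucose", "Serum Glucose"],
    },
    {
        "standard_name": "Hemoglobin",
        "loinc_code": "718-7",
        "units": "g/dL",
        "abbreviations": ["Hgb", "Hb"],
    },
]


def build_alias_index(records: List[dict]) -> Dict[str, dict]:
    """Map every lowercased name, synonym, and abbreviation to its record."""
    index: Dict[str, dict] = {}
    for record in records:
        aliases = [record["standard_name"]]
        # Either list may be absent, as in the sample records.
        aliases += record.get("synonyms", [])
        aliases += record.get("abbreviations", [])
        for alias in aliases:
            # setdefault keeps the first record if two records share an alias.
            index.setdefault(alias.strip().lower(), record)
    return index


index = build_alias_index(reference_tests)
print(index["hgb"]["loinc_code"])               # 718-7
print(index["serum glucose"]["standard_name"])  # Glucose in Serum or Plasma
```

Names that appear under no alias, such as "Glucose, Serum", miss this index and would fall through to the fuzzy stages.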
Output Format
Matched Results
{
  "datasets": {
    "matched_tests": [
      {
        # Original fields
        "test_name": "Glucose, Serum",
        "value": "95",
        "units": "mg/dL",
        # Match metadata
        "matched_name": "Glucose in Serum or Plasma",
        "loinc_code": "2345-7",
        "match_score": 0.92,
        "match_method": "fuzzy_token",
        "match_confidence": "high",
        # Normalized values
        "normalized_units": "mg/dL",
        "standardized_value": 95.0
      }
    ]
  }
}
Matching Statistics
{
  "statistics": {
    "fuzzy_matching": {
      "total_tests": 150,
      "matched": 142,
      "unmatched": 8,
      "match_rate": 0.947,
      "confidence_distribution": {
        "high": 120,
        "medium": 22,
        "low": 0
      },
      "method_usage": {
        "exact": 45,
        "abbreviation": 28,
        "synonym": 15,
        "fuzzy_token": 54
      }
    }
  }
}
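The statistics block can be derived directly from the per-test results. The sketch below shows one way to compute it; `summarize_matches` is a hypothetical helper, and representing an unmatched test as `None` is an assumption made for this example.

```python
from collections import Counter
from typing import List, Optional


def summarize_matches(results: List[Optional[dict]]) -> dict:
    """Build a statistics block from per-test results (None means unmatched)."""
    matched = [r for r in results if r is not None]
    total = len(results)
    confidence = Counter(r["match_confidence"] for r in matched)
    return {
        "total_tests": total,
        "matched": len(matched),
        "unmatched": total - len(matched),
        # Guard against division by zero for an empty input.
        "match_rate": round(len(matched) / total, 3) if total else 0.0,
        "confidence_distribution": {
            level: confidence.get(level, 0) for level in ("high", "medium", "low")
        },
        "method_usage": dict(Counter(r["match_method"] for r in matched)),
    }


# Tiny illustrative input: two matched tests and one unmatched test.
results = [
    {"match_method": "exact", "match_confidence": "high"},
    {"match_method": "fuzzy_token", "match_confidence": "medium"},
    None,
]
stats = summarize_matches(results)
print(stats["match_rate"])  # 0.667
```

With 142 matched tests out of 150, the same rounding yields the match_rate of 0.947 shown above.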
Matching Algorithms
Matching Methods (in order)
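1. Exact Match: Direct string comparison
2. Abbreviation Expansion: Expands known abbreviations (e.g., Hgb → Hemoglobin)
3. Synonym Matching: Uses clinical dictionaries
4. Token-Based Fuzzy: Compares word tokens, so word order and punctuation matter less
5. Levenshtein Distance: Character-level similarity based on edit distance (insertions, deletions, substitutions)
6. Jaro-Winkler: Character-level similarity that rewards a shared prefix; well suited to short strings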
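For readers unfamiliar with the three fuzzy measures, the sketch below gives plain-Python versions using only the standard library. These are textbook formulations written for illustration, not the code this action runs. In particular, the token measure here is a simple Jaccard overlap, which is an assumption and will not reproduce the match_score values shown in the Output Format section.

```python
import re


def levenshtein_similarity(a: str, b: str) -> float:
    """Return 1 - (edit distance / longer length), in the range 0.0-1.0."""
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase alphanumeric word tokens."""
    tokens_a = set(re.findall(r"[a-z0-9]+", a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity with the Winkler prefix boost (always applied here)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    matched_a = [False] * len(a)
    matched_b = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not matched_b[j] and b[j] == ch:
                matched_a[i] = matched_b[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq_a = [ch for ch, m in zip(a, matched_a) if m]
    seq_b = [ch for ch, m in zip(b, matched_b) if m]
    transpositions = sum(x != y for x, y in zip(seq_a, seq_b)) // 2
    jaro = (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3
    # Winkler boost for a shared prefix of up to four characters.
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1 - jaro)


print(round(levenshtein_similarity("hemoglobin", "haemoglobin"), 3))   # 0.909
print(token_similarity("Glucose, Serum", "Glucose in Serum or Plasma"))  # 0.4
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))                      # 0.961
```

Some Jaro-Winkler implementations apply the prefix boost only when the Jaro score exceeds 0.7; this sketch omits that condition for brevity.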
Confidence Scoring
- High (>0.90): Exact or near-exact matches
- Medium (0.80-0.90): Good fuzzy matches
- Low (<0.80): Weak matches, returned only when match_threshold is set below 0.80
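The ordered methods and the confidence bands can be combined into a single lookup, sketched below under several assumptions: exact, abbreviation, and synonym hits are scored 1.0, and `difflib.SequenceMatcher` from the standard library stands in for the fuzzy stages. The names `match_test`, `confidence_label`, and the "fuzzy" method label are hypothetical; the real action reports labels such as "fuzzy_token".

```python
from difflib import SequenceMatcher
from typing import Optional

# Illustrative reference data taken from the Target Reference Database sample.
REFERENCE = [
    {"standard_name": "Glucose in Serum or Plasma", "loinc_code": "2345-7",
     "synonyms": ["Blood Glucose", "Serum Glucose"]},
    {"standard_name": "Hemoglobin", "loinc_code": "718-7",
     "abbreviations": ["Hgb", "Hb"]},
]


def confidence_label(score: float) -> str:
    """Map a similarity score to the confidence bands listed above."""
    if score > 0.90:
        return "high"
    if score >= 0.80:
        return "medium"
    return "low"


def _result(record: dict, score: float, method: str) -> dict:
    return {
        "matched_name": record["standard_name"],
        "loinc_code": record["loinc_code"],
        "match_score": score,
        "match_method": method,
        "match_confidence": confidence_label(score),
    }


def _ratio(key: str, record: dict) -> float:
    return SequenceMatcher(None, key, record["standard_name"].lower()).ratio()


def match_test(name: str, threshold: float = 0.85) -> Optional[dict]:
    """Try each method in order and return the first hit, or None."""
    key = name.strip().lower()
    # Stages 1-3: exact name, abbreviation, synonym (scored 1.0 by assumption).
    for method, field in (("exact", "standard_name"),
                          ("abbreviation", "abbreviations"),
                          ("synonym", "synonyms")):
        for record in REFERENCE:
            values = record.get(field, [])
            if isinstance(values, str):
                values = [values]
            if key in (v.lower() for v in values):
                return _result(record, 1.0, method)
    # Fuzzy fallback: difflib's ratio stands in for the fuzzy stages.
    best = max(REFERENCE, key=lambda r: _ratio(key, r))
    score = _ratio(key, best)
    if score >= threshold:
        return _result(best, round(score, 2), "fuzzy")
    return None


print(match_test("Hgb"))
print(match_test("Serum Glucose"))
print(match_test("Hemoglobn"))  # misspelling, resolved by the fuzzy fallback
print(match_test("Sodium"))     # no reference entry, so None
```

Stopping at the first successful stage keeps cheap deterministic lookups ahead of the more expensive and less certain fuzzy comparison.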
Best Practices
- Start with a high threshold (0.85 or above) and adjust based on results
- Review unmatched tests to identify missing synonyms
- Use the top_n strategy for manual validation workflows
- Enable all normalization options for heterogeneous data
- Validate LOINC codes when available
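When adjusting the threshold, it can help to see how the match rate changes across a few candidate values before committing to one. The sketch below assumes the best similarity score for each source test has already been collected; `match_rate_by_threshold` is a hypothetical helper and the scores are made up for illustration.

```python
from typing import Dict, List


def match_rate_by_threshold(
    best_scores: List[float], thresholds: List[float]
) -> Dict[float, float]:
    """Fraction of tests whose best score meets each candidate threshold."""
    total = len(best_scores)
    return {
        t: (sum(s >= t for s in best_scores) / total if total else 0.0)
        for t in thresholds
    }


# Illustrative best-match scores, not real data.
scores = [1.0, 0.95, 0.88, 0.84, 0.72]
print(match_rate_by_threshold(scores, [0.90, 0.85, 0.80]))
# {0.9: 0.4, 0.85: 0.6, 0.8: 0.8}
```

A large jump in match rate between two nearby thresholds suggests inspecting the tests in that score range by hand before lowering the setting.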
Performance Notes
- Optimized for datasets with fewer than 10,000 tests
- Uses indexed search for large reference databases
- Caches abbreviation and synonym lookups
- Processes batches in parallel
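As a rough illustration of the caching and batch-processing notes, the sketch below memoizes abbreviation lookups with `functools.lru_cache` and expands a batch of names with a thread pool. It is not the action's own implementation: the function names, the cache size, and the choice of threads are assumptions, and the abbreviation table contains only the entries that appear in this document.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import List

# Abbreviations taken from the reference database sample in this document.
ABBREVIATIONS = {"hgb": "Hemoglobin", "hb": "Hemoglobin"}


@lru_cache(maxsize=4096)
def expand_abbreviation(name: str) -> str:
    """Expand a known abbreviation; repeated names are served from the cache."""
    return ABBREVIATIONS.get(name.strip().lower(), name)


def expand_batch(names: List[str], workers: int = 4) -> List[str]:
    """Expand a batch of names concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(expand_abbreviation, names))


print(expand_batch(["Hgb", "Glucose, Serum", "Hgb"]))
# ['Hemoglobin', 'Glucose, Serum', 'Hemoglobin']
```

For CPU-bound similarity scoring in pure Python, threads give little speedup because of the global interpreter lock; a process pool or a compiled matching library is the more likely choice in practice.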
Integration Example
name: clinical_chemistry_pipeline
description: Map clinical chemistry tests to standards
steps:
  - name: load_lab_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "/data/lab_results.csv"
        identifier_column: "patient_id"
        output_key: "lab_data"
  - name: extract_loinc
    action:
      type: CHEMISTRY_EXTRACT_LOINC
      params:
        input_key: "lab_data"
        output_key: "loinc_extracted"
  - name: fuzzy_match
    action:
      type: CHEMISTRY_FUZZY_TEST_MATCH
      params:
        input_key: "loinc_extracted"
        target_key: "loinc_reference"
        output_key: "matched_tests"
        match_threshold: 0.85
  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: "matched_tests"
        output_file: "/results/matched_chemistry.xlsx"
        format: "excel"
See Also
chemistry_extract_loinc - Extract LOINC codes
chemistry_vendor_harmonization - Harmonize vendor-specific tests
calculate_mapping_quality - Assess match quality