CHEMISTRY_FUZZY_TEST_MATCH

Fuzzy matching for clinical chemistry test names and descriptions.

Purpose

This action performs intelligent fuzzy matching for clinical chemistry tests by:

Matching test names using multiple algorithms (Levenshtein, Jaro-Winkler, token-based)
Handling abbreviations and synonyms in clinical test nomenclature
Normalizing units and reference ranges
Resolving ambiguous test mappings
Supporting LOINC code cross-referencing

Parameters

Required Parameters

input_key (string): Context key containing source chemistry test data.
target_key (string): Context key containing target reference test database.
output_key (string): Context key to store matched results.

Optional Parameters

test_name_column (string): Column containing test names in source data. Default: “test_name”
match_threshold (float): Minimum similarity score for matches (0.0-1.0). Default: 0.85
matching_strategy (string): Strategy: ‘best_match’, ‘all_above_threshold’, ‘top_n’. Default: ‘best_match’
top_n (integer): Number of top matches to return (if strategy=’top_n’). Default: 3
use_abbreviations (boolean): Enable abbreviation expansion (e.g., ‘Hgb’ → ‘Hemoglobin’). Default: true
use_synonyms (boolean): Enable synonym matching from clinical dictionaries. Default: true
normalize_units (boolean): Standardize measurement units before matching. Default: true

Example Usage

Basic Fuzzy Matching

- name: match_chemistry_tests
  action:
    type: CHEMISTRY_FUZZY_TEST_MATCH
    params:
      input_key: "lab_tests"
      target_key: "reference_tests"
      output_key: "matched_tests"
      match_threshold: 0.85

Advanced Configuration

- name: comprehensive_matching
  action:
    type: CHEMISTRY_FUZZY_TEST_MATCH
    params:
      input_key: "clinical_chemistry"
      target_key: "loinc_database"
      output_key: "matched_chemistry"
      test_name_column: "assay_name"
      match_threshold: 0.80
      matching_strategy: "top_n"
      top_n: 5
      use_abbreviations: true
      use_synonyms: true
      normalize_units: true

Input Format

Source Test Data

[
    {
        "test_name": "Glucose, Serum",
        "value": "95",
        "units": "mg/dL",
        "reference_range": "70-100"
    },
    {
        "test_name": "Hgb",  # Abbreviation
        "value": "14.5",
        "units": "g/dl",
        "reference_range": "13.5-17.5"
    }
]

Target Reference Database

[
    {
        "standard_name": "Glucose in Serum or Plasma",
        "loinc_code": "2345-7",
        "units": "mg/dL",
        "synonyms": ["Blood Glucose", "Serum Glucose"]
    },
    {
        "standard_name": "Hemoglobin",
        "loinc_code": "718-7",
        "units": "g/dL",
        "abbreviations": ["Hgb", "Hb"]
    }
]

Output Format

Matched Results

{
    "datasets": {
        "matched_tests": [
            {
                # Original fields
                "test_name": "Glucose, Serum",
                "value": "95",
                "units": "mg/dL",

                # Match metadata
                "matched_name": "Glucose in Serum or Plasma",
                "loinc_code": "2345-7",
                "match_score": 0.92,
                "match_method": "fuzzy_token",
                "match_confidence": "high",

                # Normalized values
                "normalized_units": "mg/dL",
                "standardized_value": 95.0
            }
        ]
    }
}

Matching Statistics

{
    "statistics": {
        "fuzzy_matching": {
            "total_tests": 150,
            "matched": 142,
            "unmatched": 8,
            "match_rate": 0.947,
            "confidence_distribution": {
                "high": 120,
                "medium": 22,
                "low": 0
            },
            "method_usage": {
                "exact": 45,
                "abbreviation": 28,
                "synonym": 15,
                "fuzzy_token": 54
            }
        }
    }
}

Matching Algorithms

Matching Methods (in order)

Exact Match: Direct string comparison
Abbreviation Expansion: Hgb → Hemoglobin
Synonym Matching: Uses clinical dictionaries
Token-Based Fuzzy: Compares word tokens
Levenshtein Distance: Character-level similarity
Jaro-Winkler: Optimized for short strings

Confidence Scoring

High (>0.90): Exact or near-exact matches
Medium (0.80-0.90): Good fuzzy matches
Low (<0.80): Weak matches (if above threshold)

Best Practices

Start with higher thresholds (0.85+) and adjust based on results
Review unmatched tests to identify missing synonyms
Use top_n strategy for manual validation workflows
Enable all normalization options for heterogeneous data
Validate LOINC codes when available

Performance Notes

Optimized for datasets with <10,000 tests
Uses indexed search for large reference databases
Caches abbreviation and synonym lookups
Parallel processing for batch matching

Integration Example

name: clinical_chemistry_pipeline
description: Map clinical chemistry tests to standards

steps:
  - name: load_lab_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "/data/lab_results.csv"
        identifier_column: "patient_id"
        output_key: "lab_data"

  - name: extract_loinc
    action:
      type: CHEMISTRY_EXTRACT_LOINC
      params:
        input_key: "lab_data"
        output_key: "loinc_extracted"

  - name: fuzzy_match
    action:
      type: CHEMISTRY_FUZZY_TEST_MATCH
      params:
        input_key: "loinc_extracted"
        target_key: "loinc_reference"
        output_key: "matched_tests"
        match_threshold: 0.85

  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: "matched_tests"
        output_file: "/results/matched_chemistry.xlsx"
        format: "excel"