EXPORT_DATASET
Export datasets from the execution context to files in various formats.
Purpose
This action saves processed datasets to files for external use, sharing, or archival. It provides:
- Multiple output formats (TSV, CSV, JSON, Excel)
- Selective column export
- Automatic directory creation
- Integration with output file tracking
- Flexible path specification
Parameters
Required Parameters
- input_key (string)
Key of the dataset to export from context['datasets'].
- output_path (string)
Full file path where the dataset will be saved. Supports absolute and relative paths.
Optional Parameters
- format (string)
Export format: 'tsv', 'csv', 'json', or 'xlsx'. Default: 'tsv'
- columns (list of strings)
Specific columns to export. Default: None (export all columns)
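
The parameters above can be sketched as a small typed container. This is an illustration only: ``ExportParams`` and ``SUPPORTED_FORMATS`` are hypothetical names, and the real action defines its parameters on its own base class, which is not shown in this document.

```python
from dataclasses import dataclass
from typing import List, Optional

# Formats listed in this document; the tuple name is an assumption.
SUPPORTED_FORMATS = ("tsv", "csv", "json", "xlsx")


@dataclass
class ExportParams:
    """Hypothetical container mirroring the documented parameters."""

    input_key: str  # required: key in context['datasets']
    output_path: str  # required: absolute or relative file path
    format: str = "tsv"  # optional: defaults to 'tsv'
    columns: Optional[List[str]] = None  # optional: None exports all columns

    def __post_init__(self) -> None:
        # Reject formats outside the documented set.
        if self.format not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported format: {self.format}")


params = ExportParams(
    input_key="final_proteins",
    output_path="/results/protein_matches.tsv",
)
```

Leaving out ``format`` and ``columns`` gives the documented defaults: a TSV export of every column.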
Supported Formats
- TSV (Tab-Separated Values)
Extension: .tsv, .txt
Delimiter: Tab character
Headers: Included
Best for: Large datasets, programmatic processing
- CSV (Comma-Separated Values)
Extension: .csv
Delimiter: Comma
Headers: Included
Best for: Excel compatibility, general data exchange
- JSON (JavaScript Object Notation)
Extension: .json
Format: Array of objects (records orientation)
Indented: 2 spaces for readability
Best for: Web applications, APIs
- Excel (XLSX)
Extension: .xlsx
Format: Excel workbook
Headers: Included
Best for: Manual analysis, reporting
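
To make the differences between the text formats concrete, here is a minimal standard-library sketch. It is not the action's implementation (which, per the verification notes, uses pandas); ``to_delimited`` and ``to_json`` are hypothetical helpers, and XLSX is left out because it needs a third-party writer.

```python
import csv
import io
import json

rows = [
    {"uniprot_id": "P12345", "gene_name": "EXAMPLE1", "confidence": 0.95},
    {"uniprot_id": "Q67890", "gene_name": "EXAMPLE2", "confidence": 0.87},
]


def to_delimited(records, delimiter):
    """Serialize records with a header row, using the given delimiter."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0]),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records as an array of objects, indented by 2 spaces."""
    return json.dumps(records, indent=2)


tsv_text = to_delimited(rows, "\t")  # tab-separated, header included
csv_text = to_delimited(rows, ",")  # comma-separated, header included
json_text = to_json(rows)  # records orientation
```

TSV and CSV differ only in the delimiter, while JSON repeats every column name in each record, which is one reason it tends to produce larger files.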
Example Usage
Basic TSV Export

.. code-block:: yaml

    - name: export_results
      action:
        type: EXPORT_DATASET
        params:
          input_key: "final_proteins"
          output_path: "/results/protein_matches.tsv"
          format: "tsv"

Export Specific Columns

.. code-block:: yaml

    - name: export_summary
      action:
        type: EXPORT_DATASET
        params:
          input_key: "metabolite_matches"
          output_path: "/output/metabolite_summary.csv"
          format: "csv"
          columns: ["compound_name", "hmdb_id", "confidence", "category"]

JSON Export for Web Use

.. code-block:: yaml

    - name: export_api_data
      action:
        type: EXPORT_DATASET
        params:
          input_key: "processed_compounds"
          output_path: "/web/data/compounds.json"
          format: "json"

Excel Export for Analysis

.. code-block:: yaml

    - name: export_excel_report
      action:
        type: EXPORT_DATASET
        params:
          input_key: "comprehensive_results"
          output_path: "/reports/analysis_${date}.xlsx"
          format: "xlsx"

Multiple Exports

.. code-block:: yaml

    - name: export_tsv
      action:
        type: EXPORT_DATASET
        params:
          input_key: "final_data"
          output_path: "/output/data.tsv"
          format: "tsv"

    - name: export_excel
      action:
        type: EXPORT_DATASET
        params:
          input_key: "final_data"
          output_path: "/output/data.xlsx"
          format: "xlsx"
          columns: ["id", "name", "description", "category"]

Variable Substitution in Paths

.. code-block:: yaml

    - name: export_timestamped
      action:
        type: EXPORT_DATASET
        params:
          input_key: "results"
          output_path: "${OUTPUT_DIR}/results_${timestamp}.csv"
          format: "csv"

Output Format Examples
TSV Format

.. code-block:: text

    uniprot_id	gene_name	confidence	category
    P12345	EXAMPLE1	0.95	reviewed
    Q67890	EXAMPLE2	0.87	reviewed

CSV Format

.. code-block:: text

    uniprot_id,gene_name,confidence,category
    P12345,EXAMPLE1,0.95,reviewed
    Q67890,EXAMPLE2,0.87,reviewed

JSON Format

.. code-block:: json

    [
      {
        "uniprot_id": "P12345",
        "gene_name": "EXAMPLE1",
        "confidence": 0.95,
        "category": "reviewed"
      },
      {
        "uniprot_id": "Q67890",
        "gene_name": "EXAMPLE2",
        "confidence": 0.87,
        "category": "reviewed"
      }
    ]

Context Integration
The action updates the execution context with output file information:

.. code-block:: python

    # Context after execution
    {
        "output_files": {
            "final_proteins": "/results/protein_matches.tsv"
        }
    }

This enables downstream actions to reference exported files.
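
A minimal sketch of that bookkeeping, assuming the context behaves like a plain dictionary (the real action goes through its own context wrapper, which is not reproduced here). ``record_output_file`` is a hypothetical helper name.

```python
def record_output_file(context, input_key, output_path):
    """Store the exported path under context['output_files'][input_key]."""
    # setdefault creates the 'output_files' mapping on first use.
    context.setdefault("output_files", {})[input_key] = output_path
    return context


context = {"datasets": {"final_proteins": []}}
record_output_file(context, "final_proteins", "/results/protein_matches.tsv")

# A later step can look up where the dataset was written.
exported_path = context["output_files"]["final_proteins"]
```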
Path Handling
- Absolute Paths
Use full file system paths:
/home/user/data/results.csv
- Relative Paths
Relative to current working directory:
./output/data.tsv
- Directory Creation
Parent directories are created automatically if they don’t exist.
- Path Variables
Support for environment variables and strategy parameters:
${OUTPUT_DIR}/results.csv
${parameters.output_path}
${metadata.timestamp}
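
The path rules above can be approximated in a few lines. This is a sketch under stated assumptions, not the framework's resolver: ``resolve_path`` and ``prepare_output_path`` are hypothetical helpers, the variables are supplied as a flat dictionary, and unknown placeholders are left untouched, which may differ from the real behavior.

```python
import re
import tempfile
from pathlib import Path

# Matches ${name}, including dotted names such as ${metadata.timestamp}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_path(template, variables):
    """Replace each ${name} placeholder with its value from `variables`."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Leave unknown placeholders untouched rather than guessing.
            return match.group(0)
        return str(variables[name])

    return PLACEHOLDER.sub(substitute, template)


def prepare_output_path(path_text):
    """Create missing parent directories and return the path."""
    path = Path(path_text)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


base_dir = tempfile.mkdtemp()  # stands in for a real OUTPUT_DIR
variables = {"OUTPUT_DIR": base_dir, "metadata.timestamp": "20250822"}
resolved = resolve_path(
    "${OUTPUT_DIR}/nested/results_${metadata.timestamp}.csv", variables
)
output_path = prepare_output_path(resolved)
```

``mkdir(parents=True, exist_ok=True)`` is what makes directory creation safe to repeat: it builds any missing intermediate directories and does nothing if they already exist.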
Error Handling
- Dataset not found
Error: Dataset 'missing_data' not found in context
Solution: Verify the input_key exists in context['datasets'].
- Unsupported format
Error: Unsupported format: xml
Solution: Use supported formats: tsv, csv, json, xlsx.
- Permission denied
Error: Export failed: Permission denied
Solution: Check write permissions for output directory.
- Invalid columns
Error: Column 'missing_col' not found in dataset
Solution: Verify column names exist in the dataset.
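
The checks behind the first, second, and fourth errors could look roughly like the following. This is a hypothetical sketch: ``validate_export`` is not part of the action's API, the exception types are assumptions, and datasets are modeled as lists of dictionaries. A permission error, by contrast, comes from the operating system when the file is opened for writing.

```python
SUPPORTED_FORMATS = ("tsv", "csv", "json", "xlsx")


def validate_export(context, input_key, export_format, columns=None):
    """Return the rows to export, or raise with a message like those above."""
    datasets = context.get("datasets", {})
    if input_key not in datasets:
        raise KeyError(f"Dataset '{input_key}' not found in context")
    if export_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {export_format}")

    rows = datasets[input_key]
    if columns is None:
        return rows  # no filter requested: export every column

    available = set(rows[0]) if rows else set()
    for column in columns:
        if column not in available:
            raise ValueError(f"Column '{column}' not found in dataset")
    # Keep only the requested columns, in the requested order.
    return [{column: row[column] for column in columns} for row in rows]


context = {"datasets": {"results": [{"id": 1, "name": "a", "value": 0.5}]}}
selected = validate_export(context, "results", "csv", columns=["id", "name"])
```

Validating before any file is opened means a bad ``input_key`` or column name fails without leaving a partial output file behind.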
Best Practices
- Use descriptive filenames including dataset type and timestamp
- Choose appropriate formats for intended use:

  - TSV/CSV for data processing
  - JSON for web applications
  - Excel for manual analysis

- Specify column subsets to reduce file size and focus on key data
- Use absolute paths in production environments
- Include metadata in filenames (date, version, parameters)
- Plan directory structure for organized output management
Performance Notes
- Export speed depends on dataset size and format complexity
- TSV exports are fastest for large datasets
- Excel exports may be slower due to formatting overhead
- JSON exports with many columns can be memory-intensive
- Column filtering reduces export time and file size
File Size Considerations
- Large Datasets (>100K rows)
Prefer TSV format for efficiency
Consider column filtering to reduce size
Use compression if supported by downstream tools
- Memory Usage
Scales with dataset size
JSON format uses more memory during export
Excel format may require significant memory for large datasets
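
As an illustration of the column filtering and compression suggestions, the sketch below writes a filtered, gzip-compressed TSV with the standard library. Whether EXPORT_DATASET itself can write compressed output is not stated in this document, so treat this as a possible post-processing step rather than a feature of the action.

```python
import csv
import gzip
import os
import tempfile

rows = [
    {"id": "P12345", "name": "EXAMPLE1", "confidence": 0.95, "notes": "x" * 50},
    {"id": "Q67890", "name": "EXAMPLE2", "confidence": 0.87, "notes": "y" * 50},
]
columns = ["id", "confidence"]  # column filter: drop the bulky fields

target = os.path.join(tempfile.mkdtemp(), "results.tsv.gz")

# Write row by row instead of building the whole file in memory first.
with gzip.open(target, "wt", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(
        handle,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",  # silently skip columns not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Read the file back to confirm only the selected columns were written.
with gzip.open(target, "rt", encoding="utf-8") as handle:
    lines = handle.read().splitlines()
```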
Integration Patterns
End-of-Pipeline Export

.. code-block:: yaml

    steps:
      # ... processing steps ...
      - name: export_final_results
        action:
          type: EXPORT_DATASET
          params:
            input_key: "processed_data"
            output_path: "/results/final_analysis.tsv"

Multi-Format Export

.. code-block:: yaml

    steps:
      # ... processing steps ...
      - name: export_for_analysis
        action:
          type: EXPORT_DATASET
          params:
            input_key: "results"
            output_path: "/output/analysis.xlsx"
            format: "xlsx"
      - name: export_for_api
        action:
          type: EXPORT_DATASET
          params:
            input_key: "results"
            output_path: "/api/data.json"
            format: "json"
            columns: ["id", "name", "value"]

Conditional Export

.. code-block:: yaml

    steps:
      # ... processing steps ...
      - name: export_if_successful
        action:
          type: EXPORT_DATASET
          params:
            input_key: "validated_results"
            output_path: "/output/success_${date}.tsv"
            format: "tsv"

Verification Sources
Last verified: 2025-08-22
This documentation was verified against the following project resources:
/biomapper/src/actions/export_dataset.py (actual implementation with pandas export and UniversalContext integration)
/biomapper/src/actions/typed_base.py (TypedStrategyAction base class)
/biomapper/src/actions/registry.py (self-registration via @register_action decorator)
/biomapper/src/core/standards/context_handler.py (UniversalContext for unified context access)
/biomapper/src/core/standards/base_models.py (ActionParamsBase inheritance)
/biomapper/CLAUDE.md (2025 standardizations and parameter naming)