Sync to Google Drive
Overview
The Google Drive sync actions (SYNC_TO_GOOGLE_DRIVE_V2 and SYNC_TO_GOOGLE_DRIVE_V3) enable uploading biomapper results directly to Google Drive for sharing and archival. Both actions support OAuth2 and Service Account authentication methods.
V3 is recommended for new implementations as it includes enhanced error handling, chunked upload support, and better progress tracking.
Key Features
Multiple Authentication: OAuth2 (personal) and Service Account (automated)
Chunked Upload: Efficient handling of large files
Progress Tracking: Real-time upload progress
Automatic Organization: Creates organized folder structures
Error Recovery: Robust handling of network issues and timeouts
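The chunked upload and progress tracking features above can be pictured with a small loop. This is a minimal sketch, not the biomapper implementation: it assumes a resumable request object that follows the google-api-python-client convention, where `next_chunk()` returns a `(status, response)` pair and `response` stays `None` until the last chunk is sent. The helper name `run_resumable_upload` and the `on_progress` callback are hypothetical.

```python
from typing import Any, Callable, Optional


def run_resumable_upload(
    request: Any,
    on_progress: Optional[Callable[[float], None]] = None,
) -> Any:
    """Drive a resumable upload to completion, one chunk at a time.

    `request` is any object whose `next_chunk()` returns `(status, response)`;
    `response` is None until the upload has finished.
    """
    response = None
    while response is None:
        status, response = request.next_chunk()
        # `status` is None on the call that completes the upload.
        if status is not None and on_progress is not None:
            on_progress(status.progress())  # fraction between 0.0 and 1.0
    return response
```

Passing a callback such as `lambda fraction: print(f"{fraction:.0%}")` would print progress after each chunk.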
Authentication Methods
OAuth2 (Recommended for Personal Use)
Benefits:
- Uploads go to your personal Drive storage
- No separate quota to manage: files count against your own Drive storage
- Full ownership and control of uploaded files
- Easy sharing using Drive’s native sharing features
Setup: See Google Drive Integration for the complete OAuth2 setup guide.
Service Account (For Automation)
Benefits:
- Programmatic access without user interaction
- Suitable for automated pipelines
- Can be shared across team members
Limitations:
- Cannot store files (no storage quota)
- Must upload to folders shared with the service account
- More complex permission management
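How the actions auto-detect the authentication method is not described here. The sketch below shows one plausible rule, purely as an assumption: an explicit `auth_method` wins, otherwise the presence of the `GOOGLE_SERVICE_ACCOUNT_PATH` environment variable selects the service account, and OAuth2 is the fallback. The function `resolve_auth_method` is hypothetical.

```python
import os
from typing import Mapping, Optional

VALID_AUTH_METHODS = ("oauth2", "service_account")


def resolve_auth_method(
    auth_method: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an authentication method (assumed rule, not the real detection logic)."""
    env = os.environ if env is None else env
    if auth_method is not None:
        if auth_method not in VALID_AUTH_METHODS:
            raise ValueError(f"auth_method must be one of {VALID_AUTH_METHODS}")
        return auth_method
    # Assumption: a configured key path means an automated (service account) setup.
    if env.get("GOOGLE_SERVICE_ACCOUNT_PATH"):
        return "service_account"
    return "oauth2"
```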
Parameters
SYNC_TO_GOOGLE_DRIVE_V3
| Parameter | Type | Required | Description |
|---|---|---|---|
| input_key | string | Yes | Key for the dataset to upload |
| output_key | string | Yes | Key for the output with upload metadata |
| file_path | string | Yes | Local file path to upload |
| drive_folder_name | string | Yes | Google Drive folder name for organization |
| strategy_name | string | No | Strategy name for folder organization |
| auth_method | string | No | 'oauth2' or 'service_account' (default: auto-detect) |
| chunk_size | integer | No | Upload chunk size in bytes (default: 10 MB) |
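The V3 parameters can be checked before a strategy runs. The dataclass below is a minimal sketch that mirrors the parameter names used in the examples on this page; the class name `SyncToDriveParams` is hypothetical and the real action may validate differently. It assumes the 10 MB default means 10 * 1024 * 1024 bytes, and it applies the Google Drive guidance that resumable upload chunks should be multiples of 256 KiB.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_CHUNK_SIZE = 10 * 1024 * 1024  # assumed meaning of the "10MB" default
CHUNK_GRANULARITY = 256 * 1024  # Drive resumable uploads use 256 KiB multiples


@dataclass(frozen=True)
class SyncToDriveParams:
    """Hypothetical container for SYNC_TO_GOOGLE_DRIVE_V3 parameters."""

    input_key: str
    output_key: str
    file_path: str
    drive_folder_name: str
    strategy_name: Optional[str] = None
    auth_method: Optional[str] = None  # None means auto-detect
    chunk_size: int = DEFAULT_CHUNK_SIZE

    def __post_init__(self) -> None:
        if self.auth_method not in (None, "oauth2", "service_account"):
            raise ValueError("auth_method must be 'oauth2' or 'service_account'")
        if self.chunk_size <= 0 or self.chunk_size % CHUNK_GRANULARITY != 0:
            raise ValueError("chunk_size must be a positive multiple of 262144 bytes")
```

For example, `SyncToDriveParams("final_results", "upload_status", "/tmp/biomapper_results.csv", "Biomapper Results", chunk_size=5242880)` passes both checks, since 5242880 is 20 * 262144.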
SYNC_TO_GOOGLE_DRIVE_V2
Legacy version with similar parameters but without chunked upload support.
Example Usage
YAML Strategy (V3)
steps:
  - name: upload_results
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: final_results
        output_key: upload_status
        file_path: "/tmp/biomapper_results.csv"
        drive_folder_name: "Biomapper Results"
        strategy_name: "metabolomics_pipeline_v3"
        auth_method: "oauth2"
        chunk_size: 5242880  # 5MB chunks
Python Client
import asyncio

from src.client.client_v2 import BiomapperClient


async def upload_results(context):
    """Upload a results file; `context` is the execution context from earlier steps."""
    client = BiomapperClient(base_url="http://localhost:8000")
    result = await client.run_action(
        action_type="SYNC_TO_GOOGLE_DRIVE_V3",
        params={
            "input_key": "results_dataset",
            "output_key": "drive_upload",
            "file_path": "/results/metabolite_mapping.csv",
            "drive_folder_name": "Metabolomics Analysis",
            "strategy_name": "arivale_harmonization",
            "auth_method": "oauth2",
        },
        context=context,
    )
    # Access upload metadata
    upload_info = result.context["datasets"]["drive_upload"]
    print(f"Uploaded to: {upload_info['drive_url']}")


# `await` only works inside a coroutine; from synchronous code, run:
# asyncio.run(upload_results(context))
Folder Organization
The actions create organized folder structures on Google Drive:
Biomapper Results/
├── metabolomics_pipeline_v3/
│   ├── 2025-08-22_14-30-15/
│   │   ├── mapping_results.csv
│   │   ├── statistics.json
│   │   └── visualizations.png
│   └── latest/                  # Symlink to most recent
└── protein_harmonization/
    ├── 2025-08-22_15-45-30/
    └── latest/
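The layout above suggests a path of the form folder name, strategy name, then a timestamp. The sketch below derives those path components under that assumption; the helper `build_folder_path` is hypothetical, and the timestamp format is inferred from the example folder names rather than taken from the implementation.

```python
from datetime import datetime
from typing import List, Optional

# Inferred from folder names such as 2025-08-22_14-30-15.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def build_folder_path(
    drive_folder_name: str,
    strategy_name: Optional[str] = None,
    when: Optional[datetime] = None,
) -> List[str]:
    """Return the Drive folder names to create, from the root folder downward."""
    when = when or datetime.now()
    parts = [drive_folder_name]
    if strategy_name:  # strategy_name is optional, so the level may be skipped
        parts.append(strategy_name)
    parts.append(when.strftime(TIMESTAMP_FORMAT))
    return parts
```

Returning a list of names rather than a joined string reflects that Drive folders are looked up or created one level at a time by parent ID, not addressed by a single path.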
Output Format
The action returns metadata about the upload:
| Field | Description |
|---|---|
|  | Google Drive file ID |
| drive_url | Direct link to view the file |
|  | Parent folder ID |
|  | File size in bytes |
|  | Time taken for upload |
|  | Number of chunks (V3 only) |
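The chunk count reported by V3 should follow from the file size and the chunk size by ceiling division. The sketch below illustrates that arithmetic; `count_chunks` is a hypothetical helper, and it assumes the 10 MB default equals 10 * 1024 * 1024 bytes.

```python
def count_chunks(file_size: int, chunk_size: int = 10 * 1024 * 1024) -> int:
    """Number of chunks needed to send `file_size` bytes in `chunk_size` pieces."""
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division with integers avoids floating-point rounding on large files.
    return -(-file_size // chunk_size)
```

A 25 MiB file with the default chunk size needs 3 chunks: two full chunks and one partial chunk of 5 MiB.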
Performance Considerations
Upload Speed
Typical performance metrics:
- Small files (< 10 MB): 2-5 seconds
- Medium files (10-100 MB): 10-30 seconds
- Large files (> 100 MB): 1-3 minutes
Factors affecting speed:
- Network bandwidth
- File size and type
- Chunk size configuration
- Google Drive API limits
Optimization Tips
Adjust chunk size: Larger chunks for faster networks
Use compression: Pre-compress large CSV files
Batch uploads: Group related files together
Monitor quotas: Be aware of daily API limits
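The compression tip can be applied with the standard library before the upload step runs. This is a minimal sketch: `gzip_file` is a hypothetical helper, and whether downstream consumers of the uploaded file accept `.csv.gz` is an assumption to verify for your workflow.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source: str) -> Path:
    """Write a gzip-compressed copy next to `source` and return its path."""
    src = Path(source)
    dest = src.with_name(src.name + ".gz")
    # Stream in blocks so large CSV files are never loaded fully into memory.
    with src.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

The returned path can then be passed as `file_path` to the sync action.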
Authentication Setup
OAuth2 Setup
1. Create OAuth2 credentials in the Google Cloud Console
2. Download the client configuration JSON file
3. Run the setup script:
   poetry run python scripts/setup_oauth2_drive.py
4. Follow the browser authorization flow
5. Test the upload:
   python scripts/verify_google_drive_setup.py
Service Account Setup
1. Create a service account in the Google Cloud Console
2. Download the credentials JSON file
3. Set the environment variable:
   export GOOGLE_SERVICE_ACCOUNT_PATH="/path/to/credentials.json"
4. Create a shared folder and share it with the service account email
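The service account steps can be sanity-checked before the first upload. The sketch below reads the key file named by `GOOGLE_SERVICE_ACCOUNT_PATH` and returns the `client_email` field, which is the address the Drive folder must be shared with. The helper `service_account_email` is hypothetical; the `type` and `client_email` fields are standard in Google service account key files.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def service_account_email(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the email address to share the Drive folder with."""
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_SERVICE_ACCOUNT_PATH")
    if not key_path:
        raise RuntimeError("GOOGLE_SERVICE_ACCOUNT_PATH is not set")
    data = json.loads(Path(key_path).read_text(encoding="utf-8"))
    if data.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key file")
    return data["client_email"]
```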
Error Handling
Common Issues
| Error | Solution |
|---|---|
|  | Re-run OAuth2 setup or check service account credentials |
|  | Wait for quota reset or reduce upload frequency |
|  | Ensure folder exists and is shared with service account |
|  | Retry upload or reduce chunk size |
|  | Verify folder sharing permissions |
Retry Logic
V3 includes automatic retry logic:
- Exponential backoff for rate limiting
- Automatic retry on network errors
- Chunked resume for interrupted uploads
- Progress preservation across retries
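Exponential backoff, as listed above, doubles the wait after each failed attempt. The sketch below shows the general technique only and is not the V3 code: `retry_with_backoff`, its default delays, and the choice of retryable exceptions are all assumptions for illustration.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `max_attempts` is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% random jitter keeps parallel clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting can be replaced in tests; production callers would leave it at the default.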
Best Practices
File Naming: Use timestamp and strategy name for uniqueness
Folder Organization: Create logical folder hierarchies
Authentication: Prefer OAuth2 for personal use, Service Account for automation
Error Handling: Always check upload status in workflows
Cleanup: Regularly clean up old result files
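Checking the upload status, as recommended above, can be as simple as confirming that the output key holds a link. The sketch below assumes the context layout shown in the Python client example (`context["datasets"][output_key]["drive_url"]`); the helper `get_upload_url` is hypothetical.

```python
from typing import Any, Mapping


def get_upload_url(context: Mapping[str, Any], output_key: str) -> str:
    """Return the Drive URL stored under `output_key`, or raise if it is missing."""
    datasets = context.get("datasets", {})
    upload_info = datasets.get(output_key)
    if not upload_info:
        raise RuntimeError(f"No upload metadata found under '{output_key}'")
    url = upload_info.get("drive_url")
    if not url:
        raise RuntimeError(f"Upload metadata under '{output_key}' has no drive_url")
    return url
```

Raising instead of returning `None` makes a failed upload stop the workflow rather than pass silently to later steps.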
Security Considerations
Credentials Protection: Never commit OAuth2 tokens or service account keys
Access Control: Limit folder sharing to necessary users only
Data Sensitivity: Be aware of data privacy when uploading to cloud storage
Audit Trail: Monitor upload logs for security compliance
Integration Examples
Complete Pipeline with Upload
name: "metabolomics_with_drive_sync"
description: "Complete metabolomics pipeline with Google Drive upload"
steps:
  - name: load_data
    action:
      type: LOAD_DATASET_IDENTIFIERS
      # ... data loading parameters
  - name: progressive_matching
    action:
      type: PROGRESSIVE_SEMANTIC_MATCH
      # ... matching parameters
  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: final_results
        file_path: "/tmp/metabolomics_results.csv"
  - name: sync_to_drive
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: final_results
        output_key: drive_metadata
        file_path: "/tmp/metabolomics_results.csv"
        drive_folder_name: "Metabolomics Analysis"
        strategy_name: "progressive_pipeline_v3"
See Also
Google Drive Integration - Complete Google Drive setup guide
EXPORT_DATASET - Exporting data for upload
../examples/advanced_pipelines - Complete pipeline examples
Verification Sources
Last verified: 2025-08-22
This documentation was verified against the following project resources:
/biomapper/src/actions/io/sync_to_google_drive_v2.py (actual implementation of SYNC_TO_GOOGLE_DRIVE_V2)
/biomapper/src/actions/io/sync_to_google_drive_v3.py (enhanced V3 implementation with improved error handling)
/biomapper/src/actions/typed_base.py (TypedStrategyAction base class)
/biomapper/src/actions/registry.py (self-registration via @register_action decorator)
/biomapper/CLAUDE.md (2025 standardizations and integration patterns)
/biomapper/pyproject.toml (Google API client dependencies)