Google Drive Integration

Overview

Biomapper integrates with Google Drive for storing and sharing analysis results. It offers two authentication methods to suit different use cases: OAuth2 for personal use and Service Accounts for automated workflows.

The Google Drive integration solves the common problem of sharing large biological datasets and analysis results with collaborators, while providing organized folder structures and automatic uploads.

Authentication Methods

OAuth2 Authentication (Recommended)

OAuth2 authentication allows Biomapper to upload files directly to your personal Google Drive, using your own storage quota and permissions.

Benefits:

✅ Upload to personal Drive storage
✅ No separate storage quota to manage (uploads count against your personal allocation)
✅ Full ownership and control of uploaded files
✅ Easy sharing using Drive’s native sharing features
✅ No cross-domain permission issues

Best for: Personal research, individual analysis, small teams

Service Account Authentication

Service Account authentication provides programmatic access without user interaction, ideal for automated pipelines and server deployments.

Benefits:

✅ Automated access without user intervention
✅ Suitable for server-side automation
✅ Can be shared across team members
✅ Consistent API access

Limitations:

❌ Cannot store files directly (no storage quota)
❌ Must upload to folders shared with the service account
❌ More complex permission management

Best for: Automated pipelines, continuous integration, enterprise deployments

Setup Instructions

OAuth2 Setup (Personal Use)

Step 1: Create Google Cloud Project

Go to Google Cloud Console
Create a new project or select an existing one
Enable the Google Drive API:
- Navigate to APIs & Services → Library
- Search for “Google Drive API”
- Click Enable

Step 2: Create OAuth2 Credentials

Navigate to APIs & Services → Credentials
Click + CREATE CREDENTIALS → OAuth client ID
Configure the consent screen (if prompted):
- User Type: External (for personal use)
- App name: “Biomapper Integration”
- User support email: Your email
- Developer contact: Your email
Create OAuth client ID:
- Application type: Desktop app
- Name: “Biomapper OAuth2”
- Click CREATE
Download the credentials JSON file
Save it securely (never commit to version control)
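
Before running the setup wizard, it can help to confirm that the downloaded file really is a Desktop app client. Google stores Desktop app clients under a top-level "installed" key, while a "web" key indicates the wrong application type was chosen. The sketch below is a minimal check under that assumption; the function names are hypothetical and are not part of Biomapper.

```python
import json
from pathlib import Path


def validate_oauth_client(data: dict) -> str:
    """Return the client_id if the data looks like a Desktop app OAuth client."""
    # Desktop app clients live under "installed"; web clients under "web".
    if "installed" not in data:
        found = ", ".join(sorted(data)) or "nothing"
        raise ValueError(f"expected an 'installed' section, found: {found}")
    section = data["installed"]
    missing = [key for key in ("client_id", "client_secret") if key not in section]
    if missing:
        raise ValueError(f"credentials file is missing: {', '.join(missing)}")
    return section["client_id"]


def load_oauth_client(path: str) -> str:
    """Read the downloaded JSON file and validate its contents."""
    return validate_oauth_client(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling load_oauth_client with the path you saved in this step raises a ValueError with a readable message if the file is of the wrong type.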

Step 3: Run Biomapper OAuth2 Setup

# Ensure dependencies are installed (Google dependencies included in main install)
poetry install --with dev,docs,api

# Run the OAuth2 setup wizard
poetry run python scripts/setup_oauth2_drive.py

# Follow the interactive prompts:
# 1. Provide path to OAuth2 credentials JSON
# 2. Complete browser authorization flow
# 3. Verify upload permissions

Step 4: Test the Setup

# Verify Google Drive integration
poetry run python scripts/verify_google_drive_setup.py

# Expected output:
# ✅ OAuth2 credentials found
# ✅ Google Drive API accessible
# ✅ Test file upload successful
# ✅ Setup complete!

Service Account Setup (Automation)

Step 1: Create Service Account

Go to Google Cloud Console
Navigate to IAM & Admin → Service Accounts
Click + CREATE SERVICE ACCOUNT
Configure:
- Service account name: “biomapper-drive-integration”
- Service account ID: Auto-generated
- Description: “Service account for biomapper Google Drive uploads”
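Click CREATE AND CONTINUE
Grant roles (optional): this step can be skipped. Drive access is granted by sharing a folder with the service account (see Step 4), not by project IAM roles such as Editor or Storage Admin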

Step 2: Generate Credentials

Find your service account in the list
Click Actions (⋮) → Manage keys
Click ADD KEY → Create new key
Select JSON format
Click CREATE
Download and securely store the JSON file

Step 3: Configure Environment

# Set environment variable for service account
export GOOGLE_SERVICE_ACCOUNT_PATH="/path/to/service-account-key.json"

# Or add to your .bashrc/.zshrc for persistence
echo 'export GOOGLE_SERVICE_ACCOUNT_PATH="/path/to/service-account-key.json"' >> ~/.bashrc

Step 4: Create Shared Folder

In Google Drive, create a folder for biomapper results
Right-click the folder → Share
Add the service account email as Editor
- Service account email format: service-account-name@project-id.iam.gserviceaccount.com
- Find the email in the downloaded JSON file under client_email
Note the folder ID from the URL for configuration
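
The two lookups in this step (the service account email and the folder ID) can be scripted. The sketch below assumes the standard key-file layout, where "type" is "service_account" and the address is stored under "client_email", and a folder URL of the form https://drive.google.com/drive/folders/<ID>. The helper names are illustrative only.

```python
import re

# Folder IDs consist of letters, digits, "-" and "_" after "/folders/".
_FOLDER_URL = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(url: str) -> str:
    """Extract the folder ID from a Google Drive folder URL."""
    match = _FOLDER_URL.search(url)
    if match is None:
        raise ValueError(f"no folder ID found in: {url}")
    return match.group(1)


def service_account_email(key_data: dict) -> str:
    """Return the address to share the folder with, from a parsed key file."""
    if key_data.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key_data["client_email"]
```

For example, folder_id_from_url("https://drive.google.com/drive/folders/1AbC_d-9?usp=sharing") yields "1AbC_d-9", which is the value to use for drive_folder_id.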

Usage Examples

Basic Upload with OAuth2

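import asyncio

from client.client_v2 import BiomapperClient


async def upload_results(results_df):
    """Upload an exported results file to Google Drive using OAuth2."""
    client = BiomapperClient(base_url="http://localhost:8000")

    # Simple file upload using OAuth2
    result = await client.run_action(
        action_type="SYNC_TO_GOOGLE_DRIVE_V3",
        params={
            "input_key": "analysis_results",
            "output_key": "drive_metadata",
            "file_path": "/results/metabolomics_analysis.csv",
            "drive_folder_id": "your_folder_id_here",
            "auth_type": "oauth2"
        },
        context={"datasets": {"analysis_results": results_df}}
    )

    print(f"File uploaded: {result.context['datasets']['drive_metadata']['drive_url']}")


# results_df is the DataFrame produced earlier in your analysis.
# `await` is only valid inside a coroutine, so run it through asyncio.
asyncio.run(upload_results(results_df))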

Complete Pipeline with Drive Upload

name: "metabolomics_with_drive_backup"
description: "Complete metabolomics pipeline with automatic Google Drive backup"

parameters:
  input_file: "/data/metabolites.csv"
  project_name: "arivale_analysis_2025"

steps:
  # Data processing pipeline
  - name: load_metabolites
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "${parameters.input_file}"
        identifier_column: "metabolite_name"
        output_key: "raw_metabolites"

  - name: progressive_matching
    action:
      type: PROGRESSIVE_SEMANTIC_MATCH
      params:
        input_key: "raw_metabolites"
        output_key: "matched_metabolites"
        identifier_column: "metabolite_name"

  # Results export and upload
  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: "matched_metabolites"
        file_path: "/tmp/${parameters.project_name}_results.csv"

  - name: upload_to_drive
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: "matched_metabolites"
        output_key: "drive_upload_status"
        file_path: "/tmp/${parameters.project_name}_results.csv"
        drive_folder_id: "your_folder_id_here"
        auth_type: "oauth2"

  - name: generate_summary_report
    action:
      type: GENERATE_LLM_ANALYSIS
      params:
        input_key: "matched_metabolites"
        output_key: "analysis_report"
        report_type: "coverage_summary"

  - name: upload_report
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: "analysis_report"
        output_key: "report_upload_status"
        file_path: "/tmp/${parameters.project_name}_report.md"
        drive_folder_id: "your_folder_id_here"
        auth_type: "oauth2"
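
The pipeline above relies on ${parameters.name} placeholders inside file paths. As an illustration of what that substitution amounts to, here is a minimal sketch; it assumes a simple textual replacement and is not Biomapper's actual resolver, which may support more syntax.

```python
import re

# Matches ${parameters.some_name}
_PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_parameters(value: str, parameters: dict) -> str:
    """Replace each ${parameters.x} in value with parameters['x']."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined parameter: {name}")
        return str(parameters[name])

    return _PLACEHOLDER.sub(substitute, value)


params = {"project_name": "arivale_analysis_2025"}
print(resolve_parameters("/tmp/${parameters.project_name}_results.csv", params))
```

Note that export_results and upload_to_drive must resolve to the same path, which is why both steps use the same placeholder expression.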

Batch Upload Multiple Files

from client.client_v2 import BiomapperClient


async def upload_analysis_batch():
    client = BiomapperClient(base_url="http://localhost:8000")

    files_to_upload = [
        {"path": "/results/metabolites.csv", "name": "metabolite_results"},
        {"path": "/results/proteins.csv", "name": "protein_results"},
        {"path": "/results/summary.json", "name": "analysis_summary"}
    ]

    upload_results = []

    for file_info in files_to_upload:
        result = await client.run_action(
            action_type="SYNC_TO_GOOGLE_DRIVE_V3",
            params={
                "input_key": "dummy",  # Not used for file uploads
                "output_key": f"upload_{file_info['name']}",
                "file_path": file_info["path"],
                "drive_folder_id": "your_folder_id_here",
                "auth_type": "oauth2"
            },
            context={"datasets": {"dummy": {}}}
        )

        upload_results.append(result)

    return upload_results

Folder Organization

Automatic Folder Structure

Biomapper automatically creates organized folder structures in Google Drive:

Biomapper Results/
├── metabolomics_pipeline_v3/
│   ├── 2025-08-22_14-30-15/
│   │   ├── metabolite_mapping_results.csv
│   │   ├── coverage_statistics.json
│   │   ├── unmatched_metabolites.csv
│   │   └── analysis_summary.md
│   ├── 2025-08-21_09-15-42/
│   └── latest/  # Always points to most recent run
├── protein_harmonization/
│   ├── 2025-08-22_15-45-30/
│   └── latest/
└── shared/
    ├── reference_data/
    └── templates/
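
The run folders in the tree above follow a pipeline-name/timestamp pattern. A minimal sketch of how such a path could be derived is shown below; the function name and the default root are assumptions taken from the listing, not Biomapper's internal code.

```python
from datetime import datetime


def run_folder_path(pipeline: str, started: datetime,
                    root: str = "Biomapper Results") -> str:
    """Build '<root>/<pipeline>/<YYYY-MM-DD_HH-MM-SS>' for one pipeline run."""
    stamp = started.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{root}/{pipeline}/{stamp}"


print(run_folder_path("metabolomics_pipeline_v3", datetime(2025, 8, 22, 14, 30, 15)))
```

Because the timestamp is zero-padded and ordered from year to second, sorting folder names alphabetically also sorts runs chronologically.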

Custom Folder Organization

You can customize the folder structure:

# Custom folder hierarchy using folder ID
drive_folder_id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
auto_organize: true
folder_prefix: "progressive_matching"

# Results in organized folder structure within the specified folder

Advanced Configuration

Performance Optimization

For large files and datasets:

# Optimize upload performance
params:
  chunk_size: 10485760  # 10MB chunks
  timeout: 300          # 5 minute timeout
  retry_attempts: 3     # Retry failed uploads
  compression: true     # Compress before upload
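
When choosing a chunk_size, keep in mind that Google Drive's resumable upload protocol expects every chunk except the last to be a multiple of 256 KiB. The value above (10485760) already satisfies this. The sketch below shows one way to round an arbitrary request down to a valid size; the helper is hypothetical, and whether Biomapper performs this rounding itself is not confirmed here.

```python
CHUNK_GRANULARITY = 256 * 1024  # 256 KiB, the unit required for resumable uploads


def normalize_chunk_size(requested: int) -> int:
    """Round down to the nearest multiple of 256 KiB (at least one unit)."""
    if requested <= 0:
        raise ValueError("chunk size must be positive")
    units = max(1, requested // CHUNK_GRANULARITY)
    return units * CHUNK_GRANULARITY


print(normalize_chunk_size(10485760))    # already a valid multiple
print(normalize_chunk_size(10_000_000))  # rounded down to a valid multiple
```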

Error Handling and Monitoring

# Robust error handling
params:
  on_error: "continue"           # Don't fail pipeline on upload error
  backup_local: true             # Keep local copy as backup
  verify_upload: true            # Verify file integrity after upload
  notification_email: "user@domain.com"  # Email on upload completion
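
To illustrate what an integrity check such as verify_upload can involve: the Drive API reports an md5Checksum for files with binary content, so a local MD5 digest can be compared against it. The sketch below assumes that approach. The function names are hypothetical, and Biomapper's own verification may work differently.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Compute the hex MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path: str, block_size: int = 1024 * 1024) -> str:
    """Hash a file in blocks so large results are never loaded at once."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(block_size), b""))


def upload_matches(local_path: str, remote_md5: str) -> bool:
    """Compare the local digest with the checksum reported by Drive."""
    return file_md5(local_path) == remote_md5.lower()
```

If compression is enabled, the digest has to be taken over the compressed bytes that were actually sent, not over the original file.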

Authentication Management

Environment Configuration

# OAuth2 configuration
export GOOGLE_OAUTH2_CREDENTIALS_PATH="/secure/path/to/oauth2_credentials.json"
export GOOGLE_OAUTH2_TOKEN_PATH="/secure/path/to/token.json"

# Service Account configuration
export GOOGLE_SERVICE_ACCOUNT_PATH="/secure/path/to/service_account.json"
export GOOGLE_DRIVE_FOLDER_ID="1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
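
A script that should work in both setups can pick the authentication method from these variables. The following is a sketch only: the variable names come from the listing above, but the "service_account" value for auth_type and the preference order are assumptions.

```python
import os
from typing import Mapping, Optional


def detect_auth_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Choose an auth_type based on which credential path is configured."""
    env = os.environ if env is None else env
    # Prefer OAuth2 when both are set, matching the recommended method.
    if env.get("GOOGLE_OAUTH2_CREDENTIALS_PATH"):
        return "oauth2"
    if env.get("GOOGLE_SERVICE_ACCOUNT_PATH"):
        return "service_account"  # assumed value, not confirmed by this guide
    raise RuntimeError("no Google Drive credentials configured")
```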

Token Management

# Check OAuth2 token status
from utils.google_auth_helper import GoogleAuthHelper

auth_helper = GoogleAuthHelper()

if auth_helper.token_expired():
    print("Token expired, refreshing...")
    auth_helper.refresh_token()
else:
    print("Token valid")
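
For readers curious what an expiry check might look like without the helper, here is a rough sketch. It assumes the token file is JSON with an ISO-8601 "expiry" field in UTC, which is an assumption about the stored format rather than documented Biomapper behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def token_expired(token_data: dict, now: Optional[datetime] = None,
                  leeway: timedelta = timedelta(minutes=5)) -> bool:
    """Treat a token as expired slightly early so uploads do not fail midway."""
    now = now or datetime.now(timezone.utc)
    raw = token_data.get("expiry")
    if not raw:
        return True  # no expiry recorded: assume a refresh is needed
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    expiry = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return now + leeway >= expiry
```

The leeway matters for long uploads: a token that is valid when the upload starts can still expire before the last chunk is sent.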

Troubleshooting

Common Issues and Solutions

Problem	Solution
“Authentication failed” error	Re-run OAuth2 setup or check service account credentials
“Insufficient permissions” error	Verify folder sharing with service account email
Upload timeouts	Reduce chunk size or increase timeout values
“Quota exceeded” error	Reduce upload frequency and retry with backoff; API rate limits recover within minutes, while the daily upload limit resets after 24 hours
Files not appearing in Drive	Check folder permissions and refresh Drive view
Upload seems slow	Verify network connection and try smaller chunk sizes
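
Timeouts and quota errors in the table above are usually transient, and the common remedy is to retry with exponential backoff. The sketch below shows the pattern in isolation. It is a generic illustration with hypothetical names, not the logic behind retry_attempts.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(count: int, base: float = 1.0, cap: float = 64.0) -> List[float]:
    """Delays that double each time, capped at `cap` seconds."""
    return [min(cap, base * 2 ** n) for n in range(count)]


def call_with_retries(operation: Callable[[], T], attempts: int = 3,
                      base: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying failures with exponential backoff plus jitter."""
    delays = backoff_delays(attempts - 1, base)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Real code should catch only transient errors (timeouts, 429, 5xx).
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt] + random.uniform(0, 0.5))
    raise RuntimeError("attempts must be at least 1")
```

The random jitter keeps several pipelines that failed at the same moment from retrying in lockstep.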

Debugging Tools

# Test OAuth2 authentication
poetry run python -c "
from utils.google_auth_helper import GoogleAuthHelper
auth = GoogleAuthHelper()
print('OAuth2 status:', auth.check_oauth2_status())
"

# Test service account access
poetry run python -c "
from utils.google_auth_helper import GoogleAuthHelper
auth = GoogleAuthHelper()
print('Service account status:', auth.check_service_account_status())
"

# List accessible Drive folders
poetry run python scripts/list_drive_folders.py

Performance Considerations

Upload Speed Optimization

Typical performance metrics:

File Size	Upload Time (OAuth2)	Upload Time (Service Account)	Recommended Chunk Size
< 10MB	2-5 seconds	2-5 seconds	Default (5MB)
10-100MB	10-30 seconds	10-30 seconds	10MB
100MB-1GB	1-5 minutes	1-5 minutes	25MB
> 1GB	5-20 minutes	5-20 minutes	50MB
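
The chunk-size column of the table can be expressed as a small lookup. This sketch treats "MB" as MiB and places each boundary value in the larger bracket; both are interpretation choices, and the function name is hypothetical.

```python
MB = 1024 * 1024


def recommended_chunk_size(file_size: int) -> int:
    """Map a file size in bytes to the chunk size suggested in the table."""
    if file_size < 10 * MB:
        return 5 * MB   # default
    if file_size < 100 * MB:
        return 10 * MB
    if file_size < 1024 * MB:
        return 25 * MB
    return 50 * MB
```

All four sizes are multiples of 256 KiB, so they can be passed directly as chunk_size.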

Network and API Limits

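Google Drive API Quota: 1,000 requests per 100 seconds per user was the long-standing default; confirm the current limits for your project under APIs & Services → Quotas
Upload Limits: 750GB per day per user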
File Size Limits: 5TB per file maximum
Concurrent Uploads: Recommend max 3-5 simultaneous uploads
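
One way to respect the concurrency recommendation is to bound parallel uploads with a semaphore. The sketch below is generic: each job is any zero-argument callable returning a coroutine, for example a wrapper around a client.run_action call, and the function name is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def run_bounded(jobs: Iterable[Callable[[], Awaitable[Any]]],
                      limit: int = 3) -> List[Any]:
    """Run all jobs concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[Any]]) -> Any:
        async with semaphore:
            return await job()

    # gather() returns results in the same order as the input jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With limit=3, a batch of ten files starts three uploads immediately and begins another each time one finishes.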

Best Practices

Security and Privacy

Credential Protection
- Never commit OAuth2 or service account credentials to version control
- Store credentials in secure, encrypted locations
- Use environment variables for credential paths
- Regularly rotate service account keys
Access Control
- Use principle of least privilege for folder sharing
- Regularly audit who has access to shared folders
- Consider using organization-managed shared drives
Data Sensitivity
- Be aware of data privacy regulations (GDPR, HIPAA, etc.)
- Consider encryption for sensitive biological data
- Document data handling and retention policies
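
As a small aid for the credential protection points above, the following sketch checks that a credentials file is readable only by its owner on POSIX systems (mode 600 or stricter). The helper names are hypothetical, and the check is not meaningful on Windows.

```python
import os
import stat


def is_private_mode(mode: int) -> bool:
    """True when group and others have no permissions (for example 0o600)."""
    return (stat.S_IMODE(mode) & 0o077) == 0


def credential_file_is_private(path: str) -> bool:
    """Check the permission bits of a credentials file on disk."""
    return is_private_mode(os.stat(path).st_mode)
```

A file that fails the check can be tightened with chmod 600 on the path stored in GOOGLE_SERVICE_ACCOUNT_PATH or GOOGLE_OAUTH2_CREDENTIALS_PATH.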

Operational Guidelines

Monitoring
- Track upload success rates and performance
- Monitor Google Drive API quota usage
- Set up alerts for upload failures
Maintenance
- Regularly clean up old result files
- Archive completed analyses
- Update credentials before expiration
Documentation
- Document folder organization conventions
- Maintain sharing permission records
- Record credential renewal procedures