Google Drive Integration

Overview

Biomapper supports seamless integration with Google Drive for storing and sharing analysis results. This integration offers two authentication methods to suit different use cases: OAuth2 for personal use and Service Accounts for automated workflows.

The Google Drive integration solves the common problem of sharing large biological datasets and analysis results with collaborators, while providing organized folder structures and automatic uploads.

Authentication Methods

Service Account Authentication

Service Account authentication provides programmatic access without user interaction, ideal for automated pipelines and server deployments.

Benefits:
  • ✅ Automated access without user intervention

  • ✅ Suitable for server-side automation

  • ✅ Can be shared across team members

  • ✅ Consistent API access

Limitations:
  • ❌ Cannot store files directly (no storage quota)

  • ❌ Must upload to folders shared with the service account

  • ❌ More complex permission management

Best for: Automated pipelines, continuous integration, enterprise deployments

Setup Instructions

OAuth2 Setup (Personal Use)

Step 1: Create Google Cloud Project

  1. Go to Google Cloud Console

  2. Create a new project or select existing one

  3. Enable the Google Drive API:

    • Navigate to APIs & Services → Library

    • Search for “Google Drive API”

    • Click Enable

Step 2: Create OAuth2 Credentials

  1. Navigate to APIs & Services → Credentials

  2. Click + CREATE CREDENTIALS → OAuth client ID

  3. Configure the consent screen (if prompted):

    • User Type: External (for personal use)

    • App name: “Biomapper Integration”

    • User support email: Your email

    • Developer contact: Your email

  4. Create OAuth client ID:

    • Application type: Desktop app

    • Name: “Biomapper OAuth2”

    • Click CREATE

  5. Download the credentials JSON file

  6. Save it securely (never commit to version control)
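
Before running the setup wizard, it can help to confirm that the downloaded file is the kind of credential you expect. The following is a minimal sketch, assuming the standard Google JSON layouts: a Desktop app client stores its fields under a top-level "installed" key, and a service account key carries "type": "service_account". The function name credential_kind is hypothetical and not part of Biomapper.

```python
import json


def credential_kind(path):
    """Classify a Google credentials JSON file by its top-level layout.

    Hypothetical helper for illustration; not part of Biomapper.
    """
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)

    if not isinstance(data, dict):
        return "unknown"
    if data.get("type") == "service_account":
        return "service_account"
    if "installed" in data:
        # "Desktop app" OAuth clients use the "installed" key
        return "oauth_desktop"
    if "web" in data:
        # "Web application" OAuth clients use the "web" key
        return "oauth_web"
    return "unknown"
```

For the OAuth2 flow described here, the expected result is "oauth_desktop"; any other value suggests the wrong file or the wrong application type was downloaded.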

Step 3: Run Biomapper OAuth2 Setup

# Ensure dependencies are installed (Google dependencies included in main install)
poetry install --with dev,docs,api

# Run the OAuth2 setup wizard
poetry run python scripts/setup_oauth2_drive.py

# Follow the interactive prompts:
# 1. Provide path to OAuth2 credentials JSON
# 2. Complete browser authorization flow
# 3. Verify upload permissions

Step 4: Test the Setup

# Verify Google Drive integration
poetry run python scripts/verify_google_drive_setup.py

# Expected output:
# ✅ OAuth2 credentials found
# ✅ Google Drive API accessible
# ✅ Test file upload successful
# ✅ Setup complete!

Service Account Setup (Automation)

Step 1: Create Service Account

  1. Go to Google Cloud Console

  2. Navigate to IAM & Admin → Service Accounts

  3. Click + CREATE SERVICE ACCOUNT

  4. Configure:

    • Service account name: “biomapper-drive-integration”

    • Service account ID: Auto-generated

    • Description: “Service account for biomapper Google Drive uploads”

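  5. Grant roles (optional): this can usually be skipped, since Drive access is granted by sharing a folder with the service account (Step 4) rather than by project roles such as Editor or Storage Admin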

  6. Click CREATE AND CONTINUE

Step 2: Generate Credentials

  1. Find your service account in the list

  2. Click Actions (⋮) → Manage keys

  3. Click ADD KEY → Create new key

  4. Select JSON format

  5. Click CREATE

  6. Download and securely store the JSON file

Step 3: Configure Environment

# Set environment variable for service account
export GOOGLE_SERVICE_ACCOUNT_PATH="/path/to/service-account-key.json"

# Or add to your .bashrc/.zshrc for persistence
echo 'export GOOGLE_SERVICE_ACCOUNT_PATH="/path/to/service-account-key.json"' >> ~/.bashrc

Step 4: Create Shared Folder

  1. In Google Drive, create a folder for biomapper results

  2. Right-click the folder → Share

  3. Add the service account email as Editor

    • Service account email format: service-account-name@project-id.iam.gserviceaccount.com

    • Find the email in the downloaded JSON file under client_email

  4. Note the folder ID from the URL for configuration
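
The two lookups in this step can be scripted. This is a minimal sketch, assuming the folder URL has the usual form https://drive.google.com/drive/folders/<FOLDER_ID> (optionally with query parameters or a /u/0/ segment); both function names are hypothetical and not part of Biomapper.

```python
import json
from urllib.parse import urlparse


def folder_id_from_url(url):
    """Return the folder ID that follows the 'folders' path segment."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    if "folders" not in parts:
        raise ValueError(f"Not a Drive folder URL: {url}")
    index = parts.index("folders")
    if index + 1 >= len(parts):
        raise ValueError(f"No folder ID found in URL: {url}")
    return parts[index + 1]


def service_account_email(key_path):
    """Read the client_email field from a service account key file."""
    with open(key_path, encoding="utf-8") as handle:
        return json.load(handle)["client_email"]
```

The returned folder ID is the value to use for drive_folder_id, and the returned email is the address to add as Editor when sharing the folder.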

Usage Examples

Basic Upload with OAuth2

import asyncio

from client.client_v2 import BiomapperClient


async def main(results_df):
    client = BiomapperClient(base_url="http://localhost:8000")

    # Simple file upload using OAuth2
    result = await client.run_action(
        action_type="SYNC_TO_GOOGLE_DRIVE_V3",
        params={
            "input_key": "analysis_results",
            "output_key": "drive_metadata",
            "file_path": "/results/metabolomics_analysis.csv",
            "drive_folder_id": "your_folder_id_here",
            "auth_type": "oauth2"
        },
        context={"datasets": {"analysis_results": results_df}}
    )

    print(f"File uploaded: {result.context['datasets']['drive_metadata']['drive_url']}")


# results_df is the analysis results dataset produced earlier in your workflow.
# "await" is only valid inside a coroutine, so the call is wrapped in asyncio.run().
asyncio.run(main(results_df))

Complete Pipeline with Drive Upload

name: "metabolomics_with_drive_backup"
description: "Complete metabolomics pipeline with automatic Google Drive backup"

parameters:
  input_file: "/data/metabolites.csv"
  project_name: "arivale_analysis_2025"

steps:
  # Data processing pipeline
  - name: load_metabolites
    action:
      type: LOAD_DATASET_IDENTIFIERS
      params:
        file_path: "${parameters.input_file}"
        identifier_column: "metabolite_name"
        output_key: "raw_metabolites"

  - name: progressive_matching
    action:
      type: PROGRESSIVE_SEMANTIC_MATCH
      params:
        input_key: "raw_metabolites"
        output_key: "matched_metabolites"
        identifier_column: "metabolite_name"

  # Results export and upload
  - name: export_results
    action:
      type: EXPORT_DATASET
      params:
        input_key: "matched_metabolites"
        file_path: "/tmp/${parameters.project_name}_results.csv"

  - name: upload_to_drive
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: "matched_metabolites"
        output_key: "drive_upload_status"
        file_path: "/tmp/${parameters.project_name}_results.csv"
        drive_folder_id: "your_folder_id_here"
        auth_type: "oauth2"

  - name: generate_summary_report
    action:
      type: GENERATE_LLM_ANALYSIS
      params:
        input_key: "matched_metabolites"
        output_key: "analysis_report"
        report_type: "coverage_summary"

  - name: upload_report
    action:
      type: SYNC_TO_GOOGLE_DRIVE_V3
      params:
        input_key: "analysis_report"
        output_key: "report_upload_status"
        file_path: "/tmp/${parameters.project_name}_report.md"
        drive_folder_id: "your_folder_id_here"
        auth_type: "oauth2"
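
The ${parameters.name} placeholders in the pipeline above are filled in from the parameters block before each step runs. As an illustration only, the sketch below shows one way such substitution could work; it is not Biomapper's actual resolver, and resolve_parameters is a hypothetical name.

```python
import re

# Matches placeholders of the form ${parameters.some_name}
_PLACEHOLDER = re.compile(r"\$\{parameters\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_parameters(value, parameters):
    """Replace each ${parameters.name} in value with parameters[name]."""

    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"Undefined parameter: {name}")
        return str(parameters[name])

    return _PLACEHOLDER.sub(substitute, value)


params = {"project_name": "arivale_analysis_2025"}
print(resolve_parameters("/tmp/${parameters.project_name}_results.csv", params))
```

Raising on an undefined name, rather than leaving the placeholder in place, makes a typo in the pipeline file fail early instead of producing a literal "${...}" path.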

Batch Upload Multiple Files

from client.client_v2 import BiomapperClient


async def upload_analysis_batch():
    client = BiomapperClient(base_url="http://localhost:8000")

    files_to_upload = [
        {"path": "/results/metabolites.csv", "name": "metabolite_results"},
        {"path": "/results/proteins.csv", "name": "protein_results"},
        {"path": "/results/summary.json", "name": "analysis_summary"}
    ]

    upload_results = []

    for file_info in files_to_upload:
        result = await client.run_action(
            action_type="SYNC_TO_GOOGLE_DRIVE_V3",
            params={
                "input_key": "dummy",  # Not used for file uploads
                "output_key": f"upload_{file_info['name']}",
                "file_path": file_info["path"],
                "drive_folder_id": "your_folder_id_here",
                "auth_type": "oauth2"
            },
            context={"datasets": {"dummy": {}}}
        )

        upload_results.append(result)

    return upload_results

Folder Organization

Automatic Folder Structure

Biomapper automatically creates organized folder structures in Google Drive:

Biomapper Results/
├── metabolomics_pipeline_v3/
│   ├── 2025-08-22_14-30-15/
│   │   ├── metabolite_mapping_results.csv
│   │   ├── coverage_statistics.json
│   │   ├── unmatched_metabolites.csv
│   │   └── analysis_summary.md
│   ├── 2025-08-21_09-15-42/
│   └── latest/  # Always points to most recent run
├── protein_harmonization/
│   ├── 2025-08-22_15-45-30/
│   └── latest/
└── shared/
    ├── reference_data/
    └── templates/
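
The timestamped run folders in the tree above follow a YYYY-MM-DD_HH-MM-SS pattern. A minimal sketch of how such a path could be built is shown below; run_folder_path is a hypothetical helper, and the root folder name is taken from the example tree rather than from Biomapper's code.

```python
from datetime import datetime
from pathlib import PurePosixPath


def run_folder_path(pipeline_name, started_at, root="Biomapper Results"):
    """Build '<root>/<pipeline_name>/<timestamp>' for one pipeline run."""
    stamp = started_at.strftime("%Y-%m-%d_%H-%M-%S")
    return PurePosixPath(root) / pipeline_name / stamp


path = run_folder_path("metabolomics_pipeline_v3", datetime(2025, 8, 22, 14, 30, 15))
print(path)  # Biomapper Results/metabolomics_pipeline_v3/2025-08-22_14-30-15
```

Because the timestamp is zero-padded and ordered from year to second, sorting the folder names alphabetically also sorts the runs chronologically.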

Custom Folder Organization

You can customize the folder structure:

# Custom folder hierarchy using folder ID
drive_folder_id: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
auto_organize: true
folder_prefix: "progressive_matching"

# Results in organized folder structure within the specified folder

Sharing and Permissions

OAuth2 Files:
  • Uploaded to your personal Drive

  • You own the files and control sharing

  • Use Drive’s native sharing (right-click → Share)

  • Can set view/edit/comment permissions

Service Account Files:
  • Uploaded to shared folders

  • Folder owner controls overall access

  • Files inherit folder permissions

  • Can be accessed by all folder collaborators

Advanced Configuration

Performance Optimization

For large files and datasets:

# Optimize upload performance
params:
  chunk_size: 10485760  # 10MB chunks
  timeout: 300          # 5 minute timeout
  retry_attempts: 3     # Retry failed uploads
  compression: true     # Compress before upload
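
To make the retry_attempts setting concrete, here is a minimal sketch of retrying with exponential backoff. It assumes that retry_attempts counts retries after the first try (so 3 means up to 4 calls in total) and that only timeouts and connection errors are retried; Biomapper's actual behavior may differ, and upload_with_retry is a hypothetical name.

```python
import time


def upload_with_retry(upload, retry_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() and retry transient failures with exponential backoff.

    The wait doubles after each failure: base_delay, 2*base_delay, 4*base_delay...
    """
    last_error = None
    for attempt in range(retry_attempts + 1):
        try:
            return upload()
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            if attempt < retry_attempts:
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: surface the most recent error to the caller
    raise last_error
```

The sleep function is passed in as a parameter so the backoff schedule can be checked in tests without actually waiting.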

Error Handling and Monitoring

# Robust error handling
params:
  on_error: "continue"           # Don't fail pipeline on upload error
  backup_local: true             # Keep local copy as backup
  verify_upload: true            # Verify file integrity after upload
  notification_email: "user@domain.com"  # Email on upload completion

Authentication Management

Environment Configuration

# OAuth2 configuration
export GOOGLE_OAUTH2_CREDENTIALS_PATH="/secure/path/to/oauth2_credentials.json"
export GOOGLE_OAUTH2_TOKEN_PATH="/secure/path/to/token.json"

# Service Account configuration
export GOOGLE_SERVICE_ACCOUNT_PATH="/secure/path/to/service_account.json"
export GOOGLE_DRIVE_FOLDER_ID="1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
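
A script can read these variables and decide which authentication method to use. The sketch below is illustrative only: the variable names come from the configuration above, but the function load_drive_settings, the preference for a service account when both are set, and the "service_account" value for auth_type are assumptions.

```python
import os


def load_drive_settings(env=None):
    """Collect Drive settings from environment variables.

    Assumption: a service account, when configured, takes precedence over OAuth2.
    """
    env = os.environ if env is None else env
    settings = {
        "oauth2_credentials": env.get("GOOGLE_OAUTH2_CREDENTIALS_PATH"),
        "oauth2_token": env.get("GOOGLE_OAUTH2_TOKEN_PATH"),
        "service_account": env.get("GOOGLE_SERVICE_ACCOUNT_PATH"),
        "folder_id": env.get("GOOGLE_DRIVE_FOLDER_ID"),
    }
    if settings["service_account"]:
        settings["auth_type"] = "service_account"  # assumed value
    elif settings["oauth2_credentials"]:
        settings["auth_type"] = "oauth2"
    else:
        raise RuntimeError(
            "Set GOOGLE_SERVICE_ACCOUNT_PATH or GOOGLE_OAUTH2_CREDENTIALS_PATH"
        )
    return settings
```

Failing immediately when neither credential path is set gives a clearer message than an authentication error raised later, in the middle of an upload.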

Token Management

# Check OAuth2 token status
from utils.google_auth_helper import GoogleAuthHelper

auth_helper = GoogleAuthHelper()

if auth_helper.token_expired():
    print("Token expired, refreshing...")
    auth_helper.refresh_token()
else:
    print("Token valid")

Troubleshooting

Common Issues and Solutions

Problem                           Solution
--------------------------------  ----------------------------------------------------------
“Authentication failed” error     Re-run OAuth2 setup or check service account credentials
“Insufficient permissions” error  Verify folder sharing with service account email
Upload timeouts                   Reduce chunk size or increase timeout values
“Quota exceeded” error            Wait for quota reset (24 hours) or reduce upload frequency
Files not appearing in Drive      Check folder permissions and refresh Drive view
Upload seems slow                 Verify network connection and try smaller chunk sizes

Debugging Tools

# Test OAuth2 authentication
poetry run python -c "
from utils.google_auth_helper import GoogleAuthHelper
auth = GoogleAuthHelper()
print('OAuth2 status:', auth.check_oauth2_status())
"

# Test service account access
poetry run python -c "
from utils.google_auth_helper import GoogleAuthHelper
auth = GoogleAuthHelper()
print('Service account status:', auth.check_service_account_status())
"

# List accessible Drive folders
poetry run python scripts/list_drive_folders.py

Performance Considerations

Upload Speed Optimization

Typical performance metrics:

File Size   Upload Time (OAuth2)  Upload Time (Service Account)  Recommended Chunk Size
----------  --------------------  -----------------------------  ----------------------
< 10MB      2-5 seconds           2-5 seconds                    Default (5MB)
10-100MB    10-30 seconds         10-30 seconds                  10MB
100MB-1GB   1-5 minutes           1-5 minutes                    25MB
> 1GB       5-20 minutes          5-20 minutes                   50MB
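
The chunk size recommendations above can be encoded as a small lookup. This is a sketch under two assumptions: "MB" is read as 1024 × 1024 bytes (matching the 10485760 value used for a 10MB chunk earlier), and sizes that fall exactly on a boundary go to the lower bracket. The function name is hypothetical.

```python
MIB = 1024 * 1024


def recommended_chunk_size(file_size_bytes):
    """Return the recommended upload chunk size in bytes for a file size."""
    if file_size_bytes < 10 * MIB:
        return 5 * MIB   # default
    if file_size_bytes <= 100 * MIB:
        return 10 * MIB
    if file_size_bytes <= 1024 * MIB:
        return 25 * MIB
    return 50 * MIB


print(recommended_chunk_size(50 * MIB))  # 10485760
```

All four values are multiples of 256 KiB, which is what Google's resumable upload protocol requires for every chunk except the last.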

Network and API Limits

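  • Google Drive API Quota: per-user and per-project request limits apply (historically 1,000 requests per 100 seconds per user); check the Quotas page in Google Cloud Console for your project's current values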

  • Upload Limits: 750GB per day per user

  • File Size Limits: 5TB per file maximum

  • Concurrent Uploads: Recommend max 3-5 simultaneous uploads
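
One way to stay within the recommended number of simultaneous uploads is to guard each upload with a semaphore. The sketch below assumes an async upload_one(path) coroutine supplied by the caller (for example, a wrapper around the client's upload action); upload_all and upload_one are hypothetical names.

```python
import asyncio


async def upload_all(paths, upload_one, max_concurrent=3):
    """Upload every path, running at most max_concurrent uploads at once.

    Results are returned in the same order as paths.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        # Each upload waits here until one of the slots is free
        async with semaphore:
            return await upload_one(path)

    return await asyncio.gather(*(guarded(path) for path in paths))
```

Compared with uploading files one at a time in a loop, this keeps several transfers in flight while still capping the load on the API quota.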

Best Practices

Security and Privacy

  1. Credential Protection

    • Never commit OAuth2 or service account credentials to version control

    • Store credentials in secure, encrypted locations

    • Use environment variables for credential paths

    • Regularly rotate service account keys

  2. Access Control

    • Use principle of least privilege for folder sharing

    • Regularly audit who has access to shared folders

    • Consider using organization-managed shared drives

  3. Data Sensitivity

    • Be aware of data privacy regulations (GDPR, HIPAA, etc.)

    • Consider encryption for sensitive biological data

    • Document data handling and retention policies

Operational Guidelines

  1. Monitoring

    • Track upload success rates and performance

    • Monitor Google Drive API quota usage

    • Set up alerts for upload failures

  2. Maintenance

    • Regularly clean up old result files

    • Archive completed analyses

    • Update credentials before expiration

  3. Documentation

    • Document folder organization conventions

    • Maintain sharing permission records

    • Record credential renewal procedures

Verification Sources

Last verified: August 22, 2025

This documentation was verified against the following project resources:

  • /biomapper/src/actions/io/sync_to_google_drive_v3.py (Google Drive sync action implementation with OAuth2 and Service Account support)

  • /biomapper/src/utils/google_auth_helper.py (Authentication helper class supporting both OAuth2 and Service Account methods)

  • /biomapper/scripts/setup_oauth2_drive.py (OAuth2 setup wizard script for interactive credential configuration)

  • /biomapper/scripts/verify_google_drive_setup.py (Google Drive integration verification script)

  • /biomapper/pyproject.toml (Project dependencies including google-auth, google-auth-oauthlib, and googleapiclient)

  • /biomapper/src/client/client_v2.py (Main BiomapperClient for API interactions)

  • /biomapper/README.md (Project architecture and usage patterns)