
Treasure Data → S3 Tables

Concise guide for exporting data from Treasure Data to S3 buckets, tailored for a Python/Lambda-based infrastructure.


Overview

Treasure Data's S3 Export Integration (V1) lets you write query results directly to an S3 bucket. It can be triggered from the TD Console, the TD CLI, or a Workflow (Digdag). Since your infra is Python-based with Lambda functions, the CLI/Workflow approach integrates most cleanly.

Official Docs: Amazon S3 Export Integration V1


Prerequisites

  • AWS IAM User with only these permissions:

    • s3:PutObject
    • s3:AbortMultipartUpload
  • TD Toolbelt installed (for CLI approach)

  • Access Key and Secret Key (URL-encoded when used in CLI)


Limitations

  • Query result export cap: 100GB (split queries if exceeded; see the sketch after this list)

  • Default format: CSV (RFC 4180)

  • Supported formats: CSV, TSV, JSONL

  • Compression: gz or none

  • If using S3 bucket policies that reject unencrypted requests, enable use_sse=true
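
If a result set would exceed the 100GB cap, one workable pattern is to split the export into time-bucketed queries. Below is a minimal sketch using td-client-python (tdclient) and TD's TD_TIME_RANGE UDF; the database, table, bucket, and date window are placeholders, and the credentials in the result URL must be URL-encoded (see Option B in Step 4).

from datetime import date, timedelta

import tdclient  # td-client-python

START = date(2024, 1, 1)   # hypothetical export window
DAYS = 7

with tdclient.Client(apikey='TD_API_KEY') as td:
    for i in range(DAYS):
        day = START + timedelta(days=i)
        nxt = day + timedelta(days=1)
        # One day's worth of rows per query keeps each result well under the cap
        sql = (
            "SELECT * FROM your_table "
            f"WHERE TD_TIME_RANGE(time, '{day}', '{nxt}', 'UTC')"
        )
        result_url = (
            "s3://ACCESS_KEY:SECRET_KEY@/your-bucket/"
            f"td-exports/daily/data_{day:%Y%m%d}.csv.gz?compression=gz"
        )
        job = td.query('your_database', sql, type='presto', result_url=result_url)
        job.wait()  # one job, and one exported object, per day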


Step 1: IAM Setup (AWS Side)

Create a dedicated IAM user for TD exports. Minimal policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}

If your security policy requires IP whitelisting, add TD's static IPs to your bucket policy:
TD Static IP Addresses
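
If you provision AWS resources from Python, here is a minimal boto3 sketch of this step. The user name, policy name, and bucket ARN are placeholders; this is not an official TD setup script.

import json

import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            "Resource": "arn:aws:s3:::your-bucket-name/*",
        }
    ],
}

# Dedicated user for TD exports with only the minimal inline policy above
iam.create_user(UserName="td-export-user")
iam.put_user_policy(
    UserName="td-export-user",
    PolicyName="td-s3-export",
    PolicyDocument=json.dumps(policy),
)

# Access key pair that TD will use for authentication; store it securely
# (e.g. Secrets Manager), do not log it in production
keys = iam.create_access_key(UserName="td-export-user")["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])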


Step 2: S3 Bucket Prep

  1. Create the target bucket (or use an existing one)

  2. If SSE is required, enable AES-256 server-side encryption on the bucket

  3. Set up a folder structure for exports, e.g.:

s3://your-bucket/
└── td-exports/
    ├── daily/
    └── adhoc/
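
A minimal boto3 sketch of the same prep, assuming the ap-northeast-1 region used in the endpoint example in Step 3; the bucket name is a placeholder, and you can skip create_bucket if the bucket already exists.

import boto3

s3 = boto3.client("s3", region_name="ap-northeast-1")

# 1. Create the target bucket (regions other than us-east-1 need a LocationConstraint)
s3.create_bucket(
    Bucket="your-bucket-name",
    CreateBucketConfiguration={"LocationConstraint": "ap-northeast-1"},
)

# 2. Default to AES-256 server-side encryption so exports satisfy an
#    encryption-only bucket policy
s3.put_bucket_encryption(
    Bucket="your-bucket-name",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# 3. "Folders" in S3 are just key prefixes; TD creates them implicitly when it
#    writes objects under td-exports/daily/ and td-exports/adhoc/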


Step 3: Create Authentication in TD Console

  1. Integrations Hub → Catalog → search AWS S3

  2. Create Authentication with:

Parameter           Value
Endpoint            s3-ap-northeast-1.amazonaws.com (match your bucket region)
Auth Method         basic (use session for imports only)
Access Key ID       Your IAM user access key
Secret Access Key   Your IAM user secret key

  3. Name the connection → Done

Endpoint reference: AWS S3 Region Endpoints


Step 4: Configure Export

Option A: TD Console (Manual/Scheduled)

  1. Data Workbench → Queries → select or create a query

  2. Run the query to validate results

  3. Click Export Results → select your S3 authentication

  4. Configure:

Field            Recommended Value
Bucket           your-bucket-name
Path             td-exports/daily/export_${date}.csv.gz
Format           csv or jsonl
Compression      gz
Include header   Yes
Null string      empty string
Part Size        10 MB (default)

  5. (Optional) Set a schedule: @daily, @hourly, or custom cron

Option B: CLI (Best for Lambda Integration)

td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/td-exports/output.csv.gz?compression=gz' \
-w -d your_database \
"SELECT * FROM your_table WHERE time > 1234567890"

With SSE enabled:

td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/path/file.csv?use_sse=true&sse_algorithm=AES256' \
-w -d your_database \
"SELECT * FROM your_table"

Access key and secret key must be URL-encoded.
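
A small helper for building the --result URL from Python, since characters like '/' and '+' in the keys must be percent-encoded; the credentials and bucket below are placeholders.

from urllib.parse import quote

# quote(..., safe='') escapes '/' and '+' in addition to other reserved characters
access_key = quote("YOUR_ACCESS_KEY", safe="")
secret_key = quote("YOUR_SECRET_KEY", safe="")

result_url = (
    f"s3://{access_key}:{secret_key}"
    "@/bucket-name/td-exports/output.csv.gz?compression=gz"
)
print(result_url)  # pass to: td query --result '<result_url>' ...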

Option C: Workflow (Digdag)

timezone: UTC

_export:
  td:
    database: your_database

+export-to-s3:
  td>: queries/export_query.sql
  result_connection: your_connection_name
  result_settings:
    bucket: your-bucket-name
    path: /td-exports/daily/data_${moment(session_time).format("YYYYMMDD")}.csv.gz
    compression: 'gz'
    header: true
    newline: CRLF

Workflow examples: Treasure Boxes — S3 Export


Step 5: Integrating with Your Python/Lambda Stack

Since your infra uses Python Lambda functions, here's how to fit TD exports into your pipeline:

Approach 1: TD Python Client → Lambda Trigger

Note: the result_url-based export shown here goes through the lower-level td-client-python (tdclient) rather than pytd, since tdclient's query() accepts a database name and a result_url directly.

import os
from urllib.parse import quote

import tdclient

# Credentials embedded in the result URL must be URL-encoded
access_key = quote(os.environ['AWS_ACCESS_KEY'], safe='')
secret_key = quote(os.environ['AWS_SECRET_KEY'], safe='')

result_url = (
    f"s3://{access_key}:{secret_key}"
    "@/your-bucket/exports/data.csv.gz?compression=gz"
)

# TD client setup
with tdclient.Client(
    apikey=os.environ['TD_API_KEY'],
    endpoint='https://api.treasuredata.com'
) as client:
    # Run the export query; TD writes the result set to S3 when the job finishes
    job = client.query(
        'your_database',
        'SELECT * FROM your_table',
        type='presto',
        result_url=result_url,
    )
    job.wait()

Approach 2: Lambda Listens for S3 Landing

  1. TD scheduled export writes to s3://bucket/td-exports/

  2. S3 Event Notification triggers your Lambda on PutObject

  3. Lambda processes the landed file (transform, load to RDS, etc.)


from urllib.parse import unquote_plus

# Lambda handler triggered by the S3 event notification
def handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # Object keys in S3 event payloads are URL-encoded (spaces arrive as '+')
    key = unquote_plus(record['s3']['object']['key'])

    # Process the TD export file
    # e.g., read CSV, transform, load elsewhere (see the sketch below)
    return {'bucket': bucket, 'key': key}
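
A minimal sketch of that processing step, assuming the export landed as gzip-compressed CSV with a header row (as configured in Step 4); where each row is loaded is up to your pipeline.

import csv
import gzip
import io

import boto3

s3 = boto3.client('s3')

def process_export(bucket: str, key: str) -> int:
    # Download the exported object and decompress it in memory
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    with gzip.open(io.BytesIO(body), mode='rt', newline='') as text:
        reader = csv.DictReader(text)  # first row is the header
        rows = 0
        for row in reader:
            # transform / load each row elsewhere (RDS, DynamoDB, ...)
            rows += 1
    return rows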

Approach 3: Step Functions Orchestration

TD Scheduled Export → S3 Landing → S3 Event → Lambda (Transform) → Target (RDS/DynamoDB/Athena)


Export Format Options Quick Reference

Option        Values                   Default
format        csv, tsv, jsonl          csv
compression   gz, none                 none
delimiter     "," "\t" "|"             ","
header        true, false              true
null          empty, \N, NULL, null    empty
newline       CRLF, LF, CR             CRLF
quote         " or custom              "
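
These options correspond to the query parameters appended to a CLI result URL (and to result_settings keys in a workflow). A hedged sketch of assembling such a URL in Python follows; the values are illustrative, and the exact parameter set your account accepts should be confirmed against the official docs.

from urllib.parse import quote, urlencode

# Placeholder credentials; both must be URL-encoded
creds = f"{quote('ACCESS_KEY', safe='')}:{quote('SECRET_KEY', safe='')}"

# Illustrative option values taken from the table above
params = urlencode({
    'format': 'tsv',
    'delimiter': '\t',
    'header': 'true',
    'null': 'NULL',
    'newline': 'LF',
    'compression': 'gz',
})

result_url = f"s3://{creds}@/your-bucket/td-exports/adhoc/output.tsv.gz?{params}"
print(result_url)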

Scheduling (Cron Reference)

Cron           Schedule
0 * * * *      Every hour
0 0 * * *      Daily at midnight
0 0 1 * *      First day of each month
0 */6 * * *    Every 6 hours
30 2 * * *     Daily at 2:30 AM

Resources

Resource                            URL
S3 Export Integration Docs          docs.treasuredata.com/int/amazon-s3-export-integration-v1
TD Toolbelt (CLI)                   toolbelt.treasuredata.com
TD Python Client (pytd)             github.com/treasure-data/pytd
Workflow Examples                   Treasure Boxes S3 Export
Workflow Secrets                    Secret Management Docs
TD Static IPs (for whitelisting)    IP Addresses Doc
AWS S3 Endpoints by Region          AWS Docs
AWS IAM Best Practices              AWS IAM Docs
S3 Server-Side Encryption           AWS SSE Docs