Treasure Data → S3 Tables
Concise guide for exporting data from Treasure Data to S3 buckets, tailored for a Python/Lambda-based infrastructure.
Overview
Treasure Data's S3 Export Integration (V1) lets you write query results directly to an S3 bucket. There are three ways to trigger it: the TD Console, the TD CLI, or a Workflow (Digdag). Since your infra is Python-based with Lambda functions, the CLI and Workflow approaches integrate most cleanly.
Official Docs: Amazon S3 Export Integration V1
Prerequisites
- AWS IAM user with only these permissions: s3:PutObject, s3:AbortMultipartUpload
- TD Toolbelt installed (for the CLI approach)
- Access Key and Secret Key for that user (URL-encoded when used in the CLI)
Limitations
- Query result export cap: 100 GB per query (split large exports by time range if exceeded; see the sketch below)
- Default format: CSV (RFC 4180)
- Supported formats: CSV, TSV, JSONL
- Compression: gz or none
- If your S3 bucket policy rejects unencrypted requests, enable use_sse=true
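The 100 GB cap mostly bites on full-table exports. One way to stay under it is to slice the query by time and export each slice separately; a minimal sketch, assuming a Presto query and TD's TD_TIME_RANGE UDF (the table name and date range are placeholders):

from datetime import date, timedelta

# Hypothetical helper: emit one export query per day so each result stays
# well under the 100 GB cap; table name and dates are placeholders
def daily_queries(start, end):
    day = start
    while day <= end:
        nxt = day + timedelta(days=1)
        yield ("SELECT * FROM your_table "
               f"WHERE TD_TIME_RANGE(time, '{day.isoformat()}', '{nxt.isoformat()}', 'UTC')")
        day = nxt

for q in daily_queries(date(2024, 1, 1), date(2024, 1, 3)):
    print(q)  # submit each one with its own --result URL (see Option B below)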
Step 1: IAM Setup (AWS Side)
Create a dedicated IAM user for TD exports. Minimal policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
If your security policy requires IP whitelisting, add TD's static IPs to your bucket policy:
TD Static IP Addresses
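If you provision this user from code instead of the console, here's a minimal boto3 sketch (the user name, policy name, and bucket ARN are placeholders, and it assumes you run it with credentials allowed to manage IAM):

import json
import boto3

iam = boto3.client('iam')

POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
        "Resource": "arn:aws:s3:::your-bucket-name/*",
    }],
}

# Create the dedicated export user, attach the minimal inline policy,
# and issue the access key pair that TD will authenticate with
iam.create_user(UserName='td-s3-export')
iam.put_user_policy(
    UserName='td-s3-export',
    PolicyName='td-s3-export-minimal',
    PolicyDocument=json.dumps(POLICY),
)
key = iam.create_access_key(UserName='td-s3-export')['AccessKey']
print(key['AccessKeyId'])  # store the secret securely, e.g. in Secrets Manager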
Step 2: S3 Bucket Prep
- Create the target bucket (or use an existing one)
- If SSE is required, enable AES-256 server-side encryption on the bucket (a boto3 sketch follows below)
- Set up a folder structure for exports, e.g.:
s3://your-bucket/
└── td-exports/
    ├── daily/
    └── adhoc/
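If you would rather provision the bucket from code, a minimal boto3 sketch (bucket name and region are placeholders; ap-northeast-1 matches the endpoint used in Step 3):

import boto3

s3 = boto3.client('s3', region_name='ap-northeast-1')

# Create the export bucket (LocationConstraint is required outside us-east-1)
s3.create_bucket(
    Bucket='your-bucket-name',
    CreateBucketConfiguration={'LocationConstraint': 'ap-northeast-1'},
)

# Enable AES-256 (SSE-S3) default encryption so landed exports are stored encrypted
s3.put_bucket_encryption(
    Bucket='your-bucket-name',
    ServerSideEncryptionConfiguration={
        'Rules': [{'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'}}]
    },
)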
Step 3: Create Authentication in TD Console
- Integrations Hub → Catalog → search AWS S3
- Create Authentication with:

| Parameter | Value |
| --- | --- |
| Endpoint | s3-ap-northeast-1.amazonaws.com (match your bucket region) |
| Auth Method | basic (use session for imports only) |
| Access Key ID | Your IAM user access key |
| Secret Access Key | Your IAM user secret key |
- Name the connection → Done
Endpoint reference: AWS S3 Region Endpoints
Step 4: Configure Export
Option A: TD Console (Manual/Scheduled)
- Data Workbench → Queries → select or create a query
- Run the query to validate results
- Click Export Results → select your S3 authentication
- Configure:

| Field | Recommended Value |
| --- | --- |
| Bucket | your-bucket-name |
| Path | td-exports/daily/export_${date}.csv.gz |
| Format | csv or jsonl |
| Compression | gz |
| Include header | Yes |
| Null string | empty string |
| Part Size | 10 MB (default) |

- (Optional) Set a schedule: @daily, @hourly, or a custom cron expression
Option B: CLI (Best for Lambda Integration)
td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/td-exports/output.csv.gz?compression=gz' \
-w -d your_database \
"SELECT * FROM your_table WHERE time > 1234567890"
With SSE enabled:
td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/path/file.csv?use_sse=true&sse_algorithm=AES256' \
-w -d your_database \
"SELECT * FROM your_table"
Access key and secret key must be URL-encoded.
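If you assemble that --result URL from Python (for example inside a Lambda that shells out to td or calls the REST API), URL-encode the keys first. A small sketch; the environment variable names are placeholders:

import os
import urllib.parse

# Hypothetical helper: build the --result URL with URL-encoded credentials,
# since keys containing characters such as '/' or '+' would break the URL
def s3_result_url(bucket, path, use_sse=False):
    access_key = urllib.parse.quote(os.environ['TD_EXPORT_AWS_ACCESS_KEY'], safe='')
    secret_key = urllib.parse.quote(os.environ['TD_EXPORT_AWS_SECRET_KEY'], safe='')
    url = f"s3://{access_key}:{secret_key}@/{bucket}/{path}?compression=gz"
    if use_sse:
        url += "&use_sse=true&sse_algorithm=AES256"
    return url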
Option C: Workflow (Digdag)
timezone: UTC

_export:
  td:
    database: your_database

+export-to-s3:
  td>: queries/export_query.sql
  result_connection: your_connection_name
  result_settings:
    bucket: your-bucket-name
    path: /td-exports/daily/data_${moment(session_time).format("YYYYMMDD")}.csv.gz
    compression: 'gz'
    header: true
    newline: CRLF
Workflow examples: Treasure Boxes — S3 Export
Step 5: Integrating with Your Python/Lambda Stack
Since your infra uses Python Lambda functions, here's how to fit TD exports into your pipeline:
Approach 1: TD Python Client → Lambda Trigger
This uses td-client-python (tdclient), whose query() call accepts a database name and a result_url export target:

import os
import urllib.parse

import tdclient

# TD client setup
client = tdclient.Client(
    apikey=os.environ['TD_API_KEY'],
    endpoint='https://api.treasuredata.com'
)

# URL-encode the AWS keys before embedding them in the result URL
access_key = urllib.parse.quote(os.environ['AWS_ACCESS_KEY'], safe='')
secret_key = urllib.parse.quote(os.environ['AWS_SECRET_KEY'], safe='')

# Run the export query; TD writes the result set straight to S3
job = client.query(
    'your_database',
    'SELECT * FROM your_table',
    result_url=f"s3://{access_key}:{secret_key}@/your-bucket/exports/data.csv.gz?compression=gz",
    type='presto'
)
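To trigger the export from Lambda itself, wrap the same call in a handler. A sketch, assuming the query arrives in the invocation event (the 'sql' key is a placeholder convention), the TD_EXPORT_* variable names are placeholders chosen to avoid Lambda's reserved AWS_* runtime variables, and your Lambda timeout allows waiting on the job:

import os
import urllib.parse

import tdclient

# Hypothetical Lambda entry point: submit the export and wait for completion.
# For long-running queries, return job.job_id instead and poll it elsewhere.
def lambda_handler(event, context):
    access_key = urllib.parse.quote(os.environ['TD_EXPORT_AWS_ACCESS_KEY'], safe='')
    secret_key = urllib.parse.quote(os.environ['TD_EXPORT_AWS_SECRET_KEY'], safe='')
    result_url = (f"s3://{access_key}:{secret_key}@/your-bucket/exports/data.csv.gz"
                  "?compression=gz")
    client = tdclient.Client(apikey=os.environ['TD_API_KEY'])
    job = client.query('your_database',
                       event.get('sql', 'SELECT * FROM your_table'),
                       result_url=result_url, type='presto')
    job.wait()
    return {'job_id': job.job_id, 'status': job.status()}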
Approach 2: Lambda Listens for S3 Landing
- TD scheduled export writes to s3://bucket/td-exports/
- S3 Event Notification triggers your Lambda on PutObject
- Lambda processes the landed file (transform, load to RDS, etc.)
import csv, gzip, io, urllib.parse
import boto3

s3 = boto3.client('s3')

# Lambda handler triggered by S3 event
def handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    # Process the TD export file, e.g., read CSV, transform, load elsewhere
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    rows = list(csv.reader(gzip.open(io.BytesIO(body), mode='rt')))
Approach 3: Step Functions Orchestration
TD Scheduled Export → S3 Landing → S3 Event → Lambda (Transform) → Target (RDS/DynamoDB/Athena)
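One way to wire the middle of that chain: the S3-triggered Lambda starts a Step Functions execution instead of doing the transform inline. A minimal boto3 sketch; the state machine ARN is a placeholder:

import json
import boto3

sfn = boto3.client('stepfunctions')

# S3-triggered Lambda: hand the landed export file off to a state machine
# (placeholder ARN) that runs the transform/load steps
def handler(event, context):
    record = event['Records'][0]['s3']
    sfn.start_execution(
        stateMachineArn='arn:aws:states:ap-northeast-1:123456789012:stateMachine:td-export-pipeline',
        input=json.dumps({
            'bucket': record['bucket']['name'],
            'key': record['object']['key'],
        }),
    )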
Export Format Options Quick Reference
| Option | Values | Default |
| --- | --- | --- |
| format | csv, tsv, jsonl | csv |
| compression | gz, none | none |
| delimiter | `,`, `\t`, `\|` | `,` |
| header | true, false | true |
| null | empty, \N, NULL, null | empty |
| newline | CRLF, LF, CR | CRLF |
| quote | `"` or custom | `"` |
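These options are passed as query-string parameters on the --result URL (the CLI examples above already use compression and use_sse this way). A small helper for composing them; treat the remaining parameter names as assumptions to verify against the S3 export docs:

import urllib.parse

# Hypothetical helper: append export options to a --result URL as query
# parameters; verify option names other than compression/use_sse against TD docs
def with_options(result_url, **options):
    sep = '&' if '?' in result_url else '?'
    return result_url + sep + urllib.parse.urlencode(options)

url = with_options(
    's3://KEY:SECRET@/your-bucket/td-exports/output.tsv.gz',
    format='tsv', compression='gz', header='true', newline='LF',
)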
Scheduling (Cron Reference)
| Cron | Schedule |
| --- | --- |
| 0 * * * * | Every hour |
| 0 0 * * * | Daily at midnight |
| 0 0 1 * * | First day of each month |
| 0 */6 * * * | Every 6 hours |
| 30 2 * * * | Daily at 2:30 AM |
Useful Links
| Resource | URL |
| --- | --- |
| S3 Export Integration Docs | docs.treasuredata.com/int/amazon-s3-export-integration-v1 |
| TD Toolbelt (CLI) | toolbelt.treasuredata.com |
| TD Python Client (pytd) | github.com/treasure-data/pytd |
| Workflow Examples | Treasure Boxes S3 Export |
| Workflow Secrets | Secret Management Docs |
| TD Static IPs (for whitelisting) | IP Addresses Doc |
| AWS S3 Endpoints by Region | AWS Docs |
| AWS IAM Best Practices | AWS IAM Docs |
| S3 Server-Side Encryption | AWS SSE Docs |