
Treasure Data → S3 Tables

Concise guide for exporting data from Treasure Data to S3 buckets, tailored for a Python/Lambda-based infrastructure.


Overview

Treasure Data's S3 Export Integration (V1) lets you write query results directly to an S3 bucket. It can be triggered from the TD Console, the TD CLI, or a Workflow (Digdag). Since your infra is Python-based with Lambda functions, the CLI/Workflow approach integrates most cleanly.

Official Docs: Amazon S3 Export Integration V1


Prerequisites

  • AWS IAM User with only these permissions:

    • s3:PutObject
    • s3:AbortMultipartUpload
  • TD Toolbelt installed (for CLI approach)

  • Access Key and Secret Key (URL-encoded when used in CLI)


Limitations

  • Query result export cap: 100GB (split queries if exceeded; see the sketch after this list)

  • Default format: CSV (RFC 4180)

  • Supported formats: CSV, TSV, JSONL

  • Compression: gz or none

  • If using S3 bucket policies that reject unencrypted requests, enable use_sse=true
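
If a result set would exceed the 100GB cap, one workable pattern is to split the export into time-bucketed queries. Below is a minimal sketch using td-client-python (tdclient) and TD's TD_TIME_RANGE UDF; the database, table, bucket, and date window are placeholders, and the credentials in the result URL must be URL-encoded (see Option B in Step 4).

from datetime import date, timedelta

import tdclient  # td-client-python

START = date(2024, 1, 1)   # hypothetical export window
DAYS = 7

with tdclient.Client(apikey='TD_API_KEY') as td:
    for i in range(DAYS):
        day = START + timedelta(days=i)
        nxt = day + timedelta(days=1)
        # One day's worth of rows per query keeps each result well under the cap
        sql = (
            "SELECT * FROM your_table "
            f"WHERE TD_TIME_RANGE(time, '{day}', '{nxt}', 'UTC')"
        )
        result_url = (
            "s3://ACCESS_KEY:SECRET_KEY@/your-bucket/"
            f"td-exports/daily/data_{day:%Y%m%d}.csv.gz?compression=gz"
        )
        job = td.query('your_database', sql, type='presto', result_url=result_url)
        job.wait()  # one job, and one exported object, per day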


Step 1: IAM Setup (AWS Side)

Create a dedicated IAM user for TD exports. Minimal policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}

If your security policy requires IP whitelisting, add TD's static IPs to your bucket policy:
TD Static IP Addresses
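
If you provision AWS resources from Python, here is a minimal boto3 sketch of this step. The user name, policy name, and bucket ARN are placeholders; this is not an official TD setup script.

import json

import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            "Resource": "arn:aws:s3:::your-bucket-name/*",
        }
    ],
}

# Dedicated user for TD exports with only the minimal inline policy above
iam.create_user(UserName="td-export-user")
iam.put_user_policy(
    UserName="td-export-user",
    PolicyName="td-s3-export",
    PolicyDocument=json.dumps(policy),
)

# Access key pair that TD will use for authentication; store it securely
# (e.g. Secrets Manager), do not log it in production
keys = iam.create_access_key(UserName="td-export-user")["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])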


Step 2: S3 Bucket Prep

  1. Create the target bucket (or use an existing one)

  2. If SSE is required, enable AES-256 server-side encryption on the bucket

  3. Set up a folder structure for exports, e.g.:

s3://your-bucket/
└── td-exports/
    ├── daily/
    └── adhoc/
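
A minimal boto3 sketch of the same prep, assuming the ap-northeast-1 region used in the endpoint example in Step 3; the bucket name is a placeholder, and you can skip create_bucket if the bucket already exists.

import boto3

s3 = boto3.client("s3", region_name="ap-northeast-1")

# 1. Create the target bucket (regions other than us-east-1 need a LocationConstraint)
s3.create_bucket(
    Bucket="your-bucket-name",
    CreateBucketConfiguration={"LocationConstraint": "ap-northeast-1"},
)

# 2. Default to AES-256 server-side encryption so exports satisfy an
#    encryption-only bucket policy
s3.put_bucket_encryption(
    Bucket="your-bucket-name",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# 3. "Folders" in S3 are just key prefixes; TD creates them implicitly when it
#    writes objects under td-exports/daily/ and td-exports/adhoc/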


Step 3: Create Authentication in TD Console

  1. Integrations Hub → Catalog → search AWS S3

  2. Create Authentication with:

Parameter           Value
Endpoint            s3-ap-northeast-1.amazonaws.com (match your bucket region)
Auth Method         basic (use session for imports only)
Access Key ID       Your IAM user access key
Secret Access Key   Your IAM user secret key

  3. Name the connection → Done

Endpoint reference: AWS S3 Region Endpoints


Step 4: Configure Export

Option A: TD Console (Manual/Scheduled)

  1. Data Workbench → Queries → select or create a query

  2. Run the query to validate results

  3. Click Export Results → select your S3 authentication

  4. Configure:

Field            Recommended Value
Bucket           your-bucket-name
Path             td-exports/daily/export_${date}.csv.gz
Format           csv or jsonl
Compression      gz
Include header   Yes
Null string      empty string
Part Size        10 MB (default)

  5. (Optional) Set a schedule: @daily, @hourly, or custom cron

Option B: CLI (Best for Lambda Integration)

td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/td-exports/output.csv.gz?compression=gz' \
-w -d your_database \
"SELECT * FROM your_table WHERE time > 1234567890"

With SSE enabled:

td query \
--result 's3://ACCESS_KEY:SECRET_KEY@/bucket-name/path/file.csv?use_sse=true&sse_algorithm=AES256' \
-w -d your_database \
"SELECT * FROM your_table"

Access key and secret key must be URL-encoded.
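
A small helper for building the --result URL from Python, since characters like '/' and '+' in the keys must be percent-encoded; the credentials and bucket below are placeholders.

from urllib.parse import quote

# quote(..., safe='') escapes '/' and '+' in addition to other reserved characters
access_key = quote("YOUR_ACCESS_KEY", safe="")
secret_key = quote("YOUR_SECRET_KEY", safe="")

result_url = (
    f"s3://{access_key}:{secret_key}"
    "@/bucket-name/td-exports/output.csv.gz?compression=gz"
)
print(result_url)  # pass to: td query --result '<result_url>' ...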

Option C: Workflow (Digdag)

timezone: UTC

_export:
  td:
    database: your_database

+export-to-s3:
  td>: queries/export_query.sql
  result_connection: your_connection_name
  result_settings:
    bucket: your-bucket-name
    path: /td-exports/daily/data_${moment(session_time).format("YYYYMMDD")}.csv.gz
    compression: 'gz'
    header: true
    newline: CRLF

Workflow examples: Treasure Boxes — S3 Export


Step 5: Integrating with Your Python/Lambda Stack

Since your infra uses Python Lambda functions, here's how to fit TD exports into your pipeline:

Approach 1: TD Python Client → Lambda Trigger

Note: the result_url-based export shown here goes through the lower-level td-client-python (tdclient) rather than pytd, since tdclient's query() accepts a database name and a result_url directly.

import os
from urllib.parse import quote

import tdclient

# Credentials embedded in the result URL must be URL-encoded
access_key = quote(os.environ['AWS_ACCESS_KEY'], safe='')
secret_key = quote(os.environ['AWS_SECRET_KEY'], safe='')

result_url = (
    f"s3://{access_key}:{secret_key}"
    "@/your-bucket/exports/data.csv.gz?compression=gz"
)

# TD client setup
with tdclient.Client(
    apikey=os.environ['TD_API_KEY'],
    endpoint='https://api.treasuredata.com'
) as client:
    # Run the export query; TD writes the result set to S3 when the job finishes
    job = client.query(
        'your_database',
        'SELECT * FROM your_table',
        type='presto',
        result_url=result_url,
    )
    job.wait()

Approach 2: Lambda Listens for S3 Landing

  1. TD scheduled export writes to s3://bucket/td-exports/

  2. S3 Event Notification triggers your Lambda on PutObject

  3. Lambda processes the landed file (transform, load to RDS, etc.)


from urllib.parse import unquote_plus

# Lambda handler triggered by the S3 event notification
def handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    # Object keys in S3 event payloads are URL-encoded (spaces arrive as '+')
    key = unquote_plus(record['s3']['object']['key'])

    # Process the TD export file
    # e.g., read CSV, transform, load elsewhere (see the sketch below)
    return {'bucket': bucket, 'key': key}
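
A minimal sketch of that processing step, assuming the export landed as gzip-compressed CSV with a header row (as configured in Step 4); where each row is loaded is up to your pipeline.

import csv
import gzip
import io

import boto3

s3 = boto3.client('s3')

def process_export(bucket: str, key: str) -> int:
    # Download the exported object and decompress it in memory
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    with gzip.open(io.BytesIO(body), mode='rt', newline='') as text:
        reader = csv.DictReader(text)  # first row is the header
        rows = 0
        for row in reader:
            # transform / load each row elsewhere (RDS, DynamoDB, ...)
            rows += 1
    return rows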

Approach 3: Step Functions Orchestration

TD Scheduled Export → S3 Landing → S3 Event → Lambda (Transform) → Target (RDS/DynamoDB/Athena)


Export Format Options Quick Reference

Option        Values                   Default
format        csv, tsv, jsonl          csv
compression   gz, none                 none
delimiter     "," "\t" "|"             ","
header        true, false              true
null          empty, \N, NULL, null    empty
newline       CRLF, LF, CR             CRLF
quote         " or custom              "
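
These options correspond to the query parameters appended to a CLI result URL (and to result_settings keys in a workflow). A hedged sketch of assembling such a URL in Python follows; the values are illustrative, and the exact parameter set your account accepts should be confirmed against the official docs.

from urllib.parse import quote, urlencode

# Placeholder credentials; both must be URL-encoded
creds = f"{quote('ACCESS_KEY', safe='')}:{quote('SECRET_KEY', safe='')}"

# Illustrative option values taken from the table above
params = urlencode({
    'format': 'tsv',
    'delimiter': '\t',
    'header': 'true',
    'null': 'NULL',
    'newline': 'LF',
    'compression': 'gz',
})

result_url = f"s3://{creds}@/your-bucket/td-exports/adhoc/output.tsv.gz?{params}"
print(result_url)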

Scheduling (Cron Reference)

Cron           Schedule
0 * * * *      Every hour
0 0 * * *      Daily at midnight
0 0 1 * *      First day of each month
0 */6 * * *    Every 6 hours
30 2 * * *     Daily at 2:30 AM

Resources

Resource                            URL
S3 Export Integration Docs          docs.treasuredata.com/int/amazon-s3-export-integration-v1
TD Toolbelt (CLI)                   toolbelt.treasuredata.com
TD Python Client (pytd)             github.com/treasure-data/pytd
Workflow Examples                   Treasure Boxes S3 Export
Workflow Secrets                    Secret Management Docs
TD Static IPs (for whitelisting)    IP Addresses Doc
AWS S3 Endpoints by Region          AWS Docs
AWS IAM Best Practices              AWS IAM Docs
S3 Server-Side Encryption           AWS SSE Docs