Athena DynamoDB Connector
Connect DynamoDB tables to Redash/Athena with automatic schema detection — one command deploys everything.
Features
- Automatic Schema Inference — Scans your DynamoDB table and automatically detects all columns and types
- One-Command Deployment — A single CLI command creates the Lambda connector, Athena catalog, and Glue table
- Query with SQL — Immediately query DynamoDB tables using standard SQL in Redash or the Athena console
- Multiple Tables — Add multiple tables to the same Glue database
- Clean Architecture — Each table gets its own Lambda connector with proper IAM isolation
Quick Start
```shell
cd tool/redash-dynamodb-connector

# Add a DynamoDB table
pnpm exec tsx src/index.ts add \
  --table pascal-alert-history \
  --glue-database unit-dev-analytics-database
```
The tool will:
- ✅ Validate that the table exists
- ✅ Scan and infer the schema (automatically detects all columns and types)
- ✅ Create the Lambda connector function
- ✅ Create the Athena Data Catalog
- ✅ Create the Glue table with the detected schema
- ✅ Deploy the infrastructure via CDK

Your table is now queryable!
Prerequisites
- AWS credentials configured (via `AWS_PROFILE` or default credentials)
- Node.js and pnpm installed
- CDK CLI available (`npx cdk` will be used automatically)
Usage
Basic Command
```shell
pnpm exec tsx src/index.ts add --table <table-name> --glue-database <database-name>
```
Command Options
| Option | Description | Required | Default |
| --- | --- | --- | --- |
| `--table` | DynamoDB table name | Yes | - |
| `--glue-database` | Existing Glue database name | No | Creates a new database |
| `--catalog-name` | Custom Athena catalog name | No | Sanitized table name |
| `--column-mapping` | Map mixed-case columns | No | - |
| `--product` | Product name (unit/spring/plants) | No | `unit` or `$PRODUCT_NAME` |
| `--env` | Environment (dev/stg/prod) | No | `dev` or `$ENV_NAME` |
| `--region` | AWS region | No | `ap-northeast-1` |
Examples
Add Table with Automatic Schema Detection
```shell
pnpm exec tsx src/index.ts add \
  --table pascal-alert-history \
  --glue-database unit-dev-analytics-database
```
Output:
```
✓ Table found
Inferring table schema...
✓ Inferred 12 columns from 10 sample items
  - alert_id: string
  - code: string
  - created_at: string
  - level: int
  - status: string
  ...
Deploying connector...
✓ Deployment complete!
```
Add Multiple Tables to Same Database
```shell
# Table 1
pnpm exec tsx src/index.ts add \
  --table pascal-alert-history \
  --glue-database unit-dev-analytics-database

# Table 2
pnpm exec tsx src/index.ts add \
  --table device-metrics \
  --glue-database unit-dev-analytics-database

# Table 3
pnpm exec tsx src/index.ts add \
  --table user-events \
  --glue-database unit-dev-analytics-database
```
All tables share the same Glue database but have separate Lambda connectors.
With Column Mapping (for mixed-case columns)
```shell
pnpm exec tsx src/index.ts add \
  --table MyDynamoDBTable \
  --column-mapping "deviceid=DeviceId,timestamp=TimeStamp" \
  --glue-database unit-dev-analytics-database
```
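The mapping string pairs each lowercased Glue column name with the original mixed-case DynamoDB attribute name. A minimal sketch of how such a value could be parsed — the function name and error handling here are illustrative assumptions, not the tool's actual code:

```typescript
// Hypothetical sketch: parse a --column-mapping value of the form
// "lowercase=OriginalCase,..." into a lookup table from the Glue column
// name to the original DynamoDB attribute name.
function parseColumnMapping(raw: string): Record<string, string> {
  const mapping: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const [lower, original] = pair.split("=").map((s) => s.trim());
    if (!lower || !original) {
      throw new Error(`Invalid mapping entry: "${pair}" (expected lower=Original)`);
    }
    mapping[lower] = original;
  }
  return mapping;
}

console.log(parseColumnMapping("deviceid=DeviceId,timestamp=TimeStamp"));
```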
What Gets Created
Per Table
Lambda Function: `{product}-{env}-dynamodb-{table_name}`
- Example: `unit-dev-dynamodb-pascal_alert_history`
- Handles federated queries from Athena to DynamoDB
- Automatically provisioned with the correct IAM permissions

Athena Data Catalog: same name as the Lambda function
- Points to the Lambda connector
- Enables SQL queries via Athena

Glue Table: `{table_name}` in the specified database
- Contains the automatically inferred column schema
- Metadata for Athena to understand the table structure
Shared Resources (created once per stack)
S3 Spill Bucket: `{product}-{env}-dynamodb-spill`
- Used by Lambda when query results exceed memory
- Auto-cleanup after 1 day

S3 Results Bucket: `{product}-{env}-athena-results`
- Stores Athena query results
- Auto-cleanup after 30 days

Athena Workgroup: `{product}-{env}-analytics`
- Configured for query execution
- CloudWatch metrics enabled
Naming Conventions
The tool follows consistent naming patterns:
| Resource | Pattern | Example |
| --- | --- | --- |
| Lambda Function | `{product}-{env}-dynamodb-{table}` | `unit-dev-dynamodb-pascal_alert_history` |
| Athena Catalog | Same as Lambda | `unit-dev-dynamodb-pascal_alert_history` |
| Glue Table | `{table}` (sanitized) | `pascal_alert_history` |
| Spill Bucket | `{product}-{env}-dynamodb-spill` | `unit-dev-dynamodb-spill` |
| Results Bucket | `{product}-{env}-athena-results` | `unit-dev-athena-results` |
| Workgroup | `{product}-{env}-analytics` | `unit-dev-analytics` |
Note: Table names are converted to lowercase with hyphens replaced by underscores to meet AWS naming requirements.
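The sanitization rule in the note above can be sketched as a small helper. This is assumed behavior based on the description; the tool's real implementation may cover more edge cases:

```typescript
// Sketch of the naming rule described above: lowercase the DynamoDB
// table name and replace hyphens with underscores so the result is a
// valid Glue table name.
function sanitizeTableName(tableName: string): string {
  return tableName.toLowerCase().replace(/-/g, "_");
}

// Lambda function (and catalog) name built from the naming pattern.
function lambdaFunctionName(product: string, env: string, table: string): string {
  return `${product}-${env}-dynamodb-${sanitizeTableName(table)}`;
}

console.log(lambdaFunctionName("unit", "dev", "pascal-alert-history"));
// unit-dev-dynamodb-pascal_alert_history
```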
Querying in Redash
1. Create Redash Data Source
Configure an Athena data source in Redash:
| Setting | Value |
| --- | --- |
| Name | dynamodb-connector-dev |
| Type | Amazon Athena |
| AWS Region | ap-northeast-1 |
| AWS Access Key | Your IAM user access key |
| AWS Secret Key | Your IAM user secret key |
| S3 Staging Path | s3://unit-dev-athena-results/ |
| Schema Name | unit-dev-analytics-database |
| Athena Work Group | unit-dev-analytics |
Additional Settings (expand section):
- ✅ Check "Use Glue Data Catalog"
- Glue Data Catalog IDs: `unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics` (a comma-separated list of all your catalog names)
2. Query Your Data
Use the three-part naming format:
```sql
SELECT *
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
LIMIT 10;
```

Format: `"catalog_name"."database_name"."table_name"`
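A tiny helper sketch for assembling that three-part identifier (hypothetical, not part of the tool). Each part is double-quoted because the names contain hyphens:

```typescript
// Build the fully-qualified Athena table identifier described above.
// The double quotes are required because the catalog and database names
// contain hyphens, which are not valid in unquoted SQL identifiers.
function qualifiedTableName(catalog: string, database: string, table: string): string {
  return `"${catalog}"."${database}"."${table}"`;
}

console.log(
  `SELECT * FROM ${qualifiedTableName(
    "unit-dev-dynamodb-pascal_alert_history",
    "unit-dev-analytics-database",
    "pascal_alert_history"
  )} LIMIT 10;`
);
```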
Query Examples
Filter by status:

```sql
SELECT alert_id, code, status, created_at, level
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 100;
```

Aggregate data:

```sql
SELECT
  code,
  COUNT(*) AS alert_count,
  AVG(duration_seconds) AS avg_duration
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE level > 1
GROUP BY code
ORDER BY alert_count DESC;
```

Join with other tables:

```sql
SELECT
  a.alert_id,
  a.code,
  d.device_name,
  a.created_at
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history" a
JOIN "unit-dev-dynamodb-devices"."unit-dev-analytics-database"."devices" d
  ON a.device_id = d.device_id
WHERE a.status = 'active'
LIMIT 50;
```
3. Adding New Tables
When you add a new table:
**Step 1:** Deploy with the tool:

```shell
pnpm exec tsx src/index.ts add --table new-table --glue-database unit-dev-analytics-database
```

**Step 2:** Update the Redash data source:

- Go to the data source settings
- Add the new catalog ID to the Glue Data Catalog IDs field: `unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics,unit-dev-dynamodb-new_table`

**Step 3:** Query immediately:

```sql
SELECT * FROM "unit-dev-dynamodb-new_table"."unit-dev-analytics-database"."new_table" LIMIT 10;
```
IAM Permissions Required
For Deployment (Developer/Admin)
The AWS credentials running the deployment need:
- `cloudformation:*` on the stack
- `lambda:*` for creating connectors
- `athena:*` for creating catalogs and workgroups
- `glue:*` for creating databases and tables
- `s3:*` for creating buckets
- `iam:CreateRole`, `iam:AttachRolePolicy` for Lambda execution roles
- `serverlessrepo:CreateCloudFormationTemplate` for deploying the connector from SAR
For Querying (Redash IAM User)
Create a customer-managed IAM policy with these permissions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::unit-dev-dynamodb-spill/*",
        "arn:aws:s3:::unit-dev-athena-results",
        "arn:aws:s3:::unit-dev-athena-results/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "athena:GetDataCatalog",
        "athena:GetDatabase",
        "athena:GetTableMetadata",
        "athena:ListDataCatalogs",
        "athena:ListDatabases",
        "athena:ListTableMetadata"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:ap-northeast-1:404232320784:catalog",
        "arn:aws:glue:ap-northeast-1:404232320784:database/unit-dev-analytics-database",
        "arn:aws:glue:ap-northeast-1:404232320784:table/unit-dev-analytics-database/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["lambda:InvokeFunction"],
      "Resource": [
        "arn:aws:lambda:ap-northeast-1:404232320784:function:unit-dev-dynamodb-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:List*",
        "dynamodb:DescribeStream",
        "dynamodb:DescribeTable",
        "dynamodb:Get*",
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:ap-northeast-1:404232320784:table/*"
    }
  ]
}
```
Note: The wildcard patterns (`unit-dev-dynamodb-*`, `table/*`) allow querying all current and future tables without modifying IAM permissions each time.
Steps to Add IAM Permissions
1. Go to IAM Console → Policies
2. Click Create Policy → JSON tab
3. Paste the policy above (adjust the region/account ID as needed)
4. Name it `RedashDynamoDBConnectorAccess`
5. Create the policy
6. Go to your Redash IAM user → Permissions → Attach policies
7. Search for `RedashDynamoDBConnectorAccess` and attach it
Automatic Schema Inference
The tool automatically detects your DynamoDB table schema by:
1. Scanning Sample Items — Reads 10 items from your table (configurable)
2. Detecting Types — Maps JavaScript types to Athena/Glue types:
   - `string` → `string`
   - `number` (integer) → `int`
   - `number` (decimal) → `double`
   - `boolean` → `boolean`
   - `array` → `array<type>`
   - `object` → `string` (stored as JSON)
3. Creating Schema — Generates a Glue table with the detected columns
4. Handling Mixed Types — If a column has multiple types across items, it defaults to `string`
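The mapping rules above can be sketched as follows. This is an illustrative assumption of the logic, not the tool's exact implementation:

```typescript
// Sketch of the type-mapping rules listed above: map a JavaScript value
// to a Glue column type, falling back to "string" for objects (stored
// as JSON) and for anything unrecognized.
function inferGlueType(value: unknown): string {
  if (typeof value === "string") return "string";
  if (typeof value === "number") return Number.isInteger(value) ? "int" : "double";
  if (typeof value === "boolean") return "boolean";
  if (Array.isArray(value)) {
    const elementType = value.length > 0 ? inferGlueType(value[0]) : "string";
    return `array<${elementType}>`;
  }
  return "string"; // objects and nested structures are stored as JSON strings
}

// Mixed types across sampled items fall back to string.
function mergeTypes(a: string, b: string): string {
  return a === b ? a : "string";
}

console.log(inferGlueType(42));     // int
console.log(inferGlueType(1.5));    // double
console.log(inferGlueType([true])); // array<boolean>
```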
Example output:
```
Inferring table schema...
Scanning 10 items to infer schema...
✓ Inferred 12 columns from 10 sample items
  - alert_data: string
  - alert_id: string
  - clear_reason: string
  - cleared_at: string
  - code: string
  - created_at: string
  - created_month: string
  - device_id: string
  - duration_seconds: int
  - level: int
  - severity: string
  - status: string
```
Limitations
- Empty Tables — If the table has no items, schema inference warns, but deployment continues with an empty schema
- Nested Objects — Complex nested structures are represented as JSON strings
- Type Consistency — The schema is based on sampled items; if your data has inconsistent types, some queries may fail
Architecture
```
┌─────────────────┐
│  Redash Query   │
└────────┬────────┘
         │ SQL Query
         ▼
┌─────────────────────────────┐
│        Amazon Athena        │
│    (unit-dev-analytics)     │
└────────┬────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│    Athena Data Catalog      │
│     (Lambda Connector)      │
└────────┬────────────────────┘
         │ Invoke
         ▼
┌─────────────────────────────┐
│      Lambda Function        │
│ (AthenaDynamoDBConnector)   │
└────────┬────────────────────┘
         │ Read
         ▼
┌─────────────────────────────┐
│       DynamoDB Table        │
│        (Your Data)          │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│      S3 Spill Bucket        │
│    (Large Result Sets)      │
└─────────────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│     S3 Results Bucket       │
│      (Query Results)        │
└─────────────────────────────┘
```
Data Flow:
1. Redash sends a SQL query to Athena
2. Athena looks up the table metadata in the Glue Catalog
3. Athena invokes the Lambda connector for the specific catalog
4. The Lambda reads data from DynamoDB
5. The Lambda returns results (using the S3 spill bucket if the data is large)
6. Athena writes the final results to the S3 results bucket
7. Redash retrieves and displays the results
Troubleshooting
Connection Test Failed: Access Denied to S3
Problem: IAM user cannot write to results bucket
Solution: Add S3 permissions to IAM policy (see IAM Permissions section)
Insufficient Permissions to Execute Query
Problem: Missing Athena/Lambda/Glue/DynamoDB permissions
Solution: Attach the full RedashDynamoDBConnectorAccess policy to IAM user
SCHEMA_NOT_FOUND Error
Problem: Querying without specifying the catalog name
Solution: Use the fully-qualified table name: "catalog"."database"."table"
COLUMN_NOT_FOUND Error
Problem: Glue table has no column definitions (should not happen with automatic schema inference)
Solution: Re-deploy the table — the tool will re-infer the schema
Table Not Found in Redash Schema Browser
Problem: Catalog ID not added to Redash data source
Solution: Add catalog ID to "Glue Data Catalog IDs" field in data source settings
Query Takes Too Long / Times Out
Problem: Large DynamoDB table with full scan
Solutions:
- Add indexes to the DynamoDB table for common query patterns
- Use a `LIMIT` clause in queries
- Add `WHERE` conditions to reduce the scan size
- Consider increasing the Lambda memory in the tool configuration
Advanced Configuration
Custom Connector Version
Edit `src/types.ts` to change the AWS SAR connector version:

```typescript
connectorVersion: z.string().default('2022.47.1'); // Change this version
```
Custom Lambda Memory/Timeout
Edit `src/types.ts`:

```typescript
lambdaMemory: z.number().min(512).max(10240).default(3008); // Change memory in MB
lambdaTimeout: z.number().min(60).max(900).default(900);    // Change timeout in seconds
```
Custom Lifecycle Policies
Edit `src/types.ts`:

```typescript
spillBucketLifecycleDays: z.number().min(1).default(1);    // Days before spill data is deleted
resultsBucketLifecycleDays: z.number().min(1).default(30); // Days before query results are deleted
```
Development
Project Structure
```
redash-dynamodb-connector/
├── src/
│   ├── index.ts               # CLI entry point
│   ├── types.ts               # Type definitions and schemas
│   ├── aws/
│   │   ├── credentials.ts     # AWS credential provider
│   │   └── dynamodb.ts        # DynamoDB operations & schema inference
│   ├── utils/
│   │   └── shared-arg.ts      # Shared argument utilities
│   └── cdk/
│       ├── app.ts             # CDK app factory
│       ├── stacks/
│       │   └── connector-stack.ts    # Main stack
│       └── constructs/
│           └── athena-connector.ts   # Connector construct
├── package.json
├── cdk.json                   # CDK configuration
├── tsconfig.json
└── README.md
```
Running Locally
```shell
# Install dependencies
pnpm install

# Run linter
pnpm lint

# Type check
pnpm tsc

# Deploy a table
pnpm exec tsx src/index.ts add --table my-table --glue-database my-db
```
Testing
The tool validates inputs but doesn't include automated tests yet. Testing is done through actual deployments.
Cost Considerations
Per Query Costs
- Athena: $5 per TB scanned
- Lambda: $0.20 per 1M requests + compute time
- DynamoDB: read capacity units consumed
- S3: storage + requests (minimal)
Monthly Costs (Estimate)
For 1000 queries/month on a 1GB table:
- Athena: ~$0.005 per query (1 GB scanned × $5/TB), so ~$5/month for 1,000 queries
- Lambda: ~$0.20
- DynamoDB: depends on your table's RCU configuration
- S3: < $1

Total: ~$1-5/month for typical usage
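The Athena figure is straightforward arithmetic at $5 per TB scanned:

```typescript
// Back-of-the-envelope Athena cost for the scenario above:
// each query scans the whole 1 GB table, billed at $5 per TB.
const ATHENA_USD_PER_TB = 5;
const gbScannedPerQuery = 1;
const queriesPerMonth = 1000;

const costPerQuery = (gbScannedPerQuery / 1024) * ATHENA_USD_PER_TB; // ≈ $0.005
const monthlyAthenaCost = costPerQuery * queriesPerMonth;            // ≈ $4.88

console.log(costPerQuery.toFixed(4), monthlyAthenaCost.toFixed(2));
```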
Cost Optimization Tips
- Use `LIMIT` clauses to reduce data scanned
- Query specific columns instead of `SELECT *`
- Add DynamoDB indexes for frequently queried patterns
- Set appropriate lifecycle policies on S3 buckets
- Use Athena workgroups to track and limit costs
Limitations
- No Write Operations — The tool only supports read queries (SELECT)
- Scan Performance — Full table scans can be slow on large tables
- Type Inference — Based on sampled data; may not capture all edge cases
- No Updates — Existing connector configurations cannot be modified; delete and recreate instead
- Lambda Concurrency — Limited by AWS Lambda concurrency limits
- Result Size — Large result sets require the S3 spill bucket (handled automatically)
FAQ
Q: Can I query multiple DynamoDB tables in a single query?
A: Yes! Use JOIN clauses with the fully-qualified table name for each table.

Q: Do I need to recreate the connector when the DynamoDB schema changes?
A: Yes, re-run the add command to re-infer the schema and update the Glue table.

Q: Can I use this with DynamoDB Global Tables?
A: Yes, just point to the table in your region. Each region needs its own connector.

Q: How do I delete a connector?
A: Delete the CloudFormation stack: `aws cloudformation delete-stack --stack-name unit-dev-dynamodb-connector`

Q: Can I query DynamoDB Streams?
A: No, this tool only queries the table itself, not streams.

Q: Does this work with DynamoDB DAX?
A: No, the Lambda connector queries DynamoDB directly, not through DAX.

Q: Can I use this with LocalStack or DynamoDB Local?
A: Not directly, as the tool relies on the AWS Serverless Application Repository for the connector.
License
ISC
Contributing
Issues and pull requests welcome! Please ensure code passes linting before submitting.
Support
For issues or questions:
- Check the Troubleshooting section above
- Review the AWS CloudFormation stack events for deployment errors
- Check the Lambda function logs in CloudWatch
- Open an issue in this repository