
Athena DynamoDB Connector

Connect DynamoDB tables to Redash/Athena with automatic schema detection — one command deploys everything.


Features

  • Automatic Schema Inference — Scans your DynamoDB table and automatically detects all columns and types

  • One-Command Deployment — Single CLI command creates Lambda connector, Athena catalog, and Glue table

  • Query with SQL — Immediately query DynamoDB tables using standard SQL in Redash or Athena Console

  • Multiple Tables Support — Add unlimited tables to the same Glue database

  • Clean Architecture — Each table gets its own Lambda connector with proper IAM isolation


Quick Start

cd tool/redash-dynamodb-connector

# Add a DynamoDB table
pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

The tool will:

  1. ✅ Validate table exists

  2. ✅ Scan and infer schema (automatically detects all columns and types)

  3. ✅ Create Lambda connector function

  4. ✅ Create Athena Data Catalog

  5. ✅ Create Glue Table with detected schema

  6. ✅ Deploy infrastructure via CDK

Your table is now queryable!


Prerequisites

  • AWS credentials configured (via AWS_PROFILE or default credentials)

  • Node.js and pnpm installed

  • CDK CLI available (npx cdk will be used automatically)


Usage

Basic Command

pnpm exec tsx src/index.ts add --table <table-name> --glue-database <database-name>

Command Options

Option            Description                        Required  Default
----------------  ---------------------------------  --------  ----------------------
--table           DynamoDB table name                Yes       -
--glue-database   Existing Glue database name        No        Creates a new database
--catalog-name    Custom Athena catalog name         No        Sanitized table name
--column-mapping  Map mixed-case columns             No        -
--product         Product name (unit/spring/plants)  No        unit, or $PRODUCT_NAME
--env             Environment (dev/stg/prod)         No        dev, or $ENV_NAME
--region          AWS region                         No        ap-northeast-1

Examples

Add Table with Automatic Schema Detection

pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

Output:

✓ Table found
Inferring table schema...
✓ Inferred 12 columns from 10 sample items
- alert_id: string
- code: string
- created_at: string
- level: int
- status: string
...
Deploying connector...
✓ Deployment complete!

Add Multiple Tables to Same Database


# Table 1
pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

# Table 2
pnpm exec tsx src/index.ts add \
--table device-metrics \
--glue-database unit-dev-analytics-database

# Table 3
pnpm exec tsx src/index.ts add \
--table user-events \
--glue-database unit-dev-analytics-database

All tables share the same Glue database but have separate Lambda connectors.

With Column Mapping (for mixed-case columns)

pnpm exec tsx src/index.ts add \
--table MyDynamoDBTable \
--column-mapping "deviceid=DeviceId,timestamp=TimeStamp" \
--glue-database unit-dev-analytics-database
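The `--column-mapping` value is a comma-separated list of `lowercase=Original` pairs. A parser for that format might look like the sketch below (illustrative only; the function name `parseColumnMapping` is hypothetical and the tool's actual parsing code may differ):

```typescript
// Parse a "--column-mapping" value such as "deviceid=DeviceId,timestamp=TimeStamp"
// into a lookup from the lowercase Glue column name to the original
// mixed-case DynamoDB attribute name. Sketch only.
function parseColumnMapping(raw: string): Record<string, string> {
  const mapping: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const [glueColumn, dynamoAttribute] = pair.split("=").map((s) => s.trim());
    if (!glueColumn || !dynamoAttribute) {
      throw new Error(`Invalid column mapping entry: "${pair}"`);
    }
    mapping[glueColumn] = dynamoAttribute;
  }
  return mapping;
}

console.log(parseColumnMapping("deviceid=DeviceId,timestamp=TimeStamp"));
// { deviceid: 'DeviceId', timestamp: 'TimeStamp' }
```

The mapping lets Athena's lowercase-only column names resolve back to the mixed-case attribute names stored in DynamoDB.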


What Gets Created

Per Table

Lambda Function: {product}-{env}-dynamodb-{table_name}

  • Example: unit-dev-dynamodb-pascal_alert_history

  • Handles federated queries from Athena to DynamoDB

  • Automatically provisioned with correct IAM permissions

Athena Data Catalog: Same name as Lambda function

  • Points to the Lambda connector

  • Enables SQL queries via Athena

Glue Table: {table_name} in the specified database

  • Contains automatically inferred column schema

  • Metadata for Athena to understand table structure

Shared Resources (created once per stack)

S3 Spill Bucket: {product}-{env}-dynamodb-spill

  • Used by Lambda when query results exceed memory

  • Auto-cleanup after 1 day

S3 Results Bucket: {product}-{env}-athena-results

  • Stores Athena query results

  • Auto-cleanup after 30 days

Athena Workgroup: {product}-{env}-analytics

  • Configured for query execution

  • CloudWatch metrics enabled


Naming Conventions

The tool follows consistent naming patterns:

Resource         Pattern                           Example
---------------  --------------------------------  --------------------------------------
Lambda Function  {product}-{env}-dynamodb-{table}  unit-dev-dynamodb-pascal_alert_history
Athena Catalog   Same as the Lambda function       unit-dev-dynamodb-pascal_alert_history
Glue Table       {table} (sanitized)               pascal_alert_history
Spill Bucket     {product}-{env}-dynamodb-spill    unit-dev-dynamodb-spill
Results Bucket   {product}-{env}-athena-results    unit-dev-athena-results
Workgroup        {product}-{env}-analytics         unit-dev-analytics

Note: Table names are converted to lowercase with hyphens replaced by underscores to meet AWS naming requirements.
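The sanitization rule above can be sketched as a small helper (the function names are hypothetical; the tool's actual implementation may differ):

```typescript
// Lowercase the DynamoDB table name and replace hyphens with underscores,
// as described in the naming note above. Sketch only.
function sanitizeTableName(tableName: string): string {
  return tableName.toLowerCase().replace(/-/g, "_");
}

// Derived resource names follow the {product}-{env}-dynamodb-{table} pattern.
function lambdaFunctionName(product: string, env: string, tableName: string): string {
  return `${product}-${env}-dynamodb-${sanitizeTableName(tableName)}`;
}

console.log(sanitizeTableName("pascal-alert-history"));
// pascal_alert_history
console.log(lambdaFunctionName("unit", "dev", "pascal-alert-history"));
// unit-dev-dynamodb-pascal_alert_history
```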


Querying in Redash

1. Create Redash Data Source

Configure an Athena data source in Redash:

Setting            Value
-----------------  -----------------------------
Name               dynamodb-connector-dev
Type               Amazon Athena
AWS Region         ap-northeast-1
AWS Access Key     Your IAM user access key
AWS Secret Key     Your IAM user secret key
S3 Staging Path    s3://unit-dev-athena-results/
Schema Name        unit-dev-analytics-database
Athena Work Group  unit-dev-analytics

Additional Settings (expand section):

  • ✅ Check "Use Glue Data Catalog"

  • Glue Data Catalog IDs: unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics (comma-separated list of all your catalog names)

2. Query Your Data

Use the three-part naming format:

SELECT *
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
LIMIT 10;

Format: "catalog_name"."database_name"."table_name"
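A small helper makes it harder to misplace the quotes in the three-part identifier (a hypothetical convenience function, not part of the tool):

```typescript
// Build the quoted three-part name Athena expects for federated queries:
// "catalog_name"."database_name"."table_name"
function qualifiedTableName(catalog: string, database: string, table: string): string {
  return `"${catalog}"."${database}"."${table}"`;
}

console.log(qualifiedTableName(
  "unit-dev-dynamodb-pascal_alert_history",
  "unit-dev-analytics-database",
  "pascal_alert_history",
));
// "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
```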

Query Examples

Filter by status:

SELECT alert_id, code, status, created_at, level
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 100;

Aggregate data:

SELECT
code,
COUNT(*) as alert_count,
AVG(duration_seconds) as avg_duration
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE level > 1
GROUP BY code
ORDER BY alert_count DESC;

Join with other tables:

SELECT
a.alert_id,
a.code,
d.device_name,
a.created_at
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history" a
JOIN "unit-dev-dynamodb-devices"."unit-dev-analytics-database"."devices" d
ON a.device_id = d.device_id
WHERE a.status = 'active'
LIMIT 50;

3. Adding New Tables

When you add a new table:

Step 1: Deploy with the tool:

pnpm exec tsx src/index.ts add --table new-table --glue-database unit-dev-analytics-database

Step 2: Update Redash data source:

  • Go to data source settings

  • Add new catalog ID to Glue Data Catalog IDs field:

unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics,unit-dev-dynamodb-new_table


Step 3: Query immediately:

SELECT * FROM "unit-dev-dynamodb-new_table"."unit-dev-analytics-database"."new_table" LIMIT 10;

IAM Permissions Required

For Deployment (Developer/Admin)

The AWS credentials running the deployment need:

  • cloudformation:* on the stack

  • lambda:* for creating connectors

  • athena:* for creating catalogs and workgroups

  • glue:* for creating databases and tables

  • s3:* for creating buckets

  • iam:CreateRole, iam:AttachRolePolicy for Lambda execution roles

  • serverlessrepo:CreateCloudFormationTemplate for deploying connector from SAR

For Querying (Redash IAM User)

Create a customer-managed IAM policy with these permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::unit-dev-dynamodb-spill",
        "arn:aws:s3:::unit-dev-dynamodb-spill/*",
        "arn:aws:s3:::unit-dev-athena-results",
        "arn:aws:s3:::unit-dev-athena-results/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "athena:GetDataCatalog",
        "athena:GetDatabase",
        "athena:GetTableMetadata",
        "athena:ListDataCatalogs",
        "athena:ListDatabases",
        "athena:ListTableMetadata"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:ap-northeast-1:404232320784:catalog",
        "arn:aws:glue:ap-northeast-1:404232320784:database/unit-dev-analytics-database",
        "arn:aws:glue:ap-northeast-1:404232320784:table/unit-dev-analytics-database/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["lambda:InvokeFunction"],
      "Resource": [
        "arn:aws:lambda:ap-northeast-1:404232320784:function:unit-dev-dynamodb-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:List*",
        "dynamodb:DescribeStream",
        "dynamodb:DescribeTable",
        "dynamodb:Get*",
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:ap-northeast-1:404232320784:table/*"
    }
  ]
}

Note: The wildcard patterns (unit-dev-dynamodb-*, table/*) allow querying all current and future tables without modifying IAM permissions each time.

Steps to Add IAM Permissions

  1. Go to IAM Console → Policies

  2. Click Create Policy → JSON tab

  3. Paste the policy above (adjust region/account ID as needed)

  4. Name it: RedashDynamoDBConnectorAccess

  5. Create the policy

  6. Go to your Redash IAM user → Permissions → Attach policies

  7. Search for RedashDynamoDBConnectorAccess and attach it


Automatic Schema Inference

The tool automatically detects your DynamoDB table schema by:

  1. Scanning Sample Items — Reads 10 items from your table (configurable)

  2. Detecting Types — Maps JavaScript types to Athena/Glue types:

    • string → string
    • number (integer) → int
    • number (decimal) → double
    • boolean → boolean
    • array → array<type>
    • object → string (stored as JSON)
  3. Creating Schema — Generates Glue table with detected columns

  4. Handling Mixed Types — If a column has multiple types across items, defaults to string
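The mapping and mixed-type rules above can be sketched in TypeScript (illustrative only; the function names are hypothetical and the tool's actual inference code may differ):

```typescript
// Map a sampled JavaScript value to an Athena/Glue type, per the rules above.
function inferGlueType(value: unknown): string {
  if (typeof value === "string") return "string";
  if (typeof value === "number") return Number.isInteger(value) ? "int" : "double";
  if (typeof value === "boolean") return "boolean";
  if (Array.isArray(value)) {
    // Infer element type from the first element; empty arrays fall back to string.
    const elementType = value.length > 0 ? inferGlueType(value[0]) : "string";
    return `array<${elementType}>`;
  }
  // Nested objects are stored as JSON strings.
  return "string";
}

// If a column shows multiple types across sampled items, default to string.
function resolveColumnType(sampledTypes: string[]): string {
  return new Set(sampledTypes).size === 1 ? sampledTypes[0] : "string";
}

console.log(inferGlueType(42));       // int
console.log(inferGlueType(3.14));     // double
console.log(inferGlueType([1, 2]));   // array<int>
console.log(resolveColumnType(["int", "string"])); // string
```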

Example output:

Inferring table schema...
Scanning 10 items to infer schema...
✓ Inferred 12 columns from 10 sample items
- alert_data: string
- alert_id: string
- clear_reason: string
- cleared_at: string
- code: string
- created_at: string
- created_month: string
- device_id: string
- duration_seconds: int
- level: int
- severity: string
- status: string

Limitations

  • Empty Tables — If the table has no items, schema inference warns but deployment continues with an empty schema

  • Nested Objects — Complex nested structures are represented as JSON strings

  • Type Consistency — Schema is based on sampled items; if your data has inconsistent types, some queries may fail


Architecture

┌─────────────────┐
│  Redash Query   │
└────────┬────────┘
         │ SQL Query
         ▼
┌─────────────────────────────┐
│        Amazon Athena        │
│     (unit-dev-analytics)    │
└────────┬────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│     Athena Data Catalog     │
│     (Lambda Connector)      │
└────────┬────────────────────┘
         │ Invoke
         ▼
┌─────────────────────────────┐
│       Lambda Function       │
│  (AthenaDynamoDBConnector)  │
└────────┬────────────────────┘
         │ Read
         ▼
┌─────────────────────────────┐
│       DynamoDB Table        │
│         (Your Data)         │
└─────────────────────────────┘

Supporting buckets:

┌─────────────────────────────┐
│       S3 Spill Bucket       │
│     (Large Result Sets)     │
└─────────────────────────────┘
┌─────────────────────────────┐
│      S3 Results Bucket      │
│       (Query Results)       │
└─────────────────────────────┘

Data Flow:

  1. Redash sends SQL query to Athena

  2. Athena looks up table metadata in Glue Catalog

  3. Athena invokes Lambda connector for the specific catalog

  4. Lambda reads data from DynamoDB

  5. Lambda returns results (uses S3 spill bucket if data is large)

  6. Athena writes final results to S3 results bucket

  7. Redash retrieves and displays results


Troubleshooting

Connection Test Failed: Access Denied to S3

Problem: IAM user cannot write to results bucket

Solution: Add S3 permissions to IAM policy (see IAM Permissions section)

Insufficient Permissions to Execute Query

Problem: Missing Athena/Lambda/Glue/DynamoDB permissions

Solution: Attach the full RedashDynamoDBConnectorAccess policy to IAM user

SCHEMA_NOT_FOUND Error

Problem: Querying without specifying the catalog name

Solution: Use the fully-qualified table name: "catalog"."database"."table"

COLUMN_NOT_FOUND Error

Problem: Glue table has no column definitions (should not happen with automatic schema inference)

Solution: Re-deploy the table — the tool will re-infer the schema

Table Not Found in Redash Schema Browser

Problem: Catalog ID not added to Redash data source

Solution: Add catalog ID to "Glue Data Catalog IDs" field in data source settings

Query Takes Too Long / Times Out

Problem: Large DynamoDB table with full scan

Solutions:

  • Add indexes to DynamoDB table for common query patterns

  • Use LIMIT clause in queries

  • Add WHERE conditions to reduce scan size

  • Consider increasing Lambda memory in the tool configuration


Advanced Configuration

Custom Connector Version

Edit src/types.ts to change the AWS SAR connector version:

connectorVersion: z.string().default('2022.47.1'); // Change this version

Custom Lambda Memory/Timeout

Edit src/types.ts:

lambdaMemory: z.number().min(512).max(10240).default(3008); // Change memory in MB

lambdaTimeout: z.number().min(60).max(900).default(900); // Change timeout in seconds

Custom Lifecycle Policies

Edit src/types.ts:

spillBucketLifecycleDays: z.number().min(1).default(1); // Days before spill data is deleted

resultsBucketLifecycleDays: z.number().min(1).default(30); // Days before query results are deleted
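Taken together, the tunables above might live in a single zod schema along these lines (a consolidated sketch; the real src/types.ts may organize them differently):

```typescript
import { z } from "zod";

// Hypothetical consolidated view of the configurable values shown above.
const connectorConfigSchema = z.object({
  connectorVersion: z.string().default("2022.47.1"),
  lambdaMemory: z.number().min(512).max(10240).default(3008), // MB
  lambdaTimeout: z.number().min(60).max(900).default(900),    // seconds
  spillBucketLifecycleDays: z.number().min(1).default(1),
  resultsBucketLifecycleDays: z.number().min(1).default(30),
});

// Parsing an empty object fills in every default;
// out-of-range overrides fail validation at startup.
console.log(connectorConfigSchema.parse({}));
```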


Development

Project Structure

redash-dynamodb-connector/
├── src/
│   ├── index.ts                     # CLI entry point
│   ├── types.ts                     # Type definitions and schemas
│   ├── aws/
│   │   ├── credentials.ts           # AWS credential provider
│   │   └── dynamodb.ts              # DynamoDB operations & schema inference
│   ├── utils/
│   │   └── shared-arg.ts            # Shared argument utilities
│   └── cdk/
│       ├── app.ts                   # CDK app factory
│       ├── stacks/
│       │   └── connector-stack.ts   # Main stack
│       └── constructs/
│           └── athena-connector.ts  # Connector construct
├── package.json
├── cdk.json                         # CDK configuration
├── tsconfig.json
└── README.md

Running Locally


# Install dependencies
pnpm install

# Run linter
pnpm lint

# Type check
pnpm tsc

# Deploy a table
pnpm exec tsx src/index.ts add --table my-table --glue-database my-db

Testing

The tool validates inputs but doesn't include automated tests yet. Testing is done through actual deployments.


Cost Considerations

Per Query Costs

  • Athena: $5 per TB scanned

  • Lambda: $0.20 per 1M requests + compute time

  • DynamoDB: Read capacity units consumed

  • S3: Storage + requests (minimal)

Monthly Costs (Estimate)

For 1000 queries/month on a 1GB table:

  • Athena: ~$5 (1 GB scanned × 1000 queries ≈ 1 TB, at $5/TB)

  • Lambda: ~$0.20

  • DynamoDB: Depends on your table's RCU configuration

  • S3: < $1

Total: ~$5-10/month for typical usage
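The Athena arithmetic can be double-checked with a quick calculation (a sketch using the $5/TB rate above, assuming each query scans the full table; not an official pricing tool):

```typescript
// Rough monthly Athena scan cost: (table size × queries) converted to TB, × $5/TB.
function estimateAthenaMonthlyCost(tableSizeGB: number, queriesPerMonth: number): number {
  const scannedTB = (tableSizeGB * queriesPerMonth) / 1024;
  const pricePerTB = 5;
  return scannedTB * pricePerTB;
}

// 1 GB table, 1000 queries/month → ~0.98 TB scanned
console.log(estimateAthenaMonthlyCost(1, 1000).toFixed(2)); // 4.88
```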

Cost Optimization Tips

  • Use LIMIT clauses to reduce data scanned

  • Query specific columns instead of SELECT *

  • Add DynamoDB indexes for frequently queried patterns

  • Set appropriate lifecycle policies on S3 buckets

  • Use Athena workgroups to track and limit costs


Limitations

  1. No Write Operations — Tool only supports read queries (SELECT)

  2. Scan Performance — Full table scans can be slow on large tables

  3. Type Inference — Based on sampled data; may not capture all edge cases

  4. No Updates — Cannot modify existing connector configurations; must delete and recreate

  5. Lambda Concurrency — Limited by AWS Lambda concurrency limits

  6. Result Size — Large result sets require S3 spill (automatically handled)


FAQ

Q: Can I query multiple DynamoDB tables in a single query? A: Yes! Use JOIN clauses with the fully-qualified table name for each table.

Q: Do I need to recreate the connector when DynamoDB schema changes? A: Yes, re-run the add command to re-infer schema and update the Glue table.

Q: Can I use this with DynamoDB Global Tables? A: Yes, just point to the table in your region. Each region would need its own connector.

Q: How do I delete a connector? A: Delete the CloudFormation stack: aws cloudformation delete-stack --stack-name unit-dev-dynamodb-connector

Q: Can I query DynamoDB streams? A: No, this tool only queries the table itself, not streams.

Q: Does this work with DynamoDB DAX? A: No, the Lambda connector queries DynamoDB directly, not through DAX.

Q: Can I use this with LocalStack or DynamoDB Local? A: Not directly, as it relies on AWS Serverless Application Repository for the connector.



License

ISC


Contributing

Issues and pull requests welcome! Please ensure code passes linting before submitting.


Support

For issues or questions:

  1. Check the Troubleshooting section above

  2. Review AWS CloudFormation stack events for deployment errors

  3. Check Lambda function logs in CloudWatch

  4. Open an issue in this repository
