
Athena DynamoDB Connector

Connect DynamoDB tables to Redash/Athena with automatic schema detection — one command deploys everything.


Features

  • Automatic Schema Inference — Scans your DynamoDB table and automatically detects all columns and types

  • One-Command Deployment — Single CLI command creates Lambda connector, Athena catalog, and Glue table

  • Query with SQL — Immediately query DynamoDB tables using standard SQL in Redash or Athena Console

  • Multiple Tables Support — Add unlimited tables to the same Glue database

  • Clean Architecture — Each table gets its own Lambda connector with proper IAM isolation


Quick Start

cd tool/redash-dynamodb-connector

# Add a DynamoDB table
pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

The tool will:

  1. ✅ Validate table exists

  2. ✅ Scan and infer schema (automatically detects all columns and types)

  3. ✅ Create Lambda connector function

  4. ✅ Create Athena Data Catalog

  5. ✅ Create Glue Table with detected schema

  6. ✅ Deploy infrastructure via CDK

Your table is now queryable!


Prerequisites

  • AWS credentials configured (via AWS_PROFILE or default credentials)

  • Node.js and pnpm installed

  • CDK CLI available (npx cdk will be used automatically)


Usage

Basic Command

pnpm exec tsx src/index.ts add --table <table-name> --glue-database <database-name>

Command Options

Option            Description                        Required  Default
----------------  ---------------------------------  --------  ----------------------
--table           DynamoDB table name                Yes       -
--glue-database   Existing Glue database name        No        Creates a new database
--catalog-name    Custom Athena catalog name         No        Sanitized table name
--column-mapping  Map mixed-case columns             No        -
--product         Product name (unit/spring/plants)  No        unit, or $PRODUCT_NAME
--env             Environment (dev/stg/prod)         No        dev, or $ENV_NAME
--region          AWS region                         No        ap-northeast-1

Examples

Add Table with Automatic Schema Detection

pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

Output:

✓ Table found
Inferring table schema...
✓ Inferred 12 columns from 10 sample items
- alert_id: string
- code: string
- created_at: string
- level: int
- status: string
...
Deploying connector...
✓ Deployment complete!

Add Multiple Tables to Same Database


# Table 1
pnpm exec tsx src/index.ts add \
--table pascal-alert-history \
--glue-database unit-dev-analytics-database

# Table 2
pnpm exec tsx src/index.ts add \
--table device-metrics \
--glue-database unit-dev-analytics-database

# Table 3
pnpm exec tsx src/index.ts add \
--table user-events \
--glue-database unit-dev-analytics-database

All tables share the same Glue database but have separate Lambda connectors.

With Column Mapping (for mixed-case columns)

pnpm exec tsx src/index.ts add \
--table MyDynamoDBTable \
--column-mapping "deviceid=DeviceId,timestamp=TimeStamp" \
--glue-database unit-dev-analytics-database
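The `--column-mapping` value is a comma-separated list of `lowercase=Original` pairs. A parser for that format might look like the sketch below (illustrative only; the function name `parseColumnMapping` is hypothetical and the tool's actual parsing code may differ):

```typescript
// Parse a "--column-mapping" value such as "deviceid=DeviceId,timestamp=TimeStamp"
// into a lookup from the lowercase Glue column name to the original
// mixed-case DynamoDB attribute name. Sketch only.
function parseColumnMapping(raw: string): Record<string, string> {
  const mapping: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const [glueColumn, dynamoAttribute] = pair.split("=").map((s) => s.trim());
    if (!glueColumn || !dynamoAttribute) {
      throw new Error(`Invalid column mapping entry: "${pair}"`);
    }
    mapping[glueColumn] = dynamoAttribute;
  }
  return mapping;
}

console.log(parseColumnMapping("deviceid=DeviceId,timestamp=TimeStamp"));
// { deviceid: 'DeviceId', timestamp: 'TimeStamp' }
```

The mapping lets Athena's lowercase-only column names resolve back to the mixed-case attribute names stored in DynamoDB.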


What Gets Created

Per Table

Lambda Function: {product}-{env}-dynamodb-{table_name}

  • Example: unit-dev-dynamodb-pascal_alert_history

  • Handles federated queries from Athena to DynamoDB

  • Automatically provisioned with correct IAM permissions

Athena Data Catalog: Same name as Lambda function

  • Points to the Lambda connector

  • Enables SQL queries via Athena

Glue Table: {table_name} in the specified database

  • Contains automatically inferred column schema

  • Metadata for Athena to understand table structure

Shared Resources (created once per stack)

S3 Spill Bucket: {product}-{env}-dynamodb-spill

  • Used by Lambda when query results exceed memory

  • Auto-cleanup after 1 day

S3 Results Bucket: {product}-{env}-athena-results

  • Stores Athena query results

  • Auto-cleanup after 30 days

Athena Workgroup: {product}-{env}-analytics

  • Configured for query execution

  • CloudWatch metrics enabled


Naming Conventions

The tool follows consistent naming patterns:

Resource         Pattern                           Example
---------------  --------------------------------  --------------------------------------
Lambda Function  {product}-{env}-dynamodb-{table}  unit-dev-dynamodb-pascal_alert_history
Athena Catalog   Same as the Lambda function       unit-dev-dynamodb-pascal_alert_history
Glue Table       {table} (sanitized)               pascal_alert_history
Spill Bucket     {product}-{env}-dynamodb-spill    unit-dev-dynamodb-spill
Results Bucket   {product}-{env}-athena-results    unit-dev-athena-results
Workgroup        {product}-{env}-analytics         unit-dev-analytics

Note: Table names are converted to lowercase with hyphens replaced by underscores to meet AWS naming requirements.
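The sanitization rule above can be sketched as a small helper (the function names are hypothetical; the tool's actual implementation may differ):

```typescript
// Lowercase the DynamoDB table name and replace hyphens with underscores,
// as described in the naming note above. Sketch only.
function sanitizeTableName(tableName: string): string {
  return tableName.toLowerCase().replace(/-/g, "_");
}

// Derived resource names follow the {product}-{env}-dynamodb-{table} pattern.
function lambdaFunctionName(product: string, env: string, tableName: string): string {
  return `${product}-${env}-dynamodb-${sanitizeTableName(tableName)}`;
}

console.log(sanitizeTableName("pascal-alert-history"));
// pascal_alert_history
console.log(lambdaFunctionName("unit", "dev", "pascal-alert-history"));
// unit-dev-dynamodb-pascal_alert_history
```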


Querying in Redash

1. Create Redash Data Source

Configure an Athena data source in Redash:

Setting            Value
-----------------  -----------------------------
Name               dynamodb-connector-dev
Type               Amazon Athena
AWS Region         ap-northeast-1
AWS Access Key     Your IAM user access key
AWS Secret Key     Your IAM user secret key
S3 Staging Path    s3://unit-dev-athena-results/
Schema Name        unit-dev-analytics-database
Athena Work Group  unit-dev-analytics

Additional Settings (expand section):

  • ✅ Check "Use Glue Data Catalog"

  • Glue Data Catalog IDs: unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics (comma-separated list of all your catalog names)

2. Query Your Data

Use the three-part naming format:

SELECT *
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
LIMIT 10;

Format: "catalog_name"."database_name"."table_name"
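A small helper makes it harder to misplace the quotes in the three-part identifier (a hypothetical convenience function, not part of the tool):

```typescript
// Build the quoted three-part name Athena expects for federated queries:
// "catalog_name"."database_name"."table_name"
function qualifiedTableName(catalog: string, database: string, table: string): string {
  return `"${catalog}"."${database}"."${table}"`;
}

console.log(qualifiedTableName(
  "unit-dev-dynamodb-pascal_alert_history",
  "unit-dev-analytics-database",
  "pascal_alert_history",
));
// "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
```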

Query Examples

Filter by status:

SELECT alert_id, code, status, created_at, level
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 100;

Aggregate data:

SELECT
code,
COUNT(*) as alert_count,
AVG(duration_seconds) as avg_duration
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history"
WHERE level > 1
GROUP BY code
ORDER BY alert_count DESC;

Join with other tables:

SELECT
a.alert_id,
a.code,
d.device_name,
a.created_at
FROM "unit-dev-dynamodb-pascal_alert_history"."unit-dev-analytics-database"."pascal_alert_history" a
JOIN "unit-dev-dynamodb-devices"."unit-dev-analytics-database"."devices" d
ON a.device_id = d.device_id
WHERE a.status = 'active'
LIMIT 50;

3. Adding New Tables

When you add a new table:

Step 1: Deploy with the tool:

pnpm exec tsx src/index.ts add --table new-table --glue-database unit-dev-analytics-database

Step 2: Update Redash data source:

  • Go to data source settings

  • Add new catalog ID to Glue Data Catalog IDs field:

unit-dev-dynamodb-pascal_alert_history,unit-dev-dynamodb-device_metrics,unit-dev-dynamodb-new_table


Step 3: Query immediately:

SELECT * FROM "unit-dev-dynamodb-new_table"."unit-dev-analytics-database"."new_table" LIMIT 10;

IAM Permissions Required

For Deployment (Developer/Admin)

The AWS credentials running the deployment need:

  • cloudformation:* on the stack

  • lambda:* for creating connectors

  • athena:* for creating catalogs and workgroups

  • glue:* for creating databases and tables

  • s3:* for creating buckets

  • iam:CreateRole, iam:AttachRolePolicy for Lambda execution roles

  • serverlessrepo:CreateCloudFormationTemplate for deploying connector from SAR

For Querying (Redash IAM User)

Create a customer-managed IAM policy with these permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::unit-dev-dynamodb-spill",
        "arn:aws:s3:::unit-dev-dynamodb-spill/*",
        "arn:aws:s3:::unit-dev-athena-results",
        "arn:aws:s3:::unit-dev-athena-results/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "athena:GetDataCatalog",
        "athena:GetDatabase",
        "athena:GetTableMetadata",
        "athena:ListDataCatalogs",
        "athena:ListDatabases",
        "athena:ListTableMetadata"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "arn:aws:glue:ap-northeast-1:404232320784:catalog",
        "arn:aws:glue:ap-northeast-1:404232320784:database/unit-dev-analytics-database",
        "arn:aws:glue:ap-northeast-1:404232320784:table/unit-dev-analytics-database/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["lambda:InvokeFunction"],
      "Resource": [
        "arn:aws:lambda:ap-northeast-1:404232320784:function:unit-dev-dynamodb-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:List*",
        "dynamodb:DescribeStream",
        "dynamodb:DescribeTable",
        "dynamodb:Get*",
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": "arn:aws:dynamodb:ap-northeast-1:404232320784:table/*"
    }
  ]
}

Note: The wildcard patterns (unit-dev-dynamodb-*, table/*) allow querying all current and future tables without modifying IAM permissions each time.

Steps to Add IAM Permissions

  1. Go to IAM Console → Policies

  2. Click Create Policy → JSON tab

  3. Paste the policy above (adjust region/account ID as needed)

  4. Name it: RedashDynamoDBConnectorAccess

  5. Create the policy

  6. Go to your Redash IAM user → Permissions → Attach policies

  7. Search for RedashDynamoDBConnectorAccess and attach it


Automatic Schema Inference

The tool automatically detects your DynamoDB table schema by:

  1. Scanning Sample Items — Reads 10 items from your table (configurable)

  2. Detecting Types — Maps JavaScript types to Athena/Glue types:

    • string → string
    • number (integer) → int
    • number (decimal) → double
    • boolean → boolean
    • array → array<type>
    • object → string (stored as JSON)
  3. Creating Schema — Generates Glue table with detected columns

  4. Handling Mixed Types — If a column has multiple types across items, defaults to string
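The mapping and mixed-type rules above can be sketched in TypeScript (illustrative only; the function names are hypothetical and the tool's actual inference code may differ):

```typescript
// Map a sampled JavaScript value to an Athena/Glue type, per the rules above.
function inferGlueType(value: unknown): string {
  if (typeof value === "string") return "string";
  if (typeof value === "number") return Number.isInteger(value) ? "int" : "double";
  if (typeof value === "boolean") return "boolean";
  if (Array.isArray(value)) {
    // Infer element type from the first element; empty arrays fall back to string.
    const elementType = value.length > 0 ? inferGlueType(value[0]) : "string";
    return `array<${elementType}>`;
  }
  // Nested objects are stored as JSON strings.
  return "string";
}

// If a column shows multiple types across sampled items, default to string.
function resolveColumnType(sampledTypes: string[]): string {
  return new Set(sampledTypes).size === 1 ? sampledTypes[0] : "string";
}

console.log(inferGlueType(42));       // int
console.log(inferGlueType(3.14));     // double
console.log(inferGlueType([1, 2]));   // array<int>
console.log(resolveColumnType(["int", "string"])); // string
```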

Example output:

Inferring table schema...
Scanning 10 items to infer schema...
✓ Inferred 12 columns from 10 sample items
- alert_data: string
- alert_id: string
- clear_reason: string
- cleared_at: string
- code: string
- created_at: string
- created_month: string
- device_id: string
- duration_seconds: int
- level: int
- severity: string
- status: string

Limitations

  • Empty Tables — If the table has no items, schema inference warns but deployment continues with an empty schema

  • Nested Objects — Complex nested structures are represented as JSON strings

  • Type Consistency — Schema is based on sampled items; if your data has inconsistent types, some queries may fail


Architecture

┌─────────────────┐
│  Redash Query   │
└────────┬────────┘
         │ SQL Query
         ▼
┌─────────────────────────────┐
│        Amazon Athena        │
│     (unit-dev-analytics)    │
└────────┬────────────────────┘
         │
         ▼
┌─────────────────────────────┐
│     Athena Data Catalog     │
│     (Lambda Connector)      │
└────────┬────────────────────┘
         │ Invoke
         ▼
┌─────────────────────────────┐
│       Lambda Function       │
│  (AthenaDynamoDBConnector)  │
└────────┬────────────────────┘
         │ Read
         ▼
┌─────────────────────────────┐
│       DynamoDB Table        │
│         (Your Data)         │
└─────────────────────────────┘

Supporting buckets:

┌─────────────────────────────┐
│       S3 Spill Bucket       │
│     (Large Result Sets)     │
└─────────────────────────────┘
┌─────────────────────────────┐
│      S3 Results Bucket      │
│       (Query Results)       │
└─────────────────────────────┘

Data Flow:

  1. Redash sends SQL query to Athena

  2. Athena looks up table metadata in Glue Catalog

  3. Athena invokes Lambda connector for the specific catalog

  4. Lambda reads data from DynamoDB

  5. Lambda returns results (uses S3 spill bucket if data is large)

  6. Athena writes final results to S3 results bucket

  7. Redash retrieves and displays results


Troubleshooting

Connection Test Failed: Access Denied to S3

Problem: IAM user cannot write to results bucket

Solution: Add S3 permissions to IAM policy (see IAM Permissions section)

Insufficient Permissions to Execute Query

Problem: Missing Athena/Lambda/Glue/DynamoDB permissions

Solution: Attach the full RedashDynamoDBConnectorAccess policy to IAM user

SCHEMA_NOT_FOUND Error

Problem: Querying without specifying the catalog name

Solution: Use the fully-qualified table name: "catalog"."database"."table"

COLUMN_NOT_FOUND Error

Problem: Glue table has no column definitions (should not happen with automatic schema inference)

Solution: Re-deploy the table — the tool will re-infer the schema

Table Not Found in Redash Schema Browser

Problem: Catalog ID not added to Redash data source

Solution: Add catalog ID to "Glue Data Catalog IDs" field in data source settings

Query Takes Too Long / Times Out

Problem: Large DynamoDB table with full scan

Solutions:

  • Add indexes to DynamoDB table for common query patterns

  • Use LIMIT clause in queries

  • Add WHERE conditions to reduce scan size

  • Consider increasing Lambda memory in the tool configuration


Advanced Configuration

Custom Connector Version

Edit src/types.ts to change the AWS SAR connector version:

connectorVersion: z.string().default('2022.47.1'); // Change this version

Custom Lambda Memory/Timeout

Edit src/types.ts:

lambdaMemory: z.number().min(512).max(10240).default(3008); // Change memory in MB

lambdaTimeout: z.number().min(60).max(900).default(900); // Change timeout in seconds

Custom Lifecycle Policies

Edit src/types.ts:

spillBucketLifecycleDays: z.number().min(1).default(1); // Days before spill data is deleted

resultsBucketLifecycleDays: z.number().min(1).default(30); // Days before query results are deleted
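Taken together, the tunables above might live in a single zod schema along these lines (a consolidated sketch; the real src/types.ts may organize them differently):

```typescript
import { z } from "zod";

// Hypothetical consolidated view of the configurable values shown above.
const connectorConfigSchema = z.object({
  connectorVersion: z.string().default("2022.47.1"),
  lambdaMemory: z.number().min(512).max(10240).default(3008), // MB
  lambdaTimeout: z.number().min(60).max(900).default(900),    // seconds
  spillBucketLifecycleDays: z.number().min(1).default(1),
  resultsBucketLifecycleDays: z.number().min(1).default(30),
});

// Parsing an empty object fills in every default;
// out-of-range overrides fail validation at startup.
console.log(connectorConfigSchema.parse({}));
```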


Development

Project Structure

redash-dynamodb-connector/
├── src/
│   ├── index.ts                     # CLI entry point
│   ├── types.ts                     # Type definitions and schemas
│   ├── aws/
│   │   ├── credentials.ts           # AWS credential provider
│   │   └── dynamodb.ts              # DynamoDB operations & schema inference
│   ├── utils/
│   │   └── shared-arg.ts            # Shared argument utilities
│   └── cdk/
│       ├── app.ts                   # CDK app factory
│       ├── stacks/
│       │   └── connector-stack.ts   # Main stack
│       └── constructs/
│           └── athena-connector.ts  # Connector construct
├── package.json
├── cdk.json                         # CDK configuration
├── tsconfig.json
└── README.md

Running Locally


# Install dependencies
pnpm install

# Run linter
pnpm lint

# Type check
pnpm tsc

# Deploy a table
pnpm exec tsx src/index.ts add --table my-table --glue-database my-db

Testing

The tool validates inputs but doesn't include automated tests yet. Testing is done through actual deployments.


Cost Considerations

Per Query Costs

  • Athena: $5 per TB scanned

  • Lambda: $0.20 per 1M requests + compute time

  • DynamoDB: Read capacity units consumed

  • S3: Storage + requests (minimal)

Monthly Costs (Estimate)

For 1000 queries/month on a 1GB table:

  • Athena: ~$5 (1 GB scanned × 1000 queries ≈ 1 TB, at $5/TB)

  • Lambda: ~$0.20

  • DynamoDB: Depends on your table's RCU configuration

  • S3: < $1

Total: ~$5-10/month for typical usage
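The Athena arithmetic can be double-checked with a quick calculation (a sketch using the $5/TB rate above, assuming each query scans the full table; not an official pricing tool):

```typescript
// Rough monthly Athena scan cost: (table size × queries) converted to TB, × $5/TB.
function estimateAthenaMonthlyCost(tableSizeGB: number, queriesPerMonth: number): number {
  const scannedTB = (tableSizeGB * queriesPerMonth) / 1024;
  const pricePerTB = 5;
  return scannedTB * pricePerTB;
}

// 1 GB table, 1000 queries/month → ~0.98 TB scanned
console.log(estimateAthenaMonthlyCost(1, 1000).toFixed(2)); // 4.88
```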

Cost Optimization Tips

  • Use LIMIT clauses to reduce data scanned

  • Query specific columns instead of SELECT *

  • Add DynamoDB indexes for frequently queried patterns

  • Set appropriate lifecycle policies on S3 buckets

  • Use Athena workgroups to track and limit costs


Limitations

  1. No Write Operations — Tool only supports read queries (SELECT)

  2. Scan Performance — Full table scans can be slow on large tables

  3. Type Inference — Based on sampled data; may not capture all edge cases

  4. No Updates — Cannot modify existing connector configurations; must delete and recreate

  5. Lambda Concurrency — Limited by AWS Lambda concurrency limits

  6. Result Size — Large result sets require S3 spill (automatically handled)


FAQ

Q: Can I query multiple DynamoDB tables in a single query? A: Yes! Use JOIN clauses with the fully-qualified table name for each table.

Q: Do I need to recreate the connector when DynamoDB schema changes? A: Yes, re-run the add command to re-infer schema and update the Glue table.

Q: Can I use this with DynamoDB Global Tables? A: Yes, just point to the table in your region. Each region would need its own connector.

Q: How do I delete a connector? A: Delete the CloudFormation stack: aws cloudformation delete-stack --stack-name unit-dev-dynamodb-connector

Q: Can I query DynamoDB streams? A: No, this tool only queries the table itself, not streams.

Q: Does this work with DynamoDB DAX? A: No, the Lambda connector queries DynamoDB directly, not through DAX.

Q: Can I use this with LocalStack or DynamoDB Local? A: Not directly, as it relies on AWS Serverless Application Repository for the connector.



License

ISC


Contributing

Issues and pull requests welcome! Please ensure code passes linting before submitting.


Support

For issues or questions:

  1. Check the Troubleshooting section above

  2. Review AWS CloudFormation stack events for deployment errors

  3. Check Lambda function logs in CloudWatch

  4. Open an issue in this repository
