Deploying Grafana On AWS Using ECS
Practical Example: Pull the official Grafana image from Docker Hub, run it on ECS Fargate, and get a fully working monitoring dashboard accessible from the browser — with the complete networking, IAM, and service setup explained from scratch.
📌 What Are We Building?
                      ┌─────────────────────────────────────────────────┐
                      │ AWS Cloud (your account)                        │
                      │                                                 │
You (browser)         │   ┌──────────────┐                              │
http://<IP>:3000 ─────┼──►│ Security     │                              │
                      │   │ Group :3000  │                              │
                      │   └──────┬───────┘                              │
                      │          │                                      │
                      │   ┌──────▼───────────────────────────────┐      │
                      │   │ ECS Cluster (Fargate)                │      │
                      │   │                                      │      │
                      │   │ ┌──────────────────────────────────┐ │      │
                      │   │ │ ECS Task (awsvpc mode)           │ │      │
                      │   │ │                                  │ │      │
                      │   │ │  ┌───────────────────────────┐   │ │      │
                      │   │ │  │ grafana/grafana:latest    │   │ │      │
                      │   │ │  │ (from Docker Hub)         │   │ │      │
                      │   │ │  │ Port 3000                 │   │ │      │
                      │   │ │  └───────────────────────────┘   │ │      │
                      │   │ │                                  │ │      │
                      │   │ │ Task Execution Role              │ │      │
                      │   │ │   → pulls image                  │ │      │
                      │   │ │   → writes CloudWatch logs       │ │      │
                      │   │ └──────────────────────────────────┘ │      │
                      │   └──────────────────────────────────────┘      │
                      │                                                 │
                      │ VPC → Public Subnet → Internet Gateway          │
                      └─────────────────────────────────────────────────┘
Two approaches covered in this guide:
- Path A: Pull a public image directly from Docker Hub (e.g., grafana/grafana). Simplest option, no ECR needed.
- Path B: Push your own image to ECR (private registry), then pull it from ECR. Use this for custom or private backends.
🧱 Core ECS Concepts
Before diving into steps, here's what each piece does:
| Component | What It Is | Analogy |
| --- | --- | --- |
| ECS Cluster | Logical grouping of tasks and services | A "workspace" for your containers |
| Task Definition | Blueprint for your container — image URI, CPU, memory, ports, env vars, IAM roles | A docker-compose.yml for AWS |
| Task | A running instance of a task definition | A running docker run command |
| Service | Manages desired count of tasks, restarts on failure, integrates with load balancers | A process supervisor |
| Fargate | Serverless compute engine — no EC2 instances to manage | AWS runs the container for you |
| Task Execution Role | IAM role that lets the ECS agent pull images and write logs | Permissions for the infrastructure |
| Task Role | IAM role that your application code uses to call AWS APIs | Permissions for your app |
📋 Prerequisites
- An AWS account with admin or sufficient IAM permissions
- AWS CLI v2 installed and configured (aws configure); the aws logs tail command used later is v2-only
- Docker installed locally (only needed for Path B, pushing to ECR)
- A VPC with at least one public subnet and an Internet Gateway (the default VPC works)
Path A — Pull a Public Docker Hub Image (Grafana)
This is the simplest approach. ECS Fargate can pull public images from Docker Hub directly — you just specify the image name in the task definition. No ECR repository needed.
Step 1 — Create a VPC (or Use the Default VPC)
Every AWS account comes with a default VPC in each region. For a quick setup, the default VPC works fine. If you want to create a custom VPC:
- VPC Console → Create VPC
- Choose VPC and more (the wizard creates the subnets, route table, and IGW automatically)
- Configure:
  - Name: grafana-vpc
  - IPv4 CIDR: 10.0.0.0/16
  - Number of AZs: 2
  - Public subnets: 2
  - Private subnets: 0 (for this simple setup)
  - NAT Gateway: None
  - VPC Endpoints: None
For Fargate tasks to pull images from the internet, one of the following must be true:
- Public subnet: the task runs in a public subnet, the subnet's route table has a route 0.0.0.0/0 → Internet Gateway, and Auto-assign public IP is set to ENABLED when launching the task
- Private subnet: the task runs in a private subnet whose route table sends 0.0.0.0/0 to a NAT Gateway
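You can check the public-subnet condition from the CLI instead of the console. The following is a minimal bash sketch. It assumes AWS CLI v2 with credentials. The function names route_table_for_subnet and is_public_subnet are hypothetical helpers, and the IDs you pass in are your own. A subnet without an explicit route table association falls back to the VPC's main route table, and the sketch handles that case.

```shell
# Sketch (bash): does a subnet's default route (0.0.0.0/0) point at an Internet Gateway?
# Assumes AWS CLI v2 with credentials; helper names are hypothetical.

# Print the route table that applies to a subnet: its explicit association
# if one exists, otherwise the VPC's main route table.
route_table_for_subnet() {
  local subnet_id="$1" vpc_id="$2" rt_id
  rt_id=$(aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query 'RouteTables[0].RouteTableId' --output text)
  if [[ -z "$rt_id" || "$rt_id" == "None" ]]; then
    rt_id=$(aws ec2 describe-route-tables \
      --filters "Name=vpc-id,Values=${vpc_id}" "Name=association.main,Values=true" \
      --query 'RouteTables[0].RouteTableId' --output text)
  fi
  echo "$rt_id"
}

# Exit 0 only if the subnet's 0.0.0.0/0 route targets an igw-* gateway.
is_public_subnet() {
  local rt_id target
  rt_id=$(route_table_for_subnet "$1" "$2")
  target=$(aws ec2 describe-route-tables --route-table-ids "$rt_id" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].GatewayId" \
    --output text)
  [[ "$target" == igw-* ]]
}
```

Usage: `is_public_subnet subnet-0abc123 vpc-0abc123 && echo public || echo "no IGW default route"`. A subnet that routes through a NAT Gateway reports "no IGW default route", which is expected for the private-subnet option.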
Step 2 — Create a Security Group
The security group controls what traffic can reach your Grafana container.
- EC2 Console → Security Groups → Create security group
- Configure:
| Field | Value |
| --- | --- |
| Name | grafana-sg |
| Description | Allow Grafana UI access |
| VPC | Select your VPC |
- Inbound Rules:
| Type | Protocol | Port | Source | Description |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 3000 | 0.0.0.0/0 | Grafana web UI |
- Outbound Rules: Leave default (all traffic allowed outbound — required for image pulls and internet access)
AWS CLI:
# Create the security group
SG_ID=$(aws ec2 create-security-group \
--group-name grafana-sg \
--description "Allow Grafana UI on port 3000" \
--vpc-id vpc-0abc123 \
--query 'GroupId' --output text)
# Add inbound rule for port 3000
aws ec2 authorize-security-group-ingress \
--group-id $SG_ID \
--protocol tcp \
--port 3000 \
--cidr 0.0.0.0/0
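Opening port 3000 to 0.0.0.0/0 exposes the Grafana login page to the whole internet. If only you need access, you can allow just your current public IPv4 address instead. This bash sketch assumes curl and AWS CLI v2, and uses AWS's checkip.amazonaws.com lookup. my_ip_cidr and allow_my_ip are hypothetical helper names.

```shell
# Sketch (bash): allow port 3000 only from your current public IPv4 address.
# Assumes curl and AWS CLI v2; SG_ID comes from the create-security-group step.

# Print "<your-ip>/32", or fail if the lookup did not return an IPv4 address.
my_ip_cidr() {
  local ip
  ip=$(curl -fsS https://checkip.amazonaws.com | tr -d '[:space:]')
  [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || { echo "unexpected IP: $ip" >&2; return 1; }
  echo "${ip}/32"
}

# Add an inbound TCP 3000 rule for that single address.
allow_my_ip() {
  local sg_id="$1" cidr
  cidr=$(my_ip_cidr) || return 1
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 3000 --cidr "$cidr"
}
```

Usage: `allow_my_ip "$SG_ID"`. If you already added the open rule, remove it with `aws ec2 revoke-security-group-ingress --group-id "$SG_ID" --protocol tcp --port 3000 --cidr 0.0.0.0/0`.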
Step 3 — Create the ECS Cluster
- ECS Console → Clusters → Create cluster
- Configure:
  - Cluster name: grafana-cluster
  - Infrastructure: Select AWS Fargate (default)
- Click Create.
That's it. The cluster is just a logical container; no servers are provisioned.
AWS CLI:
aws ecs create-cluster --cluster-name grafana-cluster
Step 4 — Create the Task Execution IAM Role
The Task Execution Role allows the ECS agent (not your application) to pull images and write logs. If you've used ECS before, you likely already have ecsTaskExecutionRole.
Check if it exists:
aws iam get-role --role-name ecsTaskExecutionRole
If it doesn't exist, create it:
- IAM Console → Roles → Create role
- Trusted entity: AWS service → Elastic Container Service → Use case: Elastic Container Service Task
- Attach the managed policy: AmazonECSTaskExecutionRolePolicy
- Role name: ecsTaskExecutionRole
This policy grants:
- ecr:GetAuthorizationToken: authenticate to ECR
- ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, ecr:BatchGetImage: pull images from ECR
- logs:CreateLogStream, logs:PutLogEvents: write container logs to CloudWatch (it does not include logs:CreateLogGroup)
AWS CLI:
# Create the trust policy document
cat > trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole"
}]
}
EOF
# Create the role
aws iam create-role \
--role-name ecsTaskExecutionRole \
--assume-role-policy-document file://trust-policy.json
# Attach the managed policy
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
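Step 4 says to check whether the role exists before creating it. This bash sketch combines that check and the creation into one function that is safe to re-run. It assumes AWS CLI v2 and the trust-policy.json written above in the current directory. ensure_execution_role is a hypothetical helper name.

```shell
# Sketch (bash): create the execution role only if it is missing, then attach the policy.
# Assumes AWS CLI v2 and trust-policy.json in the current directory.

ensure_execution_role() {
  local role_name="${1:-ecsTaskExecutionRole}"
  local policy_arn="arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"

  if aws iam get-role --role-name "$role_name" >/dev/null 2>&1; then
    echo "role ${role_name} already exists"
  else
    aws iam create-role \
      --role-name "$role_name" \
      --assume-role-policy-document file://trust-policy.json >/dev/null
    echo "created role ${role_name}"
  fi
  # Re-attaching the same managed policy does not create a duplicate attachment
  aws iam attach-role-policy --role-name "$role_name" --policy-arn "$policy_arn"
}
```

Usage: `ensure_execution_role` (defaults to ecsTaskExecutionRole), or `ensure_execution_role my-other-role`.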
Step 5 — Create the Task Definition
This is the heart of the setup — the blueprint that tells ECS which image to run, how much CPU/memory to allocate, which ports to expose, and what roles to use.
- ECS Console → Task definitions → Create new task definition
- Configure:
| Setting | Value |
| --- | --- |
| Task definition family | grafana-task |
| Launch type | AWS Fargate |
| OS/Architecture | Linux/X86_64 |
| Task size — CPU | 0.5 vCPU (512) |
| Task size — Memory | 1 GB (1024) |
| Task execution role | ecsTaskExecutionRole |
| Task role | None (Grafana doesn't call AWS APIs in this setup; add one if you later use an AWS data source such as CloudWatch) |
- Container definition:
| Setting | Value |
| --- | --- |
| Container name | grafana |
| Image URI | grafana/grafana:latest |
| Essential | Yes |
| Port mappings | Container port: 3000, Protocol: TCP |
- (Optional) Environment variables:
| Key | Value | Purpose |
| --- | --- | --- |
| GF_SECURITY_ADMIN_USER | admin | Default admin username |
| GF_SECURITY_ADMIN_PASSWORD | YourStrongPassword123! | Override default password |
- (Optional) Logging — CloudWatch:
| Setting | Value |
| --- | --- |
| Log driver | awslogs |
| Log group | /ecs/grafana |
| Region | Your region (e.g., ap-northeast-1) |
| Stream prefix | grafana |
- Click Create
Equivalent JSON task definition:
{
"family": "grafana-task",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "grafana",
"image": "grafana/grafana:latest",
"essential": true,
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"environment": [
{ "name": "GF_SECURITY_ADMIN_USER", "value": "admin" },
{ "name": "GF_SECURITY_ADMIN_PASSWORD", "value": "YourStrongPassword123!" }
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/grafana",
"awslogs-region": "ap-northeast-1",
"awslogs-stream-prefix": "grafana",
"awslogs-create-group": "true"
}
}
}
]
}
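Save the JSON above as grafana-task-def.json (replace YOUR_ACCOUNT_ID with your 12-digit account ID and set awslogs-region to your region), then register it via the CLI: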
aws ecs register-task-definition --cli-input-json file://grafana-task-def.json
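Key point: The image field is just grafana/grafana:latest, a public Docker Hub image. ECS pulls it directly from Docker Hub, so no ECR repository is needed for public images.
Log group caveat: "awslogs-create-group": "true" only works if the execution role also has logs:CreateLogGroup, which AmazonECSTaskExecutionRolePolicy does not include. Either grant that permission or create the group beforehand (aws logs create-log-group --log-group-name /ecs/grafana). Otherwise the task fails to start with a ResourceInitializationError.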
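This bash sketch automates the registration step. It assumes AWS CLI v2, a default region that matches awslogs-region, and the JSON saved as grafana-task-def.json. The helper names render_task_def and register_grafana_task and the output file grafana-task-def.rendered.json are hypothetical. The sketch fills in the account ID from aws sts get-caller-identity. It also creates the log group itself, because the managed execution policy cannot.

```shell
# Sketch (bash): substitute the account ID, pre-create the log group, register.
# Assumes AWS CLI v2 and grafana-task-def.json in the current directory.

# Replace every YOUR_ACCOUNT_ID placeholder; refuse anything that isn't 12 digits.
render_task_def() {
  local template="$1" account_id="$2" out="$3"
  [[ "$account_id" =~ ^[0-9]{12}$ ]] || { echo "bad account id: $account_id" >&2; return 1; }
  sed "s/YOUR_ACCOUNT_ID/${account_id}/g" "$template" > "$out"
}

register_grafana_task() {
  local account_id
  account_id=$(aws sts get-caller-identity --query Account --output text)
  render_task_def grafana-task-def.json "$account_id" grafana-task-def.rendered.json || return 1
  # The managed execution policy lacks logs:CreateLogGroup, so create the group here.
  # Errors (e.g. "already exists") are ignored in this sketch.
  aws logs create-log-group --log-group-name /ecs/grafana 2>/dev/null || true
  aws ecs register-task-definition --cli-input-json file://grafana-task-def.rendered.json
}
```

Keeping the placeholder in the checked-in file and rendering a copy lets the same JSON be reused across accounts.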
Step 6 — Create the ECS Service (or Run a Standalone Task)
Option A — Run as a Service (Recommended)
A service ensures Grafana stays running. If the task crashes, ECS automatically starts a new one.
- ECS Console → Clusters → grafana-cluster → Services → Create
- Configure:
| Setting | Value |
| --- | --- |
| Launch type | Fargate |
| Task definition | grafana-task (latest revision) |
| Service name | grafana-service |
| Desired tasks | 1 |
- Networking:
| Setting | Value |
| --- | --- |
| VPC | Your VPC |
| Subnets | Select your public subnet(s) |
| Security group | grafana-sg (the one allowing port 3000) |
| Public IP | Enabled (this is critical!) |
- Click Create
AWS CLI:
aws ecs create-service \
--cluster grafana-cluster \
--service-name grafana-service \
--task-definition grafana-task \
--desired-count 1 \
--launch-type FARGATE \
--network-configuration '{
"awsvpcConfiguration": {
"subnets": ["subnet-0abc123"],
"securityGroups": ["sg-0def456"],
"assignPublicIp": "ENABLED"
}
}'
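Hand-editing subnet and security group IDs inside the JSON string is error-prone, and create-service returns before the task is actually running. This bash sketch builds the network configuration from variables and then blocks with aws ecs wait services-stable. SUBNET_IDS, SG_ID, network_config, and deploy_grafana_service are hypothetical names.

```shell
# Sketch (bash): build the awsvpcConfiguration JSON, create the service, wait for it.
# Assumes AWS CLI v2; SUBNET_IDS is comma-separated, SG_ID a single group ID.

# network_config "subnet-a,subnet-b" "sg-123"  -> JSON on stdout
network_config() {
  local subnets_csv="$1" sg_id="$2" s subnets_json=""
  local IFS=,
  local -a parts
  read -ra parts <<< "$subnets_csv"
  for s in "${parts[@]}"; do
    subnets_json+="${subnets_json:+,}\"${s}\""
  done
  printf '{"awsvpcConfiguration":{"subnets":[%s],"securityGroups":["%s"],"assignPublicIp":"ENABLED"}}\n' \
    "$subnets_json" "$sg_id"
}

deploy_grafana_service() {
  aws ecs create-service \
    --cluster grafana-cluster \
    --service-name grafana-service \
    --task-definition grafana-task \
    --desired-count 1 \
    --launch-type FARGATE \
    --network-configuration "$(network_config "$SUBNET_IDS" "$SG_ID")" >/dev/null || return 1
  # Polls until the service is steady; exits non-zero if the waiter gives up first
  aws ecs wait services-stable --cluster grafana-cluster --services grafana-service
}
```

Usage: `SUBNET_IDS="subnet-0abc123" SG_ID="sg-0def456" deploy_grafana_service`.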
Option B — Run a Standalone Task (Quick Test)
aws ecs run-task \
--cluster grafana-cluster \
--task-definition grafana-task \
--launch-type FARGATE \
--network-configuration '{
"awsvpcConfiguration": {
"subnets": ["subnet-0abc123"],
"securityGroups": ["sg-0def456"],
"assignPublicIp": "ENABLED"
}
}'
Step 7 — Access Grafana
- Go to ECS Console → Clusters → grafana-cluster → Tasks tab
- Click the running task ID
- Under Configuration, find the Public IP (e.g., 3.112.45.67)
- Open your browser and navigate to http://3.112.45.67:3000
- Log in with:
  - Username: admin
  - Password: admin (or whatever you set via GF_SECURITY_ADMIN_PASSWORD)
- Grafana will prompt you to change the password on first login.
You now have Grafana running on ECS Fargate, pulled directly from Docker Hub.
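The public IP changes every time the service replaces the task, so looking it up from the CLI saves repeated trips to the console. This bash sketch goes from the task to its ENI to the public IP, then calls Grafana's /api/health endpoint. It assumes AWS CLI v2 and curl. task_public_ip and grafana_health are hypothetical helper names.

```shell
# Sketch (bash): find the running task's public IP and probe Grafana.
# Assumes AWS CLI v2 and curl; defaults match the names used in this guide.

task_public_ip() {
  local cluster="${1:-grafana-cluster}" service="${2:-grafana-service}" task_arn eni_id
  task_arn=$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --desired-status RUNNING --query 'taskArns[0]' --output text)
  # In awsvpc mode each task has its own ENI; the public IP is attached to that ENI
  eni_id=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" \
    --output text)
  aws ec2 describe-network-interfaces --network-interface-ids "$eni_id" \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
}

# /api/health returns a small JSON document when Grafana is up; -f fails on HTTP errors.
grafana_health() {
  curl -fsS "http://${1}:3000/api/health"
}
```

Usage: `grafana_health "$(task_public_ip)"`.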
Path B — Push Your Own Image to ECR, Then Deploy on ECS
Use this path when you have a custom backend (your own Dockerfile) or want to use a private registry. The workflow is: Build locally → Push to ECR → ECS pulls from ECR.
Step B1 — Create an ECR Repository
- ECR Console → Repositories → Create repository
- Configure:
  - Visibility: Private
  - Repository name: my-backend
AWS CLI:
aws ecr create-repository \
--repository-name my-backend \
--region ap-northeast-1
This gives you a repository URI like: 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend
Step B2 — Build, Tag, and Push Your Docker Image
# 1. Authenticate Docker to ECR
aws ecr get-login-password --region ap-northeast-1 | \
docker login --username AWS --password-stdin \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com
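# 2. Build your Docker image for linux/amd64, the task's default platform (Linux/X86_64);
#    this matters when building on an ARM machine such as Apple Silicon
docker build --platform linux/amd64 -t my-backend .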
# 3. Tag the image for ECR
docker tag my-backend:latest \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest
# 4. Push to ECR
docker push \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest
Verify in the ECR Console that the image appears under your repository.
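The four commands above and the console check can be wrapped into one bash function. This sketch assumes Docker, AWS CLI v2, and the repository from Step B1. ecr_registry, ecr_image_uri, and push_to_ecr are hypothetical helper names. It tags the image with the full ECR URI at build time instead of running a separate docker tag.

```shell
# Sketch (bash): build, push, and verify an image in ECR.
# Assumes Docker, AWS CLI v2, and an existing ECR repository.

# 123456789012 + ap-northeast-1 -> 123456789012.dkr.ecr.ap-northeast-1.amazonaws.com
ecr_registry() { echo "${1}.dkr.ecr.${2}.amazonaws.com"; }

# Full image URI; the tag defaults to "latest".
ecr_image_uri() { echo "$(ecr_registry "$1" "$2")/${3}:${4:-latest}"; }

push_to_ecr() {
  local account_id="$1" region="$2" repo="$3" tag="${4:-latest}" uri
  uri=$(ecr_image_uri "$account_id" "$region" "$repo" "$tag")
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$(ecr_registry "$account_id" "$region")" || return 1
  # linux/amd64 matches the task definition's default X86_64 platform
  docker build --platform linux/amd64 -t "$uri" . || return 1
  docker push "$uri" || return 1
  # CLI equivalent of checking the ECR Console: print the pushed image's digest
  aws ecr describe-images --region "$region" --repository-name "$repo" \
    --image-ids "imageTag=${tag}" --query 'imageDetails[0].imageDigest' --output text
}
```

Usage: `push_to_ecr 123456789012 ap-northeast-1 my-backend latest` prints a sha256:... digest once the push is visible in ECR.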
Step B3 — Create the Task Definition (Using ECR Image)
The only difference from Path A is the image field — it now points to your ECR URI instead of Docker Hub:
{
"family": "my-backend-task",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"containerDefinitions": [
{
"name": "my-backend",
"image": "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest",
"essential": true,
"portMappings": [
{
"containerPort": 8080,
"protocol": "tcp"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/my-backend",
"awslogs-region": "ap-northeast-1",
"awslogs-stream-prefix": "backend",
"awslogs-create-group": "true"
}
}
}
]
}
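The ecsTaskExecutionRole can already pull from ECR through AmazonECSTaskExecutionRolePolicy, so ECR needs no extra configuration. The one gap is logging: "awslogs-create-group" requires logs:CreateLogGroup, which that policy lacks, so create /ecs/my-backend beforehand or grant the permission.
Then set up the VPC, security group, cluster, and service as in Path A (Steps 1–3 and 6), with two changes:
- The security group must allow port 8080 (this container's port) instead of 3000.
- The service uses the my-backend-task task definition.
The backend is then reachable at http://<public-IP>:8080.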
🔄 Updating Your Image (CI/CD Cycle)
When you push a new image to ECR (or a newer grafana/grafana:latest is published to Docker Hub), tasks that are already running keep the image they started with; ECS does not pick up the change automatically. To deploy the new image:
# Force a new deployment — ECS will pull the latest image
aws ecs update-service \
--cluster grafana-cluster \
--service grafana-service \
--force-new-deployment
This triggers a rolling update: ECS starts new tasks with the latest image, then drains the old tasks.
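In a CI/CD job it is more useful to wait until the rollout finishes and report the result than to return immediately. This bash sketch assumes AWS CLI v2. redeploy_service is a hypothetical helper name. The result is read from the rolloutState of the service's PRIMARY deployment.

```shell
# Sketch (bash): force a redeploy, wait for it to settle, report the rollout state.
# Assumes AWS CLI v2; defaults match the names used in this guide.

redeploy_service() {
  local cluster="${1:-grafana-cluster}" service="${2:-grafana-service}"
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --force-new-deployment >/dev/null || return 1
  # Polls until the service is stable; exits non-zero if the waiter gives up first
  aws ecs wait services-stable --cluster "$cluster" --services "$service" || return 1
  # rolloutState of the PRIMARY (newest) deployment, e.g. COMPLETED
  aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].deployments[?status==`PRIMARY`].rolloutState | [0]' --output text
}
```

A pipeline step can then fail the build when `redeploy_service` exits non-zero or prints anything other than COMPLETED.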
🔒 Production Hardening
The setup above is great for learning and testing. For production, make these improvements:
1. Move Tasks to Private Subnets + Add ALB
Instead of exposing the task's public IP directly:
Internet → ALB (public subnet) → ECS Task (private subnet) → NAT GW (outbound only)
This way Grafana is only reachable through the load balancer, and the container itself has no public IP.
2. Use an Application Load Balancer (ALB)
- Create an ALB in your public subnets
- Create a target group (type: ip, port 3000, health check path /api/health)
- Attach the target group to your ECS service
- Restrict grafana-sg so port 3000 accepts traffic only from the ALB's security group
- Configure an HTTPS listener (443) with an ACM certificate
- Redirect HTTP (80) → HTTPS (443)
Now you access Grafana via https://grafana.yourdomain.com instead of a raw IP.
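The ALB steps above map onto a few elbv2 calls. This bash sketch assumes AWS CLI v2, an existing ALB, and an issued ACM certificate. The target group name grafana-tg and the helper names are hypothetical, and all IDs and ARNs are your own values.

```shell
# Sketch (bash): target group, HTTPS listener, and HTTP-to-HTTPS redirect.
# Assumes AWS CLI v2, an existing ALB, and an ACM certificate ARN.

# Fargate tasks in awsvpc mode register by IP address, so the target type is "ip".
create_grafana_target_group() {
  local vpc_id="$1"
  aws elbv2 create-target-group \
    --name grafana-tg \
    --protocol HTTP --port 3000 \
    --vpc-id "$vpc_id" \
    --target-type ip \
    --health-check-path /api/health \
    --query 'TargetGroups[0].TargetGroupArn' --output text
}

# HTTPS (443) listener that forwards to the target group using the ACM certificate.
create_https_listener() {
  local alb_arn="$1" cert_arn="$2" tg_arn="$3"
  aws elbv2 create-listener \
    --load-balancer-arn "$alb_arn" \
    --protocol HTTPS --port 443 \
    --certificates "CertificateArn=${cert_arn}" \
    --default-actions "Type=forward,TargetGroupArn=${tg_arn}"
}

# HTTP (80) listener whose only job is a permanent redirect to HTTPS.
create_http_redirect_listener() {
  local alb_arn="$1"
  aws elbv2 create-listener \
    --load-balancer-arn "$alb_arn" \
    --protocol HTTP --port 80 \
    --default-actions 'Type=redirect,RedirectConfig={Protocol=HTTPS,Port=443,StatusCode=HTTP_301}'
}
```

To attach the target group, pass `--load-balancers "targetGroupArn=${TG_ARN},containerName=grafana,containerPort=3000"` to `aws ecs create-service`.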
3. Use Secrets Manager for Passwords
Never hardcode GF_SECURITY_ADMIN_PASSWORD in the task definition. Instead:
# Store the secret
aws secretsmanager create-secret \
--name grafana/admin-password \
--secret-string "YourStrongPassword123!"
Reference it in the container definition, in place of the plain-text environment entry:
"secrets": [
{
"name": "GF_SECURITY_ADMIN_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/admin-password-AbCdEf"
}
]
The task execution role needs permission to read the secret, for example via an inline policy:
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/*"
}]
}
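Secrets Manager appends a random suffix to secret ARNs (the -AbCdEf above). Rather than typing the ARN by hand, this bash sketch looks it up and attaches a read policy to the execution role. It assumes AWS CLI v2. secret_arn, write_secrets_policy, grant_secret_access, and the policy name grafana-secrets-read are hypothetical.

```shell
# Sketch (bash): fetch the secret's full ARN and grant the execution role read access.
# Assumes AWS CLI v2; region and account ID are passed in explicitly.

# Print the real ARN (with suffix) for use in the task definition's "valueFrom".
secret_arn() {
  aws secretsmanager describe-secret --secret-id "$1" --query ARN --output text
}

# Write a policy allowing GetSecretValue on every secret under grafana/.
write_secrets_policy() {
  local region="$1" account_id="$2" out="$3"
  cat > "$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["secretsmanager:GetSecretValue"],
    "Resource": "arn:aws:secretsmanager:${region}:${account_id}:secret:grafana/*"
  }]
}
EOF
}

grant_secret_access() {
  local region="$1" account_id="$2"
  write_secrets_policy "$region" "$account_id" secrets-policy.json
  aws iam put-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-name grafana-secrets-read \
    --policy-document file://secrets-policy.json
}
```

Usage: `grant_secret_access ap-northeast-1 123456789012`, then paste the output of `secret_arn grafana/admin-password` into "valueFrom".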
4. Persistent Storage with EFS
By default, Grafana data (dashboards, data sources) lives on the task's ephemeral storage and is lost whenever the task stops or is replaced. Mount an EFS volume, declared at the task definition level:
"volumes": [{
"name": "grafana-data",
"efsVolumeConfiguration": {
"fileSystemId": "fs-0abc123",
"rootDirectory": "/grafana"
}
}]
And in the container definition:
"mountPoints": [{
"sourceVolume": "grafana-data",
"containerPath": "/var/lib/grafana"
}]
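Mounting EFS from Fargate also depends on networking that the JSON does not show. The mount targets' security group must accept NFS (TCP 2049) from the task's security group, and every Availability Zone the task can run in needs a mount target. This bash sketch covers both checks. It assumes AWS CLI v2. allow_nfs_from_tasks and missing_efs_azs are hypothetical names, and the IDs are placeholders.

```shell
# Sketch (bash): EFS networking prerequisites for the grafana-data volume.
# Assumes AWS CLI v2; security group, file system, and subnet IDs are your own.

# Allow NFS (TCP 2049) into the EFS mount targets' security group from the task SG.
allow_nfs_from_tasks() {
  local efs_sg="$1" task_sg="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" --protocol tcp --port 2049 --source-group "$task_sg"
}

# Print each AZ used by the task subnets that has no EFS mount target.
missing_efs_azs() {
  local fs_id="$1"; shift
  local task_azs efs_azs az
  task_azs=$(aws ec2 describe-subnets --subnet-ids "$@" \
    --query 'Subnets[].AvailabilityZone' --output text)
  # Pad with spaces so each AZ is matched as a whole word
  efs_azs=" $(aws efs describe-mount-targets --file-system-id "$fs_id" \
    --query 'MountTargets[].AvailabilityZoneName' --output text | tr '\t\n' '  ') "
  for az in $task_azs; do
    [[ "$efs_azs" == *" $az "* ]] || echo "$az"
  done
}
```

Usage: `missing_efs_azs fs-0abc123 subnet-0abc123 subnet-0def456` prints nothing when every task AZ has a mount target.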
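5. Auto Scaling
Scaling beyond one task only makes sense once Grafana uses a shared MySQL or PostgreSQL database. Running multiple tasks against the default SQLite file on EFS is not a supported setup.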
# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/grafana-cluster/grafana-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 1 \
--max-capacity 3
# Scale based on CPU
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/grafana-cluster/grafana-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
}
}'
🔍 Troubleshooting
| Problem | Likely Cause | Fix |
| --- | --- | --- |
| Task stuck in PROVISIONING | Subnet can't reach the internet | Verify the public subnet has an IGW route and public IP is enabled |
| CannotPullContainerError | No internet access OR wrong image name | Check the subnet route table; verify the image name is exactly right (case-sensitive) |
| CannotPullContainerError for ECR image | Task execution role missing ECR permissions | Attach AmazonECSTaskExecutionRolePolicy to ecsTaskExecutionRole |
| Task starts then immediately stops | Container crashes | Check CloudWatch logs at /ecs/grafana |
| Can't access Grafana in browser | Security group doesn't allow port 3000 | Add inbound TCP rule for port 3000 |
| ResourceInitializationError | Missing execution role or permissions (including logs:CreateLogGroup when awslogs-create-group is true) | Verify executionRoleArn is set in the task definition and the role has the needed permissions |
| toomanyrequests from Docker Hub | Docker Hub rate limit hit | Use ECR Public image instead: public.ecr.aws/grafana/grafana:latest |
| Task running but no public IP | Public IP not assigned | Set assignPublicIp: ENABLED in network configuration |
| ECS service keeps restarting tasks | Health check failing | Check container health, ensure the app starts within the health check grace period |
Checking Container Logs
# Find the task ID
aws ecs list-tasks --cluster grafana-cluster --service-name grafana-service
# Get task details (ENI ID, private IP, stoppedReason; the public IP is on the ENI, not in this output)
aws ecs describe-tasks --cluster grafana-cluster --tasks <TASK_ID>
# View logs in CloudWatch
aws logs tail /ecs/grafana --follow
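When a task stops before it writes any logs, the stop reason recorded on the task usually explains more than CloudWatch does. This is a bash sketch that assumes AWS CLI v2. last_stop_reason is a hypothetical helper. ECS keeps stopped tasks visible only for a limited time, so run it soon after the failure.

```shell
# Sketch (bash): show why a stopped task stopped, plus its container exit code.
# Assumes AWS CLI v2; inspects the first stopped task that list-tasks returns.

last_stop_reason() {
  local cluster="${1:-grafana-cluster}" task_arn
  task_arn=$(aws ecs list-tasks --cluster "$cluster" --desired-status STOPPED \
    --query 'taskArns[0]' --output text)
  if [[ -z "$task_arn" || "$task_arn" == "None" ]]; then
    echo "no recently stopped tasks"
    return 0
  fi
  # stoppedReason carries messages such as CannotPullContainerError or
  # ResourceInitializationError; the essential container's exit code follows it
  aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query 'tasks[0].[stoppedReason, containers[0].exitCode]' --output text
}
```

Match the printed reason against the Troubleshooting table above to pick the fix.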
📊 Docker Hub vs ECR — When to Use Which
| Scenario | Use Docker Hub Directly | Use ECR (Private) |
| --- | --- | --- |
| Public/official images (Grafana, Nginx, Redis) | ✅ Simplest — just use the image name | ❌ Unnecessary overhead |
| Custom application images | ❌ Requires Docker Hub account + push | ✅ Integrated with AWS IAM, no rate limits |
| Production workloads | ⚠️ Subject to Docker Hub rate limits | ✅ No pull rate limits, faster pulls within AWS |
| Air-gapped / private environments | ❌ Requires internet access | ✅ Works with VPC endpoints (no internet needed) |
| CI/CD pipelines | ⚠️ Needs Docker Hub credentials | ✅ aws ecr get-login-password works natively |
Pro tip: For public images in production, use ECR Public instead of Docker Hub to avoid rate limiting: public.ecr.aws/grafana/grafana:latest
🏗️ Terraform Example (Complete)
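# ─── Provider and Inputs ───
provider "aws" {
region = var.region
}
variable "region" {
type = string
default = "ap-northeast-1"
}
variable "vpc_id" {
type = string
}
variable "public_subnet_ids" {
type = list(string)
}
# ─── Cluster ───
resource "aws_ecs_cluster" "grafana" {
name = "grafana-cluster"
}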
# ─── CloudWatch Log Group ───
resource "aws_cloudwatch_log_group" "grafana" {
name = "/ecs/grafana"
retention_in_days = 7
}
# ─── Task Definition ───
resource "aws_ecs_task_definition" "grafana" {
family = "grafana-task"
network_mode = "awsvpc"
requires_compatibilities = ["FARGATE"]
cpu = "512"
memory = "1024"
execution_role_arn = aws_iam_role.ecs_task_execution.arn
container_definitions = jsonencode([{
name = "grafana"
image = "grafana/grafana:latest"
essential = true
portMappings = [{
containerPort = 3000
protocol = "tcp"
}]
environment = [
{ name = "GF_SECURITY_ADMIN_USER", value = "admin" }
]
logConfiguration = {
logDriver = "awslogs"
options = {
"awslogs-group" = aws_cloudwatch_log_group.grafana.name
"awslogs-region" = var.region
"awslogs-stream-prefix" = "grafana"
}
}
}])
}
# ─── Security Group ───
resource "aws_security_group" "grafana" {
name = "grafana-sg"
description = "Allow Grafana UI access"
vpc_id = var.vpc_id
ingress {
from_port = 3000
to_port = 3000
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
# ─── ECS Service ───
resource "aws_ecs_service" "grafana" {
name = "grafana-service"
cluster = aws_ecs_cluster.grafana.id
task_definition = aws_ecs_task_definition.grafana.arn
desired_count = 1
launch_type = "FARGATE"
network_configuration {
subnets = var.public_subnet_ids
security_groups = [aws_security_group.grafana.id]
assign_public_ip = true
}
}
# ─── IAM: Task Execution Role ───
resource "aws_iam_role" "ecs_task_execution" {
name = "ecsTaskExecutionRole" # Fails if this role already exists (e.g. from Step 4): import it or pick another name
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ecs-tasks.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
role = aws_iam_role.ecs_task_execution.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
Last updated: March 2026