Deploying Grafana On AWS Using ECS

Practical Example: Pull the official Grafana image from Docker Hub, run it on ECS Fargate, and get a fully working monitoring dashboard accessible from the browser — with the complete networking, IAM, and service setup explained from scratch.

📌 What Are We Building?

                    ┌─────────────────────────────────────────────────┐
                    │               AWS Cloud (your account)          │
                    │                                                 │
 You (browser)      │    ┌──────────────┐                             │
 http://<IP>:3000 ──┼───►│  Security     │                             │
                    │    │  Group :3000  │                             │
                    │    └──────┬───────┘                             │
                    │           │                                     │
                    │    ┌──────▼───────────────────────────────┐     │
                    │    │         ECS Cluster (Fargate)         │     │
                    │    │                                       │     │
                    │    │  ┌─────────────────────────────────┐ │     │
                    │    │  │  ECS Task (awsvpc mode)          │ │     │
                    │    │  │                                   │ │     │
                    │    │  │  ┌───────────────────────────┐   │ │     │
                    │    │  │  │  grafana/grafana:latest    │   │ │     │
                    │    │  │  │  (from Docker Hub)         │   │ │     │
                    │    │  │  │  Port 3000                 │   │ │     │
                    │    │  │  └───────────────────────────┘   │ │     │
                    │    │  │                                   │ │     │
                    │    │  │  Task Execution Role              │ │     │
                    │    │  │  → pulls image                    │ │     │
                    │    │  │  → writes CloudWatch logs         │ │     │
                    │    │  └─────────────────────────────────┘ │     │
                    │    └──────────────────────────────────────┘     │
                    │                                                 │
                    │    VPC → Public Subnet → Internet Gateway       │
                    └─────────────────────────────────────────────────┘

Two approaches covered in this guide:

Path A — Pull a public image directly from Docker Hub (e.g., grafana/grafana) — simplest, no ECR needed
Path B — Push your own image to ECR (private registry), then pull from ECR — for custom/private backends

🧱 Core ECS Concepts

Before diving into steps, here's what each piece does:

Component	What It Is	Analogy
ECS Cluster	Logical grouping of tasks and services	A "workspace" for your containers
Task Definition	Blueprint for your container — image URI, CPU, memory, ports, env vars, IAM roles	A `docker-compose.yml` for AWS
Task	A running instance of a task definition	A running `docker run` command
Service	Manages desired count of tasks, restarts on failure, integrates with load balancers	A process supervisor
Fargate	Serverless compute engine — no EC2 instances to manage	AWS runs the container for you
Task Execution Role	IAM role that lets the ECS agent pull images and write logs	Permissions for the infrastructure
Task Role	IAM role that your application code uses to call AWS APIs	Permissions for your app

📋 Prerequisites

An AWS account with admin or sufficient IAM permissions
AWS CLI installed and configured (aws configure)
Docker installed locally (only needed for Path B — pushing to ECR)
A VPC with at least one public subnet and an Internet Gateway (default VPC works)

Path A — Pull a Public Docker Hub Image (Grafana)

This is the simplest approach. ECS Fargate can pull public images from Docker Hub directly — you just specify the image name in the task definition. No ECR repository needed.

Step 1 — Create a VPC (or Use the Default VPC)

Every AWS account comes with a default VPC in each region. For a quick setup, the default VPC works fine. If you want to create a custom VPC:

VPC Console → Create VPC
Choose VPC and more (wizard creates subnets, route table, IGW automatically)
Configure:
- Name: grafana-vpc
- IPv4 CIDR: 10.0.0.0/16
- Number of AZs: 2
- Public subnets: 2
- Private subnets: 0 (for this simple setup)
- NAT Gateway: None
- VPC Endpoints: None

For a Fargate task to pull images from the internet, one of the following must be true:

- The task runs in a public subnet whose route table has a route 0.0.0.0/0 → Internet Gateway, and Auto-assign public IP is set to ENABLED when launching the task
- The task runs in a private subnet whose route table has a route 0.0.0.0/0 → NAT Gateway
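The routing requirement above can be checked from the CLI before launching anything. The following is a minimal sketch, not a definitive check: the function names are hypothetical, and it assumes the subnet has an explicit route table association (subnets that fall back to the VPC's main route table are reported as having no internet route).

```shell
# Classify where a default route (0.0.0.0/0) points, using the ID prefixes
# AWS assigns: igw-* for Internet Gateways, nat-* for NAT Gateways.
classify_default_route() {
  case "$1" in
    *igw-*) echo "public" ;;
    *nat-*) echo "private-with-nat" ;;
    *)      echo "no-internet-route" ;;
  esac
}

# Print the GatewayId and NatGatewayId of the default route in the route
# table explicitly associated with the given subnet. Missing values are
# printed as "None" by the CLI's text output.
default_route_target() {
  aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=$1" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'] | [0].[GatewayId, NatGatewayId]" \
    --output text
}
```

Usage, with a placeholder subnet ID: `classify_default_route "$(default_route_target subnet-0abc123)"`. A result of `public` still requires the task to be launched with a public IP.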

Step 2 — Create a Security Group

The security group controls what traffic can reach your Grafana container.

EC2 Console → Security Groups → Create security group
Configure:

Field	Value
Name	`grafana-sg`
Description	Allow Grafana UI access
VPC	Select your VPC

Inbound Rules:

Type	Protocol	Port	Source	Description
Custom TCP	TCP	`3000`	`0.0.0.0/0`	Grafana web UI

Outbound Rules: Leave default (all traffic allowed outbound — required for image pulls and internet access)

AWS CLI:

# Create the security group (vpc-0abc123 is a placeholder; use your VPC ID)
SG_ID=$(aws ec2 create-security-group \
  --group-name grafana-sg \
  --description "Allow Grafana UI on port 3000" \
  --vpc-id vpc-0abc123 \
  --query 'GroupId' --output text)

# Add inbound rule for port 3000 (open to the whole internet; narrow the CIDR if you can)
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp \
  --port 3000 \
  --cidr 0.0.0.0/0

Step 3 — Create the ECS Cluster

ECS Console → Clusters → Create cluster
Configure:
- Cluster name: grafana-cluster
- Infrastructure: Select AWS Fargate (default)
Click Create

That's it. The cluster is just a logical grouping; no servers are provisioned.

AWS CLI:

aws ecs create-cluster --cluster-name grafana-cluster

Step 4 — Create the Task Execution IAM Role

The Task Execution Role allows the ECS agent (not your application) to pull images and write logs. If you've used ECS before, you likely already have ecsTaskExecutionRole.

Check if it exists:

aws iam get-role --role-name ecsTaskExecutionRole

If it doesn't exist, create it:

IAM Console → Roles → Create role
Trusted entity: AWS service → Elastic Container Service → Use case: Elastic Container Service Task
Attach the managed policy: AmazonECSTaskExecutionRolePolicy
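Role name: ecsTaskExecutionRole

This policy grants:

ecr:GetAuthorizationToken (authenticate to ECR)
ecr:BatchCheckLayerAvailability, ecr:BatchGetImage, ecr:GetDownloadUrlForLayer (pull images from ECR)
logs:CreateLogStream, logs:PutLogEvents (write container logs to CloudWatch)

It does not grant logs:CreateLogGroup, so ECS cannot create the CloudWatch log group for you unless you add that permission yourself.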

AWS CLI:

# Create the trust policy document
cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ecs-tasks.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role
aws iam create-role \
  --role-name ecsTaskExecutionRole \
  --assume-role-policy-document file://trust-policy.json

# Attach the managed policy
aws iam attach-role-policy \
  --role-name ecsTaskExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
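After attaching the policy, it can be worth confirming that the attachment actually landed on the role. A small sketch, assuming a hypothetical helper name `role_has_policy`; it relies only on `aws iam list-attached-role-policies`:

```shell
# Succeed (exit 0) if the given managed policy ARN is attached to the role.
role_has_policy() {
  local role="$1" policy_arn="$2"
  # Text output separates the ARNs with tabs; put one ARN per line and
  # look for an exact, whole-line match.
  aws iam list-attached-role-policies --role-name "$role" \
    --query 'AttachedPolicies[].PolicyArn' --output text \
    | tr '\t' '\n' | grep -qxF "$policy_arn"
}
```

Usage: `role_has_policy ecsTaskExecutionRole arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy && echo "attached"`.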

Step 5 — Create the Task Definition

This is the heart of the setup — the blueprint that tells ECS which image to run, how much CPU/memory to allocate, which ports to expose, and what roles to use.

ECS Console → Task definitions → Create new task definition
Configure:

Setting	Value
Task definition family	`grafana-task`
Launch type	AWS Fargate
OS/Architecture	Linux/X86_64
Task size — CPU	`0.5 vCPU` (512)
Task size — Memory	`1 GB` (1024)
Task execution role	`ecsTaskExecutionRole`
Task role	None (Grafana doesn't need to call AWS APIs)

Container definition:

Setting	Value
Container name	`grafana`
Image URI	`grafana/grafana:latest`
Essential	Yes
Port mappings	Container port: `3000`, Protocol: TCP

(Optional) Environment variables:

Key	Value	Purpose
`GF_SECURITY_ADMIN_USER`	`admin`	Default admin username
`GF_SECURITY_ADMIN_PASSWORD`	`YourStrongPassword123!`	Override default password

(Optional) Logging — CloudWatch:

Setting	Value
Log driver	`awslogs`
Log group	`/ecs/grafana`
Region	Your region (e.g., `ap-northeast-1`)
Stream prefix	`grafana`

Click Create

Equivalent JSON task definition (save it as grafana-task-def.json and replace YOUR_ACCOUNT_ID with your 12-digit account ID):

{
  "family": "grafana-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "grafana",
      "image": "grafana/grafana:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "GF_SECURITY_ADMIN_USER", "value": "admin" },
        { "name": "GF_SECURITY_ADMIN_PASSWORD", "value": "YourStrongPassword123!" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/grafana",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "grafana",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}

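Register via CLI. Because the JSON sets awslogs-create-group, the execution role also needs logs:CreateLogGroup, which the managed AmazonECSTaskExecutionRolePolicy does not include. Add that permission to ecsTaskExecutionRole (or create the /ecs/grafana log group yourself and remove the awslogs-create-group option) before registering:

aws ecs register-task-definition --cli-input-json file://grafana-task-def.json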

Key point: The image field is just grafana/grafana:latest — a Docker Hub public image. ECS pulls it directly from Docker Hub. No ECR repository needed for public images.
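Fargate only accepts specific pairings of the cpu and memory values used in the task definition, and an invalid pair is rejected at registration time. The sketch below encodes the pairings for 0.25 to 4 vCPU so a pair can be checked beforehand. The function name is hypothetical, and the larger 8 and 16 vCPU sizes are deliberately left out.

```shell
# Succeed if cpu (CPU units) and mem (MiB) form a valid Fargate task size.
valid_fargate_size() {
  local cpu="$1" mem="$2" min max
  case "$cpu" in
    # 0.25 vCPU allows exactly three memory values.
    256)  [ "$mem" = 512 ] || [ "$mem" = 1024 ] || [ "$mem" = 2048 ]; return ;;
    # Larger sizes allow a range in 1 GB (1024 MiB) increments.
    512)  min=1024; max=4096 ;;
    1024) min=2048; max=8192 ;;
    2048) min=4096; max=16384 ;;
    4096) min=8192; max=30720 ;;
    *)    return 1 ;;
  esac
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $((mem % 1024)) -eq 0 ]
}
```

For the Grafana task above, `valid_fargate_size 512 1024` succeeds, while a pair such as `512 512` fails.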

Step 6 — Create the ECS Service (or Run a Standalone Task)

Option A — Run as a Service (Recommended)

A service ensures Grafana stays running. If the task crashes, ECS automatically starts a new one.

ECS Console → Clusters → grafana-cluster → Services → Create
Configure:

Setting	Value
Launch type	Fargate
Task definition	`grafana-task` (latest revision)
Service name	`grafana-service`
Desired tasks	`1`

Networking:

Setting	Value
VPC	Your VPC
Subnets	Select your public subnet(s)
Security group	`grafana-sg` (the one allowing port 3000)
Public IP	Enabled (this is critical!)

Click Create

AWS CLI:

aws ecs create-service \
  --cluster grafana-cluster \
  --service-name grafana-service \
  --task-definition grafana-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-0abc123"],
      "securityGroups": ["sg-0def456"],
      "assignPublicIp": "ENABLED"
    }
  }'

Option B — Run a Standalone Task (Quick Test)

aws ecs run-task \
  --cluster grafana-cluster \
  --task-definition grafana-task \
  --launch-type FARGATE \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-0abc123"],
      "securityGroups": ["sg-0def456"],
      "assignPublicIp": "ENABLED"
    }
  }'
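Once the service is created, the task takes a short while to be provisioned and started. A minimal sketch for checking whether the service has reached its desired count, assuming a hypothetical helper name `service_is_settled` built on `aws ecs describe-services`:

```shell
# Succeed if the service's running task count equals its desired count.
service_is_settled() {
  local counts running desired
  counts=$(aws ecs describe-services --cluster "$1" --services "$2" \
    --query 'services[0].[runningCount, desiredCount]' --output text) || return 1
  # Text output is "<running><TAB><desired>"; cut splits on tabs by default.
  running=$(printf '%s\n' "$counts" | cut -f1)
  desired=$(printf '%s\n' "$counts" | cut -f2)
  # A numeric comparison also rejects non-numeric output such as "None".
  [ "$running" -eq "$desired" ] 2>/dev/null
}
```

Usage: `service_is_settled grafana-cluster grafana-service && echo "service is up"`. To block until the service settles instead of polling, `aws ecs wait services-stable --cluster grafana-cluster --services grafana-service` does the waiting for you.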

Step 7 — Access Grafana

Go to ECS Console → Clusters → grafana-cluster → Tasks tab
Click on the running task ID
Under Configuration, find the Public IP (e.g., 3.112.45.67)
Open your browser and navigate to: http://3.112.45.67:3000
Log in with:
- Username: admin
- Password: the value you set via GF_SECURITY_ADMIN_PASSWORD, or admin if you left it unset

If you log in with the default admin/admin credentials, Grafana prompts you to change the password on first login.

You now have Grafana running on ECS Fargate, pulled directly from Docker Hub.
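The public IP can also be looked up without the console. The task description only exposes the ID of the task's network interface (ENI), so the lookup takes three calls. This is a sketch with a hypothetical function name, assuming the service runs a single task:

```shell
# Print the public IP of the first running task in a service.
grafana_public_ip() {
  local cluster="$1" service="$2" task_arn eni_id
  # 1. First running task of the service.
  task_arn=$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --query 'taskArns[0]' --output text) || return 1
  [ "$task_arn" != "None" ] || return 1
  # 2. ENI attached to that task.
  eni_id=$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value | [0]" \
    --output text) || return 1
  # 3. Public IP associated with the ENI.
  aws ec2 describe-network-interfaces --network-interface-ids "$eni_id" \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
}
```

Usage: `curl -s "http://$(grafana_public_ip grafana-cluster grafana-service):3000/api/health"` should return a small JSON health document once Grafana is up.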

Path B — Push Your Own Image to ECR, Then Deploy on ECS

Use this path when you have a custom backend (your own Dockerfile) or want to use a private registry. The workflow is: Build locally → Push to ECR → ECS pulls from ECR.

Step B1 — Create an ECR Repository

ECR Console → Repositories → Create repository
Configure:
- Visibility: Private
- Repository name: my-backend

AWS CLI:

aws ecr create-repository \
  --repository-name my-backend \
  --region ap-northeast-1

This gives you a repository URI like:
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend

Step B2 — Build, Tag, and Push Your Docker Image

# 1. Authenticate Docker to ECR
aws ecr get-login-password --region ap-northeast-1 | \
  docker login --username AWS --password-stdin \
  123456789012.dkr.ecr.ap-northeast-1.amazonaws.com

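# 2. Build your Docker image (on an ARM machine such as Apple Silicon, add
#    --platform linux/amd64 so the image matches Fargate's default X86_64 architecture)
docker build -t my-backend .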

# 3. Tag the image for ECR
docker tag my-backend:latest \
  123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest

# 4. Push to ECR
docker push \
  123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest

Verify in the ECR Console that the image appears under your repository.
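The repository URI is repeated in every command above, so it is easy to mistype. A small sketch that assembles it from its parts; `ecr_image_uri` is a hypothetical helper, and the tag defaults to latest to match the commands above:

```shell
# Build an ECR image URI from account ID, region, repository, and tag.
ecr_image_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "${4:-latest}"
}
```

Usage: `IMAGE=$(ecr_image_uri "$(aws sts get-caller-identity --query Account --output text)" ap-northeast-1 my-backend)`, followed by `docker tag my-backend:latest "$IMAGE"` and `docker push "$IMAGE"`. Passing a fourth argument such as a Git commit hash gives each build its own tag.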

Step B3 — Create the Task Definition (Using ECR Image)

The main difference from Path A is the image field, which now points to your ECR URI instead of Docker Hub. This example also uses a smaller task size (256 CPU units, 512 MB) and container port 8080:

{
  "family": "my-backend-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "my-backend",
      "image": "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-backend",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "backend",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}

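The ecsTaskExecutionRole already has permissions to pull from ECR (via the AmazonECSTaskExecutionRolePolicy), so no extra configuration is needed for the image pull.

Then create the cluster, security group, and service as in Path A (Steps 1–6), opening the security group for port 8080 instead of 3000 to match this container's port mapping.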

🔄 Updating Your Image (CI/CD Cycle)

When you push a new image to ECR (or a new version of the image is published on Docker Hub) under the same tag, ECS doesn't automatically pick it up for tasks that are already running. To deploy the new image:

# Force a new deployment — ECS will pull the latest image
aws ecs update-service \
  --cluster grafana-cluster \
  --service grafana-service \
  --force-new-deployment

This triggers a rolling update: ECS starts new tasks with the latest image, then drains the old tasks.
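In a CI/CD pipeline it is usually helpful to block until the rolling update has finished, so that a failed rollout fails the pipeline. A minimal sketch, assuming a hypothetical wrapper name `redeploy_and_wait`; it combines the command above with `aws ecs wait services-stable`:

```shell
# Force a new deployment, then wait until the service reports a steady state.
redeploy_and_wait() {
  aws ecs update-service --cluster "$1" --service "$2" \
    --force-new-deployment > /dev/null || return 1
  # Polls the service and returns non-zero if it does not stabilise in time.
  aws ecs wait services-stable --cluster "$1" --services "$2"
}
```

Usage: `redeploy_and_wait grafana-cluster grafana-service && echo "rollout complete"`.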

🔒 Production Hardening

The setup above is great for learning and testing. For production, make these improvements:

1. Move Tasks to Private Subnets + Add ALB

Instead of exposing the task's public IP directly:

Internet → ALB (public subnet) → ECS Task (private subnet) → NAT GW (outbound only)

This way Grafana is only reachable through the load balancer, and the container itself has no public IP.

2. Use an Application Load Balancer (ALB)

Create an ALB in your public subnets
Create a target group (type: ip, port 3000, health check path /api/health)
Attach the target group to your ECS service
Configure an HTTPS listener (443) with an ACM certificate
Redirect HTTP (80) → HTTPS (443)

Now you access Grafana via https://grafana.yourdomain.com instead of a raw IP.
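The target group from the list above could be created from the CLI roughly as follows. This is a sketch under stated assumptions: the function name and the default target group name grafana-tg are hypothetical, and the VPC ID is supplied by the caller.

```shell
# Create an IP-type target group for Grafana and print its ARN.
# $1 = VPC ID, $2 = optional target group name.
create_grafana_target_group() {
  aws elbv2 create-target-group \
    --name "${2:-grafana-tg}" \
    --protocol HTTP \
    --port 3000 \
    --vpc-id "$1" \
    --target-type ip \
    --health-check-path /api/health \
    --query 'TargetGroups[0].TargetGroupArn' \
    --output text
}
```

Usage: `TG_ARN=$(create_grafana_target_group vpc-0abc123)`. The target type must be ip because Fargate tasks in awsvpc mode register by their ENI address rather than by instance ID.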

3. Use Secrets Manager for Passwords

Never hardcode GF_SECURITY_ADMIN_PASSWORD in the task definition. Instead:

# Store the secret
aws secretsmanager create-secret \
  --name grafana/admin-password \
  --secret-string "YourStrongPassword123!"

Reference it in the task definition:

"secrets": [
  {
    "name": "GF_SECURITY_ADMIN_PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/admin-password-AbCdEf"
  }
]

The task execution role needs additional permissions:

{
  "Effect": "Allow",
  "Action": ["secretsmanager:GetSecretValue"],
  "Resource": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/*"
}
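One way to grant the statement above is an inline policy on the execution role. The sketch below generates the policy document and attaches it with `aws iam put-role-policy`. The function names and the policy name grafana-read-secrets are hypothetical.

```shell
# Print a policy document allowing reads of secrets under a name prefix.
# $1 = region, $2 = account ID, $3 = secret name prefix (e.g. grafana)
secrets_policy_json() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["secretsmanager:GetSecretValue"],
    "Resource": "arn:aws:secretsmanager:$1:$2:secret:$3/*"
  }]
}
EOF
}

# Attach the document to the execution role as an inline policy.
attach_secrets_policy() {
  aws iam put-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-name grafana-read-secrets \
    --policy-document "$(secrets_policy_json "$1" "$2" "$3")"
}
```

Usage: `attach_secrets_policy ap-northeast-1 123456789012 grafana`.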

4. Persistent Storage with EFS

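By default, Grafana data (dashboards, data sources) is lost when the task restarts, because it lives in the embedded SQLite database under /var/lib/grafana on the task's ephemeral storage. Mount an EFS volume there. The file system's mount targets must allow NFS (TCP 2049) from the task's security group, and the rootDirectory must already exist on the file system: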

"volumes": [{
  "name": "grafana-data",
  "efsVolumeConfiguration": {
    "fileSystemId": "fs-0abc123",
    "rootDirectory": "/grafana"
  }
}]

And in the container definition:

"mountPoints": [{
  "sourceVolume": "grafana-data",
  "containerPath": "/var/lib/grafana"
}]

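5. Auto Scaling

Scaling beyond one task assumes Grafana is backed by a shared external database (MySQL or PostgreSQL); the default embedded SQLite database is not meant to be shared between instances.
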
# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/grafana-cluster/grafana-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 \
  --max-capacity 3

# Scale based on CPU
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/grafana-cluster/grafana-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-scaling \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  }'

🔍 Troubleshooting

Problem	Likely Cause	Fix
Task stuck in `PROVISIONING` or `PENDING`	Subnet can't reach the internet	Verify public subnet has IGW route and public IP is enabled
`CannotPullContainerError`	No internet access OR wrong image name	Check subnet route table, verify image name is exactly right (case-sensitive)
`CannotPullContainerError` for ECR image	Task execution role missing ECR permissions	Attach `AmazonECSTaskExecutionRolePolicy` to `ecsTaskExecutionRole`
Task starts then immediately stops	Container crashes	Check CloudWatch logs at `/ecs/grafana`
Can't access Grafana in browser	Security group doesn't allow port 3000	Add inbound TCP rule for port 3000
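`ResourceInitializationError`	Missing execution role or permissions, or no network path to AWS endpoints	Verify `executionRoleArn` is set in task definition; if the message mentions the log group, grant `logs:CreateLogGroup` to the execution role or create the log group yourself and drop `awslogs-create-group`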
`toomanyrequests` from Docker Hub	Docker Hub rate limit hit	Use ECR Public image instead: `public.ecr.aws/grafana/grafana:latest`
Task running but no public IP	Public IP not assigned	Set `assignPublicIp: ENABLED` in network configuration
ECS service keeps restarting tasks	Health check failing	Check container health, ensure the app starts within the health check grace period

Checking Container Logs

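# Find the task ARN (add --desired-status STOPPED to list tasks that have already exited)
aws ecs list-tasks --cluster grafana-cluster --service-name grafana-service

# Get task details (status, stop reason, and the ENI ID; the public IP is a property of that ENI)
aws ecs describe-tasks --cluster grafana-cluster --tasks <TASK_ID>

# View logs in CloudWatch (aws logs tail requires AWS CLI v2)
aws logs tail /ecs/grafana --follow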
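When a task has stopped, the reason recorded by ECS usually names one of the errors from the troubleshooting table. The sketch below fetches that reason and maps it to the matching row. Both function names are hypothetical, and the hints simply restate the table.

```shell
# Print the reason ECS recorded for a stopped task.
task_stop_reason() {
  aws ecs describe-tasks --cluster "$1" --tasks "$2" \
    --query 'tasks[0].stoppedReason' --output text
}

# Map a stop reason to a hint from the troubleshooting table.
# The rate-limit case comes first because it is reported inside a
# CannotPullContainerError message.
hint_for_stop_reason() {
  case "$1" in
    *toomanyrequests*)             echo "Docker Hub rate limit: pull from another registry" ;;
    *CannotPullContainerError*)    echo "Check the subnet route table and the image name" ;;
    *ResourceInitializationError*) echo "Check executionRoleArn and the role's permissions" ;;
    *)                             echo "Check the container logs in CloudWatch" ;;
  esac
}
```

Usage: `hint_for_stop_reason "$(task_stop_reason grafana-cluster <TASK_ID>)"`, where the task ID comes from `aws ecs list-tasks --cluster grafana-cluster --desired-status STOPPED`.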

📊 Docker Hub vs ECR — When to Use Which

Scenario	Use Docker Hub Directly	Use ECR (Private)
Public/official images (Grafana, Nginx, Redis)	✅ Simplest — just use the image name	❌ Unnecessary overhead
Custom application images	❌ Requires Docker Hub account + push	✅ Integrated with AWS IAM, no rate limits
Production workloads	⚠️ Subject to Docker Hub rate limits	✅ No pull rate limits, faster pulls within AWS
Air-gapped / private environments	❌ Requires internet access	✅ Works with VPC endpoints (no internet needed)
CI/CD pipelines	⚠️ Needs Docker Hub credentials	✅ `aws ecr get-login-password` works natively

Pro tip: For public images in production, use ECR Public instead of Docker Hub to avoid rate limiting: public.ecr.aws/grafana/grafana:latest

🏗️ Terraform Example (Complete)

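# ─── Input Variables (referenced below; values are supplied by the caller) ───
variable "region" {
  type        = string
  description = "AWS region used in the awslogs configuration"
}

variable "vpc_id" {
  type        = string
  description = "ID of the VPC that hosts the security group"
}

variable "public_subnet_ids" {
  type        = list(string)
  description = "Public subnets in which the Fargate task runs"
}

# ─── Cluster ───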
resource "aws_ecs_cluster" "grafana" {
  name = "grafana-cluster"
}

# ─── CloudWatch Log Group ───
resource "aws_cloudwatch_log_group" "grafana" {
  name              = "/ecs/grafana"
  retention_in_days = 7
}

# ─── Task Definition ───
resource "aws_ecs_task_definition" "grafana" {
  family                   = "grafana-task"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  container_definitions = jsonencode([{
    name      = "grafana"
    image     = "grafana/grafana:latest"
    essential = true

    portMappings = [{
      containerPort = 3000
      protocol      = "tcp"
    }]

    environment = [
      { name = "GF_SECURITY_ADMIN_USER", value = "admin" }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.grafana.name
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "grafana"
      }
    }
  }])
}

# ─── Security Group ───
resource "aws_security_group" "grafana" {
  name        = "grafana-sg"
  description = "Allow Grafana UI access"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ─── ECS Service ───
resource "aws_ecs_service" "grafana" {
  name            = "grafana-service"
  cluster         = aws_ecs_cluster.grafana.id
  task_definition = aws_ecs_task_definition.grafana.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.public_subnet_ids
    security_groups  = [aws_security_group.grafana.id]
    assign_public_ip = true
  }
}

# ─── IAM: Task Execution Role ───
resource "aws_iam_role" "ecs_task_execution" {
  name = "ecsTaskExecutionRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

Last updated: March 2026
