
Deploying Grafana On AWS Using ECS

Practical Example: Pull the official Grafana image from Docker Hub, run it on ECS Fargate, and get a fully working monitoring dashboard accessible from the browser — with the complete networking, IAM, and service setup explained from scratch.


📌 What Are We Building?

                    ┌──────────────────────────────────────────────────┐
                    │ AWS Cloud (your account)                         │
                    │                                                  │
You (browser)       │    ┌──────────────┐                              │
http://<IP>:3000 ───┼───►│ Security     │                              │
                    │    │ Group :3000  │                              │
                    │    └──────┬───────┘                              │
                    │           │                                      │
                    │    ┌──────▼──────────────────────────────────┐   │
                    │    │ ECS Cluster (Fargate)                   │   │
                    │    │                                         │   │
                    │    │  ┌───────────────────────────────────┐  │   │
                    │    │  │ ECS Task (awsvpc mode)            │  │   │
                    │    │  │                                   │  │   │
                    │    │  │  ┌─────────────────────────────┐  │  │   │
                    │    │  │  │ grafana/grafana:latest      │  │  │   │
                    │    │  │  │ (from Docker Hub)           │  │  │   │
                    │    │  │  │ Port 3000                   │  │  │   │
                    │    │  │  └─────────────────────────────┘  │  │   │
                    │    │  │                                   │  │   │
                    │    │  │ Task Execution Role               │  │   │
                    │    │  │   → pulls image                   │  │   │
                    │    │  │   → writes CloudWatch logs        │  │   │
                    │    │  └───────────────────────────────────┘  │   │
                    │    └─────────────────────────────────────────┘   │
                    │                                                  │
                    │ VPC → Public Subnet → Internet Gateway           │
                    └──────────────────────────────────────────────────┘

Two approaches covered in this guide:

  1. Path A — Pull a public image directly from Docker Hub (e.g., grafana/grafana) — simplest, no ECR needed

  2. Path B — Push your own image to ECR (private registry), then pull from ECR — for custom/private backends


🧱 Core ECS Concepts

Before diving into steps, here's what each piece does:

| Component | What It Is | Analogy |
| --- | --- | --- |
| ECS Cluster | Logical grouping of tasks and services | A "workspace" for your containers |
| Task Definition | Blueprint for your container: image URI, CPU, memory, ports, env vars, IAM roles | A docker-compose.yml for AWS |
| Task | A running instance of a task definition | A container started with docker run |
| Service | Manages the desired count of tasks, restarts them on failure, integrates with load balancers | A process supervisor |
| Fargate | Serverless compute engine; no EC2 instances to manage | AWS runs the container for you |
| Task Execution Role | IAM role that lets the ECS agent pull images and write logs | Permissions for the infrastructure |
| Task Role | IAM role that your application code uses to call AWS APIs | Permissions for your app |

📋 Prerequisites

  • An AWS account with admin or sufficient IAM permissions

  • AWS CLI installed and configured (aws configure)

  • Docker installed locally (only needed for Path B — pushing to ECR)

  • A VPC with at least one public subnet and an Internet Gateway (default VPC works)


Path A — Pull a Public Docker Hub Image (Grafana)

This is the simplest approach. ECS Fargate can pull public images from Docker Hub directly — you just specify the image name in the task definition. No ECR repository needed.


Step 1 — Create a VPC (or Use the Default VPC)

Every AWS account comes with a default VPC in each region. For a quick setup, the default VPC works fine. If you want to create a custom VPC:

  1. VPC Console → Create VPC

  2. Choose VPC and more (wizard creates subnets, route table, IGW automatically)

  3. Configure:

    • Name: grafana-vpc
    • IPv4 CIDR: 10.0.0.0/16
    • Number of AZs: 2
    • Public subnets: 2
    • Private subnets: 0 (for this simple setup)
    • NAT Gateway: None
    • VPC Endpoints: None

For Fargate tasks to pull images from the internet, one of the following must be true:

  • Public subnet (the setup used in this guide): the task runs in a public subnet whose route table has a route 0.0.0.0/0 → Internet Gateway, and the task is launched with Auto-assign public IP (assignPublicIp) set to ENABLED

  • Private subnet: the task runs in a private subnet whose route table has a route 0.0.0.0/0 → NAT Gateway (no public IP needed on the task)
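The placeholder IDs used later in this guide (vpc-0abc123, subnet-0abc123) can be looked up, and the routing condition above checked, with a few AWS CLI queries. The sketch below assumes bash and a configured AWS CLI; all function names are hypothetical helpers, not AWS commands. The route-table lookup falls back to the VPC's main route table because subnets without an explicit association (including the default VPC's subnets) use it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for Step 1; each wraps real AWS CLI calls.

# Print the ID of the default VPC in the current region ("None" if there is none).
default_vpc_id() {
  aws ec2 describe-vpcs \
    --filters Name=is-default,Values=true \
    --query 'Vpcs[0].VpcId' --output text
}

# Print the subnets of a VPC that auto-assign public IPs
# (in the default VPC, these are its public subnets).
public_subnet_ids() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" Name=map-public-ip-on-launch,Values=true \
    --query 'Subnets[].SubnetId' --output text
}

# Print the route table a subnet uses: its explicitly associated table,
# or the VPC's main route table when there is no explicit association.
route_table_for_subnet() {
  local subnet_id="$1" vpc_id="$2" rtb
  rtb="$(aws ec2 describe-route-tables \
    --filters "Name=association.subnet-id,Values=${subnet_id}" \
    --query 'RouteTables[0].RouteTableId' --output text)"
  if [[ -z "$rtb" || "$rtb" == "None" ]]; then
    rtb="$(aws ec2 describe-route-tables \
      --filters "Name=vpc-id,Values=${vpc_id}" Name=association.main,Values=true \
      --query 'RouteTables[0].RouteTableId' --output text)"
  fi
  printf '%s\n' "$rtb"
}

# Succeed only if that route table sends 0.0.0.0/0 to an Internet Gateway.
subnet_has_igw_route() {
  local rtb gw
  rtb="$(route_table_for_subnet "$1" "$2")"
  gw="$(aws ec2 describe-route-tables --route-table-ids "$rtb" \
    --query "RouteTables[0].Routes[?DestinationCidrBlock=='0.0.0.0/0'].GatewayId | [0]" \
    --output text)"
  [[ "$gw" == igw-* ]]
}
```

For example: VPC_ID=$(default_vpc_id), then public_subnet_ids "$VPC_ID", then subnet_has_igw_route subnet-0abc123 "$VPC_ID" && echo public.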


Step 2 — Create a Security Group

The security group controls what traffic can reach your Grafana container.

  1. EC2 Console → Security Groups → Create security group

  2. Configure:

| Field | Value |
| --- | --- |
| Name | grafana-sg |
| Description | Allow Grafana UI access |
| VPC | Select your VPC |

  3. Inbound Rules:

| Type | Protocol | Port | Source | Description |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 3000 | 0.0.0.0/0 | Grafana web UI |

  4. Outbound Rules: Leave the default (all outbound traffic allowed; required for image pulls and internet access)

AWS CLI:


# Create the security group (replace vpc-0abc123 with your VPC ID)
SG_ID=$(aws ec2 create-security-group \
  --group-name grafana-sg \
  --description "Allow Grafana UI on port 3000" \
  --vpc-id vpc-0abc123 \
  --query 'GroupId' --output text)

# Add inbound rule for port 3000
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --protocol tcp \
  --port 3000 \
  --cidr 0.0.0.0/0
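Opening port 3000 to 0.0.0.0/0 exposes the Grafana login page to the whole internet. For a test deployment, a tighter option is to allow only your own public IPv4 address. A minimal sketch, assuming bash and curl; allow_my_ip is a hypothetical helper, and https://checkip.amazonaws.com is AWS's "what is my IP" endpoint, which returns the caller's address as plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow TCP 3000 only from this machine's public IPv4.
allow_my_ip() {
  local sg_id="$1" my_ip
  local re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
  # checkip.amazonaws.com returns the caller's IP followed by a newline.
  my_ip="$(curl -fsS https://checkip.amazonaws.com | tr -d '[:space:]')"
  if [[ ! "$my_ip" =~ $re ]]; then
    echo "could not determine public IPv4 (got: '$my_ip')" >&2
    return 1
  fi
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" \
    --protocol tcp \
    --port 3000 \
    --cidr "${my_ip}/32"
}
```

Run allow_my_ip "$SG_ID" instead of the 0.0.0.0/0 rule. If your IP changes later, remove the old rule with aws ec2 revoke-security-group-ingress using the same --group-id, --protocol, --port, and --cidr values.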


Step 3 — Create the ECS Cluster

  1. ECS Console → Clusters → Create cluster

  2. Configure:

    • Cluster name: grafana-cluster
    • Infrastructure: Select AWS Fargate (default)
  3. Click Create

That's it. The cluster is just a logical grouping; no servers are provisioned.

AWS CLI:

aws ecs create-cluster --cluster-name grafana-cluster


Step 4 — Create the Task Execution IAM Role

The Task Execution Role allows the ECS agent (not your application) to pull images and write logs. If you've used ECS before, you likely already have ecsTaskExecutionRole.

Check if it exists:

aws iam get-role --role-name ecsTaskExecutionRole

If it doesn't exist, create it:

  1. IAM Console → Roles → Create role

  2. Trusted entity: AWS service → Elastic Container Service → Use case: Elastic Container Service Task

  3. Attach the managed policy: AmazonECSTaskExecutionRolePolicy

  4. Role name: ecsTaskExecutionRole

This policy grants:

  • ecr:GetAuthorizationToken, to authenticate to ECR

  • ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, to pull images from ECR

  • logs:CreateLogStream, logs:PutLogEvents, to write container logs to CloudWatch

AWS CLI:


# Create the trust policy document
cat > trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole"
}]
}
EOF

# Create the role
aws iam create-role \
--role-name ecsTaskExecutionRole \
--assume-role-policy-document file://trust-policy.json

# Attach the managed policy
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
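Because create-role fails when the role already exists, the existence check from the start of this step and the creation can be folded into one re-runnable helper that also prints the role ARN needed for executionRoleArn in Step 5. A sketch assuming bash and the trust-policy.json file written above; ensure_execution_role is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the execution role only if it is missing,
# make sure the managed policy is attached, then print the role ARN.
ensure_execution_role() {
  local role="${1:-ecsTaskExecutionRole}"
  if ! aws iam get-role --role-name "$role" >/dev/null 2>&1; then
    aws iam create-role \
      --role-name "$role" \
      --assume-role-policy-document file://trust-policy.json >/dev/null || return 1
  fi
  # Re-attaching an already attached managed policy is harmless, so this runs every time.
  aws iam attach-role-policy \
    --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy || return 1
  aws iam get-role --role-name "$role" --query 'Role.Arn' --output text
}
```

For example: EXEC_ROLE_ARN=$(ensure_execution_role).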


Step 5 — Create the Task Definition

This is the heart of the setup — the blueprint that tells ECS which image to run, how much CPU/memory to allocate, which ports to expose, and what roles to use.

  1. ECS Console → Task definitions → Create new task definition

  2. Configure:

| Setting | Value |
| --- | --- |
| Task definition family | grafana-task |
| Launch type | AWS Fargate |
| OS/Architecture | Linux/X86_64 |
| Task size (CPU) | 0.5 vCPU (512) |
| Task size (Memory) | 1 GB (1024) |
| Task execution role | ecsTaskExecutionRole |
| Task role | None (Grafana doesn't need to call AWS APIs) |

  3. Container definition:

| Setting | Value |
| --- | --- |
| Container name | grafana |
| Image URI | grafana/grafana:latest |
| Essential | Yes |
| Port mappings | Container port: 3000, Protocol: TCP |

  4. (Optional) Environment variables:

| Key | Value | Purpose |
| --- | --- | --- |
| GF_SECURITY_ADMIN_USER | admin | Default admin username |
| GF_SECURITY_ADMIN_PASSWORD | YourStrongPassword123! | Override default password |

  5. (Optional) Logging to CloudWatch:

| Setting | Value |
| --- | --- |
| Log driver | awslogs |
| Log group | /ecs/grafana |
| Region | Your region (e.g., ap-northeast-1) |
| Stream prefix | grafana |

  6. Click Create

Equivalent JSON task definition:

{
  "family": "grafana-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::YOUR_ACCOUNT_ID:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "grafana",
      "image": "grafana/grafana:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "GF_SECURITY_ADMIN_USER", "value": "admin" },
        { "name": "GF_SECURITY_ADMIN_PASSWORD", "value": "YourStrongPassword123!" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/grafana",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "grafana",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}

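Save the JSON above as grafana-task-def.json (replacing YOUR_ACCOUNT_ID with your AWS account ID) and register it via CLI:

aws ecs register-task-definition --cli-input-json file://grafana-task-def.json

Note on logging: "awslogs-create-group": "true" asks ECS to create /ecs/grafana on first start, which requires logs:CreateLogGroup on the task execution role. AmazonECSTaskExecutionRolePolicy does not include that permission, so either add it to the role or create the log group once beforehand:

aws logs create-log-group --log-group-name /ecs/grafana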

Key point: The image field is just grafana/grafana:latest — a Docker Hub public image. ECS pulls it directly from Docker Hub. No ECR repository needed for public images.
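The JSON above contains the YOUR_ACCOUNT_ID placeholder. One way to fill it in is to ask STS for the current account ID and substitute it into a temporary copy before registering. A sketch assuming bash, sed, and that the JSON was saved as grafana-task-def.json; register_task_def is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: substitute the real account ID into the task
# definition template and register the result with ECS.
register_task_def() {
  local template="${1:-grafana-task-def.json}" account_id rendered status
  account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
  rendered="$(mktemp)"
  # Replace every YOUR_ACCOUNT_ID placeholder in a temporary copy.
  if ! sed "s/YOUR_ACCOUNT_ID/${account_id}/g" "$template" > "$rendered"; then
    rm -f "$rendered"
    return 1
  fi
  aws ecs register-task-definition \
    --cli-input-json "file://${rendered}" \
    --query 'taskDefinition.taskDefinitionArn' --output text
  status=$?
  rm -f "$rendered"
  return "$status"
}
```

The template file itself is left untouched, so it can be committed with the placeholder and reused across accounts.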


Step 6 — Create the ECS Service (or Run a Standalone Task)

A service ensures Grafana stays running. If the task crashes, ECS automatically starts a new one.

  1. ECS Console → Clusters → grafana-cluster → Services → Create

  2. Configure:

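| Setting | Value |
| --- | --- |
| Launch type | Fargate |
| Task definition | grafana-task (latest revision) |
| Service name | grafana-service |
| Desired tasks | 1 |

  3. Networking:

| Setting | Value |
| --- | --- |
| VPC | Your VPC |
| Subnets | Select your public subnet(s) |
| Security group | grafana-sg (the one allowing port 3000) |
| Public IP | Enabled (required: without it, the task can neither pull from Docker Hub nor be reached from your browser) |

  4. Click Create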

Option A: Create the Service (AWS CLI)

Replace subnet-0abc123 and sg-0def456 with your own subnet ID and security group ID ($SG_ID from Step 2):

aws ecs create-service \
  --cluster grafana-cluster \
  --service-name grafana-service \
  --task-definition grafana-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-0abc123"],
      "securityGroups": ["sg-0def456"],
      "assignPublicIp": "ENABLED"
    }
  }'

Option B: Run a Standalone Task (Quick Test)

aws ecs run-task \
  --cluster grafana-cluster \
  --task-definition grafana-task \
  --launch-type FARGATE \
  --network-configuration '{
    "awsvpcConfiguration": {
      "subnets": ["subnet-0abc123"],
      "securityGroups": ["sg-0def456"],
      "assignPublicIp": "ENABLED"
    }
  }'

A standalone task is not restarted by ECS if it stops, so use the service for anything longer-lived than a quick test.
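Both commands return as soon as ECS accepts the request, not when Grafana is reachable. For the service, the CLI's services-stable waiter can block until the deployment settles (it polls every 15 seconds and gives up after 40 checks, roughly 10 minutes). A sketch assuming bash; wait_for_service is a hypothetical helper. For a standalone task, the equivalent waiter is aws ecs wait tasks-running --cluster grafana-cluster --tasks <TASK_ARN>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the service reaches a steady state,
# then print its running and desired task counts.
wait_for_service() {
  local cluster="$1" service="$2"
  aws ecs wait services-stable --cluster "$cluster" --services "$service" || {
    echo "service $service did not stabilize; check its events" >&2
    return 1
  }
  aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].[runningCount,desiredCount]' --output text
}
```

For example: wait_for_service grafana-cluster grafana-service.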


Step 7 — Access Grafana

  1. Go to ECS Console → Clusters → grafana-cluster → Tasks tab

  2. Click on the running task ID

  3. Under Configuration, find the Public IP (e.g., 3.112.45.67)

  4. Open your browser and navigate to: http://3.112.45.67:3000

  5. Login with:

    • Username: admin
    • Password: admin (or whatever you set via GF_SECURITY_ADMIN_PASSWORD)
  6. Grafana will prompt you to change the password on first login.

You now have Grafana running on ECS Fargate, pulled directly from Docker Hub.
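The console lookup in steps 1 to 3 can also be scripted. describe-tasks does not return the public IP directly: it returns the ID of the task's elastic network interface (ENI), and the public IP is an attribute of that ENI. A sketch assuming bash and the service from Step 6; grafana_public_ip is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the public IP of the first running task of a service.
grafana_public_ip() {
  local cluster="$1" service="$2" task_arn eni_id
  task_arn="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --desired-status RUNNING --query 'taskArns[0]' --output text)"
  if [[ -z "$task_arn" || "$task_arn" == "None" ]]; then
    echo "no running task" >&2
    return 1
  fi
  # The task's ENI ID is one of the attachment details.
  eni_id="$(aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value | [0]" \
    --output text)"
  # The public IP lives on the ENI's association.
  aws ec2 describe-network-interfaces --network-interface-ids "$eni_id" \
    --query 'NetworkInterfaces[0].Association.PublicIp' --output text
}
```

Once Grafana is up, curl -s "http://$(grafana_public_ip grafana-cluster grafana-service):3000/api/health" should return a small JSON status document.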


Path B — Push Your Own Image to ECR, Then Deploy on ECS

Use this path when you have a custom backend (your own Dockerfile) or want to use a private registry. The workflow is: Build locally → Push to ECR → ECS pulls from ECR.


Step B1 — Create an ECR Repository

  1. ECR Console → Repositories → Create repository

  2. Configure:

    • Visibility: Private
    • Repository name: my-backend

AWS CLI:

aws ecr create-repository \
--repository-name my-backend \
--region ap-northeast-1

This gives you a repository URI like:
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend


Step B2 — Build, Tag, and Push Your Docker Image


# 1. Authenticate Docker to ECR
aws ecr get-login-password --region ap-northeast-1 | \
docker login --username AWS --password-stdin \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com

# 2. Build your Docker image
docker build -t my-backend .

# 3. Tag the image for ECR
docker tag my-backend:latest \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest

# 4. Push to ECR
docker push \
123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest

Verify in the ECR Console that the image appears under your repository.
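The account ID 123456789012 above is a placeholder. A sketch that derives the registry host from the current credentials, so the same push works in any account; ecr_registry and push_to_ecr are hypothetical helpers, the default region ap-northeast-1 mirrors the examples above, and push_to_ecr assumes the image was built locally as <repo>:<tag> (for example with docker build -t my-backend .).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the private ECR registry host for an account/region.
ecr_registry() {
  local account_id="$1" region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$account_id" "$region"
}

# Hypothetical helper: log in, tag, and push a locally built <repo>:<tag> image.
push_to_ecr() {
  local repo="$1" tag="${2:-latest}" region="${3:-ap-northeast-1}"
  local account_id registry
  account_id="$(aws sts get-caller-identity --query Account --output text)" || return 1
  registry="$(ecr_registry "$account_id" "$region")"
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  docker tag "${repo}:${tag}" "${registry}/${repo}:${tag}" || return 1
  docker push "${registry}/${repo}:${tag}"
}
```

For example: push_to_ecr my-backend latest.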


Step B3 — Create the Task Definition (Using ECR Image)

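The main difference from Path A is the image field, which now points to your ECR URI instead of Docker Hub. This example also uses a smaller task size (256 CPU units, 512 MiB) and container port 8080 for the backend: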

{
  "family": "my-backend-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "my-backend",
      "image": "123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/my-backend:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-backend",
          "awslogs-region": "ap-northeast-1",
          "awslogs-stream-prefix": "backend",
          "awslogs-create-group": "true"
        }
      }
    }
  ]
}

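The ecsTaskExecutionRole already has permission to pull from ECR through AmazonECSTaskExecutionRolePolicy, so no extra image-pull configuration is needed. Because this task definition also sets "awslogs-create-group": "true", the role additionally needs logs:CreateLogGroup, or /ecs/my-backend must be created beforehand. Everything else reuses Path A (VPC, security group, cluster, and role from Steps 1–4, the service from Step 6, and access from Step 7) with a few adjustments for this backend: the security group must allow TCP 8080 instead of 3000, the service uses the my-backend-task task definition, and you browse to port 8080.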


🔄 Updating Your Image (CI/CD Cycle)

When you push a new image to ECR (or a new image is published under the same Docker Hub tag, such as latest), ECS doesn't automatically pick it up for already running tasks. To deploy the new image:


# Force a new deployment — ECS will pull the latest image
aws ecs update-service \
--cluster grafana-cluster \
--service grafana-service \
--force-new-deployment

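This triggers a rolling update: ECS starts new tasks, which resolve the image tag again and so pick up the newest image pushed under it, then drains the old tasks. Without a load balancer, the replacement task gets a new public IP, so look it up again as in Step 7.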
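To confirm the rollout actually picked up the new image, compare the digest the running tasks report with the one in the registry. A sketch assuming bash; running_image_digests is a hypothetical helper. Run it after the deployment has finished (for example after aws ecs wait services-stable), otherwise old and new tasks may both be listed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the image digest each running task of a
# service is actually using.
running_image_digests() {
  local cluster="$1" service="$2" arns
  arns="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --desired-status RUNNING --query 'taskArns[]' --output text)"
  if [[ -z "$arns" || "$arns" == "None" ]]; then
    echo "no running tasks" >&2
    return 1
  fi
  # $arns is left unquoted on purpose: one task ARN per argument.
  aws ecs describe-tasks --cluster "$cluster" --tasks $arns \
    --query 'tasks[].containers[].imageDigest' --output text
}
```

For an ECR image, the expected digest can be read with aws ecr describe-images --repository-name my-backend --image-ids imageTag=latest --query 'imageDetails[0].imageDigest' --output text.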


🔒 Production Hardening

The setup above is great for learning and testing. For production, make these improvements:

1. Move Tasks to Private Subnets + Add ALB

Instead of exposing the task's public IP directly:

Internet → ALB (public subnet) → ECS Task (private subnet) → NAT GW (outbound only)

This way Grafana is only reachable through the load balancer, and the container itself has no public IP.

2. Use an Application Load Balancer (ALB)

  • Create an ALB in your public subnets

  • Create a target group (type: ip, port 3000, health check path /api/health)

  • Attach the target group to your ECS service

  • Configure an HTTPS listener (443) with an ACM certificate

  • Redirect HTTP (80) → HTTPS (443)

  • Allow port 3000 on grafana-sg only from the ALB's security group instead of 0.0.0.0/0

Now you access Grafana via https://grafana.yourdomain.com instead of a raw IP.
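The target group and service wiring from the list above might look like this with the CLI. A sketch, assuming bash and that the ALB, its HTTPS listener, the private subnets, and a NAT Gateway already exist; the name grafana-tg and both function names are hypothetical. Fargate tasks use awsvpc networking and register by IP, which is why the target type is ip. Service names are unique per cluster, so delete the public-IP grafana-service first (or pick another name) before creating the load-balanced one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an IP-type target group for Grafana and print its ARN.
create_grafana_target_group() {
  local vpc_id="$1"
  aws elbv2 create-target-group \
    --name grafana-tg \
    --protocol HTTP \
    --port 3000 \
    --vpc-id "$vpc_id" \
    --target-type ip \
    --health-check-path /api/health \
    --query 'TargetGroups[0].TargetGroupArn' --output text
}

# Hypothetical helper: create the service behind the ALB in private subnets
# (no public IP; outbound image pulls go through the NAT Gateway).
create_grafana_service_behind_alb() {
  local tg_arn="$1" subnets="$2" sg_id="$3"   # subnets: comma-separated IDs
  aws ecs create-service \
    --cluster grafana-cluster \
    --service-name grafana-service \
    --task-definition grafana-task \
    --desired-count 1 \
    --launch-type FARGATE \
    --load-balancers "targetGroupArn=${tg_arn},containerName=grafana,containerPort=3000" \
    --health-check-grace-period-seconds 60 \
    --network-configuration "awsvpcConfiguration={subnets=[${subnets}],securityGroups=[${sg_id}],assignPublicIp=DISABLED}"
}
```

Point the HTTPS listener's default forward action at the returned target group ARN. The 60-second grace period is an arbitrary starting value that gives Grafana time to start before failed ALB health checks count against the task.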

3. Use Secrets Manager for Passwords

Never hardcode GF_SECURITY_ADMIN_PASSWORD in the task definition. Instead:


# Store the secret
aws secretsmanager create-secret \
--name grafana/admin-password \
--secret-string "YourStrongPassword123!"

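Remove GF_SECURITY_ADMIN_PASSWORD from the environment list and reference the secret in the container definition instead (the -AbCdEf ending is the random six-character suffix Secrets Manager appends to secret ARNs; copy the full ARN returned by create-secret):

"secrets": [
  {
    "name": "GF_SECURITY_ADMIN_PASSWORD",
    "valueFrom": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/admin-password-AbCdEf"
  }
]

Grafana applies this value only when it first creates the admin user, so with persistent storage, changing the secret later does not change the existing password.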

The task execution role needs additional permissions:

{
"Effect": "Allow",
"Action": ["secretsmanager:GetSecretValue"],
"Resource": "arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:grafana/*"
}
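The statement above can be attached to the execution role as an inline policy with put-role-policy. A sketch assuming bash; grant_secret_read and the policy name grafana-secrets-read are hypothetical, and the default region mirrors the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach the Secrets Manager read permission to
# ecsTaskExecutionRole as an inline policy named grafana-secrets-read.
grant_secret_read() {
  local account_id="$1" region="${2:-ap-northeast-1}" doc status
  doc="$(mktemp)"
  cat > "$doc" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["secretsmanager:GetSecretValue"],
    "Resource": "arn:aws:secretsmanager:${region}:${account_id}:secret:grafana/*"
  }]
}
EOF
  aws iam put-role-policy \
    --role-name ecsTaskExecutionRole \
    --policy-name grafana-secrets-read \
    --policy-document "file://${doc}"
  status=$?
  rm -f "$doc"
  return "$status"
}
```

For example: grant_secret_read 123456789012. Re-running it overwrites the inline policy of the same name.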

4. Persistent Storage with EFS

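By default, Grafana stores its data (dashboards, data sources, users) in a SQLite database under /var/lib/grafana inside the container, so it is lost whenever the task is replaced. Mount an EFS volume there:

"volumes": [{
  "name": "grafana-data",
  "efsVolumeConfiguration": {
    "fileSystemId": "fs-0abc123",
    "rootDirectory": "/grafana"
  }
}]

And in the container definition:

"mountPoints": [{
  "sourceVolume": "grafana-data",
  "containerPath": "/var/lib/grafana"
}]

The EFS mount targets' security group must also allow inbound NFS (TCP 2049) from grafana-sg, otherwise the task fails to mount the volume.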
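The mounted directory must exist on the file system and be writable by the user the Grafana process runs as. One common way to handle both is an EFS access point, which creates its root directory on first use with the ownership you specify. A sketch assuming bash; the helper name is hypothetical, and UID 472 / GID 0 assume the official Grafana image's defaults (check the image you actually run).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an EFS access point whose root directory
# (/grafana) is created on first use and owned by UID 472 / GID 0,
# then print the access point ID.
create_grafana_access_point() {
  local fs_id="$1"
  aws efs create-access-point \
    --file-system-id "$fs_id" \
    --posix-user Uid=472,Gid=0 \
    --root-directory 'Path=/grafana,CreationInfo={OwnerUid=472,OwnerGid=0,Permissions=755}' \
    --query 'AccessPointId' --output text
}
```

When an access point is used, the efsVolumeConfiguration should omit rootDirectory (or set it to /), set "transitEncryption": "ENABLED", and reference the access point as "authorizationConfig": {"accessPointId": "<ACCESS_POINT_ID>"}.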

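5. Auto Scaling

Running more than one Grafana task requires a shared MySQL or PostgreSQL database configured in Grafana; several tasks sharing one SQLite file on EFS is not a safe substitute, and without shared state each task would have its own users and dashboards.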


# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/grafana-cluster/grafana-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 1 \
--max-capacity 3

# Scale based on CPU
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/grafana-cluster/grafana-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
}
}'


🔍 Troubleshooting

| Problem | Likely Cause | Fix |
| --- | --- | --- |
| Task stuck in PROVISIONING or PENDING | Subnet can't reach the internet | Verify the public subnet has an IGW route and the public IP is enabled |
| CannotPullContainerError | No internet access OR wrong image name | Check the subnet route table; verify the image name is exactly right (case-sensitive) |
| CannotPullContainerError for ECR image | Task execution role missing ECR permissions | Attach AmazonECSTaskExecutionRolePolicy to ecsTaskExecutionRole |
| Task starts then immediately stops | Container crashes | Check CloudWatch logs at /ecs/grafana |
| Can't access Grafana in browser | Security group doesn't allow port 3000 | Add an inbound TCP rule for port 3000 |
| ResourceInitializationError | Execution role missing or lacking a permission the task needs (e.g. logs:CreateLogGroup with awslogs-create-group, or secretsmanager:GetSecretValue for secrets) | Verify executionRoleArn is set in the task definition and the role has those permissions |
| toomanyrequests from Docker Hub | Docker Hub rate limit hit | Pull from ECR Public instead, e.g. public.ecr.aws/grafana/grafana:latest (confirm the exact path in the ECR Public Gallery) |
| Task running but no public IP | Public IP not assigned | Set assignPublicIp: ENABLED in the network configuration |
| ECS service keeps restarting tasks | Health check failing | Check container health; ensure the app starts within the health check grace period |

Checking Container Logs


# Find the task ID
aws ecs list-tasks --cluster grafana-cluster --service-name grafana-service

# Get task details (status, stop reason, ENI ID; the public IP itself is on the ENI)
aws ecs describe-tasks --cluster grafana-cluster --tasks <TASK_ID>

# View logs in CloudWatch (aws logs tail requires AWS CLI v2)
aws logs tail /ecs/grafana --follow
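Many of the problems in the table above surface as a stop reason on the task itself. ECS keeps stopped tasks listed for at least an hour, so a recently stopped task's reason can be read back with the CLI. A sketch assuming bash; last_stop_reason is a hypothetical helper, and list-tasks does not guarantee ordering, so it shows a recently stopped task rather than necessarily the latest one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print why a recently stopped task of a service stopped:
# the task's stoppedReason plus the first container's exit code and reason.
last_stop_reason() {
  local cluster="$1" service="$2" task_arn
  task_arn="$(aws ecs list-tasks --cluster "$cluster" --service-name "$service" \
    --desired-status STOPPED --query 'taskArns[0]' --output text)"
  if [[ -z "$task_arn" || "$task_arn" == "None" ]]; then
    echo "no stopped tasks found" >&2
    return 1
  fi
  aws ecs describe-tasks --cluster "$cluster" --tasks "$task_arn" \
    --query 'tasks[0].[stoppedReason, containers[0].exitCode, containers[0].reason]' \
    --output text
}
```

For example: last_stop_reason grafana-cluster grafana-service.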


📊 Docker Hub vs ECR — When to Use Which

| Scenario | Use Docker Hub Directly | Use ECR (Private) |
| --- | --- | --- |
| Public/official images (Grafana, Nginx, Redis) | ✅ Simplest: just use the image name | ❌ Unnecessary overhead |
| Custom application images | ❌ Requires Docker Hub account + push | ✅ Integrated with AWS IAM, no rate limits |
| Production workloads | ⚠️ Subject to Docker Hub rate limits | ✅ No pull rate limits, faster pulls within AWS |
| Air-gapped / private environments | ❌ Requires internet access | ✅ Works with VPC endpoints (no internet needed) |
| CI/CD pipelines | ⚠️ Needs Docker Hub credentials | aws ecr get-login-password works natively |

Pro tip: For public images in production, consider pulling from ECR Public instead of Docker Hub to avoid Docker Hub rate limits, e.g. public.ecr.aws/grafana/grafana:latest (confirm the exact repository path in the ECR Public Gallery first).


🏗️ Terraform Example (Complete)


# ─── Provider & Variables ───
provider "aws" {
  region = var.region
}

variable "region" {
  type    = string
  default = "ap-northeast-1"
}

variable "vpc_id" {
  type = string
}

variable "public_subnet_ids" {
  type = list(string)
}

# ─── Cluster ───
resource "aws_ecs_cluster" "grafana" {
  name = "grafana-cluster"
}

# ─── CloudWatch Log Group ───
resource "aws_cloudwatch_log_group" "grafana" {
  name              = "/ecs/grafana"
  retention_in_days = 7
}

# ─── Task Definition ───
resource "aws_ecs_task_definition" "grafana" {
  family                   = "grafana-task"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  container_definitions = jsonencode([{
    name      = "grafana"
    image     = "grafana/grafana:latest"
    essential = true

    portMappings = [{
      containerPort = 3000
      protocol      = "tcp"
    }]

    environment = [
      { name = "GF_SECURITY_ADMIN_USER", value = "admin" }
    ]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.grafana.name
        "awslogs-region"        = var.region
        "awslogs-stream-prefix" = "grafana"
      }
    }
  }])
}

# ─── Security Group ───
resource "aws_security_group" "grafana" {
  name        = "grafana-sg"
  description = "Allow Grafana UI access"
  vpc_id      = var.vpc_id

  ingress {
    from_port   = 3000
    to_port     = 3000
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ─── ECS Service ───
resource "aws_ecs_service" "grafana" {
  name            = "grafana-service"
  cluster         = aws_ecs_cluster.grafana.id
  task_definition = aws_ecs_task_definition.grafana.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.public_subnet_ids
    security_groups  = [aws_security_group.grafana.id]
    assign_public_ip = true
  }

  # Make sure the execution role can pull images before the first task starts
  depends_on = [aws_iam_role_policy_attachment.ecs_task_execution]
}

# ─── IAM: Task Execution Role ───
# If ecsTaskExecutionRole already exists in your account (see Step 4), import it first:
#   terraform import aws_iam_role.ecs_task_execution ecsTaskExecutionRole
resource "aws_iam_role" "ecs_task_execution" {
  name = "ecsTaskExecutionRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}


Last updated: March 2026
