Introduction
Manual server configuration doesn’t scale. After managing infrastructure manually for years, I’ve learned that automation isn’t optional—it’s survival. Ansible has become my tool of choice for infrastructure automation: agentless, readable, and powerful.
This guide covers building production-ready Ansible automation, from basic playbooks to complex roles with CI/CD integration.
Why Ansible?
| Feature | Ansible | Puppet | Chef | Terraform |
|---|---|---|---|---|
| Agent Required | No | Yes | Yes | No |
| Language | YAML | DSL | Ruby | HCL |
| Learning Curve | Low | High | High | Medium |
| Best For | Config Management | Config Management | Config Management | Infrastructure Provisioning |
Ansible’s agentless architecture and YAML syntax make it accessible while remaining powerful enough for complex automation.
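To show what that accessibility looks like in practice, here is a minimal sketch of a playbook that patches a group of Debian hosts over plain SSH. The file name and the webservers group are illustrative and assume an inventory like the one built later in this guide:

# patch.yml (illustrative sketch)
---
- name: Keep web servers patched
  hosts: webservers
  become: true
  tasks:
    - name: Apply pending package upgrades
      apt:
        upgrade: dist
        update_cache: yes
      when: ansible_os_family == "Debian"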
Project Structure
A well-organized Ansible project:
ansible/
├── ansible.cfg
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── webservers.yml
│   │       └── databases.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── postgresql/
│   └── monitoring/
└── requirements.yml
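The playbooks in this layout stay deliberately thin: site.yml just imports the per-group playbooks, and each of those applies roles to a group. A minimal sketch of what they might contain, using the role and group names from the tree above:

# playbooks/site.yml
---
- import_playbook: webservers.yml
- import_playbook: databases.yml

# playbooks/webservers.yml
---
- name: Configure web servers
  hosts: webservers
  become: true
  roles:
    - common
    - nginx
    - monitoring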
Configuration
# ansible.cfg
[defaults]
inventory = inventory/production
roles_path = roles
remote_user = ansible
private_key_file = ~/.ssh/ansible_key
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
[privilege_escalation]
become = True
become_method = sudo
become_user = root
[ssh_connection]
pipelining = True
control_path = /tmp/ansible-%%h-%%p-%%r
Inventory Management
Dynamic Inventory
Hand-edited host lists stop scaling quickly. Host ranges keep a static inventory compact for a while; once servers come and go in the cloud, switch to dynamic inventory:
# inventory/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web-[01:05].prod.yourorg.com:
    databases:
      hosts:
        db-01.prod.yourorg.com:
        db-02.prod.yourorg.com:
      vars:
        postgresql_version: 15
    monitoring:
      hosts:
        mon-01.prod.yourorg.com:
Cloud Dynamic Inventory
#!/usr/bin/env python3
# inventory/azure_inventory.py
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient


def get_inventory():
    credential = DefaultAzureCredential()
    subscription_id = "your-subscription-id"
    compute_client = ComputeManagementClient(credential, subscription_id)

    inventory = {
        "_meta": {"hostvars": {}},
        "all": {"children": ["webservers", "databases"]},
        "webservers": {"hosts": []},
        "databases": {"hosts": []}
    }

    for vm in compute_client.virtual_machines.list_all():
        tags = vm.tags or {}
        hostname = vm.name

        # Categorize by tag
        role = tags.get("role", "other")
        if role not in inventory:
            inventory[role] = {"hosts": []}
        inventory[role]["hosts"].append(hostname)

        # Add host variables (get_private_ip is a small helper, not shown here,
        # that resolves the VM's primary NIC to its private address)
        inventory["_meta"]["hostvars"][hostname] = {
            "ansible_host": get_private_ip(vm),
            "vm_size": vm.hardware_profile.vm_size,
            "environment": tags.get("environment", "unknown")
        }

    return inventory


if __name__ == "__main__":
    print(json.dumps(get_inventory(), indent=2))
Group Variables
# inventory/production/group_vars/all.yml
---
# Common settings for all hosts
ansible_python_interpreter: /usr/bin/python3
timezone: UTC
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org
# Security baseline
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
fail2ban_enabled: true
# Monitoring
zabbix_server: mon-01.prod.yourorg.com
# inventory/production/group_vars/webservers.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 4096
ssl_protocols: "TLSv1.2 TLSv1.3"
ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
Role Development
Role Structure
roles/nginx/
├── defaults/
│   └── main.yml
├── files/
│   └── nginx.conf
├── handlers/
│   └── main.yml
├── meta/
│   └── main.yml
├── tasks/
│   ├── main.yml
│   ├── install.yml
│   └── configure.yml
├── templates/
│   ├── nginx.conf.j2
│   └── vhost.conf.j2
└── vars/
    └── main.yml
Complete Role Example
# roles/nginx/defaults/main.yml
---
nginx_user: www-data
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_server_tokens: "off"
nginx_ssl_session_timeout: 1d
nginx_ssl_session_cache: shared:SSL:50m
# roles/nginx/tasks/main.yml
---
- name: Include OS-specific variables
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
    - "{{ ansible_distribution | lower }}.yml"
    - "{{ ansible_os_family | lower }}.yml"

- name: Install Nginx
  include_tasks: install.yml

- name: Configure Nginx
  include_tasks: configure.yml

- name: Configure virtual hosts
  include_tasks: vhosts.yml
  when: nginx_vhosts is defined
# roles/nginx/tasks/install.yml
---
- name: Install Nginx (Debian/Ubuntu)
  apt:
    name: nginx
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"

- name: Install Nginx (RHEL/CentOS)
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

- name: Ensure Nginx is enabled
  service:
    name: nginx
    enabled: yes
    state: started
# roles/nginx/tasks/configure.yml
---
- name: Configure main nginx.conf
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: nginx -t -c %s
  notify: Reload Nginx

- name: Create SSL directory
  file:
    path: /etc/nginx/ssl
    state: directory
    owner: root
    group: root
    mode: '0700'

- name: Configure DH parameters
  command: openssl dhparam -out /etc/nginx/ssl/dhparam.pem 2048
  args:
    creates: /etc/nginx/ssl/dhparam.pem
{# roles/nginx/templates/nginx.conf.j2 #}
user {{ nginx_user }};
worker_processes {{ nginx_worker_processes }};
pid /run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout {{ nginx_keepalive_timeout }};
    types_hash_max_size 2048;
    server_tokens {{ nginx_server_tokens }};

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # SSL Settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_session_timeout {{ nginx_ssl_session_timeout }};
    ssl_session_cache {{ nginx_ssl_session_cache }};
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;

    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
# roles/nginx/handlers/main.yml
---
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded

- name: Restart Nginx
  service:
    name: nginx
    state: restarted
Advanced Patterns
Handlers with Listen
# Aggregate handlers to avoid multiple restarts
handlers:
  - name: Restart all services
    listen: "restart services"
    debug:
      msg: "Restarting services..."

  - name: Restart nginx
    listen: "restart services"
    service:
      name: nginx
      state: restarted

  - name: Restart php-fpm
    listen: "restart services"
    service:
      name: php-fpm
      state: restarted
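A listen topic only fires when something notifies it. A single notification from any task triggers every handler subscribed to that topic, so a config change like the sketch below (the template and pool path are placeholders) restarts nginx and php-fpm together:

tasks:
  - name: Update PHP-FPM pool configuration
    template:
      src: www.conf.j2
      dest: /etc/php/8.2/fpm/pool.d/www.conf
    notify: "restart services"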
Conditional Execution
- name: Deploy application
  block:
    - name: Pull latest code
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_path }}"
        version: "{{ app_version }}"
      register: git_result

    - name: Install dependencies
      command: composer install --no-dev
      args:
        chdir: "{{ app_path }}"
      when: git_result.changed

    - name: Run migrations
      command: php artisan migrate --force
      args:
        chdir: "{{ app_path }}"
      when: git_result.changed

  rescue:
    - name: Rollback on failure
      command: git checkout {{ previous_version }}
      args:
        chdir: "{{ app_path }}"

    - name: Notify team
      slack:
        token: "{{ slack_token }}"
        msg: "Deployment failed on {{ inventory_hostname }}"
Delegation and Serial Execution
# Rolling updates with health checks
- name: Deploy to webservers
  hosts: webservers
  serial: 2  # Deploy to 2 hosts at a time
  max_fail_percentage: 25

  pre_tasks:
    - name: Remove from load balancer
      delegate_to: localhost
      uri:
        url: "https://lb.yourorg.com/api/servers/{{ inventory_hostname }}"
        method: DELETE
        headers:
          Authorization: "Bearer {{ lb_token }}"

    - name: Wait for connections to drain
      wait_for:
        timeout: 30

  roles:
    - nginx
    - application

  post_tasks:
    - name: Wait for application to be ready
      uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5

    - name: Add back to load balancer
      delegate_to: localhost
      uri:
        url: "https://lb.yourorg.com/api/servers"
        method: POST
        body_format: json
        body:
          hostname: "{{ inventory_hostname }}"
Vault for Secrets
# Create encrypted file
ansible-vault create group_vars/all/vault.yml
# Edit encrypted file
ansible-vault edit group_vars/all/vault.yml
# Encrypt existing file
ansible-vault encrypt secrets.yml
# Use in playbook
ansible-playbook site.yml --ask-vault-pass
# group_vars/all/vault.yml (encrypted)
vault_db_password: "super_secret_password"
vault_api_key: "api_key_here"
vault_ssl_key: |
  -----BEGIN PRIVATE KEY-----
  ...
  -----END PRIVATE KEY-----
# Reference in variables
# group_vars/all/main.yml
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"
CI/CD Integration
GitLab CI Pipeline
# .gitlab-ci.yml
stages:
  - lint
  - test
  - deploy

variables:
  ANSIBLE_HOST_KEY_CHECKING: "False"
  ANSIBLE_FORCE_COLOR: "True"

lint:
  stage: lint
  image: cytopia/ansible-lint
  script:
    - ansible-lint playbooks/*.yml roles/*/

test:
  stage: test
  image: ansible/ansible:latest
  script:
    - ansible-playbook playbooks/site.yml --syntax-check
    - ansible-inventory --list -i inventory/staging/ > /dev/null

deploy_staging:
  stage: deploy
  image: ansible/ansible:latest
  only:
    - develop
  script:
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
    - ansible-playbook -i inventory/staging playbooks/site.yml
  environment:
    name: staging

deploy_production:
  stage: deploy
  image: ansible/ansible:latest
  only:
    - main
  when: manual
  script:
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
    - ansible-playbook -i inventory/production playbooks/site.yml
  environment:
    name: production
Testing with Molecule
# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: ubuntu:22.04
    pre_build_image: true
  - name: debian-11
    image: debian:11
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

# molecule/default/converge.yml
---
- name: Converge
  hosts: all
  become: true
  roles:
    - role: nginx

# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    - name: Check nginx is installed
      command: nginx -v
      register: nginx_version
      changed_when: false

    - name: Check nginx is running
      service_facts:

    - name: Assert nginx is running
      assert:
        that:
          - "'nginx' in services"
          - "services['nginx']['state'] == 'running'"
Lessons Learned
- Idempotency is everything. Playbooks should be safe to run multiple times without side effects.
- Keep roles focused. One role = one responsibility. Compose complex configurations from simple roles.
- Version your roles. Use requirements.yml and pin versions for reproducibility (see the sketch after this list).
- Test before production. Molecule testing catches issues before they hit real servers.
- Document your variables. Future you will thank present you.
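The pinning mentioned above is just a requirements.yml with explicit versions. A small sketch; the role, collection names, and version numbers are illustrative:

# requirements.yml
---
roles:
  - name: geerlingguy.postgresql
    version: "3.5.2"

collections:
  - name: community.general
    version: ">=8.0.0,<9.0.0"

Install everything with ansible-galaxy install -r requirements.yml (or the role and collection subcommands separately), and run the same install in CI so every environment resolves identical versions.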
Conclusion
Ansible transforms infrastructure management from tribal knowledge into version-controlled code. Start with simple playbooks, graduate to roles as complexity grows, and integrate with CI/CD for confidence in changes.
The investment in automation pays dividends: faster deployments, fewer mistakes, and servers that configure themselves the same way every time.