Infrastructure Automation with Ansible: From Basics to Production

Introduction

Manual server configuration doesn’t scale. After managing infrastructure manually for years, I’ve learned that automation isn’t optional—it’s survival. Ansible has become my tool of choice for infrastructure automation: agentless, readable, and powerful.

This guide covers building production-ready Ansible automation, from basic playbooks to complex roles with CI/CD integration.

Why Ansible?

Feature          Ansible            Puppet             Chef               Terraform
Agent Required   No                 Yes                Yes                No
Language         YAML               DSL                Ruby               HCL
Learning Curve   Low                High               High               Medium
Best For         Config Management  Config Management  Config Management  Infrastructure Provisioning

Ansible’s agentless architecture and YAML syntax make it accessible while remaining powerful enough for complex automation.

Project Structure

A well-organized Ansible project:

ansible/
├── ansible.cfg
├── inventory/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── webservers.yml
│   │       └── databases.yml
│   └── staging/
│       ├── hosts.yml
│       └── group_vars/
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── postgresql/
│   └── monitoring/
└── requirements.yml

Configuration

# ansible.cfg
[defaults]
inventory = inventory/production
roles_path = roles
remote_user = ansible
private_key_file = ~/.ssh/ansible_key
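# Skips SSH host key verification; convenient for short-lived hosts, but consider enabling it for long-lived production servers
host_key_checking = False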
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400

[privilege_escalation]
become = True
become_method = sudo
become_user = root

[ssh_connection]
pipelining = True
control_path = /tmp/ansible-%%h-%%p-%%r

Inventory Management

Static Inventory

A static inventory file is the simplest starting point, but it doesn’t scale to fleets that change often. A YAML inventory with a host range looks like this:

# inventory/production/hosts.yml
all:
  children:
    webservers:
      hosts:
        web-[01:05].prod.yourorg.com:
    databases:
      hosts:
        db-01.prod.yourorg.com:
        db-02.prod.yourorg.com:
      vars:
        postgresql_version: 15
    monitoring:
      hosts:
        mon-01.prod.yourorg.com:

Cloud Dynamic Inventory

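#!/usr/bin/env python3
# inventory/azure_inventory.py

import json
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient


def get_private_ip(vm, network_client):
    """Return the private IP of the VM's first network interface.

    One possible implementation; assumes the first IP configuration on
    the first NIC is the address Ansible should connect to.
    """
    # NIC IDs have the form
    # /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/networkInterfaces/<name>
    parts = vm.network_profile.network_interfaces[0].id.split("/")
    resource_group, nic_name = parts[4], parts[-1]
    nic = network_client.network_interfaces.get(resource_group, nic_name)
    return nic.ip_configurations[0].private_ip_address


def get_inventory():
    credential = DefaultAzureCredential()
    # Assumption: the subscription ID is supplied via the environment;
    # the placeholder is used only when the variable is unset
    subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", "your-subscription-id")
    compute_client = ComputeManagementClient(credential, subscription_id)
    network_client = NetworkManagementClient(credential, subscription_id)

    inventory = {
        "_meta": {"hostvars": {}},
        "all": {"children": ["webservers", "databases"]},
        "webservers": {"hosts": []},
        "databases": {"hosts": []}
    }

    for vm in compute_client.virtual_machines.list_all():
        tags = vm.tags or {}
        hostname = vm.name

        # Categorize by the "role" tag, creating the group on first use
        role = tags.get("role", "other")
        if role not in inventory:
            inventory[role] = {"hosts": []}
        inventory[role]["hosts"].append(hostname)

        # Add host variables
        inventory["_meta"]["hostvars"][hostname] = {
            "ansible_host": get_private_ip(vm, network_client),
            "vm_size": vm.hardware_profile.vm_size,
            "environment": tags.get("environment", "unknown")
        }

    return inventory


if __name__ == "__main__":
    print(json.dumps(get_inventory(), indent=2))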
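
Ansible runs an executable inventory script with `--list` and expects JSON on stdout; when that JSON contains `_meta.hostvars`, Ansible does not need a separate `--host <name>` call per host. Below is a minimal sketch of that command-line contract, using hard-coded sample records in place of the Azure API so it runs anywhere. `SAMPLE_VMS`, `build_inventory`, and `respond` are illustrative names, not part of the script above.

```python
import argparse
import json

# Stand-in for the records a cloud API would return (hypothetical data)
SAMPLE_VMS = [
    {"name": "web-01", "ip": "10.0.1.4", "tags": {"role": "webservers"}},
    {"name": "db-01", "ip": "10.0.2.4", "tags": {"role": "databases"}},
    {"name": "misc-01", "ip": "10.0.3.4", "tags": {}},
]


def build_inventory(vms):
    """Group hosts by their 'role' tag and collect per-host variables."""
    inventory = {"_meta": {"hostvars": {}}}
    for vm in vms:
        role = vm["tags"].get("role", "other")
        inventory.setdefault(role, {"hosts": []})["hosts"].append(vm["name"])
        inventory["_meta"]["hostvars"][vm["name"]] = {"ansible_host": vm["ip"]}
    return inventory


def respond(argv, vms=SAMPLE_VMS):
    """Return the JSON text an inventory script would print for argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args = parser.parse_args(argv)

    inventory = build_inventory(vms)
    if args.host:
        # Per-host variables; an empty object for unknown hosts
        return json.dumps(inventory["_meta"]["hostvars"].get(args.host, {}))
    return json.dumps(inventory, indent=2)


print(respond(["--list"]))
```

In a real script, `respond(sys.argv[1:])` would be printed from the `__main__` guard, and the file needs the executable bit set so Ansible treats it as a script rather than a static inventory.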

Group Variables

# inventory/production/group_vars/all.yml
---
# Common settings for all hosts
ansible_python_interpreter: /usr/bin/python3
timezone: UTC
ntp_servers:
  - 0.pool.ntp.org
  - 1.pool.ntp.org

# Security baseline
ssh_permit_root_login: "no"
ssh_password_authentication: "no"
fail2ban_enabled: true

# Monitoring
zabbix_server: mon-01.prod.yourorg.com

# inventory/production/group_vars/webservers.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 4096
ssl_protocols: "TLSv1.2 TLSv1.3"
ssl_ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"

Role Development

Role Structure

roles/nginx/
├── defaults/
│   └── main.yml
├── files/
│   └── nginx.conf
├── handlers/
│   └── main.yml
├── meta/
│   └── main.yml
├── tasks/
│   ├── main.yml
│   ├── install.yml
│   ├── configure.yml
│   └── vhosts.yml
├── templates/
│   ├── nginx.conf.j2
│   └── vhost.conf.j2
└── vars/
    └── main.yml

Complete Role Example

# roles/nginx/defaults/main.yml
---
nginx_user: www-data
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_server_tokens: "off"
nginx_ssl_session_timeout: 1d
nginx_ssl_session_cache: shared:SSL:50m

# roles/nginx/tasks/main.yml
---
- name: Include OS-specific variables
  include_vars: "{{ item }}"
  with_first_found:
    - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
    - "{{ ansible_distribution | lower }}.yml"
    - "{{ ansible_os_family | lower }}.yml"

- name: Install Nginx
  include_tasks: install.yml

- name: Configure Nginx
  include_tasks: configure.yml

- name: Configure virtual hosts
  include_tasks: vhosts.yml
  when: nginx_vhosts is defined

# roles/nginx/tasks/install.yml
---
- name: Install Nginx (Debian/Ubuntu)
  apt:
    name: nginx
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"

- name: Install Nginx (RHEL/CentOS)
  yum:
    name: nginx
    state: present
  when: ansible_os_family == "RedHat"

- name: Ensure Nginx is enabled
  service:
    name: nginx
    enabled: yes
    state: started

# roles/nginx/tasks/configure.yml
---
- name: Configure main nginx.conf
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    owner: root
    group: root
    mode: '0644'
    validate: nginx -t -c %s
  notify: Reload Nginx

- name: Create SSL directory
  file:
    path: /etc/nginx/ssl
    state: directory
    owner: root
    group: root
    mode: '0700'

- name: Configure DH parameters
  command: openssl dhparam -out /etc/nginx/ssl/dhparam.pem 2048
  args:
    creates: /etc/nginx/ssl/dhparam.pem

{# roles/nginx/templates/nginx.conf.j2 #}
user {{ nginx_user }};
worker_processes {{ nginx_worker_processes }};
pid /run/nginx.pid;

events {
    worker_connections {{ nginx_worker_connections }};
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout {{ nginx_keepalive_timeout }};
    types_hash_max_size 2048;
    server_tokens {{ nginx_server_tokens }};

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    # SSL Settings
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_session_timeout {{ nginx_ssl_session_timeout }};
    ssl_session_cache {{ nginx_ssl_session_cache }};
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;

    # Logging
    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    # Gzip
    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types text/plain text/css text/xml application/json application/javascript;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

# roles/nginx/handlers/main.yml
---
- name: Reload Nginx
  service:
    name: nginx
    state: reloaded

- name: Restart Nginx
  service:
    name: nginx
    state: restarted
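
The role default `nginx_worker_connections: 1024` and the `webservers` group value `4096` shown earlier do not conflict: role defaults have the lowest variable precedence, so inventory group variables override them. The sketch below models just three of Ansible's precedence layers with `collections.ChainMap`. The dictionaries copy values from the files above, and `resolve` is a hypothetical helper for illustration, not how Ansible implements variable resolution.

```python
from collections import ChainMap

# Values copied from the files shown in this guide
role_defaults = {"nginx_worker_connections": 1024, "nginx_keepalive_timeout": 65}
group_vars_all = {"timezone": "UTC"}
group_vars_webservers = {"nginx_worker_connections": 4096}


def resolve(*layers):
    """Merge variable layers given from lowest to highest precedence."""
    # ChainMap searches its first mapping first, so reverse the order
    return dict(ChainMap(*reversed(layers)))


resolved = resolve(role_defaults, group_vars_all, group_vars_webservers)
print(resolved["nginx_worker_connections"])  # the group value wins
print(resolved["nginx_keepalive_timeout"])   # only the role default sets it
```

This is why anything a role user should be able to tune belongs in `defaults/main.yml` rather than `vars/main.yml`, which sits much higher in the precedence order.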

Advanced Patterns

Handlers with Listen

# Group handlers under one topic so a single notify triggers all of them
handlers:
  - name: Restart all services
    listen: "restart services"
    debug:
      msg: "Restarting services..."

  - name: Restart nginx
    listen: "restart services"
    service:
      name: nginx
      state: restarted

  - name: Restart php-fpm
    listen: "restart services"
    service:
      name: php-fpm
      state: restarted

Conditional Execution

- name: Deploy application
  block:
    - name: Pull latest code
      git:
        repo: "{{ app_repo }}"
        dest: "{{ app_path }}"
        version: "{{ app_version }}"
      register: git_result

    - name: Install dependencies
      command: composer install --no-dev
      args:
        chdir: "{{ app_path }}"
      when: git_result.changed

    - name: Run migrations
      command: php artisan migrate --force
      args:
        chdir: "{{ app_path }}"
      when: git_result.changed

  rescue:
    - name: Rollback on failure
      command: git checkout {{ previous_version }}
      args:
        chdir: "{{ app_path }}"
        
    - name: Notify team
      slack:
        token: "{{ slack_token }}"
        msg: "Deployment failed on {{ inventory_hostname }}"

Delegation and Serial Execution

# Rolling updates with health checks
- name: Deploy to webservers
  hosts: webservers
  serial: 2  # Deploy to 2 hosts at a time
  max_fail_percentage: 25
  
  pre_tasks:
    - name: Remove from load balancer
      delegate_to: localhost
      uri:
        url: "https://lb.yourorg.com/api/servers/{{ inventory_hostname }}"
        method: DELETE
        headers:
          Authorization: "Bearer {{ lb_token }}"
    
    - name: Wait for connections to drain
      wait_for:
        timeout: 30

  roles:
    - nginx
    - application

  post_tasks:
    - name: Wait for application to be ready
      uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: health
      until: health.status == 200
      retries: 10
      delay: 5

    - name: Add back to load balancer
      delegate_to: localhost
      uri:
        url: "https://lb.yourorg.com/api/servers"
        method: POST
        headers:
          Authorization: "Bearer {{ lb_token }}"
        body_format: json
        body:
          hostname: "{{ inventory_hostname }}"
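
With `serial: 2`, Ansible runs the play on two hosts at a time, and `max_fail_percentage` is evaluated per batch: the play aborts only when the failed share of the batch exceeds the threshold. Below is a small sketch of that arithmetic. `make_batches` and `batch_aborts` are hypothetical helpers for illustration, and the five host names assume the `web-[01:05]` range from the inventory.

```python
from typing import List


def make_batches(hosts: List[str], serial: int) -> List[List[str]]:
    """Split hosts into consecutive batches of at most `serial` hosts."""
    return [hosts[i:i + serial] for i in range(0, len(hosts), serial)]


def batch_aborts(failed: int, batch_size: int, max_fail_percentage: float) -> bool:
    """True when the failed share of a batch exceeds the threshold.

    The threshold must be exceeded, not merely reached.
    """
    return failed / batch_size * 100 > max_fail_percentage


hosts = [f"web-{n:02d}" for n in range(1, 6)]
for batch in make_batches(hosts, serial=2):
    print(batch)

# One failure in a batch of two is 50%, which exceeds 25%
print(batch_aborts(failed=1, batch_size=2, max_fail_percentage=25))
```

With the settings above, a single failed host in any batch stops the rollout before the next batch starts, so at most two web servers are ever out of the load balancer at once.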

Vault for Secrets

# Create encrypted file
ansible-vault create group_vars/all/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/all/vault.yml

# Encrypt existing file
ansible-vault encrypt secrets.yml

# Use in playbook
ansible-playbook site.yml --ask-vault-pass

# group_vars/all/vault.yml (encrypted)
vault_db_password: "super_secret_password"
vault_api_key: "api_key_here"
vault_ssl_key: |
  -----BEGIN PRIVATE KEY-----
  ...
  -----END PRIVATE KEY-----

# Reference in variables
# group_vars/all/main.yml
db_password: "{{ vault_db_password }}"
api_key: "{{ vault_api_key }}"

CI/CD Integration

GitLab CI Pipeline

# .gitlab-ci.yml
stages:
  - lint
  - test
  - deploy

variables:
  ANSIBLE_HOST_KEY_CHECKING: "False"
  ANSIBLE_FORCE_COLOR: "True"

lint:
  stage: lint
  image: cytopia/ansible-lint
  script:
    - ansible-lint playbooks/*.yml roles/*/

test:
  stage: test
  image: ansible/ansible:latest
  script:
    - ansible-playbook playbooks/site.yml --syntax-check
    - ansible-inventory --list -i inventory/staging/ > /dev/null

deploy_staging:
  stage: deploy
  image: ansible/ansible:latest
  only:
    - develop
  script:
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
    - ansible-playbook -i inventory/staging playbooks/site.yml
  environment:
    name: staging

deploy_production:
  stage: deploy
  image: ansible/ansible:latest
  only:
    - main
  when: manual
  script:
    - eval $(ssh-agent -s)
    - echo "$SSH_PRIVATE_KEY" | ssh-add -
    - ansible-playbook -i inventory/production playbooks/site.yml
  environment:
    name: production

Testing with Molecule

# molecule/default/molecule.yml
---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-22
    image: ubuntu:22.04
    pre_build_image: true
  - name: debian-11
    image: debian:11
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

# molecule/default/converge.yml
---
- name: Converge
  hosts: all
  become: true
  roles:
    - role: nginx

# molecule/default/verify.yml
---
- name: Verify
  hosts: all
  tasks:
    - name: Check nginx is installed
      command: nginx -v
      register: nginx_version
      changed_when: false

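    - name: Gather service facts
      service_facts:

    - name: Assert nginx is running
      # On systemd hosts the key is usually 'nginx.service' rather than 'nginx'
      assert:
        that:
          - "'nginx' in ansible_facts.services"
          - "ansible_facts.services['nginx']['state'] == 'running'"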

Lessons Learned

  1. Idempotency is everything. Playbooks should be safe to run multiple times without side effects.
  2. Keep roles focused. One role = one responsibility. Compose complex configurations from simple roles.
  3. Version your roles. Use requirements.yml and pin versions for reproducibility.
  4. Test before production. Molecule testing catches issues before they hit real servers.
  5. Document your variables. Future you will thank present you.
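
Lesson 1 comes down to a check-then-change pattern: inspect the current state, act only if it differs from the desired state, and report whether anything changed. Below is a minimal sketch of that pattern in Python, assuming a hypothetical `ensure_line` helper that behaves roughly like a simplified `lineinfile` task. It is not Ansible code.

```python
import os
import tempfile


def ensure_line(path: str, line: str) -> bool:
    """Make sure `line` is present in the file; return True if it was added."""
    existing = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            existing = handle.read().splitlines()

    if line in existing:
        return False  # desired state already holds, so change nothing

    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    config = os.path.join(workdir, "sshd_config")
    first_run = ensure_line(config, "PermitRootLogin no")
    second_run = ensure_line(config, "PermitRootLogin no")
    with open(config, encoding="utf-8") as handle:
        final_lines = handle.read().splitlines()

print(first_run, second_run, final_lines)
```

The second call reports no change and leaves the file untouched, which is the behavior to aim for in every task; `command` and `shell` tasks need `creates`, `when`, or `changed_when` to get there.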

Conclusion

Ansible transforms infrastructure management from tribal knowledge into version-controlled code. Start with simple playbooks, graduate to roles as complexity grows, and integrate with CI/CD for confidence in changes.

The investment in automation pays dividends: faster deployments, fewer mistakes, and servers that configure themselves the same way every time.
