Craig Marvelley

Software and such

Waiting for SSH

Update 2014/09/06: Michael DeHaan mentions on Twitter that he adds in a few seconds of sleep with the pause module to ensure the SSH port is open.
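A rough sketch of that idea, placed straight after the wait_for task. The ten-second value is an assumption standing in for "a few seconds":

```yaml
# Sketch only: give sshd a moment to settle after port 22 starts
# accepting connections. 10 seconds is an assumed value, not DeHaan's.
- name: Give SSH a few more seconds
  pause:
    seconds: 10
```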

I’ve spent the day provisioning a whole lot of EC2 instances with Ansible from a control machine in the cloud. This happens in two stages: first an instance is launched, then, once Ansible’s wait_for module reports that SSH is available, the second stage of more detailed provisioning begins.
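A minimal sketch of that two-stage layout. It assumes the `ec2` module Ansible shipped at the time and a registered variable named `launched_instances`, which matches the name the wait_for task in this post uses. The key name, AMI ID, instance type, region, count and the `launched` group name are all placeholders:

```yaml
# Stage 1: launch instances from the control machine and remember them.
- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Launch EC2 instances
      ec2:
        key_name: my-key            # placeholder
        image: ami-xxxxxxxx         # placeholder AMI ID
        instance_type: t2.micro     # placeholder
        region: us-east-1           # placeholder
        count: 2                    # placeholder
        wait: yes                   # block until the instances are running
      register: launched_instances

    # ... the "Wait until SSH is available" task goes here ...

    - name: Add the new instances to an in-memory group
      add_host:
        name: "{{ item.public_ip }}"
        groups: launched            # hypothetical group name
      with_items: "{{ launched_instances.instances }}"

# Stage 2: the more detailed provisioning, run against the new hosts.
- hosts: launched
  tasks:
    - name: Placeholder for the real provisioning tasks
      ping:
```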

An issue I’d run into a few times before, but had never been able to pin down, was that the wait_for module often failed to detect that SSH was ready. My Ansible task looked like this:

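- name: Wait until SSH is available
  local_action:
    module: wait_for
    host: "{{ item.public_dns_name }}"   # public DNS name, resolved on the control machine
    port: 22
    delay: 60      # seconds to wait before the first check
    timeout: 320   # seconds to keep trying before failing
    state: started
  with_items: "{{ launched_instances.instances }}"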

That task would often time out, yet if I then immediately tried to SSH in from the terminal, the connection succeeded, which was odd indeed.

Today the failure was consistent, and I eventually realised that the task used the instance’s public DNS name, whereas from the terminal I was connecting to its public IP address. Sure enough, changing the task to use the IP address seems to have made the whole thing a lot more reliable:

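- name: Wait until SSH is available
  local_action:
    module: wait_for
    host: "{{ item.public_ip }}"   # public IP, so no DNS lookup is involved
    port: 22
    delay: 60      # seconds to wait before the first check
    timeout: 320   # seconds to keep trying before failing
    state: started
  with_items: "{{ launched_instances.instances }}"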

My guess is that on my Mac (where this was rarely a problem) the DNS cache updates more quickly than it does on the control machine in EC2 (where it happened more often). Using the IP address directly sidesteps the DNS lookup altogether, which renders the issue moot.
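A related option, not one tried here: wait_for’s `search_regex` parameter makes it wait until the connection returns matching text. Matching on the SSH server’s banner checks that sshd is actually answering, not merely that port 22 accepts connections. This sketch assumes the instances run OpenSSH, whose banner contains the string `OpenSSH`:

```yaml
- name: Wait until SSH is answering
  local_action:
    module: wait_for
    host: "{{ item.public_ip }}"
    port: 22
    search_regex: OpenSSH   # assumes an OpenSSH server; its banner includes "OpenSSH"
    delay: 60
    timeout: 320
    state: started
  with_items: "{{ launched_instances.instances }}"
```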