+1 for that. I don't learn well from reading a book; I learn best by looking through working examples. Even if you only release some of the roles and a generic starter, the community would be forever grateful.
I finally got some time together to look further into Ansible and how I can leverage it at work. My first project was to handle JSS upgrades. It's not as elaborate as what this session covered (setting everything up from scratch), but it has been a good place for me to start:
Hope it helps someone else out!
Nice work! Just curious: in a clustered environment, how are you handling the master upgrading the JSS while the non-master nodes are down during the process? Are you leveraging the new feature where a non-master node displays a maintenance message if it detects that the JSS is upgrading?
Also, do you have a test environment where this stuff is validated, then once validation is done do you migrate this process to production? Sorry for all the questions, but I have been wanting to spend some serious time with Ansible to create exactly what you just did and automate upgrades. I just have not had the dedicated time to do so.
First I kill services at the load balancer just to make my life easier.
Then, I've set the playbook up to do the following:
I run this against a small VM cluster (only 2 nodes) then promote to production. Snapshots are my friend.
I think there is definitely room for improvement, which should speed things up as well.
If you've got any ideas on how I can check for the JSS upgrade to be complete and at a login page, I would love to hear those.
Awesome stuff man. I am really digging it. One thing I was thinking about doing to ensure that the web app is up and done upgrading is just a GET request against an API resource with no authentication, which should always come back with a 401 error (not authorized). However, I have not actually tested this method against the JSS during an upgrade, and my test environments are pretty small and sometimes upgrade in a matter of seconds.
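For anyone who wants to try the unauthenticated GET idea outside of Ansible first, here is a minimal Python sketch. The function name probe_status and the example URL in the usage note are made up for illustration, and like the idea itself this has not been validated against a JSS mid-upgrade. It only shows how to read the status code without urllib treating the 401 as a failure.

```python
import urllib.error
import urllib.request


def probe_status(url, timeout=10.0):
    """Return the HTTP status of an unauthenticated GET, or None if nothing answers.

    Hypothetical helper for illustration; not part of any JSS or Ansible tooling.
    """
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code it carries is the
        # value we are after (401 is the "API is alive" signal here).
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or TLS verification
        # failure: treat all of these as "web app not answering yet".
        return None
```

Calling something like probe_status("https://jss.example.com:8443/JSSResource/activationcode") (a placeholder URL) would then return 401 once the API is answering, if the assumption in the post holds. Note that a self-signed certificate will fail verification with the default settings and show up as None.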
That would at least be more intelligent than a service status check, or a ping to see if the JSS is up. You could also try to curl/GET a cookie-based session. I would imagine that while the JSS is working through its upgrade process you probably can't authenticate to it. Then in your scripting logic, treat a non-200 status as the signal to halt/sleep the process (on the non-master nodes) until you get a 200, then bring the cluster back online.
I have not extensively tested any of these methods but these are the ideas I had when I was drafting up the Ansible playbook process in my head. Hopefully I will have some time in the future to dedicate a good solid amount of hours to build the environment and test some of this stuff. I am booked pretty solid though for the foreseeable future unfortunately.
@tlarkin Good call on the API call!
First, I tried doing a cURL on the /robots.txt (which is what I use on my load balancers to verify Tomcat has actually loaded) and validate the return string but I think the API method may be simpler to check with Ansible.
I can set the Ansible uri module to look for that return code of 401. While the JSS is starting up, I get a return code of 200 until the login window appears. At that point, I get the expected 401 from the API. I will play around with it and see if I can replicate a long upgrade (I may still have an older db backup that would require a table conversion; I will have to look) and see how things go.
All - I have updated the playbook to perform a GET on the API for the activationcode. It appears, at least in my testing, that the API is not active while the startup screen is present on the JSS. I still need to test this during a longer upgrade, but that will probably have to wait until next week. It will query the API for a 401 return code every 15 seconds, for up to an hour, before failing. I also updated the logic to use variables based on the ssh_user you configure in your Ansible host file rather than being statically set to "administrator."
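The retry logic described here (check every 15 seconds, give up after an hour) is done in the playbook with Ansible's uri module, which is not reproduced in this thread. As a rough stand-in, the same loop can be sketched in plain Python. The names wait_for_status and get_status are hypothetical, and the status check is passed in as a function so the loop does not depend on any particular HTTP library.

```python
import time


def wait_for_status(get_status, expected=401, interval=15.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `expected`.

    Returns True as soon as the expected code is seen, or False once another
    wait of `interval` seconds would run past `timeout`. Illustrative sketch
    only; this is not the playbook's actual implementation.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == expected:
            return True
        if clock() + interval > deadline:
            # No time left for another attempt: report failure to the caller.
            return False
        sleep(interval)
```

With the defaults this makes a check at time zero and then one every 15 seconds until the hour is up. The sleep and clock parameters exist only so the loop can be exercised without actually waiting.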
One caveat: you will need to install the httplib2 Python library on all of your remote hosts. Otherwise, you will always get an error on the API query task.
Edit: Cluster upgrade in 2 minutes and 2 seconds? Yes, please.
real 2m12.790s
user 0m1.618s
sys 0m1.259s