At work, I've moved from pure programming to machine learning. Here's a bit about how I learned the necessary background.
Overall, the process took about a year of on-and-off self-education. I never took any online courses; I just don't find them useful.
The book that kicked me off on the right path was:
This is not strictly a machine learning book. It's essentially an entire book (written as an IPython notebook, but also available in paperback) about Bayesian inference using Markov chain Monte Carlo (MCMC), a powerful family of sampling methods, with the PyMC library.
Its value is that it's introductory and very hands-on. It also got me back into the habit of thinking statistically after years away from the subject.
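To give a flavor of what MCMC does, here is a minimal sketch of random-walk Metropolis, the simplest MCMC variant. It is not code from the book, which uses PyMC. It uses only the Python standard library. The coin-flip data (7 heads in 10 flips), the uniform prior, the function names, and the step size are all illustrative assumptions.

```python
import math
import random


def log_posterior(p, heads, flips):
    """Unnormalized log posterior of a coin's bias p under a uniform prior (assumed)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # the prior puts zero density outside (0, 1)
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)


def metropolis(heads, flips, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns a list of draws of p."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, heads, flips)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the acceptance ratio is just the posterior ratio.
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, heads, flips)
        # Accept with probability min(1, ratio), compared in log space.
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # on rejection, the chain repeats its current value
    return samples


draws = metropolis(heads=7, flips=10)
burned = draws[2000:]  # discard early draws as burn-in
posterior_mean = sum(burned) / len(burned)
print(posterior_mean)  # should land near the exact Beta(8, 4) posterior mean, 8/12
```

The chain only needs the posterior up to a normalizing constant, which is why MCMC is so useful for Bayesian models where that constant is intractable. Libraries like PyMC automate this and use more efficient samplers.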
I jumped into this book next:
I didn't read it cover to cover. In fact, there's still a lot of material in it I haven't read. It's a pretty intense book for someone new to statistics: even with a rock-solid background in mathematics, the statistical terminology may throw you for a loop here and there. Nevertheless, it made excellent background reading while I absorbed information from other sources.
By this point, the Deep Learning book had come out:
This is an excellent book, and despite the title, it covers more than deep learning. That's definitely the focus, but the introductory chapters on statistics and on machine learning in general are fantastic. I recommend reading it cover to cover; it's a gold mine. Afterward, I went back to The Elements of Statistical Learning and could follow it much more easily.
After reading it, I was surprised to find that I could read state-of-the-art deep learning research papers and follow most, if not all, of the details. To get started, I recommend:
This is a new online machine learning journal that focuses on exposition and clear explanations rather than original research. Articles are infrequent but uniformly excellent. My own work focuses heavily on sequence processing rather than images, so here are some of the research papers I read:
Throughout this process, I played around with toy problems in Python and Spark.