Is it true that most ML Engineer jobs are focused on building ML infrastructure and systems, rather than on models and algorithms?

A real large-scale ML service is essentially a global HPC grid. Running inference on state-of-the-art models is incredibly computationally expensive and has very tight latency constraints.
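To make the cost/latency tradeoff concrete, here's a toy batching sketch in Python (the batch size, wait budget, and per-call cost are made-up numbers, not anything from a real production stack): one model call amortizes its fixed overhead over many requests, while a deadline keeps the added queueing latency bounded.

```python
import time
from collections import deque

# Toy sketch, not any real serving stack: batch requests so one expensive
# "model call" amortizes its fixed cost, with a deadline to bound added latency.
MAX_BATCH = 32        # assumed batch size cap
MAX_WAIT_MS = 10      # assumed queueing budget

def run_model(batch):
    # stand-in for an expensive forward pass; cost is mostly per-call, not per-item
    time.sleep(0.005 + 0.0002 * len(batch))
    return ["translated: " + x for x in batch]

def serve(requests):
    queue, results = deque(requests), []
    while queue:
        batch, start = [], time.monotonic()
        # fill the batch until we hit the size cap or the wait deadline
        while queue and len(batch) < MAX_BATCH and (time.monotonic() - start) * 1000 < MAX_WAIT_MS:
            batch.append(queue.popleft())
        results.extend(run_model(batch))
    return results

print(serve(["sentence %d" % i for i in range(100)])[:3])
```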

An example that's very easy to illustrate is Facebook's machine translation service.

You have a limited amount of compute, an ever-changing set of models and code, and different countries using the service at different times of day. On top of that, the state of the art is always moving (remember, LSTMs became popular only a few years ago), so the architecture is constantly adapting to research.
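As a rough illustration of the capacity side (the regions, traffic shares, and replica count below are invented, not real figures), a lot of the work ends up being logic like this: take a fixed pool of serving replicas and shift them toward whichever regions are awake right now.

```python
# Toy capacity-allocation sketch; regions, traffic shares, and the replica
# count are made-up numbers, not real production figures.
TOTAL_REPLICAS = 100

# assumed share of traffic per region at some UTC hour (rough guesses)
traffic_share = {"us-east": 0.20, "eu-west": 0.45, "apac": 0.35}

def allocate(total, shares):
    # proportional split, with any rounding leftovers handed to the busiest region
    alloc = {region: int(total * share) for region, share in shares.items()}
    alloc[max(shares, key=shares.get)] += total - sum(alloc.values())
    return alloc

print(allocate(TOTAL_REPLICAS, traffic_share))
```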

Building and running such a system is essentially HPC: a huge amount of multithreading, custom load balancers and routing across datacenters, parallel algorithms, using hardware accelerators in creative ways (given their limited instruction sets), and so on.
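A custom load balancer at that layer is basically encoding a policy like the sketch below (the datacenter names, latencies, and load penalty are invented for illustration): send each request wherever latency plus load pressure is lowest.

```python
from dataclasses import dataclass

# Toy routing policy, not a real load balancer: the datacenter names,
# latencies, and penalty weight are invented for illustration.
@dataclass
class Datacenter:
    name: str
    rtt_ms: float       # network round-trip time from the client's region
    utilization: float  # 0.0 (idle) .. 1.0 (saturated)

def route(datacenters, load_penalty_ms=50.0):
    # score each DC by latency plus a penalty that grows as it fills up
    return min(datacenters, key=lambda dc: dc.rtt_ms + load_penalty_ms * dc.utilization)

dcs = [
    Datacenter("us-east", rtt_ms=20, utilization=0.9),
    Datacenter("eu-west", rtt_ms=35, utilization=0.4),
    Datacenter("apac",    rtt_ms=90, utilization=0.1),
]
print(route(dcs).name)  # picks eu-west: slightly farther away, but far less loaded
```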

Modeling, on the other hand, involves a lot of experimentation (and a lot of patience!), as well as a ton of reading to stay up to date. You have to be REALLY into it to do well. The modeling people in the top FAANG groups likely never thought of it as some trendy thing to get into, or maybe they did at the beginning, but the people who really live that life are super passionate. It's not something you do just because.

Note: all of this only applies if you're working on some large-scale system that's competing on accuracy: image classification, video understanding, etc.

Tons of jobs combine modeling and the serving stack in the same SWE role: things like training classifiers for spreadsheets or ad impressions or whatever.
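In those roles the whole loop can look roughly like this sketch (the synthetic data, scikit-learn choice, and function name are my own illustration, not a claim about any specific company's stack): train a small classifier, then expose it behind a function the product calls.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for "train a classifier and serve it" in one role;
# the data, feature meanings, and function name are illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                  # e.g. ad-impression features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic click labels

model = LogisticRegression().fit(X, y)

def predict_click_probability(features):
    """Entry point a product endpoint might call at serving time."""
    return float(model.predict_proba(np.asarray(features).reshape(1, -1))[0, 1])

print(predict_click_probability([0.3, -0.1, 0.0, 1.2]))
```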
