a real large-scale ML service is essentially a global HPC grid. running inference on state-of-the-art models is incredibly computationally expensive, and it has to happen under very tight latency constraints
an example that's easy to illustrate is Facebook's machine translation service
you have a limited amount of compute, an ever-changing set of models and code, and different countries using the service at different times of day. on top of that, the state of the art keeps moving (remember, LSTMs became popular only a few years ago) and the architecture is constantly adapting to new research
building and running such a system is essentially HPC - a huge amount of multithreading, custom load balancers and routing across datacenters, parallel algorithms, using hardware accelerators in creative ways (given their limited instruction sets), and so on. one small but representative piece, request batching, is sketched below
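to make that concrete, here's a minimal sketch of dynamic request batching, one of the standard serving tricks: hold incoming requests for a few milliseconds so the accelerator sees full batches instead of single examples. everything here (run_model, the batch size, the wait budget) is a made-up placeholder, not any particular company's stack:

```python
import queue
import threading
import time
from concurrent.futures import Future

MAX_BATCH = 32       # assumption: tuned per model / accelerator
MAX_WAIT_S = 0.005   # assumption: a 5ms batching budget out of the latency SLA

_requests = queue.Queue()

def run_model(batch):
    # placeholder for a real forward pass (a compiled model on a GPU/TPU)
    return [x * 2 for x in batch]

def serve_loop():
    while True:
        # block until at least one request arrives
        x, fut = _requests.get()
        batch, futures = [x], [fut]
        deadline = time.monotonic() + MAX_WAIT_S
        # then keep filling the batch until it's full or the wait budget expires
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                x, fut = _requests.get(timeout=remaining)
            except queue.Empty:
                break
            batch.append(x)
            futures.append(fut)
        # run one batched forward pass and fan results back out to callers
        for fut, y in zip(futures, run_model(batch)):
            fut.set_result(y)

def submit(x) -> Future:
    fut = Future()
    _requests.put((x, fut))
    return fut

threading.Thread(target=serve_loop, daemon=True).start()
print(submit(21).result())  # -> 42
```

real systems layer this under cross-datacenter routing, but the core tradeoff (wait a little to batch a lot) is the same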
modelling involves a lot of experimentation (and a lot of patience!), as well as a ton of reading to stay up to date. you have to be REALLY into it to do well. the modelling people at the top FAANG groups likely never thought of it as some trendy thing to get into (or maybe they did at the beginning), but the people who really live that life are super passionate. it's not really a thing you do just because
note: all of this only applies if you're working on some large-scale problem where it's competing on accuracy -- image classification, video understanding, etc
tons of jobs have the modelling and the serving stack in the same SWE role -- things like training classifiers for spreadsheets or ad impressions. a toy version of that setup is sketched below
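for flavor, a minimal sketch of that kind of role on a made-up ad-click prediction task; the features, data, and function names are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical ad-impression features: [hour_of_day, past_ctr, is_mobile]
X = np.array([[9, 0.02, 1], [21, 0.10, 0], [14, 0.05, 1], [3, 0.01, 0]])
y = np.array([0, 1, 1, 0])  # whether the ad was clicked

# the "modelling" half: train a simple classifier offline
model = LogisticRegression().fit(X, y)

def predict_click_probability(features):
    # the "serving" half: the same engineer wires this into the request path
    return model.predict_proba([features])[0, 1]

print(predict_click_probability([12, 0.04, 1]))
```

at this scale there's no HPC drama at all, which is exactly the point: most ML jobs look like this, not like the global-grid stuff above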