Can you also explain why people say it can't be automated easily? The way I see it we need a source of perception and a good teacher. Like an array of high resolution visual, audio (to get noises and dialog recognition) and chemical sensors that will be connected to an ANN and also some successful doctors who will be source of classification error which we train the net with. Get a few of such trained nets as competitors in a zero-sum game and train the final net creating a realistic predictor.
For example this kind of "doctor visit" can be in the later stages of process: a mobile phone app that use speech generation/recognition to get a rough estimate of the person illness then gradually closing in using additional input using internal (from camera and microphone) and external (analyses and scans) data and those pre-trained ANNs. Also the app can ask as many questions as it can if it doesn't get something and request analyses done if there is an ambiguity which is easily implemented.