God as my witness, my autopilot just tapped the brakes to slow down when it saw a pothole...

Great link with the Karpathy talk. His diagram of the stack shows 1.0 code as typical deterministic, expert-defined systems and 2.0 code as neural-net-like code where the model and the weights are tuned by an optimizer. The sensor data are prepared deterministically using 1.0 code, passed to the 2.0 code, and then the output of the 2.0 code informs 1.0-code expert systems that determine the correct outputs to steering, acceleration and brakes. There's not only vision and radar, but also the inertial measurement unit and ultrasonics. Road noise under the vehicle can relay important information.
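Rough sketch of how I picture that flow (the function names, shapes and the fusion step are just my guesses, not anything from the talk or Tesla's actual stack):

```python
import numpy as np

def preprocess(camera_frame, radar_points, imu_sample, ultrasonic_echoes):
    """1.0 code: deterministic sensor prep (calibration, sync, normalization)."""
    return np.concatenate([
        camera_frame.ravel() / 255.0,   # normalize pixels
        radar_points.ravel(),           # already metric
        np.asarray(imu_sample),         # accel + gyro
        np.asarray(ultrasonic_echoes),  # range readings
    ])

def neural_net(features, weights):
    """2.0 code: learned model; the weights come from an optimizer, not a human."""
    hidden = np.tanh(features @ weights["w1"])
    return hidden @ weights["w2"]       # e.g. lane offset, nearest-obstacle distance

def plan_controls(perception):
    """1.0 code again: expert rules turn perception into actuator commands."""
    lane_offset, obstacle_dist = perception[0], perception[1]
    steering = -0.5 * lane_offset                 # steer back toward lane center
    brake = 1.0 if obstacle_dist < 5.0 else 0.0   # hard brake if obstacle < 5 m
    throttle = 0.0 if brake else 0.2
    return steering, throttle, brake
```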

I think the real holdup for 2.0 code is that multi-layer neural nets and all their variants (including corrections like local receptive fields) are still unsuitable for what is needed to drive safely. It's impossible not to overfit the labelled data. If you drill down to the most fundamental line of code in an ML algorithm, it is just a summation over a dot product, which basically boils down to result = (observation1 * weight1 + observation2 * weight2 + ...). Layering these millions of units wide and hundreds of layers deep isn't a correction to the fundamental flaw: features from separate regions of the image end up incorrectly reinforcing each other.

Consider this image: https://i.imgur.com/yLV3zMc.jpg The reason neural nets can't help but find a poodle in the surface of a chocolate muffin is that the neural net has no concept of 3D (or even 2D) coherence. It sees the eye, the ear, the texture of a dog in the muffin. Guess what, it's a dog, because summation over dot product: eye + ear + fur = probably a dog. Fail.
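To put the "summation over a dot product" point in code (toy numbers, obviously not a real perception net):

```python
# A single "neuron": a weighted sum over its inputs, i.e. a dot product.
def neuron(observations, weights):
    return sum(o * w for o, w in zip(observations, weights))

# "eye-ness", "ear-ness", "fur-ness" scores from earlier layers...
part_scores = [0.9, 0.8, 0.95]
# ...multiplied by learned weights and summed. A high score means "dog",
# whether those parts came from a poodle or a chocolate muffin.
dog_logit = neuron(part_scores, [1.2, 1.0, 1.4])
print(dog_logit)  # ~3.21 -- confidently "dog", with no spatial sanity check
```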

3.0 code has to enable the neural nets to reconstruct the 3D dog from the image, so that when the 3D projection pops up you can say: hahaha, this isn't a dog, it's just a photograph containing hundreds of dog ears, dog eyes and patches of dog fur.
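A cartoon version of that missing coherence check might look like this, a crude 2D stand-in for what a real 3D reconstruction would give you (purely hypothetical, not any actual "3.0" system):

```python
def plausibly_one_dog(detections):
    """Reject a 'dog' verdict when the detected parts can't belong to a single
    animal -- e.g. dozens of eyes, or ears scattered across the whole frame.
    Each detection is a dict like {"part": "eye", "x": 0.4, "y": 0.3}."""
    eyes = [d for d in detections if d["part"] == "eye"]
    ears = [d for d in detections if d["part"] == "ear"]
    if len(eyes) > 2 or len(ears) > 2:
        return False  # a photograph full of dog parts, not a dog
    # all parts must sit within a head-sized neighborhood of each other
    xs = [d["x"] for d in detections]
    ys = [d["y"] for d in detections]
    return (max(xs) - min(xs)) < 0.3 and (max(ys) - min(ys)) < 0.3
```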
