Aligning artificial intelligence: types of intelligence, and countering alien values

Most of those thought experiments were developed back when fitness functions were hand-coded.

Fitness functions and LLM weights are now adjusted with human feedback, which means the AI is learning the kinds of solutions and outputs that humans like and approve of. A sufficiently advanced ASI will be able to model humans nearly perfectly, the way we can already model simpler life forms like C. elegans (https://openworm.org/) almost perfectly.
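To make the "weights adjusted with feedback" point concrete, here's a toy sketch of the idea: a tiny softmax policy whose logits get nudged toward whatever outputs a (simulated) human approves of, via a REINFORCE-style update. The responses and reward values are made up for illustration — real RLHF trains a reward model over an LLM, not a three-option lookup table — but the feedback loop is the same shape.

```python
import math
import random

# Hypothetical canned responses and human approval scores (+1 = liked).
responses = ["helpful answer", "evasive answer", "rude answer"]
human_reward = {"helpful answer": 1.0,
                "evasive answer": -0.2,
                "rude answer": -1.0}
logits = [0.0, 0.0, 0.0]  # the "weights" being adjusted by feedback

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(responses)), weights=probs)[0]
    r = human_reward[responses[i]]  # stand-in for a thumbs-up/down
    # REINFORCE: shift probability mass toward rewarded outputs.
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * r * grad

probs = softmax(logits)
print(responses[probs.index(max(probs))])  # the policy now favors the liked output
```

After a couple hundred rounds of feedback, the policy concentrates almost all its probability on the response humans rewarded — which is the sense in which the model's output space gets pulled toward what humans approve of.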

Which means that its outputs will fall squarely within the domain of the sorts of things that humans like.

The biggest danger is wireheading, and we absolutely HAVE to give it lots of feedback and training to prevent that.

/r/ControlProblem Thread