[D] Best way to approach solving this toy problem with ML

Sorry - I don't think I was clear after all.

So the sequence that is generated will have a known length, which is decided before the game begins. And you/the agent can make as many guesses at the code as you would like - the challenge is to guess both the sequence AND the value of Z.

As a human I can come up with a number of heuristics that try to achieve the goal in fewer steps than just random guessing.

For example, let's say the generated sequence is 1100 and the generated Z is 2:

I guess:

1111 - 'close'

0000 - 'close'

1100 - 'close'

1110 - 'not close' (now I know the result is sensitive to the third bit)

1101 - 'not close'

etc.

So as you can see, one can build up a knowledge base of evidence from the history of guesses and the system's 'close'/'not close' outputs.

Essentially, the agent should be able to navigate this evidence till it has enough to make a guess on the sequence and Z.
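
To make the "knowledge base of evidence" idea concrete, here is a rough, non-ML baseline sketch: keep every (sequence, Z) hypothesis that is still consistent with the feedback seen so far. Big caveat: the 'close' rule is not spelled out in this comment, so `is_close` below is purely an assumption (Hamming distance to the secret is a multiple of Z), chosen only because it happens to reproduce the five outcomes in the example. The upper bound `max_z` and all function names are also made up for illustration.

```python
from itertools import product

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def is_close(secret, z, guess):
    # ASSUMPTION (not stated in the post): 'close' means the Hamming
    # distance between guess and secret is a multiple of Z.
    # Swap in the real rule here.
    return hamming(secret, guess) % z == 0

def all_hypotheses(length, max_z):
    """Every (sequence, Z) pair; max_z is a hypothetical upper bound on Z."""
    for bits in product("01", repeat=length):
        for z in range(1, max_z + 1):
            yield "".join(bits), z

def consistent(hypotheses, evidence):
    """Keep only hypotheses that would have produced all observed feedback."""
    return [
        (seq, z)
        for seq, z in hypotheses
        if all(is_close(seq, z, guess) == close for guess, close in evidence)
    ]

# Evidence from the example: (guess, was the answer 'close'?)
evidence = [
    ("1111", True),
    ("0000", True),
    ("1100", True),
    ("1110", False),
    ("1101", False),
]

remaining = consistent(all_hypotheses(length=4, max_z=4), evidence)
for seq, z in remaining:
    print(seq, z)
```

Under this assumed rule, the true pair (1100, 2) survives and every surviving hypothesis has Z = 2, but several candidate sequences remain, so the five guesses pin down Z without pinning down the sequence. With the real rule the picture may look quite different. An agent (learned or hand-written) could then pick its next guess to shrink `remaining` as much as possible.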
