[D] Manipulating class distributions based on holdout test set data...seems wrong, but maybe it's right?

This comment was posted to reddit on Mar 04, 2022 at 1:06 pm and was deleted within 6 hours and 13 minutes.

The change in the pos/neg ratio is probably just one of many shifts over time in your observed distribution. What you are seeing is one simple instance of what is broadly called "concept drift" (a change in the class proportions specifically is often called prior-probability shift or label shift). Ideally, you would have enough data to evaluate your model at a finer time scale: train on a fixed-length window of past data, evaluate on the period that immediately follows it, and slide the window forward. That is more informative than using one whole chunk as your training data (19-21) and another as your held-out set (22).
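A minimal sketch of that sliding-window scheme, assuming the data can be bucketed into discrete time periods (the monthly labels below are made up for illustration). `sliding_window_splits` is a hypothetical helper, not a library function; training and scoring a model for each split would go inside the loop.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_window_splits(
    periods: Sequence[str], train_window: int
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training periods, evaluation period) pairs.

    Each evaluation period is paired with the `train_window` periods
    immediately before it, so the training window keeps a fixed length
    and slides forward in time.
    """
    if train_window < 1:
        raise ValueError("train_window must be at least 1")
    for i in range(train_window, len(periods)):
        yield list(periods[i - train_window:i]), periods[i]


# Hypothetical monthly periods; in practice these come from the data's timestamps.
months = ["2021-01", "2021-02", "2021-03", "2021-04", "2021-05"]
for train, test in sliding_window_splits(months, train_window=3):
    # A model would be fit on `train` and scored on `test` here.
    print(train, "->", test)
```

Plotting the per-split scores then shows whether performance degrades as the evaluation period moves away from the training data. For rows already sorted by time, scikit-learn's `TimeSeriesSplit` with `max_train_size` set gives a similar fixed-length training window.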

Without enough data, however, you'll probably need to do proper exploratory data analysis (EDA) and check how your data shifts over time before choosing a modeling strategy.
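One simple first EDA check is to track how the class ratio itself moves from period to period. This is a sketch assuming binary 0/1 labels and a period key attached to each row; the toy rows below are invented for illustration and `positive_rate_by_period` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def positive_rate_by_period(rows: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return the fraction of positive labels (label == 1) in each period."""
    counts = defaultdict(lambda: [0, 0])  # period -> [positives, total]
    for period, label in rows:
        counts[period][0] += int(label == 1)
        counts[period][1] += 1
    # Sort by period key so the output reads chronologically for sortable keys.
    return {p: pos / total for p, (pos, total) in sorted(counts.items())}


# Toy (period, label) pairs, made up purely for illustration.
rows = [("2019", 1), ("2019", 0), ("2019", 0), ("2019", 0),
        ("2022", 1), ("2022", 1), ("2022", 0), ("2022", 0)]
print(positive_rate_by_period(rows))  # {'2019': 0.25, '2022': 0.5}
```

This only covers the label distribution; shifts in the feature distributions, or in the relationship between features and labels, need their own checks.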

/r/MachineLearning Thread