The GRE computerized things do some of it, but not at a fine-grained enough level to matter…
I think people do some of it in SOME psych/neuro labs for trials, but I can’t remember which ones. I know there are a lot of tests where you have to start over from the very beginning if you do them badly (e.g. on humanbenchmark.com), and this makes humanbenchmark suck because I want to see my “max ability” without having to go back through the 2-item/3-item/4-item each time I screw up."
Satisficing objectives in the brain do not appear to be implemented by simply satisficing on a reward function. Instead, there is a slightly more complex PID control loop, in which the salience and importance of the various objectives are flexibly increased and decreased in line with physiological needs, so as to maintain homeostasis. I wrote previously about the benefits of dynamic reward functions and homeostatic control for alignment, and specifically for preventing the many pathologies of pure maximization.
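To make the control-loop picture concrete, here is a minimal sketch of a homeostatic PID loop that sets the salience of two objectives. Everything in it is an illustrative assumption rather than a model of any actual neural circuit: the `PIDDrive` class, the two variables `food` and `water`, the gains, and the decay and intake constants are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PIDDrive:
    """Hypothetical PID controller: maps a physiological deficit to a drive weight."""
    setpoint: float
    kp: float = 1.0   # proportional gain (assumed value)
    ki: float = 0.1   # integral gain (assumed value)
    kd: float = 0.5   # derivative gain (assumed value)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, level: float, dt: float = 1.0) -> float:
        error = self.setpoint - level            # positive when the variable is depleted
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        signal = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, signal)                  # salience is clipped at zero


def simulate(steps: int = 50) -> List[Tuple[str, float, float]]:
    """Toy agent that always pursues whichever objective is currently most salient."""
    levels: Dict[str, float] = {"food": 0.2, "water": 0.9}
    drives = {name: PIDDrive(setpoint=1.0) for name in levels}
    decay = 0.05    # every variable drains by this much per step
    intake = 0.3    # amount restored by pursuing one objective for a step
    history = []
    for _ in range(steps):
        # The controller, not the policy, decides how much each objective matters now.
        weights = {name: drives[name].update(levels[name]) for name in levels}
        choice = max(weights, key=weights.get)
        for name in levels:
            levels[name] = max(0.0, levels[name] - decay)
        levels[choice] = min(1.0, levels[choice] + intake)
        history.append((choice, levels["food"], levels["water"]))
    return history


if __name__ == "__main__":
    for step, (choice, food, water) in enumerate(simulate(10)):
        print(step, choice, round(food, 2), round(water, 2))
```

The point of the sketch is that no objective is ever maximized: once a variable is close to its setpoint its weight falls away and the agent switches to whatever is now most depleted.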
However, there is a more important angle. Beyond just satisficing, hedonic treadmills are an existence proof of dynamic, flexible, and corrigible value changes occurring regularly in intelligent creatures, including humans. Moreover, both reinforcement learning and these homeostatic control mechanisms are evolutionarily ancient (much older than the neocortex), so it is likely that they are very simple algorithms at their core. Importantly, the existence and ubiquity of such loops show that the policies and values learnt by reinforcement learning algorithms can be dynamically controlled in a flexible and (almost entirely)³ corrigible way. This means that there must exist RL algorithms that give an outside system very powerful levers into the learner's internal objectives, and allow those objectives to be flexibly changed while maintaining performance and coherent behaviour.
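One simple way to get a lever of this kind, offered only as a sketch and not as a claim about what the brain does, is to learn a separate value estimate per objective and let an outside controller supply the weights at decision time. The `MultiObjectiveBandit` class, the two-action environment in `REWARDS`, and the learning rate below are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


class MultiObjectiveBandit:
    """Learns one value estimate per (action, objective) pair.

    The weights over objectives are not learnt; they are passed in from
    outside whenever an action is chosen.
    """

    def __init__(self, n_actions: int, n_objectives: int, lr: float = 0.1):
        self.q: List[List[float]] = [[0.0] * n_objectives for _ in range(n_actions)]
        self.lr = lr

    def update(self, action: int, reward_vector: Sequence[float]) -> None:
        # Standard incremental update, applied to each objective independently.
        for k, r in enumerate(reward_vector):
            self.q[action][k] += self.lr * (r - self.q[action][k])

    def act(self, weights: Sequence[float]) -> int:
        # Greedy action under the externally supplied weighting of objectives.
        scores = [sum(w * q for w, q in zip(weights, row)) for row in self.q]
        return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical environment: action 0 yields "food" reward, action 1 yields "water".
REWARDS = [(1.0, 0.0), (0.0, 1.0)]


def train(steps: int = 500, seed: int = 0) -> MultiObjectiveBandit:
    rng = random.Random(seed)
    agent = MultiObjectiveBandit(n_actions=2, n_objectives=2)
    for _ in range(steps):
        action = rng.randrange(2)  # uniform exploration, independent of any weights
        noisy = tuple(r + rng.gauss(0.0, 0.1) for r in REWARDS[action])
        agent.update(action, noisy)
    return agent


if __name__ == "__main__":
    agent = train()
    print(agent.act((1.0, 0.0)))  # controller currently prioritizes food
    print(agent.act((0.0, 1.0)))  # controller currently prioritizes water
```

Because the learnt quantities do not depend on the weights, changing the weights changes behaviour immediately and without retraining, which is the flexibility the argument above needs. Whether anything this simple scales to full RL agents is an open question that this toy does not address.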