On the Importance of Subjective and Objective Measures

Hi, everyone. I’ve been a member for just over a month now, but this is my first posted topic.

Subjective metrics (e.g. happiness) alone aren’t very consistent or reliable data points. Objective metrics (e.g. distance run) are great for statistical analysis but don’t yet exist for many of the things we care about. To get around that, we use markers or proxies of the things we care about. I argue that a combination of subjective and objective metrics is a powerful tool in gaining insights.

The latest version of my personal tracking framework has around 40 data points each for Subjective Productivity Rating (SPR) and Deep Work hours.

SPR is a measure of how productive my day was with reference to the things most important to my life and career, on an integer scale of 1–5.

Deep Work hours are a related objective metric, simply measured by how many hours I logged in my retrospective calendar as being spent in a state of Deep Work. This is really only a proxy for how productive I’ve been, as my hours in this focused state are not always equally intense, so an hour of Deep Work is of varying impact.

On their own, neither is a very good indicator of how productive I’ve really been. SPR is subject to bias, flawed perceptions and the inherent noisiness of quantifying a psychological experience, whilst Deep Work hours are only a reference point for how productive I was. Both are great things to track individually, but together they allow me to get a sense of:

  1. How productive I perceive my time spent working to be.
  2. If/when more time in deep work starts having diminishing returns.
  3. How closely each may be tethered to my actual productivity.
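A rough first check on how the two measures relate to each other is a rank correlation between the daily series. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the numbers are made-up placeholders rather than the data from this post, and Spearman's rho is just one reasonable choice for an ordinal 1–5 rating, not necessarily the analysis used in the blog post.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up illustrative values, NOT real tracking data.
spr = [3, 4, 2, 5, 3, 4, 1]                              # 1-5 rating per day
deep_work_hours = [2.5, 4.0, 1.0, 5.5, 3.0, 3.5, 0.5]    # hours per day

print(f"Spearman rho = {spearman(spr, deep_work_hours):.2f}")
```

A rho near +1 would mean that days with more Deep Work hours tend to get higher ratings; a value near 0 would mean the two move mostly independently. Note that `pearson` divides by zero if either series is constant, so a real script would want to guard against that.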

I’ve shared a plot of this data and some basic analysis in a blog post, so please check that out if you’re interested.

As a new member, but longtime self-tracker (3 years now), I always welcome feedback from the community — especially from anyone who tracks similar metrics or has suggested improvements to my setup.

Happy tracking,



I like the simplicity of your plot and also the SPR as a measure. While SPR might be understood as a proxy to measure outputs of work that are appreciable by others, it could also be understood as relevant independently. The fears that it is biased, noisy and invalid may be overblown; after all, simple subjective self-assessments have been used in psychology for decades and investigated in many validity studies.


Thanks for the feedback! I tend to err on the side of caution when it comes to drawing any conclusions from my personal data. In this case, I wasn’t convinced that SPR maps reliably to my actual productivity. If, as you point out, I am being overly pessimistic about the relevance of SPR as a metric, then I can always update my hypotheses to be more lenient.

I should also point out that when I cross-referenced the SPR data from previous datasets with my productivity score on RescueTime, the correlation was also fairly low. Once again, that could be due to flaws in the RescueTime algorithm and/or flaws in my perceived productivity on any given day. My plan is to simply keep collecting daily data for all these metrics and re-examine this relationship regularly. I’d also like to add some more task-specific productivity measures, like net words written (for writing and blogging) and lines of code committed (for software projects).

I’d love to know your thoughts on my choice of a 1–5 scale as opposed to 3-, 7-, or 10-point scales.

Thanks again for the response. It’s great to have your input on this!

Interesting that you highlight the question about scale. This was a very active topic in the first few years of QS meetings. My lesson from this discussion was to customize the scale according to the phenomenon you are trying to learn about. Where you are trying to measure small effects, you need a more precise scale (and reason to believe that the precise measurements are valid, which may be hard to come by). The late Seth Roberts had a 100-point scale for sleep quality, which everybody gave him a hard time about. There were skeptical questions about it at nearly every talk, and while I learned a ton from Seth and respected his experience a lot, I myself was never convinced that he needed 100 points on that scale. I tend to care about extreme, easily noticed effects, so I use binary measures or three-point scales most of the time.


That’s a great story about Seth. I’d have to agree that 100-pointing sleep is a bit excessive, but I love the fact that he stuck to his guns on what is, ultimately, something we still need to debate.

Around 80% of my actively tracked metrics are indeed binary. It’s the obvious choice for habit tracking and other “did I do x today” variables. For SPR, SMR (subjective mood rating), general anxiety, and a few other metrics I use my 5-point scale. Everything else has units. I can’t recall ever using 3-point metrics before. Could you elaborate on what variables you apply them to?

I’ve used a 3 pt scale for sleep quality, mood, and hunger. For sleep and mood, 1 is very good, 2 is “somewhere in the middle” and 3 is very bad. For sleep and mood, my main goal was to find out if there were triggers for very bad sleep or very bad mood. (Yes, I could have used a binary measure.) For hunger 1 was somewhat hungry, 2 was hungry, 3 was very hungry. I wanted to see if very strong cravings were caused by low blood sugar. Instead, the strongest cravings came not soon after low points but about ninety minutes after blood sugar peaks, when I was in a normal range but quickly declining. I think there may be an effect of pace of change of blood sugar, although I’m not sure.
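To look at rate of change rather than absolute level, one option is to compute the slope between consecutive sensor readings and compare it against the hunger ratings. This is a minimal sketch under assumptions: the `(minute, glucose)` tuple layout and the function name are hypothetical, the readings are made-up placeholders, and the units are assumed to be mg/dL (a sensor may report mmol/L instead).

```python
def rates_of_change(readings):
    """Per-minute glucose change between consecutive readings.

    `readings` is a list of (minute, glucose) tuples sorted by time.
    Returns one (minute, rate) tuple per interval, stamped with the
    time of the later reading.
    """
    return [
        (t1, (g1 - g0) / (t1 - t0))
        for (t0, g0), (t1, g1) in zip(readings, readings[1:])
    ]


# Made-up illustrative readings, NOT real sensor data.
readings = [(0, 100), (15, 130), (30, 115)]

for minute, rate in rates_of_change(readings):
    print(f"t={minute} min: {rate:+.1f} mg/dL per minute")
```

A strongly negative rate while the absolute value is still in a normal range would correspond to the "in range but quickly declining" situation described above.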


Ah that’s interesting. Your sleep and mood 3-pt scales function in a similar way to my 5-pt ones. I very seldom have 1s or 5s for most variable types, so in reality 3 is just “somewhere in the middle”, 2 is bad, 4 is good, and 1 and 5 serve as special cases for truly exceptional circumstances.

At the end of the day, either system could be converted to a binary system by increasing the number of features. The resulting statistical correlations would likely be very similar. The difference is mostly for the human interface of the tracking – where our personal preferences are one of the dominant factors.
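The conversion to binary features mentioned above is usually called one-hot encoding: each point on the scale becomes its own yes/no variable. The following is a minimal sketch, with a hypothetical function name and an assumed 1–5 default scale:

```python
def one_hot(rating, levels=(1, 2, 3, 4, 5)):
    """Turn one ordinal rating into a list of binary features, one per level."""
    if rating not in levels:
        raise ValueError(f"rating {rating!r} is not one of {levels}")
    return [1 if rating == level else 0 for level in levels]


# A rating of 4 on the 5-point scale becomes five binary features.
print(one_hot(4))

# The same function covers a 3-point scale by passing different levels.
print(one_hot(3, levels=(1, 2, 3)))
```

One caveat is that this encoding drops the ordering of the levels (the features no longer record that 4 is closer to 5 than to 1), so it is a convenience for analysis rather than an exact equivalent of the original scale.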

Your blood sugar vs. hunger experiments sound quite fascinating. Did you write up the findings anywhere?

I didn’t write anything up yet, but when I can get hold of some more Freestyle Libre sensors I’m going to do this again.