How I found out that Oura's sleep stage detection is a joke, after years of using the ring

A few months ago, Oura introduced a “New Sleep Staging Beta” algorithm for sleep stage detection. Their mistake was enabling it alongside the current algorithm, so you can compare the two in the app.

That’s what I did, and the results shocked me.

On one night, my deep sleep was 23 minutes according to the current algo, and 1h23m according to the Beta one. That is far outside any reasonable margin of error.

Another night, 15 minutes vs. 1h5m.

It wasn’t always this bad. On some other nights the new duration was within 20% of the old one. But the point is, there was no way to tell when the numbers would differ wildly. REM sleep duration also fluctuated wildly between the current and Beta algorithms.
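
To put a number on how far apart the two algorithms were, here is a minimal sketch that computes the relative difference between the current and Beta deep-sleep estimates for the two nights above. The 20% cutoff is borrowed from the "within 20%" remark and is an arbitrary yardstick, and the night labels are hypothetical; nothing here comes from Oura's app or API.

```python
def relative_difference(old_min: float, new_min: float) -> float:
    """How far the Beta estimate is from the current one, as a fraction of the current one."""
    if old_min <= 0:
        raise ValueError("the current estimate must be positive")
    return abs(new_min - old_min) / old_min


# Deep-sleep minutes from the two nights described above: (current, Beta).
nights = {"night 1": (23, 83), "night 2": (15, 65)}

for name, (old, new) in nights.items():
    diff = relative_difference(old, new)
    verdict = "within 20%" if diff <= 0.20 else "outside 20%"
    print(f"{name}: {old} vs {new} min -> {diff:.0%} apart ({verdict})")
# night 1: 23 vs 83 min -> 261% apart (outside 20%)
# night 2: 15 vs 65 min -> 333% apart (outside 20%)
```

In other words, the Beta algorithm reported roughly 3.6x and 4.3x the deep sleep of the current one on those two nights.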

The bigger point is that YEARS OF SLEEP TRACKING with Oura might be no more useful than common sense. Tweaking your diet, exercise, or supplements and then checking the app the next morning to see how well you slept is about as accurate as palm reading. Yet I’ve seen countless users in the Oura Facebook user group boasting about how they did this or that and had fantastic numbers the next morning. I hate to break it to you guys, but those numbers might as well be RANDOM.

Of course, you should not draw conclusions after one night. But who’s to say that the old algorithm wasn’t consistently wrong? It also claimed to have been “validated in sleep labs” and whatnot.

Detecting sleep stages from movement or blood flow is just a hard problem. Without an EEG, there doesn’t seem to be a good-enough solution.

So if you’re thinking about paying for Oura’s monthly subscription, I’d suggest reconsidering.

As far as I remember, almost all devices are like this.

Lately I have found it helpful not to believe any of them. But I can reverse-engineer some of their measures to see how they correlate with my own experience.

I recently bought both a Fitbit Charge and a Garmin Vivosmart 5. They both claim to measure sleep, but neither seems to measure anything real.

The Fitbit Readiness score also seems a bit made up and doesn’t necessarily correlate with how I feel. But it has a nice display, and the overnight heart rate trends are actually kind of interesting: a lower heart rate seems to correlate with wellbeing. And my SpO2 went up after I started exploring a new breathing practice.

I like the Garmin body battery because it does seem to correlate with how I am feeling and it allows me to investigate things that might substantially affect it during the day.

The Garmin SpO2 reading seems to be about 10 points below a standard meter, so it’s more interesting to explore the variance than the absolute values. One interesting recent discovery with the Garmin is that my reported stress levels at night went down and my SpO2 levels went up after I started taping my mouth, which I had previously been pretty skeptical of.

That said, I’m not used to wearing a watch after so many years without one, especially at night. I might give it a few more weeks to see what else I can learn.

I can make those values go up by tightening the band a bit more 🙃

How much “rank-biased overlap” is there between the old and new sleep scores?

If certain values are consistently over- or underestimated, the data can still be useful: e.g. if alcohol appeared to reduce deep sleep from 60 to 30 minutes, but now it’s from 120 to 60 minutes, you’d know it’s not good for your sleep either way! But you would not want to say “hey, I’m getting the recommended 60 minutes of deep sleep despite drinking, so it’s all good” 😬
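
Rank-biased overlap (Webber, Moffat and Zobel, 2010) compares two rankings while weighting agreement at the top more heavily. A minimal sketch of its extrapolated form for two equal-length rankings of the same nights, applied to the alcohol example above, illustrates the point: if the Beta algorithm simply doubled every value, the ordering of nights (and so the RBO) would be unchanged, while an absolute 60-minute cutoff would give a different answer. Ranking nights by deep-sleep minutes, the night labels, and the choice of `p=0.5` are illustrative assumptions, not anything Oura exposes.

```python
def rank_nights(deep_sleep_min):
    """Return night labels ordered from most to least deep sleep (ties keep input order)."""
    return sorted(deep_sleep_min, key=deep_sleep_min.get, reverse=True)


def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap for two equal-length rankings of the same items.

    Returns 1.0 for identical orderings; a smaller p weights the top positions more heavily.
    """
    if len(s) != len(t) or not s:
        raise ValueError("rankings must be non-empty and of equal length")
    k = len(s)
    weighted = 0.0
    for d in range(1, k + 1):
        agreement = len(set(s[:d]) & set(t[:d])) / d  # overlap of the top-d prefixes
        weighted += p ** (d - 1) * agreement
    # Truncated sum plus an extrapolation that assumes the depth-k agreement persists.
    return (1 - p) * weighted + agreement * p ** k


# The example from the post above: the new algorithm doubles every value.
old = {"no alcohol": 60, "alcohol": 30}
new = {"no alcohol": 120, "alcohol": 60}

print(rbo_ext(rank_nights(old), rank_nights(new), p=0.5))  # 1.0: same ordering of nights

# ...but an absolute 60-minute threshold gives different answers.
print([night for night, m in old.items() if m >= 60])  # ['no alcohol']
print([night for night, m in new.items() if m >= 60])  # ['no alcohol', 'alcohol']
```

With only a couple of nights RBO cannot say much; it becomes informative over longer histories, where a consistent bias leaves the ranking intact and a noisy algorithm scrambles it.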

There are also a lot of people on Reddit who believe Oura’s sleep stages are accurate.
In my experience, there are no non-EEG devices accurate enough to measure day-to-day changes in sleep stages. But they are still useful for measuring other variables; for example, total time in bed looks fine. You can subtract your subjective awake time from it and get a fairly accurate, valuable total sleep time metric.

I own multiple EEG devices, and I can tell you it’s not a simple story even for them. The Dreem 2 is definitely more accurate, but when I looked at raw EEG data from the Hypnodyne ZMax, I found that the EEG signal is complex and sometimes shows two sleep stages at the same time. For example, imagine rapid eye movements during slow delta waves: which stage should be marked here? Which stage would a sleep specialist mark in that situation during a PSG study? There’s no easy answer (it seems the left and right hemispheres of the brain can be in different stages at the same time). Also, one hour of deep sleep on different days might not be the same, because delta wave power/amplitude differs. So the length of a stage alone may not be enough to draw conclusions when comparing different nights.

Non-EEG devices are trying to solve a task that isn’t easily solved even with EEG data, and IMHO their prediction models shouldn’t be exposed to the general public because of their poor accuracy and misleading metrics.

My Garmin sleep data and body battery data are perfect for my needs because they help me passively measure how alcohol consumption and sexual activity affect my rest and recovery. I believe the Oura ring, Apple Watch, and several other options can yield the same insights.

If you’re interested in measuring and modifying behaviors, there are still good options out there. However, you probably need clinical assistance if you want anything related to brain wave patterns.

After years of measuring my sleep carefully, I finally gave up a year or two ago because, like you, I started to appreciate the inaccuracy of the data. I have yet to see a good analysis of self-tracked sleep data that generated anything (1) non-obvious and (2) actionable.

Instead, I just keep a written notebook where I record at a high level what happened. After a night of unexplained poor sleep, for example, I’ll write down exactly what I ate and did the previous day. Later, if it happens again, I’ll compare to see whether those nights had anything in common.

It feels more “scientific” to track everything to the nth degree, but I get more out of my current system that’s actionable.
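
The "compare to see what those nights had in common" step of the notebook approach amounts to a set intersection. A minimal sketch, assuming the notebook entries are transcribed as a list of factors per night; the entries and the `common_factors` helper below are hypothetical, made up for illustration.

```python
def common_factors(notes, poor_nights):
    """Return the factors written down for every one of the given nights."""
    factor_sets = [set(notes[night]) for night in poor_nights]
    return set.intersection(*factor_sets) if factor_sets else set()


# Hypothetical notebook entries: what was eaten/done the day before each night.
notes = {
    "night 1": ["late coffee", "heavy dinner", "screen in bed"],
    "night 2": ["late coffee", "evening run"],
    "night 3": ["heavy dinner", "early bedtime"],
}

# Suppose nights 1 and 2 were the unexplained poor ones.
print(common_factors(notes, ["night 1", "night 2"]))  # {'late coffee'}
```

A shared factor across two bad nights is a lead to test, not a conclusion.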

https://wiki.openhumans.org/wiki/Finding_relations_between_variables_in_time_series

Same feeling here. And when I see bad data, for example low sleep quality, I get really, really nervous. How can I cope with this kind of feeling?

Every journey of 10,000 steps starts with the first step. Take a baby step and then another one. Relax between steps and pace yourself. Be patient.

Does a tired lion care about low-quality sleep data after a storm or bothersome hyenas? Or does he seek food. Drink. Rest. Play. And sleep?

I’ve been using a Fitbit Charge for a couple of years, and its basic sleep-time measure seems to track with my own observations; for instance, if I remember a couple of wake-ups at night followed by a period where I fall back asleep, I typically see this reflected in my sleep data. I have no confidence in the sleep stages and pay little attention to them, except that I notice the “deep sleep” almost always comes before 2 am, when I almost never wake up, and coincides with my lowest heart rate, while the light sleep often comes in the early morning hours (2 am to 5 am), when I sometimes do wake up. I should look more closely.

I have used the Oura ring since 2017. For me, it’s always been a recovery tracker: HRV, resting heart rate, breathing rate, and temperature.

Sleep is meh… kinda nice to know, but I can’t draw any concrete conclusions from it.