How do you guys choose Sample Size for your self experimentation?

How do you guys choose sample size for your self-experimentation? (With an A/B experimental design)

Do you calculate it?

Can I just use 80 × 365 = 29,200 days as the population size, if I assume that I will live for 80 years?

When I do the calculation shown on the image, can I then assume that my results are correct in 95% of all cases with a margin of error of 5%?

Examples of my next 3 experiments:

  • How does training my neck affect my neck pain? (A/B design)
  • How does NOT eating gluten affect my focus? (A/B design)
  • How does working less affect my focus? (A/B design)

It depends on a number of factors. Ultimately, you probably want to use a concept called statistical power to calculate the relevant parameters for your experiment. A couple of considerations:

What effect size are you expecting?

  • Imagine your experiment has a huge effect on your measure (e.g. your bodyweight goes from 50kg -> 80kg). You will need a smaller sample size to confidently assert that your experiment caused a change than you would for a subtle effect.

What statistical test are you using?

  • This article recommends using Hedges' g for single-subject experiments. You could research how to calculate power for it to help with your question!
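To make the effect-size point concrete, here is a rough sketch of how the required number of days shrinks as the expected effect grows. It is not a power calculation for Hedges' g specifically: it assumes a plain two-sided comparison of two means using the normal approximation, treats each day as an independent observation, and takes the effect size as a standardized difference (difference divided by the standard deviation). The function name `days_per_condition` and the defaults of 5% significance and 80% power are my own choices for illustration.

```python
import math
from statistics import NormalDist


def days_per_condition(effect_size, alpha=0.05, power=0.80):
    """Approximate days needed in EACH condition (A and B).

    Assumption: two-sided comparison of two means, normal approximation,
    independent days. effect_size is the standardized difference
    (difference in means / standard deviation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Small, medium and large standardized effects
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {days_per_condition(d)} days per condition")
```

Since you usually can't know the effect size in advance, running this for a few guesses gives a feel for the range of days you would be committing to.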

Thanks for the reply!
So according to statistical power, there are 3 determinants of power: significance level, effect size and sample size.
The problem is I’m trying to calculate the sample size (the number of days I experiment), but I don’t know the effect size in advance.

Maybe the best thing I can do is to guess?

Pretty much! An alternative way of thinking about it would be:

  • With a sample size of N, I’m able to detect an X change in (your measure) at a Y% level of statistical significance

As part of your experiment design, you will know that it can only detect significant changes in effects of a certain size or larger. If you end up with a smaller sample size than planned, you can still learn from the experience, just with less confidence (stat sig.) or less precision (detect larger effects only).
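This "given N, what can I detect?" framing can be sketched in a few lines. The snippet below is only an approximation under assumptions I am adding: a two-sided comparison of two means, the normal approximation, independent days, 80% power by default, and an effect expressed in standard-deviation units. The name `minimum_detectable_effect` is made up for this example.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(days_per_condition, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given days per condition.

    Assumption: two-sided comparison of two means, normal approximation,
    independent days. The result is in standard-deviation units.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * math.sqrt(2 / days_per_condition)


# How the detectable effect shrinks as the experiment gets longer
for n in (7, 14, 30, 60):
    mde = minimum_detectable_effect(n)
    print(f"{n} days per condition: detectable effect of about {mde:.2f} SD")
```

To turn the result into your own units, multiply it by the day-to-day standard deviation of your measure (for example, of your daily focus rating).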

None of these statistical methods have anything to say about correctness; all they do is tell you how often you’d find a similar effect in random data sets…

Sample size is somewhat relevant for coin-flip experiments: You can randomly drink or skip coffee each day, for example, and measure some effect.
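A coin-flip schedule like that is easy to generate up front, so you are not tempted to pick the condition based on how you feel that day. This is just a minimal sketch; the function name `coin_flip_schedule`, the labels "A" and "B", and the optional seed are my own choices for illustration.

```python
import random


def coin_flip_schedule(n_days, seed=None):
    """Assign each day to condition 'A' or 'B' by an independent coin flip.

    Passing a seed makes the schedule reproducible.
    """
    rng = random.Random(seed)
    return [rng.choice(["A", "B"]) for _ in range(n_days)]


schedule = coin_flip_schedule(14, seed=1)
for day, condition in enumerate(schedule, start=1):
    print(f"day {day}: condition {condition}")
```

Independent flips can give uneven group sizes over a short run, so for only a couple of weeks you might prefer to shuffle a list with equal numbers of A and B days instead.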

But many things (like neck exercises) are cumulative, so the more important question is how long you need to stick with each treatment… This you need to ask a physiotherapist, not a statistician :smile:

Relevant paper that discusses the issues with “frequentist” stats for n-of-1 studies (and proposes some alternatives): https://homes.cs.washington.edu/~rkarkar/pubs/jour-jhir2018-bayes.pdf