Descriptives and visualizations for large numbers of variables

Hi, QSers. I’m a longtime self-tracker and moderately skilled statistician, but I don’t have much experience analyzing self-tracking data. Right now I’m trying to get some insight into a problem I have with intermittent and sometimes severe stomach pain. I can’t tell what causes it or when it’s most likely to happen, but it seems likely that it’s related to what I eat.

For the past year I’ve recorded the following data nearly every day:

[list]
[*]Subjective ratings of stomach pain several times throughout the day
[*]What I eat (logged in Cronometer, so it also exports with full nutritional information)
[*]Exercise
[*]Amount of sleep
[*]Hours worked (a potential proxy for stress?)
[/list]
All my data are in CSV format and are easy to restructure / merge / join as necessary.
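For what it's worth, joining daily CSV extracts like these into one day-level table is only a few lines in something like pandas. A minimal sketch with made-up file contents and column names:

```python
import io
import pandas as pd

# Hypothetical CSV extracts, each keyed by date (columns are illustrative).
pain_csv = io.StringIO("date,pain\n2024-01-01,6\n2024-01-02,2\n")
sleep_csv = io.StringIO("date,sleep_hours\n2024-01-01,7.5\n2024-01-02,6.0\n")

pain = pd.read_csv(pain_csv, parse_dates=["date"])
sleep = pd.read_csv(sleep_csv, parse_dates=["date"])

# An outer join on date keeps days where one of the logs is missing.
daily = pain.merge(sleep, on="date", how="outer")
print(daily)
```

In practice you'd point `read_csv` at the real exports instead of the inline strings; the outer join matters because self-tracking logs rarely cover exactly the same days.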

I’d like a platform that lets me look at things like which foods predict the worst pain (that day or the following day), whether pain is most frequent on certain days, and whether painful days tend to cluster.

I’ve taken a quick look at software like Zenobase and Exist, but they seem designed for a relatively small number of predictor variables, for cases where you already have hypotheses about what might be related. I probably eat 100 different foods on a regular basis, and I’d want to test them all as potential predictors, along with nutrients and food categories.

Can anyone recommend any software that’s well suited to running and visualizing this kind of comparison? Or should I suck it up and run a bunch of automated reports and machine learning models in R?
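If it helps to see what the DIY route looks like, here is a hedged sketch of the core step — building a day-level table with lag-1 food indicators and crudely ranking foods by how pain differs after eating them. It uses Python/pandas as a stand-in (the same idea works in R), and every column name and number is invented for illustration:

```python
import pandas as pd

# Hypothetical day-level data: one row per day, a 0/1 column per food,
# and the day's worst pain rating (all values are made up).
days = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=8, freq="D"),
    "coffee": [1, 0, 1, 0, 1, 0, 1, 0],
    "yogurt": [0, 1, 1, 0, 1, 0, 0, 1],
    "pain":   [3, 8, 1, 7, 2, 9, 1, 8],
}).set_index("date")

foods = ["coffee", "yogurt"]

# Lag the food indicators by one day, so each row asks:
# "did eating food X *yesterday* line up with pain *today*?"
lagged = days[foods].shift(1).add_suffix("_prev")
table = pd.concat([lagged, days["pain"]], axis=1).dropna()

# Crude screen: mean pain on days after eating the food minus
# mean pain on days after not eating it.
effects = {
    f: table.loc[table[f + "_prev"] == 1, "pain"].mean()
       - table.loc[table[f + "_prev"] == 0, "pain"].mean()
    for f in foods
}
ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

A mean-difference screen like this is only a first pass — with 100 foods you'd want multiple-comparison corrections or a regularized model before believing any single result — but it shows how little code the "automated report" option actually takes.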

Many thanks.

Have a look at Kaus. If you find anything useful, let us know :slight_smile:

Personally I’d see a doctor first. Failing that, grab a copy of RStudio and have a play with the glm() function to see if something obvious shows up, or if you want to go a stage further, try krls() from the KRLS package, which adds some machine learning through its Kernel-based Regularized Least Squares implementation.
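One caveat on the glm() suggestion: with ~100 correlated food indicators, an unpenalized fit will be unstable, which is where the regularization in KRLS earns its keep. The non-kernel version of the same idea, ridge regression, is just a few lines of linear algebra. A sketch in NumPy on synthetic data (the data, sizes, and penalty value are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 days, 100 food indicators, where only the
# first two foods actually drive the pain score.
n_days, n_foods = 200, 100
X = rng.integers(0, 2, size=(n_days, n_foods)).astype(float)
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.5, n_days)

# Ridge (regularized least squares): beta = (X'X + lambda*I)^-1 X'y.
# The penalty keeps 100 noisy predictors from blowing up the fit.
lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(n_foods), X.T @ y)

# The truly predictive foods should get the largest coefficients.
top2 = np.argsort(np.abs(beta))[-2:]
print(sorted(top2.tolist()))
```

KRLS replaces X with a kernel matrix so it can pick up nonlinear effects, but the regularized-least-squares core is the same as above.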