Help in Data analysis, statistical methods

I started my own chronic disease project and posted about it.

Since it did not get much attention, I want to ask my questions in the more specific forum sections.

I am noting my wellbeing and want to find variables that may have an effect. I found this video as a starting point.

However, I am not very well versed in medical statistics, so I want to ask for your help.
I am not quite sure which methods should be applied to find the significant variables. So I wonder whether there are users in this forum who are well versed in statistics or, even better, in medical statistics.

What would be a reasonable approach to find those variables? Which statistical tests or pre-tests should be done?

Thanks for your help.


Hi Jochen, you are in the right place, in the sense that there are many people with statistical expertise on the forum. However, I think your query isn’t well formulated to get a helpful reply. The work you are trying to do - track many things and extract the significant variables - is very complex, and trying to identify the proper methods in advance of collecting any data may be impossible. I realize this is a frustrating (non)answer, but I encourage you not to give up, but rather to do some devoted tracking of the variables you outlined in your other post (or even a subset of these), and do the tedious work of representing them in a structured format with attention to the “date/time” variable. You will find this step to be harder than any of us would like, but if you can get there you will be quite far along, and will be able to share the data and ask more specific questions about how to analyze it.

I’m afraid I don’t know enough to tell you how you should be approaching this, but if you come up with something, I know enough to tell you why it’s wrong :slight_smile:

Hi Gary,
thanks for your (non)answer anyway :wink:
As I stated, I am tracking those variables by date. I find it hard to track throughout the day, so I am collecting the data in the evening.

31.12.2013 .80 .1 .0 .00 .00 .00 .00 .00 0 .00 0 0 0 0 0 0 0 1 0 5109 2583 84.8 21 1 0 0 0 0 .00 1.00 ,00 1,00 1
01.01.2014 .60 .2 .0 .00 .00 6.9 1.90 2.80 0 .00 1 1 1 2 0 0 0 1 1 4157 2494 84.8 21 1 0 0 0 0 .00 1.00 1,00 ,00 0

So it looks like this. It’s just copied out of PSPP.
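As a rough sketch of what a structured format could look like, rows like the two above can be read into a date plus a list of numbers. This assumes whitespace-separated fields, a `DD.MM.YYYY` date in the first field, and that the stray decimal commas (`,00`, `1,00`) are just a locale artifact of the export. The function name `parse_row` is made up for illustration, and the columns stay unnamed here.

```python
from datetime import datetime

# The two rows exactly as pasted from PSPP.
RAW = """\
31.12.2013 .80 .1 .0 .00 .00 .00 .00 .00 0 .00 0 0 0 0 0 0 0 1 0 5109 2583 84.8 21 1 0 0 0 0 .00 1.00 ,00 1,00 1
01.01.2014 .60 .2 .0 .00 .00 6.9 1.90 2.80 0 .00 1 1 1 2 0 0 0 1 1 4157 2494 84.8 21 1 0 0 0 0 .00 1.00 1,00 ,00 0
"""


def parse_row(line):
    """Split one row into (date, list of floats).

    Assumption: the first field is the date, every other field is numeric,
    and a decimal comma means the same as a decimal point.
    """
    date_text, *fields = line.split()
    day = datetime.strptime(date_text, "%d.%m.%Y").date()
    values = [float(field.replace(",", ".")) for field in fields]
    return day, values


rows = [parse_row(line) for line in RAW.splitlines() if line.strip()]

for day, values in rows:
    # Print the ISO date and the number of tracked values per day.
    print(day.isoformat(), len(values))
```

Checking that every day yields the same number of values is a cheap way to catch a missed entry before any analysis starts.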

My question is about how one should track the data. As you may notice, there are variables I track in yes/no form, as 1 and 0. I think any statistics pro can tell me whether this is OK, or whether I should perhaps change to some kind of value range, as with weight or the amount of calories burnt.

I think there are many self-quantifiers out there. But is their way of tracking suitable for analysis? That’s what my question is about.

If someone knows statistics very well, they can tell which way would be better. It doesn’t make sense to track data for months and end up with a result like “wrong type of data values”.

We had a project at university where we analyzed bone growth with different types of augmentation materials. After sacrificing 20 pigs, we were told by our statistics department that we should have used 250 pigs instead, because otherwise we would just be measuring the effect of the individual pig and not of the augmentation material!

But what can be corrected afterwards? So I have learned to ask the statistics pros in advance.

So in my opinion you surely can design the statistical approach in advance, or at least state what should be avoided.