Applying Data Science to Self-Quantification

(Full disclosure: I work for IBM.)

I am new to this forum (thanks for having me) but have been a fan of self-quantification for quite a long time. I am trying to gauge whether there is much interest in the following idea. Let’s say you are using different apps to measure your habits (Fitbit, RunKeeper, etc.). What if there was a way to aggregate this data and also provide other types of input (like your daily happiness index, what you ate that day, or how many cigarettes you smoked)? Some input comes from third-party apps and other input is typed in by hand based on triggers (time of day, a certain event, geolocation, etc.). Ultimately YOU choose what goes in the bucket. But once you have a bucket full of data, there is a whole data analytics engine available to provide insights and correlations that you might not have been aware of and that siloed quantification apps can’t give you. Beyond that, there would also be analysis for specific areas, demographics, etc. Of course the right privacy and compliance agreements would be in place to establish trust, and you could delete your account and ‘be forgotten’. Would you use such a platform? If not, why? How would you pivot the idea?


I think there is no good solution for it.

You wrote that you work for IBM. How about exporting all the data to IBM Watson IoT? Or (my disclosure: I’m CEO of SenLab, maker of IoTool) use a general-purpose app to collect all sensor data on one device, in one place.

For example, our IoTool collects data from more than 100 sensors with more than 250 sensor readings (bio sensors, ambient sensors, location, social contacts…), processes them, and displays them, and all data stays on your smartphone if you wish. You can export data, but under your full control. You can also sync data in real time to, for example, IBM Watson with an IoTool plugin.

IoTool is still in beta (

Data analytics: would I use it? Maybe; it depends on the use case. But you know, it is quite simple: you give your data and get an analytics platform (for free), but there is no such thing as a free lunch. It’s up to you.



I’ll weigh in with another biased tip. As @IoTool notes, aggregating the data yourself is a massive PITA.

Open Humans is a nonprofit platform that enables members to share their data with researchers. (I am a co-founder. It’s been funded by various grants, including from RWJF and the Knight Foundation.)

Researchers and citizen scientists can set up projects that can (a) request access to data sources, (b) act as data sources, (c) engage members (send messages to them, send them out to other URLs, etc.).

You would be welcome to pilot a data analysis engine there. We welcome projects that enable users to explore and understand their data. This saves you the data aggregation work, and you get a community of users interested in using your tool. Our APIs support minimal automation (to enable research, since automation is a lot of work) or full automation, if you want. Through Open Humans, you could do your “market research” to develop the idea.

For added data (happiness index, what you ate), the free, already-working option is to email members daily with a link to a survey. We’ve also been talking to Bob Evans about how Paco experiments might become integrated (a researcher collects data via Paco and it’s pushed to a member’s Open Humans account). The idea is unfunded at the moment; my guess is it needs around $50k in funding.



I am actually looking for exactly what you describe.
I have been collecting data on my sleepwalking for some years now, ranging from sensor/app data to hand-typed data. It also comes in a lot of different data formats (binary, longitudinal, ordinal, secondary (calculated from primary data), etc.).

I have been using pivot tables to extract correlation-type information out of it. But with around 200,000 data points in around 50 different categories, I am looking for a more tailored way of analyzing it.

The most useful tool I have seen so far is JMP, but I can’t afford that kind of license price!
So if you (or anyone) have a suggestion, I am up for it.

I am now (slowly) learning Python just for that purpose. :wink:
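For anyone in a similar spot, here is a minimal sketch of what "pivot tables for correlations" might look like in plain Python. It assumes every category has already been reduced to one number per day. The category names and values below are made up for illustration, and `pearson` and `rank_correlations` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


def rank_correlations(daily, min_days=3):
    """daily maps category -> {date: value}.

    Returns (r, category_a, category_b, n_days) tuples for every pair of
    categories, using only the days both categories share, sorted by |r|.
    """
    results = []
    for a, b in combinations(sorted(daily), 2):
        shared = sorted(set(daily[a]) & set(daily[b]))
        if len(shared) < min_days:
            continue  # too little overlap to say anything
        r = pearson([daily[a][d] for d in shared],
                    [daily[b][d] for d in shared])
        if r is not None:
            results.append((r, a, b, len(shared)))
    return sorted(results, key=lambda t: abs(t[0]), reverse=True)


# Made-up example data: one value per category per day.
daily = {
    "sleep_hours": {"2016-01-01": 7.0, "2016-01-02": 5.5,
                    "2016-01-03": 8.0, "2016-01-04": 6.0},
    "caffeine_mg": {"2016-01-01": 100, "2016-01-02": 300,
                    "2016-01-03": 50, "2016-01-04": 250},
    "sleepwalked": {"2016-01-01": 0, "2016-01-02": 1,
                    "2016-01-03": 0, "2016-01-04": 1},
}

for r, a, b, n in rank_correlations(daily):
    print(f"{a} vs {b}: r={r:+.2f} (n={n})")
```

With around 50 categories this gives roughly 1,200 pairs, so some strong-looking values will appear by chance alone. The ranking is better treated as a list of things to look at more closely than as findings, and ordinal data may be better served by a rank-based measure than by Pearson.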

I wasn’t aware of
I love the project. It covers many types of health data, except for medical health records (de-identified of course). Or am I wrong?
I will have to test it with my 23andme data!

I love the bucket idea. I develop hardware and write about biohacking for Adafruit. I’m also a mid-pack ultrarunner on a ketogenic diet. The challenge for me has been trying to extract meaning from exercise, sleep and diet data, plus adding my own hardware that rips data from commercial and DIY sensors.

Having a bucket to throw it all in for now would be better than scattering it between a dozen different sites and hoping I can come up with a meaningful way to merge it all. I depend on Apple Health for my bucket today, but only half my devices can talk to it without manual import. has been helpful for short-term data plots and overlaying seemingly unrelated topics.

I have different kinds of personal data, and for each I am practically writing my own analyzers and stats generators (e.g. Facebook Conversation Analyzer, Fitbit Data).

While I am enjoying and learning from building such projects, I clearly see the limitations and downsides. The more kinds of data I get, the more chaotic this will become, whereas a proper professional analytical engine could actually gain value from such diversity, since it would hypothetically be able to correlate, infer and derive even more useful insights out of the “bucket”. Even more important, as you mentioned, such an engine would be able to provide demographics based on all the data it has access to, which is again much more valuable than what can be derived from your data alone.

The only thing that would make me hesitate to use such a platform is again related to privacy concerns and fear of what could be done with my data, but I know it’s unavoidable, so at some point I will simply have to accept it.

A lot of technical implementation details for such a platform can be discussed, and many have already been addressed by others here. What I want to say instead is that I would still like to have some kind of low-level control over the data, meaning some interface where I can feed in commands or code for doing analysis or generating statistics.


Thank you for your thoughts. I took a look at IoTool and what you have been able to accomplish is impressive (kudos). I think that IBM is in a unique position to analyze this type of data because it either owns (Weather Company) or has access to (Twitter) some of the most meaningful data stores out there, so the kind of insight it can drive in this space might be second to none. I am glad I have connected with you on LinkedIn so we can keep in touch!



Thank you for making me aware of Open Humans. I will try to digest all this, and I wonder if it would make sense, one day, for IBM to ‘donate’ this type of data (with user consent and scrubbed, of course) to one of the experiments/causes in Open Humans. Thank you for your input.


Luke (great screen name btw),

Thank you for the reply and support. Would you like to be a beta user? If so, follow @PersAnalytics on Twitter so we can stay in touch when we announce the beta phase!



I just checked out Adafruit, and browsing the stuff sold there feels like being in Willy Wonka’s factory :slight_smile:

If you would like to be notified of the Beta please follow @PersAnalytics on Twitter (or send me your email address if you don’t use Twitter).

Thank you for your input!



Privacy concerns me as well (that’s why it was one of the first things I mentioned in my post). Without the proper privacy framework, this idea falls apart because you can’t establish trust with your users. Unlike some other tech giants, IBM’s business model does not revolve around selling ads (which can lead to aggressive selling of user data), so I feel confident when I make the claim that IBM won’t sell your data :slight_smile:

If you are curious to see how this turns out, please follow us on @PersAnalytics.

Thanks again for your input, I do share your concerns.


Thanks. I am following. Also my email is

@tachtevrenidis Great idea. Currently I am using SPSS as my own “bucket”, with data from different wearables and apps collected over more than two years now. The way Fitbit, Withings, and apps like Taplog and TracknShare compile the data into CSVs is often quite different. Simply putting all these datasets together in one dataset will not work, and there are some other issues you should consider.
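One way to approach the format mismatch is to map each export into a common "long" layout (date, source, metric, value) before merging. The sketch below is only an illustration of that idea: the source names, column names and date formats in `SOURCES` are hypothetical and are not the real Fitbit, Withings, Taplog or TracknShare export schemas.

```python
import csv
import io
from datetime import datetime

# Hypothetical per-source layouts (assumptions for illustration only).
# Real exports use different column names and can change over time.
SOURCES = {
    "tracker_a": {
        "date_col": "Date",
        "date_fmt": "%Y-%m-%d",
        "metrics": {"Steps": "steps"},
    },
    "scale_b": {
        "date_col": "Measured At",
        "date_fmt": "%d/%m/%Y %H:%M",
        "metrics": {"Weight (kg)": "weight_kg"},
    },
}


def normalize(source, csv_text):
    """Yield (iso_date, source, metric, value) rows from one source's CSV text."""
    spec = SOURCES[source]
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = datetime.strptime(row[spec["date_col"]], spec["date_fmt"])
        day = parsed.date().isoformat()
        for column, metric in spec["metrics"].items():
            raw = (row.get(column) or "").strip()
            if raw:  # skip missing readings instead of inventing a value
                yield (day, source, metric, float(raw))


# Made-up sample exports in the two hypothetical layouts.
tracker_csv = "Date,Steps\n2016-01-01,10432\n2016-01-02,\n"
scale_csv = "Measured At,Weight (kg)\n01/01/2016 07:30,72.4\n"

bucket = sorted(
    list(normalize("tracker_a", tracker_csv))
    + list(normalize("scale_b", scale_csv))
)
for row in bucket:
    print(row)
```

Keeping one row per reading means a new device only needs a new entry in the mapping rather than a new column in a wide table. The other issues mentioned above (time zones, units, duplicate readings) would still need their own handling.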

One example: I often use Taplog to do my personal experiments. Because I design my own experiments with my own personal variables, it is hard to do automatic analysis on this kind of data. It would require an additional application. If you can offer this, I would definitely be interested.

I like the idea very much and if you need a Beta-tester I would like to volunteer.


Thank you for your interest (Taplog looks interesting). Please stay tuned for the beta.


@tachtevrenidis: Sorry, I think you’ve misunderstood Open Humans. I think you may be missing out on the resource it potentially represents to you.

To clarify: individuals join Open Humans. The individual imports their data (FitBit, Moves, RunKeeper, etc). Then he/she can authorize sharing his/her data with individual projects. You/IBM are welcome to create a project that requests access to some set of data sources. This lets you prototype your data analysis ideas, saving you the work of building tools to retrieve/collect it. It also gives you an audience (Open Humans members). When a member joins your project, you get access to his/her data.

To be extra clear about this opportunity: your project gets listed on the site. Open Humans welcomes you inviting the members to “leave” the site and do stuff elsewhere (e.g. make an account on your platform). Your project could be authorized to send messages to members, access their email address, etc. You get the data, and the members.

Open Humans was inspired by privacy concerns. A lot of “big data” is sensitive and identifiable. It is a nonprofit project that enables sharing and research by letting members manage their own data: they choose which projects to share data with. Data donations are welcome, but data is pushed to a member’s account. It is NOT a repository for “scrubbed data”.

@patbuen EHRs and methods for an individual to transfer their own data are a thorny problem, much bigger than we are! Our current approach is “wait and let the big players figure it out”. (cf. Blue Button, meaningful use stage 3.)
