Data Analysis Software

Hi guys,

From my perspective, all my efforts at self-quantification have bumped up against the same wall: a lack of powerful, easy-to-use software with which to draw meaningful conclusions from the data I’ve collected.

I believe that if we are intellectually honest with ourselves and take a truly scientific approach to interpreting data, we would admit that a great deal of the time our interpretations are just flat-out wrong, and occasionally counter-productive.

I cannot and will not accept my findings until they are truly data-driven.

Here is a list of software that I’ve tried and that has failed, for one reason or another, to deliver:
Eureqa
Q
SPSS
Minitab 16

And a bunch of others that I can’t remember.

It’s really very easy to test whether something is effective at finding meaning: I set up test data in which I simulate taking a pill and then improve my QOL (quality of life) score by a percentage a few weeks down the track (with noise applied on top, so as not to make it too easy for the system and to add some realism :slight_smile:)
I use Google Forms to collect the data (with automatic timestamps).
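In code, the test set looks roughly like this (a minimal Python sketch; the dates, baseline, effect size, and three-week lag are just example numbers):

```python
# Rough sketch of the synthetic test set described above: a noisy daily
# QOL score, plus a delayed percentage boost after the simulated "pill".
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

days = pd.date_range("2011-01-01", periods=120, freq="D")
qol = 60 + rng.normal(0, 5, len(days))              # noisy baseline score

pill_start = 30                                     # day the "pill" begins
lag = 21                                            # effect kicks in ~3 weeks later
qol[pill_start + lag:] *= 1.10                      # 10% QOL improvement

data = pd.DataFrame({
    "timestamp": days,
    "pill": (np.arange(len(days)) >= pill_start).astype(int),
    "qol": qol,
})
data.to_csv("test_data.csv", index=False)           # same shape as my Forms export
```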

No software that I’ve used has been able to handle this sort of basic data mining and meaning discovery, or it has simply been too difficult to use (and I am no slouch… I’ve been working on this full-time for months now…)

Any suggestions or other tested software to cross off the list would be appreciated.

Currently I’m investigating using genetic algorithms to evolve data-mining algorithms to match the data… kind of like Eureqa but better able to handle time delays.
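As a toy sketch of the direction (the genome layout, fitness function, and mutation scheme here are purely illustrative, not a working system):

```python
# Toy illustration of the GA idea: evolve (lag, effect) genomes that best
# explain a QOL series as baseline + delayed pill effect.
import numpy as np

rng = np.random.default_rng(0)

def fitness(genome, pill, qol):
    """Negative MSE of a baseline-plus-delayed-effect model."""
    lag, effect = int(genome[0]), genome[1]
    shifted = np.roll(pill, lag)
    shifted[:lag] = 0                          # no effect before the pill starts
    pred = qol.mean() + effect * shifted
    return -np.mean((qol - pred) ** 2)

def evolve(pill, qol, pop_size=50, generations=200):
    pop = np.column_stack([rng.integers(0, 60, pop_size),   # lag in days
                           rng.normal(0, 5, pop_size)])     # effect size
    for _ in range(generations):
        scores = np.array([fitness(g, pill, qol) for g in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2:]]    # keep the best half
        kids = elite + rng.normal(0, 1, elite.shape)        # mutate
        kids[:, 0] = np.clip(np.round(kids[:, 0]), 0, 60)   # keep lag in range
        pop = np.vstack([elite, kids])
    return pop[np.argmax([fitness(g, pill, qol) for g in pop])]
```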

Bard
PS: I’ve just undergone major surgery, so sorry if I’ve been off the radar for a bit :slight_smile:

Hey Bard, welcome to the forum!

I also have this concern (Got all this data, now what?), but I don’t have enough data yet, so it’s only a theoretical concern. However, here are two ideas:

  • http://www.wetrackify.com/ - “meta self-tracking application. It allows you to find correlations between your individual data streams, like how your diet correlates with your sleep and your mood”. Looks like they’re just launching.

  • JMP - this is a very powerful statistical analysis software package that’s also really easy to use and can show patterns easily. I haven’t used it recently, but I worked with it on a project a few years ago, and even back then, I remember it got a “wow” out of me.

Nice! Checking out JMP right now.

Wetrackify sounds lovely and the pitch was nice, but none of that means anything without amazing algorithms backing it up. I could pitch a search engine better than Google, but it wouldn’t mean a whole lot without some extraordinary algorithms behind it :slight_smile:

Hi Bard,

I would also recommend looking into R.

R is a very powerful open-source statistical software package that can help with data analysis and visualization. People from around the world contribute to the code base. In addition, statisticians are constantly developing new packages (essentially new code for running statistical methods) that are released for free.

I stumbled across this great blog post if you need some help getting started with R: Videos on Data Analysis with R

[quote]Currently I’m investigating using genetic algorithms to evolve data-mining algorithms to match the data… kind of like Eureqa but better able to handle time delays.[/quote]Seth Roberts writes a lot about keeping stuff simple.

Interpreting the results of genetic algorithms isn’t easy.
There’s a good chance that a simple linear model will give you better insight, because it gives you nice p-values.

As long as you are searching for software that’s “easy to use”, there’s a good chance that you will delude yourself with genetic algorithms.
Sometimes you just don’t have enough data to get a meaningful answer. A genetic algorithm will still give you an answer, but that answer will be noise.

You need a lot of data to tune a genetic algorithm for a specific problem. If you are just one individual with daily metrics, you probably don’t have enough data.
Sticking to clear statistical tests and graphing your data is much better.
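For example, an ordinary least-squares regression of QOL on a lagged pill indicator gives you a coefficient, a confidence interval, and a p-value you can actually interpret. A minimal sketch, assuming a CSV export with timestamp, pill, and qol columns and guessing a three-week lag:

```python
# Sketch of the simple alternative: OLS of QOL on a lagged pill indicator.
# Column names and the 21-day lag are assumptions, not a real export format.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("test_data.csv", parse_dates=["timestamp"])
data["pill_lagged"] = data["pill"].shift(21, fill_value=0)  # guess a 3-week delay

model = smf.ols("qol ~ pill_lagged", data=data).fit()
print(model.summary())       # coefficient, confidence interval, p-value
```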

[quote]It’s really very easy to test whether something is effective at finding meaning: I set up test data in which I simulate taking a pill and then improve my QOL (quality of life) score by a percentage a few weeks down the track (with noise applied on top, so as not to make it too easy for the system and to add some realism[/quote]No, a data set of one pill for one patient is way too small to test the effectiveness of an algorithm.
The effectiveness of an algorithm gets measured by its ROC curve.
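Concretely, that means running the algorithm over many simulated datasets, some containing a real effect and some pure noise, and scoring how well it separates the two. A sketch with scikit-learn (the labels and detector scores below are made up):

```python
# Score a detector with a ROC curve: one label and one confidence score
# per simulated dataset (1 = real pill effect present, 0 = pure noise).
from sklearn.metrics import roc_auc_score, roc_curve

truth          = [1,   0,   1,   1,   0,   0,   1,   0]    # made-up labels
detector_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # made-up confidences

print("AUC:", roc_auc_score(truth, detector_score))        # 1.0 = perfect, 0.5 = chance
fpr, tpr, thresholds = roc_curve(truth, detector_score)    # points on the curve
```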

Bard, I’m curious about what counts as meaningful. If you simulate taking a pill that improves your QOL (quality of life) score, wouldn’t it be pretty easy to look at a simple graph of your score over time, with the “pill start” date noted, and see that the pill may have affected your QOL? Then stop the pill and see whether the score goes down. Repeat to check your result.
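Something like this, for instance (a quick matplotlib sketch; the file and column names are assumptions about your Forms export):

```python
# Plot the QOL series and mark the pill start date for eyeballing.
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("test_data.csv", parse_dates=["timestamp"])
pill_start = data.loc[data["pill"] == 1, "timestamp"].min()

plt.plot(data["timestamp"], data["qol"], label="QOL score")
plt.axvline(pill_start, color="red", linestyle="--", label="pill start")
plt.legend()
plt.show()
```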

What am I missing?

I’m unclear about what the problem has been with these applications. Aside from cost, I have not had any problems with SPSS.

I like R, but it’s got a steep learning curve. If you just want to dive into some machine learning stuff with a GUI, my personal favorite software is Weka: http://www.cs.waikato.ac.nz/ml/weka/

I’ve had good luck with “alternating decision trees” and “random forest”. But don’t be discouraged if you get bad results – machine learning depends a lot on having the right features, which usually means processing your data a little more. (For instance, with Wikipedia edits, Andrew West discovered that geolocating editors’ IP addresses and then figuring out the “time of day” of each edit in the editor’s local timezone made a big difference. Obvious in hindsight, but not something the records steered us towards…)
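If you want to try the same idea outside Weka’s GUI, a random forest is a few lines in scikit-learn. A sketch (the column names, the derived day-of-week feature, and the “good day” label are all just illustrations of the feature-processing point):

```python
# Random forest on a daily self-tracking table, with one derived feature
# (day of week) as an example of light preprocessing. Columns are assumed.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("test_data.csv", parse_dates=["timestamp"])
data["day_of_week"] = data["timestamp"].dt.dayofweek              # derived feature
data["good_day"] = (data["qol"] > data["qol"].median()).astype(int)

X = data[["pill", "day_of_week"]]
y = data["good_day"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())                    # rough accuracy
```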

–Bo

This is a tool that looks like it has a lot of potential:

Wolfram Launches Computable Document Format
www.wolfram.com/news/cdf-computable-document-format-released.html

I am not capable of evaluating this, but if they release a set of tools like Adobe’s Acrobat Pro suite (which I have used extensively) and it becomes a recognized standard, it could prove to be important…

Here is the video: http://youtu.be/_eu7oPLm2XA

An advanced book that deals with many important topics of potential relevance…

Collaborative Computational Technologies for Biomedical Research
(Wiley Series on Technologies for the Pharmaceutical Industry)
by Sean Ekins
Permalink: http://amzn.com/0470638036

A link to some discussion about the book…

http://pipeline.corante.com/archives/2011/07/26/data_handling_in_collaborations.php

A connection of mine is working on a new technology for visualizing multiple streams of time-related data. He’s looking for very large datasets, or datasets from multiple individuals: http://forum.quantifiedself.com/thread-large-datasets-needed

Dear Bard,

You may be interested in an app called DataKaizen that I have just developed (“kaizen” means “change for the better” in Japanese). The goal of this app is precisely to be an easy-to-use piece of software for drawing meaningful conclusions from collected data.

The app applies pattern classification and data-mining algorithms (support vector machines, hierarchical Bayesian networks) to define individual limits of lab data (as opposed to the traditional “reference ranges” derived from epidemiology, which IMHO do not really make sense in personalized medicine).
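As a generic illustration of the “individual limits” idea (this is not DataKaizen’s actual code, and the lab values below are fabricated): a one-class SVM trained on a person’s own history can flag new values that fall outside their personal normal range.

```python
# Generic sketch of "individual limits": learn one person's normal region
# from their own history and flag outliers. Not DataKaizen's actual code;
# the glucose values are fabricated examples.
import numpy as np
from sklearn.svm import OneClassSVM

history = np.array([88, 91, 85, 90, 87, 92, 89, 86, 90, 88]).reshape(-1, 1)

model = OneClassSVM(nu=0.1, gamma="scale").fit(history)
new_values = np.array([89, 110]).reshape(-1, 1)
print(model.predict(new_values))   # +1 = within personal limits, -1 = outlier
```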

DataKaizen is a new app and any feedback on how to improve it would be most appreciated.

Thank you,
Pierre