Personal Data Task Force

Earlier today John Wilbanks sent out this tweet:

John was lamenting the fact that he couldn’t export and store the genome interpretations that 23andMe provides (they do provide a full export of a user’s genotype). By the afternoon two developers, Beau Gunderson and Eric Jain, had submitted their projects. (You can view them here and here.)

We’ve been doing some exploration and research into QS APIs over the last two years, and we’ve come to understand that data export is a key function of personal data tools. Being able to download and retain an easily decipherable copy of your personal data is important for a variety of reasons. One only needs to spend some time in our popular Zeo Shutting Down: Export Your Data thread to understand how vital this function is.

We know that some toolmakers already include data export as part of their user experience, but many have not included it or only provide partial support. I’m proposing that we, as a community of people who support and value the ability to find personal meaning through personal data, work together to provide the tools and knowledge to help people access their data.

Would you help and be a part of our Personal Data Task Force? We can work together to build a common set of resources, tools, how-to’s and guides to help people access their personal data. I’m listening for ideas and insights. Please let me know what you think and how you might want to help.

Replies here or [via email](mailto:ernesto@quantifiedself.com?Subject:Task Force) are welcome.


We’ve already started receiving a few links via Twitter. I’ll be cataloging them here and updating as they come in.

From Ed Hunsinger: http://wiki.biologger.com/index.php?title=APIExportTools

I have created a Google App Engine web application that, on a daily basis, connects to a variety of APIs and retrieves yesterday’s personal data. It can then run rules and do visualizations. The good thing about this is that the retrieval, storage, processing, and visualization happen in the cloud, running on code that I control.

Currently it retrieves data from:
Rescuetime (all computer/cell phone activity)
Mint.com (all financial transactions, account and investment balance changes)
Myfitnesspal.com (Calorie logging, fitbit scale weight measurements)

Formerly, it also retrieved Google Latitude location data, but Google has retired that API and I haven’t coded a replacement location source yet.

I’m also able to calculate when I went to sleep and awoke based on Rescuetime data. Yesterday I finished adding a feature that can send email reminders - for example, after it detects I’ve woken up, it starts sending me reminders to take my vitamins until I acknowledge them.
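The sleep/wake calculation is really just looking for the longest overnight gap between logged activities. Here is a minimal sketch of that idea - the data format and threshold are illustrative, not the actual code from the app:

```python
# Minimal sketch of the sleep/wake idea: treat the longest overnight gap
# between logged RescueTime activities as the sleep period.
# The record format and 4-hour threshold here are illustrative.
from datetime import datetime, timedelta

def infer_sleep(activities, min_gap_hours=4):
    """activities: list of (start, end) datetimes for the previous evening plus
    today, sorted by start time. Returns (sleep_start, wake_time) or None."""
    longest = None
    for (_, prev_end), (next_start, _) in zip(activities, activities[1:]):
        gap = next_start - prev_end
        if gap >= timedelta(hours=min_gap_hours) and (
                longest is None or gap > longest[1] - longest[0]):
            longest = (prev_end, next_start)
    return longest

# Example: last activity ends at 23:40, next starts at 07:55 -> sleep 23:40-07:55.
acts = [(datetime(2014, 1, 6, 22, 0), datetime(2014, 1, 6, 23, 40)),
        (datetime(2014, 1, 7, 7, 55), datetime(2014, 1, 7, 8, 30))]
print(infer_sleep(acts))
```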

I’d be interested in collaboration - adding more APIs, useful features, and further packaging the web application so that anyone can run their own version of it (for data privacy).

Please let me know if you are interested.

Sounds interesting; are you planning to open-source the code?

Yes, the code will be open source. The intent is that someone can take the code from GitHub, create a free Google App Engine account, and deploy the web app to their Google App Engine account. From that point on it runs in the cloud collecting their data, and they can visit it to view or export their data. The main way it differs from your product, “Zenobase”, is that the data is collected on Google App Engine, so the only person who has access to the data and the code that collects it is you. (And the NSA. And Google engineers, I guess.)

The only reason that it is not currently open source on GitHub is that I put my development time towards adding new features, rather than making it user friendly. However, I’m expecting to have some time to work on the project over the next few weeks, so I’m going to be looking for people who are interested in beta testing the software and developers who are interested in contributing to the project.

I’m guessing the code is written in Python? If it’s Java, it might be worth collaborating with Open mHealth on at least the code for the service-specific collectors. There is also http://www.reportr.io/ (Node.js), but I’m not sure what the status of that project is. In any case, I’d be happy to beta test and give some feedback.

Hi Ernesto,

I would guess I count as already being part of the personal data task force, but please go ahead and include me if not.

Hi mikeolteanu,

I’d be interested to try out your app engine app, and would be happy to give you feedback.

I’d also be interested to learn more about how you’re getting the data from Rescuetime, Mint.com, and Myfitnesspal.com. Are you doing OAuth binding and pulling the data via an API, or doing scraping?

Do you have an API, or plan to add an API, where an outside service could pull the data from an instance of your app engine app? If so, how would that work?

I second Eric’s comment that getting involved with the Open mHealth folks is a potentially good idea. In their nomenclature, what you, Eric, and I are doing would be counted as a “DSU” (Data Storage Unit). They recently released a first draft of a spec for talking to DSUs here.

It’s incomplete, but I am hopeful that they’ll revise it to add the missing features needed for history and incremental updates. Once that’s there, it could potentially support a uniform interface for syncing data from various DSUs, and for the DSUs being able to sync data from each other. They’re currently working on a shim Java webapp that adapts a set of services to the OMH DSU spec (Moves, Withings, Fitbit, etc.).

My interest in this is that I lead the BodyTrack project, which works to support personal health empowerment and develops the Fluxtream open source personal data visualization and exploration platform. We take a different technical approach, a Java webapp supporting multiple users with a primary instance running on a server at CMU, but we also support the principle that the data belongs to the user and that we don’t do anything with it other than support the user’s experience. We bring together a set of services into a common explorable framework.

We would love to include Rescuetime, Mint, and Myfitnesspal, but we haven’t had time to try to assess the vagaries of interfacing with them. I’m happy to provide info on how to interface with any of the ones we do support (the list is here). I also have plans to support the OMH DSU spec once it gets to the point where it’s ready.

On the topic of Google Latitude, you still can get a JSON copy of it by going to Google Takeout and selecting “Location History”. When they disabled the API we added support for being able to upload those takeout location history archive files. It might be an approach that would work for you too.
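For reference, the Takeout archive is plain JSON, so it’s easy to read in a few lines. A minimal sketch, assuming the timestampMs/latitudeE7/longitudeE7 fields Takeout currently uses - check your own export, since the format may change:

```python
# Minimal sketch of reading a Google Takeout "Location History" export.
# Assumes the JSON layout Takeout currently uses (timestampMs, latitudeE7,
# longitudeE7); verify against your own export, since the format may change.
import json
from datetime import datetime

def read_location_history(path):
    with open(path) as f:
        data = json.load(f)
    for loc in data.get("locations", []):
        yield (datetime.utcfromtimestamp(int(loc["timestampMs"]) / 1000.0),
               loc["latitudeE7"] / 1e7,   # degrees latitude
               loc["longitudeE7"] / 1e7)  # degrees longitude

for when, lat, lon in read_location_history("LocationHistory.json"):
    print(when, lat, lon)
```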

Anne


Rescuetime has a REST API that I pull information from. I typically use only a single command - “Get fully detailed activity history for day X in JSON”.
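For reference, that’s a single HTTP GET against RescueTime’s Analytic Data API. A minimal sketch of the call, written against the public API docs rather than copied from my script:

```python
# Minimal sketch of pulling one day's detailed activity from RescueTime's
# Analytic Data API. Written against the public API docs, not the exact
# script posted in this thread; fill in your own API key.
import requests

API_KEY = "YOUR_RESCUETIME_API_KEY"

def rescuetime_day(day):
    """day: 'YYYY-MM-DD' string. Returns the parsed JSON response."""
    params = {
        "key": API_KEY,
        "perspective": "interval",
        "resolution_time": "minute",
        "restrict_begin": day,
        "restrict_end": day,
        "format": "json",
    }
    resp = requests.get("https://www.rescuetime.com/anapi/data", params=params)
    resp.raise_for_status()
    return resp.json()

print(rescuetime_day("2014-01-06"))
```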

Mint.com doesn’t seem to have a public API, as their business model is that you log in to their site to see your aggregated financial data while they show you advertisements for targeted financial products. So a browser session is simulated and the App Engine app logs in using a saved username and password.
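The simulated session is roughly what the mechanize library gives you. A stripped-down sketch of the idea - the login URL, form index, and field names below are guesses, and the real script has to match whatever Mint’s login page actually serves:

```python
# Rough sketch of the "simulated browser session" idea using mechanize.
# The login URL, form index, and field names are illustrative guesses;
# the real script must match whatever Mint's login page actually serves.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent", "Mozilla/5.0")]

br.open("https://wwws.mint.com/login.event")   # login page URL is an assumption
br.select_form(nr=0)                           # pick the login form
br["username"] = "you@example.com"             # field names are guesses
br["password"] = "your-password"
response = br.submit()

html = response.read()  # logged-in page; parse balances/transactions from here
```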

MyFitnessPal functions similarly to Rescuetime - we use scraping to pull in the needed info.

Currently I have no need to provide data to other services. It collects the data, and I write code that lets me do meaningful things with the data on that same server, such as show the data on the web, send notification emails, reprocess and save the data, etc. This is the main point for me - I’m able to continually add API connections, and write more and more ways of using the data, and it all happens in a single place that is completely under my control.

But to answer your question, you can export the data from your databases using Google App Engine tools. An API could easily be made to allow access to the aggregated data.
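For example, a bare-bones export endpoint on App Engine is only a handful of lines. A sketch using webapp2 and ndb - the DailyRecord model and its fields are hypothetical placeholders, not my actual schema:

```python
# Sketch of a minimal export endpoint on Google App Engine (Python/webapp2).
# The DailyRecord model and its fields are hypothetical placeholders for
# whatever entities the collector actually stores.
import json
import webapp2
from google.appengine.ext import ndb

class DailyRecord(ndb.Model):        # hypothetical model
    date = ndb.StringProperty()
    source = ndb.StringProperty()
    payload = ndb.JsonProperty()

class ExportHandler(webapp2.RequestHandler):
    def get(self):
        records = DailyRecord.query().order(DailyRecord.date).fetch()
        out = [{"date": r.date, "source": r.source, "data": r.payload}
               for r in records]
        self.response.headers["Content-Type"] = "application/json"
        self.response.write(json.dumps(out))

app = webapp2.WSGIApplication([("/export", ExportHandler)])
```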

Your Fluxtream project looks amazing.

In my case I’m doing a wild homegrown version of data collection, a small proof-of-concept graphing system for data visualization, and a non-web developer’s take on web development for user experience. These are not my strong points and the work suffers from that. The important part for me is just that, as a developer, I’m able to modify the code at will and add new features.

The discussion of what you and mHealth are trying to do has shown me that I’m on a different track. Specifically, I’m developing something for a single user (myself), and only people who are very similar to me (QS developers) would be remotely interested in using it as it is now.

I can share my code for connecting to these, although I’m not sure you’ll be able to make use of the scraping, if you typically use APIs with auth keys instead of scraping. Let me know where to send the code to.

Yes, I’m relying on this export feature to track my location right now, but I haven’t coded a way to pull the location history KML automatically yet, since it requires authentication and may be a little complicated.

Regards,
Mike

Reading through this latest discussion makes my week. Reading with interest and learning…

Hey mikeolteanu,

I love the concept that you started working on. Considering that I spend my time right before bed and immediately after waking up on my phone, that is a genius way to track sleep.

I was unaware that you could retrieve data from mint.com but that is great news to hear.

I would very much like to see what you have accomplished so far and I would be willing to help you out as well. Let me know what I can do (feel free to send me a PM).

Hello Erik and Anne, I just started a new job so I’ve been fairly busy lately, but I’ve cleaned up some code and separated it from my data collection system.

Here are some standalone python scripts that collect data:

Sleep Hours from rescuetime api
http://www.speaklynx.com/retrievelynx/public/sleep%20hours%20rescuetime

Mint account balances and transaction history
http://www.speaklynx.com/retrievelynx/public/mint%20account%20balances

Moves API - Location history (thanks Anne for showing me this cool API/app)
http://www.speaklynx.com/retrievelynx/public/moves%20api

My Fitness Pal - Fitbit scale weights - food log for a given day
http://www.speaklynx.com/retrievelynx/public/my%20fitness%20pal

Rescuetime - get detailed activity for a day
http://www.speaklynx.com/retrievelynx/public/rescuetime

They require you to enter your API keys or usernames and passwords in order to retrieve your info. You will also have to install a few dependencies in Python, which should be really easy if you know how to use pip.

I’d be happy to assist over chat anyone who runs into difficulty using the above code. You can get me on chat by contacting me through google hangouts through my Google+ profile, found here: https://plus.google.com/+MichaelOlteanu/posts

The actual data collection web app would require a little bit of hand holding in order for me to help you set it up. I’m happy to do that, but I’d rather it be via chat, so again contact me on hangouts if you want to set up data collection on google app engine.

Erik notified me that the Rescuetime integration wasn’t working. He describes the problem as:

"For the RescueTime Python file, I was running into a bit of trouble. I copied and pasted the 40-digit API key that I found on RescueTime’s website. However, I was receiving this error from the code:

{u'message': u'key not found', u'error': u'# key not found'}

whenever I try print(result.json()) on line 41."

You actually need to add rtapi_key= before your API key. I have updated the files at the previously linked locations to reflect this change. As part of the “redacting” to remove my own API key, I removed that important part of the string.
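To spell that out, the fix is just the prefix on the query string the script sends - the surrounding values here are paraphrased, not the exact file:

```python
# The script builds the query string by hand; the parameter name has to be there.
api_key = "MY_40_DIGIT_KEY"
query = "rtapi_key=" + api_key   # correct
# query = api_key                # what the redacted file had - "key not found"
```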

Let me know if you run into any other problems, I’m happy to help.

That resolved my issue, thank you.

In terms of the Moves API Python file, I had a few additional questions. Could you explain the access_token_json data variable on line 23 and how you received the necessary information? Also, what is the ‘code’ data member on line 15 supposed to be?

Any additional help would be appreciated.

Thank you mikeolteanu.

You can find some hacks I posted because these devices’ APIs are unavailable or incomplete:

withings ws-50 smart body analyzer
http://counterinception.com/content/extracting-your-data-withings-smart-body-analyzer-ws-50

heartmath emwave2
http://counterinception.com/content/opening-heartmath-emwave2-database

Set up a developer account on the Moves developer website.
Give it a fake URL for the redirect, e.g. www.asdasdsadasdas.com
Now you have two pieces of info from that website: a client ID and a client secret.

In a browser on your computer, go to:

“https://api.moves-app.com/oauth/v1/authorize?response_type=code&client_id=<client_id>&scope=activity location” (without quotes - make sure to get the whole thing despite the space in the URL - edit the URL to contain your client ID)

A site will open with an 8-digit code. Go to your Moves app and follow the instructions to enter the 8-digit code on your phone.

When you do, the site on your computer will change to www.asdasdasdasdas.com/SOMETHING/CODE
which won’t load properly because it doesn’t exist, but the important thing is that you can see the “code” in the URL, which is what you are after.

Get the code from this URL. Now you need to use this code to get an access token. Edit the def moves_auth() function with all the info you now have, and run it. It will return your access token. This access token will work for 6 months. Put the access token into the Python variable and the API should now work.
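If it helps, here is roughly what that exchange and a first API call look like in Python. The endpoint paths are from the Moves docs as I remember them, so double-check them there; the client ID, secret, and code are placeholders:

```python
# Sketch of the Moves OAuth exchange described above: trade the one-time
# authorization code for an access token, then call the API with the token.
# Endpoint paths are from memory of the Moves docs - double-check them there;
# the client id/secret/code are placeholders.
import requests

CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"
AUTH_CODE = "code_copied_from_the_redirect_url"   # use it only once

def moves_auth():
    resp = requests.post("https://api.moves-app.com/oauth/v1/access_token", params={
        "grant_type": "authorization_code",
        "code": AUTH_CODE,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "redirect_uri": "http://www.asdasdsadasdas.com",  # the fake redirect from step 1
    })
    return resp.json()   # contains "access_token", good for ~6 months

ACCESS_TOKEN = moves_auth()["access_token"]

# Example call: one day's summary.
day = requests.get("https://api.moves-app.com/api/1.1/user/summary/daily/20140106",
                   params={"access_token": ACCESS_TOKEN})
print(day.json())
```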

Thank you for the response. Almost everything worked out, but there is one slight problem that I ran into.

I received the code as you suggested and entered it into the script, at which point I received the access_token_json data. However, the problem arises after I run the Python file once: the code seems to reset on the second try. For instance, if I run the Python file twice in a row without making any changes, the first instance returns the access_token_json and everything is fine. Despite not changing or doing anything on the second attempt, I receive the response {"error":"invalid_grant"}. Then, my Moves app on my phone disconnects from the development connected apps.

Regardless of whether I correctly enter the access_token_data into the code or do nothing altogether (including commenting out the entire file except the moves_auth function), the second attempt always produces the error above and disconnects my device, forcing me to enter a new PIN again.

Any suggestions would be helpful. Please let me know if I need to elaborate on any part of the problem.

Thank you.

I ran into this problem myself, and it is because you need to use the authorization code only once to get an access token. From that point on you only use the access token, until it expires in 6 months. However, if you go through the authorization process again, it seems to expire your previous access token. I’m pretty sure this is what you are running into.
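A simple way to avoid re-running the authorization by accident is to save the token the first time and reuse it on later runs - an illustrative sketch that reuses the moves_auth() function from the earlier example:

```python
# Save the access token the first time and reuse it on later runs, so the
# one-time authorization code is never used twice (illustrative sketch;
# moves_auth() is the code-exchange function from the earlier example).
import json
import os

TOKEN_FILE = "moves_token.json"

def get_access_token():
    if os.path.exists(TOKEN_FILE):
        with open(TOKEN_FILE) as f:
            return json.load(f)["access_token"]
    token = moves_auth()                 # one-time code exchange
    with open(TOKEN_FILE, "w") as f:
        json.dump(token, f)
    return token["access_token"]
```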

Fantastic! That solved my issue.

The most interesting use case for me is using mint.com’s data, mainly because I am interested in personal finance. I had started playing around with the code, until I received a peculiar error message the other day. The error message can be seen in the attachment.

The code appears to throw an exception on line 51, which is “br.select_form(nr=formcount)”. It was working perfectly fine for me until yesterday, when the error appeared. I even tried re-downloading your original file and running it, but I received the exact same message.

Any thoughts or advice?

Thank you!

It looks like mint.com recently made a change to the way their pages are rendered. Now it sends a mostly empty page, expecting the client to download and run some JavaScript to build the rest of the page. The Python script which retrieves the HTML does not emulate a browser to the degree that it can run the JavaScript embedded in the HTML, so it ends up no longer having the login form within the HTML.

Long story short, mint.com changed their page and the current approach doesn’t work.

Instead, a different approach is likely needed - I probably need to inspect how the login happens now with the developer console in the Chrome browser, see the format of the login POST, and then mimic that with Python code.
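The shape of that would be something like the sketch below. Everything in it - the endpoint, field names, and extra parameters - is hypothetical until I actually watch the real login request in the developer console:

```python
# Hypothetical sketch of "mimic the login POST": the endpoint, field names,
# and parameters below are placeholders - the real ones have to come from
# watching the actual login request in Chrome's developer console.
import requests

session = requests.Session()   # keeps cookies across requests

login = session.post("https://wwws.mint.com/loginUserSubmit.xevent",  # hypothetical endpoint
                     data={"username": "you@example.com",             # hypothetical field names
                           "password": "your-password",
                           "task": "L"})
login.raise_for_status()

# With the session cookies set, subsequent requests should return the
# logged-in pages (or JSON) that the old scraper parsed.
overview = session.get("https://wwws.mint.com/overview.event")        # hypothetical URL
print(overview.status_code)
```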

Another approach would be to use http://phantomjs.org/ to log in to the site, since PhantomJS apparently can handle running JavaScript without a GUI. Then I could save the HTML and parse it with Python.

I updated the mint script at the previous location with a new working version.