API Questions and Answers

[color=#000080]This is a thread for QS Toolmakers and others who are interested in API related topics. Eventually we will probably elevate this thread to a Forum where there can be multiple conversations moderated separately, but for now let’s see how we do with a single thread. And if you would like to share some information about yourself and be invited to other QS Toolmakers discussions, please fill out this form:

Ernesto and I will be actively moderating, and we welcome your ideas and questions.

ADDITION: Anne made a good suggestion below. Please introduce yourself when you post for the first time! [/color]

Thanks so much Gary, Ernesto, and Eric. I think this is an important step and I talked to many people at the 2013 QS Conference in Amsterdam last week who are psyched to participate.

I propose that folks who are interested also post to the thread in addition to filling in the form, at least just to say “hi” and subscribe to receive email notification of things posted here.

I also propose that in a week or so when folks have had a chance to sign up and attach themselves to the thread that we set up a possibly recurring live chat opportunity (conference call, google hangout, or tweet chat).

Thanks,
Anne Wright

For those who haven’t gotten the link yet, here is the list of API issues; you can vote and add your own:

http://www.google.com/moderator/#15/e=210167&t=210167.40

Hi everybody, Teemu from Helsinki, Finland here.

Related to our heated date format discussion in the QS Amsterdam API session: I used to develop a mobile calendar and calendar synchronization, so there might be a possibility to leverage that experience here.

By the way, is anyone here aware of the HumanAPI project (http://humanapi.co/) and who is behind it? It seems to be aligned with the goals of this QS API group.

Cheers,
Teemu Kurppa

The issue wasn’t so much the format (ISO 8601 please), but the fact that some services store timestamps only as local time and others only in UTC. The former makes it difficult to treat the data as a continuous time series; the latter prevents extracting e.g. the hour of day.
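A minimal Python sketch of that trade-off, assuming the service returns ISO 8601 timestamps that include the UTC offset. The helper name `parse_reading_time` and the sample values are made up for illustration. With the offset present, one string yields both the absolute instant (for a continuous time series) and the local hour of day:

```python
from datetime import datetime, timezone

def parse_reading_time(text):
    """Parse an ISO 8601 timestamp with offset; return (utc_instant, local_hour)."""
    local = datetime.fromisoformat(text)
    if local.tzinfo is None:
        # Local time only: the absolute instant cannot be recovered.
        raise ValueError("timestamp has no UTC offset: " + text)
    return local.astimezone(timezone.utc), local.hour

# Two hypothetical readings with the same wall-clock time in different zones.
first_utc, first_hour = parse_reading_time("2013-05-20T08:30:00+03:00")
second_utc, second_hour = parse_reading_time("2013-05-20T08:30:00+02:00")

# The instants are one hour apart, yet both were taken at 8 in the morning.
gap_seconds = (second_utc - first_utc).total_seconds()
```

A service that keeps only the UTC instant loses `first_hour`; one that keeps only the local time loses `gap_seconds`.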

I’d be interested in hearing their justifications. If the problem is that they couldn’t figure out how to make it work, rather than simple oversight or intentional limitation, then there might be the opportunity for them to leverage our experience.

The two (?) guys behind HumanAPI are @andreimpop and @OlaWiberg. If someone managed to provide a service that lets me get data from all the providers in a useful manner, that would save me a lot of trouble! But I’m not holding my breath: the one screenshot on their website shows a blood pressure measurement–without a timestamp :slight_smile:

Here’s an interesting slide from Mary Meeker’s Internet Trends slide deck about API calls to My Fitness Pal:

I’d like to review the API issues list, and then turn it into a blog post to draw more attention to the issues (is there enough consensus to have it published on quantifiedself.com?).

Could do this as part of a first Google Hangout, as Anne suggested. How about sometime next week?

Sounds good.

There’s a 10 hour time zone difference between Pacific and Helsinki, but usually I can adjust my schedules easily. My only wish is that we don’t have the hangout between 22:00 UTC - 4:00 UTC.

Cheers,
Teemu

How about 15:00 UTC (08:00 Pacific / 11:00 Eastern / 18:00 Helsinki)?

That’s good for me. A few hours later is also good.

By the way, have you discussed good practices for universal IDs for data entries yet?

I don’t think this has come up so far. Can you be more specific? Is the problem the lack of identifiers (which can make it hard to know if you’ve seen a data point already), or with the representation of the identifiers?

[quote]The two (?) guys behind HumanAPI are @andreimpop and @OlaWiberg. If someone managed to provide a service that lets me get data from all the providers in a useful manner, that would be save me a lot of trouble! But I’m not holding my breath: The one screenshot on their website shows a blood pressure measurement–without a timestamp :slight_smile:
[/quote]

Hey guys - Andrei here. Definitely time-stamping our data – those screenshots are a little out of date at this point.

Unfortunately I wasn’t able to make it to Amsterdam, but I had a few people ping me the API thread and it was definitely useful. I can say we are thinking about a few things, and I’m sure there are some things we are missing.

A few things we are going to be hitting on our API that were mentioned:

  • SSL
  • OAuth2 without expiring tokens for the “hacker” version of humanAPI (the tokens will expire for the secure HIPAA version)

We aren’t strongly opinionated on the time zone issue yet. I see it was mentioned very often in the thread. Our model returns everything in UTC based on the timezone of the measurement, but we are still working through a few things.

On the unit issue, today we are representing everything in metric units (we based this decision on the developing HL7 standards, which use the metric system in every example that they provide). We can rely on the application developer to convert at the application level accordingly if they wish. Welcome more thoughts on this.

I mentioned this thread to Ola as well, so I’m sure he’ll have some interesting thoughts. As I mentioned to Gary in a chat earlier today I know many here have spent time thinking through these problems, and we want to know what sorts of things you guys would like to see in the API :slight_smile:

Welcome Andrei! Thank you for joining this thread, great to have you.

It was nice talking to you guys, great to see you following up here!

Dealing with expiring tokens is much less troublesome with OAuth2, thanks to the standard refresh mechanism. But can you elaborate on what is required for HIPAA?
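For reference, the standard refresh mechanism is a single form-encoded POST to the token endpoint (RFC 6749, section 6). The sketch below only builds the request and sends nothing. The endpoint URL and credentials are placeholders, not HumanAPI’s actual API, and passing the client credentials in the body is just one of the options the spec allows (HTTP Basic authentication is the other common one):

```python
from urllib.parse import urlencode

def build_refresh_request(token_url, refresh_token, client_id, client_secret):
    """Return (url, headers, body) for an OAuth2 refresh-token request."""
    body = urlencode({
        "grant_type": "refresh_token",   # fixed value defined by RFC 6749
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return token_url, headers, body

# Placeholder values for illustration only.
url, headers, body = build_refresh_request(
    "https://api.example.com/oauth/token",
    refresh_token="stored-refresh-token",
    client_id="my-client-id",
    client_secret="my-client-secret",
)
```

The response carries a new access token (and possibly a new refresh token), so the client can recover from expiry without involving the user again.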

If it’s in UTC, how can it be based on a time zone :huh:

The issue is with APIs that accept values in non-SI units, convert them to SI for internal storage, and then convert the values back to the original unit–with rounding errors. Seems less of an issue for an API that is aggregating data, though having a “give me the original values with no conversion and change of precision” option would still be nice to have, especially when you want to show the values rather than average or plot them.
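A small Python illustration of that round trip. The one-decimal storage precision and the function names are assumptions, chosen only to make the effect visible:

```python
LB_PER_KG = 2.2046226218  # pounds per kilogram

def store_as_kg(pounds, decimals=1):
    """Convert to SI for storage, rounding as a hypothetical service might."""
    return round(pounds / LB_PER_KG, decimals)

def read_as_pounds(kilograms, decimals=1):
    """Convert the stored SI value back to the unit the user entered."""
    return round(kilograms * LB_PER_KG, decimals)

entered = 150.0                      # what the user typed, in pounds
stored = store_as_kg(entered)        # 68.0 kg after rounding
returned = read_as_pounds(stored)    # 149.9 lb, no longer what was entered

# Keeping the original value and unit next to the converted one avoids this.
record = {"value": entered, "unit": "lb", "value_kg": stored}
```

With the original value and unit kept in the record, the API can show the user exactly what was entered while still using `value_kg` for averaging or plotting.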

The team at Open mHealth are thinking a lot about APIs for data exchange and have worked out some of the thorny issues being explored here in pretty good depth. Full disclosure: I collaborate with this team, but am not a member.

Overview: http://developer.openmhealth.org/developer/
Beta API: https://github.com/openmhealth/developer/wiki/DSU-API-1.0-Beta
Academic Intro: http://www.jmir.org/2012/4/e112/

One of the things that I think is most interesting about the API discussion is that when it is first approached the task seems relatively easy; the difficulties only emerge over time, as the diversity of the conditions (both the diversity of data coming from individuals, and the diversity of contexts and use cases for data on the other side) starts to reveal itself. API middleware and aggregation systems are solving problems that are actually very hard, and will influence lots of what we can do in the future. I hear about solutions from many different people and companies, and I hope this “backchannel” conversation will help us use our resources a bit more effectively than if everybody has to encounter the hard problems as if for the first time. Just in this thread so far, you can see the struggle to figure out if we are talking about the same issues. This is important!

Gary is right. Developing an API is deceptively easy, but developing a useful API for a wide set of use cases in the real world, not so much. Here are some of the issues I’ve run into in a system that consumes 3rd party APIs (Fitbit, etc.):

  1. De-duplication. Sometimes you get push updates within a day that overlap earlier reports (e.g. an ever-growing summary of steps for the day on each sync). And what about a historical re-download? Each service and data type often has different needs for de-duplication to ensure that time-series reads are accurate.

  2. Reference times and time zones. Most DBs are geared towards timestamp storage, so do you use a canonical time for a daily summary? What happens if the user changes timezones? Does your DB store midnight in both the current and travel timezones? What does that do for any reporting back? (Two summaries can end up in one day.) Etc.

  3. The canonical unit problem above is another fun one.

  4. If creating your own data, separate “time of reference” and “time of entry”. Entry is when I reported it, reference is what I reported about (only the same in devices and momentary assessment). Often the time of reference is a calendar concept, not an interval between two timestamps (“yesterday”).
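Points 1 and 4 can be sketched together in Python. This assumes a service that pushes an ever-growing step total for the day on each sync; the `StepSummary` type and the sample numbers are invented for illustration. Each report keeps the time of reference and the time of entry as separate fields, and de-duplication keeps only the most recently received report per reference day:

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass(frozen=True)
class StepSummary:
    reference_day: date      # what the report is about ("time of reference")
    received_at: datetime    # when we received it ("time of entry")
    steps: int               # running total for the reference day

def latest_per_day(reports):
    """Keep only the most recently received summary for each reference day."""
    latest = {}
    for report in reports:
        kept = latest.get(report.reference_day)
        if kept is None or report.received_at > kept.received_at:
            latest[report.reference_day] = report
    return latest

reports = [
    StepSummary(date(2013, 5, 20), datetime(2013, 5, 20, 12, 0), 4000),
    StepSummary(date(2013, 5, 20), datetime(2013, 5, 20, 23, 0), 9500),
    StepSummary(date(2013, 5, 21), datetime(2013, 5, 21, 9, 0), 1200),
    StepSummary(date(2013, 5, 20), datetime(2013, 5, 20, 12, 0), 4000),  # re-delivered
]

naive_total = sum(r.steps for r in reports)
deduped_total = sum(r.steps for r in latest_per_day(reports).values())
```

Summing the raw reports overcounts badly (18700 steps), while the de-duplicated view gives 10700. Other data types would need a different key or rule, which is the point of item 1.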

Some strategies I prefer when designing systems to consume 3rd party data:

  • Store everything as you get it, make sense of it separately.
  • De-duplicate on extraction; gain efficiency if needed via nightly archiving of redundant data.
  • Represent assumptions about timezones when not specified by API
  • Represent time explicitly. If you mean ‘morning’, then represent morning as a concept and don’t rely on timestamps. Code can map the raw data into ‘morning’ which may be complicated, particularly with changing timezones.
  • Be explicit about units; if you need a single-unit column (e.g. weight) for population queries or range queries, use a second internal canonical field that isn’t exposed to the consumer of the API
  • Don’t be afraid to recompute values to send out via an API or rendered for the user, so long as you cache them and can prefetch the expensive stuff; better to have clean data than slightly more performant code!
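A rough Python sketch of the first, third, and fifth strategies, under stated assumptions: the in-memory `raw_store` list stands in for a database table, and the payload shape (`time`, `value`, `unit`) is hypothetical rather than any real provider’s format. The raw payload is stored untouched, and interpretation happens separately, recording whether the time zone was assumed and keeping the unit explicit:

```python
import json
from datetime import datetime, timedelta, timezone

raw_store = []  # append-only; stands in for a database table

def store_raw(source, payload_text, received_at):
    """Store the payload exactly as received, without interpreting it."""
    raw_store.append(
        {"source": source, "received_at": received_at, "payload": payload_text}
    )

def interpret(row, assumed_offset_hours):
    """Make sense of a raw row separately, flagging any timezone assumption."""
    data = json.loads(row["payload"])
    stamp = datetime.fromisoformat(data["time"])
    tz_assumed = stamp.tzinfo is None
    if tz_assumed:
        # The API did not specify an offset, so apply and record our assumption.
        stamp = stamp.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return {
        "time": stamp,
        "tz_assumed": tz_assumed,
        "value": data["value"],
        "unit": data["unit"],   # the unit stays explicit next to the value
    }

store_raw(
    "scale",
    '{"time": "2013-05-20T07:15:00", "value": 150.0, "unit": "lb"}',
    datetime(2013, 5, 20, 14, 16, tzinfo=timezone.utc),
)
reading = interpret(raw_store[0], assumed_offset_hours=-7)
```

Because the raw row is never modified, a wrong assumption (here, UTC-7) can be corrected later by re-running `interpret` with a different offset.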

Cheers,
Ian

What do you mean by “re-download”? Both BodyMedia and Fitbit provide a “last synced” timestamp that can be used to ignore e.g. an incomplete step count. But if you are retrieving hour- or minute-resolution data, both BodyMedia and Fitbit make you redownload the data for the entire day repeatedly!

Most databases have a “date without time or timezone offset” data type. The problem is that both BodyMedia and Fitbit appear to tie all their data to such dates, and because they want each day to have no more and no less than 24 hours, data is lost (or gaps appear) when switching timezones!
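To see why a fixed 24-hour day cannot hold a travel day, here is a short Python calculation. It uses fixed summer offsets for Helsinki (UTC+3) and Pacific time (UTC-7) as a simplification instead of real time zone rules, and the function name is made up for the example:

```python
from datetime import datetime, timedelta, timezone

HELSINKI = timezone(timedelta(hours=3))    # fixed summer offset, for the sketch
PACIFIC = timezone(timedelta(hours=-7))    # fixed summer offset, for the sketch

def travel_day_length(start_zone, end_zone, year, month, day):
    """Length of a calendar day that starts in one zone and ends in another."""
    start = datetime(year, month, day, tzinfo=start_zone)
    end = datetime(year, month, day, tzinfo=end_zone) + timedelta(days=1)
    return end - start

# Wake up in Helsinki, go to bed in San Francisco on the same calendar date.
westbound = travel_day_length(HELSINKI, PACIFIC, 2013, 6, 1)

# The reverse trip: the day starts on Pacific time and ends on Helsinki time.
eastbound = travel_day_length(PACIFIC, HELSINKI, 2013, 6, 1)
```

The westbound day lasts 34 hours and the eastbound day 14 hours, so forcing either into a 24-hour slot must either drop data or leave a gap.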

Good point about recording the “time of entry”! This is important when data might appear out of sequence, e.g. because it is entered manually. But regarding the “time of reference”, I think both the local and the universal time can be of interest.

Re: re-download. I’ve found that while the “last synced” approach works great 95-99% of the time, sooner or later I always find some corner case where I see the same update/packet more than once, and I have found that it’s better to design for that case at the outset (e.g. a rare packet loss scenario where you store data but Fitbit doesn’t get an ack and so doesn’t update last-synced).
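One way to design for that case is to make ingestion idempotent, so that a re-delivered packet is harmless. The Python sketch below is an assumption-laden simplification: it keys on a hash of the payload bytes and de-duplicates at ingestion for brevity, whereas the same key could equally be applied on extraction. The class name and sample payloads are invented:

```python
import hashlib

class IdempotentStore:
    """Accepts each distinct (source, payload) pair once; repeats are ignored."""

    def __init__(self):
        self._seen = set()
        self.rows = []

    def ingest(self, source, payload_bytes):
        """Return True if the packet was new, False if it was a duplicate."""
        key = (source, hashlib.sha256(payload_bytes).hexdigest())
        if key in self._seen:
            return False
        self._seen.add(key)
        self.rows.append((source, payload_bytes))
        return True

store = IdempotentStore()
first = store.ingest("tracker", b'{"day": "2013-05-20", "steps": 9500}')
# The ack was lost, so the same packet arrives again.
second = store.ingest("tracker", b'{"day": "2013-05-20", "steps": 9500}')
third = store.ingest("tracker", b'{"day": "2013-05-21", "steps": 1200}')
```

Here `second` is rejected while `first` and `third` are stored, regardless of what the provider’s last-synced marker says.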

I’ll be judging the HumanAPI Hackathon on July 7. More info at https://humanapisf.eventbrite.com/