Breakout: Aggregator platforms: Understanding data?

Kouris_Kalligas · May 3, 2014, 12:54pm

Many of us are involved in aggregating personal data or using services based on data aggregation. This session is an open discussion of the lessons and challenges of combining heterogeneous data streams.

ejain · May 10, 2014, 7:53am

The most difficult data stream to integrate is the data that is stored in the user’s head only…

Kouris_Kalligas · May 10, 2014, 8:44am

Definitely! I'll note it down as a question for the session :-). Thanks, Eric.

Kouris_Kalligas · May 21, 2014, 11:41am

The main topics raised by participants in the breakout session were: the use cases of aggregator platforms, privacy concerns, the interoperability of interfaces between tracking sources and aggregator platforms (and beyond), the ontology of the data they import, the automation of the added value they provide, and the machine learning that comes with it.

Taking the topics one by one:

Ontology of data: Here, ontology means that the data or metrics integrated from different tracking sources should share the same terminology and, where possible, the same calculations. One step from device X is not the same as one step from device Y. Looking ahead, there should be common ground on which data and metrics we use and what we mean by them.
Use of aggregator platforms: A common thread in the discussion was that the added value of aggregator platforms lies in putting the integrated data into context. For example, steps, calories consumed, and deep sleep have to be contextualized by an aggregator platform to allow better interpretation and better use and, in the long term, a move towards predictions. Nevertheless, everybody agreed that predictions are a long shot…
Privacy concerns: There was general concern about privacy: how the data are being used, who should use them, for what, and how safe they are in the context of an aggregator platform. There was no common ground on a solution, only on the concern.
Interoperability: There was a discussion on how a datapoint reaches an aggregator platform: from the source where it is produced, to an aggregator platform, and further on. The “journey” of a datapoint is an interesting point, as it has not really been analysed so far. As we move towards an Internet of Things world, an aggregator platform has to take good care of this journey. This also depends, though, on whether and how aggregator platforms will work together.
Automation: It was a clear and rather obvious remark that, for aggregator platforms to provide the added value they can (in machine learning, for example), automated tracking and tools that support it are important. If the quality of the data depends on manual entry at the source, then the quality of the machine learning will also be low.
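
To make the ontology and “journey” points a little more concrete, here is a minimal Python sketch of what a shared datapoint record could look like. Everything in it is an assumption made up for illustration: the names `DataPoint`, `METRIC_ALIASES`, `normalize` and `forward`, the vendor field names, and the example values. It does not describe any real device or platform API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping from vendor-specific metric names to a shared vocabulary.
METRIC_ALIASES = {
    "steps": "step_count",
    "step_count": "step_count",
    "stepsTaken": "step_count",
    "deep_sleep_min": "deep_sleep_minutes",
    "deepSleep": "deep_sleep_minutes",
}


@dataclass
class DataPoint:
    metric: str            # name in the shared vocabulary
    value: float
    recorded_at: datetime
    source: str            # device or service that produced the value
    firmware: str          # firmware version of the source
    manual_entry: bool     # True if a person typed the value in
    journey: list = field(default_factory=list)  # hops the datapoint has passed through


def normalize(raw: dict, source: str, firmware: str, manual_entry: bool = False) -> DataPoint:
    """Translate one raw vendor record into the shared DataPoint form."""
    name = raw["name"]
    if name not in METRIC_ALIASES:
        raise ValueError(f"unknown metric name: {name!r}")
    return DataPoint(
        metric=METRIC_ALIASES[name],
        value=float(raw["value"]),
        recorded_at=datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc),
        source=source,
        firmware=firmware,
        manual_entry=manual_entry,
        journey=[source],
    )


def forward(point: DataPoint, platform: str) -> DataPoint:
    """Record that the datapoint has reached another platform."""
    point.journey.append(platform)
    return point


# Example with made-up values.
example = normalize(
    {"name": "stepsTaken", "value": 8421, "timestamp": 1400000000},
    source="device_x",
    firmware="1.0.1",
)
forward(example, "aggregator_a")
print(example.metric, example.journey)
```

Keeping the source, firmware version, and manual-entry flag on every record is one possible way to let a platform downstream decide how much to trust a value. It does not by itself make a step from one device equal to a step from another.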

Thanks to all the people who participated in this discussion. Please add comments or omissions to this thread, and let’s see how the discussion develops.

ejain · May 21, 2014, 7:18pm

One step from device x (with firmware v1.0.1) can be different from one step from device x (with firmware v1.0.2). The other issue with steps is that one step in the park isn’t equivalent to one step in the jungle. This makes many statistically significant findings insignificant in practice.
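
As a rough illustration of the firmware point, here is a small Python sketch that averages step counts separately for each device and firmware version instead of pooling them. The records and the function name `mean_steps_by_version` are hypothetical and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily step records: (device, firmware, steps).
records = [
    ("device_x", "1.0.1", 8000),
    ("device_x", "1.0.1", 8400),
    ("device_x", "1.0.2", 9100),
    ("device_x", "1.0.2", 9300),
]


def mean_steps_by_version(rows):
    """Average step counts separately for each (device, firmware) pair."""
    groups = defaultdict(list)
    for device, firmware, steps in rows:
        groups[(device, firmware)].append(steps)
    return {key: mean(values) for key, values in groups.items()}


by_version = mean_steps_by_version(records)
for (device, firmware), average in sorted(by_version.items()):
    print(device, firmware, average)
```

A difference between the two groups could come from the counting algorithm rather than from a change in behaviour, which is why pooling them can mislead.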

Paying attention to the privacy policies of each service might be a good start!