Looking for a solution on how to befriend any data source with any data consumer

Hello :wave:

This topic is my attempt to find a way to befriend any data source with any data consumer. What cannot be found can be created, and in that case I want to find like-minded people and make it happen.

The problem :thinking:

There are thousands of apps and services we use to collect and work with our personal data. For the sake of generalization, let’s call them data sources and consumers, or DSC for short. We wish DSCs were interoperable, so that data collected in one place could be put to use in another. For example, we collect our sleep data in Fitbit and later view an analytical report in the Sleep As Android app. Or a more sophisticated example: we collect our sleep data in Fitbit, Sleep As Android and the Oura Ring app, a 3rd party service merges these 3 sources into a single, more accurate one, and the result goes to another service which provides AI-powered insights.

To make this possible, every DSC has to use unified data structures and implement a unified API. But there is no standard and there cannot be one, because every knowledge domain is unique, so standardization is a constant process which must be led by industry professionals. I have seen just one attempt, which is Open mHealth. I don’t know what happened to it, but it doesn’t look alive.

Unified data structures and a unified API together make our personal data behave like a liquid which is easy to stream, filter and transform. Liquid data lets us move it anywhere and take care of it the way we like, from public access to very strict, granular access control. This way we can really make the most of our data.
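To make "unified data structures" a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `SleepRecord` shape, the two tracker names and both raw payload layouts are hypothetical and are not the real export formats of any service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SleepRecord:
    """Hypothetical unified structure for one sleep session."""
    start: datetime
    end: datetime
    source: str

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def from_tracker_a(raw: dict) -> SleepRecord:
    # Assumed layout: ISO 8601 strings with an explicit UTC offset.
    return SleepRecord(
        start=datetime.fromisoformat(raw["startTime"]),
        end=datetime.fromisoformat(raw["endTime"]),
        source="tracker_a",
    )


def from_tracker_b(raw: dict) -> SleepRecord:
    # Assumed layout: Unix start timestamp in seconds plus minutes asleep.
    start = datetime.fromtimestamp(raw["begin"], tz=timezone.utc)
    return SleepRecord(
        start=start,
        end=start + timedelta(minutes=raw["minutes"]),
        source="tracker_b",
    )
```

Once both sources produce the same `SleepRecord`, any consumer can work with either of them without knowing where the data came from.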

I’m a QS enthusiast and a software engineer who likes to design nice UX. Sometimes I have ideas that combine all my passions and that I’m eager to implement, but then I struggle with developing an enormous number of integrations with other services, plus the data converters needed to unify them. Hundreds of engineers have done this kind of job a hundred times, and I can’t reuse their results just because their work is private. Or it’s open source but we use different programming languages. Or there is no documentation. Or documentation exists but I have a single life and no time to read it all.

It would be great if I could focus only on developing a sleep analysis algorithm, and after release it would be instantly available to the mass of users. Plus, during development I wouldn’t have to think about sleep data structures or about which data source to experiment with; I would just use publicly available sleep data streams from data donors. What a wonderful world I picture, huh? I’ve been thinking a lot about how to make it come true, and here I present my idea.

The idea :bulb:

I call it the concept of a number of bridges and a single hub. In a simplified form, the solution looks like this:

Legend

:blue_car: DSC: data sources and data consumers, e.g. CSV files, Exist, Google Fit, MongoDB, Grafana, JSON files, Zapier, Apple Health, Dropbox, Open Humans, ActivityWatch, ZenoBase, Amazon S3, 23andMe, Fitbit, Tableau, or anything else
:house: Hub: a program which makes sure that all bridges are valid and speak the same language (use known data structures)
:bridge_at_night: Bridge: a small program which connects a DSC to Hub

Here is an example of a more sophisticated setup:

Now that you see the big picture, let’s dig into the details.

Hub

Hub is actually a web server which orchestrates the interaction between Bridges. It does things like:

  • Manage integrations (which are inside bridges)
  • Broadcast events to all bridges
  • Redistribute read and write operations between bridges
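As a rough illustration of these three responsibilities, here is a minimal in-memory sketch in Python. It is not the actual design: the network layer of the web server is left out, and all names (`Integration`, `Hub.connect`, `Hub.broadcast` and so on) as well as the idea of routing by a `data_type` string are my assumptions for the sake of the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Integration:
    """What one Bridge exposes to the Hub. Every operation is optional."""
    data_type: str
    read: Optional[Callable[[], List[Any]]] = None
    write: Optional[Callable[[Any], None]] = None
    on_event: Optional[Callable[[str, Any], None]] = None


class Hub:
    """In-memory stand-in for the Hub, without the web server part."""

    def __init__(self) -> None:
        self._integrations: Dict[str, Integration] = {}

    # Manage integrations
    def connect(self, bridge_name: str, integration: Integration) -> None:
        self._integrations[bridge_name] = integration

    def disconnect(self, bridge_name: str) -> None:
        self._integrations.pop(bridge_name, None)

    # Broadcast events to all bridges that listen
    def broadcast(self, event: str, payload: Any) -> None:
        for integration in self._integrations.values():
            if integration.on_event is not None:
                integration.on_event(event, payload)

    # Redistribute reads: collect records from every matching reader
    def read(self, data_type: str) -> List[Any]:
        records: List[Any] = []
        for integration in self._integrations.values():
            if integration.data_type == data_type and integration.read:
                records.extend(integration.read())
        return records

    # Redistribute writes: forward the record to every matching writer
    def write(self, data_type: str, record: Any) -> int:
        delivered = 0
        for integration in self._integrations.values():
            if integration.data_type == data_type and integration.write:
                integration.write(record)
                delivered += 1
        return delivered


hub = Hub()
stored: List[Any] = []
events: List[str] = []
hub.connect("tracker", Integration("sleep", read=lambda: [{"hours": 8}]))
hub.connect("archive", Integration(
    "sleep", write=stored.append, on_event=lambda e, p: events.append(e)))

for record in hub.read("sleep"):
    hub.write("sleep", record)
hub.broadcast("sync_finished", None)
```

Here the "tracker" bridge only reads and the "archive" bridge only writes and listens, so the Hub is the only thing that knows about both.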

Hub Client

Hub Client provides an API for Hub in your favorite programming language. It’s an essential part of the system because bridges are going to be implemented by the community, where people are free to use any programming language to write their own bridges. So a JavaScript Hub Client will be useful for people who develop for Node.js and the Web, while a Go Hub Client will make golang developers happy.

The API includes:

  • Connect integration to Hub
  • Read from Hub
  • Write to Hub
  • Listen to events from Hub
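The four items above could map onto an interface roughly like the following Python sketch. The method names, the signatures and the "written" event are hypothetical. A real client would talk to the Hub over the network, while the `InMemoryHubClient` here is only a fake that lets a Bridge author test without a running Hub.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

EventHandler = Callable[[str, dict], None]


class HubClient(ABC):
    """Hypothetical client interface, one method per API item."""

    @abstractmethod
    def connect(self, integration_name: str) -> None: ...

    @abstractmethod
    def read(self, data_type: str) -> List[dict]: ...

    @abstractmethod
    def write(self, data_type: str, record: dict) -> None: ...

    @abstractmethod
    def listen(self, handler: EventHandler) -> None: ...


class InMemoryHubClient(HubClient):
    """Fake without any network, handy for testing a Bridge in isolation."""

    def __init__(self) -> None:
        self.connected: List[str] = []
        self._store: Dict[str, List[dict]] = {}
        self._handlers: List[EventHandler] = []

    def connect(self, integration_name: str) -> None:
        self.connected.append(integration_name)

    def read(self, data_type: str) -> List[dict]:
        return list(self._store.get(data_type, []))

    def write(self, data_type: str, record: dict) -> None:
        self._store.setdefault(data_type, []).append(record)
        # Assumption: the Hub emits an event after every write.
        for handler in self._handlers:
            handler("written", {"data_type": data_type, "record": record})

    def listen(self, handler: EventHandler) -> None:
        self._handlers.append(handler)


client = InMemoryHubClient()
client.connect("sleep-tracker-bridge")
seen: List[str] = []
client.listen(lambda event, payload: seen.append(event))
client.write("sleep", {"hours": 7.5})
```

Keeping the interface this small is what would make it realistic to port the client to many languages.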

Bridge

A variety of Bridges is going to be created by our community members and DSC providers. A Bridge is a small program written in your favorite programming language which implements either an integration with a DSC, or an application of a DSC, or both. The difference between integration and application is major.

The integration implements operations:

  • Read from DSC
  • Write to DSC
  • Emit events from DSC

Each operation is optional and is implemented only if the Bridge’s functionality calls for it. An operation can be public, which means it’s accessible to Hub and therefore to other Bridges. Or it can be private, which makes sense only when we’re gonna use this operation in the application part.

The application part is the place for business logic which does useful work between DSC and Hub.
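The split between public and private operations, and between integration and application, might look like this in Python. This is a sketch under my own assumptions: `Operation`, `Bridge`, `exposed_to_hub` and the in-memory `rows` list standing in for a DSC are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Operation:
    run: Callable[..., Any]
    public: bool = True  # public operations are visible to the Hub


@dataclass
class Bridge:
    name: str
    operations: Dict[str, Operation] = field(default_factory=dict)

    def exposed_to_hub(self) -> Dict[str, Callable[..., Any]]:
        """Only public operations reach the Hub and thus other Bridges."""
        return {n: op.run for n, op in self.operations.items() if op.public}


# A list standing in for a DSC, e.g. the rows of a CSV file.
rows: List[str] = ["a", "b", "a"]


def read_rows() -> List[str]:
    return list(rows)


def write_rows(new_rows: List[str]) -> None:
    rows[:] = new_rows


bridge = Bridge("csv-file", {
    "read": Operation(read_rows, public=True),
    "write": Operation(write_rows, public=False),  # private
})


def deduplicate(b: Bridge) -> None:
    """Application part: business logic that may use private operations."""
    unique = list(dict.fromkeys(b.operations["read"].run()))
    b.operations["write"].run(unique)
```

Other Bridges can read the rows through the Hub, but only the Bridge’s own application part (`deduplicate`) is allowed to write them back.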

Example of a setup which imports your Google Photos from the cloud to your PC file system:
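In code, such a setup could boil down to something like the Python sketch below. It does not call any real photo service: `FAKE_CLOUD` is a stand-in dictionary, and the function names are hypothetical. The only point is to show one Bridge with a public read, another Bridge with a public write, and the Hub routing records between them.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple

# Stand-in for the cloud photo library; a real Bridge would call the service.
FAKE_CLOUD: Dict[str, bytes] = {
    "beach.jpg": b"\xff\xd8beach",
    "cat.jpg": b"\xff\xd8cat",
}

Photo = Tuple[str, bytes]


def read_photos() -> Iterator[Photo]:
    """Source Bridge: public read operation."""
    yield from FAKE_CLOUD.items()


def make_file_writer(target_dir: Path) -> Callable[[str, bytes], None]:
    """File system Bridge: public write operation bound to a directory."""
    def write_photo(name: str, content: bytes) -> None:
        target_dir.mkdir(parents=True, exist_ok=True)
        (target_dir / name).write_bytes(content)
    return write_photo


def import_photos(read: Callable[[], Iterator[Photo]],
                  write: Callable[[str, bytes], None]) -> int:
    """What the Hub would do: route every record from reader to writer."""
    count = 0
    for name, content in read():
        write(name, content)
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "photos"
    imported = import_photos(read_photos, make_file_writer(target))
    saved: List[str] = sorted(p.name for p in target.iterdir())
```

Swapping the file system Bridge for, say, a Dropbox one would only mean passing a different `write` function.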

The number of Bridges between a single DSC and Hub is unlimited. So we as end users will be able to add new Bridges or replace old ones when a better option comes out.

We’re gonna have a special place where authors can publish their Bridges and track feedback on them, so it’ll be easy for end users to pull Bridges and customize their setups, and for developers it will be a good place to monitor what’s in demand and acquire the first users for their Bridge-powered products.

What do you think? :woman_shrugging:t4:

Pardon the long article. To keep it from getting even longer, I have purposely omitted the details of the technical implementation. You are very welcome to ask me for more examples of Bridge structures and specific setups. I can describe the details of the technical implementation in a separate document if you ask for it. At this point, I want to get feedback from you on the idea. I have probably missed a lot in the explanation and in the idea itself, so please ask questions and criticize.