Responsible Data: Platform vs Data

As most people who know me are well aware, I am a big data person: I love data, and I get a bit fierce about the need to protect it. Andy, our CTO and lead software architect, always winces when I say that software is disposable but data is not. He understands, however, that the purpose of any software platform is to collect, store, manage, and present data for use, so while the platform is very important, the asset being generated and used is the data.

That is why the conflation of platform and data annoys me: it often leads people to focus on the wrong problem (which platform to select) and ignore the real asset being generated (the data).

Data as an Asset

Interoperable, standardized time-series data is among the most valuable assets we can have in ICT4D, but it is too often undervalued or cannot be migrated off a particular platform, even though data science offers multiple methods for making data live beyond its platform.

Data needs to live beyond & outside of the platform

See, data should live beyond your platform. Due to technology changes and increasing innovation, we maybe have 5 to 10 years with any one platform; however, the data collected 5 to 10 years ago should be migratable to a new platform. Not only that, but data is a multiplier – when I can combine my data with yours, we both win. If I can also then layer in demographic or geographic data, we get a level of analytic ability never before seen. But we can only do this combination if our data is independent of platform.
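To make the multiplier idea concrete, here is a minimal sketch of what platform-independent data allows. Everything in it is an assumption for illustration: the two "platforms", the column names, the district codes, and the numbers are all hypothetical. The only point is that once each system can export plain CSV keyed on a shared district code and month, combining the exports and layering in demographic data takes a few lines, whatever software produced them.

```python
import csv
import io

# Hypothetical exports from two different platforms, both as plain CSV
# keyed on a shared district code and an ISO 8601 month.
PLATFORM_A_CSV = """district_code,month,clinic_visits
D01,2015-01,120
D02,2015-01,85
"""

PLATFORM_B_CSV = """district_code,month,sms_reminders_sent
D01,2015-01,300
D02,2015-01,150
"""

# Hypothetical demographic layer from a third source.
POPULATION_CSV = """district_code,population
D01,10000
D02,5000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(a_text, b_text, population_text):
    """Join the two exports on (district, month) and add a per-capita rate."""
    population = {
        r["district_code"]: int(r["population"])
        for r in read_rows(population_text)
    }
    b_by_key = {
        (r["district_code"], r["month"]): r for r in read_rows(b_text)
    }
    combined = []
    for row in read_rows(a_text):
        key = (row["district_code"], row["month"])
        other = b_by_key.get(key)
        # Skip records that cannot be matched in every source.
        if other is None or key[0] not in population:
            continue
        visits = int(row["clinic_visits"])
        combined.append({
            "district_code": key[0],
            "month": key[1],
            "clinic_visits": visits,
            "sms_reminders_sent": int(other["sms_reminders_sent"]),
            "visits_per_1000": 1000 * visits / population[key[0]],
        })
    return combined


rows = combine(PLATFORM_A_CSV, PLATFORM_B_CSV, POPULATION_CSV)
for row in rows:
    print(row)
```

None of this depends on either platform still being in service. It depends only on the shared keys and the open format, which is exactly what is lost when data is locked inside one system.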

Think of it this way – how many of us have photos and Word documents dating back 15 to 20 years? Can you access them? Read them, edit, publish and share them on different platforms? Of course you can. But the various pieces of software you now use to access these items are NOT the same ones you used 15 to 20 years ago – by design. We need to apply these same standards to ALL of our data.

Platforms are a means to generate, house, and use good data

Let’s focus on demanding platforms that generate, store, combine, protect, and use this data in ways optimized for specific needs, and whose data can be migrated into new platforms and systems, rather than on which piece of software is better.

Once we agree to separate platform from data, we have more opportunities for innovation in the ICT4D space – you can use your open source platform, or your proprietary code, or your crowdsourced hackathon tool, AS LONG AS THE DATA IT GENERATES, STORES, AND PRESENTS follows standards for quality, interoperability, reuse, access, publication, and security/privacy protections.

There are no "silver bullet, one-size-fits-all" platforms

Separating platform from data also allows us to stop expecting the “perfect platform” that will do ALL THE THINGS for ALL THE PEOPLE, and instead to create different platforms optimized for specific tasks. As long as the data can interoperate and migrate, platforms can live and die according to their usefulness; the data will continue to live on.

Keep focus on data quality and usage, not specific platforms

Governments, donors and other stakeholders can focus on making sure the data meets quality expectations without having to get into the business of selecting winners and losers; they can allow competition to work effectively to select inputs and tools, while requiring the outputs of the ICT4D intervention to meet standards.

So the next time someone asks you which software should be used, the answer really should be “the one that is optimized for the specific problem and audience, and where the data can be reused after the platform is retired”.

