+1 571-297-6383 | info@sonjara.com

9 Things To Make Your Development Data More Open

Everyone funded by USAID is talking about the new Open Data Policy - and how all contracts and cooperative agreements need to make their data open by default. ACK!

But how? This is the big question. So here are nine quick things you and your project team can do to make your data more open, and therefore easier to publish, and of higher quality, when it goes into the USAID Development Data Library (DDL).

1. Have an Open Data Plan.

Watch for my next blog article "Creating an Open Data Plan" on the details of what needs to go into one. But the first step is to explicitly identify all data you will be capturing during the lifetime of the project and figure out how much of it could/should be published to the DDL and how much of that should be made public (not everything published to the DDL should be made public).

If you are mid-project, this plan should include an assessment of what you currently have and what you can reasonably do within the budget and resources you have. Which leads me to point two:

2. Start Simple - Publish What You Can.

There is always something you can publish. Even if the bigger goals will take a while or are beyond the scope of your funds, look for easy wins that give you and your team practice in open data.

And to get a handle on what that means...

3. Use and Investigate Existing Similar Open Data Sets by Other People.

There is nothing more instructive in good Open Data practices than using someone else's Open Dataset. Especially when it is not very good.

Suddenly, all the other stuff I list below seems obvious. The more familiar the folks collecting data are with what others have produced, the clearer it becomes how to approach your own. It also gives you an idea of where you can layer, aggregate, or compare your data with others' data (or even use their data instead of collecting more!).

Some places to find International Development Open Data sets to start with:

4. Structure the Data You Capture.

Structuring data is a fancy way of saying you break each part of the data set into consistent parts.

Think of an Excel spreadsheet: each column is a field, and all the information in that column has to be consistent in order for you to filter, sort, group, etc. You cannot mix 12/23/2014 with Winter 2014 because the computer is dumb and doesn't get that they are the same. Sure, we can write code to map Winter to December, but that is annoying and doesn't help those in Australia, and Winter covers three months vs. one day, which takes us to...
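To make the date example concrete, here is a minimal Python sketch of a consistency check on a single column. It assumes the column is supposed to hold MM/DD/YYYY dates; the function name and the sample values are hypothetical, made up for illustration. Anything that does not fit the pattern is flagged for a human to review rather than guessed at.

```python
from datetime import datetime


def parse_us_date(value):
    """Return an ISO 8601 date string, or None if value is not MM/DD/YYYY."""
    try:
        return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()
    except ValueError:
        return None


# Hypothetical column contents, including one inconsistent entry.
rows = ["12/23/2014", "Winter 2014", "01/05/2015"]

cleaned = [parse_us_date(v) for v in rows]
needs_review = [v for v, c in zip(rows, cleaned) if c is None]

print(cleaned)
print(needs_review)
```

The point of the sketch is that "Winter 2014" cannot be converted without someone deciding what it means, so it is better to catch it at data entry than at publication.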

5. Disaggregate Down to the Lowest Level You Want to Report On.

You can always aggregate up, but you cannot disaggregate down (at least not without making some major assumptions).

For example, if you want to report on gender diversity, you have to capture all your data by male/female from the start. If you want to display results by district within the country, then your data has to capture results at the district level to begin with.
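Here is a small Python sketch of what "aggregate up" looks like in practice. The records, district names, and the "participants" field are all invented for illustration. Because each record is captured by district and by sex, any of the higher-level totals can be computed from it. Going the other way, from a national total back to districts, is not possible without assumptions.

```python
from collections import defaultdict

# Hypothetical records captured at the lowest level: district and sex.
records = [
    {"district": "North", "sex": "female", "participants": 40},
    {"district": "North", "sex": "male", "participants": 35},
    {"district": "South", "sex": "female", "participants": 22},
    {"district": "South", "sex": "male", "participants": 30},
]


def aggregate(records, key):
    """Sum participants for each distinct value of the given key."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["participants"]
    return dict(totals)


by_district = aggregate(records, "district")
by_sex = aggregate(records, "sex")
national_total = sum(record["participants"] for record in records)

print(by_district)
print(by_sex)
print(national_total)
```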

And there is not enough disaggregated data in the world. If you can (and there are lots of reasons to be wary of capturing too much data), disaggregate as far as is feasible. And when disaggregating, it is really important to...

6. Use International Standards When Structuring Your Data.

When creating an address book, you would never create your own State abbreviations, right? Or make up your own Zip code? Or decide that this time, the State goes first, then the address line, and maybe stick the name of the person in the middle?

Of course not. We use standards that other people created and maintain. And the benefit is that when someone emails you an Excel spreadsheet with addresses in it, you can just use it as is, if they followed the standards.

Here are some common standards in international development that you should use by default unless you have an exceptionally good reason not to:

And even when you use standard formats and structures, you will reap untold rewards if you...

7. Make Data Intuitive and/or Publish a Data Dictionary.

Ever get a spreadsheet where the column headers read FN | LN | A1 | A2 | C1 | S1 | Z1? Without looking at the data, it is kind of hard to figure out what those are. A great way to make your data more open is to "name it what it is". So instead of "FN", how about "First Name" (or first_name in SQL), with the sheet title (or table name) "address_book"? A developer looking to merge or migrate the data will instantly know what that means.
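If you inherit a file with cryptic headers, renaming them before you publish is a small job. Below is a minimal Python sketch using the standard csv module. The mapping is my guess at what those abbreviations stand for (address lines, city, state, zip), and the function name and sample row are hypothetical.

```python
import csv
import io

# Assumed meanings for the cryptic headers; adjust to match your own data.
HEADER_MAP = {
    "FN": "first_name",
    "LN": "last_name",
    "A1": "address_line_1",
    "A2": "address_line_2",
    "C1": "city",
    "S1": "state",
    "Z1": "zip_code",
}


def rename_headers(csv_text, header_map):
    """Return CSV text with the header row renamed; unknown headers are kept."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [header_map.get(h.strip(), h.strip()) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical sample file with one data row.
sample = "FN,LN,A1,A2,C1,S1,Z1\nJane,Doe,1 Main St,,Springfield,VA,22150\n"
renamed = rename_headers(sample, HEADER_MAP)
print(renamed)
```

Only the header row changes; the data rows pass through untouched.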

However, some data sets handle complex information. Where intuitive naming is not enough, developing a data dictionary will be important so that developers can crosswalk from your data to their needs. "So THAT is what 'Obligating party' means. I thought it referred to the very nice host at the holiday staff party."

So that people don't misuse your data, it is also important to...

8. Publish Metadata About Your Data.

Metadata is data about your data, and should include source, limitations, time period, and other relevant information about the data. Basically, it tells the outside user how trustworthy the data is and any caveats around using it.
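As a rough illustration, metadata like this can live in a small JSON file that travels with the data set. The Python sketch below is only an example: the field names and values are made up, and it is not the DDL's required schema, so check the actual submission requirements before settling on a structure.

```python
import json

# Hypothetical metadata record; field names are illustrative, not a DDL schema.
metadata = {
    "title": "Example household survey results",
    "source": "Project field staff, paper forms entered weekly",
    "time_period": {"start": "2014-01-01", "end": "2014-12-31"},
    "limitations": [
        "Two districts were not surveyed in the rainy season",
        "Income figures are self-reported",
    ],
}

# Serialize to JSON so that other people's tools can read it.
metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
print(metadata_json)
```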

Key questions that need to be answered, ideally in a machine-readable format:

And finally,

9. Use Open Data as a Way to Improve Your Own Internal Performance.

The entire point of Open Data is to break through knowledge silos and allow raw data to be used in thousands more innovative ways to bring fresh insight to our work. So plan for the benefits from the beginning.

Think about:

There is one thing we know about data - once you have it, you won't want to lose it. So by building a culture of data usage, you can foster support for good open data practices.


Want some help putting these tips into practice? Call Sonjara at 571-297-6383 or email info@sonjara.com. We offer one hour of pro bono consulting on Open Data and USAID.
