
Creating an Open Data Plan for USAID Projects

TL;DR Summary: Top-level steps to make sure your project has open data ready to be submitted to the USAID Development Data Library (DDL) in compliance with ADS 579. Remember, ALL projects and cooperative agreements will need to submit the data they capture to the DDL during the period of performance of their award.

Introduction

As promised, here is an overview of the steps needed to create an open data plan in order to be compliant with USAID's new Open Data Policy. It is really easy to get lost in the weeds on this stuff, so I am only outlining the top-level steps; please note that every step may involve many questions, decision points, and additional tasks.

And of course, as this is a brand new policy with many questions still open, the following is a set of suggested steps based on my current understanding of USAID's requirements. You can ask specific questions of USAID at https://opendata.stackexchange.com/questions/ask?tags=USAIDopen.

During the Proposal Stage

  1. Budget: Make sure your budget includes (either implicitly or explicitly) the time, effort and expertise to create and implement an open data plan, as well as the IT systems required to generate and store open data and submit it to the Development Data Library. You may have to choose between an API and manual upload (which I will describe in an upcoming article).

  2. Legal Requirements: If the proposal involves collecting or generating data from host governments or other partners, make sure that the Memorandum of Understanding (MOU) that you sign clearly outlines which data will be shared with USAID and what data needs to be protected (recognizing that many governments have their own laws and policies about data related to their citizens).

  3. Security/Privacy: You may also want to think about any other privacy or security issues related to the data you plan to collect, and think through the impact of intellectual property protection or issues related to working in conflict zones.

  4. Reuse/Recycle: In some cases, the data you need may already be out there (such as on http://usaid.gov/data, http://data.gov, http://aiddata.org or http://data.worldbank.org). More and more governments are publishing their data, such as the Ghana Open Data Initiative (http://data.gov.gh). You may be able to save a ton of time and effort by using someone else’s dataset instead of collecting the data yourself!

During Negotiations/Project Kick Off

  1. Open Data Lead: Identify someone on the implementing partner team responsible for open data in the project (usually someone involved in data collection such as the M&E person or the technology person).

  2. ID All Data: Write down ALL the data that is being collected by the project (regardless of whether it will be submitted or not). Examples include:
    1. Research data (results of household surveys, interviews with stakeholders)
    2. M&E data
    3. Project activity information (such as trainings held, locations, participants, agenda)
    4. Maps generated using GIS

  3. Prioritize: Identify which data constitutes “intellectual work” and is therefore high priority/important to share, vs. which is “incidental to award implementation”. This determination matters because it decides which datasets need to be submitted and which do not.

  4. Validate: Share that list with the COR/AOR in order to make sure s/he is in agreement. Once you have this agreement, you will probably have to come back with a more detailed plan, but it is a good idea to get at least a basic agreement at a very early stage on what will (and won’t) be shared, and to raise any concerns you have about budget, legal/security/privacy issues, and so on.

  5. Plan: Make sure that submission of the data is part of your overall work plan so it doesn’t turn into an afterthought. Generating and submitting open data is a heck of a lot easier (and cheaper) if planned from the very start than if you try to do it retroactively during the close out of a project, five years in.
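
For those who like to see things concretely, steps 2 and 3 can be sketched as a simple inventory. This is only an illustration: the class, the field names, and the classification given to each example dataset are my own assumptions, not USAID determinations, and the real decision belongs to you and your COR/AOR.

```python
from dataclasses import dataclass

# The two categories named in step 3. The constant names are hypothetical.
INTELLECTUAL_WORK = "intellectual work"
INCIDENTAL = "incidental to award implementation"


@dataclass
class InventoryItem:
    name: str            # what the data is
    kind: str            # e.g. research, M&E, activity, GIS
    classification: str  # one of the two categories above
    concerns: str = ""   # budget, legal, security or privacy notes for the COR/AOR


def submission_candidates(inventory):
    """Return the items classified as intellectual work (likely DDL submissions)."""
    return [item for item in inventory if item.classification == INTELLECTUAL_WORK]


# Example entries only; the classifications here are illustrative guesses.
inventory = [
    InventoryItem("Household survey results", "research", INTELLECTUAL_WORK,
                  concerns="contains personally identifiable information"),
    InventoryItem("Indicator tracking table", "M&E", INTELLECTUAL_WORK),
    InventoryItem("Training agendas and sign-in sheets", "activity", INCIDENTAL),
    InventoryItem("District boundary maps", "GIS", INTELLECTUAL_WORK),
]

for item in submission_candidates(inventory):
    note = f" (flag for COR/AOR: {item.concerns})" if item.concerns else ""
    print(f"{item.name} [{item.kind}]{note}")
```

Keeping the incidental items in the same list, rather than deleting them, means the full inventory from step 2 is still there when you share it with the COR/AOR in step 4.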

Remember, the Open Data policy is new to USAID as well as to the implementing partner community, so there are still many questions to be answered and some things that will still be trial and error in these early days.

During Data Design Phase (early!)

Once you have your list of datasets you are planning on sharing with USAID’s DDL, you will need to identify or create the following items for each dataset. Most of this information is currently required as part of the DDL submission, but it is also good practice for any dataset you are going to be capturing, even if you don’t know whether it will go to the DDL.

Summary Information

  1. Title and description of the data (such as “results of household surveys of attitudes towards climate change – raw and aggregated”) including its purpose (you would be surprised at the number of datasets that don’t have this basic information).
  2. Relevant dates of the dataset (“captured March 2014”, “updated February 2015”)
  3. How is it going to be captured and where is it going to be housed?

Dataset Details

  1. Structure/data dictionary used (especially if you are using an international standard like IATI)
  2. Data quality approach (especially making sure it meets USAID data quality standards – ADS 203)
  3. Privacy and security issues (e.g. does the dataset involve human research subjects, does the raw data contain national ID numbers, are there potential security issues due to a conflict zone) and the plan for protection
  4. Other proprietary information that may need special permission to share (such as covered under copyright and IP)
  5. Other resources and links to other documents such as those in the DEC or websites that may be related to this dataset
  6. Possible uses of the data beyond the project (for international development, local partners, USAID, and/or for your organization)

Submission Information

  1. Name and contact info of the person submitting data
  2. Name of the prime organization
  3. Mechanism information (award number, operating unit, COR/AOR, contact info)
  4. Proposed access level for the data (public, restricted public, non-public, or other), the reason why, and any other restrictions, such as embargo dates
  5. Publication plan/schedule (available on demand or submitted to the DDL on a periodic basis), and then the details (URL, API instructions, etc.)

Classification

  1. Program code (the foreign assistance categories), cross cutting themes, initiatives
  2. Keywords
  3. Language of the data
  4. Country or region it applies to
  5. The overarching program it belongs to
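
One way to keep track of all this per dataset is a simple record with a "what is still missing" check. The sketch below is an assumption on my part, not the DDL's actual submission schema: the field names are made up, and it covers only a subset of the items listed above.

```python
from dataclasses import dataclass, field, fields
from typing import List

# Access levels as listed under Submission Information.
ACCESS_LEVELS = {"public", "restricted public", "non-public", "other"}


@dataclass
class DatasetRecord:
    # Field names are hypothetical; they mirror the checklist, not a DDL format.
    # Summary information
    title: str = ""
    description: str = ""
    relevant_dates: List[str] = field(default_factory=list)
    capture_and_storage: str = ""
    # Dataset details
    data_dictionary: str = ""
    data_quality_approach: str = ""
    privacy_security_notes: str = ""
    # Submission information
    submitter_contact: str = ""
    prime_organization: str = ""
    award_number: str = ""
    access_level: str = ""
    publication_schedule: str = ""
    # Classification
    program_code: str = ""
    keywords: List[str] = field(default_factory=list)
    language: str = ""
    country_or_region: str = ""


def open_items(record):
    """List the checklist items that are still empty or invalid for this dataset."""
    problems = [f.name for f in fields(record) if not getattr(record, f.name)]
    if record.access_level and record.access_level not in ACCESS_LEVELS:
        problems.append("access_level")
    return problems


record = DatasetRecord(
    title="Household survey of attitudes towards climate change",
    description="Raw and aggregated survey results",
    relevant_dates=["captured March 2014", "updated February 2015"],
    access_level="public",
    language="English",
)
print(open_items(record))  # everything not yet filled in for this dataset
```

Filling in one of these at the design stage, and re-running the check before each submission, is a cheap way to avoid discovering gaps at close out.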

Notes on Submission to the USAID DDL

Submitting and sharing your data is not an all or nothing situation. It is a good idea to identify the items above for ALL data to be captured by your project, even if only part of it is eligible to be shared with USAID’s DDL. This is because you may find that you want to use some of this data for internal performance improvement or project analysis.

You can always decide to submit only a subset of the data, such as aggregated information rather than the raw data, especially where personally identifiable information is part of the raw data. You can also submit multiple datasets, one for public access and one for restricted or non-public access. That way some of the data can be reused by others in the development community, and USAID staff get a fuller dataset, while privacy is protected.
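
As a rough sketch of what "a subset" can look like in practice, here is one way to derive two versions from the same raw data: a row-level version with direct identifiers removed, and an aggregated version. The rows, field names, and function names are all made up for illustration, and dropping a couple of columns is not by itself a guarantee of anonymity (small counts per district, for example, can still identify people), so treat this as a starting point rather than a privacy method.

```python
from collections import defaultdict

# Hypothetical raw survey rows; every value here is invented for the example.
raw_rows = [
    {"national_id": "A-001", "name": "Respondent 1", "district": "North", "concerned": True},
    {"national_id": "A-002", "name": "Respondent 2", "district": "North", "concerned": False},
    {"national_id": "A-003", "name": "Respondent 3", "district": "South", "concerned": True},
]

# Columns treated as personally identifiable in this example.
PII_FIELDS = {"national_id", "name"}


def strip_pii(rows, pii_fields=PII_FIELDS):
    """Return row-level data with the direct identifier columns removed."""
    return [{k: v for k, v in row.items() if k not in pii_fields} for row in rows]


def aggregate_by_district(rows):
    """Return one summary row per district: respondents and share who are concerned."""
    totals = defaultdict(lambda: {"respondents": 0, "concerned": 0})
    for row in rows:
        bucket = totals[row["district"]]
        bucket["respondents"] += 1
        bucket["concerned"] += int(row["concerned"])
    return [
        {"district": district,
         "respondents": t["respondents"],
         "share_concerned": t["concerned"] / t["respondents"]}
        for district, t in sorted(totals.items())
    ]


row_level = strip_pii(raw_rows)                # candidate for restricted access
aggregated = aggregate_by_district(raw_rows)   # candidate for public access
print(aggregated)
```

Which version goes under which access level is a decision to make with your COR/AOR; the point is simply that both can come from the same collection effort.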

