15 Lesson 4: Sharing Open Data
15.2 Overview
In this lesson, you learn about the practice of sharing your data. The discussion starts with a review of the sharing process and how to evaluate whether your data are sharable. Next, you look at how to ensure your data are accessible, with a closer look at repositories and the lifecycle of data accessibility, from selecting a repository to maintaining and archiving your data. The lesson then discusses some steps to make the data as reusable as possible, and concludes with a section on who will help with the data sharing process.
15.3 Learning Objectives
After completing this lesson, you should be able to:
- Recognize institutional variables, issues of security, and timing that affect your decision to share data.
- Recall the features, inherent responsibilities, funding considerations, and sponsor requirements that researchers should consider when selecting a repository to share data.
- Describe the tools and list some best practices that optimize the shareability of data.
15.4 Data Sharing Process Overview
Sharing data is a critical part of increasing reproducibility of results. Whether it’s new data we collect ourselves or data that we process in order to do our analysis, we end up sharing some form of data. We need to think about what data we will share and how to best ensure that it will be open and usable by others.
Data sharing should typically be done through a long-term data center or repository, which will be responsible for ingesting, curating, and distributing/publishing your open data. You are responsible for providing information/metadata to help make your data readily discoverable, accessible, and citable. The cost of archiving and publishing data should also be considered.
15.4.2 Open Data Sharing Process
In general, sharing your open data requires the following steps:
- Make sure your data can be shared
- Select or identify a repository to host your data
- Work with your repository to follow their process and meet their requirements
- Make sure your data is findable and accessible through the repository and is maintained and archived
- Request a DOI for your data set so that it is easily citable
- Choose a data license
Sometimes, you may be able to work with a well-staffed repository that will handle many of these steps for you. Otherwise, it is your responsibility to follow the above steps to share your data openly.
15.7 How to Enable Reuse of Data
15.7.1 Obtaining a DOI
Individuals cannot typically request a DOI (digital object identifier) themselves but rather have to go through an authorized organization that can submit the request, such as:
- The data repository
- Your organization
- The publisher (if the data set is part of a publication)
Data makers should provide summary information for DOI landing page(s) if required. Data sharers should accommodate data providers’ suggestions, comply with DOI guidelines, and create the landing page(s). If possible, reserve a DOI ahead of creating your data.
15.7.2 Ensuring Findability
Repositories handle the sharing, distribution, and curation of data. Additional services they may provide include:
- The assignment of a persistent identifier (like a DOI) to your data set
- The indexing and/or registration of your data and metadata in various services so that they can be searched and found online (e.g., through search engines).
- The provision of feedback to data makers to help them optimize their metadata for findability.
- Coordinating with data makers to ensure metadata refers to the DOI.
- Ensuring the DOI is associated with a landing page with information about your data.
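As a rough illustration of the last point, the sketch below turns a DOI into its resolver URL and, optionally, follows it to see where it lands. The DOI `10.1234/example` and all function names are hypothetical placeholders, not part of any repository's tooling. The only external fact assumed is that DOIs resolve through `https://doi.org/`. The pattern is deliberately loose, and some landing pages may not answer `HEAD` requests.

```python
import re
import urllib.request

# Loose pattern for a DOI: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def normalize_doi(raw):
    """Strip common prefixes and return the bare DOI, or raise ValueError."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi

def doi_url(raw):
    """Return the resolver URL that should lead to the landing page."""
    return "https://doi.org/" + normalize_doi(raw)

def landing_page(raw, timeout=10):
    """Follow the DOI redirect and return the final URL (needs network access)."""
    request = urllib.request.Request(doi_url(raw), method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()

# Placeholder DOI, for illustration only.
print(doi_url("doi:10.1234/example"))  # https://doi.org/10.1234/example
```

Checking that the metadata and the citation statement use this same resolver URL is a simple way to confirm that they all point to one landing page.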
15.7.3 Making it Easy to Cite Your Data
The goal is to make it easy to cite your data. Best practices include:
- Include a citation statement that includes your DOI.
- Different repositories and journals have different standards for how to cite data. If your repository encourages it, include a Citation File Format file (CITATION.cff) with your data that explains how to cite it.
- Clearly identify the data creators and/or their institution in your citation. This allows users to follow up with the creators if they have questions or discover issues.
- Include the ORCID iDs of the data authors in the citation where possible.
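The practices above can be sketched in a few lines of Python. This is a minimal illustration, not a repository's required format: the author, title, repository name, DOI, and ORCID iD are all made-up placeholders, and the citation layout is just one common pattern. The second function writes a minimal file using key names from the Citation File Format 1.2.0 schema. Check your repository's guidance before relying on either output.

```python
import json

def citation_statement(authors, year, title, repository, doi):
    """Build a one-line citation that ends with the DOI as a resolvable URL."""
    names = "; ".join(f"{a['family']}, {a['given']}" for a in authors)
    return f"{names} ({year}). {title} [Data set]. {repository}. https://doi.org/{doi}"

def citation_cff(authors, title, doi):
    """Return the text of a minimal CITATION.cff file for a data set."""
    q = json.dumps  # double-quoted JSON strings are also valid YAML scalars
    lines = [
        "cff-version: 1.2.0",
        f"message: {q('If you use this data set, please cite it as below.')}",
        "type: dataset",
        f"title: {q(title)}",
        f"doi: {q(doi)}",
        "authors:",
    ]
    for a in authors:
        lines.append(f"  - family-names: {q(a['family'])}")
        lines.append(f"    given-names: {q(a['given'])}")
        if a.get("orcid"):
            lines.append(f"    orcid: {q('https://orcid.org/' + a['orcid'])}")
    return "\n".join(lines) + "\n"

# All values below are hypothetical placeholders.
authors = [{"family": "Doe", "given": "Jane", "orcid": "0000-0000-0000-0000"}]
print(citation_statement(authors, 2024, "Example Ocean Temperature Data Set",
                         "Example Repository", "10.1234/example"))
print(citation_cff(authors, "Example Ocean Temperature Data Set", "10.1234/example"))
```

Generating both outputs from the same author list keeps the names, ORCID iDs, and DOI consistent between the human-readable statement and the machine-readable file.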
Now that your data are at a repository and have a citation statement and DOI, publicize them to your users and remind them to cite your data in their work!
15.8 Who is Responsible for Sharing Data
Sharing data openly is a team effort. An important part of planning for open data is planning and agreeing to roles and responsibilities of who will ensure implementation of the plan.
So what needs to be done? Documenting these roles and responsibilities in your Data Management Plan will help your team stay organized and do science faster! A well-written, detailed plan should include:
15.8.1 Who Will Move Data to a Repository
Once you are ready to send your data to your repository, find the repository’s recommendations for uploading data. Determine who will work with your repository to accomplish the following types of activities:
- Provide information on data volume, number of files, and nature (e.g., revised files)
- Check that the file names follow best practices
- Decide how the data will be moved (especially when files are large)
- Check the data! Verify the integrity of the data, metadata, and documentation transfer
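One common way to approach the first and last activities is a checksum manifest: record the size and a hash of every file before the transfer, then compare against the same record built from what the repository received. The sketch below assumes SHA-256 and uses two temporary folders as stand-ins for the local copy and the repository copy. The function names are illustrative, and your repository may prescribe its own checksum algorithm or manifest format.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's relative path to its size in bytes and SHA-256 hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): {"bytes": p.stat().st_size, "sha256": sha256_of(p)}
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def compare_manifests(sent, received):
    """List files that are missing, altered, or unexpected after a transfer."""
    problems = []
    for name, info in sent.items():
        if name not in received:
            problems.append(f"missing: {name}")
        elif received[name]["sha256"] != info["sha256"]:
            problems.append(f"checksum mismatch: {name}")
    problems.extend(f"unexpected: {name}" for name in received if name not in sent)
    return problems

# Demo: two throwaway folders stand in for the local and repository copies.
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    for folder in (local, remote):
        Path(folder, "README.txt").write_text("example documentation\n")
        Path(folder, "data.csv").write_text("time,value\n0,1.5\n")
    Path(remote, "data.csv").write_text("time,value\n0,1.6\n")  # simulate a damaged file
    sent, received = build_manifest(local), build_manifest(remote)
    print(len(sent), "files,", sum(f["bytes"] for f in sent.values()), "bytes")
    print(compare_manifests(sent, received))  # ['checksum mismatch: data.csv']
```

The manifest also gives you the file count and total volume to report to the repository before the transfer starts.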
15.8.2 Who Will Develop the Data Documentation and Metadata
Determine who will work with your repository to inventory the transferred data, metadata, and documentation. This role might include the task of populating any required metadata in databases to make the data findable.
You may be able to accomplish some of these tasks through a repository’s interface. However, some types of repositories may require you to interact with their administration teams. For this role, determine who will:
- Provide suggestions to organize data content and logistics
- Develop the metadata
- Develop the documentation (e.g., README file or report)
- Extract metadata from data files, metadata files (if applicable), and documentation to populate the metadata database and request additional metadata as necessary
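One small part of this role, deciding when to request additional metadata, can be sketched as a completeness check on a metadata record. The field names in `REQUIRED_FIELDS` are assumptions chosen for illustration. Your repository's metadata schema defines the real list.

```python
# Hypothetical required fields; replace with your repository's schema.
REQUIRED_FIELDS = ("title", "creators", "description", "license", "identifier")

def is_empty(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in required if is_empty(record.get(field))]

# Placeholder record, for illustration only.
record = {
    "title": "Example Data Set",
    "creators": [],
    "description": "  ",
    "license": "CC-BY-4.0",
}
print(missing_fields(record))  # ['creators', 'description', 'identifier']
```

Running a check like this before submission gives the person in this role a concrete list to send back to the data makers.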
15.8.3 Who Will Help With Data Reuse
Once the repository has made your data available, someone from your team must test access to the data (its accessibility) and distribution methods (its findability). If possible, identify who will work with your repository to optimize/modify tools for intuitive human access and standardize machine access. This role requires someone to:
- Clearly communicate the open protocols needed for the data/metadata.
- Provide actual data use cases to the data publisher to optimize/modify data distribution tools based on available metadata.
- Understand the access protocol(s) and evaluate implications to targeted communities and user communities at large in terms of accessibility.
15.8.4 Who Will Develop Guidance on Privacy and Cultural Sensitivity of Data
Sharing data should be respectful of the communities that may be involved. This means thinking about privacy issues and cultural sensitivities. Who on your team will identify and develop guidance on:
- Privacy concerns and approval processes for release (for example, is the data appropriately anonymized?).
- How to engage with communities that data may be about.
- How data can be correctly interpreted.
- Any data restrictions that may be necessary to ensure the sharing is respectful of the community the data involves, e.g., collective and individual rights to free, prior, and informed consent in the collection and use of such data, including the development of data policies and protocols for collection.
15.9 Lesson 4: Summary
The following are the key takeaways from this lesson:
- Whether and when to share data? Determine at what point in a project it makes the most sense to share your data. Remember, not all data can or should be shared.
- Where to share data? Sharing in a public data repository is recommended, and there are many types of repositories to choose from.
- How to enable reuse? Ensure appropriate, community-accepted metadata, assign a DOI, and develop a citation statement so that your data can be easily found and cited.
- Who helps share data? There are many steps in making and sharing data and it’s important to think about who will be responsible for each step.
15.10 Lesson 4: Knowledge Check
Answer the following questions to test what you have learned so far.
Question 01/04
Data cannot be shared if it is:
- ITAR controlled
- Controlled Unclassified Information
- Subject to intellectual property, copyright, and licensing concerns
- All of the above
Question 02/04
Select the option you think is correct to complete the sentence.
It is best practice to start working with a repository _____.
- As early as possible
- When you have test data ready
- After you obtain a DOI
- When you are ready to release your data
Question 03/04
Which one of the following might be able to help you get a DOI for your data?
- The repository you are working with
- Your home organization
- A journal you are submitting a manuscript and data to
- All of the above
Question 04/04
Which of the following are roles to consider when sharing data? Select all that apply.
- Develop guidance on privacy and cultural sensitivity of the data
- Develop the data documentation and metadata
- Assign the data a DOI
- Verify the integrity of the data, metadata, and documentation transfer