Responsible data sharing: Identifying and remedying possible re-identification of human participants

Abstract

Open data collected from humans creates a tension between scholarly values of transparency and sharing on the one hand, and privacy and security on the other. A common solution is to make datasets anonymous by removing personally identifying information before sharing. However, ostensibly anonymized datasets may be at risk of re-identification if they include demographic information. In the present article, we (a) review current privacy standards; (b) describe computer science data protection frameworks and their adaptability to the social sciences; (c) provide practical guidance for assessing and addressing re-identification risk; (d) introduce two open-source algorithms – MinBlur and MinBlurLite – to increase privacy while maintaining the integrity of open data; and (e) highlight aspects of ethical data sharing that require further attention. Technical innovations can support competing values so that science can be as open as possible to promote transparency and sharing, and as closed as necessary to maintain privacy and security.

Link to resource: https://doi.org/10.31222/osf.io/5m3cx

Type of resource: Reading

Education level(s): College / Upper Division (Undergraduates), Graduate / Professional

Primary user(s): Student, Teacher

Subject area(s): Social Science

Language(s): English