Sensitive data
Sensitive data are any information that may cause harm to the subjects being studied. We often think of human participants, but sensitive data can also concern communities, protected flora and fauna species, and geographic regions and environments.
Human participants
One of the challenges of sharing human participant data is the risk that your data may identify an individual, either directly or indirectly. Additionally, the information in your dataset may be legally protected or sensitive, which could lead to legal repercussions for you and/or bring harm to the individual if that information is released and linked to that individual’s identity.
Disclosure is the unauthorized release of information that may identify an individual research participant or organization. Examples of disclosive information include:
- Direct identifiers or Personally Identifiable Information (PII), such as name, address, social security number, and phone number.
- Indirect identifiers, such as zip code, birthdate, education, race, and ethnicity, that could be used in combination to uniquely identify an individual.
- Information in a dataset that can be linked with outside information, from sources such as social media, administrative data, or other public datasets, that results in identification of an individual.
Legally protected data have restrictions placed on them by law. Examples include:
- Health Insurance Portability and Accountability Act (HIPAA) protected medical or healthcare data
- Family Educational Rights and Privacy Act (FERPA) protected educational records data, such as grades
Sensitive data include any information that may cause harm, legal jeopardy, or reputational damage to the participant if disclosed. Such data may or may not be legally protected. Examples include:
- Sexual behaviors
- Mental health information
- Criminal or illegal behaviors, such as drug use
- Information about minors or other vulnerable populations
Before you share human participant data publicly, make sure the dataset has a low disclosure risk. This involves removing both direct identifiers and any indirect identifiers that may pose a disclosure risk. If you have health data, you may need to meet HIPAA standards for de-identification (see University guidance for limited and de-identified health datasets).
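As a rough illustration of what lowering disclosure risk can look like in practice, the sketch below drops direct identifiers, coarsens two indirect identifiers, and then counts how many records share the rarest combination of indirect identifiers. The column names, the sample records, and the choice to truncate zip codes and birthdates are all assumptions made for this example. It is not a HIPAA de-identification procedure and does not replace the University guidance.

```python
from collections import Counter

# Hypothetical column names; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "phone"}
INDIRECT_IDENTIFIERS = ("zip3", "birth_year", "education")


def deidentify(record):
    """Drop direct identifiers and coarsen two indirect identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Keep only the first three digits of the zip code.
    cleaned["zip3"] = cleaned.pop("zip")[:3]
    # Keep only the year from an ISO date such as "1984-06-02".
    cleaned["birth_year"] = cleaned.pop("birthdate")[:4]
    return cleaned


def smallest_group(records, keys=INDIRECT_IDENTIFIERS):
    """Return the size of the rarest combination of indirect identifiers."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Made-up sample records, for illustration only.
raw = [
    {"name": "A. Example", "phone": "555-0100", "zip": "55455",
     "birthdate": "1984-06-02", "education": "BA", "score": 12},
    {"name": "B. Example", "phone": "555-0101", "zip": "55414",
     "birthdate": "1984-11-20", "education": "BA", "score": 9},
    {"name": "C. Example", "phone": "555-0102", "zip": "55108",
     "birthdate": "1990-01-15", "education": "PhD", "score": 15},
]

shared = [deidentify(r) for r in raw]
k = smallest_group(shared)
print(f"Smallest group of matching indirect identifiers: {k}")
```

A smallest group of 1 means at least one record is still unique on its indirect identifiers, so further coarsening, suppression, or restricted access may be worth considering before public release.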
If your dataset contains sensitive data, or if the removal of identifiers limits the usefulness of your data, consider sharing through a restricted access repository, such as the Inter-University Consortium for Political and Social Research (ICPSR).
In addition to the content of the data, the agreements made with participants under your IRB-approved protocol, such as the consent form, can also limit the extent to which human subjects data can be shared.
Resources
- UMN HIPAA training
- UMN Collaborative Institutional Training Initiative (CITI) training
- Human Participants Data Essentials Primer
- Consent Forms Primer
- RDS’s IRB considerations page (de-identification and consent information)
Communities
Beyond individual participants, it’s important to consider how research may impact communities. Research that could impact entire communities may or may not be deemed “human subjects” research. Consider the following:
- Was my data collected from or within a particular community?
- How would public release of this data (even if de-identified) impact the community?
- Who in the community can advise me on data sharing and community impacts?
- Is there an agreement (e.g., data use agreement, memorandum of understanding, etc.) about the data collected from the community?
- Are there laws and regulations that protect the community?
You may not be able to legally share the data you collect from and within communities, or you may make the decision not to share for ethical reasons.
Resources
- UMN training hub - Tribal-University Relations Training
- UMN Indigenous research policy
- Indigenous Research Guidebook
- Research with Indigenous partners library guide
- CARE principles for Indigenous data governance
- Operationalizing the CARE and FAIR Principles for Indigenous data futures
- CARE Data Principles, Indigenous data, Data related to Indigenous Peoples and Interest
- NIH Responsible Management and Sharing of American Indian/Alaska Native Participant Data
Flora & fauna
The locations of flora and fauna may need to be masked, depending on the legal protection of certain species. Check whether other publicly available data could be combined with yours to reveal details you have masked. Consider sharing simulated rather than raw data, or sharing in a restricted access data repository. Additionally, sharing data related to animal research in general may invite controversy. For this reason, you might state in the documentation whether the research had Institutional Animal Care and Use Committee (IACUC) approval or other animal research approval, or that approval was not required for this type of research.
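One simple way to mask locations is to reduce coordinate precision so that nearby observations fall into the same grid cell. The sketch below is only an illustration of that idea: the function name and the default 0.1-degree cell size are assumptions, and the appropriate level of masking depends on the species, the applicable protections, and what other public data exist.

```python
import math


def mask_location(lat, lon, cell_degrees=0.1):
    """Snap a coordinate to the south-west corner of its grid cell.

    cell_degrees is a hypothetical default; choose a size appropriate
    to the species and the protections that apply.
    """
    masked_lat = math.floor(lat / cell_degrees) * cell_degrees
    masked_lon = math.floor(lon / cell_degrees) * cell_degrees
    # Round away floating-point noise left over from the multiplication.
    return round(masked_lat, 6), round(masked_lon, 6)


# Two made-up observation points a few kilometers apart.
site_a = mask_location(44.9740, -93.2277)
site_b = mask_location(44.9512, -93.2901)
print(site_a, site_b)
```

Both sample points land in the same cell, so the shared coordinates no longer distinguish the two sites. Coarsening alone may not be enough if other fields in the dataset, such as habitat descriptions or site names, point back to the exact location.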
Geographic regions
Other types of geographic data may also cause harm, such as the locations of abortion clinics. Consider whether such data should be masked, simulated, or shared in a restricted access data repository.
Restricting access to sensitive data
Federal data sharing mandates require researchers to share the research data they collect; however, funders also understand that there are circumstances in which sharing puts people, communities, and species at risk. For this reason, you may write into a data management & sharing plan that you will not share the data. You must provide a clear and thorough justification for doing so; consider citing the funder's own policy language that allows this, as well as naming the potential harms. Alternatively, you may share the data in a restricted access repository that “locks” the data files until the requestor can be vetted, either by the research team or by the repository.