Data Management and Sharing

In our recent grant workshops, we’ve received a number of questions about the data management and sharing requirements at NSF and NIH. While each agency's web site offers guidance, we present below our “freeze-dried” version of things you should consider as you write this proposal section.

NSF Expectations

NSF expects a two-page supplementary document describing how the proposal will conform to NSF policy on dissemination and sharing of research results.

  • A valid Data Management Plan may include only the statement that no detailed plan is needed, as long as a clear justification is provided.
  • The Data Management Plan will be reviewed as part of the intellectual merit and/or broader impacts of the proposal.
  • Proposers who feel that the plan cannot fit within the two-page limit may use part of the 15-page Project Description for additional data management information.
  • FastLane will not permit submission of a proposal that is missing a data management plan.

Data may include, but are not limited to: data, publications, samples, physical collections, software and models. It is acceptable to state in the Data Management Plan that the project is not anticipated to generate data or samples that require management and/or sharing. You are encouraged to deposit your data in a public database such as the National Technical Information Service. Include any costs of implementing your Data Management Plan in your budget and budget narrative. Your data must be maintained and released in accordance with appropriate standards for protecting privacy rights and maintaining the confidentiality of respondents. If your data have potential intellectual property and commercial value, you can protect that information; your program officer will provide details.

NIH Expectations

NIH expects a Data Sharing Plan, or an explanation of why data sharing is not feasible, to be included in all applications where the generation of data is anticipated. Reviewers are instructed to assess the reasonableness of the data sharing plan or the rationale for not sharing research data.

  • All NIH grant applications where the development of model organisms is anticipated are expected to include a description of a specific plan for sharing and distributing unique model organism research resources generated using NIH funding, or to state why such sharing is restricted or not possible.
  • Applications that include Genome Wide Association Studies (GWAS), regardless of the requested costs, are expected to include either a plan for submission of GWAS data to the NIH designated data repository or an appropriate explanation for why submission to the repository will not be possible.

Final research data are recorded factual material commonly accepted in the scientific community as necessary to document, support, and validate research findings. This does not mean summary statistics or tables; rather, it means the data on which summary statistics and tables are based. For most studies, final research data will be a computerized dataset. For example, the final research data for a clinical study would include the computerized dataset upon which the accepted publication was based, not the underlying pathology reports and other clinical source documents. For some but not all scientific areas, the final dataset might include both raw data and derived variables, which would be described in the documentation associated with the dataset.

Given the breadth and variety of science that NIH supports, neither the precise content for the data documentation, nor the formatting, presentation, or transport mode for data is stipulated. What is sensible in one field or one study may not work at all for others. Data must be kept for 3 years following closeout of a grant or contract agreement. (Contracts may specify different time periods.)

The rights and privacy of human subjects who participate in NIH-sponsored research must be protected at all times. It is the responsibility of the investigators, their Institutional Review Board (IRB), and their institution to protect the rights of subjects and the confidentiality of the data. Prior to sharing, data should be redacted to strip all identifiers, and effective strategies should be adopted to minimize risks of unauthorized disclosure of personal identifiers.
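As a rough illustration of the redaction step described above, the following Python sketch removes direct identifiers from a set of records before release. The field names and the `strip_identifiers` helper are hypothetical, chosen only for this example; a real project would take its identifier list from its IRB-approved protocol. Removing direct identifiers does not by itself rule out deductive disclosure of subjects with unusual characteristics.

```python
# Hypothetical identifier fields, for illustration only. A real study
# would take this list from its IRB-approved protocol.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn", "date_of_birth"}


def strip_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed identifier fields."""
    return [
        {field: value for field, value in record.items()
         if field not in identifiers}
        for record in records
    ]


# Made-up sample record, not real data.
sample_records = [
    {"name": "A. Example", "phone": "555-0100",
     "age_group": "50-59", "test_result": "negative"},
]

shared_records = strip_identifiers(sample_records)
print(shared_records)
```

The function returns new dictionaries rather than editing the originals, so the identified dataset stays intact for the investigators while only the stripped copy is released.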

Data can be shared through various dissemination strategies available to the Principal Investigator, including publications, scholarly presentations, data archives, data sharing agreements, or data enclaves. Regardless of the mechanism used to share data, each dataset will require documentation. Documentation provides information about the methodology and procedures used to collect the data, details about codes, definitions of variables, variable field locations, frequencies, and the like.
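To make the documentation requirement concrete, here is a minimal Python sketch of a codebook generator that records each variable's definition, field position, and value frequencies. The `build_codebook` helper, the variable names, and the sample values are all assumptions made up for this example; neither agency prescribes a format.

```python
from collections import Counter


def build_codebook(records, definitions):
    """List each variable with its position, definition, and frequencies."""
    codebook = []
    # Position follows the order in which the variables are defined.
    for position, variable in enumerate(definitions, start=1):
        counts = Counter(record.get(variable) for record in records)
        codebook.append({
            "variable": variable,
            "position": position,
            "definition": definitions[variable],
            "frequencies": dict(counts),
        })
    return codebook


# Hypothetical variables and made-up values, for illustration only.
variable_definitions = {
    "age_group": "Age at interview, in ten-year bands",
    "test_result": "Laboratory result for the specimen provided",
}
survey_records = [
    {"age_group": "50-59", "test_result": "negative"},
    {"age_group": "50-59", "test_result": "positive"},
    {"age_group": "60-69", "test_result": "negative"},
]

codebook = build_codebook(survey_records, variable_definitions)
for entry in codebook:
    print(entry)
```

A codebook like this can be shipped alongside the dataset, together with a written description of the collection methodology, which code cannot generate.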

Examples of Data Sharing Plans

The precise content of a data-sharing plan depends on several factors, such as whether or not the investigator is planning to share data, the size and complexity of the dataset, and the like. Below are several examples of data-sharing plans.

Example 1

The proposed research will involve a small sample (fewer than 20 subjects) with Williams syndrome, recruited from clinical facilities in the New York City area. This rare craniofacial disorder is associated with distinguishing facial features, as well as mental retardation. Even with the removal of all identifiers, we believe that it would be difficult if not impossible to protect the identities of subjects given the physical characteristics of subjects, the type of clinical data (including imaging) that we will be collecting, and the relatively restricted area from which we are recruiting subjects. Therefore, we are not planning to share the data.

Example 2

The proposed research will include data from approximately 500 subjects being screened for three bacterial sexually transmitted diseases (STDs) at an inner city STD clinic. The final dataset will include self-reported demographic and behavioral data from interviews with the subjects and laboratory data from urine specimens provided. Because the STDs being studied are reportable diseases, we will be collecting identifying information. Even though the final dataset will be stripped of identifiers prior to release for sharing, we believe that there remains the possibility of deductive disclosure of subjects with unusual characteristics. Thus, we will make the data and associated documentation available to users only under a data-sharing agreement that provides for: (1) a commitment to using the data only for research purposes and not to identify any individual participant; (2) a commitment to securing the data using appropriate computer technology; and (3) a commitment to destroying or returning the data after analyses are completed.

Example 3

This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts.

User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties.
