Archiving Data When a Project Is Completed
When you complete your project, you must retain its data for a set period, whether or not the data was shared, to comply with applicable policies. Storing infrequently accessed data separately from your active data storage is a good data management practice. Consider these questions when thinking about archiving your data.
How Long Do I Need to Retain the Data?
Northwestern’s research data policy specifies that research data must be kept for three years after the end of the funding period or the project. However, this period can be extended in the following cases.
|Data Type||Retention Period|
|All Northwestern research data||At least three years|
|Data generated by students||Until the student graduates or leaves Northwestern and all papers are published|
|Data supporting patent applications||Until the patent process is complete|
|Data subject to litigation or audit||Until the situation is resolved|
|Data subject to HIPAA or under a HIPAA waiver||Six years after project completion|
See Northwestern’s Retention of University Records policy and its associated records retention schedule, the IRB investigator manual, and the University Patent and Invention policy for more information.
Sponsor, publisher, Institutional Review Board (IRB), or other state or federal policies can also extend the retention period. Retention periods can also vary by type of study. For example, FDA regulations stipulate that human-subjects data involving drug development must be kept for two years following approval of a marketing application.
If you are not sure whether any of these situations apply to your research, email firstname.lastname@example.org for advice.
What Do I Need to Retain?
Research projects generate a massive amount of content, some of which can be weeded out before archiving. Deciding what to keep and what not to keep can be important when storage space is limited. Here are some aspects to consider when determining what to retain.
- Is the data necessary to support my research findings? Make sure to keep all data that is central to your findings.
- Is the data unique? In the process of moving data between storage systems and backing it up, accumulating duplicate data is easy. Make sure to pare down duplicates.
- Can I easily regenerate an output file? In computational research, it is often sufficient to keep the raw data and any scripts written to analyze the data.
- Is the data useful? Research projects often generate extraneous files and half-completed work. If the data is not useful, you can discard it. If you are not sure, err on the side of keeping it.
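One way to approach the duplicate check described above is to group files by a hash of their contents. The following is a minimal Python sketch, not a Northwestern-provided tool: the function names `file_digest` and `find_duplicates` are hypothetical, and it assumes the data sits in an ordinary directory tree that you can read in full.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("project_data"))`, where `project_data` is a placeholder directory name, returns lists of paths with identical contents. The sketch only reports candidates. Deciding which copy to keep is still a judgment call, so review each group before deleting anything.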
Where Can I Archive My Data?
Many researchers choose to retain their data on the storage service they used during the active phase of their research. This strategy is acceptable as long as it complies with the policies of the service you are using. Archiving is currently allowed on all of the storage services offered by Northwestern. However, as storage is becoming more of a bottleneck in the research process, you may not have a large enough storage quota on active storage to archive data there. Public cloud storage has several tiers with costs per terabyte (TB) that are well below the costs for active data storage. Additionally, you only pay for the storage you use, not for a quota you may or may not use at any given time.
For example, Amazon S3 Standard storage (active data storage) costs approximately $275/TB/year, but S3 Glacier Deep Archive storage costs $12/TB/year. Amazon also offers many options in between, as well as “Intelligent Tiering,” which automatically moves unused data to less expensive storage tiers. The tradeoff is that archived data is less accessible: it can take up to a week to retrieve data from archival storage, and retrieving the data incurs a fee.
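To make the comparison concrete, the arithmetic can be sketched in a few lines of Python. The two rates are the approximate figures quoted above. The function names are hypothetical, and the estimate covers storage only: it ignores retrieval, request, and data transfer fees, and actual prices vary by region and over time.

```python
# Approximate prices quoted in the text, in USD per TB per year (assumed constant).
STANDARD_USD_PER_TB_YEAR = 275.0
ARCHIVE_USD_PER_TB_YEAR = 12.0


def storage_cost(size_tb: float, years: float, usd_per_tb_year: float) -> float:
    """Storage-only cost in USD; excludes retrieval and transfer fees."""
    return size_tb * years * usd_per_tb_year


def archive_savings(size_tb: float, years: float) -> float:
    """Difference between keeping data on active storage and on the archival tier."""
    active = storage_cost(size_tb, years, STANDARD_USD_PER_TB_YEAR)
    archived = storage_cost(size_tb, years, ARCHIVE_USD_PER_TB_YEAR)
    return active - archived
```

For a 5 TB dataset kept for the three-year minimum, `storage_cost(5, 3, STANDARD_USD_PER_TB_YEAR)` gives 4125.0 and `storage_cost(5, 3, ARCHIVE_USD_PER_TB_YEAR)` gives 180.0, so `archive_savings(5, 3)` returns 3945.0 before any retrieval fees.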
If you are interested in exploring using public cloud storage to archive your research data, email email@example.com to ask a data management specialist.