Choosing Appropriate Data Storage
The Feinberg School of Medicine (FSM) data storage policy uses different language to describe research data categories and makes specific recommendations about where Feinberg researchers should store their data. Contact firstname.lastname@example.org with questions about this policy and data storage in general.
Many factors go into choosing appropriate storage. The best place to store your research data depends on what you will do with it.
Consider the following questions when choosing a location to store your research data.
What Policies and Regulations Apply to My Data?
Not all data storage services meet the minimum requirements for all research data. All Northwestern research data is subject to Northwestern University’s Research Data policy and Data Classification policy, which describes categories of data (levels 1 to 4) referenced in the table below. Other policies and regulations may apply.
Northwestern University Data Classification Policy Categories
| Service | Good For | Level 1 Data | Level 2 Data* | Level 3 Data* | Level 4 Data |
|---|---|---|---|---|---|
| OneDrive | Storing working data that only you need to access | Yes | Maybe | Maybe | No |
| SharePoint | Storing working data shared with a team | Yes | Maybe | Maybe | No |
| RDSS: non-audited zone (resfiles) | Storing working data shared with a team, especially data with large individual file sizes | Yes | Yes | No | No |
| RDSS: audited zone (resfilesaudit) | Storing working data shared with a team, especially data with large individual file sizes | Yes | Yes | Maybe | No |
| FSMResFiles (for Feinberg School of Medicine) | Storing working data | Yes | Yes | Maybe | No |
| Quest Storage | Storing data being actively analyzed on Quest | Yes | Yes | Maybe | No |
| Public Cloud Storage | Storing working data and archiving research data | Yes | Maybe** | Maybe** | No |
| Computers managed by IT staff at Northwestern | Storing working data that only you need to access and that is backed up on another server | Yes | Maybe | Maybe | No |
| Your personal computer or accounts for services not licensed by Northwestern University | Not recommended for storing Northwestern research data | Yes | Maybe | Maybe | No |
* Refers only to the technical controls required to store this data type. Level 3 and Level 4 data also require additional policies and procedures to comply fully with the contractual or legal requirements that apply to them.
** Cloud services can usually be configured to satisfy most data storage requirements.
Note: Using personal accounts on unsupported storage services like Dropbox is not allowed for storing Northwestern research data. Google Drive, including accounts provided by Northwestern (e.g., u.northwestern.edu), is also not permitted for Level 2 and above data. Please contact email@example.com with questions and for advice about alternatives.
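Some teams script their data-management checklists. For them, the classification table can be encoded as a simple lookup. The sketch below is illustrative only:

- The dictionary is hand-copied from the table.
- The service labels are shortened.
- The personal-computer row is left out on purpose, because that option is not recommended.

It is not a compliance tool. A "Maybe" still means checking the additional controls and policies described in the footnotes and note above.

```python
from typing import Dict, List, Tuple

# Hand-copied from the Data Classification table; answers are for Levels 1-4 in order.
# The personal-computer/unlicensed-service row is omitted on purpose (not recommended).
STORAGE_BY_LEVEL: Dict[str, Tuple[str, str, str, str]] = {
    "OneDrive": ("Yes", "Maybe", "Maybe", "No"),
    "SharePoint": ("Yes", "Maybe", "Maybe", "No"),
    "RDSS: non-audited zone (resfiles)": ("Yes", "Yes", "No", "No"),
    "RDSS: audited zone (resfilesaudit)": ("Yes", "Yes", "Maybe", "No"),
    "FSMResFiles": ("Yes", "Yes", "Maybe", "No"),
    "Quest Storage": ("Yes", "Yes", "Maybe", "No"),
    "Public Cloud Storage": ("Yes", "Maybe", "Maybe", "No"),
    "Computers managed by IT staff at Northwestern": ("Yes", "Maybe", "Maybe", "No"),
}


def candidate_services(level: int) -> Dict[str, List[str]]:
    """Group services into "Yes" and "Maybe" lists for one classification level (1-4)."""
    if level not in (1, 2, 3, 4):
        raise ValueError(f"level must be 1, 2, 3, or 4, got {level}")
    result: Dict[str, List[str]] = {"Yes": [], "Maybe": []}
    for service, answers in STORAGE_BY_LEVEL.items():
        answer = answers[level - 1]
        if answer in result:  # "No" answers are skipped
            result[answer].append(service)
    return result


if __name__ == "__main__":
    for answer, services in candidate_services(3).items():
        print(f"Level 3 ({answer}): {', '.join(services) or 'none'}")
```

For Level 4 both lists come back empty, matching the table's final column.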
Each service has its own policies about what it can be used for. For example, Quest users must comply with the Quest Storage and Data policy. Make sure you can follow the policies set by the platform you choose.
Schools and Colleges
Northwestern schools and colleges may have rules about where research data should be stored. For example, Feinberg School of Medicine researchers must comply with the Feinberg School of Medicine Data Storage policy. Be sure to follow the regulations set by your school or college.
Data Use Agreements and Other Contracts
If you signed a data use agreement (DUA) with another organization or institution, verify the storage platform you choose complies with the data provider’s requirements for data storage and security. All DUAs must go through the Office for Research for approval and signing.
State and Federal Laws and Regulations
How Much Storage Do You Need? How Much Will It Cost?
Another factor to consider when evaluating storage options is the amount and type of data files that your project will generate or use. Tips for considering capacity when choosing data storage include:
- Estimate on the high end when planning. If you have a mix of very large and very small files, basing the estimate on the very large files should be sufficient.
- Use multiple storage services to their strengths. For example, SharePoint is excellent for collaborating with your research group, but it cannot accommodate files larger than 250 GB. Store collaborative documents like manuscripts and notes in SharePoint, and find another platform to store your files larger than 250 GB.
- The location of your data will change through the course of your project. For example, your data could be produced in a core facility and stored on RDSS, then moved to Quest for analysis. Results can be integrated into a manuscript in SharePoint.
- Archival storage is often less expensive. Think about putting infrequently accessed data in archival storage, such as Amazon S3 Glacier Deep Archive.
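The capacity tips above amount to simple arithmetic. This minimal sketch does three things:

- Totals the file sizes.
- Applies a safety margin.
- Flags files that exceed SharePoint's 250 GB limit.

Several details are assumptions for illustration, not Northwestern guidance: the 1.5x headroom factor, rounding up to whole decimal terabytes, and the example file names.

```python
import math
from typing import Dict, List, Tuple

SHAREPOINT_FILE_LIMIT_GB = 250  # individual file size limit cited above


def plan_capacity(files_gb: Dict[str, float], headroom: float = 1.5) -> Tuple[int, List[str]]:
    """Return (whole TB to plan for, names of files too large for SharePoint).

    `headroom` is an assumed safety margin for estimating on the high end.
    """
    total_gb = sum(files_gb.values())
    plan_tb = math.ceil(total_gb * headroom / 1000)  # 1 TB = 1000 GB here
    too_large = sorted(name for name, size in files_gb.items()
                       if size > SHAREPOINT_FILE_LIMIT_GB)
    return plan_tb, too_large


if __name__ == "__main__":
    # Hypothetical project: two large instrument files plus small documents.
    files = {"raw_scan_01.tif": 400, "raw_scan_02.tif": 380,
             "notes.docx": 0.002, "manuscript.docx": 0.01}
    print(plan_capacity(files))
```

In this hypothetical project, the two large files make up almost all of the roughly 780 GB. That is why the tip suggests basing the estimate on the largest files. Both of those files would also need a platform other than SharePoint.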
Costs associated with storing your data can be included in your grant budgets as data management costs.
Cost and Capacity of Northwestern Storage Services
| Service | Cost | Capacity |
|---|---|---|
| OneDrive | No additional charge* | 5 TB total max per user; 250 GB individual file size limit |
| SharePoint | No additional charge* | 25 TB max per library; 250 GB individual file size limit |
| RDSS | $100/TB/year | Minimum purchase of 1 TB |
| FSMResFiles (for Feinberg School of Medicine) | No additional charge | Determined by FSM IT** |
| Quest Storage (for data related to active computing, processing, and analysis on Quest HPC only) | 1 to 2 TB at no additional charge; buy-in: $195/TB for five years | Home: 80 GB; Projects: 1 to 2 TB for general allocations; Scratch: 5 TB and 5,000,000 files |
| Public Cloud Storage | Monthly payment reflecting the previous month's use | Pay for storage used per month |
* Microsoft is currently reviewing its policy on charging for storage. We are tracking any changes here that may impact the community.
** Feinberg School of Medicine Only. Contact firstname.lastname@example.org with questions about your quota.
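To compare options for a grant budget, the list prices in the table can be turned into a quick estimate. This sketch rests on three assumptions:

- RDSS is billed in whole terabytes. The table only states a 1 TB minimum.
- The Quest buy-in price covers one five-year term.
- The 1 to 2 TB that Quest provides at no additional charge is ignored.

Confirm current pricing before budgeting.

```python
import math

RDSS_PER_TB_YEAR = 100    # $100/TB/year, 1 TB minimum
QUEST_BUYIN_PER_TB = 195  # $195/TB for a five-year term


def rdss_cost(tb_needed: float, years: int) -> int:
    """Estimated RDSS cost in dollars, assuming whole-TB billing."""
    billed_tb = max(1, math.ceil(tb_needed))  # enforce the 1 TB minimum purchase
    return billed_tb * RDSS_PER_TB_YEAR * years


def quest_buyin_cost(tb_needed: float) -> int:
    """Estimated Quest buy-in cost in dollars for one five-year term."""
    return math.ceil(tb_needed) * QUEST_BUYIN_PER_TB


if __name__ == "__main__":
    print(rdss_cost(3, years=5))  # 3 TB on RDSS for five years
    print(quest_buyin_cost(3))    # 3 TB Quest buy-in for five years
```

Spread over five years, the Quest buy-in works out to $39/TB/year. However, Quest Storage is only for data tied to active work on Quest, so the two services are not interchangeable.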
Is the Data Backed Up?
Backing up your data prevents data loss due to hardware failure, disasters, file corruption, and human error. Many data storage services back up data automatically, but it is important to understand which risks a given service actually protects you against.
Different types of “backup” strategies protect your data from different risks.
Syncing involves continuously copying changes from one system to another so that both hold the same file versions. For example, OneDrive and SharePoint let you choose which files and folders to synchronize between your computer and the cloud, which protects your data if your computer fails or is stolen. However, if a file is corrupted on your computer, the corruption will replicate to the cloud.
Versioning involves keeping a record of changes made to your files so that you can go back to older versions. There are different types of versioning. For example, OneDrive and SharePoint create a new version whenever a file changes. In contrast, RDSS takes a snapshot of the entire file system once a day and keeps each snapshot for 28 days. Versioning alone will not help if the storage service itself goes down.
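A 28-day snapshot window means older versions eventually age out. The sketch below checks whether a daily snapshot from a given date should still be available. It assumes one snapshot per calendar day and treats "kept for 28 days" as snapshots up to 27 days old. The exact cutoff and snapshot time may differ, so ask the service administrators before relying on a borderline date.

```python
from datetime import date

SNAPSHOT_RETENTION_DAYS = 28  # RDSS retention period cited above


def snapshot_available(snapshot_day: date, today: date,
                       retention_days: int = SNAPSHOT_RETENTION_DAYS) -> bool:
    """True if the daily snapshot from `snapshot_day` should still exist on `today`.

    Assumes snapshots aged 0 to retention_days - 1 days are retained.
    """
    age_days = (today - snapshot_day).days
    return 0 <= age_days < retention_days


if __name__ == "__main__":
    today = date(2024, 3, 29)
    print(snapshot_available(date(2024, 3, 2), today))  # 27 days old
    print(snapshot_available(date(2024, 3, 1), today))  # 28 days old
```

Under these assumptions, a version from March 2 is recoverable on March 29 but one from March 1 is not. A problem noticed late may therefore fall outside the snapshot window.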
Replication involves creating a completely separate copy of your data on another server, typically in a distinct geographic location. Replication is critical for disaster recovery, or bringing back an entire system from scratch if it goes down due to natural disaster or cyberattack. RDSS combines versioning with replication by copying its snapshots between the Evanston campus and the Chicago campus.
Data Protection Features on Northwestern University Storage Services
| Service | Replication | Versioning |
|---|---|---|
| OneDrive and SharePoint | Managed by Microsoft | Version created every time a file is saved |
| RDSS and FSMResFiles (for Feinberg School of Medicine) | Copies in geographically distinct locations (Chicago and Evanston) | Daily snapshots kept for 28 days |
| Quest Storage | Home directories are copied to an off-campus tape archive | Daily snapshots kept for 28 days |
| Public Cloud Storage | Configurable; you pay storage costs for each copy | Configurable; you pay for each version stored; usually includes file integrity checks |
Who Needs Access to Your Data?
Different storage services have different rules about what is possible with regard to sharing your data with collaborators or the public.
Each system has different default permissions for new files and folders.
- Locations that only you have access to by default are great for storing files that no one else should see. Keep in mind that files associated with an individual user account will disappear if that user leaves Northwestern.
- Locations that grant access to a specific group of people by default are great for collaboration and are not tied to a single user account.
These services may offer a way to share outside of this default access group. The following table outlines who has access by default on each service and how to share outside the default access group.
| Service | Default Access Group | How to Share Outside the Default Access Group* |
|---|---|---|
| OneDrive | You | Anyone with a Microsoft account; no anonymous link sharing; recommended only for limited sharing because permissions are complex |
| SharePoint | Site members and owners | Anyone with a Microsoft account; anonymous access (some security rules may restrict enabling anonymous access) |
| RDSS | Users authorized by NetID, including affiliate NetIDs for external collaborators | None; access is controlled at the share level |
| FSMResFiles (for Feinberg School of Medicine)** | Managed by FSM IT | Managed by FSM IT |
| Quest Storage | Home and scratch: you; Projects: Quest users who are part of the allocation | Home: cannot be shared with other Quest users; Scratch: other Quest users; Projects: other Quest users |
| Public Cloud Storage | Manually configured | Manually configured |
* Sharing here is defined as granting access to people outside the users who have access by default.
** Feinberg School of Medicine Only
Much like real estate, data storage is all about location. Can I access my data from where I need to analyze it? Common compute resources include Quest, your computer, and virtual machines run by Northwestern or public cloud providers.
Northwestern-run storage services can be directly mounted or synchronized to compute resources. Others require you to transfer your data to the compute resource before use. To facilitate transfers among storage services, Northwestern University subscribes to Globus, a tool for large data transfers. Review the following table to see which storage services are accessible to Globus, and see our documentation on Globus.
| Storage Service | Access Method | Data Transfer Options |
|---|---|---|
| OneDrive and SharePoint | Sync specified files and folders to your computer or VM | Globus data transfer tool |
| RDSS | Mount to your computer or VM as a network drive | Globus data transfer tool* |
| Quest Storage | Log in to Quest via ssh; Quest Analytics Nodes | Globus data transfer tool (preferred); command line tools (sftp, scp, rsync) |
| Public Cloud Storage | Mount storage as a drive; command line interface | Command line interface; Globus data transfer tool |
*Data on RDSS can currently only be transferred to Quest. The resfilesaudit zone is not accessible to Globus.
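Globus is the preferred way to move data to Quest, but the table also lists command line tools. As a hedged sketch, the helper below only assembles an rsync command for copying a local folder to Quest and prints it; it does not run it. Several values are assumptions or placeholders; substitute the ones from the Quest documentation and your allocation:

- The login host quest.northwestern.edu.
- The NetID abc123.
- The /projects/p00000 path.

```python
import shlex
from typing import List


def build_rsync_command(local_dir: str, netid: str, remote_dir: str,
                        host: str = "quest.northwestern.edu",
                        dry_run: bool = True) -> List[str]:
    """Assemble an rsync command list; `host` is an assumed Quest login host."""
    cmd = ["rsync", "-a", "-v", "--partial", "--progress"]
    if dry_run:
        cmd.append("--dry-run")  # show what would be copied without copying it
    # The trailing slash copies the folder's contents rather than the folder itself.
    source = local_dir.rstrip("/") + "/"
    cmd += [source, f"{netid}@{host}:{remote_dir}"]
    return cmd


if __name__ == "__main__":
    # abc123 and /projects/p00000 are placeholders, not real values.
    cmd = build_rsync_command("results", "abc123", "/projects/p00000/results/")
    print(shlex.join(cmd))
    # To transfer for real, pass dry_run=False and run the list with subprocess.run(..., check=True).
```

The `-a` flag preserves permissions and timestamps. The `--partial` flag keeps partially transferred files, so an interrupted copy can resume more efficiently on the next run.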
Making Your Decisions
Choosing where to store your data so that it is safe, compliant, and easy to work with is harder than it seems. If you need help deciding where to store your data, email email@example.com, and our data management consultants will set up a time to talk about your workflow and discuss options.