Choosing Appropriate Data Storage
The Feinberg School of Medicine (FSM) data storage policy uses different language to describe research data categories and makes specific recommendations about where Feinberg researchers should store their data. Contact fsmhelp@northwestern.edu with questions about this policy and data storage in general.
Many factors go into choosing appropriate data storage for your data. The best place to store your research data depends on what you will do with it.
Consider the following questions when choosing a location to store your research data.
What Policies and Regulations Apply to My Data?
Not all data storage services meet the minimum requirements for all research data. All Northwestern research data is subject to Northwestern University’s Research Data policy and Data Classification policy, which describes categories of data (levels 1 to 4) referenced in the table below. Other policies and regulations may apply.
Northwestern University Data Classification Policy Categories
Service | Good For | Level 1 Data | Level 2 Data* | Level 3 Data* | Level 4 Data |
---|---|---|---|---|---|
OneDrive | Storing working data that only you need to access | Yes | Maybe | Maybe | No |
SharePoint | Storing working data shared with a team | Yes | Maybe | Maybe | No |
RDSS: non-audited zone (resfiles) | Storing working data shared with a team, especially data with large individual file sizes | Yes | Yes | No | No |
RDSS: audited zone (resfilesaudit) | Storing working data shared with a team, especially data with large individual file sizes | Yes | Yes | Maybe | No |
FSMResFiles (for Feinberg School of Medicine) | Storing working data | Yes | Yes | Maybe | No |
Quest Storage | Storing data being actively analyzed on Quest | Yes | Maybe | Maybe | No |
Public Cloud Storage | Storing working data and archiving research data | Yes | Maybe** | Maybe** | No |
Computers managed by IT staff at Northwestern | Storing working data that only you need access to and is backed up on another server | Yes | Maybe | Maybe | No |
Your personal computer or accounts for services not licensed by Northwestern University | Storing Northwestern research data on personal computers or accounts is not recommended | Yes | Maybe | Maybe | No |
* Refers only to the technical controls required to store this data type. Data at Level 3 and above require additional policies and procedures to be fully compliant with the contractual or legal requirements they are subject to.
** Cloud services can usually be configured to satisfy most data storage requirements.
Note: Using personal accounts on unsupported storage services like Dropbox is not allowed for storing Northwestern research data. Google Drive accounts, including those provided by Northwestern (e.g., u.northwestern.edu), are also not permitted for Level 2 and above data. Please contact researchdata@northwestern.edu with questions and for advice about alternatives.
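As a rough illustration of how the table above can be used, the following Python sketch transcribes the named rows into a lookup and lists the services that are not ruled out for a given data level. The dictionary and function names are hypothetical, and a "Maybe" rating still means you should contact researchdata@northwestern.edu before storing data.

```python
# Technical-control ratings per data level, transcribed from the named rows
# of the table above. The names SUITABILITY and candidate_services are
# illustrative only and are not part of any Northwestern tool.
SUITABILITY = {
    "SharePoint": {1: "Yes", 2: "Maybe", 3: "Maybe", 4: "No"},
    "RDSS: non-audited zone (resfiles)": {1: "Yes", 2: "Yes", 3: "No", 4: "No"},
    "RDSS: audited zone (resfilesaudit)": {1: "Yes", 2: "Yes", 3: "Maybe", 4: "No"},
    "FSMResFiles": {1: "Yes", 2: "Yes", 3: "Maybe", 4: "No"},
    "Quest Storage": {1: "Yes", 2: "Maybe", 3: "Maybe", 4: "No"},
    "Public Cloud Storage": {1: "Yes", 2: "Maybe", 3: "Maybe", 4: "No"},
}


def candidate_services(level):
    """Return (service, rating) pairs that are not ruled out for a data level."""
    if level not in (1, 2, 3, 4):
        raise ValueError("data level must be 1, 2, 3, or 4")
    return [
        (name, ratings[level])
        for name, ratings in SUITABILITY.items()
        if ratings[level] != "No"
    ]


if __name__ == "__main__":
    for service, rating in candidate_services(3):
        print(f"{service}: {rating}")
```

The lookup only reflects technical controls; it does not replace the policy review described in the first footnote.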
How Much Storage Do You Need? How Much Will It Cost?
Another factor to consider when evaluating storage options is the amount and type of data files that your project will generate or use. Tips for considering capacity when choosing data storage include:
- Estimate on the high end when planning. If you have a mix of very large and very small files, doing the estimate for very large files should be sufficient.
- Use multiple storage services to their strengths. For example, SharePoint is excellent for collaborating with your research group, but it cannot accommodate files larger than 250 GB. Store collaborative documents like manuscripts and notes in SharePoint, and find another platform to store your files larger than 250 GB.
- The location of your data will change through the course of your project. For example, your data could be produced in a core facility and stored on RDSS, then moved to Quest for analysis. Results can be integrated into a manuscript in SharePoint.
- Archival storage is often less expensive. Think about putting infrequently accessed data in archival storage, such as Amazon S3 Glacier Deep Archive.
Any associated costs to storing your data can be included in your grant budgets as data management costs.
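One way to ground a capacity estimate is to measure an existing dataset. The sketch below walks a directory, totals the file sizes, and flags files above the 250 GB SharePoint limit mentioned above. It assumes that limit means 250 × 10^9 bytes, and the 1.5× headroom factor is an arbitrary illustration of "estimating on the high end"; all function names are hypothetical.

```python
from pathlib import Path

# Assumption: the 250 GB per-file limit is interpreted as decimal gigabytes.
SHAREPOINT_FILE_LIMIT_BYTES = 250 * 10**9


def summarize_sizes(root, limit_bytes=SHAREPOINT_FILE_LIMIT_BYTES):
    """Return (total_bytes, oversized) for all regular files under root.

    oversized is a list of (path, size_in_bytes) for files above limit_bytes.
    """
    total = 0
    oversized = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if size > limit_bytes:
                oversized.append((str(path), size))
    return total, oversized


def padded_estimate_tb(total_bytes, headroom=1.5):
    """Scale a measured size by a headroom factor and convert to decimal TB."""
    return total_bytes * headroom / 10**12


if __name__ == "__main__":
    total, too_big = summarize_sizes(".")
    print(f"Measured: {total / 10**12:.3f} TB")
    print(f"Plan for roughly: {padded_estimate_tb(total):.3f} TB")
    for name, size in too_big:
        print(f"Too large for SharePoint: {name} ({size / 10**9:.1f} GB)")
```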
Cost and Capacity of Northwestern Storage Services
Service | Cost | Capacity |
---|---|---|
OneDrive | No additional charge* | 5 TB total max per user; 250 GB individual file size limit |
SharePoint | No additional charge* | 25 TB max per library; 250 GB individual file size limit |
RDSS | $100/TB/Year | Minimum of 1 TB purchase |
FSMResFiles (for Feinberg School of Medicine) | No additional charge | Determined by FSM IT** |
Quest Storage (for data related to active computing, processing, and analysis on Quest HPC only) | 1 to 2 TB at no additional charge; buy-in: $195/TB for five years | Home: 80 GB; Projects: 1 to 2 TB for general; Scratch: 5 TB and 5,000,000 files |
Public Cloud Storage | Monthly payment reflecting previous month's use | Pay for storage used per month |
* Microsoft is currently reviewing its policy on charging for storage. We are tracking any changes here that may impact the community.
** Feinberg School of Medicine Only. Contact fsmhelp@northwestern.edu with questions about your quota.
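To include storage in a grant budget, the rates in the table above can be turned into a quick estimate. This is a minimal sketch, assuming that RDSS and Quest buy-in storage are purchased in whole-terabyte increments (the table states only the 1 TB RDSS minimum); the function names are hypothetical, and current rates should be confirmed before budgeting.

```python
import math

# Rates transcribed from the cost table above.
RDSS_USD_PER_TB_YEAR = 100
RDSS_MIN_TB = 1
QUEST_BUYIN_USD_PER_TB_5YR = 195


def rdss_cost(tb_needed, years):
    """Estimate the RDSS cost in USD for a number of terabytes and years.

    Assumption: storage is billed in whole terabytes, rounded up,
    with the 1 TB minimum purchase applied.
    """
    billable_tb = max(RDSS_MIN_TB, math.ceil(tb_needed))
    return billable_tb * RDSS_USD_PER_TB_YEAR * years


def quest_buyin_cost(tb_purchased):
    """Estimate the Quest buy-in cost in USD for one five-year term.

    Assumption: buy-in storage is purchased in whole terabytes, rounded up.
    """
    return math.ceil(tb_purchased) * QUEST_BUYIN_USD_PER_TB_5YR


if __name__ == "__main__":
    print(rdss_cost(3.2, years=5))  # 4 TB billed for 5 years
    print(quest_buyin_cost(3))      # 3 TB for one five-year term
```

Under these assumptions, 3.2 TB on RDSS for five years comes to $2,000, and a 3 TB Quest buy-in comes to $585.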
Is the Data Backed Up?
Backing up your data prevents data loss due to hardware failure, disasters, file corruption, and human error. While many data storage services do this automatically, understanding what you are protected against when using a storage service is important.
Different types of “backup” strategies protect your data from different risks.
Replication involves creating a completely separate copy of your data on another server, typically in a distinct geographic location. Replication is critical for disaster recovery or bringing back an entire system from scratch if it goes down due to natural disaster or cyberattack. RDSS combines versioning with replication by copying its snapshots between the Evanston campus and the Chicago campus.
Data Protection Features on Northwestern University Storage Services
Service | Replication | Versioning |
---|---|---|
OneDrive and SharePoint | Managed by Microsoft | Version created every time a file is saved |
RDSS and FSMResFiles (for Feinberg School of Medicine) | Copies in geographically distinct locations (Chicago and Evanston) | Daily snapshots kept for 28 days |
Quest Storage | Home directories are copied to an off-campus tape archive | Daily snapshots kept for 28 days |
Public Cloud Storage | Configurable–storage cost for each copy | Configurable–pay for each version stored; usually includes file integrity checks |
Who Needs Access to Your Data?
Different storage services have different rules about what is possible with regard to sharing your data with collaborators or the public.
Each system has different default permissions for new files and folders.
- Locations that only you have access to by default are great for storing files that no one else should see. Keep in mind that files associated with an individual user account will disappear if that user leaves Northwestern.
- Locations that grant access to a specific group of people by default are great for collaboration and are not tied to a single user account.
These services may have a method to share outside of this default access group. The following table outlines who has access by default on each service and how to share outside the default access group.
Service | Default Access Group | How to Share Outside the Default Access Group* |
---|---|---|
OneDrive | You | Anyone with a Microsoft account; no anonymous link sharing. Recommended only for limited sharing; permissions are complex |
SharePoint | Site members and owners | Anyone with a Microsoft account; anonymous access. Note that some security rules may restrict enabling anonymous access |
RDSS | Authorized users by NetIDs, including affiliate NetIDs for external collaborators | None; access is controlled at the share level |
FSMResFiles (for Feinberg School of Medicine)** | Managed by FSM IT** | Managed by FSM IT** |
Quest Storage | Home and scratch: you; Projects: Quest users who are part of the allocation | Home: cannot be shared with other Quest users; Scratch: other Quest users; Projects: other Quest users |
Public Cloud Storage | Manually configured | Manually configured |
* Sharing here is defined as granting access to people outside the users who have access by default.
** Feinberg School of Medicine Only
Is My Data Accessible to Compute Sources?
Much like real estate, data storage is all about location. Can I access my data from where I need to analyze it? Common compute sources include Quest, your computer, or virtual machines run by Northwestern or public cloud providers.
Northwestern-run storage services can be directly mounted or synchronized to compute sources. Other services require you to transfer your data to the compute source for use. To facilitate transfers among storage services, Northwestern University subscribes to Globus, a tool for large data transfers. The following table shows which storage services are accessible to Globus. Please also see our documentation on Globus.
Storage Service | Access Method | Data Transfer Options |
---|---|---|
OneDrive, SharePoint | Web interface; sync specified files and folders to your computer or VM | Globus data transfer tool |
RDSS | Mount to your computer or VM as a network drive | Globus data transfer tool* |
Quest Storage | Log in to Quest via ssh; Quest Analytics Nodes | Globus data transfer tool (preferred); FTP clients; command line tools (sftp, scp, rsync) |
Public Cloud Storage | Mount storage as a drive; command line interface; web interfaces | Command line interface; Globus data transfer tool |
* Data on RDSS can currently only be transferred to Quest. The resfilesaudit zone is not accessible to Globus.
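For small transfers where the command line tools listed above are sufficient, a script can assemble the rsync invocation so that it is repeatable. This is only a sketch: the host name `quest.example.edu`, the NetID, and the project path are placeholders rather than real Quest values, and Globus remains the preferred tool for large transfers. The command is built but not executed, and it defaults to rsync's `--dry-run` mode so that nothing is copied until you have reviewed it.

```python
import shlex


def build_rsync_command(source_dir, remote_user, remote_host, remote_dir, dry_run=True):
    """Build an rsync argument list that copies the contents of source_dir.

    -a preserves permissions and timestamps, -v lists transferred files,
    and --partial keeps partially transferred files so a retry can resume.
    """
    cmd = ["rsync", "-av", "--partial"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash tells rsync to copy the directory's contents
    # rather than the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_user}@{remote_host}:{remote_dir}")
    return cmd


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    command = build_rsync_command(
        "results", "netid123", "quest.example.edu", "/projects/p00000/results"
    )
    print(shlex.join(command))
```

Once the dry run lists the expected files, the same argument list can be rebuilt with `dry_run=False` and passed to `subprocess.run`.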
Making Your Decisions
Choosing where to store your data so that it is safe, compliant, and easy to work with is harder than it seems. If you need help deciding where to store your data, email researchdata@northwestern.edu, and our data management consultants will set up a time to talk about your workflow and discuss options.