SSCC Policies, Guidelines, & Best Practices

The Social Sciences Computing Cluster provides a centrally-managed data storage service, a rich suite of statistical software applications, and an advanced computational capability to support the research activities of Northwestern University social scientists.

It is important that Northwestern faculty, staff, and students consider these policies, guidelines, and best practices when using the Social Sciences Computing Cluster.

Policies

Disk

  • Home Directory Each account has an associated home directory that resides on a network file system (NFS). Home directories provide persistent storage that is accessible from all SSCC computing nodes, both interactive and batch. Each home directory is created with a disk quota of 5 GB. The quota is intended to prevent unlimited, runaway use of the file system while leaving adequate reserve for large files when they are needed. Quotas can be increased upon reasonable request.
  • Disk Snapshot Backups Daily snapshots of home directories are taken at midnight, and retained for 7 days. Weekly snapshots are taken at 1:00 a.m. on Sundays, and are retained for 5 weeks. Monthly snapshots are taken at 2:00 a.m. on the first of each month, and are retained for 6 months provided there is sufficient storage capacity.
  • Temporary storage Scratch space in /tmp (and its equivalent, /scr01) is provided on each computational host. Free space in /tmp is needed for the proper functioning of the operating system. Files resident in /tmp are not backed up, and are subject to removal. Files which have not been accessed for 10 days are automatically removed on a daily basis. Furthermore, if /tmp fills up, files will be removed manually at the discretion of a system administrator.
  • Backup of Confidential Data Many contracts for the use of confidential data prohibit the making of backup copies. Files can be excluded from backup snapshots upon request to the cluster administrator. The best approach is to specify the path to a directory tree containing the files that must not be copied to backup snapshots. Because excluded files are not backed up, disaster recovery for them is the responsibility of the file owner.
  • Inactive Storage Data analysis is episodic in nature: some studies are in active use while others are not. Large collections of files should be compressed with gzip when they are not in active use. Such compression typically results in a 90% to 95% reduction in storage space. System administrators reserve the right to compress inactive storage with gzip to ensure overall productivity and reliability.
  • NFS Services For security and management reasons, NFS services are strictly controlled and will not be made available to computers that are not part of the SSCC cluster.
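
The Inactive Storage policy above can be followed with the gzip command itself. As an illustration, here is a minimal Python sketch of the same operation using only the standard library. The function name gzip_in_place is hypothetical and is not part of any SSCC tooling, and the space saved depends entirely on the data being compressed.

```python
import gzip
import os
import shutil


def gzip_in_place(path):
    """Compress path to path + '.gz', remove the original file,
    and return (original_size, compressed_size) in bytes.

    Hypothetical helper for illustration; it behaves roughly like
    running `gzip path` on a single file.
    """
    original_size = os.path.getsize(path)
    gz_path = path + ".gz"

    # Stream the file through gzip so large files are never held in memory.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Carry the timestamps and permissions over to the compressed copy.
    shutil.copystat(path, gz_path)
    os.remove(path)
    return original_size, os.path.getsize(gz_path)
```

Calling gzip_in_place("study.csv") would leave study.csv.gz in place of the original; the returned sizes make it easy to report how much quota was recovered.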

Interactive Compute Server

  • Off-Campus Access To ensure security, off-campus access to the interactive compute servers is restricted to Virtual Private Network (VPN) connections.
  • Inactive Sessions To keep licensing costs under control and to maximize availability for other users, inactive interactive sessions will be terminated. Inactive sessions consume limited resources and may prevent others from doing useful work. Many statistical applications are licensed for a limited number of concurrent users. Once the SSCC reaches that limit, the software will not run for additional users.
  • Temporary storage Scratch space in /tmp (and its equivalent, /scr01) is provided on each computational host. Free space in /tmp is needed for the proper functioning of the operating system. Files resident in /tmp are not backed up, and are subject to removal. Files which have not been accessed for 10 days are automatically removed on a daily basis. Furthermore, if /tmp fills up, files will be removed manually at the discretion of a system administrator.
  • Web Service for home directories is not provided.
  • Anonymous FTP Service is not provided.

Batch

  • Batch Cluster The policy for the batch cluster is implemented using features of the software that controls the batch jobs. PBSPro is licensed for this purpose. A throughput cluster runs one job per CPU core by default. Support for multiprocessor jobs and for parallel MPI use is provided. Users are limited to 20 CPU cores in execution at any one time (all other things considered). A fair-share algorithm is employed to balance use: running jobs may be suspended in order to run jobs owned by users who have not recently had their "fair share" of use.
  • Temporary storage Scratch space in /tmp (and its equivalent, /scr01) is provided on each computational host. Temporary storage is managed by PBSPro. Any files that PBSPro does not clean up are subject to the same kinds of policies as on the interactive nodes. Free space in /tmp is needed for the proper functioning of the operating system. Files resident in /tmp are not backed up, and are subject to removal. Files which have not been accessed for 10 days are automatically removed on a daily basis. Furthermore, if /tmp fills up, files will be removed manually at the discretion of the system administrator.
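
The 10-day rule for scratch space can be checked before the daily cleanup runs. The following is a minimal Python sketch that lists files whose last access time is older than a given number of days. The function name stale_files is hypothetical, and comparing st_atime against a cutoff is an assumption: the policy does not describe how the actual cleanup measures access.

```python
import os
import time


def stale_files(root, max_idle_days=10, now=None):
    """Return a sorted list of files under root whose last access
    time is more than max_idle_days in the past.

    Hypothetical helper for illustration; the real cleanup job may
    use different criteria.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day

    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symbolic links.
                if os.lstat(path).st_atime < cutoff:
                    stale.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(stale)
```

Running stale_files on your own working directory under /tmp gives a list of files to copy elsewhere, or to leave for removal.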

Guidelines

Interactive Compute Server

  • E-mail Service E-mail may be sent from each interactive compute server. Accounts are configured to forward incoming mail to your primary NU e-mail account, so that mail does not accumulate on the compute servers. The SSCC does not support POP or web-based e-mail access. E-mail spools are limited in size and should see minimal use for incoming mail. E-mail spools are not backed up, and system administrators will clear them to ensure proper system functionality.
  • Print Service The Common UNIX Printing System provides basic printing services to networked printers such as the printer in the Economics Department computing lab. Additional printers will be configured upon request if they satisfy the software requirements of the operating system.

Batch

  • E-mail Service E-mail may be sent from each node of the batch cluster. E-mail cannot be received on batch nodes.
  • Print Services are not supported on the batch cluster nodes. If you must print something, save it to a file and print it from an interactive machine or download it to your desktop computer.

Best Practices

  • Network File System (NFS) A network file system delivers files over the network, which is much slower than reading from locally attached disk. To improve performance and productivity, compress data files with gzip and then use named pipes to decompress them directly into statistical applications.
  • Restore Files You are encouraged to restore files yourself from snapshot backups. See the Quick Reference for instructions on how to restore lost files from your home directory.
  • Use of Confidential Data Reasonable efforts are made to ensure the security and privacy of data stored in the SSCC. Nevertheless, precautions should always be taken with confidential data. Such files should be stored using 256-bit AES encryption, which the openssl program provides. You should employ named pipes to decrypt files directly into the data input stream of programs. The same technique may be used to encrypt output data streams. This ensures that unencrypted data are not exposed to the network.
  • Archival Data The safety of archival data cannot be guaranteed. Important data files should always be backed up with copies on other computing systems and media. Archival copies should be reviewed on a regular basis to ensure that they are still useful. Media can deteriorate and become obsolete. File formats become obsolete as programs are enhanced or are discontinued. Files might be ruined by inadvertent misuse of transfer protocols or software applications. Detailed documentation is a must.
  • Interactive Compute Servers The interactive compute servers, seldon and hardin, are designated for interactive sessions -- sessions that entail SSH logins and direct human interaction with the software. Priority is given to active interactive use. Background jobs may be run on these machines, but those jobs compete with interactive jobs for limited resources. As a best practice, background jobs should be converted to batch jobs and submitted to the batch cluster.
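
The named-pipe technique recommended above for compressed data can be sketched in Python as follows. A writer thread decompresses the gzip file into a FIFO while a reader opens the FIFO as if it were an ordinary, uncompressed file. In real use the reader would be a statistical application pointed at the FIFO path. The function names and the FIFO file name are hypothetical, and the sketch assumes a POSIX system where os.mkfifo is available. The same pattern would apply to decryption by replacing the gzip reader with a process that writes decrypted output to the pipe.

```python
import gzip
import os
import shutil
import tempfile
import threading


def feed_fifo(gz_path, fifo_path):
    """Decompress gz_path and write the plain bytes into the named pipe."""
    # Opening a FIFO for writing blocks until a reader opens the other end.
    with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as fifo:
        shutil.copyfileobj(src, fifo)


def read_via_fifo(gz_path):
    """Return the decompressed contents of gz_path, read through a named
    pipe so that no uncompressed copy is written to disk."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "data.fifo")  # hypothetical name
    os.mkfifo(fifo_path)

    writer = threading.Thread(target=feed_fifo, args=(gz_path, fifo_path))
    writer.start()
    try:
        # Stand-in for an application that opens fifo_path as its input file.
        with open(fifo_path, "rb") as fifo:
            data = fifo.read()
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return data
```

Because the data only ever exist in compressed form on the file system, less traffic crosses the network, and the application still sees a normal input stream.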

Additional Information

Support Contact: Questions about the SSCC may be directed to sscc-info@northwestern.edu.

Last Updated: 21 May 2009

© 2009 Northwestern University