Overview of data storage on TeraGrid resources
TeraGrid users can store their data in their home directories on individual compute resources, in temporary scratch space or parallel file systems, and in archival mass storage.
Choose where to store your data depending on your needs (e.g., speed, visibility, quotas, backup and purge policies):
- Home directories: These have relatively small storage quotas, but the storage is permanent and backed up regularly. Home directories are visible to all nodes in the cluster, including the login nodes. Best practice is to move large data sets to mass storage as soon as possible to conserve space on individual compute resources.
- Scratch space: For temporary storage of very large data sets, use scratch space. Scratch has more space than home directories, but data is purged regularly and not backed up. The amount of storage space available at any time depends on the level of concurrent use by others. Scratch space is visible to all nodes in a cluster, including the login nodes.
- Parallel file systems: These provide fast access to large sets of data, but data is purged regularly and not backed up. The amount of storage space available at any time depends on the level of concurrent use by others. This space is visible to all nodes in a cluster, including the login nodes.
- Archival (mass) storage: For long-term storage of large data sets, use archival storage. Access times are normally slower than for other storage options, but a GridFTP front end can increase transfer speeds. This space is accessible from all sites, but backups are the responsibility of the user.
Long-term, archival storage on the TeraGrid is available on:
- High Performance Storage System (HPSS) at San Diego Supercomputer Center (SDSC); uses HSI, an FTP-like interface, to access data. See SDSC's HPSS User Guide.
- HPSS at Indiana University; uses GridFTP. See IU's Massive Data Storage System Service page.
- Golem at Pittsburgh Supercomputing Center (PSC); files migrated to Golem initially reside on disk, and file size and time of last access then determine when they are moved to tape. See PSC's Golem page.
- DiskXtender Mass Storage System (MSS) at the National Center for Supercomputing Applications (NCSA); see NCSA's DiskXtender User Guide (in PDF format).
- Ranch at Texas Advanced Computing Center (TACC); see the Ranch user guide.
- HPSS for Frost (NCAR) users; features a maximum file size of 1 TB, initial per-user quota of 5 TB, the ability to choose one or two copies for a file at creation time, and a POSIX-compliant interface. See HPSS on Frost in the NCAR Frost user documentation. To request an account, email help@teragrid.org.
Note: To determine the amount of available space
on a scratch or parallel file system, use the df
command. To see the data storage policies for a specific site, use
tg-policy -data.
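Because the free space on scratch and parallel file systems changes with concurrent use, it can help to check it from a script before writing a large data set. The following is a minimal sketch in Python, not a TeraGrid tool: it uses the standard library's shutil.disk_usage to report roughly the same figure that df shows. The helper names free_space_gib and has_room, and the example path, are hypothetical and chosen only for illustration; substitute the scratch or parallel file system path for your site.

```python
import shutil


def free_space_gib(path):
    """Return the free space, in GiB, on the file system containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 2**30


def has_room(path, needed_bytes):
    """Return True if the file system containing `path` has at least
    `needed_bytes` free at the moment of the call."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    # Hypothetical placeholder; replace with your site's scratch directory.
    scratch_dir = "."
    print(f"Free space under {scratch_dir}: {free_space_gib(scratch_dir):.1f} GiB")
    # Example: check for 50 GiB of room before staging a data set.
    print("Room for 50 GiB:", has_room(scratch_dir, 50 * 2**30))
```

The result is only a snapshot: other users can consume the space between the check and the write, and a per-user quota may be lower than the free space reported for the file system as a whole.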
For specific data storage information for TeraGrid resources, see the Data Storage File Systems & Policies table on the Data Storage page in the TeraGrid User Support documentation.
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on October 07, 2009.







