Storage Resources

This document describes the file systems available at the LSST Data Facility during the interim period while the Rubin filesystems at SLAC are brought into production on our own hardware and user and project data are being transferred from NCSA.

A sandbox area has been created at:

/sdf/group/rubin/sandbox/

with access control via the rubin_users Unix group. Usage is currently unregulated. Once the data transfers from NCSA are done, a more familiar file tree will be available. Note that the current filesystem is Lustre, which handles large numbers of small files poorly. We expect better when we get to WekaFS in S3DF.

Home directory space is available at /sdf/home/<first_letter_of_account>/<account>

A scratch directory is auto-created for every SDF account in /sdf/scratch/<account>

Data compression

To reduce space usage in your home directory, you can compress files that are not in active use. The gzip utility can be used for file compression and decompression. An alternative is bzip2, which usually yields a better compression ratio than gzip but takes longer to complete. Additionally, files that are typically used together can first be combined into a single archive file with the tar utility and compressed in the same step.
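One way to find candidates for compression is to list the largest files under a directory. The sketch below is only an illustration: largest_files is a hypothetical helper name, not a site-provided tool, and the size threshold is an arbitrary choice. It relies on the k size suffix of find, which GNU find supports.

```shell
# largest_files DIR MIN_KB
# Hypothetical helper: list regular files under DIR that are larger than
# MIN_KB kilobytes, with their disk usage in KB, biggest first.
largest_files() {
    find "$1" -type f -size +"$2"k -exec du -k {} + | sort -rn
}
```

For example, largest_files "$HOME" 102400 | head -20 would show up to twenty files larger than about 100 MB in your home directory.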

Examples

Compress a file largefile.dat using gzip:

gzip largefile.dat

The original file is replaced by a compressed file named largefile.dat.gz.

To decompress the file:

gunzip largefile.dat.gz

Alternatively:

gzip -d largefile.dat.gz
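To see how gzip and bzip2 compare on your own data, both can be run with the -c option, which writes the compressed output to standard output and leaves the original file in place. The following is a minimal sketch that assumes bzip2 is installed; the demo file and its contents are made up for illustration, and the actual ratios depend entirely on the data.

```shell
# Hypothetical demo file; substitute a real file such as largefile.dat.
workdir=$(mktemp -d)
demo="$workdir/demo.dat"

# Build roughly 1 MB of repetitive text so both tools have something to compress.
i=0
while [ "$i" -lt 20000 ]; do
    echo "row $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$demo"

# -c writes to standard output, so the original file is not replaced.
gzip  -c "$demo" > "$demo.gz"
bzip2 -c "$demo" > "$demo.bz2"

# Compare the sizes in bytes.
orig_size=$(wc -c < "$demo" | tr -d ' ')
gz_size=$(wc -c < "$demo.gz" | tr -d ' ')
bz2_size=$(wc -c < "$demo.bz2" | tr -d ' ')
echo "original: $orig_size  gzip: $gz_size  bzip2: $bz2_size"
```

A file compressed with bzip2 can be restored with bunzip2 or bzip2 -d, in the same way as gunzip and gzip -d above.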

To combine the contents of a subdirectory named largedir and compress it:

tar -zcvf largedir.tgz largedir

By convention, the compressed tar file is given the extension .tgz (or, equivalently, .tar.gz).

Note

If the files to be combined are in your home directory and you are close to the quota, you can create the tar file in the scratch directory (since the tar command may fail prior to completion if you go over quota):

tar -zcvf /sdf/scratch/<account>/largedir.tgz largedir

To extract the contents of the compressed tar file:

tar -zxvf largedir.tgz
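Before extracting, it can be useful to list what an archive contains, or to extract it somewhere other than the current directory. The sketch below builds a small throwaway directory (the file names are invented for illustration) and then uses the t option to list members and the -C option to choose the directory tar works in.

```shell
# Build a small hypothetical directory so the commands below can be run as-is.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/largedir/sub"
echo "alpha" > "$demo_root/largedir/a.txt"
echo "beta"  > "$demo_root/largedir/sub/b.txt"

# -C makes tar change into the given directory before reading or writing files.
tar -zcf "$demo_root/largedir.tgz" -C "$demo_root" largedir

# List the archive members without extracting anything (t = list).
listing=$(tar -ztf "$demo_root/largedir.tgz")
echo "$listing"

# Extract into a separate directory instead of the current one.
mkdir -p "$demo_root/restore"
tar -zxf "$demo_root/largedir.tgz" -C "$demo_root/restore"
```

Leaving out the v option, as done here, keeps tar from printing every file name while it creates or extracts the archive, which is quieter for directories with many files.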

Note

ASCII text and binary files like executables can yield good compression ratios. Image file formats (gif, jpg, png, etc.) are already natively compressed, so further compression will not yield much gain.

Depending on the size of the files, the compression utilities can be compute intensive and take a while to complete. Use the compute nodes via a batch job for compressing large files.

With gzip, the file is replaced by one with the extension .gz. When using tar, the individual files remain; these can be deleted to conserve space once the compressed tar file has been created successfully.

Use of tar and compression could also make data transfers between the Campus Cluster and other resources more efficient.
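Since the original files should only be deleted after the compressed tar file has been created successfully, it may help to check the archive first. The following is one possible sketch, not a site-provided tool: archive_and_remove is a hypothetical helper that removes the source directory only if tar exits without error, gzip -t reports the compressed data as intact, and the archive can be listed from start to finish.

```shell
# archive_and_remove DIR DEST
# Hypothetical helper: archive DIR into DEST (a .tgz file), verify the
# result, and only then delete DIR.
archive_and_remove() {
    src=$1
    dest=$2
    # Create the compressed archive; stop if tar reports an error.
    tar -zcf "$dest" "$src" || return 1
    # gzip -t tests the integrity of the compressed data.
    gzip -t "$dest" || return 1
    # Listing reads through the whole archive; a failure means it is unreadable.
    tar -ztf "$dest" > /dev/null || return 1
    # All checks passed, so the originals can be removed.
    rm -r -- "$src"
}
```

For example, archive_and_remove largedir /sdf/scratch/<account>/largedir.tgz would leave largedir untouched if any of the checks fail.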