Learn how to transfer data within, to and from NREL's high-performance computing (HPC) systems.
A supported set of instructions for data transfer using NREL HPC systems is provided on the HPC NREL Website.
Checking Usage and Quota
To check your quota, run the following command from an Eagle login node. hours_report displays your usage and quota for each filesystem.
$ hours_report
Best Practices for Transferring Files
File Transfers Between Filesystems on the NREL network
rsync is the recommended tool for transferring data between NREL systems. It allows you to easily restart transfers if they fail, and also provides more consistency when dealing with symbolic links, hard links, and sparse files than either scp or cp. It is recommended you do not use compression for transfers within NREL systems. An example command is:
$ rsync -aP --no-g /scratch/username/dataset1/ /mss/users/username/dataset1/
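The restart behavior described above can be tried safely on throwaway directories. The following is a minimal sketch, not an NREL-supported procedure: the temporary directories stand in for /scratch and /mss, and it assumes rsync is installed (it skips the transfer otherwise). After a first pass, a dry run (-n) with itemized output (-i) counts the files rsync would still need to send.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for /scratch/username/dataset1 and /mss/users/username/dataset1
work=$(mktemp -d)
src="$work/scratch/dataset1"
dst="$work/mss/dataset1"
mkdir -p "$src" "$work/mss"
printf 'alpha\n' > "$src/a.txt"
printf 'beta\n'  > "$src/b.txt"

if command -v rsync >/dev/null 2>&1; then
    # First pass: copy everything (trailing slashes copy the directory contents)
    rsync -a --no-g "$src/" "$dst/"

    # Second pass: dry run; lines starting with ">f" are files that would be transferred
    pending=$(rsync -a --no-g -n -i "$src/" "$dst/" | grep -c '^>f' || true)
    echo "files still pending: $pending"
else
    # Assumption: rsync may be missing on a test machine
    pending=skipped
    echo "rsync not installed; skipping"
fi
```

Re-running the same rsync command after an interrupted transfer works the same way: only files that are missing or differ at the destination are sent again.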
Mass Storage has quotas that limit the number of individual files you can store. If you are copying hundreds of thousands of files then it is best to archive these files prior to copying to Mass Storage. See the guide on how to archive files.
Mass Storage quotas rely on the group of the file and not the directory path. It is best to use the --no-g option when rsyncing to MSS so that files take the destination group rather than the group of your source files. You can also chgrp your files to the appropriate group prior to rsyncing to MSS.
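Before copying to Mass Storage, it can help to check how many files a directory holds and whether they all belong to the intended group. The sketch below is illustrative only: it builds a small temporary directory, and it uses the current user's primary group ID as a stand-in for your project group, which is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dataset in a temporary directory
work=$(mktemp -d)
data="$work/dataset1"
mkdir -p "$data/sub"
printf '1\n' > "$data/a.txt"
printf '2\n' > "$data/b.txt"
printf '3\n' > "$data/sub/c.txt"

# Count regular files; a very large count suggests archiving before the copy
file_count=$(find "$data" -type f | wc -l | tr -d ' ')
echo "regular files: $file_count"

# Assumption: the primary group ID stands in for the appropriate project group
target_group=$(id -g)
chgrp -R "$target_group" "$data"

# Count anything that still belongs to a different group
mismatched=$(find "$data" ! -group "$target_group" | wc -l | tr -d ' ')
echo "entries outside group $target_group: $mismatched"
```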
Small Transfers (<100GB) outside of the NREL network
rsync, scp, and curl are your best options for small transfers (<100GB) outside of the NREL network. If your rsync/scp/curl transfers are taking hours to complete, consider using Globus instead.
If you're transferring many files then you should use rsync:
$ rsync -azP --no-g /mss/users/username/dataset1/ user@desthost:/home/username/dataset1/
If you're transferring an individual file then use scp:
$ scp /home/username/example.tar.gz user@desthost:/home/username/
You can use curl or wget to download individual files:
$ curl -O https://URL
$ wget https://URL
Large Transfers (>100GB) outside of the NREL network
Globus is optimized for file transfers between data centers and anything outside of the NREL network. It will be several times faster than any other tools you will have available. Documentation about requesting an HPC Globus account is available on the Globus Services page on the HPC website. See Transferring files using Globus for instructions on transferring files with Globus.
Transferring files using Windows
For Windows you will need to download WinSCP to transfer files to and from HPC systems over SCP. See Transferring using WinSCP.
Archiving files and directories
Learn various techniques to combine and compress multiple files or directories into a single file to reduce storage footprint or simplify sharing.
tar, along with zip, is one of the basic commands for combining multiple individual files into a single file (called a "tarball").
tar requires at least one command line option. A typical usage would be:
$ tar -cf newArchiveName.tar file1 file2 file3
# or
$ tar -cf newArchiveName.tar /path/to/folder/
The -c flag denotes creating an archive, and -f denotes that the next argument is the archive name; in this case, the name you want for the resulting archive file.
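After creating an archive, it is often worth confirming what went into it. A minimal sketch, using placeholder files in a temporary directory: the -t flag lists an archive's contents without extracting anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory with three placeholder files
work=$(mktemp -d)
cd "$work"
printf 'one\n'   > file1
printf 'two\n'   > file2
printf 'three\n' > file3

# -c creates the archive, -f names it
tar -cf newArchiveName.tar file1 file2 file3

# -t lists the members of the archive named by -f
listing=$(tar -tf newArchiveName.tar)
echo "$listing"
```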
To extract files from a tar, it's recommended to use:
$ tar -xvf existingArchiveName.tar
-x is for extracting, and -v uses verbose mode, which prints the name of each file as it is extracted from the archive.
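By default, tar extracts into the current directory. As a sketch of one common variation, the -C flag tells tar to change into a given directory first, so the files land there instead. The directory and file names below are placeholders created just for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder source files and an empty restore directory
work=$(mktemp -d)
mkdir -p "$work/src" "$work/restore"
printf 'one\n' > "$work/src/file1"
printf 'two\n' > "$work/src/file2"

# Build the archive from inside src so members are stored as file1, file2
tar -cf "$work/existingArchiveName.tar" -C "$work/src" file1 file2

# Extract verbosely into the restore directory rather than the current one
tar -xvf "$work/existingArchiveName.tar" -C "$work/restore"
```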
tar can also generate compressed tarballs, which reduce the size of the resulting archive. This can be done with the -z flag (which calls gzip on the resulting archive automatically; the result is conventionally given a .tar.gz extension) or -j (which uses bzip2, conventionally creating a .tar.bz2 file).
# gzip
$ tar -czvf newArchive.tar.gz file1 file2 file3
$ tar -xvzf newArchive.tar.gz
# bzip2
$ tar -cjvf newArchive.tar.bz2 file1 file2 file3
$ tar -xvjf newArchive.tar.bz2
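To see what compression buys you, you can compare archive sizes directly. The following sketch generates a repetitive (and therefore highly compressible) sample file, which is an assumption; real data may compress far less. The bzip2 step only runs if a bzip2 binary is found.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
cd "$work"

# Hypothetical sample data: 20000 short, repetitive text lines
awk 'BEGIN { for (i = 0; i < 20000; i++) print "sample line " i % 10 }' > file1

# Uncompressed and gzip-compressed archives of the same file
tar -cf  plain.tar      file1
tar -czf archive.tar.gz file1
tar_size=$(wc -c < plain.tar | tr -d ' ')
gz_size=$(wc -c < archive.tar.gz | tr -d ' ')
echo "tar:    $tar_size bytes"
echo "tar.gz: $gz_size bytes"

# bzip2 variant, only if the bzip2 program is available
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf archive.tar.bz2 file1
    bz2_size=$(wc -c < archive.tar.bz2 | tr -d ' ')
    echo "tar.bz2: $bz2_size bytes"
fi
```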