Transferring Files
Learn how to transfer data within, to and from NREL's high-performance computing (HPC) systems.
For further information about individual systems' filesystem architecture and quotas, please see the Systems section.
Best Practices for Transferring Files
File Transfers Between Filesystems on the NREL network
rsync is the recommended tool for transferring data between NREL systems. It lets you easily restart transfers that fail, and it handles symbolic links, hard links, and sparse files more consistently than either scp or cp. We recommend that you do not use compression for transfers within NREL systems. An example command is:
$ rsync -aP --no-g /scratch/username/dataset1/ /mss/users/username/dataset1/
Mass Storage has quotas that limit the number of individual files you can store. If you are copying hundreds of thousands of files then it is best to archive these files prior to copying to Mass Storage. See the guide on how to archive files.
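As a minimal sketch of archiving before copying, the script below builds a throwaway directory of small files, bundles it into a single tarball, and checks how many files the archive holds. The directory and file names are hypothetical stand-ins created under a temporary directory. On a real system the source would be your own dataset, and the resulting tarball is what you would copy to Mass Storage.

```shell
#!/bin/sh
# Hypothetical stand-in for a dataset made of many small files.
workdir=$(mktemp -d)
mkdir "$workdir/dataset1"
i=1
while [ "$i" -le 50 ]; do
    echo "sample $i" > "$workdir/dataset1/file_$i.txt"
    i=$((i + 1))
done

# Bundle the whole directory into one archive file.
# -C changes into $workdir first so the archive stores relative paths.
tar -cf "$workdir/dataset1.tar" -C "$workdir" dataset1

# Count the files stored in the archive (directory entries end in "/").
archived_files=$(tar -tf "$workdir/dataset1.tar" | grep -vc '/$')
echo "Files in archive: $archived_files"

# Clean up the temporary working directory.
rm -rf "$workdir"
```

Listing the archive with -t before copying is a cheap way to confirm that nothing was left out.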
Mass Storage quotas rely on the group of the file and not the directory path. It is best to use the --no-g option when rsyncing to MSS so you use the destination group rather than the group permissions of your source. You can also chgrp your files to the appropriate group prior to rsyncing to MSS.
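The chgrp approach can be sketched as follows. This is an illustration under assumptions: the source directory is a temporary stand-in, and the target group is simply your current primary group so that the script runs anywhere. On a real system you would substitute the group that your Mass Storage allocation is charged to.

```shell
#!/bin/sh
# Hypothetical stand-in for a source directory on scratch.
src=$(mktemp -d)
echo "data" > "$src/results.txt"

# Assumption: use the current primary group so the example is runnable.
# Replace this with the appropriate project group on a real system.
target_group=$(id -gn)

# Recursively set the group on everything under the source directory.
chgrp -R "$target_group" "$src"

# Confirm the change; ls -ld prints the group in the fourth column.
actual_group=$(ls -ld "$src/results.txt" | awk '{print $4}')
echo "results.txt group: $actual_group"

# Clean up the temporary directory.
rm -rf "$src"
```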
Small Transfers (<100GB) outside of the NREL network
rsync, scp, and curl will be your best options for small transfers (<100GB) outside of the NREL network. If your rsync/scp/curl transfers are taking hours to complete, you should consider using Globus.
If you're transferring many files then you should use rsync:
$ rsync -azP --no-g /mss/users/username/dataset1/ user@desthost:/home/username/dataset1/
If you're transferring an individual file then use scp:
$ scp /home/username/example.tar.gz user@desthost:/home/username/
You can use curl or wget to download individual files:
$ curl -O https://URL
$ wget https://URL
Large Transfers (>100GB) outside of the NREL network
Globus is optimized for file transfers between data centers and anything outside of the NREL network. It is typically several times faster than the other tools available to you. Documentation about requesting an HPC Globus account is available on the Globus Services page on the HPC website. See Transferring files using Globus for instructions on transferring files with Globus.
Transferring files using Windows
For Windows you will need to download WinSCP to transfer files to and from HPC systems over SCP. See Transferring using WinSCP.
Archiving files and directories
Learn various techniques to combine and compress multiple files or directories into a single file to reduce storage footprint or simplify sharing.
tar
tar, along with zip, is one of the basic commands to combine multiple individual files into a single file (called a "tarball"). tar requires at least one command line option. A typical usage would be:
$ tar -cf newArchiveName.tar file1 file2 file3
# or
$ tar -cf newArchiveName.tar /path/to/folder/
The -c flag denotes creating an archive, and -f denotes that the next argument given will be the archive name. In this case, that is the name you would prefer for the resulting archive file.
To extract files from a tar, it's recommended to use:
$ tar -xvf existingArchiveName.tar
-x is for extracting, and -v uses verbose mode, which prints the name of each file as it is extracted from the archive.
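Two other tar options are often useful alongside extraction: -t lists an archive's contents without extracting it, and -C extracts into a directory of your choice. The sketch below shows both. All file and directory names are hypothetical and are created in a temporary directory.

```shell
#!/bin/sh
# Hypothetical files and archive, created in a temporary directory.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/restore"
echo "alpha" > "$workdir/src/file1"
echo "beta" > "$workdir/src/file2"
tar -cf "$workdir/existingArchiveName.tar" -C "$workdir/src" file1 file2

# -t lists the archive members without extracting anything.
listing=$(tar -tf "$workdir/existingArchiveName.tar")
echo "$listing"

# -C extracts into the named directory, which must already exist.
tar -xf "$workdir/existingArchiveName.tar" -C "$workdir/restore"
restored=$(cat "$workdir/restore/file2")
echo "file2 contains: $restored"

# Clean up the temporary working directory.
rm -rf "$workdir"
```

Listing first lets you see whether an archive unpacks into its own folder or directly into the current directory.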
Compressing
tar can also generate compressed tarballs, which reduce the size of the resulting archive. This can be done with the -z flag (which just calls gzip on the resulting archive automatically, resulting in a .tar.gz extension) or -j (which uses bzip2, creating a .tar.bz2).
For example:
# gzip
$ tar -czvf newArchive.tar.gz file1 file2 file3
$ tar -xvzf newArchive.tar.gz
# bzip2
$ tar -cjvf newArchive.tar.bz2 file1 file2 file3
$ tar -xvjf newArchive.tar.bz2
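To see what compression gains, you can compare the size of a plain tarball with a gzip-compressed one. The sketch below does this for a single hypothetical file of highly repetitive text, which compresses unusually well. Real datasets, especially binary or already-compressed data, may shrink far less.

```shell
#!/bin/sh
# Hypothetical input: repetitive text, created in a temporary directory.
workdir=$(mktemp -d)
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same line of text repeated many times"
    i=$((i + 1))
done > "$workdir/file1"

# Create one uncompressed and one gzip-compressed archive of the same file.
tar -cf "$workdir/plain.tar" -C "$workdir" file1
tar -czf "$workdir/compressed.tar.gz" -C "$workdir" file1

# wc -c reports the size in bytes; tr strips any padding spaces.
plain_size=$(wc -c < "$workdir/plain.tar" | tr -d ' ')
gz_size=$(wc -c < "$workdir/compressed.tar.gz" | tr -d ' ')
echo "plain: $plain_size bytes, gzip: $gz_size bytes"

# Clean up the temporary working directory.
rm -rf "$workdir"
```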