Using Git LFS (Large File Storage) for data repositories¶
This page describes how to use Git LFS for DM development.
DM uses Git LFS to manage test datasets within our normal Git workflow. Git LFS is developed by GitHub, though DM uses its own backend storage infrastructure (see SQR-001: The Git LFS Architecture for background).
All DM repositories should use Git LFS to store binary data, such as FITS files, for CI. Examples of LFS-backed repositories are lsst/afw, lsst/hsc_ci, lsst/testdata_decam and lsst/testdata_cfht.
On this page
- Installing Git LFS
- Configuring Git LFS
- Authenticating for push access
- Using Git LFS-enabled repositories
- Tracking new file types
- Creating a new Git LFS-enabled repository
Installing Git LFS¶
Download and install the git-lfs client by visiting the Git LFS homepage.
Many package managers, like Homebrew on the Mac, also provide git-lfs (brew install git-lfs, for example).
We recommend using the latest Git LFS client. The minimum usable client version for LSST is git-lfs 2.3.4.
Git LFS requires Git version 1.8.2 or later to be installed.
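As a quick way to check both minimums stated above, a small script along these lines can compare the installed versions. It is only a sketch: the helper names version_ge and check are made up for this example, and it assumes a sort that supports the -V (version sort) option.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is >= version B.
# Assumption: `sort -V` is available (GNU coreutils, recent macOS/BSD).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_GIT="1.8.2"   # minimum Git version stated above
MIN_LFS="2.3.4"   # minimum git-lfs version stated above

# "git version 2.39.2" -> "2.39.2"
git_version=$(git --version 2>/dev/null | awk '{print $3}')
# "git-lfs/3.4.0 (GitHub; ...)" -> "3.4.0"; empty if git-lfs is missing
lfs_version=$(git lfs version 2>/dev/null | sed -E 's|^git-lfs/([0-9.]+).*|\1|')

# check NAME FOUND MINIMUM: print a one-line verdict
check() {
    if [ -z "$2" ]; then
        echo "$1: not found"
    elif version_ge "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

check git "$git_version" "$MIN_GIT"
check git-lfs "$lfs_version" "$MIN_LFS"
```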
Before you can use Git LFS with LSST data, you’ll need to configure it by following the next section.
Configuring Git LFS¶
Basic configuration¶
After you’ve installed Git LFS, run:
git lfs install
This is the regular Git LFS configuration step that adds a filter "lfs" section to ~/.gitconfig.
Additional configuration for LSST is described next.
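To confirm that the basic step took effect, you can look for the filter "lfs" entries in your Git configuration. The following is a minimal sketch; lfs_filter_configured is a hypothetical helper name, and it assumes your global settings live in ~/.gitconfig rather than in an XDG location.

```shell
# lfs_filter_configured [FILE]: succeeds when FILE (default: ~/.gitconfig)
# defines both the clean and smudge commands of the "lfs" filter.
lfs_filter_configured() {
    cfg="${1:-$HOME/.gitconfig}"
    git config --file "$cfg" --get filter.lfs.clean >/dev/null 2>&1 &&
        git config --file "$cfg" --get filter.lfs.smudge >/dev/null 2>&1
}

if lfs_filter_configured; then
    echo "Git LFS filter is configured:"
    git config --file "$HOME/.gitconfig" --get-regexp '^filter\.lfs\.'
else
    echo "No lfs filter found in ~/.gitconfig; run 'git lfs install' first."
fi
```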
Configuration for LSST¶
LSST uses its own Git LFS servers. This section describes how to configure Git LFS to pull from LSST’s servers. If you are running an older client, version 1.2 or earlier, follow the note at the end of this section.
First, add these lines into your ~/.gitconfig file:
# Cache anonymous access to DM Git LFS S3 servers
[credential "https://lsst-sqre-prod-git-lfs.s3-us-west-2.amazonaws.com"]
helper = store
[credential "https://s3.lsst.codes"]
helper = store
Then add these lines into your ~/.git-credentials file (create one, if necessary):
https://:@lsst-sqre-prod-git-lfs.s3-us-west-2.amazonaws.com
https://:@s3.lsst.codes
Try cloning a small data repository to test your configuration:
git clone https://github.com/lsst/testdata_subaru
That’s it.
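If you would rather script the two file edits above than make them by hand, something like the following could work. It is a sketch, not an official LSST tool: configure_lsst_lfs is a hypothetical function name, and the example deliberately runs against scratch files so that nothing in your home directory is modified.

```shell
# configure_lsst_lfs GITCONFIG CREDENTIALS
# Idempotently applies the settings described above to the two given files.
configure_lsst_lfs() {
    gitconfig="$1"
    credentials="$2"
    for host in lsst-sqre-prod-git-lfs.s3-us-west-2.amazonaws.com s3.lsst.codes; do
        # Writes: [credential "https://<host>"] helper = store
        git config --file "$gitconfig" "credential.https://$host.helper" store
        # Append the anonymous-access entry only if it is not already present
        entry="https://:@$host"
        touch "$credentials"
        grep -qxF "$entry" "$credentials" || echo "$entry" >> "$credentials"
    done
}

# Dry run against scratch files; running it twice leaves the files unchanged.
scratch=$(mktemp -d)
configure_lsst_lfs "$scratch/gitconfig" "$scratch/git-credentials"
configure_lsst_lfs "$scratch/gitconfig" "$scratch/git-credentials"
cat "$scratch/gitconfig" "$scratch/git-credentials"
rm -rf "$scratch"
```

To apply the settings for real, you would call the function with "$HOME/.gitconfig" and "$HOME/.git-credentials" instead of the scratch paths.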
Note
Configuration for Git LFS v1.2 and earlier
The legacy Git LFS client (versions earlier than 1.3) has two configuration differences compared to the modern configuration described above.
First, add these lines into your ~/.gitconfig file:
[lfs]
batch = false
# Cache anonymous access to DM Git LFS S3 servers
[credential "https://lsst-sqre-prod-git-lfs.s3-us-west-2.amazonaws.com"]
helper = store
[credential "https://s3.lsst.codes"]
helper = store
# Cache anonymous access to DM Git LFS server
[credential "https://git-lfs.lsst.codes"]
helper = store
Then add these lines into your ~/.git-credentials file (create one, if necessary):
https://:@lsst-sqre-prod-git-lfs.s3-us-west-2.amazonaws.com
https://:@s3.lsst.codes
https://:@git-lfs.lsst.codes
Authenticating for push access¶
If you want to push to an LSST Git LFS-backed repository, you’ll need to configure and cache your credentials.
First, set up a credential helper to manage your GitHub credentials (Git LFS won’t use your SSH keys). We describe how to set up a credential helper for your system in the Git set up guide.
Then the next time you run a Git command that requires authentication, Git will ask you to authenticate with LSST’s Git LFS server:
Username for 'https://git-lfs.lsst.codes': <GitHub username>
Password for 'https://<git>@git-lfs.lsst.codes': <GitHub password or token>
At the prompts, enter your GitHub username and password.
Once your credentials are cached, you won’t need to repeat this process on your system (unless you opted for the cache-based credential helper).
Note
Working with GitHub Two Factor Authentication
If you have GitHub’s two-factor authentication enabled, use a personal access token instead of a password.
You can set up a personal token at https://github.com/settings/tokens with read:org permissions.
Using Git LFS-enabled repositories¶
Git LFS operates transparently to the user. Just use the repo as you normally would any other Git repo. All of the regular Git commands just work, whether you are working with LFS-managed files or not.
There are two caveats for working with LFS: HTTPS is always used, and Git LFS must be told to track new binary file types.
First, DM’s LFS implementation mandates the HTTPS transport protocol. Developers used to working with ssh-agent for passwordless GitHub interaction should use a Git credential helper, and follow the directions above for configuring their credentials.
Note this does not preclude using git+git or git+ssh for working with a Git remote itself; it is only the LFS traffic that always uses HTTPS.
Second, in an LFS-backed repository, you need to specify what files are stored by LFS rather than regular Git storage. You can run
git lfs track
to see what file types are being tracked by LFS in your repository. We describe how to track additional file types below.
Tracking new file types¶
Only file types that are specifically tracked are stored in Git LFS rather than the standard Git storage.
To see what file types are already being tracked in a repository:
git lfs track
To track a new file type (FITS files, for example):
git lfs track "*.fits"
Git LFS stores information about tracked types in the .gitattributes file.
This file is part of the repo and tracked by Git itself.
You can git add, commit and do any other Git operations against these Git LFS-managed files.
To see what files are being managed by Git LFS, run:
git lfs ls-files
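Because tracking is recorded as ordinary Git attributes, you can also ask Git itself whether a given path would be handled by LFS, even before the file exists. The sketch below uses git check-attr; is_lfs_tracked is a hypothetical helper name, and the .gitattributes line written in the demonstration is assumed to match what git lfs track "*.fits" produces.

```shell
# is_lfs_tracked PATH: succeeds when Git would route PATH through the lfs filter.
# Run it from inside the repository; PATH does not need to exist yet.
is_lfs_tracked() {
    # git check-attr prints "<path>: filter: <value>"
    [ "$(git check-attr filter -- "$1" | sed 's/^.*: filter: //')" = "lfs" ]
}

# Demonstration in a scratch repository (subshell, so the caller's cwd is kept).
(
    repo=$(mktemp -d)
    cd "$repo" || exit 1
    git init -q .
    # Assumption: this is the line `git lfs track "*.fits"` appends to .gitattributes.
    echo '*.fits filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

    for f in image.fits notes.txt; do
        if is_lfs_tracked "$f"; then
            echo "$f: stored via Git LFS"
        else
            echo "$f: stored in regular Git"
        fi
    done
)
```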
Creating a new Git LFS-enabled repository¶
Configuring a new Git repository to store files with DM’s Git LFS is easy. First, initialize the current directory as a repository:
git init .
Make a file called .lfsconfig within the repository, and write these lines into it:
[lfs]
url = https://git-lfs.lsst.codes
Note that older versions of Git LFS used .gitconfig rather than .lfsconfig.
As of Git LFS version 1.1, .gitconfig has been deprecated, but support will not be dropped until LFS version 2.
Next, track some file types.
For example, to have FITS and *.gz files tracked by Git LFS, run:
git lfs track "*.fits"
git lfs track "*.gz"
Add and commit the .lfsconfig and .gitattributes files to your repository.
You can then push the repo up to GitHub with:
git remote add origin <remote repository URL>
git push origin master
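The local part of this procedure (everything before the commit and push) can be collected into one function, sketched below. The name init_lfs_repo and the directory name new_testdata are hypothetical. The .lfsconfig file is written with git config --file, which should produce the same [lfs] url entry as editing the file by hand, and the tracking step is skipped with a warning if git-lfs is not installed.

```shell
# init_lfs_repo DIR [PATTERN...]
# Create DIR as a Git repository pointed at DM's Git LFS server and
# track each PATTERN with Git LFS. Committing and pushing are left to you.
init_lfs_repo() {
    dir="$1"
    shift
    mkdir -p "$dir"
    (
        cd "$dir" || exit 1
        git init -q .
        # Equivalent to writing "[lfs]" / "url = ..." into .lfsconfig by hand
        git config --file .lfsconfig lfs.url https://git-lfs.lsst.codes
        if git lfs version >/dev/null 2>&1; then
            for pattern in "$@"; do
                git lfs track "$pattern"
            done
        else
            echo "git-lfs not installed; skipping 'git lfs track'" >&2
        fi
        # Stage the configuration files (.gitattributes exists only after tracking)
        git add .lfsconfig
        if [ -f .gitattributes ]; then
            git add .gitattributes
        fi
        git status --short
    )
}

# Example run in a scratch location
init_lfs_repo "$(mktemp -d)/new_testdata" "*.fits" "*.gz"
```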
We also recommend that you include the following section in the repository’s README, linking to this documentation page, to help those who aren’t familiar with DM’s Git LFS:
Git LFS
-------
To clone and use this repository, you'll need Git Large File Storage (LFS).
Our [Developer Guide](https://developer.lsst.io/tools/git_lfs.html)
explains how to set up Git LFS for LSST development.