Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(git-lfs): use Git LFS for /cache and /generated folders #364

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

oscard0m
Copy link
Member

@oscard0m oscard0m commented Jun 9, 2023

Resolves #ISSUE_NUMBER


Behavior

Before the change?

Before this change, when someone was cloning this repository or fetching changes, all the files under folders /cache/ and /generated/ were downloaded too.

directory Size
cache 352Mb
generated 511Mb

This impacts at CI level when actions/checkout runs: ~23s everytime it runs

I run a benchmark by removing the last commit from main branch in my local machine and pulling:
image

I'm not sure if this is a fair test, but the timings are significantly slower.

After the change?

After this change, when someone clones/fetches repository changes, all the files under folder /cache/ and /generated/ take less time to clone/fetch.

This impacts at CI level when actions/checkout runs: ~7s everytime it runs

I plan to collect some numbers of a clone/fetch with and without these changes:
image

I'm not sure if this is a fair test, but the timings are significantly slower.

Other information

  • This change implies contributors having git lfs installed. This is especially important because, nowadays, GitHub Web is not capable of rendering diffs of these huge files, so checking them on our local machines is still essential.
  • We need to keep track of GitHub's Storage quota. I assume this won't be a problem considering this is managed by GitHub Inc.
  • ⚠️ To get additional benefits, I think we need to run git lfs migrate to Git History, but this will need coordination from maintainers. Would you like to experiment with that? Right now the .git folder is ~4GB, which is causing this performance issue when running git operations locally:

image

git lfs migrate import --include="generated/*.json" --include-ref=refs/heads/main
git lfs migrate import --include="cache/*.json" --include-ref=refs/heads/main

*I will experiment with a fork later to see the real benefits.

Additional info

Pull request checklist

  • Tests for the changes have been added (for bug fixes / features) Do not apply
  • Docs have been reviewed and added / updated if needed (for bug fixes / features)
  • Added the appropriate label for the given change

Does this introduce a breaking change?

  • Yes (Please add the Type: Breaking change label)
  • No

I'm assuming this would not be a BREAKING CHANGE, only for contributors without git lfs installed but that's not a breaking change, right?

Pull request type

Type: Maintenance


@oscard0m oscard0m added the Type: Maintenance Any dependency, housekeeping, and clean up Issue or PR label Jun 9, 2023
@wolfy1339
Copy link
Member

What are the impacts on using LFS?
Is it faster? Easier to review?

@oscard0m oscard0m force-pushed the use-git-lfs branch 3 times, most recently from ce8a406 to fb3c8d0 Compare June 9, 2023 20:27
@oscard0m oscard0m force-pushed the use-git-lfs branch 2 times, most recently from 640dc5b to a36671d Compare June 9, 2023 20:31
CONTRIBUTING.md Outdated Show resolved Hide resolved
@oscard0m
Copy link
Member Author

oscard0m commented Jun 9, 2023

What are the impacts on using LFS? Is it faster?

It should improve the git command times (git clone, git pull...). Consequently, the CI gets a few seconds faster because of this.

Easier to review?

Once configured, it should be the same, but if you don't configure a contributor won't be able to review it locally. I would like the maintainer to test in their local before merging this, so we make sure we are all happy with the changes (the diff is still the same, at least).

@oscard0m oscard0m marked this pull request as ready for review June 9, 2023 20:57
CONTRIBUTING.md Outdated Show resolved Hide resolved
Copy link
Member

@wolfy1339 wolfy1339 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to wait to hear from the full-time maintainers. They will have more insights into Git LFS plans

@oscard0m oscard0m marked this pull request as draft June 11, 2023 08:53
@oscard0m
Copy link
Member Author

Converted back to draft as I still want to run some experimentation

@nickfloyd
Copy link
Contributor

Hey @oscard0m,

I love that you're thinking of and walking through process optimizations like this - especially with this particular repo. While I don't particularly like the idea of adding another tool dependency to be able to work with Octokit, sometimes, as you've pointed out - it's beneficial.

You and @wolfy1339 bring up good points about the Octokit org needing to support LFS and having a more expanded storage quota - those are things @kfcampbell, and I can chase down if/when needed.

You made me wonder if there are other options we should entertain to approach this problem - such as:

  • pruning history
  • cleaning up unsupported versions
  • Shallow cloning

Notably, this problem is only going to get worse as we lean into versioning the dotcom schemas - so I am happy you're on top of this.

My gut tells me this might be a good move for us. The hosted file aspect of LFS is not much of a concern here for owners and contributors - unless working disconnected. I would like to investigate the options above as well.

Just to oversimplify the problem space here:

Pros:

  • Helps with our large file problem
  • Reduces dev flow performance
  • Should improve CI
  • Should improve codespaces interactions
  • This seems to be an ideal use case for LFS
  • Others?

Potential cons:

  • Adds a snowflake effect to this repo, making interactions with it unique (though I don't see this as too much of an issue)
  • Adds a toolchain dependency
  • Adds a precondition as to how the basics of LFS work
  • Adds a potential need for some account changes for the Octokit org (shouldn't be a problem)
  • Working "offline" gets wonky due to the hosted nature of LFS
  • Others?

@kfcampbell and I briefly chatted about this on Friday. We've got a strange week with the US holidays and other days out - but we plan to round out the conversation on Wednesday.

Converted back to draft as I still want to run some experimentation

Based on that comment, please let us know if you need any help or if you run into anything interesting - we want to make sure we are supporting you in any way possible!

Thank you both again for making what we have here so much better! ❤️

@oscard0m
Copy link
Member Author

oscard0m commented Jul 7, 2023

Thanks for the kind words and detailed follow-up on this @nickfloyd @kfcampbell. I have this experimentation on-hold due to some personal issues, but I would love to get deeper and arrive at some conclusions.

I agree adding an extra tool would be an extra barrier, so I will check other options and possibilities. Thanks for doing the exercise of oversimplifying the problem space, very useful. I'll get back to this ASAP.

@wolfy1339
Copy link
Member

This is getting especially relevant now,

remote: warning: See https://gh.io/lfs for more information.
remote: warning: File generated/api.github.com.deref.json is 50.01 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File generated/ghec.deref.json is 52.62 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: File cache/ghec.deref.json is 52.44 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB
remote: warning: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Maintenance Any dependency, housekeeping, and clean up Issue or PR
Projects
Status: 🏗 In progress
Development

Successfully merging this pull request may close these issues.

3 participants