Skip to content

Run git operations in parallel to speed things up:#9100

Merged
hsbt merged 3 commits into
ruby:masterfrom
Shopify:ec-git-parallel
Nov 20, 2025
Merged

Run git operations in parallel to speed things up:#9100
hsbt merged 3 commits into
ruby:masterfrom
Shopify:ec-git-parallel

Conversation

@Edouard-chin
Copy link
Copy Markdown
Member

What was the end-user or developer problem that led to this PR?

Downloading git gems is very slow. I'd like to speed it up.

🔬 Comparison before/after
# frozen_string_literal: true

source "https://rubygems.org"

gem 'annex_29', github: "Shopify/annex-29"
gem 'ast', github: "whitequark/ast"
gem 'awesome_print', github: 'awesome-print/awesome_print'
gem 'concurrent-ruby', github: 'ruby-concurrency/concurrent-ruby'
gem 'csv', github: 'ruby/csv'
gem 'declarative', github: 'apotonick/declarative'
gem 'diffy', github: 'samg/diffy'
gem 'hashie', github: 'hashie/hashie'
gem 'hsluv', github: 'hsluv/hsluv-ruby'
gem 'holidays', github: 'holidays/holidays'

Before

bundle install 1.88s user 2.66s system 30% cpu 14.798 total

After

bundle install 1.84s user 3.74s system 84% cpu 6.616 total

What is your fix for the problem, implemented in this PR?

When you have a Gemfile that contains git gems, each repository will be fetched one by one. This is extremelly slow. A simple Gemfile with 5 git gems (small repositories) can take up to 10 seconds just to fetch the repos.

We can speed this up by running multiple git process and fetching repositories simultaneously.


The repositories are fetched in Bundler when Source::Git#specs is called. The problem is that source.specs is called in various places depending on the Gemfile used.

I think the issue is that calling source.specs feels like that as a "side effect" we are going to clone repositories. I believe that fetching repositories should be an explicit call.

For instance:

source "https://rubygems.org"

gem "foo", github: "foo/foo"

# The repository foo will be fetched as a side effect to the call to `source.spec_names`
# https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source_map.rb#L21
source "https://rubygems.org"

gem "bar", source: "https://example.org"
gem "foo", github: "foo/foo"

# The repository foo will be fetched on a different codepath
# https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source/rubygems_aggregate.rb#L35

# That is because the gem "bar" has a source that doesn't have the `/dependencies` API
# endpoint and therefore Bundler enters a different branch condition.

I opted to add a self explanatory call to fetch the git source repositories before we start the resolution, and just before any other calls to source.specs is performed.

Make sure the following tasks are checked

Comment thread bundler/spec/realworld/git_spec.rb Outdated
G
end

it "fetch git repositories in parallel" do
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this test is a bit unorthodox, I had a hard time finding a way to highlight that the repositories are fetched simultaneously. My approach is to just check if the logs "Fetching ..." would be output in a chunk.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fetching in parallel seems like an implementation detail. I don't know that we need a test to specifically enforce that.

@Edouard-chin
Copy link
Copy Markdown
Member Author

Edouard-chin commented Nov 19, 2025

The failures are logging related due to Bundler.ui.silence {} which is not threadsafe. I'll figure out.

Edit: I pushed a fix to make the logger thread safe. Let me know if you'd prefer to have a PR for it.

@Edouard-chin Edouard-chin force-pushed the ec-git-parallel branch 3 times, most recently from 49069fb to 1300766 Compare November 20, 2025 00:12

def level=(level)
raise ArgumentError unless LEVELS.include?(level.to_s)
@level = level.to_s
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change this too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did, but I don't think we want to change it.
The reason being that we want children threads to have the same logger level as the main thread.

Bundler.ui.level = "debug"

Thread.new do
  puts Bundler.ui.level # "Info" (At the time the Bundler.ui object was instantiated)
end

@Edouard-chin
Copy link
Copy Markdown
Member Author

The failing test is due to a Ruby issue on 3.2 that was fixed in 3.2.2 https://bugs.ruby-lang.org/issues/19415
It incorrectly warns about a circular dependency.

I'm not sure what would be the best solution because the problem may arise for any other codepath that require something lazily (and Bundler has a lot of it), this would make our CI flaky.

@hsbt
Copy link
Copy Markdown
Member

hsbt commented Nov 20, 2025

Can we enable this parallel feature only >= Ruby 3.3? I'm okay to ignore this with Ruby 3.2 because that version will be EOL status at next year.

- ### Problem

  When you have a Gemfile that contains git gems, each repository will
  be fetched one by one. This is extremelly slow.
  A simple Gemfile with 5 git gems (small repositories) can take up
  to 10 seconds just to fetch the repos.

  We can speed this up by running multiple git process and fetching
  repositories silmutaneously.

  ### Solution

  The repositories are fetched in Bundler when `Source::Git#specs` is
  called.
  The problem is that `source.specs` is called in various places
  depending on Gemfile.

  I think the issue is that calling `source.specs` feels like that
  as a "side effect" we are going to clone repositories. I believe
  that fetching repositories should be an explicit call.

  For instance:

  ```ruby
  source "https://rubygems.org"

  gem "foo", github: "foo/foo"

  # The repository foo will be fetched as a side effect to the call to `source.spec_names`
  # https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source_map.rb#L21
  ```

  ```ruby
  source "https://rubygems.org"

  gem "bar", source: "https://example.org"
  gem "foo", github: "foo/foo"

  # The repository foo will be fetched on a different codepath
  # https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source/rubygems_aggregate.rb#L35

  # That is because the gem "bar" has a source that doesn't have the `/dependencies` API
  # endpoint and therefore Bundler enters a different branch condition.
  ```

  I opted to add a self explanatory call to fetch the git source
  repositories just before we start the resolution, and *just*
  before any other calls to `source.specs` is performed.
- The Logger is not thread safe when calling `with_level`.
  This now becomes problematic because we are using multiple
  threads during the resolution phase in order to fetch git gems.
- With the logger change that is now threadsafe, such code no longer
  behaves the same:

  ```ruby
  Bundler.ui.silence do
    Bundler.ui.level = 'info'

    Bundler.ui.info("foo")
    # This used to output something. Now it doesn't.
  end
  ```

  IMHO this is the right behaviour since we are in a silence block,
  changing the level should have no effect. And fortunately it seems
  that we only need to change this spec.

  The call to `Bundler.ui.silence` is done in a `around` block
  https://github.com/ruby/rubygems/blob/4a13684f07ebb1dea5501e3f826fab414f96bf47/bundler/spec/spec_helper.rb#L119
@Edouard-chin
Copy link
Copy Markdown
Member Author

Thank you for the suggestion! I wasn't sure whether this would be acceptable but yes that makes a lot of sense 👍

@hsbt hsbt merged commit e716adb into ruby:master Nov 20, 2025
76 checks passed
@Edouard-chin Edouard-chin deleted the ec-git-parallel branch November 20, 2025 23:21
@Edouard-chin
Copy link
Copy Markdown
Member Author

Thanks for your help everyone !

@hsbt hsbt mentioned this pull request Nov 21, 2025
4 tasks
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Feb 10, 2026
- In ruby#9100, I opened a patch to run git operations in parallel in
  order to download git gems more quickly.
  The parallelization doesn't works when resolving is not needed due
  to a premature call to `source.specs` hit by this codepath.

  You can reproduce the problem and see that the gems aren't
  downloaded in parallel if you have an existing Gemfile and
  Gemfile.lock. (In my first patch, I forgot to try when a
  Gemfile.lock already existed).

  I'd like to introduce a new call to `prelad_git_sources`, when
  bundler hits the "missing specs" branch.
  I unforunately couldn't find a single place where we could preload
  git sources due to the `source.specs` that is called at different
  path.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Feb 10, 2026
- In ruby#9100, I opened a patch to run git operations in parallel in
  order to download git gems more quickly.
  The parallelization doesn't works when resolving is not needed due
  to a premature call to `source.specs` hit by this codepath.

  You can reproduce the problem and see that the gems aren't
  downloaded in parallel if you have an existing Gemfile and
  Gemfile.lock. (In my first patch, I forgot to try when a
  Gemfile.lock already existed).

  I'd like to introduce a new call to `prelad_git_sources`, when
  bundler hits the "missing specs" branch.
  I unforunately couldn't find a single place where we could preload
  git sources due to the `source.specs` that is called at different
  path.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Feb 10, 2026
- In ruby#9100, I opened a patch to run git operations in parallel in
  order to download git gems more quickly.
  The parallelization doesn't works when resolving is not needed due
  to a premature call to `source.specs` hit by this codepath.

  You can reproduce the problem and see that the gems aren't
  downloaded in parallel if you have an existing Gemfile and
  Gemfile.lock. (In my first patch, I forgot to try when a
  Gemfile.lock already existed).

  I'd like to introduce a new call to `prelad_git_sources`, when
  bundler hits the "missing specs" branch.
  I unforunately couldn't find a single place where we could preload
  git sources due to the `source.specs` that is called at different
  path.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Feb 13, 2026
- In ruby#9100, I opened a patch to run git operations in parallel in
  order to download git gems more quickly.
  The parallelization doesn't works when resolving is not needed due
  to a premature call to `source.specs` hit by this codepath.

  You can reproduce the problem and see that the gems aren't
  downloaded in parallel if you have an existing Gemfile and
  Gemfile.lock. (In my first patch, I forgot to try when a
  Gemfile.lock already existed).

  I'd like to introduce a new call to `prelad_git_sources`, when
  bundler hits the "missing specs" branch.
  I unforunately couldn't find a single place where we could preload
  git sources due to the `source.specs` that is called at different
  path.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 10, 2026
- There are failing tests on Ruby 3.2 due to an expecation that
  checks wheter stderr is empty. In this case, stderr outputs a
  warning from Ruby "warning: loading in progress, circular require
  considered harmful", but this is not true.

  This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment)
  The problem is that Ruby 3.2 wrongly output this warning when
  multiple threads try to require a file at the same time.
  See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby
  3.2.2.

  I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this
  warning output.
Edouard-chin added a commit to Shopify/rubygems that referenced this pull request Mar 12, 2026
- There are failing tests on Ruby 3.2 due to an expecation that
  checks wheter stderr is empty. In this case, stderr outputs a
  warning from Ruby "warning: loading in progress, circular require
  considered harmful", but this is not true.

  This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment)
  The problem is that Ruby 3.2 wrongly output this warning when
  multiple threads try to require a file at the same time.
  See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby
  3.2.2.

  I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this
  warning output.
matzbot pushed a commit to ruby/ruby that referenced this pull request Mar 12, 2026
- There are failing tests on Ruby 3.2 due to an expecation that
  checks wheter stderr is empty. In this case, stderr outputs a
  warning from Ruby "warning: loading in progress, circular require
  considered harmful", but this is not true.

  This is a bug in Ruby 3.2 that I also encountered in ruby/rubygems#9100 (comment)
  The problem is that Ruby 3.2 wrongly output this warning when
  multiple threads try to require a file at the same time.
  See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby
  3.2.2.

  I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this
  warning output.

ruby/rubygems@713dd5c0e1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants