Run git operations in parallel to speed things up:#9100
Conversation
| G | ||
| end | ||
|
|
||
| it "fetch git repositories in parallel" do |
There was a problem hiding this comment.
Sorry if this test is a bit unorthodox, I had a hard time finding a way to highlight that the repositories are fetched simultaneously. My approach is to just check if the logs "Fetching ..." would be output in a chunk.
There was a problem hiding this comment.
Fetching in parallel seems like an implementation detail. I don't know that we need a test to specifically enforce that.
|
The failures are logging related due to Edit: I pushed a fix to make the logger thread safe. Let me know if you'd prefer to have a PR for it. |
49069fb to
1300766
Compare
|
|
||
| def level=(level) | ||
| raise ArgumentError unless LEVELS.include?(level.to_s) | ||
| @level = level.to_s |
There was a problem hiding this comment.
I originally did, but I don't think we want to change it.
The reason being that we want children threads to have the same logger level as the main thread.
Bundler.ui.level = "debug"
Thread.new do
puts Bundler.ui.level # "Info" (At the time the Bundler.ui object was instantiated)
end|
The failing test is due to a Ruby issue on 3.2 that was fixed in 3.2.2 https://bugs.ruby-lang.org/issues/19415 I'm not sure what would be the best solution because the problem may arise for any other codepath that |
|
Can we enable this parallel feature only >= Ruby 3.3? I'm okay to ignore this with Ruby 3.2 because that version will be EOL status at next year. |
67d96de to
4222c1d
Compare
- ### Problem When you have a Gemfile that contains git gems, each repository will be fetched one by one. This is extremelly slow. A simple Gemfile with 5 git gems (small repositories) can take up to 10 seconds just to fetch the repos. We can speed this up by running multiple git process and fetching repositories silmutaneously. ### Solution The repositories are fetched in Bundler when `Source::Git#specs` is called. The problem is that `source.specs` is called in various places depending on Gemfile. I think the issue is that calling `source.specs` feels like that as a "side effect" we are going to clone repositories. I believe that fetching repositories should be an explicit call. For instance: ```ruby source "https://rubygems.org" gem "foo", github: "foo/foo" # The repository foo will be fetched as a side effect to the call to `source.spec_names` # https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source_map.rb#L21 ``` ```ruby source "https://rubygems.org" gem "bar", source: "https://example.org" gem "foo", github: "foo/foo" # The repository foo will be fetched on a different codepath # https://github.com/ruby/rubygems/blob/6cc7d71dac3d0275c9727cf200c7acfbf6c78d37/bundler/lib/bundler/source/rubygems_aggregate.rb#L35 # That is because the gem "bar" has a source that doesn't have the `/dependencies` API # endpoint and therefore Bundler enters a different branch condition. ``` I opted to add a self explanatory call to fetch the git source repositories just before we start the resolution, and *just* before any other calls to `source.specs` is performed.
- The Logger is not thread safe when calling `with_level`. This now becomes problematic because we are using multiple threads during the resolution phase in order to fetch git gems.
- With the logger change that is now threadsafe, such code no longer
behaves the same:
```ruby
Bundler.ui.silence do
Bundler.ui.level = 'info'
Bundler.ui.info("foo")
# This used to output something. Now it doesn't.
end
```
IMHO this is the right behaviour since we are in a silence block,
changing the level should have no effect. And fortunately it seems
that we only need to change this spec.
The call to `Bundler.ui.silence` is done in a `around` block
https://github.com/ruby/rubygems/blob/4a13684f07ebb1dea5501e3f826fab414f96bf47/bundler/spec/spec_helper.rb#L119
4222c1d to
08a5cd5
Compare
|
Thank you for the suggestion! I wasn't sure whether this would be acceptable but yes that makes a lot of sense 👍 |
|
Thanks for your help everyone ! |
- In ruby#9100, I opened a patch to run git operations in parallel in order to download git gems more quickly. The parallelization doesn't works when resolving is not needed due to a premature call to `source.specs` hit by this codepath. You can reproduce the problem and see that the gems aren't downloaded in parallel if you have an existing Gemfile and Gemfile.lock. (In my first patch, I forgot to try when a Gemfile.lock already existed). I'd like to introduce a new call to `prelad_git_sources`, when bundler hits the "missing specs" branch. I unforunately couldn't find a single place where we could preload git sources due to the `source.specs` that is called at different path.
- In ruby#9100, I opened a patch to run git operations in parallel in order to download git gems more quickly. The parallelization doesn't works when resolving is not needed due to a premature call to `source.specs` hit by this codepath. You can reproduce the problem and see that the gems aren't downloaded in parallel if you have an existing Gemfile and Gemfile.lock. (In my first patch, I forgot to try when a Gemfile.lock already existed). I'd like to introduce a new call to `prelad_git_sources`, when bundler hits the "missing specs" branch. I unforunately couldn't find a single place where we could preload git sources due to the `source.specs` that is called at different path.
- In ruby#9100, I opened a patch to run git operations in parallel in order to download git gems more quickly. The parallelization doesn't works when resolving is not needed due to a premature call to `source.specs` hit by this codepath. You can reproduce the problem and see that the gems aren't downloaded in parallel if you have an existing Gemfile and Gemfile.lock. (In my first patch, I forgot to try when a Gemfile.lock already existed). I'd like to introduce a new call to `prelad_git_sources`, when bundler hits the "missing specs" branch. I unforunately couldn't find a single place where we could preload git sources due to the `source.specs` that is called at different path.
- In ruby#9100, I opened a patch to run git operations in parallel in order to download git gems more quickly. The parallelization doesn't works when resolving is not needed due to a premature call to `source.specs` hit by this codepath. You can reproduce the problem and see that the gems aren't downloaded in parallel if you have an existing Gemfile and Gemfile.lock. (In my first patch, I forgot to try when a Gemfile.lock already existed). I'd like to introduce a new call to `prelad_git_sources`, when bundler hits the "missing specs" branch. I unforunately couldn't find a single place where we could preload git sources due to the `source.specs` that is called at different path.
- There are failing tests on Ruby 3.2 due to an expecation that checks wheter stderr is empty. In this case, stderr outputs a warning from Ruby "warning: loading in progress, circular require considered harmful", but this is not true. This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment) The problem is that Ruby 3.2 wrongly output this warning when multiple threads try to require a file at the same time. See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby 3.2.2. I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this warning output.
- There are failing tests on Ruby 3.2 due to an expecation that checks wheter stderr is empty. In this case, stderr outputs a warning from Ruby "warning: loading in progress, circular require considered harmful", but this is not true. This is a bug in Ruby 3.2 that I also encountered in ruby#9100 (comment) The problem is that Ruby 3.2 wrongly output this warning when multiple threads try to require a file at the same time. See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby 3.2.2. I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this warning output.
- There are failing tests on Ruby 3.2 due to an expecation that checks wheter stderr is empty. In this case, stderr outputs a warning from Ruby "warning: loading in progress, circular require considered harmful", but this is not true. This is a bug in Ruby 3.2 that I also encountered in ruby/rubygems#9100 (comment) The problem is that Ruby 3.2 wrongly output this warning when multiple threads try to require a file at the same time. See https://bugs.ruby-lang.org/issues/19415 which was fixed in Ruby 3.2.2. I opted to add a Mutex, as otherwise users on Ruby 3.2 will have this warning output. ruby/rubygems@713dd5c0e1
What was the end-user or developer problem that led to this PR?
Downloading git gems is very slow. I'd like to speed it up.
🔬 Comparison before/after
Before
After
What is your fix for the problem, implemented in this PR?
When you have a Gemfile that contains git gems, each repository will be fetched one by one. This is extremelly slow. A simple Gemfile with 5 git gems (small repositories) can take up to 10 seconds just to fetch the repos.
We can speed this up by running multiple git process and fetching repositories simultaneously.
The repositories are fetched in Bundler when
Source::Git#specsis called. The problem is thatsource.specsis called in various places depending on the Gemfile used.I think the issue is that calling
source.specsfeels like that as a "side effect" we are going to clone repositories. I believe that fetching repositories should be an explicit call.For instance:
I opted to add a self explanatory call to fetch the git source repositories before we start the resolution, and just before any other calls to
source.specsis performed.Make sure the following tasks are checked