Skip to content

perf: avoid double function call in ReverseDiff value_and_gradient#729

Merged
gdalle merged 5 commits into
mainfrom
gd/rev
Feb 16, 2025
Merged

perf: avoid double function call in ReverseDiff value_and_gradient#729
gdalle merged 5 commits into
mainfrom
gd/rev

Conversation

@gdalle
Copy link
Copy Markdown
Member

@gdalle gdalle commented Feb 14, 2025

Personal musings: The behavior of ReverseDiff is very confusing, even on ImmutableDiffResult. I encountered a Heisenbug which seems to depend on the compilation path (disappears when I add a print statement), where value_and_gradient! suddenly becomes incorrect when called inside value_gradient_and_hessian!. But only for a compiled tape. And the :gradient tests for value_and_gradient! still pass. What a mess.

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.92%. Comparing base (98e6e5f) to head (e59205d).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #729      +/-   ##
==========================================
- Coverage   97.93%   97.92%   -0.01%     
==========================================
  Files         122      122              
  Lines        6386     6372      -14     
==========================================
- Hits         6254     6240      -14     
  Misses        132      132              
Flag Coverage Δ
DI 98.96% <100.00%> (-0.01%) ⬇️
DIT 95.71% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gdalle gdalle marked this pull request as draft February 14, 2025 21:16
@gdalle
Copy link
Copy Markdown
Member Author

gdalle commented Feb 14, 2025

So, in the current state of the PR, value_and_gradient! works but not alongside a Hessian computation... unless we print something? I'm going crazy.

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> using DifferentiationInterface, FiniteDiff, ReverseDiff

julia> DifferentiationInterface.value_and_gradient!(
           f,
           zeros(2),
           AutoReverseDiff(; compile=true),
           ones(2),
       )
(2.0, [2.0, 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> # now we just add a log

julia> DifferentiationInterface.value_and_gradient!(
           f,
           zeros(2),
           AutoReverseDiff(; compile=true),
           ones(2),
       )
[ Info: I'm here
(2.0, [2.0, 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
[ Info: I'm here
(2.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

@gdalle
Copy link
Copy Markdown
Member Author

gdalle commented Feb 14, 2025

I further boiled it down to whether I do y, grad or y, _ in the assignment. Still have no clue what's going on.

EDIT: pure-ReverseDiff example available at JuliaDiff/ReverseDiff.jl#269

using DifferentiationInterface  # version from this branch
import DifferentiationInterface as DI
using ReverseDiff: ReverseDiff

backend = AutoReverseDiff(; compile=true)
f(x) = sum(abs2, x)
x = ones(2)

prep = prepare_gradient(f, backend, zero(x))

function value_and_gradient_nested!(f, grad, prep, backend, x)
    y, _ = value_and_gradient!(f, grad, prep, backend, x)
    return y, grad
end
julia> value_and_gradient!(f, zeros(2), prep, backend, x)
(2.0, [2.0, 2.0])

julia> value_and_gradient_nested!(f, zeros(2), prep, backend, x)  # wrong
(0.0, [2.0, 2.0])

This behavior only happens with the compiled tape mode of ReverseDiff. I think it might be because the compiler struggles to figure out that ReverseDiff mutates the value of y in the MutableDiffResult?

@gdalle gdalle marked this pull request as ready for review February 15, 2025 11:57
@gdalle gdalle merged commit 3417ff5 into main Feb 16, 2025
@gdalle gdalle deleted the gd/rev branch February 16, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant