Adding PyTorch functionality to SIRF#1305
Conversation
KrisThielemans
left a comment
There was a problem hiding this comment.
Some detailed comments, but overall, I think we still need to decide on how to wrap AcquisitionModel, AcquisitionSensitivityModel, Resampler etc, i.e. anything that has forward and backward members. This should be written only once. You had https://github.com/SyneRBI/SIRF-Exercises/blob/71df538fe892ce621440544f14fa992c230fd120/notebooks/Deep_Learning_PET/sirf_torch.py#L38-L39, which seems completely generic, aside from naming of variables. So, something like this
class OperatorModule(torch.nn.Module):
def __init__(self, sirf_operator, sirf_src, sirf_dest):
"""constructs wrapper. WARNING operations will overwrite content of `sirf_src` and `sirf_dest`"""
...with usage
acq_model = pet.AcquisitionModelParallelProj()
# etc etc
torch_acq_model = sirf.torch.OperatorModule(acq_model, image, acq_data)Ideally, we also provide for being able to call save_for_backward
|
Added a README with the some notes for a design discussion here. |
KrisThielemans
left a comment
There was a problem hiding this comment.
I'd prefer not to have too much code in the design discussion (duplication as well as distracting from actual design). You could include pointers to implementation in the README, but it's still small enough that this probably isn't worth the effort.
Just keep a skeleton in the README, the shorter the better...
How do you want people to react to your questions in the README? Via a review? (not conventional, but could work)
|
By the way, we should avoid running CI on this until it makes sense. Easiest is to put |
KrisThielemans
left a comment
There was a problem hiding this comment.
Various comments in the review. A general comment: I'm not convinced about hiding the "times -1" for the objective function. For one thing, it will fail with CIL functions such as KL. There are a few possible strategies here, which we should discuss.
src/torch/gradchecks.py
Outdated
There was a problem hiding this comment.
@paskino so this is an attempt at pytest
There was a problem hiding this comment.
Now works with MR, I am getting some very sinister intermittant segfault with setting up the sirf.STIR objective function @KrisThielemans @paskino. I do think I saw this in the PETRIC challenge. Really not sure how to diagnose and it may be SIRF/PyTorch compatibility issues, but I really don't know.
KrisThielemans
left a comment
There was a problem hiding this comment.
We seem to have a disagreement on how this should be documented. Ideally @mehrhardt @gschramm provide an alternative opinion to mine...
By the way, I think using funky characters in the doc is a bad idea and that it is better to use Latex-doc (although it is less readable of course). This is because you don't know what language settings the user has (unless there's a way to enforce unicode in Markdown).
For me, this description of the autograd in pytorch makes sense: This explain why, in the backward pass of a vector valued function y = f(x), we have to implement the If we write that out in components, we get the "chain rule".
|
|
thanks for the link @gschramm. We could maybe include it in the README. I think the main question for me is how much we should elaborate in our own doc, especially if it gets into detail on what is common usage vs what it actually does. I've put some detailed comments, and @Imraj-Singh will address those he agrees with :-). I'd personally prefer not too much (potentially confusing) detail in the README, but point to torch doc and our use cases (which could have a bit more doc then) or exercises. |
src/torch/gradchecks.py
Outdated
|
|
||
|
|
||
| run_gradcheck(torch_obj_fun, torch_image_data, (modality, data_type), "Objective", | ||
| nondet_tol=1e-1, fast_mode=True, eps=1e-3, atol=1e-1, rtol=1e-1) |
There was a problem hiding this comment.
@gschramm I had to set up the tolerance for non-determinism to be extremely high for objective function gradchecks... I think this may be troubles with computing the numerical Jacobian from a PNLL type function causing issues with numerical precision? I was wondering if you get this with your gradchecks in parallelproj. As a comparison I only wrapped the acquisition model and used torch's PNLL for the objective, and I still need rather high tolerances for non-determinism.
src/torch/README.md
Outdated
| ## Forward and backward clarification | ||
|
|
||
| The use of the terms forward and backward have different meaning given the context: | ||
| * `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. |
There was a problem hiding this comment.
Why do you refer to the derivatives as "scaled"?
There was a problem hiding this comment.
The vector isn't neccessarily a unit vector. That said, I think I will just say "gradient" as that is correct and less confusing.
There was a problem hiding this comment.
I don't agree that "gradient is correct". The backward method multiplies its argument with the gradient of the function, which is the VJP of course. I'd personally remove "from output to input", as this terminology only makes sense if you think about this is inserted as a layer in a feed-forward network.
| * `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) computes the (scaled directional) derivative, i.e. the Vector-Jacobian-adjoint-Product (VJP), from output to input. | |
| * `torch.autograd.Function`: the `forward` method (forward pass) is the evaluation of the function, and `backward` method (backward pass) multiplies its argument with the gradient of the function, i.e. computes the Vector-Jacobian-adjoint-Product (VJP). |
src/torch/README.md
Outdated
|
|
||
| The wrapper is currently only for **reverse-mode** autodiff - there are JVP methods (for forward-mode autodiff) of `torch.autograd.Function` that are not used at present. | ||
|
|
||
| For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator. |
There was a problem hiding this comment.
backward is the linear (Jacobian) component/approximation of the operator
I don't know what this means
There was a problem hiding this comment.
If we have some operator A(x) and it's adjoint B(x), the forward pass would be B(x), and the backward pass would be J_A(x) * grad_out. So for operator Ax+b, adjoint is A^T x and and backward pass is A grad_out? grad_out is pytorch for product of upstream gradients, i.e. upstream chain rule....
Perhaps I should change it to:
For
SIRFOperator, wrapping its adjoint withAdjointOperatorresults in: forward executing the adjoint operation, and backward computing the Jacobian-based gradient of the originalSIRFOperator.
There was a problem hiding this comment.
I'm really not sure if this is helpful to mention at this point. It could be done in a use-case. But maybe we could say
| For `SIRFOperator` we can wrap the adjoint operator with `AdjointOperator`, there quite confusingly the forward is the application of the adjoint, and the backward *is* the linear (Jacobian) component/approximation of the operator. | |
| Note that in certain cases it can be necessary to use the adjoint of an operator as a forward step, e.g. when adding a gradient-descent step as a layer. This can be achieved by wrapping `sirf.AdjointOperator(sirf_operator)`. |
There was a problem hiding this comment.
Does this need to be addressed?
There was a problem hiding this comment.
sure. I like my draft :-)
There was a problem hiding this comment.
can't see anywhere it fits atm (seems outdated). Will merge as-is.
Co-authored-by: Kris Thielemans <KrisThielemans@users.noreply.github.com>
…ted as well as gradchecks tol decrease [ci skip]
- drop `object` typehints - enable torch cpu tests - misc formatting
- with the current `import sirf.SIRF as sirf`, we should use `sirf.AdjointOperator` - cvorrect convBlock error [ci skip]
3.9.0 was released. [ci skip]
…ted as well as gradchecks tol decrease [ci skip]
- drop `object` typehints - enable torch cpu tests - misc formatting
We were returning the wrong variable (found by Codacy) [ci skip]
|
@casperdcl did you rebase after all? |
|
Couldn't resist... rebased & squashed a few commits. |
- also misc README copyedits

Changes in this pull request
The functionality:
torch.autograd.Functionfor objective function, acquisition mode forward, and acquisition model backward. These are not really meant to be used by a user.torch.nn.Modulefor a module that interacts with thetorch.autograd.Function. In the forward method we could change the number of the dimensions in the array, swap in/out components of the forward operator etc.This functionality of
torch.nn.Modulecan vary dependent on the user's requirements, so perhaps we should just give a use case. Using it for lpd for example, or PET reconstruction with torch optimisers?Testing performed
Gradchecks for all the new functions. Note that objective function evaluations are quite unstable.
Related issues
Checklist before requesting a review
Contribution Notes
Please read and adhere to the contribution guidelines.
Please tick the following: