PyTorch + ONNX + Caffe2 Model deployer#362
Conversation
Can one of the admins verify this patch?
jenkins ok to test
Test FAILed.

Test FAILed.
Corey-Zumar
left a comment
I think this is close. The Caffe2 container is currently crashing, and the unit test does not pass. Here are the logs from the container:
Attempting to run Caffe2 container without installing dependencies
Contents of /model
environment.yml
func.pkl
modules
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: libcuda.so.1: cannot open shared object file: No such file or directory
CRITICAL:root:Cannot load caffe2.python. Error: /opt/conda/lib/python2.7/site-packages/caffe2/python/../../../../libcaffe2.so: undefined symbol: _ZN7leveldb2DB4OpenERKNS_7OptionsERKSsPPS0_
dockerfiles/Caffe2Dockerfile
Outdated
    && conda install -c anaconda cloudpickle=0.5.2 \
    RUN conda install -c ezyang onnx \
    && RUN conda install -c conda-forge protobuf==3.4.0 \
Remove the RUN directives after &&. Also, align the &&'s with the ones on lines 7-11.
    registry=None,
    base_image="clipper/caffe2-container:{}".format(__version__),
    num_replicas=1):
    """Registers an app and deploys the provided predict function with Caffe2 model as
This function deploys the prediction function with a PyTorch model. It serializes the PyTorch model in ONNX format and creates a container that loads it as a Caffe2 model. Let's update the documentation here and for deploy_caffe2_model with this information.
    import numpy as np

    from clipper_admin.deployers import cloudpickle
this should just be import cloudpickle
Test FAILed.

I guess it is a mismatch between libraries such as protobuf or opencv, but there is no such error in my local environment. Will update the PR later.

What's the status on this?

Ok to test. @dcrankshaw

Test PASSed.
dcrankshaw
left a comment
This looks great! Just some small naming and API changes requested.
bin/build_docker_images.sh
Outdated
    create_image tf_cifar_container TensorFlowCifarDockerfile $public
    create_image tf-container TensorFlowDockerfile $public
    create_image pytorch-container PyTorchContainerDockerfile $public
    create_image caffe2-container Caffe2Dockerfile $public
Let's call this caffe2-onnx-container and rename the Dockerfile to Caffe2OnnxDockerfile.
@@ -0,0 +1,159 @@
    from __future__ import print_function, with_statement, absolute_import
This file shouldn't be called the Caffe2 deployer, because it doesn't let people deploy Caffe2 models. Let's rename this to onnx.py and we'll start to centralize all our ONNX-related model deployer functionality in here.
    logger = logging.getLogger(__name__)

    def create_endpoint(
Rename this to create_pytorch_endpoint, add an argument called onnx_backend="caffe2", then change the base_image default value to None (base_image=None).
Some context: We might want to support multiple ONNX backends soon. The cool thing about ONNX is that it decouples the choice of backend from the source of the model, so within the ONNX model deployer we can support multiple training frameworks and inference frameworks. E.g. along with our create_pytorch_endpoint we might have a create_caffe2_endpoint and a create_mxnet_endpoint, any of which could deploy their model to a Caffe2 backend or an mxnet backend or a pytorch backend.
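The API shape being proposed can be sketched as follows. Note that the names, signatures, and defaults below are illustrative assumptions based on this review thread, not Clipper's actual code:

```python
# Hypothetical sketch of the proposed ONNX deployer API: the endpoint helper
# registers an app, deploys via the framework-specific deployer, and links the
# two; the ONNX backend is chosen independently of the source framework.


def deploy_pytorch_model(name, version, onnx_backend="caffe2",
                         base_image=None):
    # stand-in for the real deployer; returns what it would deploy
    return {"name": name, "version": version,
            "onnx_backend": onnx_backend, "base_image": base_image}


def create_pytorch_endpoint(name, version, onnx_backend="caffe2",
                            base_image=None):
    # in the real implementation, register_application(...) would run before
    # this call and link_model_to_app(...) after it
    return deploy_pytorch_model(name, version,
                                onnx_backend=onnx_backend,
                                base_image=base_image)


print(create_pytorch_endpoint("m", 1)["onnx_backend"])  # caffe2
```

A create_caffe2_endpoint or create_mxnet_endpoint would follow the same pattern, differing only in how the model is exported to ONNX.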
    clipper_conn.link_model_to_app(name, name)

    def deploy_caffe2_model(
Rename this to deploy_pytorch_model, add an argument onnx_backend="caffe2", and change the base_image default value to be None (base_image=None).
Modify the method documentation to match the new functionality.
Then inside the function, add the following code:
if base_image is None:
    if onnx_backend == "caffe2":
        base_image = "clipper/caffe2-container:{}".format(__version__)
    else:
        logger.error("{backend} ONNX backend is not currently supported.".format(backend=onnx_backend))
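A runnable sketch of that base_image default logic, factored into a helper for clarity. One caveat worth noting: strings should be compared with ==, not is, since identity comparison on strings only works by accident of interning. The helper name and version value are illustrative:

```python
import logging

logger = logging.getLogger(__name__)
__version__ = "0.3.0"  # stand-in for clipper_admin's __version__


def resolve_base_image(base_image, onnx_backend):
    """Fill in the default container image when base_image is None."""
    if base_image is None:
        # use ==, not "is", for string comparison
        if onnx_backend == "caffe2":
            base_image = "clipper/caffe2-container:{}".format(__version__)
        else:
            logger.error(
                "{backend} ONNX backend is not currently supported.".format(
                    backend=onnx_backend))
    return base_image


print(resolve_base_image(None, "caffe2"))  # clipper/caffe2-container:0.3.0
```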
    try:
        torch_out = torch.onnx._export(
            pytorch_model, inputs, "pytorch_model.onnx", export_params=True)
Save this as just "model.onnx"; the Caffe2 container can load any ONNX model, not just ones saved from PyTorch.
dockerfiles/Caffe2Dockerfile
Outdated
    && apt-get install -yqq -t jessie-backports openjdk-8-jdk \
    && conda install -y --file /lib/python_container_conda_deps.txt \
    && conda install -c anaconda cloudpickle=0.5.2 \
    && conda install -c ezyang onnx \
This should be && conda install -c conda-forge onnx \
dockerfiles/Caffe2Dockerfile
Outdated
    && conda install -c anaconda cloudpickle=0.5.2 \
    && conda install -c ezyang onnx \
    && conda install -c conda-forge protobuf==3.4.0 \
    && conda install -c ezyang/label/devgpu caffe2 \
This should be && conda install -c caffe2 caffe2 \
I tried this command, but it seems to conflict with other packages in my local environment, so I installed another one from Anaconda Cloud. I will change them as you suggest to see whether the test passes. @dcrankshaw
dockerfiles/Caffe2Dockerfile
Outdated
    && conda install -c ezyang onnx \
    && conda install -c conda-forge protobuf==3.4.0 \
    && conda install -c ezyang/label/devgpu caffe2 \
    && conda install -c ezyang onnx-caffe2 \
This should be && pip install onnx-caffe2 \
dockerfiles/Caffe2Dockerfile
Outdated
    && conda install -c ezyang/label/devgpu caffe2 \
    && conda install -c ezyang onnx-caffe2 \
    && conda install -c jjh_pytorch pytorch \
    && conda install -c jjh_pytorch torchvision
Why are you installing pytorch and torchvision in this container?
torch and torchvision are imported in the test file. Should I delete these lines? @dcrankshaw
Umm, actually the python function serialization code will find PyTorch as a dependency and install it anyway, so you can leave them in. But install them as
&& conda install -c pytorch pytorch torchvision \
@@ -0,0 +1,222 @@
    from __future__ import absolute_import, print_function
Rename this to deploy_pytorch_to_caffe2_with_onnx.py
Test PASSed.

The logs say that there are some Python format violations, could you check it for me? @Corey-Zumar
dcrankshaw
left a comment
This is looking good! Almost there.
    for a model can be changed at any time with
    :py:meth:`clipper.ClipperConnection.set_num_replicas`.
    onnx_backend : str, optional
        The provided onnx backend.
Add a comment that caffe2 is the only currently supported ONNX backend.
    deploy_pytorch_model(clipper_conn, name, version, input_type, inputs, func,
                         pytorch_model, base_image, labels, registry,
                         num_replicas,onnx_backend)
You need a space here after the comma:
num_replicas, onnx_backend)

    for a model can be changed at any time with
    :py:meth:`clipper.ClipperConnection.set_num_replicas`.
    onnx_backend : str, optional
        The provided onnx backend.
Add the same comment as above about caffe2 being the only currently supported ONNX backend.
    if onnx_backend == "caffe2":
        base_image = "clipper/caffe2-onnx-container:{}".format(__version__)
    else:
        logger.error("{backend} ONNX backend is not currently supported.".format(backend=onnx_backend))

    except Exception as e:
        logger.warn("Error serializing torch model: %s" % e)

    logger.info("Torch model has be serialized to ONNX foamat")
Oh, just that you have a typo in the word "format". You spelled it "foamat", and I want you to spell it "format".
(The earlier comment was using vim syntax for find/replace: "substitute format for foamat".)
    link_model=False,
    predict_fn=predict):
    deploy_caffe2_model(clipper_conn, model_name, version, "integers", inputs,
    deploy_pytorch_model(clipper_conn, model_name, version, "integers", inputs,
Specify the onnx_backend argument here.
    app_and_model_name = "easy-register-app-model"
    create_endpoint(clipper_conn, app_and_model_name, "integers",
    create_pytorch_endpoint(clipper_conn, app_and_model_name, "integers",
Specify the onnx_backend argument here.
Test FAILed.

Test FAILed.

The test failed the format checker. Run

Test FAILed.

Test PASSed.

Test FAILed.
The logs are below.
Why does the connection error happen after merging? Could you take a look? @dcrankshaw @Corey-Zumar
dcrankshaw
left a comment
Some more cleanup changes
    :py:meth:`clipper.ClipperConnection.set_num_replicas`.
    onnx_backend : str, optional
        The provided onnx backend. Caffe2 is the only currently supported ONNX backend.
    """
You're missing documentation for the batch_size argument.
    :py:meth:`clipper.ClipperConnection.set_num_replicas`.
    onnx_backend : str, optional
        The provided onnx backend. Caffe2 is the only currently supported ONNX backend.
    """
You're missing documentation for the batch_size argument.
    # Deploy model
    clipper_conn.build_and_deploy_model(name, version, input_type,
                                        serialization_dir, base_image, labels,
                                        registry, num_replicas, batch_size)
Move the call to clipper_conn.build_and_deploy_model() to inside the try statement.
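The requested restructure looks roughly like the sketch below: moving build_and_deploy_model inside the try means serialization and deployment failures are caught and logged the same way. The function bodies here are stubs for illustration, not Clipper's actual implementation:

```python
import logging
import shutil
import tempfile

logger = logging.getLogger(__name__)


def build_and_deploy_model(name, version, serialization_dir):
    # stand-in for clipper_conn.build_and_deploy_model(...)
    return "{}:{}".format(name, version)


def deploy(name, version, export_model):
    serialization_dir = tempfile.mkdtemp()
    try:
        export_model(serialization_dir)  # e.g. torch.onnx._export(...)
        # build_and_deploy_model now lives inside the try, so deployment
        # errors are handled together with serialization errors
        return build_and_deploy_model(name, version, serialization_dir)
    except Exception as e:
        logger.warning("Error deploying model: %s", e)
        return None
    finally:
        shutil.rmtree(serialization_dir, ignore_errors=True)


print(deploy("pytorch-model", 1, lambda d: None))  # pytorch-model:1
```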
dockerfiles/Caffe2OnnxDockerfile
Outdated
    && apt-get install -yqq -t jessie-backports openjdk-8-jdk \
    && conda install -y --file /lib/python_container_conda_deps.txt \
    && conda install -c anaconda cloudpickle=0.5.2 \
    && conda install -c ezyang onnx \
Install onnx from the official channel: conda install -c conda-forge onnx
dockerfiles/Caffe2OnnxDockerfile
Outdated
    && conda install -c anaconda cloudpickle=0.5.2 \
    && conda install -c ezyang onnx \
    && conda install -c conda-forge protobuf==3.4.0 \
    && conda install -c ezyang/label/devgpu caffe2 \
Install caffe2 from the official source.
dockerfiles/Caffe2OnnxDockerfile
Outdated
    && conda install -c ezyang onnx \
    && conda install -c conda-forge protobuf==3.4.0 \
    && conda install -c ezyang/label/devgpu caffe2 \
    && conda install -c ezyang onnx-caffe2 \
Install the caffe2 onnx extensions from the official source.
dockerfiles/Caffe2OnnxDockerfile
Outdated
    && conda install -c ezyang/label/devgpu caffe2 \
    && conda install -c ezyang onnx-caffe2 \
    && conda install -c jjh_pytorch pytorch \
    && conda install -c jjh_pytorch torchvision
Install pytorch and torchvision from the official channel: conda install pytorch torchvision -c pytorch
Test PASSed.

Test FAILed.

Test FAILed.

Test PASSed.
Comments have been addressed
* update caffe2 deployer
* update caffe2 container
* update caffe2 container
* update Caffe2Dockerfile
* update deploy_caffe2_models.py
* Update build_docker_images.sh
* Format code
* Update caffe2 container entrypoint permissions
* Update Caffe2Dockerfile
* Update caffe2_container.py
* Update caffe2.py
* Update build_docker_images.sh
* Rename caffe2.py to onnx.py
* Update onnx.py
* Update and rename caffe2_container.py to caffe2_onnx_container.py
* Update and rename caffe2_container_entry.sh to caffe2_onnx_container_entry.sh
* Rename Caffe2Dockerfile to Caffe2OnnxDockerfile
* Rename deploy_caffe2_models.py to deploy_pytorch_to_caffe2_with_onnx.py
* Update caffe2_onnx_container_entry.sh
* Update Caffe2OnnxDockerfile
* Update deploy_pytorch_to_caffe2_with_onnx.py
* Update onnx.py
* Update onnx.py
* Update caffe2_onnx_container_entry.sh
* Update onnx.py
* Update onnx.py
* Update deploy_pytorch_to_caffe2_with_onnx.py
* Update onnx.py
* Update deploy_pytorch_to_caffe2_with_onnx.py
* Update onnx.py
* Update deploy_pytorch_to_caffe2_with_onnx.py
* Support PyTorch + ONNX + Caffe2 Model deployer
* Support PyTorch + ONNX + Caffe2 Model deployer
* Update onnx.py
@Corey-Zumar #340 Please take a look and test it. It seems fine in my local environment.