Support for PyTorch model #322
haofanwang wants to merge 0 commits into ucbrise:develop from haofanwang:develop
Conversation
Can one of the admins verify this patch?
Corey-Zumar left a comment
This works well - just a few minor issues to address. Nice job!
    registry=None,
    base_image="clipper/pytorch-container:{}".format(__version__),
    num_replicas=1):
    """Registers an app and deploys the provided predict function with PySpark model as

Change "PySpark" to "PyTorch"
    try:
        torch.save(pytorch_model, torch_model_save_loc)
        # torch.save(pytorch_model.state_dict(), torch_model_save_loc)
    import numpy as np
    # sys.path.append(os.path.abspath("/lib/"))
    with open(file_path, 'r') as serialized_func_file:
        return cloudpickle.load(serialized_func_file)
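For context on why the deployer serializes the predict function with cloudpickle rather than the standard library's pickle: plain pickle stores functions by qualified name only, so it cannot serialize lambdas or closures at all. A quick stdlib-only illustration of the limitation (cloudpickle itself is not imported here):

```python
import pickle

# Plain pickle stores functions by reference (module + qualified name),
# so a lambda -- whose qualname is '<lambda>' -- cannot be pickled.
square = lambda x: x * x
try:
    pickle.dumps(square)
    plain_pickle_ok = True
except Exception:
    plain_pickle_ok = False
# cloudpickle.dumps(square) would succeed because cloudpickle serializes
# the function body itself, which is why the deployer depends on it.
```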
    ##### Need write code to load pytorch model here
    def load_pytorch_model(model_path):
        print(model_path)
Remove debugging print statement
        model = torch.load(model_path)
        return model

    ##### Need write code for pytorch container here
        return [str(p) for p in preds]

    def predict_doubles(self, inputs):
        preds = self.predict_func(inputs)
This call to self.predict_func is missing the self.model parameter.
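For concreteness, the corrected call might look like the sketch below. This is a minimal stand-in, not the real container: the actual PyTorchContainer class carries more state (input type, RPC plumbing) than shown here.

```python
class PyTorchContainer(object):
    """Minimal sketch of the container; constructor args are illustrative."""

    def __init__(self, model, predict_func):
        self.model = model
        self.predict_func = predict_func

    def predict_doubles(self, inputs):
        # The deployed function takes (model, inputs), so the model must be
        # passed through explicitly -- the original call dropped self.model.
        preds = self.predict_func(self.model, inputs)
        return [str(p) for p in preds]
```

With a trivial stand-in predict function, `PyTorchContainer(3, lambda m, xs: [m * x for x in xs]).predict_doubles([1.0, 2.0])` returns string predictions as the RPC layer expects.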
@@ -0,0 +1,15 @@
#!/usr/bin/env sh
Modify the permissions of this script such that it can be executed (like we did during the last meeting) and commit the permissions change.
I'm not sure what I should modify here. I just did the same thing as all the other .sh files. Do you mean I need to change "#!/usr/bin/env sh" to "#!/bin/bash"?
Ah, by permissions change I mean chmod the script so that the container can execute it. 777 should work just fine.
Done, please check it. @Corey-Zumar
dcrankshaw left a comment
Overall this looks pretty good. But you need to add some tests. Check out the PySpark model deployer tests for an example.
Done.
Corey-Zumar left a comment
@haofanwang The deployment functionality looks really good. Most of the comments I left concern the test. The training method doesn't currently run due to an exception being thrown with the forward propagation step. If training the model before deploying it is too difficult, it's fine to write a test that deploys one of the pretrained PyTorch models, such as VGG16 from TorchVision, which can be used as
> from torchvision.models import vgg16
> model = vgg16()
...
    labels=None,
    registry=None,
    num_replicas=1):
    """Deploy a Python function with a PySpark model.

Change PySpark to PyTorch
    registry=None,
    num_replicas=1):
    """Deploy a Python function with a PySpark model.
    The function must take 3 arguments (in order): a SparkSession, the PySpark model, and a list of

Update this description. The function no longer takes a SparkSession object. Replace all instances of PySpark with PyTorch.
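An updated docstring might read roughly as follows. This is a sketch only: the parameter list is abridged from the diff context above, and the return-type sentence is inferred from the container's `[str(p) for p in preds]` conversion.

```python
def deploy_pytorch_model(name, version, input_type, func, pytorch_model,
                         labels=None, registry=None, num_replicas=1):
    """Deploy a Python function with a PyTorch model.

    The function must take 2 arguments (in order): the PyTorch model and
    a list of inputs. It should return a list of strings, one prediction
    per input.
    """
    # Deployment logic omitted in this sketch.
    pass
```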
    running_loss = 0.0
    for i, data in enumerate(train_loader, 1):
        img, label = data
        if use_gpu:

The use_gpu variable is not defined
            img = Variable(img)
            label = Variable(label)
        else:
            img = Variable(img)

Both the "if" and "else" cases are the same
    def __len__(self):
        return len(self.imgs)

    def getName(self):
        return 0

    def parseData(train_path, pos_label):
Nit: Replace camelcase method name with parse_data.
    def parseData(train_path, pos_label):
        trainData = np.genfromtxt(train_path, delimiter=',', dtype=int)
Replace camelcase: train_data
    cleanup=True, start_clipper=True)

    train_path = os.path.join(cur_dir, "data/train.data")
    (x, y) = parseData(train_path, pos_label)
Replace with train_x, train_y = parse_data(train_path, pos_label)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001)
    for epoch in range(1000):
        print('*' * 10)
        print('epoch {}'.format(epoch + 1))
@Corey-Zumar I made some changes, please check them.
Corey-Zumar left a comment
@haofanwang There's an error in the deployment test. The container crashed with the following message:
Traceback (most recent call last):
File "/container/pytorch_container.py", line 110, in <module>
input_type)
File "/container/rpc.py", line 491, in start
self.server.run()
File "/container/rpc.py", line 303, in run
prediction_request)
File "/container/rpc.py", line 131, in handle_prediction_request
outputs = predict_fn(prediction_request.inputs)
File "/container/pytorch_container.py", line 43, in predict_ints
preds = self.predict_func(self.model, inputs)
File "deploy_pytorch_models.py", line 62, in predict
TypeError: 'NoneType' object is not callable
Oh, I forgot to return the trained model. I have fixed it. @Corey-Zumar
@haofanwang The container launched by the integration test crashes with the following stack trace:
Traceback (most recent call last):
File "/container/pytorch_container.py", line 103, in <module>
model = PyTorchContainer(model_path, input_type)
File "/container/pytorch_container.py", line 36, in __init__
self.model = load_pytorch_model(torch_model_path)
File "/container/pytorch_container.py", line 22, in load_pytorch_model
model = torch.load(model_path)
File "/opt/conda/lib/python2.7/site-packages/torch/serialization.py", line 231, in load
return _load(f, map_location, pickle_module)
File "/opt/conda/lib/python2.7/site-packages/torch/serialization.py", line 379, in _load
result = unpickler.load()
AttributeError: 'module' object has no attribute 'BasicNN'
Before submitting this for another round of reviews, please make sure that the integration test passes. You can test it by running python integration-tests/deploy_pytorch_models.py from the clipper root directory.
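The missing-'BasicNN' error comes from how torch.save handles a full model object: the model's class is pickled by reference, so torch.load only succeeds if the loading process (here, the container) can import that class. The pure-pickle sketch below, which needs no torch at all, illustrates the mechanism; the state_dict workaround in the comments is the commonly recommended remedy.

```python
import pickle

class BasicNN(object):
    """Stand-in for the model class defined in the integration test script."""
    def __init__(self):
        self.weights = [0.1, 0.2]

# Pickling an instance stores only a *reference* to its class
# (module name + class name), not the class definition itself...
blob = pickle.dumps(BasicNN())
# ...so unpickling works here, where BasicNN is importable,
restored = pickle.loads(blob)
# but raises AttributeError in a process (like the container) that has no
# BasicNN definition. A common remedy is to save parameters only and
# rebuild the model from code on the loading side:
#     torch.save(model.state_dict(), path)
#     model = BasicNN(); model.load_state_dict(torch.load(path))
```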
    def train(model):
        model.train()
        optimizer = optim.SGD(model.parameters(), lr=0.001)
        for epoch in range(2000):
Let's only train the model for 10 epochs
@Corey-Zumar The integration test has passed.
Implement a model deployer for PyTorch: support deploying PyTorch models directly from the clipper_admin tool, similar to the way we support deploying PySpark models by calling Clipper.deploy_pyspark_model.