Support for PyTorch model by haofanwang · Pull Request #322 · ucbrise/clipper

haofanwang · 2017-11-20T23:59:39Z

Implement a model deployer for PyTorch. This supports deploying PyTorch models directly from the clipper_admin tool, similar to the way we support deploying PySpark models directly by calling Clipper.deploy_pyspark_model.

AmplabJenkins · 2017-11-21T00:00:03Z

Can one of the admins verify this patch?

Corey-Zumar

This works well - just a few minor issues to address. Nice job!

Corey-Zumar · 2017-11-24T05:06:16Z

clipper_admin/clipper_admin/deployers/pytorch.py

+        registry=None,
+        base_image="clipper/pytorch-container:{}".format(__version__),
+        num_replicas=1):
+    """Registers an app and deploys the provided predict function with PySpark model as


Change "PySpark" to "PyTorch"

Corey-Zumar · 2017-11-24T05:06:56Z

clipper_admin/clipper_admin/deployers/pytorch.py

+
+    try:       
+         torch.save(pytorch_model,torch_model_save_loc)
+         #torch.save(pytorch_model.state_dict(), torch_model_save_loc)


Remove commented code

Corey-Zumar · 2017-11-24T05:07:08Z

containers/python/pytorch_container.py

+
+import numpy as np
+
+# sys.path.append(os.path.abspath("/lib/"))


Remove commented code

Corey-Zumar · 2017-11-24T05:07:27Z

containers/python/pytorch_container.py

+    with open(file_path, 'r') as serialized_func_file:
+        return cloudpickle.load(serialized_func_file)
+
+##### Need write code to load pytorch model here


Remove this line

Corey-Zumar · 2017-11-24T05:07:36Z

containers/python/pytorch_container.py

+
+##### Need write code to load pytorch model here
+def load_pytorch_model(model_path):
+    print(model_path)


Remove debugging print statement

Corey-Zumar · 2017-11-24T05:07:46Z

containers/python/pytorch_container.py

+    model = torch.load(model_path)  
+    return model
+
+##### Need write code for pytorch container here


Corey-Zumar · 2017-11-24T05:08:24Z

containers/python/pytorch_container.py

+        return [str(p) for p in preds]
+
+    def predict_doubles(self, inputs):
+        preds = self.predict_func(inputs)


This call to self.predict_func is missing the self.model parameter.
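
To illustrate the point, here is a minimal sketch of a container class whose prediction methods both pass the model to the user-supplied function. The class name `PyTorchContainerSketch`, its constructor arguments, and the stand-in model are hypothetical; only the `self.predict_func(self.model, inputs)` call shape comes from this PR.

```python
class PyTorchContainerSketch(object):
    """Hypothetical stand-in for the container class in pytorch_container.py."""

    def __init__(self, model, predict_func):
        self.model = model
        self.predict_func = predict_func

    def predict_ints(self, inputs):
        preds = self.predict_func(self.model, inputs)
        return [str(p) for p in preds]

    def predict_doubles(self, inputs):
        # Pass self.model as the first argument, matching predict_ints.
        preds = self.predict_func(self.model, inputs)
        return [str(p) for p in preds]


def predict(model, inputs):
    # User-supplied function: receives the model and a list of inputs.
    return [model(x) for x in inputs]


# A lambda stands in for a real PyTorch model (assumption for illustration).
container = PyTorchContainerSketch(model=lambda x: x * 2, predict_func=predict)
doubles_output = container.predict_doubles([1.5, 2.0])
print(doubles_output)  # ['3.0', '4.0']
```

Without the `self.model` argument, `predict` would be called with one argument too few and raise a TypeError.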

Corey-Zumar · 2017-11-24T05:09:54Z

containers/python/pytorch_container_entry.sh

@@ -0,0 +1,15 @@
+#!/usr/bin/env sh


Modify the permissions of this script such that it can be executed (like we did during the last meeting) and commit the permissions change.

I'm not sure what I should modify here. I did the same thing as in all the other .sh files. Do you mean I need to change "#!/usr/bin/env sh" to "#!/bin/bash"?

Ah, by permissions change I mean chmod the script so that the container can execute it. 777 should work just fine
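
For reference, a small sketch of setting the execute bits from Python. The helper name `make_executable` is hypothetical, and it deliberately adds only execute permission rather than the full 777 suggested above; Git records only whether a file is executable, so either choice results in the same committed mode change.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others; keep other bits."""
    current_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Return the resulting permission bits so the caller can verify them.
    return stat.S_IMODE(os.stat(path).st_mode)
```

Calling `make_executable("containers/python/pytorch_container_entry.sh")` from the repository root and committing the result would be roughly equivalent to running `chmod +x` on the script.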

haofanwang · 2017-11-25T08:51:09Z

Done, please check it. @Corey-Zumar

dcrankshaw

Overall this looks pretty good. But you need to add some tests. Check out the PySpark model deployer tests for an example.

haofanwang · 2017-11-30T11:27:15Z

Done.

Corey-Zumar

@haofanwang The deployment functionality looks really good. Most of the comments I left concern the test. The training method doesn't currently run because an exception is thrown in the forward propagation step. If training the model before deploying it is too difficult, it's fine to write a test that deploys one of the pretrained PyTorch models, such as VGG16 from TorchVision, which can be used as follows:

> from torchvision.models import vgg16
> model = vgg16()
...

Corey-Zumar · 2017-12-07T01:18:45Z

clipper_admin/clipper_admin/deployers/pytorch.py

+        labels=None,
+        registry=None,
+        num_replicas=1):
+    """Deploy a Python function with a PySpark model.


Change PySpark to PyTorch

Corey-Zumar · 2017-12-07T01:19:15Z

clipper_admin/clipper_admin/deployers/pytorch.py

+        registry=None,
+        num_replicas=1):
+    """Deploy a Python function with a PySpark model.
+    The function must take 3 arguments (in order): a SparkSession, the PySpark model, and a list of


Update this description. The function no longer takes a SparkSession object. Replace all instances of PySpark with PyTorch.

Corey-Zumar · 2017-12-07T01:24:22Z

integration-tests/deploy_pytorch_models.py

+        running_loss = 0.0
+        for i, data in enumerate(train_loader, 1):
+            img, label = data
+            if use_gpu:


This variable is not defined

Corey-Zumar · 2017-12-07T01:24:50Z

integration-tests/deploy_pytorch_models.py

+                img = Variable(img)
+                label = Variable(label)
+            else:
+                img = Variable(img)


Both the "if" and "else" cases are the same.

Corey-Zumar · 2017-12-07T01:26:52Z

integration-tests/deploy_pytorch_models.py

+    def __len__(self):
+        return len(self.imgs)
+
+    def getName(self):


This is never used

Corey-Zumar · 2017-12-07T01:28:46Z

integration-tests/deploy_pytorch_models.py

+        return 0
+
+
+def parseData(train_path, pos_label):


Nit: Replace camelcase method name with parse_data.

Corey-Zumar · 2017-12-07T01:29:05Z

integration-tests/deploy_pytorch_models.py

+
+
+def parseData(train_path, pos_label):
+    trainData = np.genfromtxt(train_path, delimiter=',', dtype=int)


Replace camelcase: train_data

Corey-Zumar · 2017-12-07T01:30:26Z

integration-tests/deploy_pytorch_models.py

+            cleanup=True, start_clipper=True)
+
+        train_path = os.path.join(cur_dir, "data/train.data")
+        (x, y) = parseData(train_path, pos_label)


Replace with train_x, train_y = parse_data(train_path, pos_label)

Corey-Zumar · 2017-12-07T01:31:07Z

integration-tests/deploy_pytorch_models.py

+    criterion = nn.CrossEntropyLoss()
+    optimizer = optim.SGD(model.parameters(), lr=0.001)
+    for epoch in range(1000):
+        print('*' * 10)


Remove print statement

Corey-Zumar · 2017-12-07T01:31:27Z

integration-tests/deploy_pytorch_models.py

+    optimizer = optim.SGD(model.parameters(), lr=0.001)
+    for epoch in range(1000):
+        print('*' * 10)
+        print('epoch {}'.format(epoch + 1))


Remove print statement

haofanwang · 2017-12-07T07:37:17Z

@Corey-Zumar I made some changes, please check them.

Corey-Zumar

@haofanwang There's an error in the deployment test. The container crashed with the following message:

Traceback (most recent call last):
  File "/container/pytorch_container.py", line 110, in <module>
    input_type)
  File "/container/rpc.py", line 491, in start
    self.server.run()
  File "/container/rpc.py", line 303, in run
    prediction_request)
  File "/container/rpc.py", line 131, in handle_prediction_request
    outputs = predict_fn(prediction_request.inputs)
  File "/container/pytorch_container.py", line 43, in predict_ints
    preds = self.predict_func(self.model, inputs)
  File "deploy_pytorch_models.py", line 62, in predict
TypeError: 'NoneType' object is not callable

haofanwang · 2017-12-07T17:57:07Z

Oh, I forgot to return the trained model. I have fixed it. @Corey-Zumar
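
The failure mode described here can be reproduced without PyTorch. The sketch below is illustrative only: `TinyModel`, `train_without_return`, and `train_with_return` are hypothetical names, not code from this PR. A training function that falls off the end returns None, and calling None as if it were a model produces the same TypeError as in the traceback.

```python
class TinyModel(object):
    """Hypothetical stand-in for a trained model; callable like an nn.Module."""

    def __call__(self, x):
        return x + 1


def train_without_return(model):
    # A training loop would go here. With no return statement,
    # the caller receives None instead of the model.
    pass


def train_with_return(model):
    # A training loop would go here.
    return model


def predict(model, inputs):
    return [str(model(x)) for x in inputs]


broken = train_without_return(TinyModel())
try:
    predict(broken, [1, 2])
    error_message = None
except TypeError as err:
    error_message = str(err)  # "'NoneType' object is not callable"

fixed = train_with_return(TinyModel())
predictions = predict(fixed, [1, 2])  # ['2', '3']
```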

Corey-Zumar

@haofanwang The container launched by the integration test crashes with the following stack trace:

Traceback (most recent call last):
  File "/container/pytorch_container.py", line 103, in <module>
    model = PyTorchContainer(model_path, input_type)
  File "/container/pytorch_container.py", line 36, in __init__
    self.model = load_pytorch_model(torch_model_path)
  File "/container/pytorch_container.py", line 22, in load_pytorch_model
    model = torch.load(model_path)
  File "/opt/conda/lib/python2.7/site-packages/torch/serialization.py", line 231, in load
    return _load(f, map_location, pickle_module)
  File "/opt/conda/lib/python2.7/site-packages/torch/serialization.py", line 379, in _load
    result = unpickler.load()
AttributeError: 'module' object has no attribute 'BasicNN'

Before submitting this for another round of reviews, please make sure that the integration test passes. You can test it by running python integration-tests/deploy_pytorch_models.py from the clipper root directory.
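
The AttributeError in the trace above is typical of pickle-based loading: the traceback shows torch.load going through an unpickler, and pickle stores a class by reference (module path plus class name) rather than by value. The sketch below reproduces the effect with the standard library only. The module name `training_script` is a hypothetical stand-in for the test script, and this illustrates the cause rather than the fix that was eventually applied in this PR.

```python
import pickle
import sys
import types

# Simulate the training script as a module that defines the model class.
training_script = types.ModuleType("training_script")
exec("class BasicNN(object):\n    pass\n", training_script.__dict__)
sys.modules["training_script"] = training_script

# Pickling an instance stores only a reference to training_script.BasicNN.
payload = pickle.dumps(training_script.BasicNN())

# Simulate the container, where the class definition is not available.
saved_class = training_script.BasicNN
del training_script.BasicNN
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as err:
    load_error = err  # the class cannot be found on the module

# Once the class is defined under the same module path, loading works again.
training_script.BasicNN = saved_class
restored = pickle.loads(payload)
```

In practice this means the loading side needs access to the same class definition under the same module path, or the saved artifact must not depend on the class at all.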

Corey-Zumar · 2017-12-09T08:26:35Z

integration-tests/deploy_pytorch_models.py

+def train(model):
+  model.train()
+  optimizer = optim.SGD(model.parameters(), lr=0.001)
+  for epoch in range(2000):


Let's only train the model for 10 epochs

haofanwang · 2017-12-15T01:44:41Z

@Corey-Zumar Integration test has passed.

dcrankshaw self-requested a review November 21, 2017 20:11

dcrankshaw added the status: needs review label Nov 21, 2017

Corey-Zumar requested changes Nov 24, 2017

Corey-Zumar added status: needs revision and removed status: needs review labels Nov 24, 2017

dcrankshaw requested changes Nov 27, 2017

dcrankshaw mentioned this pull request Nov 27, 2017

Querying Clipper with an image #325

Closed

dcrankshaw added status: needs review and removed status: needs revision labels Nov 29, 2017

Corey-Zumar requested changes Dec 7, 2017

Corey-Zumar added type: bug and removed status: needs review labels Dec 7, 2017

Corey-Zumar requested changes Dec 7, 2017

Corey-Zumar added status: needs review and removed type: bug labels Dec 9, 2017

Corey-Zumar requested changes Dec 9, 2017

Corey-Zumar added status: needs revision and removed status: needs review labels Dec 10, 2017

Corey-Zumar closed this Dec 16, 2017

Corey-Zumar mentioned this pull request Dec 16, 2017

Support for PyTorch model #346

Merged

