change set_num_replicas in docker_container_manager #616

Merged
simon-mo merged 5 commits into ucbrise:develop from Evan-JH-Kim:develop
Mar 28, 2019

@Evan-JH-Kim
Contributor

This PR addresses the following error:

[docker_container_manager.py:353] [default-cluster] Found 0 replicas for sample-model:1. Adding 1
Traceback (most recent call last):
File "", line 1, in
File "/Users/lineplus/anaconda3/envs/dory_py3.6/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 620, in deploy_model
num_replicas=num_replicas)
File "/Users/lineplus/anaconda3/envs/dory_py3.6/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 279, in deploy_model
self.set_num_replicas(name, version, input_type, image, num_replicas)
File "/Users/lineplus/anaconda3/envs/dory_py3.6/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 363, in set_num_replicas
self.docker_client.api.inspect_container(name).get("State").get("Health").get("Status") != "healthy":
AttributeError: 'NoneType' object has no attribute 'get'

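This error occurs because the values returned by container.attrs.get("State") and by self.docker_client.api.inspect_container(name) can contain None entries (here, in the "State" / "Health" lookup) until the container has had enough time to start.
Adding a delay before the check resolves the problem.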
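
As an illustration of the failure mode described above, here is a minimal sketch of a None-safe lookup of the nested health status. It is not the code merged in this PR. The helper names `health_status` and `is_healthy` are hypothetical, and the sample dictionaries only imitate the shape of an inspect result (a "State" entry that may or may not contain "Health").

```python
from typing import Any, Optional


def health_status(inspect_result: Optional[dict]) -> Optional[str]:
    """Return State -> Health -> Status, or None if any level is missing."""
    node: Any = inspect_result
    for key in ("State", "Health", "Status"):
        # Stop as soon as a level is absent instead of calling .get on None.
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node if isinstance(node, str) else None


def is_healthy(inspect_result: Optional[dict]) -> bool:
    """True only when the health status is present and equals 'healthy'."""
    return health_status(inspect_result) == "healthy"


# Hypothetical inspect results, shaped like the data used in the traceback.
just_started = {"State": {"Status": "created"}}  # no "Health" entry yet
ready = {"State": {"Status": "running", "Health": {"Status": "healthy"}}}

print(is_healthy(None))          # missing result: treated as not healthy
print(is_healthy(just_started))  # missing "Health": treated as not healthy
print(is_healthy(ready))
```

With a helper like this, a caller could keep polling while `is_healthy(...)` is false rather than raising AttributeError on the first incomplete result.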

…teError: 'NoneType' object has no attribute 'get'
@AmplabJenkins

Can one of the admins verify this patch?

@withsmilo
Collaborator

Jenkins test this please

@simon-mo
Contributor

Jenkins test this please

@simon-mo
Contributor

@withsmilo I forgot to make you an admin in Jenkins; you should be all set now.

@simon-mo
Contributor

Jenkins ok to test

@withsmilo
Collaborator

@simon-mo: Thanks! But Travis CI is generating error logs now. :(

until docker pull clippertesting/query_frontend:993336adf0; do sleep 5; done
until docker pull clippertesting/management_frontend:993336adf0; do sleep 5; done
until docker pull clippertesting/frontend-exporter:993336adf0; do sleep 5; done
until docker pull clippertesting/noop-container:993336adf0; do sleep 5; done
until docker pull clippertesting/python-closure-container:993336adf0; do sleep 5; done
Error response from daemon: manifest for clippertesting/query_frontend:993336adf0 not found
Error response from daemon: manifest for clippertesting/frontend-exporter:993336adf0 not found
Error response from daemon: manifest for clippertesting/python-closure-container:993336adf0 not found
Error response from daemon: manifest for clippertesting/management_frontend:993336adf0 not found
...

@simon-mo
Contributor

@withsmilo Yes, this is because Travis is waiting on Jenkins to push the images. This is fine: the images built by Jenkins are tagged with the commit hash, so we can always go to the Travis UI and restart the Travis build.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1777/
Test FAILed.

@simon-mo
Contributor

It seems the multi-tenancy tests have failed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1782/
Test FAILed.

@withsmilo
Collaborator

Jenkins test this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1789/
Test FAILed.

@simon-mo
Contributor

simon-mo commented Feb 1, 2019

I believe there's a bug somewhere: this change fails the multi-tenancy tests.

To reproduce it locally, run pip install -e . inside the clipper_admin directory, then run the test with cd integration-tests; python multi_tenancy_test.py

@simon-mo
Contributor

simon-mo commented Feb 1, 2019

Log

19-02-01:05:42:05 INFO [test_utils.py:75] Creating DockerContainerManager
19-02-01:05:42:05 INFO [test_utils.py:94] Starting up Docker cluster multi-tenancy-1-187
19-02-01:05:42:05 INFO [test_utils.py:105] Starting Clipper
19-02-01:05:42:05 INFO [docker_container_manager.py:151] [multi-tenancy-1-187] Starting managed Redis instance in Docker
19-02-01:05:42:12 INFO [docker_container_manager.py:229] [multi-tenancy-1-187] Metric Configuration Saved at /tmp/tmpKMeWmu.yml
19-02-01:05:42:13 INFO [clipper_admin.py:143] [multi-tenancy-1-187] Clipper is running
19-02-01:05:42:14 INFO [test_utils.py:75] Creating DockerContainerManager
19-02-01:05:42:14 INFO [test_utils.py:94] Starting up Docker cluster multi-tenancy-2-6785
19-02-01:05:42:14 INFO [test_utils.py:105] Starting Clipper
19-02-01:05:42:14 INFO [docker_container_manager.py:151] [multi-tenancy-2-6785] Starting managed Redis instance in Docker
19-02-01:05:42:20 INFO [docker_container_manager.py:229] [multi-tenancy-2-6785] Metric Configuration Saved at /tmp/tmp7b6Aa0.yml
19-02-01:05:42:22 INFO [clipper_admin.py:143] [multi-tenancy-2-6785] Clipper is running
19-02-01:05:42:23 INFO [clipper_admin.py:220] [multi-tenancy-1-187] Application testapp0-model was successfully registered
19-02-01:05:42:23 INFO [deployer_utils.py:41] Saving function to /tmp/tmp1cYzDkclipper
19-02-01:05:42:23 INFO [deployer_utils.py:51] Serialized and supplied predict function
19-02-01:05:42:23 INFO [python.py192] Python closure saved
19-02-01:05:42:23 INFO [python.py198] Using Python 2 base image
19-02-01:05:42:23 INFO [clipper_admin.py:472] [multi-tenancy-1-187] Building model Docker image with model data from /tmp/tmp1cYzDkclipper
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] Step 1/2 : FROM clippertesting/python-closure-container:2e57b899a3
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] ---> ac6d18edda4e
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] Step 2/2 : COPY /tmp/tmp1cYzDkclipper /model/
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] ---> 1b3f714b6bef
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] Successfully built 1b3f714b6bef
19-02-01:05:42:24 INFO [clipper_admin.py:477] [multi-tenancy-1-187] Successfully tagged multi-tenancy-1-187-testapp0-model:1
19-02-01:05:42:24 INFO [clipper_admin.py:479] [multi-tenancy-1-187] Pushing model Docker image to multi-tenancy-1-187-testapp0-model:1
19-02-01:05:43:05 INFO [docker_container_manager.py:353] [multi-tenancy-1-187] Found 0 replicas for testapp0-model:1. Adding 1
19-02-01:05:43:07 INFO [clipper_admin.py:656] [multi-tenancy-1-187] Successfully registered model testapp0-model:1
19-02-01:05:43:07 INFO [clipper_admin.py:574] [multi-tenancy-1-187] Done deploying model testapp0-model:1.
19-02-01:05:43:07 INFO [clipper_admin.py:282] [multi-tenancy-1-187] Model testapp0-model is now linked to application testapp0-model
19-02-01:05:43:07 INFO [clipper_admin.py:220] [multi-tenancy-2-6785] Application testapp0-model was successfully registered
19-02-01:05:43:07 INFO [deployer_utils.py:41] Saving function to /tmp/tmp5Vw8wMclipper
19-02-01:05:43:07 INFO [deployer_utils.py:51] Serialized and supplied predict function
19-02-01:05:43:07 INFO [python.py192] Python closure saved
19-02-01:05:43:07 INFO [python.py198] Using Python 2 base image
19-02-01:05:43:07 INFO [clipper_admin.py:472] [multi-tenancy-2-6785] Building model Docker image with model data from /tmp/tmp5Vw8wMclipper
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] Step 1/2 : FROM clippertesting/python-closure-container:2e57b899a3
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] ---> ac6d18edda4e
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] Step 2/2 : COPY /tmp/tmp5Vw8wMclipper /model/
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] ---> a17bd6c1aad9
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] Successfully built a17bd6c1aad9
19-02-01:05:43:08 INFO [clipper_admin.py:477] [multi-tenancy-2-6785] Successfully tagged multi-tenancy-2-6785-testapp0-model:1
19-02-01:05:43:08 INFO [clipper_admin.py:479] [multi-tenancy-2-6785] Pushing model Docker image to multi-tenancy-2-6785-testapp0-model:1
19-02-01:05:43:55 INFO [docker_container_manager.py:353] [multi-tenancy-2-6785] Found 0 replicas for testapp0-model:1. Adding 1
19-02-01:05:44:00 INFO [clipper_admin.py:656] [multi-tenancy-2-6785] Successfully registered model testapp0-model:1
19-02-01:05:44:00 INFO [clipper_admin.py:574] [multi-tenancy-2-6785] Done deploying model testapp0-model:1.
19-02-01:05:44:00 INFO [clipper_admin.py:282] [multi-tenancy-2-6785] Model testapp0-model is now linked to application testapp0-model

'{"query_id":0,"output":0.6000000000000001,"default":false}', 5.698000 ms
'{"query_id":0,"output":"None","default":true,"default_explanation":"No connected models found for query"}', 8.068000 ms
Traceback (most recent call last):
File "/clipper/integration-tests/multi_tenancy_test.py", line 78, in test(kubernetes=False)
File "/clipper/integration-tests/multi_tenancy_test.py", line 24, in test
	assert not res_2['default']
AssertionError

@withsmilo
Collaborator

withsmilo commented Feb 2, 2019

@simon-mo :
Thank you for your help. After several runs, I noticed that the multi-tenancy test is unstable.
To solve this problem, we need to add time.sleep(10) as shown in the code below. Should I open a PR to fix it?

https://github.com/ucbrise/clipper/blob/develop/integration-tests/multi_tenancy_test.py#L14-L27

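import time
from random import randint

# create, deploy_ and predict_ are helpers defined outside this excerpt.
def test(kubernetes):
    # Start two independent Clipper clusters with randomized names.
    conn_1 = create('multi-tenancy-1-{}'.format(randint(1,9999)), use_kubernetes=kubernetes)
    conn_2 = create('multi-tenancy-2-{}'.format(randint(1,9999)), use_kubernetes=kubernetes)

    # Deploy the same model to each cluster.
    deploy_(conn_1, use_kubernetes=kubernetes)
    deploy_(conn_2, use_kubernetes=kubernetes)

    # Give both model containers time to connect before querying.
    time.sleep(10)  # <---- added!

    res_1 = predict_(conn_1.get_query_addr(), [.1, .2, .3])
    res_2 = predict_(conn_2.get_query_addr(), [.1, .2, .3])
    # A 'default' response means no model served the query.
    assert not res_1['default']
    assert not res_2['default']

    conn_1.stop_all()
    conn_2.stop_all()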
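
A fixed time.sleep(10) can still be too short on a slow machine and is wasted time on a fast one. As a possible alternative, here is a minimal sketch that polls until the response is no longer a default one. It is not the fix that was applied to the test. The function name `wait_for_model` and its parameters are hypothetical, and the fake responses below only imitate the two JSON bodies shown in the log above.

```python
import time
from typing import Callable


def wait_for_model(predict: Callable[[], dict],
                   timeout: float = 30.0,
                   interval: float = 0.5,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call predict() until it returns a non-default response or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        res = predict()
        # A missing 'default' key is treated as a default (not ready) response.
        if not res.get("default", True):
            return res
        if time.monotonic() >= deadline:
            raise TimeoutError("model is still returning default responses")
        sleep(interval)


# Fake query results: one default response, then a real prediction.
responses = iter([
    {"query_id": 0, "output": "None", "default": True,
     "default_explanation": "No connected models found for query"},
    {"query_id": 0, "output": 0.6000000000000001, "default": False},
])

# sleep is replaced with a no-op so the demonstration finishes immediately.
result = wait_for_model(lambda: next(responses), sleep=lambda _: None)
print(result)
```

In the test above, this could be used as `res_2 = wait_for_model(lambda: predict_(conn_2.get_query_addr(), [.1, .2, .3]))`, assuming predict_ returns the parsed response as a dict, which the `res_2['default']` lookup suggests.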

@simon-mo
Contributor

simon-mo commented Feb 5, 2019

Fix added at #626, will rebase after that.

@withsmilo
Collaborator

@Evan-JH-Kim, please merge the latest develop branch into this PR.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1816/
Test PASSed.

@withsmilo
Collaborator

Jenkins ok to test

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1817/
Test PASSed.

@Evan-JH-Kim
Contributor Author

Travis CI still fails. What's wrong with it?

@withsmilo
Collaborator

@Evan-JH-Kim: All Travis CI tests have passed.
@simon-mo : Please review this PR.

@withsmilo requested a review from RehanSD March 28, 2019 00:34

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Clipper-PRB/1854/
Test PASSed.

@simon-mo merged commit 4d2ef58 into ucbrise:develop Mar 28, 2019
rkooo567 pushed a commit to rkooo567/clipper that referenced this pull request Apr 27, 2019
* change set_num_replicas in docker_container_manager, due to - AttributeError: 'NoneType' object has no attribute 'get'

* revise set_num_replicas logic

* changed 'container' to 'inspect_container'
rkooo567 pushed a commit to rkooo567/clipper that referenced this pull request Apr 28, 2019
* change set_num_replicas in docker_container_manager, due to - AttributeError: 'NoneType' object has no attribute 'get'

* revise set_num_replicas logic

* changed 'container' to 'inspect_container'