intermittent timeouts when creating new deployments #70

@elmiko

occasionally i see a timeout when openshifter tries to contact an instance over ssh during the creation process, which causes the whole process to fail. if i rerun the create command, it usually proceeds as normal. i think this happens because the vm instances sometimes take longer than expected to start accepting ssh connections. here is the output log from a failed run:

INFO:Provisioner(gce):Validating master existence
INFO:Provisioner(gce):Getting node
INFO:Provisioner(gce):Master exists (35.190.201.132)
INFO:Provisioner(gce):Validating node node-0 existence
INFO:Provisioner(gce):Getting node
INFO:Provisioner(gce):Node node-0 exists (35.189.199.255)
INFO:Provisioner(gce):Validating node node-1 existence
INFO:Provisioner(gce):Getting node
INFO:Provisioner(gce):Node node-1 exists (35.195.218.93)
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_7.4)
INFO:paramiko.transport:Authentication (publickey) successful!
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_7.4)
INFO:paramiko.transport:Authentication (publickey) successful!
Traceback (most recent call last):
  File "../main.py", line 14, in <module>
    openshifter.cli.cli()
  File "/usr/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/root/openshifter/cli.py", line 47, in create
    openshifter.create()
  File "/root/openshifter/__init__.py", line 50, in create
    self.install()
  File "/root/openshifter/__init__.py", line 34, in install
    features.execute("pre_install", self.deployment, self.cluster)
  File "/root/features/__init__.py", line 39, in execute
    ssh_client = Ssh(deployment, cluster)
  File "/root/openshifter/ssh.py", line 21, in __init__
    self.connect("node", node.public_address)
  File "/root/openshifter/ssh.py", line 26, in connect
    self.clients[address].connect()
  File "/root/openshifter/ssh.py", line 83, in connect
    allow_agent=False, look_for_keys=False)
  File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 357, in connect
    raise NoValidConnectionsError(errors)
paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 35.195.218.93

i am not able to reproduce this consistently, but i think adding a delay or a retry around these ssh connection attempts might help.
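to make the retry idea concrete, here is a minimal sketch of a generic helper, not a patch against the actual openshifter code. the name `retry_connect` and all of its parameters are hypothetical. it assumes the failure surfaces as an `OSError`; as far as i know paramiko's `NoValidConnectionsError` derives from `socket.error`, which is an alias of `OSError` in python 3, but that is worth verifying against the installed paramiko version.

```python
import time


def retry_connect(connect, attempts=5, delay=2.0, backoff=2.0,
                  retry_on=(OSError,), sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- zero-argument callable that opens the connection
    attempts -- maximum number of calls to connect()
    delay    -- seconds to wait after the first failure
    backoff  -- multiplier applied to the wait after each failure
    retry_on -- exception types treated as "instance not ready yet"
    sleep    -- injectable sleep function, mainly for testing
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(wait)
            wait *= backoff
```

going by the traceback, the call in `ssh.py` that raises is `self.clients[address].connect()`, so a hypothetical use would be `retry_connect(lambda: self.clients[address].connect())`. the exponential backoff is a design choice for the sketch; a fixed delay would probably work just as well for instances that only need a few extra seconds to open port 22.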

for reference i am running this image:

docker.io/osevg/openshifter        latest              eaffb778a868        2 weeks ago         846.2 MB
