Skip to content

Add watchdog functionalities#600

Merged
mssun merged 2 commits intoapache:masterfrom
ya0guang:watchdog
Jan 19, 2022
Merged

Add watchdog functionalities#600
mssun merged 2 commits intoapache:masterfrom
ya0guang:watchdog

Conversation

@ya0guang
Copy link
Member

Description

Clients can now cancel a task, even when the task is executing;
Teaclave can detect dangling tasks, which means the executor can send heartbeat packets for liveness report.

Currently Client Rust API changes are missing.

Fixes # (issue)

Type of change (select or add applied and delete the others)

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • API change with a documentation update
  • Additional test coverage
  • Code cleanup or just sync with upstream third-party crates

How has this been tested?

Checklist

  • Fork the repo and create your branch from master.
  • If you've added code that should be tested, add tests.
  • If you've changed APIs, update the documentation.
  • Ensure the tests pass (see CI results).
  • Make sure your code lints/format.

@ya0guang ya0guang requested a review from mssun January 18, 2022 18:44
Copy link
Contributor

@AI-Memory AI-Memory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of commits in this PR looks like intermediate patches, so it would be better if squash some of them with more meaningful commit comments, Thanks.

- executor now heartbeats to send its current status and receive
command from scheduler
- scheduler can now find out lost executor(s) and monitor their status,
and fail the corresponding task(s) when necessary
- users can cancel a task, new API has been added
- new tests for task cancelation and dangling task detection
./teaclave_authentication_service &
./teaclave_storage_service &
sleep 3 # wait for authentication and storage service
sleep 10 # wait for authentication and storage service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it may suggest a new feature similar to readiness probe of k8s.

Copy link
Member Author

@ya0guang ya0guang Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think probing may be added in the following commits

# or something unexpected happens,
# you may uncomment the following lines to cancel the task
# time.sleep(3)
# print("[+] canceling task")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code comment could be removed if it is incomplete.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can be uncomment to cancel an executing task. It means to demo calling the new cancel_task API and provide users an example. This example will run for a long while, so users may want to know how to cancel a task running for a long period without response.

# or something unexpected happens,
# you may uncomment the following lines to cancel the task
# time.sleep(3)
# print("[+] canceling task")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code can be uncomment to cancel an executing task. It means to demo calling the new cancel_task API and provide users an example. This example will run for a long while, so users may want to know how to cancel a task running for a long period without response.

./teaclave_authentication_service &
./teaclave_storage_service &
sleep 3 # wait for authentication and storage service
sleep 10 # wait for authentication and storage service
Copy link
Member Author

@ya0guang ya0guang Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I think probing may be added in the following commits

@mssun mssun self-requested a review January 19, 2022 22:15
Copy link
Member

@mssun mssun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@mssun mssun merged commit 8fdac8c into apache:master Jan 19, 2022
@ya0guang ya0guang deleted the watchdog branch April 29, 2022 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants