-
Notifications
You must be signed in to change notification settings - Fork 243
Prevent parallel controller pipelines run causing two master replicas #1066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
helix-core/src/main/java/org/apache/helix/controller/ControllerLeaderSession.java
Outdated
Show resolved
Hide resolved
|
Got it. Thanks for pointing that out. I will get a better name for it.
Maybe "ZkSessionWrapper". If you have a better name, please let me know.
…On Fri, Jun 5, 2020 at 12:52 PM Lei Xia ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In helix-core/src/main/java/org/apache/helix/HelixManager.java
<#1066 (comment)>:
> @@ -425,6 +426,19 @@ void addExternalViewChangeListener(org.apache.helix.ExternalViewChangeListener l
*/
boolean isLeader();
+ /**
+ * Checks if the cluster manager is leader and sets its ZK session in
+ * ***@***.*** ControllerLeaderSession}.
+ *
+ * @param controllerLeaderSession To include ZK session of the cluster manager in return
+ *
+ * @return true if this is a controller and a leader of the cluster. Zk session of the cluster
+ * manager is set in controllerLeaderSession
+ */
+ default boolean isLeader(ControllerLeaderSession controllerLeaderSession) {
This could be confusion because there is no "leader" concept for a Helix
manager. Leader only applies to a controller instance, but HelixManager is
a broader concept here.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1066 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABHSRCM4ZMRIF3VAWDW7VQTRVFEHHANCNFSM4NU6ODJA>
.
|
helix-core/src/main/java/org/apache/helix/controller/stages/MessageDispatchStage.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/stages/MessageDispatchStage.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/BaseDataAccessor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/HelixDataAccessor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java
Outdated
Show resolved
Hide resolved
5147503 to
48e8a5f
Compare
helix-core/src/main/java/org/apache/helix/BaseDataAccessor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/stages/MessageDispatchStage.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixManager.java
Outdated
Show resolved
Hide resolved
helix-core/src/test/java/org/apache/helix/common/ZkTestBase.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/impl/client/DedicatedZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/test/java/org/apache/helix/zookeeper/impl/client/TestRawZkClient.java
Outdated
Show resolved
Hide resolved
|
The current pull request has one important assumption if I understand correctly. Controller assignment to clusters in super cluster change changes when controller has a session expiration and establishing a new session. The thing is that is there any other event can cause controller assignment to cluster change without controller going through session expiration. For example, new cluster are added? Or any others? |
ab57dd4 to
c0357b5
Compare
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
We add a new interface SessionAwareZkWriteData to make it generic: representing data is session aware. In future, we may implement this interface to have a public one like SessionAwareZNRecord . With this new interface, ZkClient doesn’t need to change: treating SessionAwareZkWriteData as generic one. How about this? @lei-xia @jiajunwang @kaisun2000 |
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/stages/AttributeName.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/datamodel/SessionAwareZNRecord.java
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/datamodel/SessionAwareZNRecord.java
Show resolved
Hide resolved
helix-core/src/test/java/org/apache/helix/controller/stages/DummyClusterManager.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/datamodel/SessionAwareZNRecord.java
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/stages/MessageDispatchStage.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good to me.
Raise one question about Pause signal added immediately with a lead controller switch case? Can we double check?
If this is indeed a concern. Let us file another ticket to track it.
Also resolve JJ's comment, it seems minor at this point.
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
Show resolved
Hide resolved
f3ac411 to
ef5f5fb
Compare
jiajunwang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
Sync-ed with @pkuwm offline. As I raised, there seems to be another race condition in this code. This is not introduced by this pull request, it exists all the time. Basically, it is like this:
Let us sync about shall we fix it here, or in another pull, or what should we do? I will create an issue to track it. #1442 is tracking this newly raised issue. |
@kaisun2000 Thanks for creating the issue for tracking it. If it is an issue, we should get it in another PR. Let's discuss. |
|
This PR is ready to be merged, approved by @kaisun2000 @jiajunwang There is a case that after controller leader switches, pipelines of both old Helix controller leader and new leader are running in parallel. The commit addresses this issue by blocking non-leader controller to send messages to ZK. |
Issues
Fixes #1027
Description
There is a case that after controller leader switches, pipelines of both old Helix controller leader and new leader are running in parallel. Different assignment decisions are sent to different participants so there are double masters for a single partition.
The PR addresses this issue by blocking non-leader controller to send messages.
Tests
The following tests are written for this issue:
testNoMessageSentOnControllerLosesLeadershipTestRawZkClient.testAsyncCreateByInvalidSessionThe following is the result of the "mvn test" command on the appropriate module:
Commits
Documentation (Optional)
(Link the GitHub wiki you added)
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)