Heartbeat issue with Topic #883

@andysCaplin

C++ compiler version: g++ (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)
Hazelcast Cpp client version: 4.0.1
Hazelcast server version: 4.0
Number of the clients: 8
Cluster size, i.e. the number of Hazelcast cluster members: 9
OS version (Windows/Linux/OSX): Linux

Please attach relevant logs and files for client and server side.
Server side log

2021-06-10 16:47:57.783  INFO 7057 --- [ached.thread-13] c.h.internal.nio.tcp.TcpIpConnection     : [172.20.0.239]:5701 [dev] [4.0.3] Connection[id=8, /127.0.0.1:5701->/127.0.0.1:60152, qualifier=null, endpoint=[127.0.0.1]:60152, alive=false, connectionType=CPP] closed. Reason: Client heartbeat is timed out, closing connection to Connection[id=8, /127.0.0.1:5701->/127.0.0.1:60152, qualifier=null, endpoint=[127.0.0.1]:60152, alive=true, connectionType=CPP]. Now: 2021-06-10 16:47:57.782. LastTimePacketReceived: 2021-06-10 16:47:26.273
2021-06-10 16:47:57.786  INFO 7057 --- [_hopper.event-1] c.c.d.a.ClientCleanupService             : Client <11f1dccf-6aa7-493f-a21b-b95b548a9b06> has connected: ClientEvent{uuid='11f1dccf-6aa7-493f-a21b-b95b548a9b06', eventType=CONNECTED, address=/172.20.0.239:35984, clientType=CPP, name='hz.client_1', attributes=[]}
2021-06-10 16:47:57.786  INFO 7057 --- [ration.thread-1] c.h.c.i.p.t.AuthenticationMessageTask    : [172.20.0.239]:5701 [dev] [4.0.3] Received auth from Connection[id=10, /172.20.0.239:5701->/172.20.0.239:35984, qualifier=null, endpoint=[172.20.0.239]:35984, alive=true, connectionType=CPP], successfully authenticated, clientUuid: 11f1dccf-6aa7-493f-a21b-b95b548a9b06, client version: 4.0.1
2021-06-10 16:47:57.788  INFO 7057 --- [_hopper.event-3] c.h.client.impl.ClientEndpointManager    : [172.20.0.239]:5701 [dev] [4.0.3] Destroying ClientEndpoint{connection=Connection[id=8, /127.0.0.1:5701->/127.0.0.1:60152, qualifier=null, endpoint=[127.0.0.1]:60152, alive=false, connectionType=CPP], clientUuid='11f1dccf-6aa7-493f-a21b-b95b548a9b06, authenticated=true, clientVersion=4.0.1, creationTime=1623340045794, latest clientAttributes=null}
2021-06-10 16:47:57.788  INFO 7057 --- [_hopper.event-1] c.c.d.a.ClientCleanupService             : Client <11f1dccf-6aa7-493f-a21b-b95b548a9b06> has disconnected: ClientEvent{uuid='11f1dccf-6aa7-493f-a21b-b95b548a9b06', eventType=DISCONNECTED, address=/127.0.0.1:60152, clientType=CPP, name='hz.client_1', attributes=[]}
2021-06-10 16:47:57.788  INFO 7057 --- [pool-2-thread-2] c.c.datasource.licence.LicenceService    : Server Licence request: ServerRequest{peerUUID='11f1dccf-6aa7-493f-a21b-b95b548a9b06', requestType='License', productName='transformer', timestamp='1623340077', metaData={label=Transformer-2}}
2021-06-10 16:47:57.788  INFO 7057 --- [_hopper.event-1] c.c.d.a.ClientCleanupService             : Check remove peer <11f1dccf-6aa7-493f-a21b-b95b548a9b06> port 60152 ports [35984]

The client side logs the disconnect as initiated by the server.
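For reference, the gap between `LastTimePacketReceived` and `Now` in the first log line can be worked out directly. The sketch below is only arithmetic on those two timestamps; the helper names `to_millis` and `silence_millis` are made up for illustration, and it assumes both stamps fall on the same day and carry exactly three fractional digits.

```cpp
#include <cstdio>
#include <string>

// Convert an "HH:MM:SS.mmm" time-of-day string to milliseconds since midnight.
// Returns -1 if the string does not match that layout.
long long to_millis(const std::string &time_of_day) {
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(time_of_day.c_str(), "%d:%d:%d.%d", &h, &m, &s, &ms) != 4) {
        return -1;
    }
    return ((h * 60LL + m) * 60LL + s) * 1000LL + ms;
}

// Silence between two time-of-day stamps on the same day, in milliseconds.
long long silence_millis(const std::string &last_packet, const std::string &now) {
    return to_millis(now) - to_millis(last_packet);
}
```

Calling `silence_millis("16:47:26.273", "16:47:57.782")` gives 31509 ms, so the server had seen nothing from the client for about 31.5 seconds, just past the configured 30-second limit.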

Expected behaviour

The clients should keep sending heartbeats.

Actual behaviour

Heartbeats stop being sent when the Topic code is in use.

Steps to reproduce the behaviour

I've added a basic pub/sub Topic feature to 3 of the clients, based on the example code.
The 1st of the 3 clients to come up never has an issue.
The 1st client publishes an initial info message and then regular updates to the topic.
The 2nd and 3rd clients subscribe to the topic and publish only the initial info message.
The 2nd and 3rd clients disconnect from the server every 40 seconds.
The thread that handles the updates on the 2nd and 3rd clients works fine: it receives and handles the updates as they arrive, i.e. it is not stuck.
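To make the message flow in these steps concrete, here is a minimal in-process sketch with no Hazelcast involved. `LocalTopic`, `Client`, `InfoMessage` and `run_scenario` are hypothetical stand-ins, delivery is synchronous, and it assumes the 1st client does not subscribe (the steps do not say either way). Subscribers drop their own messages by comparing UUIDs, as the listener in the Subscribe code does.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the message published on the topic.
struct InfoMessage {
    std::string sender_uuid;
    std::string payload;
};

// Hypothetical in-process topic: publish() calls every listener synchronously.
class LocalTopic {
public:
    using Listener = std::function<void(const InfoMessage &)>;
    void add_message_listener(Listener listener) { listeners_.push_back(std::move(listener)); }
    void publish(const InfoMessage &message) const {
        for (const auto &listener : listeners_) listener(message);
    }
private:
    std::vector<Listener> listeners_;
};

// A client that optionally subscribes and keeps messages sent by other clients.
class Client {
public:
    Client(std::string uuid, LocalTopic &topic, bool subscribe)
        : uuid_(std::move(uuid)), topic_(topic) {
        if (subscribe) {
            topic_.add_message_listener([this](const InfoMessage &message) {
                // Ignore messages this client published itself.
                if (message.sender_uuid != uuid_) received_.push_back(message.payload);
            });
        }
    }
    Client(const Client &) = delete;             // the listener captures `this`
    Client &operator=(const Client &) = delete;
    void publish(const std::string &payload) { topic_.publish(InfoMessage{uuid_, payload}); }
    std::size_t received_count() const { return received_.size(); }
private:
    std::string uuid_;
    LocalTopic &topic_;
    std::vector<std::string> received_;
};

struct ScenarioResult {
    std::size_t second_received;
    std::size_t third_received;
};

// Three clients: each publishes an initial info message, then only the 1st
// publishes `updates` further messages. Clients 2 and 3 are the subscribers.
ScenarioResult run_scenario(int updates) {
    LocalTopic topic;
    Client first("client-1", topic, false);
    Client second("client-2", topic, true);
    Client third("client-3", topic, true);
    first.publish("info");
    second.publish("info");
    third.publish("info");
    for (int i = 0; i < updates; ++i) first.publish("update");
    return ScenarioResult{second.received_count(), third.received_count()};
}
```

In this model each subscriber sees the two info messages from its peers plus every update from the 1st client, so the 2nd and 3rd clients are almost purely receivers on the topic.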

The server configuration is

<properties>
    <property name="hazelcast.heartbeat.failuredetector.type">deadline</property>
    <property name="hazelcast.heartbeat.interval.seconds">5</property>
    <property name="hazelcast.max.no.heartbeat.seconds">30</property>
    <property name="hazelcast.client.heartbeat.interval">5000</property>
    <property name="hazelcast.client.heartbeat.timeout">30000</property>
    <property name="hazelcast.client.max.no.heartbeat.seconds">30</property>
</properties>
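As a rough illustration of what these values imply, the sketch below models a deadline-style check: heartbeats every 5000 ms, and a connection treated as dead once more than 30000 ms pass without a packet. `DeadlineDetector`, the strict `>` comparison, and the periodic check loop are assumptions made for illustration, not Hazelcast internals.

```cpp
#include <cstdint>

// Hypothetical deadline check: dead once the silence exceeds the timeout.
struct DeadlineDetector {
    std::int64_t timeout_ms;
    std::int64_t last_packet_ms;

    void on_packet(std::int64_t now_ms) { last_packet_ms = now_ms; }
    bool timed_out(std::int64_t now_ms) const {
        return now_ms - last_packet_ms > timeout_ms;
    }
};

// Heartbeats arrive every interval_ms from t = 0 up to and including
// last_heartbeat_ms, then stop. The detector is polled every check_period_ms
// (heartbeats are only observed on poll ticks). Returns the time of the first
// poll that reports a timeout, or -1 for non-positive periods.
std::int64_t first_timeout_check(std::int64_t interval_ms, std::int64_t timeout_ms,
                                 std::int64_t last_heartbeat_ms,
                                 std::int64_t check_period_ms) {
    if (interval_ms <= 0 || check_period_ms <= 0) return -1;
    DeadlineDetector detector{timeout_ms, 0};
    for (std::int64_t now = 0;; now += check_period_ms) {
        if (now <= last_heartbeat_ms && now % interval_ms == 0) detector.on_packet(now);
        if (detector.timed_out(now)) return now;
    }
}
```

With the configured values and a one-second poll, `first_timeout_check(5000, 30000, 10000, 1000)` returns 41000: after the last heartbeat at 10 s, six expected heartbeats (15 s through 40 s) go missing before the connection is declared dead.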

The pub/sub code is

Subscribe

    try {
        // Function-local statics: the topic proxy is fetched once, on first use.
        static auto topic = hz_ptr->get_topic(topicname).get();
        static hazelcast::client::topic::listener topicListener;
        topicListener.
                on_received([](hazelcast::client::topic::message &&message) {
                    auto clusterMsg = message.get_message_object().get<CaplinCluster>().value();
                    // Skip messages that this client published itself.
                    if ( uuid.compare(clusterMsg.i_uuid) != 0 ) {
                        ds_msg_t *msg = (ds_msg_t *)calloc(1, sizeof(*msg));
                        msg->type = clusterMsg.i_type;
                        msg->len  = clusterMsg.i_len;
                        msg->msg  = clusterMsg.i_msg;
                        cluster_cb(clusterMsg.i_label.c_str(), msg);
                    }
                });
        // Block until the listener registration completes.
        topic->add_message_listener(std::move(topicListener)).get();
    } catch (const hazelcast::client::exception::iexception &e) {
        ds_log(event_log, DS_LOG_ERROR, "Discovery: Cluster - Failed to subscribe to <%s>\n", topicname);
    }

Publish

    try {
        CaplinCluster discoveryMsg = CaplinCluster{uuid, _ds_datasrc_local_label(), (int32_t)msg->type, (int32_t)msg->len, msg->msg};
        auto topic = hz_ptr->get_topic(topicname).get();
        // Block until the publish call completes.
        topic->publish(discoveryMsg).get();
    } catch (const hazelcast::client::exception::iexception &e) {
        ds_log(event_log, DS_LOG_ERROR, "Discovery: Cluster - Failed to publish to <%s>\n", topicname);
    }
