Skip to content

chore: Improve logging at node startup#24764

Closed
Jeffrey-morgan34 wants to merge 10 commits intomainfrom
coe-679-keystoreloader-observability
Closed

chore: Improve logging at node startup#24764
Jeffrey-morgan34 wants to merge 10 commits intomainfrom
coe-679-keystoreloader-observability

Conversation

@Jeffrey-morgan34
Copy link
Copy Markdown
Contributor

@Jeffrey-morgan34 Jeffrey-morgan34 commented Apr 6, 2026

EnhancedKeyStoreLoader loads signing keys from disk and gossip CA certificates from the roster at node startup. Before this change, all key loading activity was logged at TRACE/DEBUG level. These logs being visible will allow easier diagnose of cert issues at node startup. These new logs only occur at start up so the extera noise will be minimal

DevOps spent significant time under pressure running manual openssl commands against every node to find the mismatch. The logs gave no useful signal: nodes started, stayed stuck in CHECKING, and emitted
only a cryptic TLS SSLHandshakeException with no indication of the root cause.

Changes:

  • Key and certificate loads are now logged at INFO so they're visible in production
  • Scan start log includes the directory being scanned
  • Scan completion log includes a count summary (sigKeysLoaded, sigCertsLoaded)
  • Missing certificates are flagged with WARN instead of staying silent
  • verify() now checks whether the on-disk private key corresponds to the gossip CA certificate in the roster. If they don't match, a WARN is emitted with three SHA-384 fingerprints operators can cross-reference against openssl x509 -fingerprint -sha384.

Log comparison

Scenario 1: Happy path

Before
[DEBUG] 08:01:00.001 42 main Starting key store enumeration
[TRACE] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[TRACE] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[TRACE] 08:01:00.013 54 main Attempting to locate key stores for nodeId 1
[TRACE] 08:01:00.021 62 main Found enhanced private key store for nodeId: 1 [ purpose = SIGNING, fileName = s-private-node2.pem ]
[TRACE] 08:01:00.022 63 main Attempting to locate key stores for nodeId 2
[TRACE] 08:01:00.030 71 main Found enhanced private key store for nodeId: 2 [ purpose = SIGNING, fileName = s-private-node3.pem ]
[TRACE] 08:01:00.031 72 main Completed key store enumeration

TRACE messages are invisible in production. No certificate loading logged. No indication of which directory was scanned.

After
[DEBUG] 08:01:00.001 42 main Starting key store enumeration [ keyStoreDirectory = /opt/hgcapp/services-hedera/HapiApp2.0/data/keys ]
[DEBUG] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[INFO] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[INFO] 08:01:00.014 55 main Loaded signing certificate from roster for nodeId: 0 [ purpose = SIGNING, fingerprint =
94:6B:AD:3F:C1:08:22:7A:5B:9E:D4:11:FF:2C:88:01:76:34:AC:E9:12:5D:B7:F0:44:98:6C:3A:E5:27:BD:CC:14:78:50:F3:91:DD:6E:B2:A9:C0:37:85:1F:4E:72:D8 ]
[DEBUG] 08:01:00.015 56 main Attempting to locate key stores for nodeId 1
[INFO] 08:01:00.021 62 main Found enhanced private key store for nodeId: 1 [ purpose = SIGNING, fileName = s-private-node2.pem ]
[INFO] 08:01:00.023 64 main Loaded signing certificate from roster for nodeId: 1 [ purpose = SIGNING, fingerprint =
A3:F1:72:CC:9D:44:BB:21:E8:55:0F:3C:77:19:28:D6:8A:EC:34:61:B0:5F:C2:90:12:67:4D:A8:39:EE:14:7B:F5:80:C3:DD:96:41:2E:B7:53:08:9A:1C:64:50:E7:F2 ]
[DEBUG] 08:01:00.024 65 main Attempting to locate key stores for nodeId 2
[INFO] 08:01:00.030 71 main Found enhanced private key store for nodeId: 2 [ purpose = SIGNING, fileName = s-private-node3.pem ]
[INFO] 08:01:00.032 73 main Loaded signing certificate from roster for nodeId: 2 [ purpose = SIGNING, fingerprint =
7C:2E:B9:55:F0:3D:68:AA:14:C7:82:3F:99:D1:06:BC:E4:57:2A:8B:30:F6:CD:11:74:4E:A0:59:2C:D8:87:1B:E3:96:43:70:F5:8C:25:B1:9D:62:AE:0C:38:51:7F:D4 ]
[DEBUG] 08:01:00.033 74 main Completed key store enumeration [ sigKeysLoaded = 1, sigCertsLoaded = 3 ]

Key and cert loads visible at INFO with fingerprints operators can cross-reference using openssl x509 -fingerprint -sha384 -in <cert-file>. Scan completes with a count summary.

Scenario 2: Missing certificate

Before
[DEBUG] 08:01:00.001 42 main Starting key store enumeration
[TRACE] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[TRACE] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[TRACE] 08:01:00.013 54 main Attempting to locate key stores for nodeId 1
... (no log for missing cert)
[TRACE] 08:01:00.031 72 main Completed key store enumeration
... (later, during verify)
[WARN] 08:01:00.210 251 main No certificate found for nodeId 1 [purpose = SIGNING ]

No indication a cert was missing until verify() runs. With TRACE invisible in production, the first visible sign of a problem is the WARN from verify() — with no preceding context about what loaded successfully.

After
[DEBUG] 08:01:00.001 42 main Starting key store enumeration [ keyStoreDirectory = /opt/hgcapp/services-hedera/HapiApp2.0/data/keys ]
[DEBUG] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[INFO] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[INFO] 08:01:00.014 55 main Loaded signing certificate from roster for nodeId: 0 [ purpose = SIGNING, fingerprint = 94:6B:AD:3F:... ]
[DEBUG] 08:01:00.015 56 main Attempting to locate key stores for nodeId 1
[INFO] 08:01:00.021 62 main Found enhanced private key store for nodeId: 1 [ purpose = SIGNING, fileName = s-private-node2.pem ]
[WARN] 08:01:00.022 63 main No signing certificate found in roster for nodeId: 1 [ purpose = SIGNING ]
[DEBUG] 08:01:00.023 64 main Attempting to locate key stores for nodeId 2
[INFO] 08:01:00.029 70 main Found enhanced private key store for nodeId: 2 [ purpose = SIGNING, fileName = s-private-node3.pem ]
[INFO] 08:01:00.031 72 main Loaded signing certificate from roster for nodeId: 2 [ purpose = SIGNING, fingerprint = 7C:2E:B9:55:... ]
[DEBUG] 08:01:00.032 73 main Completed key store enumeration [ sigKeysLoaded = 1, sigCertsLoaded = 2 ]
... (later, during verify)
[WARN] 08:01:00.210 251 main No certificate found for nodeId 1 [purpose = SIGNING ]

Missing cert flagged immediately during scan() with a WARN. The count at completion (sigCertsLoaded = 2, not 3) is a second visible signal something is wrong.

Scenario 3: Key/cert mismatch

Before
[DEBUG] 08:01:00.001 42 main Starting key store enumeration
[TRACE] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[TRACE] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[TRACE] 08:01:00.031 72 main Completed key store enumeration
... (node starts, gossip connections fail with cryptic TLS errors)

No detection. The node loads, starts, then fails at gossip connection time with a TLS handshake error that gives no indication of the root cause. DevOps had to manually run openssl commands against every node to find the mismatch.

After
[DEBUG] 08:01:00.001 42 main Starting key store enumeration [ keyStoreDirectory = /opt/hgcapp/services-hedera/HapiApp2.0/data/keys ]
[DEBUG] 08:01:00.002 43 main Attempting to locate key stores for nodeId 0
[INFO] 08:01:00.012 53 main Found enhanced private key store for nodeId: 0 [ purpose = SIGNING, fileName = s-private-node1.pem ]
[INFO] 08:01:00.014 55 main Loaded signing certificate from roster for nodeId: 0 [ purpose = SIGNING, fingerprint =
94:6B:AD:3F:C1:08:22:7A:5B:9E:D4:11:FF:2C:88:01:76:34:AC:E9:12:5D:B7:F0:44:98:6C:3A:E5:27:BD:CC:14:78:50:F3:91:DD:6E:B2:A9:C0:37:85:1F:4E:72:D8 ]
[DEBUG] 08:01:00.015 56 main Completed key store enumeration [ sigKeysLoaded = 1, sigCertsLoaded = 1 ]
... (during verify)
[WARN] 08:01:00.210 251 main Signing private key does not match certificate public key for nodeId 0
[ purpose = SIGNING,
certFingerprint = 94:6B:AD:3F:C1:08:22:7A:5B:9E:D4:11:FF:2C:88:01:76:34:AC:E9:12:5D:B7:F0:44:98:6C:3A:E5:27:BD:CC:14:78:50:F3:91:DD:6E:B2:A9:C0:37:85:1F:4E:72:D8,
rosterPubKeyFingerprint = A3:F1:72:CC:9D:44:BB:21:E8:55:0F:3C:77:19:28:D6:8A:EC:34:61:B0:5F:C2:90:12:67:4D:A8:39:EE:14:7B,
diskPubKeyFingerprint = 7C:2E:B9:55:F0:3D:68:AA:14:C7:82:3F:99:D1:06:BC:E4:57:2A:8B:30:F6:CD:11:74:4E:A0:59:2C:D8:87:1B ]
— node may fail to establish gossip connections

Mismatch detected at startup before gossip is attempted. Three fingerprints are provided:

  • certFingerprint: run openssl x509 -fingerprint -sha384 -in <cert> to cross-reference the roster cert against what's on disk
  • rosterPubKeyFingerprint and diskPubKeyFingerprint: both hash the same byte format so they are directly comparable — if they differ, the key on disk and the key inside the roster cert are different key pairs

The node is still allowed to start (WARN, not abort) — this does not change startup behavior, it gives operators the signal they need before the TLS handshake fails.

@trunk-io
Copy link
Copy Markdown

trunk-io bot commented Apr 6, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@lfdt-bot
Copy link
Copy Markdown

lfdt-bot commented Apr 6, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@Jeffrey-morgan34 Jeffrey-morgan34 self-assigned this Apr 6, 2026
@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 6, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Coverage 73.08% diff coverage · 0.00% coverage variation

Metric Results
Coverage variation 0.00% coverage variation (-1.00%)
Diff coverage 73.08% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (7675f47) 96395 78050 80.97%
Head commit (8a54538) 96442 (+47) 78085 (+35) 80.97% (0.00%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#24764) 52 38 73.08%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

TIP This summary will be updated as you push new changes. Give us feedback

@Jeffrey-morgan34 Jeffrey-morgan34 force-pushed the coe-679-keystoreloader-observability branch from 5da499e to 8f0d0ae Compare April 6, 2026 15:41
Add startup logging to EnhancedKeyStoreLoader so that key/cert loading
failures are visible in production logs (previously TRACE-level only).

- Log keyStoreDirectory at scan() start
- Raise key-found logs from TRACE to INFO in resolveNodePrivateKey()
- Add INFO/WARN logging to resolveNodeCertificate() which was previously
  completely silent; include SHA-384 cert fingerprint on success
- Raise scan() completion log from TRACE to DEBUG with loaded key/cert counts

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
- Add warnIfSigKeyMismatch() to verify() to log WARN when the on-disk
  RSA signing private key does not match the public key in the roster cert
- Catch RuntimeException in addition to checked exceptions to guard against
  malformed certs during the comparison
- Use DigestType.SHA_384.algorithmName() in certFingerprint() to align
  with platform-sdk convention instead of hardcoded string literal

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
Tests verify that warnIfSigKeyMismatch() logs a WARN when the on-disk
signing private key does not correspond to the roster cert, and stays
silent when they match. Adds org.apache.logging.log4j.core to
testModuleInfo for the Logger cast needed by MockAppender.

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
…Mismatch

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
- Replace PublicKey.equals() with Arrays.equals(getEncoded()) to avoid
  false positives when derived key (BouncyCastle) and cert key (SunRsaSign)
  are from different JCA providers
- Add cert and derived key fingerprints to the mismatch WARN message so
  operators can cross-reference without manual openssl commands
- Add explicit null guard for cert before mismatch check
- Log DEBUG when skipping non-RSA key types rather than silently returning
- Add SHA-384 rationale comment (matches openssl x509 -fingerprint -sha384)
- Add RSAPrivateCrtKey Javadoc explaining why CRT form is required
- Replace StringBuilder loop with HexFormat in certFingerprint()
- Log TRACE on fingerprint computation failure instead of silently swallowing
- Add certFingerprint(PublicKey) overload for derived key fingerprinting

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
- Replace certFingerprint(PublicKey) overload with keyFingerprint(PublicKey)
  to clarify that cert and key fingerprints hash different byte sequences
- Extract shared fingerprint(byte[]) helper to eliminate duplicated
  digest+format logic between the two fingerprint methods
- Log three values in mismatch WARN: certFingerprint (full cert DER, for
  openssl cross-reference), rosterPubKeyFingerprint and diskPubKeyFingerprint
  (both SubjectPublicKeyInfo bytes, directly comparable to each other)
- Add fixed seed (42) to Random in mismatch tests for reproducibility
- Assert log level is WARN (not just message content) in mismatch tests

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
- Split long log/comment lines to stay within 120 character limit
- Rename short variable kc to keysAndCerts in tests
- Remove trailing periods from @return javadoc tags

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
@Jeffrey-morgan34 Jeffrey-morgan34 force-pushed the coe-679-keystoreloader-observability branch from 8f0d0ae to 225cee5 Compare April 6, 2026 16:00
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 69.23077% with 16 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...wirlds/platform/crypto/EnhancedKeyStoreLoader.java 69.23% 14 Missing and 2 partials ⚠️

Impacted file tree graph

@@             Coverage Diff              @@
##               main   #24764      +/-   ##
============================================
- Coverage     77.04%   77.04%   -0.01%     
- Complexity    11699    11700       +1     
============================================
  Files          2499     2499              
  Lines         96492    96539      +47     
  Branches      10632    10637       +5     
============================================
+ Hits          74346    74377      +31     
- Misses        18391    18403      +12     
- Partials       3755     3759       +4     
Files with missing lines Coverage Δ Complexity Δ
...wirlds/platform/crypto/EnhancedKeyStoreLoader.java 68.92% <69.23%> (-0.46%) 0.00 <0.00> (ø)

... and 16 files with indirect coverage changes

Impacted file tree graph

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
Signed-off-by: Jeffrey Morgan <jeffrey.morgan@swirldslabs.com>
@Jeffrey-morgan34
Copy link
Copy Markdown
Contributor Author

Superseded by #24790

@Jeffrey-morgan34 Jeffrey-morgan34 deleted the coe-679-keystoreloader-observability branch April 7, 2026 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants