Commit ed96481

Merge branch 'ai-docs-updates' of github.com:netdata/netdata into ai-docs-updates
2 parents b33057b + 4d5fb92 commit ed96481

13 files changed

Lines changed: 157 additions & 210 deletions

CHANGELOG.md

Lines changed: 5 additions & 6 deletions
@@ -6,6 +6,11 @@

 **Merged pull requests:**

+- Revert "ml: implement fixed time-based training windows \(\#20638\)" [\#21045](https://github.com/netdata/netdata/pull/21045) ([vkalintiris](https://github.com/vkalintiris))
+- chore\(go.d/ddsnmp\): Improve profile sorting by match specificity [\#21042](https://github.com/netdata/netdata/pull/21042) ([ilyam8](https://github.com/ilyam8))
+- Add documentation on using custom CA certificates to Learn [\#21041](https://github.com/netdata/netdata/pull/21041) ([ralphm](https://github.com/ralphm))
+- Context loading priority to vnodes [\#21040](https://github.com/netdata/netdata/pull/21040) ([stelfrag](https://github.com/stelfrag))
+- improve\(go.d/ddsnmp\): switch profile matching to `selector` [\#21039](https://github.com/netdata/netdata/pull/21039) ([ilyam8](https://github.com/ilyam8))
 - fix\(docs\): update mermaid diagrams leftovers plus syntax issues [\#21034](https://github.com/netdata/netdata/pull/21034) ([kanelatechnical](https://github.com/kanelatechnical))
 - docs: fix mdx parsing scalability.md [\#21032](https://github.com/netdata/netdata/pull/21032) ([ilyam8](https://github.com/ilyam8))
 - Revert "chore\(docs\): rename REST API sidebar to Netdata APIs" [\#21031](https://github.com/netdata/netdata/pull/21031) ([ilyam8](https://github.com/ilyam8))
@@ -442,12 +447,6 @@
 - improve\(go.d/snmp profiles\): simplify \_generic-if.yaml and add interface type tags [\#20505](https://github.com/netdata/netdata/pull/20505) ([ilyam8](https://github.com/ilyam8))
 - fix snmp prof mikrotik mem tagging [\#20504](https://github.com/netdata/netdata/pull/20504) ([ilyam8](https://github.com/ilyam8))
 - feat\(go.d/ddsnmp\): make SNMP profile collection configurable [\#20503](https://github.com/netdata/netdata/pull/20503) ([ilyam8](https://github.com/ilyam8))
-- Use ARAL for labels [\#20502](https://github.com/netdata/netdata/pull/20502) ([stelfrag](https://github.com/stelfrag))
-- SNMP: audiocodes profile [\#20501](https://github.com/netdata/netdata/pull/20501) ([Ancairon](https://github.com/Ancairon))
-- chore\(go.d/ddsnmp\): better label values sanitization [\#20500](https://github.com/netdata/netdata/pull/20500) ([ilyam8](https://github.com/ilyam8))
-- SNMP: second pass of aruba profiles [\#20499](https://github.com/netdata/netdata/pull/20499) ([Ancairon](https://github.com/Ancairon))
-- SNMP: Arista profiles [\#20498](https://github.com/netdata/netdata/pull/20498) ([Ancairon](https://github.com/Ancairon))
-- chore\(go.d/ddsnmp\): fix table metrics again [\#20497](https://github.com/netdata/netdata/pull/20497) ([ilyam8](https://github.com/ilyam8))

 ## [v2.5.4](https://github.com/netdata/netdata/tree/v2.5.4) (2025-06-24)

docs/.map/map.csv

Lines changed: 1 addition & 0 deletions
@@ -58,6 +58,7 @@ https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/
 https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/running-the-netdata-agent-behind-a-reverse-proxy/Running-behind-h2o.md,H2O,Published,Netdata Agent/Configuration/Securing Agents/Running the Agent behind a reverse proxy,
 https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/optimize-the-netdata-agents-performance.md,Performance Optimization,Published,Netdata Agent/Configuration,"While the Netdata Agent is designed to monitor a system with only 1% CPU, you can optimize its performance for low-resource systems."
 https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/organize-systems-metrics-and-alerts.md,"Organize systems, metrics, and alerts",Published,Netdata Agent/Configuration,
+https://github.com/netdata/netdata/edit/master/docs/netdata-agent/configuration/using-custom-ca-certificates-with-netdata.md,"Using custom CA certificates with Netdata",Published,Netdata Agent/Configuration,
 https://github.com/netdata/netdata/edit/master/src/daemon/README.md,Daemon,Published,Netdata Agent,
 https://github.com/netdata/netdata/edit/master/src/database/README.md,Database,Published,Netdata Agent,
 https://github.com/netdata/netdata/edit/master/src/libnetdata/log/README.md,Logging,Published,Netdata Agent,

docs/netdata-agent/configuration/using-custom-ca-certificates-with-netdata.md

Lines changed: 1 addition & 1 deletion
@@ -116,6 +116,6 @@ Windows. Instead, certificates must be installed into the bundled MSYS2 environm
 using the following instructions:

 1. Ensure the certificate file to be installed is in PEM or DER format.
-2. Copy the certificate file to `C:\Program FIles\Netdata\etc\pki\ca-trust\source\anchors`. You may need to create
+2. Copy the certificate file to `C:\Program Files\Netdata\etc\pki\ca-trust\source\anchors`. You may need to create
    this directory.
 3. In an administrative command prompt, run `C:\Program Files\Netdata\usr\bin\update-ca-trust.exe`

packaging/version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-v2.6.0-323-nightly
+v2.6.0-330-nightly

src/database/engine/mrg-internals.h

Lines changed: 14 additions & 9 deletions
@@ -232,16 +232,21 @@ static bool metric_release(MRG *mrg, METRIC *metric) {

     if (refcount_release(&metric->refcount) == 0) {
         // we are the last user
-        bool already_deleted = __atomic_load_n(&metric->deleted, __ATOMIC_ACQUIRE);
-        if (already_deleted || !acquired_metric_has_retention(mrg, metric)) {
-            if (!already_deleted) {
-                acquired_for_deletion_metric_delete(mrg, metric);
+        if (!acquired_metric_has_retention(mrg, metric)) {
+            // This metric is eligible for deletion.
+            // Atomically check and set the 'deleted' flag.
+            // If __atomic_test_and_set returns 'true', it means the flag was already set.
+            if (!__atomic_test_and_set(&metric->deleted, __ATOMIC_ACQ_REL)) {
+                // We won the race. The flag was 'false' and we set it to 'true'.
+                // We are now responsible for deletion.
+                acquired_for_deletion_metric_delete(mrg, metric);
+                uuidmap_free(metric->uuid);
+                aral_freez(mrg->index[partition].aral, metric);
+                __atomic_sub_fetch(&mrg->index[partition].stats.entries_acquired, 1, __ATOMIC_RELAXED);
+                __atomic_sub_fetch(&mrg->index[partition].stats.current_references, 1, __ATOMIC_RELAXED);
+                return true;
             }
-            uuidmap_free(metric->uuid);
-            aral_freez(mrg->index[partition].aral, metric);
-            __atomic_sub_fetch(&mrg->index[partition].stats.entries_acquired, 1, __ATOMIC_RELAXED);
-            __atomic_sub_fetch(&mrg->index[partition].stats.current_references, 1, __ATOMIC_RELAXED);
-            return true;
+            // Another thread is already deleting it. nothing to do
         }
     }
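
The `metric_release` change above uses an atomic test-and-set so that exactly one thread performs the deletion and the frees that follow it. The sketch below shows the same claim-once pattern in portable C11. It is an illustration only: `demo_object_t`, `demo_object_init`, and `demo_object_try_delete` are hypothetical names, and C11 `atomic_flag` stands in for the GCC `__atomic_test_and_set` builtin that the diff uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical stand-in for a shared object; this is not Netdata's METRIC.
typedef struct {
    atomic_flag deleted;  // set once by whichever caller claims deletion
    int cleanup_runs;     // how many times the cleanup path actually ran
} demo_object_t;

static void demo_object_init(demo_object_t *obj) {
    assert(obj != NULL);
    atomic_flag_clear(&obj->deleted);
    obj->cleanup_runs = 0;
}

// Returns true only for the single caller that wins the right to clean up.
static bool demo_object_try_delete(demo_object_t *obj) {
    assert(obj != NULL);
    // test_and_set returns the previous value: false means we set it first.
    if (!atomic_flag_test_and_set_explicit(&obj->deleted, memory_order_acq_rel)) {
        obj->cleanup_runs++;  // the real cleanup work would go here
        return true;
    }
    // Another caller already claimed deletion; nothing to do.
    return false;
}
```

Calling `demo_object_try_delete` twice on the same object returns `true` the first time and `false` the second, so the cleanup path runs once regardless of how many callers race for it. That is the property the separate load-then-check in the removed lines could not guarantee.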

src/database/sqlite/sqlite_aclk.c

Lines changed: 48 additions & 6 deletions
@@ -92,9 +92,14 @@ enum {
     IDX_IS_REGISTERED,
 };

+struct children {
+    int vnodes;
+    int normal;
+};
+
 static int create_host_callback(void *data, int argc, char **argv, char **column)
 {
-    int *number_of_chidren = data;
+    struct children *node_data = data;
     UNUSED(argc);
     UNUSED(column);

@@ -175,7 +180,10 @@ static int create_host_callback(void *data, int argc, char **argv, char **column
         pulse_host_status(host, 0, 0); // this will detect the receiver status
     }

-    (*number_of_chidren)++;
+    if (IS_VIRTUAL_HOST_OS(host))
+        node_data->vnodes++;
+    else
+        node_data->normal++;

 #ifdef NETDATA_INTERNAL_CHECKS
     char node_str[UUID_STR_LEN] = "<none>";
@@ -973,26 +981,41 @@ void create_aclk_config(RRDHOST *host, nd_uuid_t *host_uuid __maybe_unused, nd_u
     "SELECT ni.host_id, ni.node_id FROM host h, node_instance ni " \
     "WHERE h.host_id = ni.host_id AND ni.node_id IS NOT NULL"

+
+uv_sem_t ctx_sem;
+
 void aclk_synchronization_init(void)
 {
     char *err_msg = NULL;
     int rc;

     nd_log_daemon(NDLP_INFO, "Creating archived hosts");
-    int number_of_children = 0;
-    rc = sqlite3_exec_monitored(db_meta, SQL_FETCH_ALL_HOSTS, create_host_callback, &number_of_children, &err_msg);
+    struct children node_data = { 0, 0};
+
+    rc = sqlite3_exec_monitored(db_meta, SQL_FETCH_ALL_HOSTS, create_host_callback, &node_data, &err_msg);

     if (rc != SQLITE_OK) {
         nd_log_daemon(NDLP_ERR, "SQLite error when loading archived hosts, rc = %d (%s)", rc, err_msg);
         sqlite3_free(err_msg);
     }

-    nd_log_daemon(NDLP_INFO, "Created %d archived hosts", number_of_children);
+    nd_log_daemon(
+        NDLP_INFO,
+        "Created %d archived hosts (%d children and %d vnodes)",
+        node_data.normal + node_data.vnodes,
+        node_data.normal,
+        node_data.vnodes);
+
+    bool sem_init = true;
+    uv_sem_init(&ctx_sem, 0);
+
     // Trigger host context load for hosts that have been created
     if (unlikely(!metadata_queue_load_host_context())) {
         nd_log_daemon(NDLP_WARNING, "Failed to queue command to load contexts for archived hosts");
         // Reset context load flag so that contexts will be loaded on demand
         reset_host_context_load_flag();
+        uv_sem_destroy(&ctx_sem);
+        sem_init = false;
     }

     rc = sqlite3_exec_monitored(db_meta, SQL_FETCH_ALL_INSTANCES, aclk_config_parameters, NULL, &err_msg);
@@ -1004,9 +1027,28 @@ void aclk_synchronization_init(void)

     aclk_initialize_event_loop();

-    if (!number_of_children)
+    if (!(node_data.normal + node_data.vnodes))
         aclk_queue_node_info(localhost, true);

+    if (sem_init) {
+        int finished_vnodes = 0;
+        time_t deadline = now_realtime_sec() + 60; // hard timeout to avoid infinite block
+        while (finished_vnodes < node_data.vnodes) {
+            if (uv_sem_trywait(&ctx_sem) == 0) {
+                finished_vnodes++;
+                continue;
+            }
+
+            if (now_realtime_sec() >= deadline) {
+                nd_log_daemon(NDLP_WARNING, "Vnodes context load still in progress, continue with agent start");
+                break;
+            }
+            sleep_usec(100 * USEC_PER_MS);
+        }
+        if (finished_vnodes == node_data.vnodes) {
+            uv_sem_destroy(&ctx_sem);
+        }
+    }
     nd_log_daemon(NDLP_INFO, "ACLK sync initialization completed");
 }

src/database/sqlite/sqlite_metadata.c

Lines changed: 40 additions & 27 deletions
@@ -1741,6 +1741,7 @@ __thread sqlite3 *db_meta_thread = NULL;
 __thread sqlite3 *db_context_thread = NULL;
 __thread bool main_context_thread = false;

+extern uv_sem_t ctx_sem;
 static void restore_host_context(void *arg)
 {
     struct host_context_load_thread *hclt = arg;
@@ -1787,6 +1788,10 @@ static void restore_host_context(void *arg)

     aclk_queue_node_info(host, false);

+    if (IS_VIRTUAL_HOST_OS(host)) {
+        uv_sem_post(&ctx_sem);
+    }
+
     // Check and clear the thread local variables
     if (!main_context_thread) {
         db_meta_thread = NULL;
@@ -1847,7 +1852,7 @@ void reset_host_context_load_flag()
     RRDHOST *host;
     dfe_start_reentrant(rrdhost_root_index, host)
     {
-        rrdhost_flag_set(host, RRDHOST_FLAG_PENDING_CONTEXT_LOAD);
+        rrdhost_flag_clear(host, RRDHOST_FLAG_PENDING_CONTEXT_LOAD);
     }
     dfe_done(host);
 }
@@ -1876,36 +1881,44 @@ static void ctx_hosts_load(uv_work_t *req)
     size_t host_count = 0;
     size_t sync_exec = 0;
     size_t async_exec = 0;
-    dfe_start_reentrant(rrdhost_root_index, host) {
-        if (!rrdhost_flag_check(host, RRDHOST_FLAG_PENDING_CONTEXT_LOAD))
-            continue;

-        if (unlikely(SHUTDOWN_REQUESTED(config)))
-            break;
+    for (int pass=0 ; pass < 2 ; pass++) {
+        dfe_start_reentrant(rrdhost_root_index, host) {
+            // pass 0 will do vnodes (skip the rest)
+            // pass 1 will do the rest (skip vnodes)
+            if (pass == IS_VIRTUAL_HOST_OS(host))
+                continue;

-        nd_log_daemon(NDLP_DEBUG, "Loading context for host %s", rrdhost_hostname(host));
-
-        int rc = 0;
-        bool thread_found = cleanup_finished_threads(hclt, max_threads, false, &thread_index);
-        if (thread_found) {
-            __atomic_store_n(&hclt[thread_index].busy, true, __ATOMIC_RELAXED);
-            hclt[thread_index].host = host;
-            hclt[thread_index].thread = nd_thread_create("CTXLOAD", NETDATA_THREAD_OPTION_DEFAULT, restore_host_context, &hclt[thread_index]);
-            rc = (hclt[thread_index].thread == NULL);
-            async_exec += (rc == 0);
-            // if it failed, mark the thread slot as free
-            if (rc)
-                __atomic_store_n(&hclt[thread_index].busy, false, __ATOMIC_RELAXED);
-        }
-        // if single thread, thread creation failure or failure to find slot
-        if (rc || !thread_found) {
-            sync_exec++;
-            struct host_context_load_thread hclt_sync = {.host = host};
-            restore_host_context(&hclt_sync);
+            if (!rrdhost_flag_check(host, RRDHOST_FLAG_PENDING_CONTEXT_LOAD))
+                continue;
+
+            if (unlikely(SHUTDOWN_REQUESTED(config)))
+                break;
+
+            nd_log_daemon(NDLP_DEBUG, "Loading context for host %s", rrdhost_hostname(host));
+
+            int rc = 0;
+            bool thread_found = cleanup_finished_threads(hclt, max_threads, false, &thread_index);
+            if (thread_found) {
+                __atomic_store_n(&hclt[thread_index].busy, true, __ATOMIC_RELAXED);
+                hclt[thread_index].host = host;
+                hclt[thread_index].thread = nd_thread_create("CTXLOAD", NETDATA_THREAD_OPTION_DEFAULT, restore_host_context, &hclt[thread_index]);
+                rc = (hclt[thread_index].thread == NULL);
+                async_exec += (rc == 0);
+                // if it failed, mark the thread slot as free
+                if (rc)
+                    __atomic_store_n(&hclt[thread_index].busy, false, __ATOMIC_RELAXED);
+            }
+            // if single thread, thread creation failure or failure to find slot
+            if (rc || !thread_found) {
+                sync_exec++;
+                struct host_context_load_thread hclt_sync = {.host = host};
+                restore_host_context(&hclt_sync);
+            }
+            host_count++;
         }
-        host_count++;
+        dfe_done(host);
     }
-    dfe_done(host);

     bool should_clean_threads = cleanup_finished_threads(hclt, max_threads, true, NULL);
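
The `ctx_hosts_load` change above walks the host index twice so that vnodes have their contexts loaded before all other hosts. The two-pass ordering can be sketched in isolation as follows. This is a simplified illustration, not the Netdata code: `demo_host_t` and `order_vnodes_first` are hypothetical names, and a plain array replaces the `rrdhost_root_index` dictionary traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical host record; only the field needed for ordering is modeled.
typedef struct {
    const char *name;
    bool is_vnode;
} demo_host_t;

// Fills `order` with host indices: vnodes first (pass 0), then the rest (pass 1).
// `order` must have room for `n` entries. Returns the number of indices written.
static size_t order_vnodes_first(const demo_host_t *hosts, size_t n, size_t *order) {
    assert(hosts != NULL && order != NULL);
    size_t out = 0;
    for (int pass = 0; pass < 2; pass++) {
        bool want_vnode = (pass == 0);
        for (size_t i = 0; i < n; i++) {
            // Skip hosts that belong to the other pass.
            if (hosts[i].is_vnode != want_vnode)
                continue;
            order[out++] = i;
        }
    }
    return out;
}
```

For the input `child-a, vnode-a, child-b, vnode-b`, the resulting order is indices 1, 3, 0, 2: relative order within each group is preserved, and every vnode precedes every non-vnode. The diff expresses the same filter more compactly as `pass == IS_VIRTUAL_HOST_OS(host)`, which assumes the macro evaluates to exactly 0 or 1.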

src/ml/ml.cc

Lines changed: 9 additions & 25 deletions
@@ -56,13 +56,13 @@ ml_dimension_calculated_numbers(ml_worker_t *worker, ml_dimension_t *dim)
     training_response.first_entry_on_response = rrddim_first_entry_s_of_tier(dim->rd, 0);
     training_response.last_entry_on_response = rrddim_last_entry_s_of_tier(dim->rd, 0);

-    size_t min_n = Cfg.min_training_window / dim->rd->rrdset->update_every;
-    size_t max_n = Cfg.training_window / dim->rd->rrdset->update_every;
+    size_t min_n = Cfg.min_train_samples;
+    size_t max_n = Cfg.max_train_samples;

     // Figure out what our time window should be.
     training_response.query_before_t = training_response.last_entry_on_response;
     training_response.query_after_t = std::max(
-        training_response.query_before_t - Cfg.training_window, // Fixed time window
+        training_response.query_before_t - static_cast<time_t>((max_n - 1) * dim->rd->rrdset->update_every),
         training_response.first_entry_on_response
     );
6868

@@ -379,7 +379,7 @@ int ml_dimension_load_models(RRDDIM *rd, sqlite3_stmt **active_stmt) {
379379
if (unlikely(rc != SQLITE_OK))
380380
goto bind_fail;
381381

382-
rc = sqlite3_bind_int64(res, ++param, now_realtime_sec() - (Cfg.num_models_to_use * Cfg.train_every));
382+
rc = sqlite3_bind_int64(res, ++param, now_realtime_sec() - (Cfg.num_models_to_use * Cfg.max_train_samples));
383383
if (unlikely(rc != SQLITE_OK))
384384
goto bind_fail;
385385

@@ -674,29 +674,13 @@ ml_dimension_train_model(ml_worker_t *worker, ml_dimension_t *dim)
     memcpy(worker->scratch_training_cns, worker->training_cns,
            training_response.total_values * sizeof(calculated_number_t));

-    size_t smoothing_window = (dim->rd->rrdset->update_every > nd_profile.update_every) ? 1 : Cfg.max_samples_to_smooth;
-
     ml_features_t features = {
-        Cfg.diff_n, smoothing_window, Cfg.lag_n,
+        Cfg.diff_n, Cfg.smooth_n, Cfg.lag_n,
         worker->scratch_training_cns, training_response.total_values,
         worker->training_cns, training_response.total_values,
         worker->training_samples
     };
-
-    // Calculate dynamic sampling ratio based on expected output size
-    // After diff and smooth, we'll have approximately this many vectors
-    size_t expected_vectors = training_response.total_values;
-    if (Cfg.diff_n > 0) expected_vectors--;
-    if (smoothing_window > 1) expected_vectors = expected_vectors - smoothing_window + 1;
-    expected_vectors = expected_vectors - Cfg.lag_n;
-
-    double sampling_ratio = 1.0;
-    if (expected_vectors > Cfg.max_training_vectors) {
-        sampling_ratio = (double)Cfg.max_training_vectors / expected_vectors;
-    }
-
-    // Apply sampling during lag feature extraction
-    ml_features_preprocess(&features, sampling_ratio);
+    ml_features_preprocess(&features);

     ml_kmeans_init(&dim->kmeans);
     ml_kmeans_train(&dim->kmeans, &features, Cfg.max_kmeans_iters, training_response.query_after_t, training_response.query_before_t);
@@ -722,7 +706,7 @@ ml_dimension_predict(ml_dimension_t *dim, calculated_number_t value, bool exists
     }

     // Save the value and return if we don't have enough values for a sample
-    unsigned n = Cfg.diff_n + Cfg.max_samples_to_smooth + Cfg.lag_n;
+    unsigned n = Cfg.diff_n + Cfg.smooth_n + Cfg.lag_n;
     if (dim->cns.size() < n) {
         dim->cns.push_back(value);
         return false;
@@ -747,11 +731,11 @@ ml_dimension_predict(ml_dimension_t *dim, calculated_number_t value, bool exists
     memcpy(dst_cns, dim->cns.data(), n * sizeof(calculated_number_t));

     ml_features_t features = {
-        Cfg.diff_n, Cfg.max_samples_to_smooth, Cfg.lag_n,
+        Cfg.diff_n, Cfg.smooth_n, Cfg.lag_n,
         dst_cns, n, src_cns, n,
         dim->feature
     };
-    ml_features_preprocess(&features, 1.0);
+    ml_features_preprocess(&features);

     /*
      * Lock to predict
0 commit comments

Comments
 (0)