Helm installed application metric Grafana dashboards by eero-t · Pull Request #800 · opea-project/GenAIInfra

eero-t · 2025-02-11T19:34:48Z

Description

Add Helm installed Grafana dashboards
- App component metrics (when monitoring is enabled)
- App component scaling (when autoscaling is enabled)
Update docs accordingly & move files referred by docs to better locations

Note: to make reviewing the changes easier, I'll file separate PR for removing redundant files from under kubernetes/Observability/ after this is merged. Scaling dashboard in there included LLM metrics only for TGI, new one supports also vLLM.

Issues

n/a.

Type of change

New feature (non-breaking change which adds new functionality)

Dependencies

Requires HG workflow fix #814.

Tests

Manually tested that dashboard installation, and the dashboards themselves work.

lianhao · 2025-02-12T07:26:50Z

@eero-t Is opea-project/GenAIComps#1280 related to this failure?

eero-t · 2025-02-12T14:16:56Z

@eero-t Is opea-project/GenAIComps#1280 related to this failure?

@lianhao Yes. For some reason CI did not catch the the original change failing the CI...

(That issue does not happen in production. Only CI creates multiple Orchestrator instances in the same program.)

eero-t · 2025-02-13T13:45:11Z

Rebased to main to rerun CI, no other changes.

@lianhao CI still failing, but this time all failures are due to:

for img in `helm template -n $NAMESPACE -f helm-charts//chatqna/${value_file} $RELEASE_NAME helm-charts//chatqna | grep 'image:' | grep 'opea/' | awk '{print $2}' | xargs`
+ .github/workflows/scripts/e2e/chart_test.sh check_local_opea_image 100.83.111.229:5000/opea/chatqna-ui:latest
Failed to get image manifest 100.83.111.229:5000/opea/chatqna-ui:latest

lianhao · 2025-02-14T00:51:08Z

Bugs in GenAIExamples and GenAIComps resulted the latest changes are not populated into the CI's image registry repo. These bugs just got fixed yesterday. I'll manually populate the related container images

yongfengdu · 2025-02-14T09:04:26Z

I'm afraid the job of release charts will think "assets" is a helm chart and report fail.
https://github.com/opea-project/GenAIInfra/blob/main/.github/workflows/push-release-charts.yaml#L52
Either move the assets to elsewhere, or have the push-release-charts skip "assets" directory.

UIDs identifying the dashboards for Grafana, are based on the chart "fullname" values. That way, if dashboard configMaps get installed multiple times as dependencies for different OPEA application charts, Grafana can differentiate them based on those UIDs. (At least as long as those application instances are given different Helm release names, not just installed to separate namespaces.) Another alternative would have been omitting the dashboard UIDs, to let Grafana generate them. However, that would have meant dashboard URLs changing on every Helm re-deployment, as UID is part of the Grafana dashboard URL! Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* s/{{err}}/{{ printf "{{err}}" }}/ * s/{{le}}/{{ printf "{{le}}" }}/ Another possibility would be putting the dashboard JSON spec to a separate file and reading it to Helm configMap template with: {{ .Files.Get "dashboards/metrics.json" | toJson | indent 4 }} But that would require extra scripting to replace dashboard spec "title" and "uid" fields with suitable templated values. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

This assume "prometheusNamespace" variable value to be correct. Metrics dashboard is installed when monitoring is enabled, and scaling dashboard when also HPA autoscaling is enabled. NOTE: Regardless of which application installs the dashboard(s), they are identical except for the title in the Grafana dashboards list (and dashboard internal "uid"). Both dashboards show metrics for _any_ of the OPEA applications that process streamed tokens with an LLM, not just for ChatQnA or DocSum. Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

* Describe new Helm functionality in monitoring.md * Update kubernetes-addons/Observability/ - Add index to README + move dashboard import info to its own section - Remove duplicate Protheus / Grafana install info - Move related files to better locations * New dashboad screenshots Signed-off-by: Eero Tamminen <eero.t.tamminen@intel.com>

yongfengdu · 2025-02-17T01:30:53Z

eero-t · 2025-02-17T15:56:08Z

@eero-t Would you please add filter for assets here? grep -vE 'valuefiles.yaml|common|assets|.md|.sh' | https://github.com/opea-project/GenAIInfra/blob/main/.github/workflows/push-release-charts.yaml#L54C3-L54C5

@yongfengdu I filed separate PR for fixing the workflow, as it had bugs that were not related to this PR, see #814.

eero-t · 2025-02-17T17:20:15Z

Btw. In this PR "ChatQnA" & "DocSum" both have dashboard chart as dep when monitoring is enabled. However the resulting dashboard(s) are not really app specific and can also be installed separately (for apps that don't have that dependency). This could confuse users, if user installs multiple apps.

eero-t requested review from lianhao and yongfengdu as code owners February 11, 2025 19:34

eero-t force-pushed the dashboards branch from c9aa55a to d4a5131 Compare February 11, 2025 19:39

eero-t force-pushed the dashboards branch from d4a5131 to 7c1a269 Compare February 13, 2025 13:22

lianhao force-pushed the dashboards branch from 7c1a269 to 91df8f9 Compare February 14, 2025 01:36

lianhao approved these changes Feb 14, 2025

View reviewed changes

eero-t added 4 commits February 14, 2025 16:26

eero-t force-pushed the dashboards branch from 91df8f9 to 9cf0adb Compare February 14, 2025 14:26

eero-t mentioned this pull request Feb 17, 2025

Improve GH workflow updated chart identification #814

Closed

1 task

poussa self-requested a review February 18, 2025 10:20

poussa approved these changes Feb 18, 2025

View reviewed changes

poussa merged commit f46e8c1 into opea-project:main Feb 18, 2025
18 checks passed

joshuayao mentioned this pull request Feb 19, 2025

[Feature] Helm installed Grafana dashboards #819

Closed

3 tasks

eero-t mentioned this pull request Feb 19, 2025

Remove obsolete files from kubernetes-addons/Observability/ #821

Merged

1 task

eero-t deleted the dashboards branch February 21, 2025 15:14

eero-t mentioned this pull request Apr 22, 2025

App dashboard improvements #850

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Helm installed application metric Grafana dashboards#800

Helm installed application metric Grafana dashboards#800
poussa merged 4 commits intoopea-project:mainfrom
eero-t:dashboards

eero-t commented Feb 11, 2025 •

edited

Loading

Uh oh!

lianhao commented Feb 12, 2025

Uh oh!

eero-t commented Feb 12, 2025

Uh oh!

eero-t commented Feb 13, 2025

Uh oh!

lianhao commented Feb 14, 2025

Uh oh!

yongfengdu commented Feb 14, 2025

Uh oh!

yongfengdu commented Feb 17, 2025

Uh oh!

eero-t commented Feb 17, 2025

Uh oh!

eero-t commented Feb 17, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

eero-t commented Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Issues

Type of change

Dependencies

Tests

Uh oh!

lianhao commented Feb 12, 2025

Uh oh!

eero-t commented Feb 12, 2025

Uh oh!

eero-t commented Feb 13, 2025

Uh oh!

lianhao commented Feb 14, 2025

Uh oh!

yongfengdu commented Feb 14, 2025

Uh oh!

yongfengdu commented Feb 17, 2025

Uh oh!

eero-t commented Feb 17, 2025

Uh oh!

eero-t commented Feb 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

eero-t commented Feb 11, 2025 •

edited

Loading

eero-t commented Feb 17, 2025 •

edited

Loading