Migration from v0.x to v1.x made policy evaluation significantly slower #8266

@piax93

Short description

We recently migrated our policies and OPA deployment to v1. Without any change to the policy code (apart from the additional syntax v1 requires) or to the input data, policy evaluation became significantly slower. In our main use case, the average eval time went from 1.9ms to 3.5ms (yes, almost double).
I'd appreciate it if you could comment on possible reasons for this, because honestly it has been quite strange to see.

Steps To Reproduce

The OPA versions we have been working with are:

  • Before: 0.70.0
  • Now: 1.12.1

We tried to do a bit of profiling of our policies, and this is an example of what we see (the arrows indicate v0 -> v1 numbers):

path.rego
  METRICS:
  - timer_rego_external_resolve_ns: 260 -> 264
  - timer_rego_load_files_ns: 231987 -> 382944
  - timer_rego_module_compile_ns: 990473 -> 3554274
  - timer_rego_module_parse_ns: 184491 -> 327298
  - timer_rego_query_compile_ns: 50437 -> 83776
  - timer_rego_query_eval_ns: 11991 -> 22304
  - timer_rego_query_parse_ns: 19809 -> 19939
  PROFILE:
  - total_time_ns: 12771 -> 23436
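To make the size of each regression easier to compare, here is a small Python sketch that turns the v0 -> v1 pairs above into slowdown ratios. The numbers are copied verbatim from the profile; the helper name `slowdown_ratios` is only illustrative and not part of OPA.

```python
# Metric values copied from the profile above as (v0.70.0, v1.12.1) pairs, in nanoseconds.
METRICS = {
    "timer_rego_external_resolve_ns": (260, 264),
    "timer_rego_load_files_ns": (231987, 382944),
    "timer_rego_module_compile_ns": (990473, 3554274),
    "timer_rego_module_parse_ns": (184491, 327298),
    "timer_rego_query_compile_ns": (50437, 83776),
    "timer_rego_query_eval_ns": (11991, 22304),
    "timer_rego_query_parse_ns": (19809, 19939),
    "total_time_ns": (12771, 23436),
}


def slowdown_ratios(metrics):
    """Return {metric name: v1 value / v0 value} (hypothetical helper)."""
    return {name: after / before for name, (before, after) in metrics.items()}


ratios = slowdown_ratios(METRICS)

if __name__ == "__main__":
    # Print the metrics from most to least regressed.
    for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {ratio:.2f}x")
```

On these numbers, module compilation regresses the most, while query evaluation slows by a factor close to the 1.9ms to 3.5ms change reported for the main use case.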

Where path.rego is a module that looks something like this:

package path

default enabled := false
default rule_name := "UNKNOWN"
environment := opa.runtime().env.ENVIRONMENT
service_config := data.paths[input.service]

always_allowed_paths := data.always_allowed_paths if {
	data.always_allowed_paths
} else := ["/status"]

enabled if {
	service_config
	service_config.enabled
	not input.path in always_allowed_paths
} else if {
	service_config
	service_config.cluster_enabled[environment]
	not input.path in always_allowed_paths
}

match_longest_path_prefix(paths, search_path) := longest_prefix if {
	matching_prefixes := [[count(rule_path), rule_path] |
		rule_path := object.keys(paths)[_]
		startswith(search_path, rule_path)
	]
	longest_prefix := max(matching_prefixes)[1]
}

rule_name := rule if {
	enabled
	input.method
	count(service_config.paths_by_method) == 1
	rule := object.get(service_config.paths_by_method[""], input.method, "NO_HTTP_METHOD")
} else := rule if {
	enabled
	input.method
	path_prefix := match_longest_path_prefix(service_config.paths_by_method, input.path)
	rule := object.get(service_config.paths_by_method[path_prefix], input.method, "NO_HTTP_METHOD")
} else := rule if {
	enabled
	not input.method
	rule := "MISSING_METHOD_INPUT"
}

Our query input has just 4 simple string fields, and data.paths is a few nested objects that are not especially large: it fits in <100KB of memory.
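For readers less familiar with Rego, the `match_longest_path_prefix` helper in the policy can be modeled roughly as the Python sketch below. It is only an approximation of the Rego semantics: an undefined result is represented as `None`, and the sample `paths_by_method` data is hypothetical, not taken from our real configuration.

```python
def match_longest_path_prefix(paths, search_path):
    """Python model of the Rego helper: return the longest key of `paths`
    that is a prefix of `search_path`, or None when no key matches."""
    # Mirrors the Rego comprehension: collect [length, key] pairs for every matching key.
    matching_prefixes = [
        [len(rule_path), rule_path]
        for rule_path in paths
        if search_path.startswith(rule_path)
    ]
    if not matching_prefixes:
        # In Rego, max() of an empty array is undefined, so the rule body fails.
        return None
    # Lists compare element by element, so the pair with the greatest length wins.
    return max(matching_prefixes)[1]


# Hypothetical sample data shaped like service_config.paths_by_method.
sample_paths_by_method = {
    "": {"GET": "default_get"},
    "/api": {"GET": "api_get"},
    "/api/v1": {"GET": "api_v1_get", "POST": "api_v1_post"},
}

if __name__ == "__main__":
    print(match_longest_path_prefix(sample_paths_by_method, "/api/v1/users"))
```

Note that every key is checked on every call, so in this model the cost of the lookup grows linearly with the number of keys in `paths_by_method`.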

Expected behavior

I would have expected performance to remain more or less the same, if not improve, considering that some of the 1.x releases between the two versions we used advertise optimizations.

Additional context

N/A
