feat: pluggable azure blob cache provider with auto-discovery (experimental)#139
feat: pluggable azure blob cache provider with auto-discovery (experimental)#139hasethuraman wants to merge 5 commits into
Conversation
|
@hasethuraman few questions about this feature:
|
eb11665 to
aae3093
Compare
Apologies for the confusion @noa-neria Updated the PR description & relevant code. This is mainly for the integration with distributed cache solution which must handle the CACHEMISS scenario which would further speed up the streaming for cached model in the cluster. Please let me know if this makes sense. |
Greptile SummaryThis PR adds an experimental pluggable cache provider for Azure Blob Storage reads, with zero-config auto-discovery via
Confidence Score: 4/5Safe to merge for an experimental feature; the new code paths are guarded behind a runtime flag and fall back gracefully when no provider is installed. The only new finding beyond the previously flagged issues is that
Important Files Changed
Reviews (11): Last reviewed commit: "feat(azure): shared CacheLibHandle singl..." | Re-trigger Greptile |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a pluggable, runtime-loaded Azure blob cache provider (C ABI + singleton loader) and integrates it into AsyncAzureClient so blob-range reads may be served from a provider .so. Includes BUILD rules, a test provider .so, unit and e2e tests, and documentation for the new env var. Changes
Sequence DiagramsequenceDiagram
participant Client
participant AsyncAzureClient
participant AzCacheProviderLoader
participant CacheLibrary as "Dynamic Cache Library"
participant AzureSDK
Client->>AsyncAzureClient: DownloadBlobRangeAsync(account, container, blob, buf, offset, len)
AsyncAzureClient->>AsyncAzureClient: check _cache_enabled
alt cache enabled
AsyncAzureClient->>AzCacheProviderLoader: read(account, container, blob, buf, offset, len)
AzCacheProviderLoader->>CacheLibrary: blob_read(account, container, blob, buf, offset, len, &err)
CacheLibrary-->>AzCacheProviderLoader: bytes_read or error_string
AzCacheProviderLoader-->>AsyncAzureClient: success/failure
AsyncAzureClient->>Client: callback or throw on failure
else cache disabled
AsyncAzureClient->>AzureSDK: perform SDK download (createDownloadBlobFn)
AzureSDK-->>AsyncAzureClient: bytes_read or error
AsyncAzureClient->>Client: callback or error
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
cpp/azure/client/async_azure_client/async_azure_client.h (1)
89-103: Consider usingis_cache_enabled()for consistency.Line 89 directly calls
AzCacheProviderLoader::instance().is_enabled()instead of the newly addedis_cache_enabled()method. While functionally equivalent, using the wrapper method would be more consistent.♻️ Proposed fix
- if (AzCacheProviderLoader::instance().is_enabled()) + if (is_cache_enabled())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/client/async_azure_client/async_azure_client.h` around lines 89 - 103, Replace the direct call to AzCacheProviderLoader::instance().is_enabled() with the wrapper method is_cache_enabled() for consistency: in the block that creates the DownloadBlobTask (the lambda using AzCacheProviderLoader::instance().read and the push_task(std::move(task)) call), change the condition to call AzCacheProviderLoader::instance().is_cache_enabled() so the code uses the new consistent API surface.docs/src/usage.md (1)
239-242: Add language specifier to fenced code block.The code block showing expected log output should have a language specifier for consistency and to satisfy linting.
📝 Proposed fix
You should see: -``` +```text AzCacheProvider: loading cache library: /path/to/your_cache_provider.so AzCacheProvider: cache provider loaded successfully from /path/to/your_cache_provider.so</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@docs/src/usage.mdaround lines 239 - 242, The fenced code block showing
AzCacheProvider log output lacks a language specifier; update the markdown in
docs/src/usage.md to add a language tag (e.g.,text orconsole) to the
existing block that contains the AzCacheProvider lines so linting and syntax
highlighting work correctly while preserving the exact log content.</details> </blockquote></details> <details> <summary>cpp/azure/azcache_provider/azcache_provider_test.cc (1)</summary><blockquote> `62-78`: **Handle leak: `dlopen` result is not closed.** Each call to `load_cache_fn()` opens a new handle that is never closed. While the OS reclaims on test exit and the same `.so` will likely return the same handle, it's cleaner to either store the handle in the fixture and close it in `TearDown`, or use a class member. <details> <summary>♻️ Store handle in fixture for proper cleanup</summary> ```diff class SimpleFileCacheTest : public ::testing::Test { protected: fs::path cache_dir_; fs::path so_path_; + void* so_handle_ = nullptr; void SetUp() override { // ... existing code ... } void TearDown() override { + if (so_handle_) + { + dlclose(so_handle_); + so_handle_ = nullptr; + } fs::remove_all(cache_dir_); unsetenv("RUNAI_CACHE_DIR"); } az_cache_read_fn load_cache_fn() { - void* handle = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); - if (!handle) + if (!so_handle_) + { + so_handle_ = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); + } + if (!so_handle_) { ADD_FAILURE() << "dlopen failed: " << dlerror(); return nullptr; } auto fn = reinterpret_cast<az_cache_read_fn>( - dlsym(handle, AZ_CACHE_READ_SYMBOL)); + dlsym(so_handle_, AZ_CACHE_READ_SYMBOL)); // ... } }; ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_test.cc` around lines 62 - 78, load_cache_fn opens the shared library with dlopen but never closes the handle, leaking it; modify the test fixture to store the void* handle as a member (e.g., add a member like lib_handle_), have load_cache_fn assign the dlopen result to that member instead of a local variable, and in the fixture TearDown (or the fixture destructor) call dlclose(lib_handle_) if non-null and reset it; keep using so_path_ and AZ_CACHE_READ_SYMBOL for dlsym and ensure load_cache_fn still returns the az_cache_read_fn pointer. ``` </details> </blockquote></details> <details> <summary>cpp/azure/azcache_provider/azcache_provider_loader.cc (1)</summary><blockquote> `79-100`: **Consider defensive cleanup of `error_string` on success path.** A misbehaving cache provider might allocate `error_string` even on success. While the contract says it should only be set on failure, adding a defensive `free` on the success path would prevent a memory leak from non-conforming providers. <details> <summary>🛡️ Optional defensive fix</summary> ```diff LOG(SPAM) << "AzCacheProvider: cache read " << length << " bytes from " << container << "/" << blob << " offset=" << offset; + if (error_string) + { + free(error_string); // Defensive: provider should not set on success + } return true; } ``` </details> <details> <summary>🤖 Prompt for AI Agents</summary> ``` Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_loader.cc` around lines 79 - 100, The success path currently ignores a non-null error_string which could be allocated by a misbehaving provider; in the function containing the bytes_read check (variables: bytes_read, length, container, blob, offset, and error_string) add a defensive cleanup: before logging the success (LOG(SPAM) ...) and returning true, check if error_string is non-null and free it to avoid leaking memory from non-conforming providers (keeping the existing free(error_string) in the failure/error branch). ``` </details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against the current code and only fix it if needed.
Inline comments:
In@docs/src/env-vars.md:
- Line 123: Update the broken intra-document anchor link on the line containing
"See Azure Blob Cache Provider
(Experimental)..." to point to the
correct document by using a relative cross-document link (e.g.,
"usage.md#azure-blob-cache-provider-experimental"); modify the link text in
docs/src/env-vars.md so it references the usage.md anchor instead of a local
anchor.In
@tests/azure/test_azurecache.py:
- Around line 74-79: The test invokes subprocess.run to compile
simple_file_cache_test.cc using "gcc"; change the compiler invocation in the
subprocess.run call inside tests/azure/test_azurecache.py to use "g++" instead
of "gcc" so the .cc C++ source is compiled correctly (update the argument list
where ["gcc", "-shared", "-fPIC", "-o", so_path, src] appears and keep the
existing capture_output/text/error handling intact).- Around line 182-185: The test test_missing_cache_lib_falls_back currently only
unsets RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIB but does not assert fallback
behavior; update it to either (A) actually exercise the streaming path and
assert the cache provider is not used (e.g., start a subprocess or invoke the
streaming function and check that the Azure cache provider is disabled/not
initialized or that reads come from Azure instead), referencing the same test
name test_missing_cache_lib_falls_back and any provider init/check function used
in tests (e.g., the cache provider initialization method or is_enabled flag) to
verify provider disabled, or (B) mark the test as a placeholder by adding a
skip/ TODO so it’s clear it’s incomplete; ensure the chosen approach asserts the
expected fallback to Azure when RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIB is
unset.
Nitpick comments:
In@cpp/azure/azcache_provider/azcache_provider_loader.cc:
- Around line 79-100: The success path currently ignores a non-null error_string
which could be allocated by a misbehaving provider; in the function containing
the bytes_read check (variables: bytes_read, length, container, blob, offset,
and error_string) add a defensive cleanup: before logging the success (LOG(SPAM)
...) and returning true, check if error_string is non-null and free it to avoid
leaking memory from non-conforming providers (keeping the existing
free(error_string) in the failure/error branch).In
@cpp/azure/azcache_provider/azcache_provider_test.cc:
- Around line 62-78: load_cache_fn opens the shared library with dlopen but
never closes the handle, leaking it; modify the test fixture to store the void*
handle as a member (e.g., add a member like lib_handle_), have load_cache_fn
assign the dlopen result to that member instead of a local variable, and in the
fixture TearDown (or the fixture destructor) call dlclose(lib_handle_) if
non-null and reset it; keep using so_path_ and AZ_CACHE_READ_SYMBOL for dlsym
and ensure load_cache_fn still returns the az_cache_read_fn pointer.In
@cpp/azure/client/async_azure_client/async_azure_client.h:
- Around line 89-103: Replace the direct call to
AzCacheProviderLoader::instance().is_enabled() with the wrapper method
is_cache_enabled() for consistency: in the block that creates the
DownloadBlobTask (the lambda using AzCacheProviderLoader::instance().read and
the push_task(std::move(task)) call), change the condition to call
AzCacheProviderLoader::instance().is_cache_enabled() so the code uses the new
consistent API surface.In
@docs/src/usage.md:
- Around line 239-242: The fenced code block showing AzCacheProvider log output
lacks a language specifier; update the markdown in docs/src/usage.md to add a
language tag (e.g.,text orconsole) to the existing block that contains
the AzCacheProvider lines so linting and syntax highlighting work correctly
while preserving the exact log content.</details> <details> <summary>🪄 Autofix (Beta)</summary> Fix all unresolved CodeRabbit comments on this PR: - [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended) - [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes </details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Organization UI **Review profile**: CHILL **Plan**: Pro Plus **Run ID**: `73061442-98a8-43ca-af7a-42ed2f0e9003` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 2e26a8cf78012f2c5fb10c9f4c66c0d2114612d4 and aae30930cf428b3da0e7d1be97a3ac8aae5a05e9. </details> <details> <summary>📒 Files selected for processing (12)</summary> * `cpp/azure/azcache_provider/BUILD` * `cpp/azure/azcache_provider/azcache_provider_loader.cc` * `cpp/azure/azcache_provider/azcache_provider_loader.h` * `cpp/azure/azcache_provider/azcache_provider_test.cc` * `cpp/azure/azcache_provider/runai_azcache_provider.h` * `cpp/azure/azcache_provider/simple_file_cache_test.cc` * `cpp/azure/client/async_azure_client/BUILD` * `cpp/azure/client/async_azure_client/async_azure_client.h` * `cpp/azure/client/client.cc` * `docs/src/env-vars.md` * `docs/src/usage.md` * `tests/azure/test_azurecache.py` </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
aae3093 to
7367e06
Compare
hasethuraman
left a comment
There was a problem hiding this comment.
Addressed all open review comments in commit 7367e06:
-
Path traversal: absolute paths (greptile) — Added
if (s[0] == '/') return 1;at the top ofhas_path_traversal()to reject absolute paths. Good catch. -
Broken cross-document link (CodeRabbit) — Fixed
#azure-blob-cache-provider-experimental→usage.md#azure-blob-cache-provider-experimental. -
gcc → g++ (CodeRabbit) — Changed compiler to
g++for the.ccsource file in test fixture. -
Incomplete fallback test (CodeRabbit) — Added TODO comment explaining that full fallback verification requires a subprocess with Azure credentials due to the singleton design.
Additionally: cached is_cache_enabled() as a const bool _cache_enabled member in AsyncAzureClient, initialized once at construction, to avoid repeated singleton access on every blob range read.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
cpp/azure/client/async_azure_client/async_azure_client.h (1)
86-99: Consider including container/blob in the error message for easier debugging.When the cache provider read fails, the error message "Cache provider read failed" lacks context about which blob failed. The
AzCacheProviderLoader::read()already logs details, but including identifiers in the thrown exception would help correlate errors.💡 Suggested improvement
DownloadBlobTask task{ [container_name, blob_name, buffer, offset, length]() { bool ok = AzCacheProviderLoader::instance().read( container_name, blob_name, buffer, offset, length); if (!ok) { - throw std::runtime_error("Cache provider read failed"); + throw std::runtime_error( + "Cache provider read failed for " + container_name + "/" + blob_name); } }, std::move(callback) };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/client/async_azure_client/async_azure_client.h` around lines 86 - 99, The thrown exception inside the _cache_enabled branch of DownloadBlobTask should include the container/blob identifiers to aid debugging: update the lambda passed to DownloadBlobTask (the one calling AzCacheProviderLoader::instance().read with container_name and blob_name) so that when read() returns false it throws a std::runtime_error containing container_name and blob_name (and optionally offset/length) instead of the generic "Cache provider read failed", then push_task as before; reference the DownloadBlobTask lambda, AzCacheProviderLoader::instance().read, container_name and blob_name.tests/azure/test_azurecache.py (1)
21-22: Unused import:time.The
timemodule is imported but never used in this file.🧹 Proposed fix
import tempfile -import time🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/azure/test_azurecache.py` around lines 21 - 22, Remove the unused import of the time module from tests/azure/test_azurecache.py by deleting the "import time" statement so there are no unused imports; locate the top-level import line that references time and remove it (no other changes to functions or tests are required).docs/src/usage.md (1)
238-243: Add language identifier to fenced code block.The code block showing example log output should specify a language (e.g.,
textorplaintext) for consistency and to satisfy linting rules.📝 Proposed fix
You should see: -``` +```text AzCacheProvider: loading cache library: /path/to/your_cache_provider.so AzCacheProvider: cache provider loaded successfully from /path/to/your_cache_provider.so</details> <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against the current code and only fix it if needed.
In
@docs/src/usage.mdaround lines 238 - 243, The fenced code block showing
example log output lacks a language identifier; update the block surrounding the
two AzCacheProvider lines to include a language tag (e.g., use ```text orconsistent—locate the example block containing "AzCacheProvider: loading cache library..." and "AzCacheProvider: cache provider loaded successfully..." and add the language identifier after the opening backticks.cpp/azure/azcache_provider/azcache_provider_test.cc (1)
63-78: Minor:dlopenhandle is never closed.The
load_cache_fn()helper opens the shared library but never stores or closes the handle. For tests this is acceptable since the process exits, but it's technically a resource leak. Consider storing the handle in the fixture and closing inTearDown()if this pattern is reused.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_test.cc` around lines 63 - 78, load_cache_fn opens the shared object with dlopen but never closes the handle, leaking a resource; modify the test fixture to store the handle (e.g., add a void* so_handle_ member initialized to nullptr), set so_handle_ = dlopen(so_path_.c_str(), ...) inside load_cache_fn (closing any previous handle if non-null), and return the dlsym result as before; then implement TearDown() to check if so_handle_ is non-null and call dlclose(so_handle_) and set it to nullptr, handling error cases from dlopen/dlsym as already done by ADD_FAILURE().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/azure/azcache_provider/azcache_provider_loader.h`:
- Around line 3-5: Remove the unused `#include` <mutex> from
azcache_provider_loader.h: the header and its implementation use std::atomic
(e.g., the singleton flag/atomic<bool> usage) and do not reference std::mutex or
any mutex types, so delete the <mutex> include to avoid an unused include and
keep the file minimal.
---
Nitpick comments:
In `@cpp/azure/azcache_provider/azcache_provider_test.cc`:
- Around line 63-78: load_cache_fn opens the shared object with dlopen but never
closes the handle, leaking a resource; modify the test fixture to store the
handle (e.g., add a void* so_handle_ member initialized to nullptr), set
so_handle_ = dlopen(so_path_.c_str(), ...) inside load_cache_fn (closing any
previous handle if non-null), and return the dlsym result as before; then
implement TearDown() to check if so_handle_ is non-null and call
dlclose(so_handle_) and set it to nullptr, handling error cases from
dlopen/dlsym as already done by ADD_FAILURE().
In `@cpp/azure/client/async_azure_client/async_azure_client.h`:
- Around line 86-99: The thrown exception inside the _cache_enabled branch of
DownloadBlobTask should include the container/blob identifiers to aid debugging:
update the lambda passed to DownloadBlobTask (the one calling
AzCacheProviderLoader::instance().read with container_name and blob_name) so
that when read() returns false it throws a std::runtime_error containing
container_name and blob_name (and optionally offset/length) instead of the
generic "Cache provider read failed", then push_task as before; reference the
DownloadBlobTask lambda, AzCacheProviderLoader::instance().read, container_name
and blob_name.
In `@docs/src/usage.md`:
- Around line 238-243: The fenced code block showing example log output lacks a
language identifier; update the block surrounding the two AzCacheProvider lines
to include a language tag (e.g., use ```text or ```plaintext) so the markdown
linter accepts it and syntax highlighting is consistent—locate the example block
containing "AzCacheProvider: loading cache library..." and "AzCacheProvider:
cache provider loaded successfully..." and add the language identifier after the
opening backticks.
In `@tests/azure/test_azurecache.py`:
- Around line 21-22: Remove the unused import of the time module from
tests/azure/test_azurecache.py by deleting the "import time" statement so there
are no unused imports; locate the top-level import line that references time and
remove it (no other changes to functions or tests are required).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: bbf12699-0b23-42a7-8bb6-9b5b11010e8d
📒 Files selected for processing (13)
cpp/azure/azcache_provider/BUILDcpp/azure/azcache_provider/azcache_provider_loader.cccpp/azure/azcache_provider/azcache_provider_loader.hcpp/azure/azcache_provider/azcache_provider_test.cccpp/azure/azcache_provider/runai_azcache_provider.hcpp/azure/azcache_provider/simple_file_cache_test.cccpp/azure/client/async_azure_client/BUILDcpp/azure/client/async_azure_client/async_azure_client.cccpp/azure/client/async_azure_client/async_azure_client.hcpp/azure/client/client.ccdocs/src/env-vars.mddocs/src/usage.mdtests/azure/test_azurecache.py
✅ Files skipped from review due to trivial changes (2)
- cpp/azure/client/async_azure_client/BUILD
- docs/src/env-vars.md
🚧 Files skipped from review as they are similar to previous changes (1)
- cpp/azure/client/client.cc
7367e06 to
69ced42
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
cpp/azure/azcache_provider/azcache_provider_test.cc (1)
62-78: Consider closing the library handle after tests or reusing a single handle.
load_cache_fn()callsdlopenbut neverdlclose, leaking the handle. While this is acceptable in tests (the OS cleans up on process exit), you could either:
- Store the handle in the fixture and call
dlcloseinTearDown- Reuse a single handle across test cases
♻️ Optional improvement to track and close the handle
class SimpleFileCacheTest : public ::testing::Test { protected: fs::path cache_dir_; fs::path so_path_; + void* lib_handle_ = nullptr; void SetUp() override { ... } void TearDown() override { fs::remove_all(cache_dir_); unsetenv("RUNAI_CACHE_DIR"); + if (lib_handle_) { + dlclose(lib_handle_); + lib_handle_ = nullptr; + } } az_cache_read_fn load_cache_fn() { - void* handle = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); - if (!handle) + if (!lib_handle_) { + lib_handle_ = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); + } + if (!lib_handle_) { ADD_FAILURE() << "dlopen failed: " << dlerror(); return nullptr; } auto fn = reinterpret_cast<az_cache_read_fn>( - dlsym(handle, AZ_CACHE_READ_SYMBOL)); + dlsym(lib_handle_, AZ_CACHE_READ_SYMBOL)); ... } };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_test.cc` around lines 62 - 78, The test currently leaks the dlopen handle in load_cache_fn() by never calling dlclose; modify the test fixture to store the void* handle (returned by dlopen using so_path_) as a member (e.g., fixture field) and change load_cache_fn() to reuse that handle (dlopen once) or to set the handle member when opening; then implement TearDown() in the test fixture to call dlclose(handle) if handle is non-null, and update load_cache_fn() to not return early without setting or reusing the stored handle (referencing load_cache_fn, so_path_, dlopen, dlclose, and the fixture TearDown).tests/azure/test_azurecache.py (1)
21-21: Remove unusedtimeimport.The
timemodule is imported but not used in this test file.🧹 Proposed fix
import subprocess import sys import tempfile -import time import numpy as np🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/azure/test_azurecache.py` at line 21, Remove the unused import statement "import time" from the test file; locate the top-level import line "import time" in tests/azure/test_azurecache.py and delete it so there are no unused imports left in the module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/azure/client/async_azure_client/async_azure_client.h`:
- Around line 86-100: The code throws a generic std::runtime_error("Cache
provider read failed") inside the DownloadBlobTask when
AzCacheProviderLoader::instance().read(...) returns false, losing provider error
details; update the failure path in the _cache_enabled branch (DownloadBlobTask
lambda) to capture and include the provider's error information (e.g., call a
loader method like AzCacheProviderLoader::instance().last_error() or equivalent)
or call the loader's logging before throwing, and construct the thrown exception
message to include that detailed text so push_task receives an exception with
the provider error included.
In `@docs/src/usage.md`:
- Around line 238-243: The fenced code block showing the example log output (the
block containing the two "AzCacheProvider: ..." lines) is missing a language
specifier; update that triple-backtick block to include a language like text or
console (e.g., ```text) so linting and syntax highlighting work correctly and
the example log output is consistent.
---
Nitpick comments:
In `@cpp/azure/azcache_provider/azcache_provider_test.cc`:
- Around line 62-78: The test currently leaks the dlopen handle in
load_cache_fn() by never calling dlclose; modify the test fixture to store the
void* handle (returned by dlopen using so_path_) as a member (e.g., fixture
field) and change load_cache_fn() to reuse that handle (dlopen once) or to set
the handle member when opening; then implement TearDown() in the test fixture to
call dlclose(handle) if handle is non-null, and update load_cache_fn() to not
return early without setting or reusing the stored handle (referencing
load_cache_fn, so_path_, dlopen, dlclose, and the fixture TearDown).
In `@tests/azure/test_azurecache.py`:
- Line 21: Remove the unused import statement "import time" from the test file;
locate the top-level import line "import time" in tests/azure/test_azurecache.py
and delete it so there are no unused imports left in the module.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1b4dbc49-9dc5-403c-b5e1-7905f51c3fd3
📒 Files selected for processing (13)
cpp/azure/azcache_provider/BUILDcpp/azure/azcache_provider/azcache_provider_loader.cccpp/azure/azcache_provider/azcache_provider_loader.hcpp/azure/azcache_provider/azcache_provider_test.cccpp/azure/azcache_provider/runai_azcache_provider.hcpp/azure/azcache_provider/simple_file_cache_test.cccpp/azure/client/async_azure_client/BUILDcpp/azure/client/async_azure_client/async_azure_client.cccpp/azure/client/async_azure_client/async_azure_client.hcpp/azure/client/client.ccdocs/src/env-vars.mddocs/src/usage.mdtests/azure/test_azurecache.py
✅ Files skipped from review due to trivial changes (2)
- cpp/azure/client/async_azure_client/BUILD
- docs/src/env-vars.md
🚧 Files skipped from review as they are similar to previous changes (4)
- cpp/azure/client/async_azure_client/async_azure_client.cc
- cpp/azure/client/client.cc
- cpp/azure/azcache_provider/azcache_provider_loader.cc
- cpp/azure/azcache_provider/BUILD
69ced42 to
877d721
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
cpp/azure/azcache_provider/azcache_provider_test.cc (1)
62-78: Minor: dlopen handle is not closed.Each call to
load_cache_fn()opens the shared library but never callsdlclose(). While this is acceptable in test code (process termination cleans up), storing and closing the handle would be cleaner. Sincedlopenwith the same path returns the same handle (reference counted), this doesn't cause multiple loads, just a minor resource tracking issue.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_test.cc` around lines 62 - 78, The test helper load_cache_fn opens the shared object with dlopen and never calls dlclose, leaking the handle; modify load_cache_fn to store the dlopen result in a local variable (handle already present), and after obtaining the symbol via dlsym call dlclose(handle) before returning (or keep and manage the handle lifecycle in a static/field if you need the handle alive); ensure you still return the function pointer (az_cache_read_fn) and report errors via ADD_FAILURE() when dlopen or dlsym fail (use the existing so_path_, handle, dlopen, dlsym, dlclose, and AZ_CACHE_READ_SYMBOL identifiers to locate and update the code).tests/azure/test_azurecache.py (2)
21-21: Unused import.
timeis imported but never used in the file.🧹 Proposed fix
import subprocess import sys import tempfile -import time🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/azure/test_azurecache.py` at line 21, Remove the unused import "time" from the import section of tests/azure/test_azurecache.py; locate the line that reads "import time" and delete it so there are no unused imports remaining (or replace it with a needed import only if intended to be used by a test in functions/classes in this file).
145-159: Consider removing redundant env var settings in inline script.Lines 147-148 set
RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIBandRUNAI_CACHE_DIRin the script, but these are already passed viaenv=envat line 167. The subprocess inherits the environment dict, making these assignments redundant.Not a bug, but could simplify the script slightly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/azure/test_azurecache.py` around lines 145 - 159, The inline Python script assigned to the variable script sets RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIB and RUNAI_CACHE_DIR via os.environ but those are already provided through the env dictionary passed to the subprocess (env), so remove the redundant os.environ assignments from the script string; keep the imports, uri, SafetensorsStreamer usage and assertions intact, and rely on the existing env dict used when launching the subprocess to supply those variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@cpp/azure/azcache_provider/azcache_provider_test.cc`:
- Around line 62-78: The test helper load_cache_fn opens the shared object with
dlopen and never calls dlclose, leaking the handle; modify load_cache_fn to
store the dlopen result in a local variable (handle already present), and after
obtaining the symbol via dlsym call dlclose(handle) before returning (or keep
and manage the handle lifecycle in a static/field if you need the handle alive);
ensure you still return the function pointer (az_cache_read_fn) and report
errors via ADD_FAILURE() when dlopen or dlsym fail (use the existing so_path_,
handle, dlopen, dlsym, dlclose, and AZ_CACHE_READ_SYMBOL identifiers to locate
and update the code).
In `@tests/azure/test_azurecache.py`:
- Line 21: Remove the unused import "time" from the import section of
tests/azure/test_azurecache.py; locate the line that reads "import time" and
delete it so there are no unused imports remaining (or replace it with a needed
import only if intended to be used by a test in functions/classes in this file).
- Around line 145-159: The inline Python script assigned to the variable script
sets RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIB and RUNAI_CACHE_DIR via
os.environ but those are already provided through the env dictionary passed to
the subprocess (env), so remove the redundant os.environ assignments from the
script string; keep the imports, uri, SafetensorsStreamer usage and assertions
intact, and rely on the existing env dict used when launching the subprocess to
supply those variables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: b6e9f90f-9cd6-4ab8-aeae-ada341e2b9b5
📒 Files selected for processing (13)
cpp/azure/azcache_provider/BUILDcpp/azure/azcache_provider/azcache_provider_loader.cccpp/azure/azcache_provider/azcache_provider_loader.hcpp/azure/azcache_provider/azcache_provider_test.cccpp/azure/azcache_provider/runai_azcache_provider.hcpp/azure/azcache_provider/simple_file_cache_test.cccpp/azure/client/async_azure_client/BUILDcpp/azure/client/async_azure_client/async_azure_client.cccpp/azure/client/async_azure_client/async_azure_client.hcpp/azure/client/client.ccdocs/src/env-vars.mddocs/src/usage.mdtests/azure/test_azurecache.py
✅ Files skipped from review due to trivial changes (4)
- cpp/azure/client/async_azure_client/BUILD
- docs/src/env-vars.md
- docs/src/usage.md
- cpp/azure/azcache_provider/BUILD
🚧 Files skipped from review as they are similar to previous changes (3)
- cpp/azure/client/async_azure_client/async_azure_client.cc
- cpp/azure/azcache_provider/azcache_provider_loader.cc
- cpp/azure/client/async_azure_client/async_azure_client.h
|
@noa-neria I am also getting an internal review FYI. Will update once it is done |
…mental) - Define blob_read C API header for custom cache providers - Dynamic loader with dlopen/dlsym for cache provider .so - Auto-discover cache library in Python site-packages via dladdr - RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_ENABLED as a disable switch - Example simple_file_cache_test implementation for reference/testing - Unit tests and integration test infrastructure - Documentation: env-vars.md and usage.md updated Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
877d721 to
ccbe45e
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/azure/test_azurecache.py (1)
27-29: Path resolution may fail in some test environments.The relative path construction using
os.path.dirname(__file__)with multiple..segments assumes a specific directory structure. If tests are run from an installed package or a different working directory, this path may not resolve correctly.The existing
pytest.skip()in the fixture handles this gracefully, but consider adding a comment explaining the expected directory structure.📝 Suggested documentation improvement
+# Path to the C++ test provider source file. +# This assumes the test is run from the repository root with the standard layout: +# tests/azure/test_azurecache.py +# cpp/azure/azcache_provider/simple_file_cache_test.cc SIMPLE_FILE_CACHE_SRC = os.path.join( os.path.dirname(__file__), "..", "..", "cpp", "azure", "azcache_provider", "simple_file_cache_test.cc" )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/azure/test_azurecache.py` around lines 27 - 29, SIMPLE_FILE_CACHE_SRC path resolution can fail in installs or different CWDs; add a concise comment above the SIMPLE_FILE_CACHE_SRC definition explaining the expected repository layout (that tests assume the repo source tree with cpp/azure/azcache_provider/simple_file_cache_test.cc located two levels up) and note that the test fixture uses pytest.skip() when the file isn't found—this documents why the relative path is used and when the skip will occur so future readers understand the assumption and fallback.cpp/azure/azcache_provider/azcache_provider_test.cc (1)
62-78: Minor: dlopen handle is never closed.The
load_cache_fn()helper callsdlopen()but never stores the handle for laterdlclose(). While this is acceptable in tests (process exit cleans up), it could cause issues if tests are run repeatedly or if the test framework reuses processes.Consider storing the handle in the fixture and calling
dlclose()inTearDown().♻️ Suggested improvement
class SimpleFileCacheTest : public ::testing::Test { protected: fs::path cache_dir_; fs::path so_path_; + void* lib_handle_ = nullptr; void SetUp() override { // ... existing code ... } void TearDown() override { + if (lib_handle_) { + dlclose(lib_handle_); + lib_handle_ = nullptr; + } fs::remove_all(cache_dir_); unsetenv("RUNAI_CACHE_DIR"); } blob_read_fn load_cache_fn() { - void* handle = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); - if (!handle) + if (!lib_handle_) { + lib_handle_ = dlopen(so_path_.c_str(), RTLD_NOW | RTLD_LOCAL); + } + if (!lib_handle_) { ADD_FAILURE() << "dlopen failed: " << dlerror(); return nullptr; } auto fn = reinterpret_cast<blob_read_fn>( - dlsym(handle, BLOB_READ_SYMBOL)); + dlsym(lib_handle_, BLOB_READ_SYMBOL)); if (!fn) { ADD_FAILURE() << "dlsym failed: " << dlerror(); } return fn; } };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/azure/azcache_provider/azcache_provider_test.cc` around lines 62 - 78, load_cache_fn() opens a shared object with dlopen but never saves the returned handle for dlclose; modify the test fixture to store the void* handle (returned by dlopen using so_path_ and BLOB_READ_SYMBOL lookup) on the fixture (e.g. a member like dl_handle_), return the blob_read_fn as before, and implement TearDown() to call dlclose(dl_handle_) if non-null and clear it to ensure the handle is closed between tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/azure/azcache_provider/azcache_provider_loader.cc`:
- Around line 94-101: The env-var handling currently treats an explicitly-set
empty RUNAI_AZURE_CACHE_LIB_ENV value as valid, which lets dlopen("") bind the
main program; update the logic in the azcache provider loader so that when
utils::try_getenv<std::string>(RUNAI_AZURE_CACHE_LIB_ENV, lib_path) returns true
you also treat lib_path.empty() as a disabled/invalid value (same behavior as
is_disabled_value), log that it was explicitly disabled, and return early before
the dlopen call; apply the same empty-string check where the code later handles
RUNAI_AZURE_CACHE_LIB_ENV/loader path prior to calling dlopen to avoid binding
the main program handle.
- Around line 61-65: The call to ::tolower in is_disabled_value risks undefined
behavior when char is signed; modify the std::transform lambda to cast each
character to unsigned char before calling ::tolower (i.e.,
transform(lower.begin(), lower.end(), lower.begin(), [](unsigned char c){ return
std::tolower(static_cast<unsigned char>(c)); }) or equivalent) so that the
characters in string lower are safely converted to lower-case; update the
transform usage in is_disabled_value accordingly.
---
Nitpick comments:
In `@cpp/azure/azcache_provider/azcache_provider_test.cc`:
- Around line 62-78: load_cache_fn() opens a shared object with dlopen but never
saves the returned handle for dlclose; modify the test fixture to store the
void* handle (returned by dlopen using so_path_ and BLOB_READ_SYMBOL lookup) on
the fixture (e.g. a member like dl_handle_), return the blob_read_fn as before,
and implement TearDown() to call dlclose(dl_handle_) if non-null and clear it to
ensure the handle is closed between tests.
In `@tests/azure/test_azurecache.py`:
- Around line 27-29: SIMPLE_FILE_CACHE_SRC path resolution can fail in installs
or different CWDs; add a concise comment above the SIMPLE_FILE_CACHE_SRC
definition explaining the expected repository layout (that tests assume the repo
source tree with cpp/azure/azcache_provider/simple_file_cache_test.cc located
two levels up) and note that the test fixture uses pytest.skip() when the file
isn't found—this documents why the relative path is used and when the skip will
occur so future readers understand the assumption and fallback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: 3ff28027-e486-4385-bf2e-608adaa0c533
📒 Files selected for processing (13)
cpp/azure/azcache_provider/BUILDcpp/azure/azcache_provider/azcache_provider_loader.cccpp/azure/azcache_provider/azcache_provider_loader.hcpp/azure/azcache_provider/azcache_provider_test.cccpp/azure/azcache_provider/runai_azcache_provider.hcpp/azure/azcache_provider/simple_file_cache_test.cccpp/azure/client/async_azure_client/BUILDcpp/azure/client/async_azure_client/async_azure_client.cccpp/azure/client/async_azure_client/async_azure_client.hcpp/azure/client/client.ccdocs/src/env-vars.mddocs/src/usage.mdtests/azure/test_azurecache.py
✅ Files skipped from review due to trivial changes (4)
- cpp/azure/client/async_azure_client/async_azure_client.cc
- cpp/azure/client/async_azure_client/BUILD
- cpp/azure/azcache_provider/BUILD
- docs/src/usage.md
kyleknap
left a comment
There was a problem hiding this comment.
Just leaving some comments/questions that I had as I work through the PR
| } | ||
|
|
||
| // libstreamerazure.so → lib/ → libstreamer/ → runai_model_streamer/ → site-packages/ | ||
| auto site_packages = azure_so.parent_path().parent_path().parent_path().parent_path(); |
There was a problem hiding this comment.
Instead of relying on where the current location of libstreamerazure.so, I wonder if it is possible to leverage Python itself (e.g., using sys.path or site.getsitepackages()) to get the full list of locations Python will look for packages? Maybe if we are able to call one of these functions as part of the bindings to model streamer interface we could inject the value down into configuration to be read as part of this logic?
Mainly with the logic as is, it will probably work for most cases (e.g., a customer installing the package from PyPI) but not sure if it will hold for cases like editable installs or environments where the site packages is searched in multiple different locations (e.g., setting PYTHONPATH)
There was a problem hiding this comment.
Common python layer calls the libstreamerazure.so & so we dont have a chance to call azure specific. And we wanted zero touch without user doing any explicit import. Since libstreamerazure.so is expected to be in site-packages, we will support the tachyon installed in same site. WDYT? We can enhance c++ code to look into PYTHONPATH later?
There was a problem hiding this comment.
Resolving it for now & track later.
There was a problem hiding this comment.
I agree that this should be moved to the python layer.
We can add a new hook prepare_runtime similar to the existing scheme in safetensors streamer:
runai_model_streamer/s3_utils/s3_utils.py is the existing scheme-dispatch layer:
- Scheme detectors:
is_s3_path,is_gs_path,is_azure_path. - Lazy importers: `get_s3_files_module(), get_gcs_files_module(), get_azure_files_module() — each returns None if the backend package isn't installed.
- Thin per-scheme wrappers:
s3_pull_files,azure_glob, etc. — they fetch the backend module and delegate.
And safetensors_streamer.py:36-45, 56-62 is the dispatch site:
if is_s3_path(path): files = s3_glob(...)
elif is_gs_path(path): files = gcs_glob(...)
elif is_azure_path(path): files = azure_glob(...)
Each cloud's logic lives in its own package — runai_model_streamer_azure is the natural place for "before we touch Azure, prepare any Azure runtime state" — including resolving the cache .so.
The resolver code in azure implementation for prepare_runtime can set the environment variable of the package name (unless alreday set)
@hasethuraman @kyleknap If that is ok with you, I can create a quick PR for prepare_runtime which you can use after rebasing
There was a problem hiding this comment.
@noa-neria I like this idea of having a hook to do any prep work prior to going into C++ runtime. I think it will be generally useful in the future.
In terms of passing state from the prepare_runtime() hook, will it only be environment variables we'll be able to leverage/set? Or will it be able to set any in-code configuration that will cross the Python layer to the C++ layer (e.g., a config, context object, or even a configuration dictionary)? Main concern is if it just environment variables we can set, it might make sense to have some sort of teardown hook as well to make sure we reset environment variables back if this is only being called when instantiating the SafetensorsStreamer?
There was a problem hiding this comment.
I hope we can take this to a follow up PR. As the changes are increasing, can we merge this so that we get soaking time for what we have manwhile.
There was a problem hiding this comment.
@hasethuraman I'm good with deferring this improvement to a fast follow up PR. The RUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_LIB env var can be leveraged in the scenarios that I was concerned won't work.
…o mode, non-singleton loader - Caller-owned error buffer (stack-allocated, no cross-.so malloc/free) - CacheMode enum: 0=disabled, 1=required, auto=default (matches RUNAI_STREAMER_DIST) - Non-singleton: from_env() factory, shared_ptr ownership per client - Removed disable-via-lib-path; _ENABLED is sole on/off control - Added ec.message() to debug logs for canonical path and auto-discovery - Fixed tolower UB (unsigned char cast) - Updated usage.md blob_read signature to match new API Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3d1b185 to
49ba365
Compare
|
@noa-neria addressed most of the internal review comments and as it continues, I would like to request your & other reviewers to go through. Thanks. I may be offline for a week but @kyleknap has a better idea now what this PR's intent is & requesting @kyleknap to take questions meanwhile if works. Will take up further review comments / left over questions once I am back. |
|
if there are no comments, can we go ahead for merge? |
There was a problem hiding this comment.
@hasethuraman thank you for your patiance. We've been quite busy lately.
There are few comments I've added.
Also, have you considered adding versioning for the ABI, if things may change in the future?
Since the cache shared library is not part of the runai-model-streamer package, querying the ABI version after dlopen is a good practice.
For example:
/* runai_azcache_provider.h */
#define RUNAI_CACHE_ABI_VERSION 1
#define RUNAI_CACHE_ABI_VERSION_SYMBOL "runai_cache_abi_version"
/* Provider exports: */
extern "C" uint32_t runai_cache_abi_version(void) { return RUNAI_CACHE_ABI_VERSION; }
Loader flow:
dlopenthe library.dlsym("runai_cache_abi_version"). Missing → reject (pre-versioning build, treat as incompatible). Present → call it.- If the returned version is in the set the streamer accepts → install blob_read. Otherwise log "expected vN, got vM" and either fall back (Auto) or fail (Required).
| } | ||
|
|
||
| // libstreamerazure.so → lib/ → libstreamer/ → runai_model_streamer/ → site-packages/ | ||
| auto site_packages = azure_so.parent_path().parent_path().parent_path().parent_path(); |
There was a problem hiding this comment.
I agree that this should be moved to the python layer.
We can add a new hook prepare_runtime similar to the existing scheme in safetensors streamer:
runai_model_streamer/s3_utils/s3_utils.py is the existing scheme-dispatch layer:
- Scheme detectors:
is_s3_path,is_gs_path,is_azure_path. - Lazy importers: `get_s3_files_module(), get_gcs_files_module(), get_azure_files_module() — each returns None if the backend package isn't installed.
- Thin per-scheme wrappers:
s3_pull_files,azure_glob, etc. — they fetch the backend module and delegate.
And safetensors_streamer.py:36-45, 56-62 is the dispatch site:
if is_s3_path(path): files = s3_glob(...)
elif is_gs_path(path): files = gcs_glob(...)
elif is_azure_path(path): files = azure_glob(...)
Each cloud's logic lives in its own package — runai_model_streamer_azure is the natural place for "before we touch Azure, prepare any Azure runtime state" — including resolving the cache .so.
The resolver code in azure implementation for prepare_runtime can set the environment variable of the package name (unless alreday set)
@hasethuraman @kyleknap If that is ok with you, I can create a quick PR for prepare_runtime which you can use after rebasing
|
@noa-neria thank you for review. Will work on the comments. Regarding the prepare_runtime hook, if I understand correctly, users dont need to add any additional import statements or something for this hook to be executed. If it is, I am all for it. We can have a follow up PR to integrate with hook to unblock this one. |
Correct - |
- docs: values accepted are 0, 1, auto (not true/false) - docs: disable example uses =0 instead of =false - tests: replace stale 'singleton' references with accurate ClientMgr pool explanation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Updated the code to support ABI version |
- Header: RUNAI_CACHE_ABI_VERSION=1, runai_cache_abi_version symbol - Loader: checks version after dlopen, rejects missing/mismatched - Test .so: exports runai_cache_abi_version() returning 1 - Tests: 8 new loader tests covering mode × version combinations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
28395cc to
1e5436f
Compare
|
@hasethuraman thanks for handling these issues! One more question after reviewing the code again: Is there a reason each The reason I'm asking: by design This is unlike our treatment of the CSP plugin libraries, which are |
- Refactor AzCacheProviderLoader to use a shared CacheLibHandle via weak_ptr singleton, ensuring single dlopen per process - Add shutdown_fn typedef and SHUTDOWN_SYMBOL to ABI header - CacheLibHandle destructor calls shutdown() for graceful provider cleanup before process exit (no dlclose for safety) - Mutex-protected creation and destruction to prevent race conditions - Update tests: shared handle reuse, shutdown_called verification - Update example cache (simple_file_cache_test.cc) to export shutdown() Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@noa-neria agree on the concern. Thanks for direction. We have limitations in dlclose at the moment. However, introduced the shared ptr for lib cache across clients. And on streamer close, calling a new function shutdown() on cache lib to make the cache resources claim possible in future till we get dlclose to work. |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
Pluggable Azure Blob Cache Provider with Auto-Discovery
Adds a pluggable cache provider for Azure Blob Storage reads with zero-config auto-discovery. When a compatible cache provider package (e.g.,
tachyon-client) is installed alongsiderunai-model-streamer, the streamer automatically discovers and loads it at runtime — no environment variable configuration needed.How it works
libStorageDirect.so) into Python site-packagesdladdr(walking from the runai .so location)dlopenand theblob_readsymbol is resolvedComponents
runai_azcache_provider.h: C API contract (blob_read) for cache provider .so pluginsazcache_provider_loader.cc: Singleton that auto-discovers viadladdr, falls back todlopen/dlsymAsyncAzureClientintegration: Routes reads through cache when provider is discoveredsimple_file_cache_test.cc: Example cache implementation for reference and unit testingRUNAI_STREAMER_EXPERIMENTAL_AZURE_CACHE_ENABLED: Disable switch to turn off cache even when provider is installedenv-vars.mdandusage.mdupdated with auto-discovery flow and cache provider authoring guideTesting
Summary by CodeRabbit
New Features
Documentation
Tests