feat(proxy): serve Microsoft-style compressed symbol files (.pd_, .dl_, .ex_) #1951
bosiakov wants to merge 5 commits
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 95205d2.
Out of curiosity, what is your use case for the proxy endpoint? This is still an experimental feature, and there was some talk of removing it again, since we got little to no feedback that it is useful or being used.
@Dav1dde Thanks for asking!
We're a C++ engineering org self-hosting Sentry for native crash collection.
Our PDBs are 8 to 12 GB each.
Engineers on the Windows side would like one URL to point both Visual Studio and WinDbg at via _NT_SYMBOL_PATH, so PDBs would flow from CI through sentry-cli upload into the debugger.
The /proxy endpoint handles .pdb URLs but not the .pd_ compressed form that Microsoft symsrv clients ask for first, so we've been running a parallel HTTP mirror just for the compressed form. It works but is more infrastructure than we'd like for something Sentry already holds.
The change is opt-in. This PR would round out the symsrv compatibility story for us.
Thanks for taking a look!
Thanks for explaining your use case. I will discuss this with the team. We had some discussions around the proxy endpoint, but we never actually decided to remove it, as it was already there and largely just used existing Symbolicator infrastructure.
The endpoint always had shortcomings. It does not reliably translate between different layouts in all cases, but it gives the illusion that it should. It also never integrated well with Sentry itself: it does not pull symbol source definitions from Sentry, share a single authentication mechanism, and so on.
This is quite a large change to the endpoint, so we'll need to revive those discussions and decide whether we want to keep the endpoint at all.
It's really unfortunate that you put in all this effort and my response is this vague, but I hope it at least gives some context.

Symbolicator can ingest Microsoft style compressed symbol files (`.pd_`, `.dl_`, `.ex_`) from upstream symbol servers but cannot serve them on the `/proxy` endpoint. Tools that speak the symsrv protocol (WinDbg, Visual Studio, symchk) request the underscore form and expect a CAB body. This PR adds optional CAB output behind a new `compressed_proxy` config flag, defaulting to off so existing deployments are unchanged.
When the flag is enabled, the proxy serves the underscore form in two ways. If the upstream source delivered a CAB, the bytes are preserved byte for byte in a new `raw_compressed` cache and returned as is. Otherwise a fresh MSZIP CAB is synthesized from the cached object and stored in a new `cab_synth` cache for reuse.
Changes
Wire up the new request shape in the proxy handler. Detect the `.pd_`/`.dl_`/`.ex_` leaf, rewrite to the uncompressed form for object lookup, attach the `vnd.ms-cab-compressed` content type on the way out. With the flag off, the underscore form returns 404 so we never accidentally serve decompressed bytes under a misleading filename. (`endpoints/proxy.rs`)
Preserve upstream CAB bytes during download. `maybe_decompress_file` returns a `DecompressOutcome` that includes the original payload as a sibling tempfile when the source is CAB. The allocation happens only after the CAB magic is matched, so non CAB downloads pay nothing extra. Other compressed formats (gzip, zstd, zlib, zip) are decompressed as before but not preserved, because their bytes cannot honestly be served as `vnd.ms-cab-compressed`. (`download/compression.rs`, `download/fetch_file.rs`, `objects/data_cache.rs`)
Synthesize CAB on demand for uploads and non CAB upstreams. The new `cab_synth_cache.rs` wraps the cached object in a single folder, single file MSZIP CAB using the `cab` crate writer. The compression runs inside `tokio::task::spawn_blocking` so multi GB inputs do not stall async workers. Results are cached so the cost is paid once per symbol. (`objects/cab_synth_cache.rs`)
Compose the two paths in `ObjectsActor::fetch_compressed`. Ensure the underlying object is fetched (which populates the upstream mirror as a side effect), look in `raw_compressed` first, fall back to `cab_synth`. Plumb the new caches and the config flag through `ObjectsActor::new` and `RequestService::fetch_compressed_object`. (`objects/mod.rs`, `objects/raw_compressed_cache.rs`, `service.rs`)
Extend the cache layer with two primitives needed for the above. `Cacher::store_externally` persists a tempfile into the on disk cache from outside the normal `compute` flow, used by the tee path. `Cacher::lookup_only` checks the cache without ever invoking `compute` and without caching negatives, used by the proxy so a `raw_compressed` entry that appears after a first lookup is picked up by the second. (`caching/memory.rs`)
Register the new caches alongside the existing ones (`raw_compressed`, `cab_synth`) with their version constants, cleanup wiring, and config plumbing. Add the `compressed_proxy: bool` field with a default of false. (`caching/mod.rs`, `caching/config.rs`, `caches/versions.rs`, `caching/cleanup.rs`, `config.rs`, `services.rs`)
Benchmark
CAB synthesis is the only added per request CPU cost. The other paths are filesystem rename or mmap and stay in the microsecond range. Numbers below come from a single threaded run of the cab 0.6 MSZIP writer on Apple Silicon, release build.
Cost is paid once per `(scope, symbol)` pair on first cache miss. Subsequent hits read from the `cab_synth` mmap in single digit milliseconds. Concurrent requests for the same symbol deduplicate to one compression via `Cacher::compute_memoized`. Memory footprint stays bounded: input is a kernel demand paged mmap, output streams through the writer to a tempfile, DEFLATE state fits in a few hundred KB. With the flag off, the codepath is bit for bit identical to before.
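
The leaf detection described for `endpoints/proxy.rs` could look roughly like the sketch below. It is an illustration only, not the PR's code: `uncompressed_path`, `ProxyLookup`, and `classify` are hypothetical names, and the sketch assumes the symsrv convention that the compressed name replaces the last character of the extension with an underscore.

```rust
/// Hypothetical helper: map a symsrv underscore leaf (`.pd_`, `.dl_`, `.ex_`)
/// to the uncompressed path used for object lookup.
/// Returns `None` when the path does not end in one of the underscore forms.
fn uncompressed_path(path: &str) -> Option<String> {
    // Split at the last dot; a dot inside a directory name yields an
    // "extension" containing '/', which matches nothing below.
    let (stem, ext) = path.rsplit_once('.')?;
    let restored = match ext.to_ascii_lowercase().as_str() {
        "pd_" => "pdb",
        "dl_" => "dll",
        "ex_" => "exe",
        _ => return None,
    };
    Some(format!("{stem}.{restored}"))
}

/// Hypothetical classification of an incoming proxy request.
#[derive(Debug, PartialEq)]
enum ProxyLookup {
    /// Not an underscore form: handled as before.
    Plain(String),
    /// Underscore form with the flag on: look up this path, answer with CAB bytes.
    Compressed(String),
    /// Underscore form with the flag off: answer 404.
    NotFound,
}

fn classify(path: &str, compressed_proxy: bool) -> ProxyLookup {
    match uncompressed_path(path) {
        Some(lookup) if compressed_proxy => ProxyLookup::Compressed(lookup),
        Some(_) => ProxyLookup::NotFound,
        None => ProxyLookup::Plain(path.to_owned()),
    }
}
```

With the flag off, `classify("a.dll/1/a.dl_", false)` yields `ProxyLookup::NotFound`, mirroring the 404 behaviour described above. The sketch lowercases the restored extension, which a real handler may not want.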
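
The "allocate only after the CAB magic is matched" idea can be sketched with the standard library alone. Cabinet files start with the four bytes `MSCF`. The names `is_cab` and `preserve_if_cab` are hypothetical, and the copy goes into a `Vec<u8>` here purely for brevity, whereas the PR describes a sibling tempfile.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Signature at the start of every Microsoft Cabinet file.
const CAB_MAGIC: [u8; 4] = *b"MSCF";

/// Peek at the first four bytes and restore the reader position afterwards.
fn is_cab<R: Read + Seek>(reader: &mut R) -> io::Result<bool> {
    let start = reader.stream_position()?;
    let mut magic = [0u8; 4];
    let matched = match reader.read_exact(&mut magic) {
        Ok(()) => magic == CAB_MAGIC,
        // Fewer than four bytes available: cannot be a CAB.
        Err(err) if err.kind() == io::ErrorKind::UnexpectedEof => false,
        Err(err) => return Err(err),
    };
    reader.seek(SeekFrom::Start(start))?;
    Ok(matched)
}

/// Hypothetical stand-in for the preserving step: copy the original payload
/// only when it is a CAB, so other downloads pay no extra allocation.
fn preserve_if_cab<R: Read + Seek>(reader: &mut R) -> io::Result<Option<Vec<u8>>> {
    if !is_cab(reader)? {
        return Ok(None);
    }
    let start = reader.stream_position()?;
    let mut original = Vec::new();
    reader.read_to_end(&mut original)?;
    // Rewind so the usual decompression can still consume the same reader.
    reader.seek(SeekFrom::Start(start))?;
    Ok(Some(original))
}
```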
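
The lookup order in `ObjectsActor::fetch_compressed` and the reason `lookup_only` must not cache negatives can be shown with a toy in-memory model. Everything below is an assumption made for illustration: the `Cache` struct, its method signatures, and the free function `fetch_compressed` only borrow names from the description and do not reflect Symbolicator's actual async, on-disk `Cacher`.

```rust
use std::collections::HashMap;

/// Toy in-memory cache standing in for an on-disk cache.
#[derive(Default)]
struct Cache {
    entries: HashMap<String, Vec<u8>>,
}

impl Cache {
    /// Look up without computing; a miss leaves no trace in the cache.
    fn lookup_only(&self, key: &str) -> Option<&[u8]> {
        self.entries.get(key).map(Vec::as_slice)
    }

    /// Persist bytes produced outside the normal compute flow.
    fn store_externally(&mut self, key: &str, bytes: Vec<u8>) {
        self.entries.insert(key.to_owned(), bytes);
    }

    /// Regular path: compute on a miss and remember the result.
    fn get_or_compute(&mut self, key: &str, compute: impl FnOnce() -> Vec<u8>) -> &[u8] {
        self.entries
            .entry(key.to_owned())
            .or_insert_with(compute)
            .as_slice()
    }
}

/// Prefer preserved upstream bytes, otherwise fall back to a synthesized CAB.
fn fetch_compressed<'a>(
    raw_compressed: &'a Cache,
    cab_synth: &'a mut Cache,
    key: &str,
    synthesize: impl FnOnce() -> Vec<u8>,
) -> &'a [u8] {
    if let Some(bytes) = raw_compressed.lookup_only(key) {
        return bytes;
    }
    cab_synth.get_or_compute(key, synthesize)
}
```

Because the miss on `raw_compressed` is never recorded, a later `store_externally` for the same key is visible to the next request, which then returns the preserved bytes instead of the synthesized ones.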
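
The deduplication of concurrent requests can be illustrated with a blocking, threads-only sketch. This is not how `Cacher::compute_memoized` is implemented (that code is async and is not shown in this PR description). `Memoizer` and `get_or_compute` are hypothetical names, and the sketch relies only on `std::sync::OnceLock`, which guarantees that a single initializer runs per cell.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical memoizer: one `OnceLock` slot per key, so concurrent callers
/// for the same key wait on a single computation instead of repeating it.
struct Memoizer<K, V> {
    slots: Mutex<HashMap<K, Arc<OnceLock<V>>>>,
}

impl<K: Eq + Hash + Clone, V: Clone> Memoizer<K, V> {
    fn new() -> Self {
        Self {
            slots: Mutex::new(HashMap::new()),
        }
    }

    fn get_or_compute(&self, key: &K, compute: impl FnOnce() -> V) -> V {
        // Hold the map lock only long enough to find or create the slot.
        let mut slots = self.slots.lock().unwrap();
        let slot = slots.entry(key.clone()).or_default().clone();
        drop(slots);
        // The first caller runs `compute`; the others block until it is done
        // and then read the stored value.
        slot.get_or_init(compute).clone()
    }
}
```

Keying the memoizer by a `(scope, symbol)` tuple would match the "paid once per pair" behaviour described above. Eviction and error handling are left out of this sketch.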