11---
2- title : " IPIP-0445: trustless gateway skip-leaves option "
2+ title : " IPIP-0445: Option to Skip Raw Blocks in Gateway Responses "
33date : 2023-10-09
44ipip : open
55editors :
6- - name : Hugo VALTIER
6+ - name : Hugo Valtier
77 github : Jorropo
88 url : https://jorropo.net/
99 affiliation :
1010 name : Protocol Labs
1111 url : https://protocol.ai/
12+ - name : Marcin Rataj
13+ github : lidel
14+ url : https://lidel.org/
15+ affiliation :
16+ name : Protocol Labs
17+ url : https://protocol.ai/
1218relatedIssues :
1319 - https://github.com/ipfs/specs/issues/444
1420order : 445
@@ -17,88 +23,152 @@ tags: ['ipips']
1723
1824## Summary
1925
20- Introduce ` skip-leaves ` flag for the : cite [ trustless-gateway] .
26+ Introduce ` skip-raw-blocks ` flag for the : cite [ trustless-gateway] .
2127
2228## Motivation
2329
2430Allow clients to read a stream which only contains proofs in a bottom-heavy
2531graph using the ` raw ` codec for its leaves.
2632
27- Usefull with unixfs for features like webseeds [ #444 ] ( https://github.com/ipfs/specs/issues/444 ) .
33+ Useful with UnixFS for features like webseeds
34+ ([ ipfs/specs #444 ] ( https://github.com/ipfs/specs/issues/444 ) ), where metadata
35+ about a DAG is fetched from a trustless gateway, but the actual raw data can be
36+ fetched from any source that supports either trustless gateway specification,
37+ or plain HTTP Range Requests, allowing for trustless and verifiable data
38+ retrieval from plain HTTP (non-IPFS) data sources.
2839
2940## Detailed design
3041
31- The ` skip-leaves ` CAR Content-Type parameter on : cite [ trustless-gateway]
42+ The ` skip-raw-blocks ` URL query parameter on : cite [ trustless-gateway]
3243allows clients to download an entity except blocks with the multicodec
3344` raw ` (` 0x55 ` ).
3445
3546- When set to ` y ` , the parameter instructs the gateway not to transmit
36- blocks tagged with the ` raw ` multicodec.
37- - If set to ` n ` , or left unspecified, the gateway MUST transmit ` raw `
38- multicodec blocks.
47+ blocks referenced with a CID with the ` raw ` multicodec.
48+ - If set to ` n ` , or left unspecified, there is no special handling of ` raw `
49+ multicodec blocks (the existing default behavior remains the same) .
3950
4051Importantly, unless explicitly specified as ` y ` , the default operational
41- mode of the gateway MUST assume the value of ` skip-leaves ` to be ` n ` .
52+ mode of the gateway MUST assume the value of ` skip-raw-blocks ` to be ` n ` .
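
A minimal sketch of how a client might build such a request URL. The host `gateway.example`, the helper name `build_car_url`, and the placeholder CID are hypothetical; `format=car`, `dag-scope` and `entity-bytes` are the existing trustless gateway query parameters, and `skip-raw-blocks` is the parameter proposed here.

```python
from urllib.parse import urlencode


def build_car_url(gateway, cid, skip_raw_blocks=False,
                  dag_scope=None, entity_bytes=None):
    """Build a trustless gateway CAR request URL (illustrative helper)."""
    params = {"format": "car"}
    if dag_scope is not None:
        params["dag-scope"] = dag_scope
    if entity_bytes is not None:
        params["entity-bytes"] = entity_bytes
    # Only send the parameter when opting in: leaving it out is
    # equivalent to skip-raw-blocks=n per this proposal.
    if skip_raw_blocks:
        params["skip-raw-blocks"] = "y"
    # Keep ':' unescaped so entity-bytes stays readable (from:to).
    return f"{gateway}/ipfs/{cid}?{urlencode(params, safe=':')}"


# Placeholder CID, not a real one.
url = build_car_url("https://gateway.example", "bafyEXAMPLE",
                    skip_raw_blocks=True,
                    dag_scope="entity", entity_bytes="42:1337")
print(url)
```

Because the flag lives in the query string, the URL with `skip-raw-blocks=y` is a different string from the URL without it.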
4253
4354## Design rationale
4455
4556### User Benefit
4657
47- Implementing the ` skip-leaves ` parameter offers several benefits to users:
58+ Implementing the ` skip-raw-blocks ` parameter offers several benefits to users:
4859
49601 . ** Verification Flexibility:** Clients can verify out-of-band (OOB) received
5061 files in their deserialized form without necessitating the transmission of
5162 raw blocks from the gateway.
63+
52642 . ** Incremental Download:** Clients can incrementally download files in
5365 deserialized forms from non-IPFS servers. Allowing applications to share
54- distribution for IPFS and non IPFS clients.
55- 3 . ** Efficient Block Discovery:** With the ` skip-leaves ` option enabled,
66+ distribution for IPFS and non-IPFS clients.
67+
68+ 3 . ** Efficient Block Discovery:** With the ` skip-raw-blocks ` option enabled,
5669 clients can quickly discover numerous candidate blocks without being
5770 bottlenecked by the gateway's transmission of raw blocks.
5871
72+ 4 . ** Non-IPFS HTTP Mirrors Become Useful:** Legacy data that is already exposed
73+ over HTTP in deserialized form can now act as sources for specific block
74+ byte ranges, without having to support any IPFS specific APIs. Plain HTTP
75+ Range Requests can be used for fetching remaining raw block data, and the
76+ metadata read via ` skip-raw-blocks=y ` is enough for a client to verify the
77+ remaining raw block byte ranges fetched from non-IPFS systems match expected
78+ CIDs.
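
The verification step in the last item can be sketched as follows. This assumes the expected CID is a CIDv1 with the `raw` codec, a sha2-256 multihash and base32 multibase encoding; DAGs built with other hash functions would need a different digest. The function names are hypothetical, and the byte range is taken from an in-memory buffer rather than an actual HTTP Range Request.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Compute the base32 CIDv1 (raw codec, sha2-256) of a block."""
    digest = hashlib.sha256(data).digest()
    # <version=1><codec=raw 0x55><hash=sha2-256 0x12><digest length=32>
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # 'b' is the multibase prefix for lowercase base32


def verify_range(file_bytes: bytes, offset: int, length: int,
                 expected_cid: str) -> bool:
    """Check that a byte range of a deserialized file matches a raw CID."""
    chunk = file_bytes[offset:offset + length]
    return raw_cid_v1(chunk) == expected_cid


# The expected CID would normally come from a parent block received
# in the skip-raw-blocks=y response; here it is computed locally.
data = b"hello world"
expected = raw_cid_v1(b"hello")
print(verify_range(data, 0, 5, expected))  # bytes 0..4 are b"hello"
```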
79+
5980### Compatibility
6081
61- Setting the default value of the ` skip-leaves ` parameter to ` n ` ensures
82+ Setting the default value of the ` skip-raw-blocks ` parameter to ` n ` ensures
6283backward compatibility with existing clients and systems that are unaware
6384of this new flag.
6485
65- ### Prevention of Amplification Attacks and Efficient Server Operation
86+ ### Alternatives
6687
67- By utilizing the ` raw ` (` 0x55 ` ) codec servers can trivially determine whether
68- to fetch or skip a block without having to learn any new information.
69- Although more limited and not able to handle unixfs file using dag-pb for their
70- leaves, it allows both the client and server to trivially verify a block
71- must not be fetched. Preventing issues of Amplification where a server could
72- need to fetch multiple orders more data than the client when executing the
73- request.
88+ An alternative approach would be to request blocks individually.
89+ However, it adds extra round trips and more per-request HTTP overhead
90+ and thus is undesirable.
7491
75- ### Why not ` dag-scope=skip-leaves ` ?
92+ #### Why not ` dag-scope=skip-raw-blocks ` ?
7693
77- The ` dag-scope ` parameter determines the overall range of blocks to retrieve,
78- while ` skip-leaves ` selectively filters specific blocks within that range .
94+ The existing ` dag-scope ` parameter determines the overall range of blocks to retrieve,
95+ while ` skip-raw-blocks ` selectively filters specific blocks across all scopes and ranges .
7996Combining them under one parameter would restrict their combined utility.
8097
8198For example:
82- - A client is streaming a video from a webseed and the user seeked through the
99+ - A client is streaming a video from a webseed and the user seeks through the
83100 video, then the client would send ` dag-scope=entity&entity-bytes=42:1337 `
84- with ` skip-leaves =y ` to download the proofs for the required section of the
85- video.
86- - A client is verifying an OOB transfered directory in deserialized form,
87- then ` dag-scope=all ` with ` skip-leaves =y ` makes sense.
101+ with ` skip-raw-blocks =y ` to download the proofs for the required section of the
102+ video, and then fetches remaining raw data byte ranges from a faster CDN .
103+ - A client is verifying an OOB transferred directory in deserialized form,
104+ then ` dag-scope=all ` with ` skip-raw-blocks =y ` makes sense.
88105
89- ### Alternatives
106+ #### Why not CAR content type parameter ?
90107
91- An alternative approach would be to request blocks individually.
92- However it adds extra round trips and more per HTTP request overhead
93- and thus is undesireable.
108+ CAR content type's
109+ ([ application/vnd.ipld.car] ( https://www.iana.org/assignments/media-types/application/vnd.ipld.car ) )
110+ optional parameters like ` order ` and ` dups ` impact the way data is represented
111+ when returned as a CAR stream, but do not modify the scope of the data itself:
112+ they neither add nor subtract data from the response.
113+
114+ The scope of the data is controlled by URL content path and optional
115+ ` dag-scope ` , ` entity-bytes ` URL parameters. This is where ` skip-raw-blocks `
116+ belongs.
117+
118+ This is not just a matter of aesthetics: the URL path and query parameters
119+ allow for caching of different subsets of a DAG in a way that is interoperable
120+ with existing HTTP tools and clients, and minimize the risk of caching an incomplete DAG
121+ response due to HTTP cache misconfiguration. Thanks to ` skip-raw-blocks ` being
122+ in the URL query, we ensure CAR responses without ` raw ` blocks will be cached
123+ under a different key than full responses (just like the already existing ` dag-scope `
124+ and ` entity-bytes ` ).
125+
126+ #### Why not generic ` skip-leaves ` that skips all leaves, not just ` raw ` blocks?
127+
128+ Prevention of amplification attacks and efficient server operation.
129+
130+ By utilizing the ` raw ` (` 0x55 ` ) codec servers can trivially determine whether
131+ to fetch or skip a block without having to fetch it to learn any new
132+ information.
133+
134+ If we framed this feature around skipping all leaf nodes, that would require
135+ the server to fetch the leaves to learn if they have any child nodes. This would
136+ force the server to fetch data that is never returned to the client.
137+
138+ Although ` skip-raw-blocks ` is more limited and not able to handle UnixFS files
139+ chunked without the ` --raw-leaves ` option, it allows both the client and server to
140+ trivially verify that a block must not be fetched. This prevents amplification
141+ issues, where a server could need to fetch multiple orders of magnitude more data
142+ than the client when executing the request.
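
To illustrate why the check is trivial, here is a sketch of deciding from the CID alone whether a linked block would be skipped. It only handles base32 CIDv1 strings and treats everything else as "not raw" (CIDv0 is always `dag-pb`); the function names are hypothetical and not taken from any gateway implementation.

```python
import base64

RAW_CODEC = 0x55


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned LEB128 varint, return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        pos += 1
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def is_raw_cid(cid: str) -> bool:
    """True if the CID carries the raw codec; no block fetch is needed."""
    if not cid.startswith("b"):
        return False  # this sketch only decodes base32 CIDv1
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore padding stripped by multibase
    cid_bytes = base64.b32decode(body)
    version, pos = read_varint(cid_bytes, 0)
    codec, _ = read_varint(cid_bytes, pos)
    return version == 1 and codec == RAW_CODEC
```

A server walking a DAG with `skip-raw-blocks=y` could apply such a check to each link before deciding to fetch the linked block, whereas a generic leaf check would require fetching the block first.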
94143
95144## Security
96145
97- None .
146+ This IPIP does not impact the security model of the trustless gateway.
98147
99148## Test fixtures
100149
101- TODO
150+ ::: issue
151+
152+ TODO: update below section with CIDs or CARs from conformance tests
153+
154+ Scenarios we should check:
155+ - [ ] reuse existing UnixFS DAG that has raw-leaves, request it with
156+ ` skip-raw-blocks=n ` , confirm the response includes expected raw leaves' CIDs
157+ - [ ] create a new CAR fixture that only has non-raw blocks. Request it with
158+ ` skip-raw-blocks=y ` , confirm the response includes expected CIDs and does not
159+ include raw blocks referenced by parents.
160+ - important part is creating the CAR fixture by hand, and ensuring the raw blocks are
161+ NEVER announced anywhere (generate fixture with random data, add to ipfs
162+ with raw-leaves option, then export DAG without ` raw ` blocks using go-car's
163+ [ ` filter ` ] ( https://github.com/ipld/go-car/tree/master/cmd/car#readme ) or
164+ similar).
165+ - Why? This goes the extra mile, but ensures every conformant gateway
166+ implementation is not doing useless work of fetching raw blocks which are
167+ not required for fulfilling ` skip-raw-blocks=y ` requests. We did
168+ a similar thing for ` entity-bytes ` and it was the only way we could show
169+ bugs in Saturn project's cache implementation at the time.
170+
171+ :::
102172
103173### Copyright
104174