I wanted to know which file order DwarFS chose, as I really thought this output was sorted alphabetically and that the inode numbers reflected the actual ordering. This is simply the Silesia corpus, with no options given (all defaults).
-
There actually isn't a way to do this. See below for details.
That output is asciibetically ordered, but the inode numbers don't have an extra meaning attached to them. They are guaranteed to be consistent across multiple runs when using the same options, but that's about it.
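To illustrate the difference between "asciibetical" and alphabetical ordering, here is a minimal Python sketch. The file names are made up for illustration. Sorting by raw bytes puts every uppercase letter before every lowercase one, which a case-insensitive alphabetical sort does not do.

```python
# Hypothetical file names, purely for illustration.
names = ["xml", "x-ray", "README", "dickens", "Makefile"]

# "Asciibetical" order: compare the raw bytes, so uppercase letters (A-Z = 65..90)
# always sort before lowercase letters (a-z = 97..122).
asciibetical = sorted(names, key=lambda s: s.encode("utf-8"))

# Case-insensitive order, closer to what most people call "alphabetical".
alphabetical = sorted(names, key=str.casefold)

print(asciibetical)  # ['Makefile', 'README', 'dickens', 'x-ray', 'xml']
print(alphabetical)  # ['dickens', 'Makefile', 'README', 'x-ray', 'xml']
```

With all-lowercase names the two orders often coincide, which is why byte-wise output can easily look alphabetical.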
That doesn't really come as a surprise. Nilsimsa works fine on average, but it is by no means perfect.
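As a rough illustration of why similarity-based ordering is a heuristic, the sketch below uses a simplified greedy nearest-neighbour ordering over bit digests compared by Hamming distance. This is an assumption for illustration, not DwarFS's actual nilsimsa ordering code. Real nilsimsa digests are 256 bits; the toy digests here are chosen so that distances are easy to read. On this input, greedy ordering ends up with a higher total distance than the best possible order.

```python
from itertools import permutations

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two digests."""
    return bin(a ^ b).count("1")

def greedy_order(digests: list[int]) -> list[int]:
    """Start at item 0, then always append the most similar unvisited item."""
    order = [0]
    remaining = list(range(1, len(digests)))
    while remaining:
        last = digests[order[-1]]
        # Ties are broken by index to keep the result deterministic.
        nxt = min(remaining, key=lambda i: (hamming(last, digests[i]), i))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def path_cost(digests: list[int], order: list[int]) -> int:
    """Sum of distances between neighbours in the given order."""
    return sum(hamming(digests[a], digests[b]) for a, b in zip(order, order[1:]))

def best_cost(digests: list[int]) -> int:
    """Brute-force optimum over all orders starting at item 0 (tiny inputs only)."""
    rest = range(1, len(digests))
    return min(path_cost(digests, [0, *p]) for p in permutations(rest))

# Toy "digests": (1 << k) - 1 has its k low bits set, so the Hamming distance
# between two of them is |k1 - k2|.
digests = [(1 << k) - 1 for k in (4, 5, 2, 9)]
print(greedy_order(digests), path_cost(digests, greedy_order(digests)))  # [0, 1, 2, 3] 11
print(best_cost(digests))  # 9
```

The greedy step grabs the closest neighbour first and later pays for it with a long jump, so locally good choices do not guarantee a globally good order.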
True, that output isn't really documented at the moment. The two numbers in square brackets are the begin and end indices of the chunk range for the inode (see the metadata format description). But chunks can reference block data in any order, so that won't help much either.
The reason there's no output of an overall file order is that there isn't really one well-defined "overall file order". Nilsimsa ordering is done per stream (i.e. per category). The categorizers can decompose each file into multiple fragments, each fragment can end up in a different stream, and each fragment can be ordered differently within its stream. So there isn't really "one" order in this case.
Also, since multiple segmenter streams are later merged into a single stream of file system blocks, this would potentially require the previously determined fragment order(s) to be updated somehow. Finally, this still leaves the problem that within each stream, a fragment A that is nilsimsa-ordered after fragment B can refer to data in fragment C, which itself is nilsimsa-ordered before A.
That being said, it would certainly be possible to output the per-stream fragment order. If you only have a single stream (i.e. no categorizers enabled), you can already get that indirectly by enabling
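The following is a heavily simplified toy model, not DwarFS's real metadata layout or API. It shows two points made above. First, an inode's chunk range selects chunks that may point at blocks in any order. Second, once files are split into fragments in separate streams, each stream can order the same two files differently. All names (`Chunk`, `chunk_range`, `blocks_of`, `streams`, `first_in_stream`) and the stream names are hypothetical. Treating the end index as exclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    block: int   # index of the file system block holding the data
    offset: int  # byte offset within that block
    size: int    # number of bytes taken from that block

# Hypothetical flat chunk list; an inode's (begin, end) range selects a slice of it.
chunks = [Chunk(3, 0, 100), Chunk(0, 512, 50), Chunk(1, 0, 200), Chunk(0, 0, 64)]
chunk_range = {0: (0, 2), 1: (2, 4)}  # inode -> (begin, end), end assumed exclusive

def blocks_of(inode: int) -> list[int]:
    """Blocks referenced by an inode's chunks, in chunk order (not sorted)."""
    begin, end = chunk_range[inode]
    return [c.block for c in chunks[begin:end]]

# Hypothetical categorized fragments ("file#fragment"); each stream has its own order.
streams = {
    "audio": ["B#0", "A#1"],    # B's fragment is ordered before A's here...
    "default": ["A#0", "B#1"],  # ...but A's fragment comes first in this stream
}

def first_in_stream(stream: str, file_a: str, file_b: str) -> str:
    """Return whichever of the two files has the earlier fragment in a stream."""
    for frag in streams[stream]:
        name = frag.split("#")[0]
        if name in (file_a, file_b):
            return name
    raise ValueError("neither file has a fragment in this stream")

print(blocks_of(0), blocks_of(1))            # [3, 0] [1, 0]
print(first_in_stream("audio", "A", "B"))    # B
print(first_in_stream("default", "A", "B"))  # A
```

Neither the chunk ranges nor the per-stream positions yield a single consistent "file A before file B" answer, which is the situation described above.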