Closed
Labels: bug (Something isn't working), fs (fsspec implementation), priority-p1
Description
While working on #222, I discovered that find has a bug related to the cache.
Let's assume self.path=root/tmp/ and a folder structure like:
root/tmp/
└── fo1/
    ├── file2.pdf
    └── fo2/
        ├── file3.pdf
        └── fo3/
            └── file4.pdf
Now let's run some tests:
f = fs.find('root/tmp/fo1/')
print(f)
> ['root/tmp/fo1/file2.pdf', 'root/tmp/fo1/fo2/file3.pdf', 'root/tmp/fo1/fo2/fo3/file4.pdf']
f = fs.find('root/tmp/fo1/fo2')
print(f)
> ['root/tmp/fo1/fo2/fo3/file4.pdf', 'root/tmp/fo1/fo2/file3.pdf']
That is correct.
But if we call only find('root/tmp/fo1/fo2'), without the earlier call on 'root/tmp/fo1/':
f = fs.find('root/tmp/fo1/fo2')
print(f)
> []
This happens because find relies on the cache, and at the start the cache is populated only with the ids of directories one level below self.path.
So in the last example, the content of the cache is:
{
    '1IETDYYj23PgGaInZofa9MyANyBlOoiyh': 'tmp',
    '1k6u2-FStB6rOlq6hmDXlRl2aLES1l6vp': 'tmp/fo1',
}
I think that, because the cache has no entry for tmp/fo1/fo2 (the starting path of find), query_ids stays empty and the method returns an empty list.
The lines of code involved are:
Lines 469 to 483 in 27bbf4c
def find(self, path, detail=False, **kwargs):
    bucket, base = self.split_path(path)
    seen_paths = set()
    dir_ids = [self._ids_cache["ids"].copy()]
    contents = []
    while dir_ids:
        query_ids = {
            dir_id: dir_name
            for dir_id, dir_name in dir_ids.pop().items()
            if posixpath.commonpath([base, dir_name]) == base
            if dir_id not in seen_paths
        }
        if not query_ids:
            continue
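The filter in the snippet above can be replayed on its own to check the suspected cause. This is a minimal sketch rather than the library's code: select_query_ids is a hypothetical helper that copies only the dict comprehension, and it assumes that split_path turns 'root/tmp/fo1/fo2' into the base 'tmp/fo1/fo2', which matches the bucket-relative paths shown in the cache dump.

```python
import posixpath

# Cache contents reported in the issue (id -> path relative to the bucket).
ids_cache = {
    "1IETDYYj23PgGaInZofa9MyANyBlOoiyh": "tmp",
    "1k6u2-FStB6rOlq6hmDXlRl2aLES1l6vp": "tmp/fo1",
}


def select_query_ids(base, cached_dirs, seen=frozenset()):
    """Hypothetical helper: the same filter as in find(), isolated for testing.

    A cached directory is kept only if it is located at or below `base`.
    """
    return {
        dir_id: dir_name
        for dir_id, dir_name in cached_dirs.items()
        if posixpath.commonpath([base, dir_name]) == base
        if dir_id not in seen
    }


# Assumed bases: 'root/tmp/fo1' -> 'tmp/fo1', 'root/tmp/fo1/fo2' -> 'tmp/fo1/fo2'.
hit = select_query_ids("tmp/fo1", ids_cache)
miss = select_query_ids("tmp/fo1/fo2", ids_cache)

print(hit)   # {'1k6u2-FStB6rOlq6hmDXlRl2aLES1l6vp': 'tmp/fo1'}
print(miss)  # {}
```

With base 'tmp/fo1' the cached entry 'tmp/fo1' passes the filter, so the loop has a directory to query. With base 'tmp/fo1/fo2' both cached entries are ancestors of the base, commonpath returns the ancestor instead of the base, and nothing is selected, which is consistent with the empty result described above.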