Hello everyone,
Problem
I'm currently teaching an MLOps crash course, and some students are hitting an issue with dvc-gdrive: at a certain point they can no longer sync with gdrive. While troubleshooting, I found that each directory reported as 'No such file or directory' has a duplicate directory in gdrive. This is possible because gdrive allows multiple directories with the same name.
dvc pull
Collecting |3.06k [00:05, 557entry/s]
ERROR: failed to transfer '00c597b3f6e01f32dbeb5dc41ba24c96' - [Errno 2] No such file or directory: '1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/00/c597b3f6e01f32dbeb5dc41ba24c96'
ERROR: failed to transfer '34fa1599305a5796ea5158cd2a29045b' - [Errno 2] No such file or directory: '1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/34/fa1599305a5796ea5158cd2a29045b'
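For reference, the remote paths in the errors above appear to follow directly from the hashes: the first two hex characters become a directory name and the remaining thirty become the file name, under files/md5. A minimal sketch of that mapping, inferred only from the two log lines above (the function name `remote_object_path` is hypothetical, not part of DVC's API):

```python
def remote_object_path(md5: str, prefix: str = "files/md5") -> str:
    """Map an MD5 hex digest to '<prefix>/<first 2 chars>/<remaining 30 chars>'.

    Hypothetical helper: the layout is inferred from the error log above,
    not taken from DVC's source code.
    """
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not a lowercase MD5 hex digest: {md5!r}")
    return f"{prefix}/{md5[:2]}/{md5[2:]}"


# The two hashes from the failed transfers above
for digest in (
    "00c597b3f6e01f32dbeb5dc41ba24c96",
    "34fa1599305a5796ea5158cd2a29045b",
):
    print(remote_object_path(digest))
```

So the failing objects live in the two-character directories 00 and 34, which are the directory names to check for duplicates in the remote.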
Troubleshooting
Here is the Google Drive folder (public): https://drive.google.com/drive/folders/1BQ8SfIRmsdZi4KAXtcyFZOKs9_6BL4BU
And this is the ds.dvc file:
outs:
- md5: 99224f4a9ed7e9ac086c48831f0ef676.dir
  size: 68986649
  nfiles: 3060
  hash: md5
  path: ds
The files exist in only one of the duplicated folders. For example, 00c597b3f6e01f32dbeb5dc41ba24c96 (as logged in the log attached to this GitHub issue) exists only in the first one. The duplicates are visible in the Google Drive folder linked above.
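To make the duplicate condition concrete, here is a minimal sketch of how same-named sibling folders could be detected from a folder listing. It assumes the listing has already been fetched and flattened into dictionaries with `id`, `name`, and `parent` keys; that shape, the function name `find_duplicate_folders`, and the sample IDs are all hypothetical and do not come from the actual Drive folder or from any library API.

```python
from collections import defaultdict


def find_duplicate_folders(entries):
    """Return {(parent, name): [ids]} for folder names that occur more than once
    under the same parent. `entries` is a hypothetical, pre-fetched listing."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry["parent"], entry["name"])].append(entry["id"])
    # Keep only the groups where one name maps to several folder IDs
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up listing for illustration: two folders named "00" under the same parent
listing = [
    {"id": "id-a", "name": "00", "parent": "md5"},
    {"id": "id-b", "name": "00", "parent": "md5"},
    {"id": "id-c", "name": "34", "parent": "md5"},
]

print(find_duplicate_folders(listing))
```

Grouping on the (parent, name) pair rather than on the name alone avoids flagging folders that share a name but sit in different places in the tree.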


DVC Doctor
dvc doctor
DVC version: 3.30.1 (pip)
-------------------------
Platform: Python 3.10.13 on Windows-10-10.0.22621-SP0
Subprojects:
    dvc_data = 2.22.0
    dvc_objects = 1.2.0
    dvc_render = 0.6.0
    dvc_task = 0.3.0
    scmrepo = 1.5.0
Supports:
    gdrive (pydrive2 = 1.16.0),
    http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
    s3 (s3fs = 2023.10.0, boto3 = 1.28.64)
Config:
    Global: C:\Users\maltempi\AppData\Local\iterative\dvc
    System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on D:\
Caches: local
Remotes: gdrive
Workspace directory: NTFS on D:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\db21a55dd073a7d61d29785b6476808
Thank you very much in advance. Please let me know if you need any additional info.