"No such file or directory" error when running dvc pull #38

@maltempi

Hello everyone,

Problem

I'm currently teaching an MLOps crash course, and some students are running into an issue with dvc-gdrive: at a certain point they can no longer sync with Google Drive. While troubleshooting, I found that every directory reported as 'No such file or directory' has a duplicate directory in Google Drive. This is possible because Google Drive allows multiple directories with the same name.

 dvc pull
Collecting                                                                                                                 |3.06k [00:05,  557entry/s]
ERROR: failed to transfer '00c597b3f6e01f32dbeb5dc41ba24c96' - [Errno 2] No such file or directory: '1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/00/c597b3f6e01f32dbeb5dc41ba24c96'
ERROR: failed to transfer '34fa1599305a5796ea5158cd2a29045b' - [Errno 2] No such file or directory: '1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/34/fa1599305a5796ea5158cd2a29045b'
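To see which two-character prefix directories are affected, the error lines can be grouped by the first two characters of each hash. This is only a minimal sketch that assumes the log format shown above; the function name `affected_prefixes` is hypothetical and not part of DVC.

```python
import re
from collections import Counter

# Matches the "failed to transfer" lines exactly as they appear in the log above.
ERROR_RE = re.compile(
    r"failed to transfer '(?P<md5>[0-9a-f]{32})' - \[Errno 2\] "
    r"No such file or directory: '(?P<path>[^']+)'"
)


def affected_prefixes(log_text):
    """Count failed objects per two-character prefix directory (hypothetical helper)."""
    counts = Counter()
    for match in ERROR_RE.finditer(log_text):
        # The first two hex characters of the hash name the prefix directory.
        counts[match.group("md5")[:2]] += 1
    return counts


sample_log = (
    "ERROR: failed to transfer '00c597b3f6e01f32dbeb5dc41ba24c96' - [Errno 2] "
    "No such file or directory: "
    "'1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/00/c597b3f6e01f32dbeb5dc41ba24c96'\n"
    "ERROR: failed to transfer '34fa1599305a5796ea5158cd2a29045b' - [Errno 2] "
    "No such file or directory: "
    "'1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd/files/md5/34/fa1599305a5796ea5158cd2a29045b'\n"
)

for prefix, count in sorted(affected_prefixes(sample_log).items()):
    print(prefix, count)
```

The prefixes it reports (here `00` and `34`) are the directories to check for duplicates in the remote.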

Troubleshooting

Here is the Google Drive folder (public): https://drive.google.com/drive/folders/1BQ8SfIRmsdZi4KAXtcyFZOKs9_6BL4BU
And this is the ds.dvc file:

outs:
- md5: 99224f4a9ed7e9ac086c48831f0ef676.dir
  size: 68986649
  nfiles: 3060
  hash: md5
  path: ds
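For reference, the paths in the error log suggest that each object is stored under `files/md5/<first two hash characters>/<remaining characters>`. The sketch below reproduces that mapping. The helper name `remote_object_path` is hypothetical, and keeping the `.dir` suffix on the directory object is my assumption about the layout, not something confirmed by the log.

```python
def remote_object_path(md5, root=""):
    """Build the object path for a hash, following the layout seen in the error log.

    Hypothetical helper; the handling of the '.dir' suffix is an assumption.
    """
    suffix = ""
    if md5.endswith(".dir"):
        md5, suffix = md5[: -len(".dir")], ".dir"
    if len(md5) != 32:
        raise ValueError(f"expected a 32-character md5 hash, got {md5!r}")
    # First two characters name the prefix directory, the rest names the file.
    path = f"files/md5/{md5[:2]}/{md5[2:]}{suffix}"
    return f"{root}/{path}" if root else path


# Hash and remote root copied from the first error line of the log.
print(remote_object_path(
    "00c597b3f6e01f32dbeb5dc41ba24c96",
    root="1APGY4tjfFhNr9lK4UdgGc81FgcNvCqLd",
))
```

For the first failing hash this yields the same path that appears in the error message, which makes it easier to locate the expected file by hand in the Google Drive folder.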

Each file exists in only one of the duplicate folders. For example, 00c597b3f6e01f32dbeb5dc41ba24c96 (from the log above) exists only in the first one. In the Google Drive folder, you'll find the duplicates:

[Two screenshots showing the duplicate folders in Google Drive]
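The duplicate check I did by hand could be sketched roughly as follows. It assumes the folder listing has already been fetched and is available as a list of dicts with `id`, `title`, and `parent_id` keys. That shape, the function name, and the IDs in the example are all hypothetical; no Google Drive API is called here.

```python
from collections import defaultdict


def find_duplicate_folders(folders):
    """Group folders by (parent, name) and return the groups with more than one ID.

    `folders` is a list of dicts with 'id', 'title' and 'parent_id' keys
    (an assumed shape, not a real API response).
    """
    groups = defaultdict(list)
    for folder in folders:
        groups[(folder["parent_id"], folder["title"])].append(folder["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up listing: two folders named "00" share the same parent.
listing = [
    {"id": "folder-a", "title": "00", "parent_id": "md5-root"},
    {"id": "folder-b", "title": "00", "parent_id": "md5-root"},
    {"id": "folder-c", "title": "34", "parent_id": "md5-root"},
]

for (parent_id, title), ids in find_duplicate_folders(listing).items():
    print(f"{title!r} appears {len(ids)} times under {parent_id}: {ids}")
```

Grouping by parent as well as by name matters here, because the same two-character folder name is only a problem when it appears more than once under the same parent.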

DVC Doctor

 dvc doctor
DVC version: 3.30.1 (pip)
-------------------------
Platform: Python 3.10.13 on Windows-10-10.0.22621-SP0
Subprojects:
        dvc_data = 2.22.0
        dvc_objects = 1.2.0
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.5.0
Supports:
        gdrive (pydrive2 = 1.16.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.10.0, boto3 = 1.28.64)
Config:
        Global: C:\Users\maltempi\AppData\Local\iterative\dvc
        System: C:\ProgramData\iterative\dvc
Cache types: hardlink
Cache directory: NTFS on D:\
Caches: local
Remotes: gdrive
Workspace directory: NTFS on D:\
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\db21a55dd073a7d61d29785b6476808

Thank you very much in advance. Please let me know if you need any additional info.
