
Raspberry Pi 3 NFS speed problems #758

@rvagg


This seems to have come up since the outage last week (#749). The summary is that a power outage took out all of the infra, and the UPS that everything is on didn't last long at all, so it was a hard shutdown for everything. The UPS is more overloaded than it should be; I'm reorganising things over the coming weeks to shift the load to be more reasonable.

On startup, I took the opportunity to do a cleanup and update. This also involved cleaning out the Jenkins workspaces that all of the Pi's mount via NFS from a single host on the same network. Unfortunately this put a big load on the network when they all came back up, because they all ended up doing git clones at the same time. It would have been more efficient to just let them do this initial run on their own SD cards. I'm going to have to be more judicious with my "cleanup" in future because of this cost.
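One possible way to avoid every Pi cloning at the same moment after a workspace wipe is to give each machine a stable, per-host delay before its first clone. This is only a sketch, not something currently in use here: `stagger_delay`, the 600-second window, and the example hostnames are all assumptions made for illustration.

```python
import hashlib


def stagger_delay(hostname: str, window_s: float = 600.0) -> float:
    """Map a hostname to a stable delay in the range [0, window_s).

    The same hostname always gets the same delay, so hosts spread
    themselves across the window without any coordination.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # The first 4 bytes give an integer in [0, 2**32); dividing by 2**32
    # yields a fraction that is always strictly below 1.
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    return fraction * window_s


if __name__ == "__main__":
    # Hypothetical hostnames, purely for illustration.
    for name in ("pi3-example-1", "pi3-example-2", "pi2-example-1"):
        print(name, round(stagger_delay(name), 1))
```

Each machine could call something like `time.sleep(stagger_delay(socket.gethostname()))` before its first clone. Whether a fixed window like this is enough to keep the NFS host and network comfortable is untested.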

But now we should be roughly back to normal, with workspaces having active versions of the most frequently used repos/branches. The Raspberry Pi 2's seem to be getting through their jobs pretty quickly. The 1 B+'s also seem to be doing things at their expected speed: not super fast, but as fast as they historically have been. But the Pi 3's seem to get stalled on git checkout or git clone activities. It may be because we're still playing catch-up, but the amount of time it takes is unreasonable for the configuration, and catch-up doesn't explain why the 2's have been so fast to recover but the 3's haven't.

I'm going to be reorganising the network soon; I have some new gear coming that should speed up the ARM cluster as well. The problem could be the particular switch the 3's are on (the 1's and 2's are on separate switches too, fwiw), or maybe the router that ties them all together is dodgy (wouldn't surprise me, it was cheap and it's going to be replaced soon).

So, this issue can be used to track ongoing problems that anyone experiences (ping @nodejs/testing), and I'll also use it to document changes that I'm making on my end that may improve the situation. We'll close it when we think it's satisfactorily resolved.

FYI I have concerns about using NFS in general: it's such an ancient protocol and I've never seen it perform well in any situation. I just don't know of a better option here. I could take them off NFS entirely and let them use their SD cards. NFS hasn't saved us a ton of time, but it does also give us disk space that we don't get on the SD cards, particularly on the 1's which only have 8G. I also have persistent problems with mounting NFS on startup; I generally have to do it manually after the machines come back up. Perhaps it's time to try sshfs or cifs or some other option. I'm open to suggestions here!
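For the mount-on-startup problem, a small retry helper run at boot might save the manual step. This is a rough sketch under assumptions: `ensure_mounted`, the retry counts, and the `/path/to/nfs/workspace` placeholder are hypothetical, and it assumes the mount point already has an `/etc/fstab` entry so that a plain `mount <dir>` works (and that the script runs with enough privileges to mount).

```python
import os
import subprocess
import time
from typing import Callable, Optional


def _mount_from_fstab(mount_point: str) -> bool:
    """Run `mount <dir>`, which relies on an fstab entry for that dir."""
    result = subprocess.run(["mount", mount_point], check=False)
    return result.returncode == 0


def ensure_mounted(
    mount_point: str,
    attempts: int = 5,
    delay_s: float = 10.0,
    is_mounted: Callable[[str], bool] = os.path.ismount,
    try_mount: Optional[Callable[[str], bool]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to mount `mount_point` up to `attempts` times.

    Returns True as soon as the path is a mount point, False if it is
    still not mounted after the final attempt.
    """
    if try_mount is None:
        try_mount = _mount_from_fstab
    for attempt in range(1, attempts + 1):
        if is_mounted(mount_point):
            return True
        try_mount(mount_point)
        if attempt < attempts:
            # The network or NFS server may not be up yet; wait and retry.
            sleep(delay_s)
    return is_mounted(mount_point)


if __name__ == "__main__":
    # Placeholder path, not the real workspace location.
    ok = ensure_mounted("/path/to/nfs/workspace")
    print("mounted" if ok else "still not mounted")
```

The check and mount steps are passed in as callables so the retry logic can be exercised without a real NFS server. This only papers over the startup ordering issue; it doesn't address NFS performance.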
