39 changes: 39 additions & 0 deletions ideas/2026.md
@@ -300,3 +300,42 @@ Let's make more use of the Ryan(Tm) update bot!
- add updateScripts
- add tests to existing scripts that are failing
- create a metric to analyze existing script pass/fail/update-time

## Generative Nix: Surveying LLM Proficiency In NixOS

Effort: small (90 hours)
Member

I would not trust any survey that took less than 100 hours to conduct.

For reference, see NixOS/nixpkgs#410741 (comment) for a possible "survey", although this one cannot be conducted externally.

Author

Survey in the sense of "see what's out there", as one might survey a landscape to make a map--not survey as in "let's poll a bunch of people". Sorry for any miscommunication.

Member

> Survey in the sense of "see what's out there", as one might survey a landscape to make a map--not survey as in "let's poll a bunch of people".

IIUC, you want to benchmark and rank LLMs to determine the currently best one for Nix. With LLMs constantly being obsoleted by better ones, would it not be better to establish a benchmark suite for continuously updating the ranking instead of providing a one-time ranking?

Delegating this work to the Nix community sounds like a lot of effort, when IMHO LLMs should be the ones promoting and declaring their domain proficiencies.

Either way, take my input with a grain of salt because I am not really interested in using LLMs.

Author

The deliverables for this project include exactly that, a reusable selection of benchmarks for that purpose.


LLMs have gone from toys to industry-standard tooling, but their ability to write or debug Nix code hasn't been studied systematically. Some folks have good luck with them and some don't, so it'd be helpful to know where Nix support in current LLMs actually stands.

For this project, the mentee would:
- Select a handful of commercial and open-source state-of-the-art LLMs
- Select a handful of representative tasks using Nix, Nixpkgs, and NixOS
- Select performance criteria
- Conduct experiments to benchmark the performance of each LLM on each task
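The experimental loop described above (each selected LLM run on each task and scored against the chosen criteria) could be organized roughly as follows. This is a minimal, hypothetical Python sketch: `Task`, `run_matrix`, `pass_rates`, the example tasks, and the stub "models" are all illustrative names rather than part of any existing tool, and the naive substring checks merely stand in for real criteria such as evaluating or building the Nix code a model produces.

```python
"""Hypothetical benchmark-matrix sketch: every selected LLM against every Nix task.

All names (Task, run_matrix, pass_rates, the example tasks and stub models)
are illustrative assumptions, not an existing tool or API.
"""
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a "model" is any function mapping a prompt to a text response.
# A real harness would wrap a commercial API or a locally hosted model here.
Model = Callable[[str], str]


@dataclass
class Task:
    name: str
    prompt: str
    # Pass/fail criterion. Real criteria might evaluate or build the Nix the
    # model produced; a plain predicate keeps this sketch runnable without Nix.
    check: Callable[[str], bool]


def run_matrix(models: Dict[str, Model], tasks: List[Task]) -> Dict[str, Dict[str, bool]]:
    """Run each model on each task and record whether the response passed."""
    results: Dict[str, Dict[str, bool]] = {}
    for model_name, model in models.items():
        results[model_name] = {}
        for task in tasks:
            try:
                passed = bool(task.check(model(task.prompt)))
            except Exception:
                # A model or checker error is scored as a failure, not a crash.
                passed = False
            results[model_name][task.name] = passed
    return results


def pass_rates(results: Dict[str, Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of tasks each model passed (models with no tasks are skipped)."""
    return {name: sum(row.values()) / len(row) for name, row in results.items() if row}


# Hypothetical example tasks; the substring checks are deliberately naive.
EXAMPLE_TASKS = [
    Task("attrset", "Write a Nix attribute set with key foo set to 1.",
         check=lambda r: "foo = 1;" in r),
    Task("openssh", "Write the NixOS option line that enables the OpenSSH daemon.",
         check=lambda r: "services.openssh.enable = true;" in r),
]

# Stub "models" standing in for real LLMs.
EXAMPLE_MODELS: Dict[str, Model] = {
    "stub-good": lambda p: "{ foo = 1; }" if "attribute set" in p
    else "services.openssh.enable = true;",
    "stub-bad": lambda p: "I am not sure.",
}

if __name__ == "__main__":
    matrix = run_matrix(EXAMPLE_MODELS, EXAMPLE_TASKS)
    print(matrix)
    print(pass_rates(matrix))
```

Swapping a stub for a real model would mean wrapping whichever API or local runtime is under test in a function that takes a prompt and returns text. Keeping that interface this small is what would let the same task list be re-run as new models appear, rather than producing a one-time ranking.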

Deliverables:
- Selection of benchmark tasks and criteria for LLM usage on Nix/NixOS/Nixpkgs
- Blog post describing the results of commercial and open-source state-of-the-art LLMs on those benchmarks

Skills required:
- Basic experience with LLMs
- Basic scripting knowledge
- Access to commercial LLMs and open-source LLMs
- Ability to design and conduct experiments
- Ability to write and communicate clearly

Skills suggested:
- Basic knowledge of Nix
- Basic knowledge of NixOS systems administration
- Basic knowledge of Nixpkgs

Possible Mentors:
- [@crertel](https://github.com/crertel)

Difficulty: medium

References:
- https://haskellforall.com/2026/02/my-experience-with-vibe-coding

Prior efforts:
- None, really.