Require all packages to solve / compile and include all valid compilers in their metadata #669

thomashoneyman · 2023-11-13T21:55:49Z

Fixes #577. Fixes #255. Fixes #696. Fixes #649. Fixes #527.

The core problem solved here is identifying what compilers are compatible with a specific package version, such as [email protected]. We need this to support an oft-requested feature for Pursuit: filtering search results by a compiler version (or range). It's also useful for other things; for one, it allows us to add the compiler version as a constraint to the solver to produce a build plan that works with the specific compiler given.

Metadata files now include a compilers key in published metadata that lists either a bare version (the version used to publish the package) or an array of versions (the full set of compilers known to work with the package). The reason for two representations is that computing the full set of compilers can take a long time; this approach lets us upload the package right away and compute the rest of the valid compilers in a fixup pass. A bare version means the full set has not been computed yet.

All packages must now be solvable. We can't compile a package version if we can't fetch its dependencies, so this becomes a requirement for all packages.

There are only 2 scenarios in which we need to compute the available compilers for a package version:

A new package version is published
A new compiler version is published

This PR is focused on the first case, and we should do a followup for the second case. (The second case is straightforward, and @colinwahl's compiler versions script already essentially implements it. It's been omitted from this PR for brevity).

A new package version can be published via the legacy importer or via a user submitting an API call, but the result is the same: eventually the publish pipeline is run. For that reason I've decided to compute the compiler versions for a package version as part of the publish pipeline where we're already determining resolutions and building with a specific compiler. That centralizes the logic to a single place.

Therefore this PR centers on two things: trying compilers to find all that work for a package version at the end of publishing, and updating the legacy importer to determine a valid compiler version and resolutions before calling publish.

I've added some tests and I've run the legacy importer locally; it's about 500 packages in so far and every failure appears to be correct. More comments in the PR.

Update 2026-01-07: This now also contains #709, which restricts concurrency on jobs and adds a job executor loop.

thomashoneyman

I've included a few review comments that describe how various parts of the code work. But I'm also happy to jump on a call in the PureScript chat to walk through this and answer any questions.

thomashoneyman · 2023-11-13T21:57:18Z

app/src/App/API.purs

-publish :: forall r. PackageSource -> PublishData -> Run (PublishEffects + r) Unit
-publish source payload = do


We no longer need the PackageSource type because we no longer have exemptions for "legacy" vs. "current" packages. All packages must solve and compile. We had exemptions before because we weren't sure what compiler version to use to publish legacy packages but we now manually verify one that works before we ever run publish.

thomashoneyman · 2023-11-13T21:58:09Z

app/src/App/API.purs

+      -- supports syntax back to 0.14.0. We'll still try to validate the package
+      -- but it may fail to parse.
      Operation.Validation.validatePursModules files >>= case _ of
+        Left formattedError | payload.compiler < unsafeFromRight (Version.parse "0.15.0") -> do


As in the comment above, this code will fail packages that have syntax that our version of language-cst-parser doesn't support. In our case that's 0.15.0+. I've therefore relaxed this requirement for packages before 0.15.0 or else they would be spuriously rejected.

We could potentially sub-in a regex check for "module where", even though it's fragile, for pre-0.15.0 packages.

thomashoneyman · 2023-11-13T22:00:36Z

app/src/App/API.purs

      Right versions -> pure versions

    case Map.lookup manifest.version published of
+      Nothing | payload.compiler < unsafeFromRight (Version.parse "0.14.7") -> do


purs publish will fail for packages prior to 0.14.7 because that's when we added support for the purs.json file format. Before that the compiler looks for specific Bowerfile fields that aren't present in the purs.json file. Since all these packages ought to already be published to Pursuit I think this is fine. We can't do anything about it anyway until #525.

thomashoneyman · 2023-11-13T22:01:19Z

app/src/App/API.purs

-  case compilationResult of
-    Left error
-      -- We allow legacy packages to fail compilation because we do not
-      -- necessarily know what compiler to use with them.
-      | source == LegacyPackage -> do
-          Log.debug error
-          Log.warn "Failed to compile, but continuing because this package is a legacy package."
-      | otherwise ->
-          Except.throw error
-    Right _ ->
-      pure unit


This code is no longer needed because packages must compile.

app/src/App/API.purs

thomashoneyman · 2023-11-13T22:05:29Z

app/src/App/Effect/Cache.purs

          case inFs of
            Nothing -> pure $ reply Nothing
            Just entry -> do
-              Log.debug $ "Fell back to on-disk entry for " <> memory


These are just so noisy. Maybe we can introduce a Log.superDebug.

Is this log useful at all? I think it's ok to just remove it

Yea, I think they're not really useful now that we're confident the cache works correctly. I had them in there from when I first developed it and would either sometimes see things I thought should be cached not get cached, or I wanted to make sure something I removed from the cache really was.

thomashoneyman · 2023-11-13T22:07:12Z

scripts/src/LegacyImporter.purs

+    publishLegacyPackage :: Manifest -> Run _ Unit
+    publishLegacyPackage (Manifest manifest) = do


This is where we solve, compile, and then publish each package in turn. Publish failures are saved to cache and, at the end of the process, written to a publish-failures.json file that records every version that failed and its reason so we can hand-review it. I've run this on a few hundred packages and it's looking correct.

thomashoneyman · 2023-11-13T22:10:24Z

An example from the publish-failures.json file:

{
  "string-parsers": {
    "3.0.1": {
      "reason": "No versions found in the registry for lists in range\n  >=4.0.0 (declared dependency)\n  <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    },
    "3.1.0": {
      "reason": "No versions found in the registry for lists in range\n  >=4.0.0 (declared dependency)\n  <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    }
  },
}

thomashoneyman · 2023-11-14T14:36:31Z

I've uncovered two rare but significant issues affecting the legacy importer (unrelated to this PR).

Topological sorting
First: in some cases our typical approach of topologically sorting manifests by their dependencies (ignoring their explicit bounds) will fail to read a valid index from disk. This happens when a package like functors at one time depends on another (like contravariant), but at other times the dependency is flipped. Since we don't take specific version ranges into consideration, we're at the mercy of other conflicting sorting orders in the map as to whether these end up going in the right order or not. Here's the output of a failed run reading an index from disk:

success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
fail: 'Failed to insert [email protected]: \n  - contravariant>=5.0.0 <6.0.0'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'
success: 'Inserted [email protected]'

This run fails to produce a valid index because [email protected] is unsatisfied in its dependency on contravariant, but of course we see contravariant get inserted a little later on.

The solution is to always consider version bounds when reading an index from disk where we expect bounds to be at least reasonably correct. I've implemented and tested that and situations like this no longer happen.

The reason we ignored ranges at first is because we had far fewer checks around correct bounds and because in the package sets we want to explicitly ignore ranges when working with an index (when doing, for example, the 'self-contained' check). I've preserved that behavior — you can always opt-in to either considering or ignoring ranges when working with an index.

Incorrect dependencies detected in legacy manifests derived from package sets

Second, in some cases a package like [email protected] will have its dependency list pruned to only those listed in its package sets entry, and those turn out to be overly-restrictive. In this case, specifically, the dependency on purescript-integers listed in its Bowerfile is removed because there is no such dependency in its package sets list; in the package sets that dependency ended up being picked up transitively, but when we go to solve and compile the package the solution picks a different transitive dependency and the package fails.

This shouldn't happen because [email protected] has a Bowerfile that explicitly lists a dependency on integers, which we trimmed out by deferring to the package sets entry. We did this assuming package sets entries are correct and because we didn't want overly-constrained dependency lists.

The second concern is no longer valid because with #667 we will remove unused dependencies. The first concern is no longer reasonable because we have at least one example of package sets dependency lists being incorrect.

The solution is simple: instead of preferring package sets entries over other manifests, just union them all and defer to the 'unused dependencies' pruning in the publishing pipeline to trim out ones that aren't actually needed.

…most manifest index ops

thomashoneyman · 2023-11-14T17:28:57Z

app/src/App/Effect/PackageSets.purs

    -- collected manifests. If this fails then the package set is not
    -- self-contained.
-    Tuple unsatisfied _ = ManifestIndex.maximalIndex (Set.fromFoldable success)
+    Tuple unsatisfied _ = ManifestIndex.maximalIndex ManifestIndex.IgnoreRanges (Set.fromFoldable success)


We always ignore ranges in package sets, but we should rely on them otherwise, especially now that we're actually solving packages as part of publishing and can be more trusting that they aren't bogus.

thomashoneyman · 2023-11-14T17:32:24Z

scripts/src/LegacyImporter.purs

+  let metadataPackage = unsafeFromRight (PackageName.parse "metadata")
+  Registry.readMetadata metadataPackage >>= case _ of
+    Nothing -> do
+      Log.info "Writing empty metadata file for the 'metadata' package"
+      let location = GitHub { owner: "purescript", repo: "purescript-metadata", subdir: Nothing }
+      let entry = Metadata { location, owners: Nothing, published: Map.empty, unpublished: Map.empty }
+      Registry.writeMetadata metadataPackage entry
+    Just _ -> pure unit


We agreed to not reserve package names pre-0.13.0, so this reserves only metadata for the "metadata" package used by legacy package sets.

scripts/src/LegacyImporter.purs

thomashoneyman · 2023-11-14T21:14:12Z

Can confirm that the fix is working with regards to generating manifests with full dependency lists to be pruned later — this is [email protected], for example:

{
  "name": "strings",
  "version": "3.5.0",
  "license": "MIT",
  "description": "String and char utility functions, regular expressions.",
  "location": {
    "githubOwner": "purescript",
    "githubRepo": "purescript-strings"
  },
  "dependencies": {
    "arrays": ">=4.0.1 <5.0.0",
    "either": ">=3.0.0 <4.0.0",
    "gen": ">=1.1.0 <2.0.0",
    "integers": ">=3.2.0 <4.0.0",
    "maybe": ">=3.0.0 <4.0.0",
    "partial": ">=1.2.0 <2.0.0",
    "unfoldable": ">=3.0.0 <4.0.0"
  }
}

...and the manifest index sorting is working as far as I can tell as well.

thomashoneyman · 2023-11-15T00:14:20Z

We also need to support spago.yaml files in the legacy importer, as some packages now use that format. Otherwise they will be excluded with a NoManifests error.

…ible compilers from deps

thomashoneyman · 2023-11-16T23:09:46Z

Here's another fun one: some packages, like [email protected], list dependencies which are entirely unused, in this case arrays. We then prune this dependency. However, then the package fails because it brings in other dependencies via that transitive dependency, such as either.

That means we can't get away with simply removing unused dependencies because we may also remove direct imports that were pulled in transitively by the unused dependency. Either we give up on removing unused dependencies, or we both remove unused dependencies and insert direct imports that weren't mentioned.

For dependencies we insert into a manifest we have their exact version via the solver; we could potentially do a bumpHighest on them to produce a range the same way we do when working with spago.dhall files.

I think that's preferable to giving up on the unused dependencies check but I'm curious if you disagree @f-f or @colinwahl.

thomashoneyman · 2023-11-17T20:31:12Z

As of the latest commit: we no longer simply remove unused dependencies. Instead, we loop. We remove unused dependencies, then bring in any transitive dependencies they would have brought in, and then check the new dependency list for unused dependencies and so on.

The result is that we remove unused dependencies while preserving any transitive dependencies they brought it which are used in the source code. Note that we don't go through and add all packages your code directly imports, we just do this for dependencies that are being removed.

f-f · 2025-06-07T14:44:11Z

@thomashoneyman I would like to merge this so we can base further work on the main branch - do you recall any good reasions for not merging and keeping it separate until we are ready with the concurrent-jobs branch?

thomashoneyman · 2025-06-07T15:18:09Z

Unfortunately it’s been long enough I can’t quite remember. Some possible issues come to mind:

The original plan was to merge and re-upload all packages (using my local machine instead of the server, by running the legacy importer in bulk mode). This will cause breakage by dropping a number of packages that are outside the cutoff.
New packages that don’t satisfy the requirements in this change will start to be rejected
There will be more commits happening now as compiler jobs run, which makes it more likely we run afoul of the git concurrency issues that pushed us towards the registry server in the first place
There might be breaking changes in the registry library that affect Spago? I’m not totally sure. There’s the new compilers field.

Otherwise I think it’s fine, and these may all be minor or nonissues. Is there a disadvantage to working against this branch vs merging to main?

thomashoneyman · 2025-06-07T19:01:45Z

I should emphasize too that if this merges we will need to do the reupload right away, because in compilation the solver will consider package dependencies with the compiler version included; if no packages have any compiler versions then new uploads are going to fail

f-f · 2025-06-08T13:13:58Z

Right, thanks for the clarifications @thomashoneyman!

So the plan is:

I am running the legacy importer locally to build a cache
I will now have a look at the Spago codebase to see if we're going to need any updates to deal with this
once the dry-run for the importer is done (in 24h or so) I will have a cache, so I'll reupload the registry and merge this branch. I'll need to think of what I need to disable to prevent uploads while I do the regen.

thomashoneyman · 2025-06-08T13:50:32Z

You would need to disable the GitHub actions on the registry repository and sleep/kill the server while you reupload

thomashoneyman · 2026-01-05T16:41:28Z

I re-ran the legacy importer in preparation for a reupload, only to find roughly 10% of registry versions are no longer available at their original source! Some of the packages have a cascading effect and will cause even more than this number of versions to be dropped from the registry because they are unreachable. Mostly these are the entire set of aws- packages, but also others like bytestrings.

We still have the tarballs of the package sources, however, so in the latest commits I've added an 'archive seeder' script which will identify packages which are registered today but produce a 404 if you fetch the original sources, then download the tarballs to a Git repository under the purescript organization. Then, the legacy importer is updated such that if it finds a 404 package it checks this archive. The archive is here: https://github.com/purescript/registry-archive

These are the results of the archive seeder run:

Packages checked: 1853
Versions checked: 11408

Packages needing archive (GitHub 404): 201
Versions needing archive: 982

Tarballs written: 982
Tarballs skipped (already exist): 0
Tarballs missing (not in S3): 0

Transient errors (re-run to retry): 0

If it's a while before we can do the reupload then we can run this script again to pick up any new 404 errors.

* Update database schemas and add job executor loop * Split Server module into Env, Router, JobExecutor, and Main * Fix up build * Run job executor * Fix integration tests * WIP matrix builds * add missing version to publish fixtures the publishCodec requires a version file but the test fixtures weren't updated to include it * Add missing packageName and packageVersion to InsertMatrixJob The JS insertMatrixJobImpl expects columns [jobId, packageName, packageVersion, compilerVersion, payload] but the PureScript types were missing packageName and packageVersion * Fix finishedAt timestamp to capture time after job execution * Implement matrix jobs, and the recursive enqueuing of new ones * Reset incomplete jobs so they can be picked up again * Run matrix jobs for the whole registry when finding a new compiler version * resolve build issues * fix smoke test * Split package jobs into separate tables, return all data from the job endpoint * implement thin client for github issues replaces the old GitHubIssue which ran registry jobs directly with one that hits the registry api instead. also added integration tests that ensure various jobs can be kicked off as github issue events and we get the resulting comments, issue close events, etc. * clean up test failures * reinstate missing comments * Remove COMMENT effect, add NOTIFY log * Implement endpoint for returning jobs * Check for existing jobs before enqueueing new ones * Add E2E test: publishing a package enqueues matrix jobs * Add E2E test: run a whole-registry upgrade when detecting a new compiler * Don't fail job fetch on unreadable logs * Fix archive seeder build * remove effect-4.0.0 from storage in unit tests * avoid race condition in initial jobs test The "can list jobs" test was asserting that initial matrix jobs have success: true, but the job executor runs asynchronously and jobs may not have completed by the time the test queries the API. Fixed by normalizing the 'success' field to a constant before comparison. * format * second test * Refactor e2e tests with wiremock scenarios (#713) * refactor e2e tests with wiremock scenarios also adds a number of new e2e tests for various scenarios * format, etc. * move out fixtures * relax cache deletion * strengthen assertions, fix discovered bugs * drop ref, move to manifest (#714) * review feedback * more feedback * trim tests down a bit to optimize speed to ~60s * Add endpoint for package set jobs + e2e tests for it * tweak unpublish test to verify matrix jobs fail gracefully * tweak agents to refer to scratch logs * remove slow archive seeder test * fix tests by bumping compiler --------- Co-authored-by: Thomas Honeyman <[email protected]> Co-authored-by: Fyodor Soikin <[email protected]> Co-authored-by: pacchettibotti <[email protected]> Co-authored-by: Thomas Honeyman <[email protected]>

thomashoneyman added 8 commits November 11, 2023 13:18

Add 'compilers' field to metadata

0b6dd4e

Add utilities for building with many compilers

e15e4a8

Remove PackageSource and require all packages to solve/compile

d8e7e41

Determine all compilers for package in publish pipeline

8e069b6

Initial cut at discovering compiler in legacy import

5348ee2

Always look up metadata / manifests in each publishing step

630c0bf

Testing the pipeline...

77d6e68

Better reporting of failures

8749bea

thomashoneyman commented Nov 13, 2023

View reviewed changes

thomashoneyman requested review from colinwahl and f-f November 13, 2023 22:08

Update union of package set / spago / bower deps, consider ranges in …

be93d18

…most manifest index ops

thomashoneyman commented Nov 14, 2023

View reviewed changes

scripts/src/LegacyImporter.purs Outdated Show resolved Hide resolved

Include spago.yaml files in legacy import

5a15433

thomashoneyman added 3 commits November 15, 2023 17:16

Retain compilation in cache

559275c

Consider compilers when solving

09d515a

Rely on solver per-compiler instead of looking at metadata for compat…

98ef892

…ible compilers from deps

Adjust unused dependency pruning to replace used transitive deps

ae621da

thomashoneyman added 4 commits November 17, 2023 15:35

Remove unused functions

5c54103

wip

441b960

Use cache when finding first suitable compiler

3495edb

WIP: Include missing direct imports

7ceab4c

thomashoneyman added the blocked label Aug 31, 2024

Merge branch 'master' into trh/compilers-in-metadata

7d74da3

f-f mentioned this pull request Jan 29, 2025

[RFE] Making a new release purescript/spago#1306

Closed

f-f added 3 commits June 7, 2025 14:11

Update SPEC.md

a3f086b

Add 'purescript' to the list of reserved packages

de7c6e3

Move to NonEmpty in dhall types

8c8d728

f-f added 2 commits June 8, 2025 16:48

compilers is a NonEmptyArray

7b57771

Merge branch 'master' into trh/compilers-in-metadata

e3d484b

f-f and others added 3 commits June 8, 2025 15:12

Fix tests

343b90b

Fix tests

67c7cb5

Merge branch 'master' into trh/compilers-in-metadata

88dcffe

f-f mentioned this pull request Dec 3, 2025

Proposal: store docs.json in the registry #704

Open

thomashoneyman added 2 commits December 6, 2025 06:59

Merge branch 'master' into trh/compilers-in-metadata

dec066b

fix e2e tests

e863d6b

f-f mentioned this pull request Dec 11, 2025

Job queue, matrix builder, concurrency control #709

Merged

thomashoneyman added 4 commits December 17, 2025 08:01

Merge branch 'master' into trh/compilers-in-metadata

f27cd33

update flake to latest

beb8d93

add an agents file

a55bef6

add archive seeder script

5ab364a

thomashoneyman force-pushed the trh/compilers-in-metadata branch from 930ed20 to 5ab364a Compare January 5, 2026 16:35

thomashoneyman and others added 4 commits January 6, 2026 18:03

free 404 packages

5cc6627

add ARCHIVE effect to fix getting archived pkgs

902bea1

add missing env vars

8c8c792

		publish :: forall r. PackageSource -> PublishData -> Run (PublishEffects + r) Unit
		publish source payload = do

		publishLegacyPackage :: Manifest -> Run _ Unit
		publishLegacyPackage (Manifest manifest) = do

Require all packages to solve / compile and include all valid compilers in their metadata #669

Are you sure you want to change the base?

Require all packages to solve / compile and include all valid compilers in their metadata #669

Uh oh!

Conversation

thomashoneyman commented Nov 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomashoneyman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thomashoneyman commented Nov 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomashoneyman commented Nov 14, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

thomashoneyman commented Nov 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomashoneyman commented Nov 15, 2023

Uh oh!

thomashoneyman commented Nov 16, 2023

Uh oh!

thomashoneyman commented Nov 17, 2023

Uh oh!

f-f commented Jun 7, 2025

Uh oh!

thomashoneyman commented Jun 7, 2025

Uh oh!

thomashoneyman commented Jun 7, 2025

Uh oh!

f-f commented Jun 8, 2025

Uh oh!

thomashoneyman commented Jun 8, 2025

Uh oh!

thomashoneyman commented Jan 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

thomashoneyman commented Nov 13, 2023 •

edited

Loading

thomashoneyman commented Nov 13, 2023 •

edited

Loading

thomashoneyman commented Nov 14, 2023 •

edited

Loading