
Require all packages to solve / compile and include all valid compilers in their metadata #669

Open · wants to merge 44 commits into base: master

Conversation

thomashoneyman
Member

Fixes #577. Fixes #255.

The core problem solved here is identifying what compilers are compatible with a specific package version, such as aff@7.0.0. We need this to support an oft-requested feature for Pursuit: filtering search results by a compiler version (or range). It's also useful for other things; for one, it allows us to add the compiler version as a constraint to the solver to produce a build plan that works with the specific compiler given.

Metadata files now include a compilers key in published metadata that lists either a bare version (the version used to publish the package) or an array of versions (the full set of compilers known to work with the package). The reason for two representations is that computing the full set of compilers can take a long time; this approach lets us upload the package right away and compute the rest of the valid compilers in a fixup pass. A bare version means the full set has not been computed yet.
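For illustration, the two states might look like this in a published-version metadata entry (hypothetical versions; other fields omitted):

{
  "compilers": "0.15.12"
}

{
  "compilers": ["0.15.10", "0.15.11", "0.15.12"]
}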

All packages must now be solvable. We can't compile a package version if we can't fetch its dependencies, so this becomes a requirement for all packages.

There are only 2 scenarios in which we need to compute the available compilers for a package version:

  1. A new package version is published
  2. A new compiler version is published

This PR is focused on the first case, and we should do a followup for the second case. (The second case is straightforward, and @colinwahl's compiler versions script already essentially implements it. It's been omitted from this PR for brevity).

A new package version can be published via the legacy importer or via a user submitting an API call, but the result is the same: eventually the publish pipeline is run. For that reason I've decided to compute the compiler versions for a package version as part of the publish pipeline where we're already determining resolutions and building with a specific compiler. That centralizes the logic to a single place.

Therefore this PR centers on two things: trying compilers to find all that work for a package version at the end of publishing, and updating the legacy importer to determine a valid compiler version and resolutions before calling publish.

I've added some tests and I've run the legacy importer locally; it's about 500 packages in so far and every failure appears to be correct. More comments in the PR.

@thomashoneyman (Member Author) left a comment

I've included a few review comments that describe how various parts of the code work. But I'm also happy to jump on a call in the PureScript chat to walk through this and answer any questions.

publish :: forall r. PackageSource -> PublishData -> Run (PublishEffects + r) Unit
publish source payload = do
Member Author

We no longer need the PackageSource type because we no longer have exemptions for "legacy" vs. "current" packages. All packages must solve and compile. We had exemptions before because we weren't sure what compiler version to use to publish legacy packages but we now manually verify one that works before we ever run publish.

Operation.Validation.validatePursModules files >>= case _ of
Left formattedError | payload.compiler < unsafeFromRight (Version.parse "0.15.0") -> do
Member Author

As in the comment above, this code will fail packages whose syntax our version of language-cst-parser doesn't support; in our case the parser only handles 0.15.0+ syntax. I've therefore relaxed this requirement for packages using compilers before 0.15.0, or else they would be spuriously rejected.

We could potentially sub in a regex check for "module where" for pre-0.15.0 packages, even though it's fragile.
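A rough sketch of what that fallback could look like (not part of this PR; looksLikeModule is a hypothetical helper and the pattern is intentionally loose):

module LegacyModuleCheck (looksLikeModule) where

import Data.String.Regex (test)
import Data.String.Regex.Flags (multiline)
import Data.String.Regex.Unsafe (unsafeRegex)

-- Accept any file containing something shaped like `module X ... where`.
-- Comments or string literals can fool this, which is why it's only a
-- fallback idea for pre-0.15.0 packages the CST parser can't handle.
looksLikeModule :: String -> Boolean
looksLikeModule contents = test moduleHeader contents
  where
  moduleHeader = unsafeRegex """^module\s+[A-Z][A-Za-z0-9.']*[\s\S]*?\bwhere""" multiline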

@@ -504,20 +515,30 @@ publish source payload = do
Right versions -> pure versions

case Map.lookup manifest.version published of
Nothing | payload.compiler < unsafeFromRight (Version.parse "0.14.7") -> do
Member Author

purs publish will fail for packages prior to 0.14.7 because that's when we added support for the purs.json file format. Before that the compiler looks for specific Bowerfile fields that aren't present in the purs.json file. Since all these packages ought to already be published to Pursuit I think this is fine. We can't do anything about it anyway until #525.

Comment on lines -744 to -763
case compilationResult of
Left error
-- We allow legacy packages to fail compilation because we do not
-- necessarily know what compiler to use with them.
| source == LegacyPackage -> do
Log.debug error
Log.warn "Failed to compile, but continuing because this package is a legacy package."
| otherwise ->
Except.throw error
Right _ ->
pure unit
Member Author

This code is no longer needed because packages must compile.

@@ -168,7 +168,6 @@ handleMemoryFs env = case _ of
case inFs of
Nothing -> pure $ reply Nothing
Just entry -> do
Log.debug $ "Fell back to on-disk entry for " <> memory
Member Author

These are just so noisy. Maybe we can introduce a Log.superDebug.

Member

Is this log useful at all? I think it's ok to just remove it

Member Author

Yea, I think they're not really useful now that we're confident the cache works correctly. I added them when I first developed it, when I'd sometimes see things I thought should be cached not get cached, or wanted to make sure something I removed from the cache really was removed.

Comment on lines 237 to 238
publishLegacyPackage :: Manifest -> Run _ Unit
publishLegacyPackage (Manifest manifest) = do
Member Author

This is where we solve, compile, and then publish each package in turn. Publish failures are saved to cache and, at the end of the process, written to a publish-failures.json file that records every version that failed and its reason so we can hand-review it. I've run this on a few hundred packages and it's looking correct.

@thomashoneyman (Member Author) commented Nov 13, 2023

An example from the publish-failures.json file:

{
  "string-parsers": {
    "3.0.1": {
      "reason": "No versions found in the registry for lists in range\n  >=4.0.0 (declared dependency)\n  <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    },
    "3.1.0": {
      "reason": "No versions found in the registry for lists in range\n  >=4.0.0 (declared dependency)\n  <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    }
  },
}

@thomashoneyman (Member Author)

I've uncovered two rare but significant issues affecting the legacy importer (unrelated to this PR).

Topological sorting
First: in some cases our typical approach of topologically sorting manifests by their dependencies (ignoring their explicit bounds) will fail to read a valid index from disk. This happens when a package like functors at one time depends on another (like contravariant), but at other times the dependency is flipped. Since we don't take specific version ranges into consideration, whether these entries end up in the right order is at the mercy of how the other entries in the map happen to sort. Here's the output of a failed run reading an index from disk:

success: 'Inserted distributive@5.0.0'
success: 'Inserted exists@5.1.0'
success: 'Inserted exists@5.0.0'
success: 'Inserted profunctor@5.0.0'
fail: 'Failed to insert functors@4.1.1: \n  - contravariant>=5.0.0 <6.0.0'
success: 'Inserted functors@3.1.1'
success: 'Inserted functors@3.1.0'
success: 'Inserted functors@3.0.1'
success: 'Inserted functors@3.0.0'
success: 'Inserted foldable-traversable@5.0.0'
success: 'Inserted either@4.1.0'
success: 'Inserted foldable-traversable@5.0.1'
success: 'Inserted either@4.1.1'
success: 'Inserted contravariant@4.0.1'
success: 'Inserted const@4.0.0'
success: 'Inserted contravariant@5.0.0'

This run fails to produce a valid index because functors@4.1.1 is unsatisfied in its dependency on contravariant, but of course we see contravariant get inserted a little later on.

The solution is to always consider version bounds when reading an index from disk where we expect bounds to be at least reasonably correct. I've implemented and tested that and situations like this no longer happen.

The reason we ignored ranges at first is that we had far fewer checks around correct bounds, and because in the package sets we want to explicitly ignore ranges when working with an index (when doing, for example, the 'self-contained' check). I've preserved that behavior — you can always opt in to either considering or ignoring ranges when working with an index.

Incorrect dependencies detected in legacy manifests derived from package sets

Second, in some cases a package like strings@3.5.0 will have its dependency list pruned to only those listed in its package sets entry, and those turn out to be overly-restrictive. In this case, specifically, the dependency on purescript-integers listed in its Bowerfile is removed because there is no such dependency in its package sets list; in the package sets that dependency ended up being picked up transitively, but when we go to solve and compile the package the solution picks a different transitive dependency and the package fails.

This shouldn't happen because strings@3.5.0 has a Bowerfile that explicitly lists a dependency on integers, which we trimmed out by deferring to the package sets entry. We did this assuming package sets entries are correct and because we didn't want overly-constrained dependency lists.

The second concern is no longer valid, because with #667 we will remove unused dependencies anyway. The first assumption is no longer reasonable, because we now have at least one example of a package sets dependency list being incorrect.

The solution is simple: instead of preferring package sets entries over other manifests, just union them all and defer to the 'unused dependencies' pruning in the publishing pipeline to trim out ones that aren't actually needed.
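A minimal sketch of that idea (illustrative; unionDependencies is a hypothetical helper, and keeping the first map's range on conflict is an assumption, not necessarily what the importer does):

module UnionDeps where

import Prelude

import Data.Map (Map)
import Data.Map as Map

-- Union the dependency maps from the different manifest sources (e.g. the
-- Bowerfile and the package sets entry). When both list a package we keep the
-- first range here, and rely on the publish-time 'unused dependencies'
-- pruning to drop anything that isn't actually imported.
unionDependencies :: forall name range. Ord name => Map name range -> Map name range -> Map name range
unionDependencies primary secondary = Map.unionWith (\primaryRange _ -> primaryRange) primary secondary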

@@ -428,7 +428,7 @@ validatePackageSet (PackageSet set) = do
-- We can now attempt to produce a self-contained manifest index from the
-- collected manifests. If this fails then the package set is not
-- self-contained.
Tuple unsatisfied _ = ManifestIndex.maximalIndex (Set.fromFoldable success)
Tuple unsatisfied _ = ManifestIndex.maximalIndex ManifestIndex.IgnoreRanges (Set.fromFoldable success)
Member Author

We always ignore ranges in package sets, but we should rely on them otherwise, especially now that we're actually solving packages as part of publishing and can be more trusting that they aren't bogus.

Comment on lines 222 to 229
let metadataPackage = unsafeFromRight (PackageName.parse "metadata")
Registry.readMetadata metadataPackage >>= case _ of
Nothing -> do
Log.info "Writing empty metadata file for the 'metadata' package"
let location = GitHub { owner: "purescript", repo: "purescript-metadata", subdir: Nothing }
let entry = Metadata { location, owners: Nothing, published: Map.empty, unpublished: Map.empty }
Registry.writeMetadata metadataPackage entry
Just _ -> pure unit
Member Author

We agreed not to reserve package names pre-0.13.0, so this only reserves the "metadata" package name (used by legacy package sets) by writing an empty metadata file for it.

@thomashoneyman (Member Author) commented Nov 14, 2023

I can confirm the fix works with regard to generating manifests with full dependency lists to be pruned later — here's strings@3.5.0, for example:

{
  "name": "strings",
  "version": "3.5.0",
  "license": "MIT",
  "description": "String and char utility functions, regular expressions.",
  "location": {
    "githubOwner": "purescript",
    "githubRepo": "purescript-strings"
  },
  "dependencies": {
    "arrays": ">=4.0.1 <5.0.0",
    "either": ">=3.0.0 <4.0.0",
    "gen": ">=1.1.0 <2.0.0",
    "integers": ">=3.2.0 <4.0.0",
    "maybe": ">=3.0.0 <4.0.0",
    "partial": ">=1.2.0 <2.0.0",
    "unfoldable": ">=3.0.0 <4.0.0"
  }
}

...and the manifest index sorting is working as far as I can tell as well.

@thomashoneyman (Member Author)

We also need to support spago.yaml files in the legacy importer, as some packages now use that format. Otherwise they will be excluded with a NoManifests error.

@thomashoneyman (Member Author)

Here's another fun one: some packages, like transformers@3.6.0, list dependencies which are entirely unused, in this case arrays. We then prune this dependency. However, the package then fails to compile, because it directly imports modules from packages (such as either) that were only being brought in transitively via that unused dependency.

That means we can't get away with simply removing unused dependencies, because we may also remove direct imports that were only reachable transitively through the unused dependency. Either we give up on removing unused dependencies, or we both remove unused dependencies and insert the directly-imported packages that weren't listed.

For dependencies we insert into a manifest, we have their exact versions via the solver; we could potentially do a bumpHighest on them to produce a range, the same way we do when working with spago.dhall files.

I think that's preferable to giving up on the unused dependencies check but I'm curious if you disagree @f-f or @colinwahl.
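For illustration, the kind of widening meant here, sketched over a bare version record rather than the registry's Version/Range types (exactToRange is a hypothetical helper, and widening exactly to the next major is an assumption about what bumpHighest does):

module ExactToRange where

import Prelude

-- Hypothetical stand-in for the registry's Version type.
type SimpleVersion = { major :: Int, minor :: Int, patch :: Int }

-- Turn an exact solver-chosen version like 3.2.0 into a range such as
-- ">=3.2.0 <4.0.0", similar in spirit to the ranges produced when importing
-- spago.dhall manifests.
exactToRange :: SimpleVersion -> String
exactToRange { major, minor, patch } =
  ">=" <> show major <> "." <> show minor <> "." <> show patch
    <> " <" <> show (major + 1) <> ".0.0"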

@thomashoneyman (Member Author)

As of the latest commit: we no longer simply remove unused dependencies. Instead, we loop. We remove unused dependencies, then bring in any transitive dependencies they would have brought in, and then check the new dependency list for unused dependencies and so on.

The result is that we remove unused dependencies while preserving any transitive dependencies they brought in which are used in the source code. Note that we don't go through and add all packages your code directly imports; we only do this for dependencies that are being removed.
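A small sketch of that loop (illustrative only: plain string sets stand in for the registry's types, pruneUnused is a hypothetical name, and an acyclic dependency graph is assumed):

module PruneUnused where

import Prelude

import Data.Foldable (foldl)
import Data.Set (Set)
import Data.Set as Set

-- `imported` is the set of packages the source code actually imports;
-- `depsOf` looks up a package's own declared dependencies. We repeatedly drop
-- dependencies that aren't imported and pull in whatever they would have
-- provided, until the dependency list stops changing.
pruneUnused :: Set String -> (String -> Set String) -> Set String -> Set String
pruneUnused imported depsOf current =
  let
    unused = Set.filter (\dep -> not (Set.member dep imported)) current
    broughtIn = foldl (\acc dep -> Set.union acc (depsOf dep)) Set.empty unused
    next = Set.union (Set.difference current unused) broughtIn
  in
    if next == current then current else pruneUnused imported depsOf next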

@thomashoneyman (Member Author)

Encountered another issue: sometimes we remove an unused dependency (as per #667) from a package (for example, we remove record as a dependency from variant@8.0.0) but this causes a failure in another package because that one was bringing in the dependency transitively (for example, codec-argonaut@9.2.0 uses record and was bringing it in transitively via variant).

In short, we can't just remove unused dependencies and insert any transitive dependencies that were being brought in by the unused package as described in #669. Some other package downstream may have been relying on the transitive dependencies.

We either have to:

  1. Give up on pruning unused dependencies and have a policy that we accept transitive dependencies (and we should be clear that removing a dependency = a breaking change, even if the dependency was unused, because other packages may be relying on it)
  2. Fix manifests not just by removing unused dependencies and inserting anything they brought in transitively, but by inserting all directly-imported packages as dependencies in the manifest.

For (2) I have not yet come up with a robust method. However, @natefaubion suggested in the PureScript chat that we could:

  • Take the transitive dependencies of the untrimmed dependencies
  • Take the transitive dependencies of the trimmed dependencies
  • See what the diff is and add the discovered transitive dependency

@thomashoneyman (Member Author) commented Dec 7, 2023

In this latest iteration I add support for inserting transitive dependencies as well as removing unused ones. This relies on keeping the untrimmed legacy manifests in a TransitivizedRegistry so we can use it for solving when discovering missing package ranges. Here's how we solve:

  1. First we solve the untrimmed manifest using solveSteps on the legacy registry. This produces ranges for the manifest dependencies (and their dependencies) up to the point where the solver would have to make a commitment, and then it halts.
  2. We look up the missing packages in the solveSteps result and take their ranges.
  3. Then, we solve the untrimmed manifest again with solveFull. This produces exact versions for every dependency and transitive dependency in the manifest.
  4. We iterate through every resolution's untrimmed manifest; if we see any of the remaining missing packages in its dependencies then we update the missing package's range.

The result is that most ranges are produced by the solver via solveSteps, which is more likely to be correct since it considers all the ranges that could be possible in a solution. However, some ranges will still be missing so we take the remaining ranges from the legacy resolutions' manifests.
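To make that last step concrete, here's a sketch with illustrative types (rangesFromResolutions is a hypothetical helper; the real code works over the registry's solver output and may combine ranges rather than take the first match):

module RangesFromResolutions where

import Prelude

import Data.Array as Array
import Data.Map (Map)
import Data.Map as Map
import Data.Maybe (Maybe(..))
import Data.Tuple (Tuple(..))

-- `missing` holds the direct imports that still have no range after
-- solveSteps; `resolutionManifests` holds each resolution's dependency map.
-- For every missing package we take the first range any manifest declares.
rangesFromResolutions :: Array String -> Array (Map String String) -> Map String String
rangesFromResolutions missing resolutionManifests =
  Map.fromFoldable do
    package <- missing
    case Array.findMap (Map.lookup package) resolutionManifests of
      Nothing -> []
      Just range -> [ Tuple package range ]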

The results seem good so far. For example, enums@3.1.0 uses both sources of ranges:

[2023-12-07T12:32:29.552Z INFO] Found fixable dependency errors: Missing dependencies (control, gen, maybe, newtype, nonempty, partial, prelude, tuples)

Let's fix these. We produce legacy resolutions for the package:

[2023-12-07T12:32:29.674Z DEBUG] Got legacy resolutions:
{
  "arrays": "4.4.0",
  "bifunctors": "3.0.0",
  "control": "3.3.1",
  "distributive": "3.0.0",
  "eff": "3.2.3",
  "either": "3.2.0",
  "foldable-traversable": "3.7.1",
  "gen": "1.3.1",
  "globals": "3.0.0",
  "identity": "3.1.0",
  "integers": "3.2.0",
  "invariant": "3.0.0",
  "math": "2.1.1",
  "maybe": "3.1.0",
  "monoid": "3.3.1",
  "newtype": "2.0.0",
  "nonempty": "4.3.0",
  "partial": "1.2.1",
  "prelude": "3.3.0",
  "st": "3.0.0",
  "strings": "3.5.0",
  "tailrec": "3.3.0",
  "tuples": "4.1.0",
  "type-equality": "2.1.0",
  "unfoldable": "3.2.0",
  "unsafe-coerce": "3.0.0"
}

Then we take a look at the transitive solution (i.e. using the transitivized registry):

[2023-12-07T12:32:29.772Z DEBUG] Got transitive solution:
{
  "bifunctors": ">=3.0.0 <4.0.0",
  "control": ">=3.0.0 <4.0.0",
  "either": ">=3.0.0 <4.0.0",
  "foldable-traversable": ">=3.0.0 <4.0.0",
  "invariant": ">=3.0.0 <4.0.0",
  "maybe": ">=3.0.0 <4.0.0",
  "monoid": ">=3.0.0 <4.0.0",
  "newtype": ">=2.0.0 <3.0.0",
  "partial": ">=1.2.0 <2.0.0",
  "prelude": ">=3.0.0 <4.0.0",
  "strings": ">=3.0.0 <4.0.0",
  "tuples": ">=4.0.0 <5.0.0",
  "unfoldable": ">=3.0.0 <4.0.0"
}

We're able to get most package ranges from here, but some are missing from this solution (gen and nonempty). So we continue, this time taking ranges by looking at every package in the legacy resolutions to see whether gen and nonempty appear in its dependencies, and produce the following result:

[2023-12-07T12:32:29.804Z INFO] [NOTIFY] Your package is using a legacy manifest format, so we have adjusted your dependencies to remove unused ones and add direct-imported ones. Your dependency list was:
{
  "either": ">=3.0.0 <4.0.0",
  "strings": ">=3.0.0 <4.0.0",
  "unfoldable": ">=3.0.0 <4.0.0"
}

We have added the following packages: control, gen, maybe, newtype, nonempty, partial, prelude, tuples

Your new dependency list is:
{
  "control": ">=3.0.0 <4.0.0",
  "either": ">=3.0.0 <4.0.0",
  "gen": ">=1.1.0 <2.0.0",
  "maybe": ">=3.0.0 <4.0.0",
  "newtype": ">=2.0.0 <3.0.0",
  "nonempty": ">=4.2.0 <5.0.0",
  "partial": ">=1.2.0 <2.0.0",
  "prelude": ">=3.0.0 <4.0.0",
  "strings": ">=3.0.0 <4.0.0",
  "tuples": ">=4.0.0 <5.0.0",
  "unfoldable": ">=3.0.0 <4.0.0"
}

So we get "gen": ">=1.1.0 <2.0.0" and "nonempty": ">=4.2.0 <5.0.0", which is probably a little restrictive (maybe if we'd had other resolutions we would have had gen: >=1.0.0 <2.0.0) but I think it's pretty harmless.

Comment on lines +94 to +99
-- | Verifies that the manifest lists dependencies imported in the source code,
-- | no more (ie. unused) and no less (ie. transitive). The graph passed to this
-- | function should be the output of 'purs graph' executed on the 'output'
-- | directory of the package compiled with its dependencies.
noTransitiveOrMissingDeps :: Manifest -> PursGraph -> (FilePath -> Either String PackageName) -> Either (Either (NonEmptyArray AssociatedError) ValidateDepsError) Unit
noTransitiveOrMissingDeps (Manifest manifest) graph parser = do
Member Author

This is now a check in the validation portion of the library (cc: @f-f) if package managers want to use it.

Member

Ah that's cool - once we merge this I'll try to fit it in Spago, hopefully we can reuse some of the graph code

@thomashoneyman (Member Author)

Results of the run now that we fix dependencies in general:

--------------------
PUBLISH FAILURES
--------------------

999 out of 1443 packages had at least 1 version fail (516 packages had all versions fail).
6471 out of 10695 versions failed.

  - Publishing failed: 22
  - Solving failed (compiler): 104
  - No compilers usable for publishing: 628
  - Solving failed (dependencies): 5717

@thomashoneyman (Member Author)

Some packages (notably codec-argonaut and codec-json, but others as well) were failing before we ever discovered a compiler to attempt publishing with because we attempted to solve and compile them with the current index. To fix them we have to solve with both the legacy and current indices and try both; we bail out early if either solution contains packages not in the current index.

New results:

--------------------
PUBLISH FAILURES
--------------------

983 out of 1443 packages had at least 1 version fail (507 packages, or 35%, had all versions fail).
6359 out of 10695 versions (59%) failed.

  - Dependency compiler conflict: 1
  - Publishing failed: 92
  - No compilers usable for publishing: 567
  - Solving failed (dependencies): 5699

As usual here are the relevant files:

@thomashoneyman (Member Author)

We have decided to manually fix the manifests for deku, bolson, and rito so that these widely-used packages are not dropped. That's available in the latest commit.

@thomashoneyman (Member Author) commented Dec 11, 2023

With the patches in for rito / deku / bolson:

--------------------
PUBLISH FAILURES
--------------------

984 out of 1443 packages had at least 1 version fail.
  - 503 packages had all versions fail.

6317 out of 10696 versions failed.
  - Publishing failed: 94
  - No compilers usable for publishing: 558
  - Solving failed (dependencies): 5665

As usual:

@thomashoneyman (Member Author) commented Dec 19, 2023

As discussed in the PureScript chat, the latest commit makes a small tweak to preserve packages that are from the core org or its derivatives, or which have had a tag since the 0.13 release date (May 29, 2019).

These 49 packages are now reserved with empty metadata:
reserved-packages.txt

Some notable newly-reserved names include monad-eff, coproducts, functor-products, maps, sets, web3, and so on.

454 package names will be freed:
removed-packages.txt

This is the full list of packages that made it to the 0.13 or organization cutoff, along with their latest tag dates:
packages-publish-013.json

@f-f (Member) commented Dec 22, 2023

I had a look at the above files, and they look good to me - the list of packages that we'll reserve is minimal and meaningful, and the list of ~450 freed packages seems sensible (I have checked quite a few manually and they all seem to be old enough to not be worth supporting, as people won't be able to build with them anyways)

I started looking at the code but it's a sizeable patch so it will take a few days

@thomashoneyman (Member Author)

Let me know if you encounter tricky bits and would like an explanation. Also happy to jump on a quick call and walk through sections of the code.

@f-f (Member) commented Dec 22, 2023

Thanks! I think the logistics of that will be tricky over the Christmas days, but let's see if that works for next week.

@@ -234,11 +234,12 @@ For example:

All packages in the registry have an associated metadata file, which is located in the `metadata` directory of the `registry` repository under the package name. For example, the metadata for the `aff` package is located at: https://github.com/purescript/registry/blob/main/metadata/aff.json. Metadata files are the source of truth on all published and unpublished versions for a particular package, what their content is, and where the package is located. Metadata files are produced by the registry, not by package authors, though they take some information from package manifests.

Each published version of a package records three fields:
Each published version of a package records four fields:
Member

Surely we can make this more future proof 😄

Suggested change
Each published version of a package records four fields:
Each published version of a package records the following fields:

Member Author

Yea, I think you're right 😆


- `hash`: a [`Sha256`](#Sha256) of the compressed archive fetched by the registry for the given version
- `bytes`: the size of the tarball in bytes
- `publishedTime`: the time the package was published as an `ISO8601` string
- `compilers`: compiler versions this package is known to work with. This field can be in one of two states: a single version indicates that the package worked with a specific compiler on upload but has not yet been tested with all compilers, whereas a non-empty array of versions indicates the package has been tested with all compilers the registry supports.
Member

Wouldn't it be tidier to only allow a non-empty array instead of several possible types? After all, the state with multiple compilers listed is going to be a superset of the first state.

Member Author

The issue with the non-empty array is that it isn't clear whether an array of a single element represents one of:

  • a package that has been published with the given compiler, but which hasn't been tested against the full set of compilers
  • a package that has been tested against the full set of compilers and only works with one

Member

When are we going to end up in a situation where we don't test the package against the whole set of compilers? My reading of the PR is that we always do?

In any case, we'll always have packages that are not "tested against the full set of compilers": when a new compiler version comes out, then all packages will need a retest, and if a package doesn't have the new compiler in the array then we don't know if it's not compatible or if it hasn't been tested yet.

Maybe we need another piece of state somewhere else?

Member Author

When are we going to end up in a situation where we don't test the package against the whole set of compilers? My reading of the PR is that we always do?

Yes, as implemented here we just go ahead and test everything as soon as we've published. However, I split out the state because in our initial discussions we worried about how long it takes for the compiler builds to run (it takes publishing from N seconds to N minutes in some cases — large libraries or ones that leverage a lot of type machinery). We'd originally talked about the compiler matrix being a cron job that runs later in the day. I just made it part of the publishing pipeline directly because it was simpler to implement.

If we decide that it's OK for publishing to take a long time then we can eliminate this state and just test the compilers immediately. In that case we'd just have a non-empty array.

Member Author

In any case, we'll always have packages that are not "tested against the full set of compilers": when a new compiler version comes out, then all packages will need a retest, and if a package doesn't have the new compiler in the array then we don't know if it's not compatible or if it hasn't been tested yet.

Yea, that's a good point. You don't know if the metadata you're reading just hasn't been reached yet by an ongoing mass compiler build to check a new compiler.

Maybe we need another piece of state somewhere else?

Off the top of my head I don't know a good place to put some state about possible compiler support; the metadata files are not helpful if a new compiler comes out and we're redoing the build since they're only aware of the one package.

Member Author

If we decide that it's OK for publishing to take a long time then we can eliminate this state and just test the compilers immediately. In that case we'd just have a non-empty array.

I'm cool with this if you are.

We'll always have packages that are not "tested against the full set of compilers" [...] maybe we need another piece of state somewhere else?

We could either a) say that the supported list of compilers for a package can potentially be missing the current compiler if the matrix is currently running and not bother with state or b) put a JSON file or something in the metadata directory that indicates whether the compiler matrix is running. Then consumers can look at that.

Personally the matrix runs infrequently enough (just new compiler releases!) that I would rather opt for (a).

Member

I pondered this for a few days and I think it's complicated?

Since we're going towards a model where we'd only run one registry job at a time and queue the rest (to prevent concurrent pushes to the repo), I'm afraid that running the whole matrix at once would make publishing very slow.
One thing we could do to counteract this would be to split the "publish" and "matrix" runs: on publishing we'd just add the package metadata with one compiler, and at the end of the publishing job we'd queue a series of "compiler matrix" jobs, each testing one compiler. These jobs would be low priority, so new publishes would get to the front of the queue and things can stay snappy.
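A sketch of that split with hypothetical names (neither the queue nor these job types are part of this PR):

module RegistryJobs where

-- Stand-ins for the registry's PackageName and Version types.
type PackageName = String
type Version = String

-- Publishing stays snappy because each compiler check becomes its own
-- low-priority queue entry rather than part of the publish job.
data RegistryJob
  = Publish PackageName Version
  | CompilerMatrix PackageName Version Version -- test one compiler against one package version

-- Lower number = closer to the front of the queue.
jobPriority :: RegistryJob -> Int
jobPriority = case _ of
  Publish _ _ -> 0
  CompilerMatrix _ _ _ -> 1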

Personally the matrix runs infrequently enough (just new compiler releases!) that I would rather opt for (a).

The approach detailed above implies that we're in a world where we do (a), i.e. the list of compilers is always potentially out of date, and that's fine.

Member

Additional note about the above: since this would introduce an "asynchronous matrix builder", we need to consider the dependency tree in our rebuilding: if a package A is published with compiler X, and then a package B depending on it is immediately published after it (a very common use case, since folks seem to publish their packages in batches), then we'd need to either make sure that matrix-build jobs for B always run after matrix-build jobs for A, or retry them somehow.

@@ -14,6 +14,7 @@ let PublishedMetadata =
{ hash : Sha256
, bytes : Natural
, publishedTime : ISO8601String
, compilers : < Single : Version | Many : List Version >
Member

Dhall supports NonEmpty


, hash :: Sha256
, publishedTime :: DateTime

-- UNSPECIFIED: Will be removed in the future.
Member

I once again forgot why we are removing this 😄

Member Author

I think it's because we only need it when we have the importer running off of Git tags, but I don't fully remember either. I'm still in favor of recording the full location of a package in each published version so we can always reconstruct where it came from.

@@ -171,6 +163,44 @@ fetchLegacyManifest name address ref = Run.Except.runExceptAt _legacyManifestErr

pure { license, dependencies, description }

-- | Some legacy manifests must be patched to be usable.
patchLegacyManifest :: PackageName -> Version -> LegacyManifest -> LegacyManifest
Member

This piece of code is quite nice and tidy for all the work it's doing!


-- then we don't add it to the dependencies to avoid over-
-- constraining the solver.
compilers <- Either.hush eitherCompilers
-- Otherwise, we construct a maximal range for the compilers the
Member

If we want to make it 100% correct then we can choose one of the subsets of the range (presumably the most recent). E.g. if a package supports 0.15.0 and 0.15.2 but not 0.15.1, we'd pick 0.15.2 only.

But I agree that it's unlikely that this would happen in the wild, so I'd not worry about that until we stumble on this issue.

@@ -1038,7 +1045,7 @@ publishToPursuit { packageSourceDir, dependenciesDir, compiler, resolutions } =
Left error ->
Except.throw $ "Could not publish your package to Pursuit because an error was encountered (cc: @purescript/packaging): " <> error
Right _ ->
Comment.comment "Successfully uploaded package docs to Pursuit! 🎉 🚀"
FS.Extra.remove tmp
Member

We don't notify anymore for successful docs publishing?

Member Author

We do, the comment is just done outside of this function now. See e.g. line 581.

Comment on lines +231 to +232
let metadataPackage = unsafeFromRight (PackageName.parse "metadata")
let pursPackage = unsafeFromRight (PackageName.parse "purs")
Member

We need these to be considered separately, right? I.e. they can't be in the section above with the filterPackages_0_13: we have versions for metadata but we want none, and purs hasn't been published at all.
We should probably reserve purescript too?

@thomashoneyman (Member Author)

@f-f I think I've responded to your comments above, but GitHub isn't showing all of them in this main PR view so I'm not completely sure. Please let me know if I missed one!

@thomashoneyman (Member Author)

@f-f Did you have any other questions or comments about this work?

@f-f (Member) commented Jan 3, 2024

@thomashoneyman I am through with the review - the overall shape of the code is good, and the only thing left to really resolve is the thing about "testing against the full set of compilers"
