[WIP/do not merge] Replace anyhow with typed errors in wit-parser#2460
[WIP/do not merge] Replace anyhow with typed errors in wit-parser#2460PhoebeSzmucer wants to merge 16 commits intobytecodealliance:mainfrom
anyhow with typed errors in wit-parser#2460Conversation
|
Thanks for this! I mentioned this on Zulip but wanted to echo here too -- I agree with the direction of this change and I think it would be good to have concrete error types for
Overall I'd say let's figure out the final design of the error types in play here, and once that's sorted out figure out how best to land this (e.g. all-at-once vs incrementally) -- does that sound ok? |
|
Thank you for a very detailed response @alexcrichton !
Totally agreed. I'll try to merge some non-controversial pieces first.
To clarify - we'd just make
For a very basic LSP we don't need it, but I think it would be a good idea to be more fine-grained than just span+message. It's the difference between the consumers (the LSP, or even the CLI) knowing just what to highlight and what to say, vs these tools actually understanding what the error is. Typed variants unlock richer things -
I agree that the Span is the right internal primitive, but it raises the question for the public API - the consumer needs to know which source map a given
This is manageable but leaks internal architecture to callers. In case of An alternative is for errors to carry already-resolved locations. Not pre-rendered strings, but structured I'm happy to keep exposing global spans in the API, but for that I might need to modify how package cycles are computed - from my understanding cycles are detected during toposort before source maps are merged, which is why I did eager resolution in my PR. I think if we merge the source maps before toposort that would work. |
|
How about a design like this: #[derive(Debug)]
#[repr(transparent)]
pub struct WitError {
inner: Box<WitErrorInner>,
}
struct WitErrorInner {
kind: WitErrorKind,
location: Option<WitLocation>,
}
#[non_exhaustive]
pub enum WitErrorKind {
// ... full details of `WitError` as today, slightly modified maybe ...
}
pub struct WitLocation {
file: String,
line: u32,
col: u32,
}
impl WitError {
pub fn location(&self) -> Option<&WitLocation> {
// ...
}
pub fn span(&self) -> Option<Span> {
// ...
}
pub fn kind(&self) -> &WitErrorKind {
// ...
}
/// Called by `rewrite_error` throughout the codebase
pub(crate) fn update_location(&mut self, map: &SourceMap) {
// ... update `self.location` ...
}
/// Convenience constructors for throughout this crate.
pub(crate) fn duplicate_package(..) -> WitError {
// ...
}
}
impl fmt::Display for WitError {
// ..
}
impl core::error::Error for WitError {
// ..
}Here One final thing I might bikeshed as well is that the current ad-hoc growth of the crate has a few error enums throughout (e.g. toposort, package name stuff, lex errors, etc), and I think it would be best to unify everything into a single large How's that sound? |
|
I just pushed another version that we can consider going with - it compiles (modulo snapshots). I pushed the code just now. Essentially: Parse/AST phase (src/ast/error.rs): pub struct ParseErrors(Box<[ParseErrorKind]>); // Maintains the possibility of us returning more errors in the future
#[non_exhaustive]
pub enum ParseErrorKind {
Lex(lex::Error),
Syntax { span: Span, message: String },
ItemNotFound { span: Span, name: String, kind: String, hint: Option<String> },
TypeCycle { span: Span, name: String, kind: String },
}Resolve phase (resolve/error.rs) pub struct ResolveErrors(Box<[ResolveErrorKind]>); // Maintains the possibility of us returning more errors in the future
#[non_exhaustive]
pub enum ResolveErrorKind {
PackageNotFound { span: Span, requested: PackageName, known: Vec<PackageName> },
InvalidTransitiveDependency { span: Span, name: String },
DuplicatePackage { name: PackageName, span1: Span, span2: Span },
PackageCycle { package: PackageName, span: Span },
Semantic { span: Span, message: String }, // Sort of a "catch all" - we can explode this into more variants later
}All of the variants contain one or more global spans. This design is a bit different from your latest proposed one. There are a couple of decisions that we should pick:
Thanks for iterating with this on me - once we agree on these questions I'll update accordingly |
And yeah definitely. I agree and I'm happy to do that. |
|
Personally I think I'd still lean towards one error type for the crate rather than multiple error types, but do you think for the LSP purposes it'd be useful to have multiple kinds of errors? For many-errors-vs-one, I like the idea of being able to report many errors when parsing WIT as opposed to stopping on the first one, but I suspect that fully supporting this is going to require some deeper refactoring. For example idiomatically throughout the code errors are propagated with I suppose another way to put this is that I don't at least personally have all that much experience in a multi-error-returning system. I'd hope to reach some sort of middle-ground where it reports reasonable errors, tries to not have a billion cascading errors, but doesn't baloon infrastructure to precisely track cascading errors just yet. My rough assumption is that doing all this will be more involved than adding a slice to the error types themselves, so if you're ok with it I'd prefer to separate that out to a future change perhaps? |
Agreed.
Agreed for a CLI, but for the LSP we would probably want that - otherwise it's pretty random which file gets the resolution diagnostic. But I guess that shows that the compiler and the LSP are pretty different, and we will need to put some more thought into how wit-parser should be optimally structured to support both. I will go with a single error for now, and maybe in the future we can revisit if we decide returning a list is better. Are we ok occasionally making breaking compile-time changes in this package?
Yes, I think the split is useful for LSP purposes specifically because parse and resolve can then be separate salsa queries in the LSP. Per-file parsing can cached independently from workspace-level resolution. Having distinct types makes it natural to handle diagnostics at the right granularity - parse errors belong to a single file, resolve errors belong to the workspace. With a unified type you lose that distinction at the type level, which might force us to do That said, I'm happy to go with a single type if you feel strongly - it's easy to split later if we decide to (again, if we can afford to make more breaking changes). |
Ok sounds good to me, and yeah breaking API changes here are fine. We're still in a mode of "each release is a new major version" so it's fine to have futher breaking changes later (it's already a breaking change to switch away from
What you say is convincing to me, so sounds good! For landing this, given that there's going to be two high-level error types -- parsing & resolving -- how about landing those separately? For example the first error to land would be the parsing result and then afterwards could be the resolve result. The error types would look similar in their implementation, and I think that's fine. |
Motivation
Today, getting a file path and line number out of a parse error requires parsing the human-readable error string that anyhow produces. This is fragile: it breaks silently when error message formatting changes, and it's simply the wrong abstraction.
I want to use
wit-parserin an LSP for WIT/WAC, but this change would be good for any tool that embedswit-parserand wants to do something useful with errors.I Know This Is Large
I'm aware this PR is bigger than ideal. The core reason it's large is that error type changes propagate through the entire call stack - there's no natural seam to split at. If this is something that we'd like to merge in some form, I can try splitting up this PR into smaller chunks - I wanted to open this draft first to start a conversation.
Global vs local spans
Currently I'm returning global
Spanobjects as part ofWitError, because existing code uses global spans almost exclusively. This makes things annoying for the consumer - they have to use theSourceMapobject to map to file-local offsets. I'm considering changing the exposed spans to be resolved local spans.