Use discriminated unions to improve validation errors #244

@mvandenburgh

We recently saw metadata validation errors on a dandiset in production (dandi/dandi-archive#1958). These are the errors that were reported:

contributor: String should match pattern '^([\w\s\-\.']+),\s+([\w\s\-\.']+)$'
contributor: Input should be 'Organization'
contributor: String should match pattern 'https://ror.org/[a-z0-9]+$'

The invalid contributor in question turned out to be a Person with an invalid name field. In other words, the first validation error was the actual issue, while the other two were irrelevant and somewhat misleading. What's happening here is that pydantic has no way of knowing, from a validation perspective, whether the object is intended to be a Person or an Organization, because contributor is of type List[Union[Person, Organization]]. So it checks both cases: first it validates the object as if it were a Person and gets the first error, then it validates it as an Organization and gets the other two errors.
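To illustrate the behavior described above, here is a minimal sketch assuming pydantic v2 (which the error wording suggests). The `Person`, `Organization`, and `Dandiset` classes are simplified, hypothetical stand-ins for the real schema models, and the sample input is made up; only the two regex patterns are taken from the reported errors.

```python
from typing import List, Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError


# Simplified, hypothetical stand-ins for the real models (assumption).
class Person(BaseModel):
    schemaKey: Literal["Person"]
    name: str = Field(pattern=r"^([\w\s\-\.']+),\s+([\w\s\-\.']+)$")
    identifier: Optional[str] = None


class Organization(BaseModel):
    schemaKey: Literal["Organization"]
    name: str
    identifier: Optional[str] = Field(
        default=None, pattern=r"https://ror.org/[a-z0-9]+$"
    )


class Dandiset(BaseModel):
    # Plain union: pydantic cannot tell which member the input is meant to be.
    contributor: List[Union[Person, Organization]]


# A Person whose name lacks the "Last, First" comma (made-up sample data).
bad_person = {
    "schemaKey": "Person",
    "name": "Jane Doe",
    "identifier": "0000-0002-1825-0097",
}

plain_union_errors = []
try:
    Dandiset(contributor=[bad_person])
except ValidationError as exc:
    plain_union_errors = exc.errors()

# Errors from BOTH union members are reported, not just the Person ones.
for err in plain_union_errors:
    print(err["loc"], err["msg"])
```

With this input, the error list should mix the genuine `name` pattern failure with the `schemaKey` and `identifier` failures that come from trying the same object as an `Organization`.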

I propose that we use discriminated unions on the schemaKey field of each pydantic model so we can avoid this in the future. This would allow pydantic to narrow validation to the specific type of the object based on its schemaKey. If we had this in the above-mentioned scenario, pydantic would have recognized that the invalid contributor is supposed to be a Person and would not have reported the additional misleading validation errors that assume it's an Organization.
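A rough sketch of what the proposal could look like, assuming pydantic v2 and its `Field(discriminator=...)` support. The model classes, the `Contributor` alias, and the sample input are hypothetical simplifications, not the actual schema code.

```python
from typing import Annotated, List, Literal, Optional, Union

from pydantic import BaseModel, Field, ValidationError


# Simplified, hypothetical stand-ins for the real models (assumption).
class Person(BaseModel):
    schemaKey: Literal["Person"]
    name: str = Field(pattern=r"^([\w\s\-\.']+),\s+([\w\s\-\.']+)$")
    identifier: Optional[str] = None


class Organization(BaseModel):
    schemaKey: Literal["Organization"]
    name: str
    identifier: Optional[str] = Field(
        default=None, pattern=r"https://ror.org/[a-z0-9]+$"
    )


# Discriminated union: pydantic reads schemaKey first and validates
# against the matching model only. "Contributor" is a hypothetical alias.
Contributor = Annotated[
    Union[Person, Organization], Field(discriminator="schemaKey")
]


class Dandiset(BaseModel):
    contributor: List[Contributor]


# Same kind of invalid Person as in the report (made-up sample data).
bad_person = {
    "schemaKey": "Person",
    "name": "Jane Doe",
    "identifier": "0000-0002-1825-0097",
}

discriminated_errors = []
try:
    Dandiset(contributor=[bad_person])
except ValidationError as exc:
    discriminated_errors = exc.errors()

# Only the Person branch is validated, so only its errors are reported.
for err in discriminated_errors:
    print(err["loc"], err["msg"])
```

Here only the `name` pattern error should remain, since the `Organization` branch is never tried. One trade-off to keep in mind: with a discriminator, every contributor must carry a valid `schemaKey`, otherwise validation fails on the tag itself.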
