Need way to mirror definitions #386

@jeffmcaffer

People are interested in replicating definition data locally for robustness and performance, control over their own infrastructure, and privacy.

Principles:

  • Read-only -- clearlydefined.io is still the source of truth, with all curations, harvesting, ... done in the production service.
  • Definitions only -- Harvested data and curations are not included in the mirroring process.
  • Only point queries -- The production service supports arbitrary queries over definition documents. The local copy only needs point queries based on component coordinates.
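
To make the three principles concrete, here is a minimal sketch of what a local copy could expose: a single lookup by coordinates, no write methods, and no harvested data or curations. The `Coordinates` field names, the `key()` layout, and the `ReadOnlyMirror` class are illustrative assumptions, not part of the existing service.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Coordinates:
    """Hypothetical coordinate tuple; the field names are assumptions."""
    type: str
    provider: str
    namespace: str
    name: str
    revision: str

    def key(self) -> str:
        # Assumed key layout: the five parts joined with "/".
        return "/".join(
            [self.type, self.provider, self.namespace, self.name, self.revision]
        )


class ReadOnlyMirror:
    """Local copy that answers point queries only.

    There is deliberately no put/update/search method: the production
    service stays the source of truth and the only place for arbitrary queries.
    """

    def __init__(self, definitions: Dict[str, dict]):
        # Copy the mapping so callers cannot mutate the mirror afterwards.
        self._definitions = dict(definitions)

    def get(self, coordinates: Coordinates) -> Optional[dict]:
        # Point query: exact match on coordinates, or None if not mirrored.
        return self._definitions.get(coordinates.key())
```

A caller would build a `Coordinates` value and call `get`; anything beyond an exact-match lookup would still go to the production service.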

Options:

  • rsync-style -- The definitions are just blobs, so in theory we could mirror them as files and let people read them from disk. The downside is that this exposes the user to internal details of ClearlyDefined.
  • slave service -- Implement a path through the service code that is read-only and has the mirroring activity built in. This would shut down any write paths, not have a crawler, ... and implement whatever mirroring protocol we decide is best.
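
As a rough illustration of the service option, the sketch below shows a request handler with every write path shut off: only `GET` is served, and it resolves to a point lookup in a local store. The `/definitions/` route prefix, the `handle_request` function, and the in-memory `store` are assumptions made for the example; the mirroring protocol itself is left out because it is still undecided.

```python
from typing import Dict, Optional, Tuple


def handle_request(
    method: str, path: str, store: Dict[str, dict]
) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, body) for a read-only mirror.

    `store` maps coordinate strings to definition documents (assumed shape).
    """
    if method != "GET":
        # Write paths are closed: 405 Method Not Allowed.
        return 405, None
    prefix = "/definitions/"  # hypothetical route prefix
    if not path.startswith(prefix):
        return 404, None
    definition = store.get(path[len(prefix):])
    if definition is None:
        return 404, None
    return 200, definition
```

Compared with the rsync-style option, callers here see only an HTTP-like contract, so the on-disk layout of the blobs stays an internal detail.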

Random thoughts/topics

  • Must all definitions be aggressively computed? Currently we (re)compute definitions on demand in the event of schema changes. We could have the local service fall back to the remote service if the schemas don't match.
  • First replication is different. It could be a bulk download of a dump, whereas keeping up to date means continuously replicating recent actions.
  • Periodic or continuous. We need to determine whether the use cases require up-to-the-minute replication or whether periodic (hourly, daily) replication is enough.
  • This should be related to the need for an "event stream" that enables people to track new definitions.
  • Local scenarios may use different data store technology from the main service. A simple version would just put the data in the local file system. So this is not a straight record-for-record mirror. Rather, it should use an API to read and write the data using the correct structures.
  • Local servers, being read-only, need not ever compute a definition.
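
Several of the points above can be sketched together: replication writes through a store API rather than copying records directly, the same routine serves the initial bulk dump and later incremental batches, and lookups fall back to the remote service when the local schema version does not match. Everything named here is hypothetical: `FileDefinitionStore`, `replicate`, `lookup`, the `EXPECTED_SCHEMA` value, the `_meta.schemaVersion` location, and the `fetch_remote` callback are assumptions for illustration, and the sketch does no sanitising of coordinate strings.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

EXPECTED_SCHEMA = "1.0.0"  # hypothetical schema version the local reader expects


class FileDefinitionStore:
    """Simplest local store: one JSON file per definition under a root directory."""

    def __init__(self, root: Path):
        self.root = Path(root)

    def _path(self, coordinates: str) -> Path:
        # Assumed layout: the coordinate path plus a ".json" suffix.
        return self.root / (coordinates + ".json")

    def put(self, coordinates: str, definition: dict) -> None:
        path = self._path(coordinates)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(definition), encoding="utf-8")

    def get(self, coordinates: str) -> Optional[dict]:
        path = self._path(coordinates)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))


def replicate(store: FileDefinitionStore, records: Iterable[Tuple[str, dict]]) -> int:
    """Write (coordinates, definition) pairs through the store API.

    The same function can consume a one-off bulk dump or a periodic batch of
    recent changes; only the source of `records` differs.
    """
    count = 0
    for coordinates, definition in records:
        store.put(coordinates, definition)
        count += 1
    return count


def lookup(
    store: FileDefinitionStore,
    coordinates: str,
    fetch_remote: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Serve locally when the schema matches; otherwise defer to the remote service.

    The local server never computes a definition itself.
    """
    local = store.get(coordinates)
    if local is not None:
        version = local.get("_meta", {}).get("schemaVersion")  # assumed field
        if version == EXPECTED_SCHEMA:
            return local
    return fetch_remote(coordinates)
```

Because `replicate` only depends on `put`, a different local data store could be swapped in by providing another class with the same `get` and `put` methods.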

cc: @jeffmendoza
