Skip to content

Comments

Support standart canonical state instances#5486

Draft
qnikst wants to merge 37 commits intoIntersectMBO:masterfrom
tweag:scls-canonical-2
Draft

Support standart canonical state instances#5486
qnikst wants to merge 37 commits intoIntersectMBO:masterfrom
tweag:scls-canonical-2

Conversation

@qnikst
Copy link
Collaborator

@qnikst qnikst commented Dec 15, 2025

Description

This MR adds support for canonical ledger state instances for the types that are using in canonical ledger state according to the CIP-0165.

To support SCLS instances for canonical types are introduced together with a test-suite.

This PR is done for the early review as for example it still uses scls libraries that are not in CHaP, we are going to add them once discussion questions about this PR will be resolved (if there will be questions).

Checklist

  • Commits in meaningful sequence and with useful messages.
  • Tests added or updated when needed.
  • CHANGELOG.md files updated for packages with externally visible changes.
    NOTE: New section is never added with the code changes. (See RELEASING.md).
  • Versions updated in .cabal and CHANGELOG.md files when necessary, according to the
    versioning process.
  • Version bounds in .cabal files updated when necessary.
    NOTE: If bounds change in a cabal file, that package itself must have a version increase. (See RELEASING.md).
  • Code formatted (use scripts/fourmolize.sh).
  • Cabal files formatted (use scripts/cabal-format.sh).
  • CDDL files are up to date (use scripts/gen-cddl.sh)
  • hie.yaml updated (use scripts/gen-hie.sh).
  • Self-reviewed the diff.

In order to be able to export and import ledger data to
CIP-165 [1]. The plan is to have `scls` public library
for each era that supports it, starting with Conway.

The scls library is structured using namespaces that
CIP-165 proposes and has instances for all types.

In order to not reimplemnet too many CBOR encodings for
the types that already have instance they were implemented
via LedgerCBOR.

In order to check conformance to SCLS spec CDDL conformance
and canonicity conformance tests were introduced, so it's
possible to develop ledger types and state without looking
at scls package and rely on typechecker and tests.

[1] cardano-foundation/CIPs#1083
Prepare an example of using scls. Example provides
abilities to:
  - read binary utxo and state files (generated by cardano-cli)
    and reencode in in scls
  - read and store the file as scls

This utility is useful as an example and for testing
scls canonicity.
@qnikst qnikst requested a review from a team as a code owner December 15, 2025 17:52
Copy link
Collaborator

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few quick comments after taking a peek at the PR.

I am not really confident about this whole structure. I need to give it some thought.

Maybe it would be safer and clearer if there was a concrete ledger state Haskell data type for what is being stored in the canonical ledger state.
Then there would be a step that converts to and from that data type, which is ultimately stored in a canonical CBOR form.
I know this work is still in an early stage, but I'd like to flash out the design before we put a whole lot of work in.

Could I please ask you to remove all redundant parenthesis in this PR. They are driving me nuts. Otherwise I'll end up posting hundreds of suggestions with their removal.

I'll try to spend some time later this week trying to dig deeper into understanding what exactly is being proposed here.

import Data.Typeable (Typeable)
import Data.Word (Word8)

instance {-# OVERLAPPING #-} ToCanonicalCBOR v (CompactForm Coin) where
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not let overlapping instances into this repo. They are trouble and there usually are ways to avoid them. In these examples, I think, those pragmas are totally unnecessary, since they aren't seem to overlap anything


deriving via (LedgerCBOR v (EpochNo)) instance ToCanonicalCBOR v (EpochNo)

deriving via (LedgerCBOR v (EpochNo)) instance FromCanonicalCBOR v (EpochNo)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant parenthesis her and everywhere else.

Suggested change
deriving via (LedgerCBOR v (EpochNo)) instance FromCanonicalCBOR v (EpochNo)
deriving via (LedgerCBOR v EpochNo) instance FromCanonicalCBOR v EpochNo

cabal.project Outdated
source-repository-package
type: git
subdir: scls-cbor
location: https://github.com/tweag/cardano-canonical-ledger.git
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest renaming the repo, otherwise it sounds like there is Canonical and Non-Cononical Cardano Ledger 😅

Suggested change
location: https://github.com/tweag/cardano-canonical-ledger.git
location: https://github.com/tweag/cardano-canonical-ledger-state.git

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Fixed that though renamed that to cardano-cls in order not to clash name with cardano-ledger.

cabal.project Outdated

source-repository-package
type: git
subdir: scls-cbor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be easier to just list them all?

Suggested change
subdir: scls-cbor
subdir:
scls-cbor
scls-format
...

I mean, not that it matters, since we don't merge PRs with SRPs, but even for development purposes it feels like it woulb simpler

testlib,

test-suite tests
test-suite testtestss
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental stuttering

Suggested change
test-suite testtestss
test-suite tests

instance ToCanonicalCBOR v (PParamsUpdate ConwayEra) where
toCanonicalCBOR v (PParamsUpdate ConwayPParams {..}) =
encodeAsMap
[ mkEncodablePair v ("a0" :: Text) (unTHKD cppA0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using names for protocol parameters, we should be using Tags, same as in CBOR that is used on chain:

{ ? 0 : coin ; minfeeA
, ? 1 : coin ; minfeeB
, ? 2 : uint .size 4 ; max block body size
, ? 3 : uint .size 4 ; max transaction size
, ? 4 : uint .size 2 ; max block header size
, ? 5 : coin ; key deposit
, ? 6 : coin ; pool deposit
, ? 7 : epoch_interval ; maximum epoch
, ? 8 : uint .size 2 ; n_opt: desired number of stake pools
, ? 9 : nonnegative_interval ; pool pledge influence
, ? 10 : unit_interval ; expansion rate
, ? 11 : unit_interval ; treasury growth rate
, ? 16 : coin ; min pool cost
, ? 17 : coin ; ada per utxo byte
, ? 18 : cost_models ; cost models for script languages
, ? 19 : ex_unit_prices ; execution costs
, ? 20 : ex_units ; max tx ex units
, ? 21 : ex_units ; max block ex units
, ? 22 : uint .size 4 ; max value size
, ? 23 : uint .size 2 ; collateral percentage
, ? 24 : uint .size 2 ; max collateral inputs
, ? 25 : pool_voting_thresholds ; pool voting thresholds
, ? 26 : drep_voting_thresholds ; drep voting thresholds
, ? 27 : uint .size 2 ; min committee size
, ? 28 : epoch_interval ; committee term limit
, ? 29 : epoch_interval ; goveranance action validity period
, ? 30 : coin ; governance action deposit
, ? 31 : coin ; drep deposit
, ? 32 : epoch_interval ; drep inactivity period
, ? 33 : nonnegative_interval ; minfee refscriptcoinsperbyte

Names can change, while tags cannot.

@@ -0,0 +1,138 @@
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ImportQualifiedPost #-}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency in the codebase is key. We don't want to mix different styles in the same repo. (FTR. I am all in favor of the extension itself in general)

Suggested change
{-# LANGUAGE ImportQualifiedPost #-}

@qnikst
Copy link
Collaborator Author

qnikst commented Dec 16, 2025

Yes, definitely feel free to think about it as much as you need.

We do not expect that quite a large PR that was done without prior discussions will be merged as-is immediately.

Maybe it would be safer and clearer if there was a concrete ledger state Haskell data type for what is being stored in the canonical ledger state.

I'm not opposed to having such state especially if it makes sense for the cardano-ledger team and we are open to make it and define instances in terms of current instances.

However, there are reasons for having this set of instances defined. In SCLS CIP we planned the protocol to have two properties:
a. it should be possible to 'stream' the data and it should not be required to build a structure in the memory before dumping that
b. it should be possible to extract (read/write) only interesting part of the data (namespace)

I would take that for the ledger those requirements may make not much sense but implemented instances may help.
For an example you can see canonical-ledger-state executable that have an example how it can look like. We can read some part of the data from the storage and stream it (granted that currently the storage is in memory, but for other system or if read from scls file we can consume only required part of the data).

Also such type would lead to more entanglement between cardano-ledger and canonical state definition and changes to that state will have to pass CIP process and CIP should be the source of truth. I'm not the right person to judge here, but I thought that it could harm the processes of development in cardano-ledger, and we wanted to avoid that. Especially because canonical state will largely mimic the existing state.


As for the names vs tags, we did as in the first approach in the CIP (not yet accepted) and use names as meaningful entities, so they will not change unless the spec changes. And they were planned to be a sort of documentation that does depend on the protocol. But having those tags on the wire is definitely a better guarantees than CIP process as does not depend on the people. Thus switching to tags makes total sense (the only downside is that consumer will have to consult with the CDDL about what that field is and writing automated codes (if language supports them) can be harder).


So regarding this PR, I'm going to do following:

  1. Accept and update all current comments
  2. Perform a switch from names to tags (will discuss with my team first, but likely that we agree on the path), but in case the arguments above make sense feel free to stop me
  3. Participate in discussion, elaborate arguments if needed, and will wait for the resolution and if we want to keep a canonical state.

We no longer define overlapping instances for the CompactForm X type,
so OVERLAPPING instances can be dropped.

Thanks to the Alexey Kuleshevich for noticing that.
Remove redundant parenthesis all over the project.
@qnikst
Copy link
Collaborator Author

qnikst commented Dec 16, 2025

I think I've addressed all the comments except for names vs tags. After a short discussion with a team we decided that it would be right to put that into the alternatives section in CIP and get a final agreement in CIP. As CDDL interface for the SCLS data is defined by the CIP.

Both tags and names have those pros and cons, but we decided to give a chance for the consumers to vote

cardano-cls introduced additional namespace that is required to
bootstrap a node. This commit introduces necessary support on
the cardano-ledger side.
@qnikst
Copy link
Collaborator Author

qnikst commented Jan 7, 2026

@lehins have you chances to think about the changes?


From the last update there were few more changes from our side:

  1. updated branch to support all the changes in cardano-ledger that happened since the previous state
  2. prepared PR to CHaP so now it's possible to drop source package from the cabal.project, see Cardano canonical ledger state related package: add new packages cardano-haskell-packages#1207
  3. put discussion about whether to use tags or names in CIP
  4. addressed all comments on PR and created issues in cardano-cls repo

I would love to hear more feedback on this branch and if there are other updates we should proceed with.

@qnikst
Copy link
Collaborator Author

qnikst commented Jan 21, 2026

Finally we have managed to update the structure of the PR, it includes:

  1. we have introduced Canonical$Type data types for all canonical types except from BaseTypes that were defined in a new libs/cardano-canonical package.
  2. The package has instances for the lib/* types but does not contain conversion instances for the era specific types
  3. For the types that are era specific or types that can be created from the multiple types IsCanonical$Type type class was introduced (and in once case there was MkCanonicalGovActionState, FromCanonicalGovActionState, because conversion was not symmetrical
  4. All instances for the era specific types appear in the era libraries, if supported, i.e. in era/conway/impl/scls together with final tests (implemented by the helper)

In addition we have updated the repo to the latest cardano-cls package.

Our next step would be, as discussed to merge current master and update the work if needed.

But it's possible to check the PR at this moment, because changes may be a bit opinionated and do not fit the general style of the project.

CC @lehins

Copy link
Collaborator

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really sorry, but this PR is way too big for a sensible review.
Let's take a step back and restart small and then I believe we can make progress. Could you please split out parts of this PR into a separate smaller PR that handles creation of canonical state that contains only UTxO and nothing else

Also there is no need to create canonical types for any of the types that live on-chain and stored in the state. For example TxOut (and everything it contains) is one of those type, they are guaranteed not to change Haskell representation (or serialization for that matter) for a specific era ever. This is very important to avoid unnecessary duplication. Same applies to PParams, if they are to use tags instead of names, hence my push for using tags, otherwise we will end up with this duplication for every single era.

Overall, it does look like good work and direction seems to be reasonable, but there is no way we are merging a huge chunk of code lime this, because there is no way that we will be able to maintain it without completely rewriting it later. So, if we want to design something maintainable by the Ledger team we need to do this work one step at a time, like all other work that we do.

@qnikst
Copy link
Collaborator Author

qnikst commented Jan 24, 2026

@lehins thanks for starting the work on reviewing and helping us with finding a way of getting things merge!

While I have to express that this is a bit afflicting (as it would require to reorganise our plans) but totally understandable decision, and I agree that it will be a reasonable way forward. Just wanted to express a hope that we will be able to have a review on the future PRs in a timely manner, in that case it would be possible to get things merged in a reasonable timeframe without much pressure on either team.

To summarise the next actions we will do:

  1. We will update CIP-0165 and cardano-cls PR to shift from names to tags.
    I see you have quite strong points in for using tags. Additionally, cardano ledger team is the most affected one by the decision and the only who expressed an opinion on this topic. We will have to start with that task, as otherwise any change to the specification would require updating CIP, then updating CHaP with a new version of cardano-cls, so I would prefer not to run through this process unless in can be done solely with our team.
  2. We will create a series of small PRs with one namespace per PR. Currently with the number of changes required testing it was quite hard to support meaningful set of commits, but how we would.
  3. For PRs we will try to ensure our best that we do not introduce type proxies for the types that exist on the chain.

I would suggest to start with something more simple than UTxO for example blocks/v0 namespace, as it would allow us to agree on the general structure of the project, how the code will be split and how it will evolve, and what is the correct amount of code that would have to be written to support canonical representation in each new era. Because UTxO-s PR would require us to agree on the set of types and instances to be written in addition to project structure.

@lehins
Copy link
Collaborator

lehins commented Jan 24, 2026

. Just wanted to express a hope that we will be able to have a review on the future PRs in a timely manner, in that case it would be possible to get things merged in a reasonable timeframe without much pressure on either team.

It is much easier and quicker to review smaller, self contained PRs. That is precisely my motivation, behind the recommendation of splitting this work up, so we can actually review them in timely manner and get them merged, instead of be stuck in review process for months.

We will update CIP-0165 and cardano-cls PR to shift from names to tags

Yes, please.

We will create a series of small PRs with one namespace per PR.

In order to avoid redundant work, let's start with one PR, instead of a whole chain that will depend on each other. Using blocks namespace is also a fine choice for an initial PR. The reason why I was saying UTxO namespace would be a good contender is because:

  • cardano-canonical will not contain canonical types, cause we are dealing with TxIn and TxOut (and the rest of the types that they contain), which are stable by definition, since they live on chain. Therefore it can use TxIn type and TxOut type family from cardano-ledger-core.
  • While cardano-ledger-conway would define Canonical instance for the actual BabbageTxOut type.

For PRs we will try to ensure our best that we do not introduce type proxies for the types that exist on the chain.

Perfect.

@@ -0,0 +1,125 @@
cabal-version: 3.0
name: cardano-canonical
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this package in a way that follows the convention and is a bit more clear what it means. It's not a big deal that it is longer.

Suggested change
name: cardano-canonical
name: cardano-ledger-canonical-state

time,
tree-diff,

library scls
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid abbreviations as much as possible

Suggested change
library scls
library canonical-state


library scls
exposed-modules:
Cardano.Ledger.Conway.SCLS
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be a better prefix. Again abbreviations are not really saving time.

Suggested change
Cardano.Ledger.Conway.SCLS
Cardano.Ledger.Conway.CanonicalState


library
exposed-modules:
Cardano.Ledger.SCLS.BaseTypes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Also wanted to point out that having Ledger and SL(edger)`S in the name is the duplication that bugs me a little 😀

Suggested change
Cardano.Ledger.SCLS.BaseTypes
Cardano.Ledger.CanonicalState.BaseTypes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you are worried of loosing Standard part, don't, we in ledger often have a different name that what the corresponding CIP defines the concept as.

-- ^^^^^^^
-- Namespace
-- @
module Cardano.Ledger.SCLS.LedgerCBOR (
Copy link
Collaborator

@lehins lehins Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can just be CBOR, we are already in Ledger See my other comment for the reason about the name

Suggested change
module Cardano.Ledger.SCLS.LedgerCBOR (
module Cardano.Ledger.CanonicalState.EraCBOR (

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see why was called LedgerCBOR. This whole concept is actually incorrect and the name is not good either. I'll make a comment on the actual type

-- ^^^^^^^
-- Namespace
-- @
module Cardano.Ledger.SCLS.LedgerCBOR (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see why was called LedgerCBOR. This whole concept is actually incorrect and the name is not good either. I'll make a comment on the actual type

Comment on lines +59 to +63
newtype LedgerCBOR (v :: Symbol) a = LedgerCBOR {unLedgerCBOR :: a}
deriving (Eq, Show)

instance EncCBOR a => ToCanonicalCBOR v (LedgerCBOR v a) where
toCanonicalCBOR _v (LedgerCBOR a) = assumeCanonicalEncoding $ toPlainEncoding (natVersion @9) (encCBOR a)
Copy link
Collaborator

@lehins lehins Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very wrong and dangerous, because any instance that will be derived using it will be forever stuck in Conway. I also changed the name to reflect it better. Here how it should look like:

Suggested change
newtype LedgerCBOR (v :: Symbol) a = LedgerCBOR {unLedgerCBOR :: a}
deriving (Eq, Show)
instance EncCBOR a => ToCanonicalCBOR v (LedgerCBOR v a) where
toCanonicalCBOR _v (LedgerCBOR a) = assumeCanonicalEncoding $ toPlainEncoding (natVersion @9) (encCBOR a)
newtype EraCBOR era (v :: Symbol) a = EraCBOR {unEraCBOR :: a}
deriving (Eq, Show)
instance (Era era, EncCBOR a) => ToCanonicalCBOR v (EraCBOR era v a) where
toCanonicalCBOR _v (LedgerCBOR a) = assumeCanonicalEncoding $ toEraCBOR @era

@qnikst qnikst marked this pull request as draft January 26, 2026 12:19
@qnikst
Copy link
Collaborator Author

qnikst commented Jan 26, 2026

Thanks for the deep review!

I'm converting this PR to draft as it will be superseded by the smaller ones.
We will address all the comments being made here to ensure everything is fixed in the subsequent PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants