feat: replace toml with cli config#732

Merged
Mirko-von-Leipzig merged 17 commits intonextfrom
mirko/cli
Mar 19, 2025
Conversation

Collaborator

@Mirko-von-Leipzig Mirko-von-Leipzig commented Mar 12, 2025

This PR replaces the node's toml-file configuration with pure clap CLI args. We retain the ability to configure via a file by using environment variables instead.
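For context, the flag-over-environment precedence described here can be sketched with std only. The helper and the variable name `MIDEN_NODE_RPC_URL` are hypothetical, chosen for illustration; the real implementation simply uses clap's `env = ...` argument attribute:

```rust
use std::env;

// Hypothetical helper mirroring clap's `env = ...` fallback: an explicit CLI
// value takes precedence, otherwise the named environment variable is read.
fn resolve_arg(cli_value: Option<&str>, env_key: &str) -> Option<String> {
    cli_value
        .map(str::to_owned)
        .or_else(|| env::var(env_key).ok())
}

fn main() {
    // A flag value always wins over the environment.
    let url = resolve_arg(Some("http://10.0.0.1:8080"), "MIDEN_NODE_RPC_URL");
    assert_eq!(url.as_deref(), Some("http://10.0.0.1:8080"));
    // With neither a flag nor the variable set, there is no value.
    let url = resolve_arg(None, "SURELY_UNSET_VAR_93145");
    assert_eq!(url, None);
}
```

This makes a "config file" just a file of environment variable assignments sourced before starting the node.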

Left undone

  • Update github scripts (waiting on an initial review first)

Documentation in readme. I'll skip the readme and add it to the operator docs in a separate PR.

The faucet still uses a toml file. This file is re-used when creating a new account and when running the faucet. We may want to revisit the expected flow here, but I didn't want to fiddle with it.

New flow

I've swapped around the subcommand structure so that it's now miden-node <component> <action>. This was mostly to "organise" the store commands, but I can imagine having additional subcommands for rpc and block-producer in the future.

I wanted to change up the store so that miden-node store bootstrap would also create the database and seed it with the genesis data. In this case the genesis.dat would become ephemeral (though it can be brought back if we decide we need it). miden-node store start would then never create the db, but instead expect it to exist. I also want to move database migration into a separate subcommand (and not do it automatically on start). I decided not to do this in this PR as it was too much churn.

Simplifications

I've replaced the various store filepaths and directory paths with a single data_directory which will contain the database, blockstore and genesis.dat. I don't see a good reason to modify these at present.

I've also simplified node start by making the internal gRPC servers (block-producer & store) use localhost:0 which assigns a random open port. This means these no longer need to be configured at all. We can somewhat trivially make this overrideable but I don't see the point. If you want that, spawn the components separately imo :)
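The localhost:0 trick is plain std behaviour: binding to port 0 asks the OS for any free port, and the actual port is then read back from the listener. A minimal sketch:

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Binding to port 0 lets the OS assign a random open port; the concrete
    // port is then available via the listener's local address. This is
    // (roughly) what lets the internal store and block-producer servers run
    // without any port configuration.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    assert_ne!(addr.port(), 0); // the OS picked a concrete port
    println!("internal gRPC server would listen on {addr}");
    Ok(())
}
```

The node can then hand that concrete address to the sibling component's client instead of requiring operator configuration.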

I'll also note that I was tempted to make the components more robust to startup failures, but it seems like tonic in general does not handle connection failures in many cases - meaning we cannot trivially loop/retry until all required components are ready. This may take a bit more work to achieve.

Usage

miden-node node start [OPTIONS] --rpc.url <URL> --store.data-directory <DIR>
miden-node rpc start [OPTIONS] --rpc.url <URL> --store.url <URL> --block-producer.url <URL>
miden-node block-producer start [OPTIONS] --store.url <STORE_URL> <URL>

miden-node store dump-genesis > genesis.toml
miden-node store bootstrap [OPTIONS] --data-directory <DIR> --accounts-directory <DIR>
miden-node store start [OPTIONS] --url <URL> --data-directory <DIR>

@Mirko-von-Leipzig Mirko-von-Leipzig marked this pull request as ready for review March 12, 2025 12:41
Comment on lines +21 to +46
Start {
/// Url at which to serve the RPC component's gRPC API.
#[arg(long = "rpc.url", env = ENV_RPC_URL, value_name = "URL")]
rpc_url: Url,

/// Directory in which the Store component should store the database and raw block data.
#[arg(long = "store.data-directory", env = ENV_STORE_DIRECTORY, value_name = "DIR")]
store_data_directory: PathBuf,

/// The remote batch prover's gRPC url. If unset, will default to running a prover
/// in-process which is expensive.
#[arg(long = "batch_prover.url", env = ENV_BATCH_PROVER_URL, value_name = "URL")]
batch_prover_url: Option<Url>,

/// The remote block prover's gRPC url. If unset, will default to running a prover
/// in-process which is expensive.
#[arg(long = "block_prover.url", env = ENV_BLOCK_PROVER_URL, value_name = "URL")]
block_prover_url: Option<Url>,

/// Enables the exporting of traces for OpenTelemetry.
///
/// This can be further configured using environment variables as defined in the official
/// OpenTelemetry documentation. See our operator manual for further details.
#[arg(long = "open-telemetry", default_value_t = false, env = ENV_ENABLE_OTEL, value_name = "bool")]
open_telemetry: bool,
},
Collaborator Author

I wonder if we shouldn't use the manual (non-derive) way of building these args. That would let us re-use the same ones and keep consistency.

I'm unsure about the naming scheme as well, and whether it should always remain consistent. I suspect I've done a little of column A and a bit of column B.

e.g. should store.url be used to configure RPC, block-producer AND the store start itself?

Contributor

I'm not sure I fully understand the issue. Is it that when we do miden-node store start we use --url to specify store URL, but if we use miden-node block-producer start we use --store.url? This seems pretty natural to me.

Collaborator Author

I agree it's pretty natural as is. I was considering whether it would help cement the relation between the components if we kept it consistent throughout.

i.e. it might make it more obvious that it's the same URL if it's store.url everywhere.

Comment on lines +60 to +61
let block_producer_url = format!("http://{block_producer_address}");
let channel = tonic::transport::Endpoint::try_from(block_producer_url)?.connect().await?;
Collaborator Author

This is a bit awkward and makes me think I was too hasty in removing our custom Endpoint type in #654.

As a summary:

  • Tonic wants a Uri but accepts strings in the format of http(s)://{SocketAddr}.
  • TcpListener::bind wants a SocketAddr.
  • Uri cannot be trivially converted to a SocketAddr
  • Url can be converted to a SocketAddr so we use it.
  • Clients may want to perform a DNS lookup to the server address of http://rpc.testnet.miden.io

I think it's okay to leave as is, but we probably want to revisit this (again :'( ).
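The mismatch above can be sketched with std only. `socket_addr_from_url` is a hypothetical helper for illustration (the real code uses the url crate rather than string trimming), but it shows both halves of the awkwardness: `ToSocketAddrs` gives us bind/DNS support, while the scheme is silently discarded:

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// Hypothetical sketch of the conversion described above: tonic wants a URI
// string, while `TcpListener::bind` wants a `SocketAddr`. Stripping the
// scheme and resolving via `ToSocketAddrs` (which also performs DNS lookups
// for names like rpc.testnet.miden.io) bridges the two, at the cost of
// silently discarding http vs https.
fn socket_addr_from_url(url: &str) -> std::io::Result<SocketAddr> {
    let without_scheme = url
        .trim_start_matches("http://")
        .trim_start_matches("https://");
    without_scheme
        .to_socket_addrs()?
        .next()
        .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::NotFound, "no address"))
}

fn main() -> std::io::Result<()> {
    let addr = socket_addr_from_url("http://127.0.0.1:48422")?;
    assert_eq!(addr, "127.0.0.1:48422".parse().unwrap());
    println!("{addr}");
    Ok(())
}
```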

Contributor

If we had the Endpoint struct - how would this be different?

Collaborator Author

I don't actually know. It just seems messy and incorrect to me at the moment.

As an example, if you provide an https URL for a server at the moment then we don't actually support that. We strip the protocol from the URL when we convert to a socket address and don't add TLS support when we create the tonic server.

This is sort of fine because we don't (ourselves) want to use TLS internally. But it sort of points out the leaky abstraction here.

If we implement something ourselves then we'll probably have the opposite issue of rejecting weird but valid URLs.

Collaborator Author

Actually maybe the real answer is making startup robust. The only reason we have TcpListener involved is to separate server initialization from the clients spinning up in serve.

If we have the components capable of starting up robustly with internal retries etc, then we can simply use Uri which is what tonic can use.
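A sketch of such an internal retry, with the tonic `connect()` call stood in for by an arbitrary fallible closure. `connect_with_retries` is a hypothetical helper, not node code:

```rust
use std::thread::sleep;
use std::time::Duration;

// Hypothetical retry wrapper illustrating "robust startup": keep attempting
// the operation until the dependency is up or the attempt budget is spent.
// In the real node the operation would be a tonic connect() call.
fn connect_with_retries<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    max_attempts: u32,
    backoff: Duration,
) -> Result<T, E> {
    let mut last_err = None;
    for _ in 0..max_attempts {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(e) => {
                last_err = Some(e);
                sleep(backoff); // note: also sleeps after the final failure
            }
        }
    }
    Err(last_err.expect("max_attempts must be > 0"))
}

fn main() {
    // Simulated dependency that only comes up on the third attempt.
    let mut calls = 0;
    let result = connect_with_retries(
        || {
            calls += 1;
            if calls < 3 { Err("not ready") } else { Ok("channel") }
        },
        5,
        Duration::from_millis(1),
    );
    assert_eq!(result, Ok("channel"));
}
```

With startup robust in this sense, the components could accept a plain Uri and let the retry loop absorb ordering issues between them.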

Contributor

@bobbinth bobbinth left a comment


Looks good! Thank you! Not a full review yet, but I left some comments inline.

I've swapped around the subcommand structure so that it's now miden-node <component> <action>. This was mostly to "organise" the store commands, but I can imagine having additional subcommands for rpc and block-producer in the future.

I think this is largely fine, but I left a couple of comments inline suggesting that some "node-wide" commands could be brought up to the root level rather than sit under the node sub-command. For example:

miden-node start
miden-node bootstrap

Feel more natural to me than

miden-node node start
miden-node store bootstrap

I wanted to change up the store so that miden-node store bootstrap would also create the database and seed it with the genesis data. In this case the genesis.dat would become ephemeral (though it can be brought back if we decide we need it). miden-node store start would then never create the db, but instead expect it to exist. I also want to move database migration into a separate subcommand (and not do it automatically on start). I decided not to do this in this PR as it was too much churn.

I like this and agree with doing it in a separate PR. One thing we should make sure is that the bootstrap command is deterministic (i.e., given the same genesis inputs file it initializes the DB in the same way).

I've replaced the various store filepaths and directory paths with a single data_directory which will contain the database, blockstore and genesis.dat. I don't see a good reason to modify these at present.

I think this is fine for now - the only question I have is if we'd still have the flexibility to somehow put the database file and the blockstore on different disks, if we wanted to.

Usage

Summarizing my various comments, I'd probably structure the commands something like this:

miden-node start [OPTIONS] --url <URL> --data-directory <DIR>
miden-node rpc start [OPTIONS] --url <URL> --store-url <URL> --block-producer-url <URL>
miden-node block-producer start [OPTIONS] --url <URL> --store-url <STORE_URL>

miden-node dump-genesis > genesis.toml
miden-node bootstrap [OPTIONS] --data-directory <DIR> --accounts-directory <DIR>
miden-node store start [OPTIONS] --url <URL> --data-directory <DIR>


Comment on lines +27 to +53
/// Dumps the default genesis configuration to stdout.
///
/// Use this as a starting point to modify the genesis data for `bootstrap`.
DumpGenesis,

/// Bootstraps the blockchain database with the genesis block.
///
/// This populates the genesis block's data with the accounts and data listed in the
/// configuration file.
///
/// Each generated genesis account's data is also written to disk. This includes the private
/// key which can be used to create transactions for these accounts.
///
/// See also: `dump-genesis`
Bootstrap {
/// Genesis configuration file.
///
/// If not provided the default configuration is used.
#[arg(long, value_name = "FILE")]
config: Option<PathBuf>,
/// Directory in which to store the database and raw block data.
#[arg(long, env = ENV_STORE_DIRECTORY, value_name = "DIR")]
data_directory: PathBuf,
    /// Directory to write the account data to.
#[arg(long, value_name = "DIR")]
accounts_directory: PathBuf,
},
Contributor

It is not clear to me that these commands should be store-specific. Currently they affect only the store component, but in the future the bootstrapping process may affect multiple components. Maybe we could move it to the root so that users could run something like:

miden-node bootstrap [OPTIONS] --data-directory <DIR> --accounts-directory <DIR>

Collaborator Author

@Mirko-von-Leipzig Mirko-von-Leipzig Mar 17, 2025


Could you give a potential example?

The reason I bundled these is because I thought (in general) that these would all be run on the instance running the store. That we're spinning up the store component and these are the set of commands required to do so.

If there is a block-producer bootstrap process then it would likely not have anything to do with this one?

Contributor

hmmm - not quite sure what an example could be, and actually, thinking about this more, it does feel like the store component should have its own bootstrap command. But I do think it may be good to have one "global" bootstrap command which would currently just execute store's initialization process. So, both of these will currently do the same:

miden-node bootstrap [OPTIONS] --data-directory <DIR> --accounts-directory <DIR>
miden-node store bootstrap [OPTIONS] --data-directory <DIR> --accounts-directory <DIR>

But in the future the second one may be a subset of the first one.

Collaborator Author

I would prefer if we have a single canonical way of doing one thing, at least until there is a stronger indication that we need a combination of things.

I could see that we may want a quick-start option to spin up a node, but it's possible we would want that to form part of the "bundled" options then?

Contributor

One could argue that these commands are different: one is for initializing the "monolithic" node and the other one is for initializing the store component. That is, the user who's setting up the store as a separate component would use the second command, but the user who is setting up the node to run as a combined binary would use the first command - and so, there is no overlap.

They do happen to do the same thing right now - but it may actually be cleaner to keep them separate.

Collaborator Author

Would you be happy with

miden-node store bootstrap
           store start
           
miden-node bundled bootstrap
           bundled start

where bundled is the current node. I don't think bundled is a great name, but node is also strange. I'm wary of dumping the full node commands in the root "folder", so to speak.

Contributor

miden-node store bootstrap
           store start

miden-node bundled bootstrap
           bundled start

Yep, let's go with this. I do think not having the "bundled" word would make it a bit easier for the user - but the difference is probably marginal.

Collaborator Author

Node alternate word dump

composite
combined
bundle
pack
in-process
full-node
full
all

Collaborator Author

Something that is a bit confusing now is that dump-genesis is only in miden-node store.

@Mirko-von-Leipzig
Collaborator Author

I've addressed all comments; the only open question is what to do regarding dump-genesis.

Collaborator

@igamigo igamigo left a comment


Not the deepest review, but looks good to me! It seems like various subcommands only include a start variant, so I wonder whether it would be worth leaving them as top-level commands (maybe that would also solve the dump-genesis ambiguity?). I also saw your comment about leaving room for future commands, which makes sense as well.

Comment on lines +13 to +40
/// Starts the block-producer component.
Start {
/// Url at which to serve the gRPC API.
#[arg(env = ENV_BLOCK_PRODUCER_URL)]
url: Url,

/// The store's gRPC url.
#[arg(long = "store.url", env = ENV_STORE_URL)]
store_url: Url,

/// The remote batch prover's gRPC url. If unset, will default to running a prover
/// in-process which is expensive.
#[arg(long = "batch-prover.url", env = ENV_BATCH_PROVER_URL)]
batch_prover_url: Option<Url>,

/// The remote block prover's gRPC url. If unset, will default to running a prover
/// in-process which is expensive.
#[arg(long = "block-prover.url", env = ENV_BLOCK_PROVER_URL)]
block_prover_url: Option<Url>,

/// Enables the exporting of traces for OpenTelemetry.
///
/// This can be further configured using environment variables as defined in the official
/// OpenTelemetry documentation. See our operator manual for further details.
#[arg(long = "enable-otel", default_value_t = false, env = ENV_ENABLE_OTEL)]
open_telemetry: bool,
},
}
Collaborator

Considering your comment, is there anything to document (here and elsewhere) about the https support?

Collaborator Author

I think I can solve the issue by making component startup more robust. Right now I wouldn't even know what exactly would work, beyond (sometimes) discarding the protocol.

Once startup is robust it should be able to handle https as well, even if we ourselves never use it for performance/simplicity reasons.

@Mirko-von-Leipzig
Collaborator Author

Not the deepest review, but looks good to me! It seems like various subcommands only include a start variant, so I wonder whether it would be worth leaving them as top-level commands (maybe that would also solve the dump-genesis ambiguity?). I also saw your comment about leaving room for future commands, which makes sense as well.

Yeah, it's unclear to me as well, but I think I prefer the consistency of it all: that it's always miden-node <component> <action>.

Contributor

@bobbinth bobbinth left a comment


Looks good! Thank you! What are some more immediate tasks beyond this PR?

  • Update documentation (readme and mdBook).
  • Update deployment scripts.
  • Make the genesis dat file "ephemeral" (i.e., get rid of the dump-genesis command).
  • Anything else?

@Mirko-von-Leipzig
Collaborator Author

Looks good! Thank you! What are some more immediate tasks beyond this PR?

  • Update documentation (readme and mdBook).
  • Update deployment scripts.
  • Make the genesis dat file "ephemeral" (i.e., get rid of the dump-genesis command).
  • Anything else?

Allow each server to start even if a downstream server is offline, which would improve this endpoint situation. Though I don't think the solution given in the issue (using connect_lazy) is entirely correct.

We can also now do #735 to configure the block-producer further.

@Mirko-von-Leipzig Mirko-von-Leipzig force-pushed the mirko/cli branch 2 times, most recently from 1d5e3bb to 4d131ce Compare March 19, 2025 09:22
@Mirko-von-Leipzig Mirko-von-Leipzig merged commit 37f56c6 into next Mar 19, 2025
1 check passed
@Mirko-von-Leipzig Mirko-von-Leipzig deleted the mirko/cli branch March 19, 2025 10:54