
Conversation

@gspencergoog
Collaborator

Description

This PR has a proposal for a different data binding method in the 0.9 protocol.

I'm not entirely convinced that it is the best thing since sliced bread, but it has its good and bad points.

The main driving idea is that LLMs are bad at list indices, and that list mutations are non-local (they shift the indices of later items in the list), so this data model spec attempts to avoid them. Avoiding indices means avoiding JSON pointers, though, which is too bad, since they are well known.

The proposed solution also tends to make the data "flatter" and definitely requires more unique IDs to be generated and maintained, but LLMs seem to be better at that than indices.

See the specification/0.9/docs/data_proposal.md file for more details.


@jacobsimionato left a comment


I personally feel strongly that it would be better to keep the protocol simple and solve this problem on the agent side. But I don't want to get in the way of you advancing the v0.9 proposal, so approving this for if you do decide to incorporate it!


> But if we had already removed the "Admin" item, the index would actually be "0", and the LLM would have to track that, and be informed of any external mutations to the list.

In a case where the agent and client are mutating the same data, perhaps a proper two-way sync mechanism is needed anyway, and that would be another way of solving the incorrect-index problem? I wonder if we should be focusing more on the idea of agent-client data model sync, and then solving these problems with agent-side utilities.

E.g. for this use case, if we had some automated two-way sync of the data model, then perhaps on the agent side we could have a tool call like:

  replaceListItem(listPath: "/user/roles/", currentValue: "Admin", newValue: "Owner")

In this case, it would still be possible for agents to define data models which always have ids within list items, so that they can identify them via id for mutation.

That way, the complexity around LLM manipulation of data can be kept inside the agent, and potentially tailored to the specific agent use case. The protocol remains simpler.
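As a rough illustration of keeping this complexity agent-side, a utility like the hypothetical `replaceListItem` call above might be sketched as follows. The function name, path handling, and data model here are assumptions for illustration, not part of any protocol:

```python
def replace_list_item(data, list_path, current_value, new_value):
    """Resolve a JSON-pointer-style path to a list, find the item by value,
    and return a standard JSON Patch 'replace' op. The index is computed
    here, agent-side, so the LLM never has to reason about indices."""
    node = data
    for key in list_path.strip("/").split("/"):
        node = node[key]
    index = node.index(current_value)  # raises ValueError if the item is gone
    return {"op": "replace",
            "path": f"{list_path.rstrip('/')}/{index}",
            "value": new_value}

# Hypothetical data model for the example in the comment above.
model = {"user": {"roles": ["Viewer", "Admin"]}}
op = replace_list_item(model, "/user/roles/", "Admin", "Owner")
print(op)  # {'op': 'replace', 'path': '/user/roles/1', 'value': 'Owner'}
```

The client still receives ordinary JSON Patch; only the agent ever deals with value- or id-based addressing.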


In fact, I think most of this proposal could be implemented as an inference utility, at least for cases where the agent is the only actor modifying its own data.

  1. Agent stores current state of data model in Hybrid Adjacency Map format
  2. LLM is prompted to create data model updates using this format as you demonstrate
  3. Agent framework applies the changes to generate new state
  4. Agent framework converts new and old states to traditional nested format, and computes JSON patch diff
  5. Agent sends JSON patch to client
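Steps 3 through 5 of that pipeline might be sketched roughly as below. The flat, id-keyed structure standing in for the HAM format and all helper names are illustrative assumptions, not taken from data_proposal.md:

```python
# Illustrative flat (HAM-like) state: list items carry unique ids, and the
# list itself is just an ordered sequence of ids, so the LLM mutates by id.
old_flat = {"roles": ["r1", "r2"], "r1": "Admin", "r2": "Editor"}

def to_nested(flat):
    """Step 4a: convert the flat form back to traditional nested JSON."""
    return {"roles": [flat[item_id] for item_id in flat["roles"]]}

def diff_list(path, old, new):
    """Step 4b: compute a simplified JSON Patch diff between two lists."""
    ops = []
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            ops.append({"op": "replace", "path": f"{path}/{i}", "value": b})
    for _ in range(len(old) - len(new)):
        # Remove repeatedly at the first stale position; later items
        # shift left after each remove, per JSON Patch semantics.
        ops.append({"op": "remove", "path": f"{path}/{len(new)}"})
    for i in range(len(old), len(new)):
        ops.append({"op": "add", "path": f"{path}/{i}", "value": new[i]})
    return ops

# Step 3: the LLM's edit arrives as an id-based mutation -- no indices.
new_flat = dict(old_flat)
new_flat["r1"] = "Owner"

# Steps 4-5: the framework, not the LLM, derives the index-based patch.
patch = diff_list("/roles",
                  to_nested(old_flat)["roles"],
                  to_nested(new_flat)["roles"])
print(patch)  # [{'op': 'replace', 'path': '/roles/0', 'value': 'Owner'}]
```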


> ### **Option B: Hybrid Adjacency Map (HAM) (Recommended)**


I must admit, I do love that the acronym is HAM!


> ## **Hybrid Approach: Efficiency vs. Atomicity**
>
> The Hybrid Adjacency Map format encourages a mixed strategy that balances token efficiency with update granularity:

I worry that this optimizes the data update use case at the expense of other use cases. E.g. some use cases are:

  1. LLM defines new data model from scratch. This proposal worsens latency and potentially reliability because the format is more verbose.
  2. LLM updates existing data model. This proposal improves reliability.
  3. Data model created or updated directly via tool result or database lookup, without LLM's involvement. This proposal adds technical complexity because the regular JSON needs to be converted to HAM.

I personally suspect that the majority of usage will be 1 and 3, at least initially. So maybe it's good to keep the protocol simple and performant out of the box for those immediate use cases, then let people add the technical complexity of HAM for cases where incremental LLM-driven manipulation of data is important?

@gspencergoog
Collaborator Author

> I personally feel strongly that it would be better to keep the protocol simple and solve this problem on the agent side. But I don't want to get in the way of you advancing the v0.9 proposal, so approving this for if you do decide to incorporate it!

Yeah, I think I'm not well enough convinced that this is the way to go to incorporate it. I think I'd like to try some other solutions, similar to what you suggest above: keep the protocol simple and introduce complexity between the agent and the LLM to get correct output and convert it to the standard format. It would be nicer for clients to deal with JSON pointers and raw JSON objects, even if we have to use something like this behind the scenes to get the quality we need from the LLM.

And your point about mutations not actually being that common is also relevant. Perhaps if it fails too often, we just tell the LLM that it can't add/remove individual items in lists and has to output an entirely new list if it needs to do that.
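That fallback could be as simple as constraining list patches to whole-value replacements. A minimal sketch, with made-up paths and data purely for illustration:

```python
# Hypothetical fallback: forbid per-item list ops and have the LLM emit a
# single whole-list replacement, which is immune to index drift.
model = {"user": {"roles": ["Viewer", "Admin"]}}
patch = [{"op": "replace", "path": "/user/roles",
          "value": ["Viewer", "Owner"]}]

def apply_replace(data, op):
    """Apply a JSON Patch 'replace' op (the only op allowed in this scheme)."""
    *parents, last = op["path"].strip("/").split("/")
    node = data
    for key in parents:
        node = node[key]
    node[last] = op["value"]

for op in patch:
    apply_replace(model, op)
print(model)  # {'user': {'roles': ['Viewer', 'Owner']}}
```

The patch is more verbose for long lists, but the LLM never has to know what the list looked like index-by-index.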

And the iterative correction loop in 0.9 might mitigate some of the issues that crop up too, at least for referencing invalid indices.

Anyhow, I'm glad I tried this, and we can revisit the concepts here if they turn out to be useful. Thanks for reviewing it.
