
Implement delta-based historical account storage/endpoints #290

Closed
polydez opened this issue Mar 28, 2024 · 9 comments · Fixed by #418

@polydez
Contributor

polydez commented Mar 28, 2024

Needs to be discussed:

From @bobbinth:

  1. For public accounts, we should have an endpoint that would return the full account state. A few questions about how this endpoint should work:
    a. Should it return only the latest state? Or should we try to return the state at some prior point in time? The former is going to be much easier, but I'm not sure it will be sufficient.
    b. Should this endpoint allow returning only some portions of the account state? For example, should we be able to request just the storage or just the vault etc.?
  2. It would be cool if we could somehow send only the minimum set of changes required to bring the account state up to the latest point. So, if a public account was updated, I should be able to download only the delta to get it up to date. This means that for public accounts we'd need to store all historical deltas.
  3. How would the client get either the full account state or the delta? And how could we determine what is better to send in a given situation? (i.e., in some cases sending the full account state may be cheaper than sending a set of all deltas).

One potential approach is as follows:

  1. We don't change the state sync endpoint at all - or maybe change it very minimally. Some options here include:
    a. For public accounts, in addition to the latest state hash, we also send the latest nonce.
    b. Or maybe we send all the underlying hashes of the latest state - i.e., code root, vault root, storage root, and nonce - but nothing more than that.
  2. The client would be able to use the above info to determine that the state of the account has changed and would use a separate endpoint to get the actual changes.
  3. This separate endpoint could be something like get_account_state_delta. The arguments would be account_id, from_nonce and to_nonce. This endpoint would return a single delta describing the changes which happened in the account between the two specified nonces. The big question here is whether we can implement this efficiently. Doing something like this would require:
    a. Storing deltas for all accounts in the node (could be something like an account_state_deltas table).
    b. Merging all deltas between the to/from nonces into a single delta on the fly (a schema sketch follows below). It would be awesome if we could pre-compute this somehow - but not sure that's possible.
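For illustration, a minimal sketch of what such a table and query might look like in SQLite (the table and column names here are hypothetical, not an actual schema):

```sql
-- Hypothetical schema: one serialized delta per account state transition.
CREATE TABLE account_state_deltas (
    account_id BLOB NOT NULL,
    from_nonce INTEGER NOT NULL,
    to_nonce   INTEGER NOT NULL,
    delta      BLOB NOT NULL, -- serialized account delta
    PRIMARY KEY (account_id, to_nonce)
);

-- Fetch all deltas needed to move an account between two nonces;
-- the caller then merges them in order into a single combined delta.
SELECT delta
FROM account_state_deltas
WHERE account_id = ?1
  AND from_nonce >= ?2 -- requested from_nonce
  AND to_nonce   <= ?3 -- requested to_nonce
ORDER BY to_nonce;
```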

This is a separate issue from the discussion here: #142 (comment)

@bobbinth bobbinth added this to the v0.3 milestone Apr 12, 2024
@bobbinth bobbinth modified the milestones: v0.3, v0.4 Apr 29, 2024
@bobbinth bobbinth modified the milestones: v0.4, v0.5 Jun 10, 2024
@Dominik1999 Dominik1999 moved this to In Progress in User's testnet Jul 18, 2024
@polydez
Contributor Author

polydez commented Jul 24, 2024

I can propose several different solutions for storing public accounts in order to serve the get_account_state_delta endpoint.

Store full account states on each change

The simplest solution would be to store the full account state on each change. This would let us simply calculate the delta between the "from" and "to" nonces when requested. A drawback of this solution is inefficient disk space usage: we would have to store the full account state even for a nonce-only change.

In order to reduce space consumption, I propose saving the account vault, storage, and code to a separate table (something like "blobs"), using their hashes as primary keys. Thus, if any of them wasn't modified, no additional data is written. But even a small modification still requires saving the full object. A sketch of this layout follows below.
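As a sketch, such a content-addressed layout could look like this in SQLite (hypothetical table and column names):

```sql
-- Hypothetical content-addressed storage: identical vaults, storage, and
-- code across states (or even across accounts) are stored exactly once.
CREATE TABLE blobs (
    hash BLOB PRIMARY KEY, -- hash of the serialized object
    data BLOB NOT NULL
);

CREATE TABLE account_states (
    account_id   BLOB NOT NULL,
    nonce        INTEGER NOT NULL,
    vault_hash   BLOB NOT NULL REFERENCES blobs (hash),
    storage_hash BLOB NOT NULL REFERENCES blobs (hash),
    code_hash    BLOB NOT NULL REFERENCES blobs (hash),
    PRIMARY KEY (account_id, nonce)
);

-- Writing a new state inserts only blobs that are not already present.
INSERT OR IGNORE INTO blobs (hash, data) VALUES (?1, ?2);
```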

Store account deltas

Another approach would be to store the full initial account state and a delta on each modification. This would significantly reduce the space needed, but we would have to load and apply all deltas in order to get the account state at some point in time (and I think the most frequently used operation will be getting the latest account state). We can bound the number of deltas that need to be applied by storing the full account state every 10-50-100 (TBD) changes and always saving the latest state as a full account for better performance. A sketch of this checkpoint scheme follows below.
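A rough sketch of the checkpoint scheme (hypothetical names; N is the checkpoint interval):

```sql
-- Hypothetical tables: periodic full snapshots plus per-change deltas.
CREATE TABLE account_snapshots (
    account_id BLOB NOT NULL,
    nonce      INTEGER NOT NULL,
    state      BLOB NOT NULL, -- full serialized account state
    PRIMARY KEY (account_id, nonce)
);

-- To reconstruct the state at nonce ?2, start from the nearest snapshot
-- at or below it, then apply the (at most N) deltas recorded after it.
SELECT nonce, state
FROM account_snapshots
WHERE account_id = ?1 AND nonce <= ?2
ORDER BY nonce DESC
LIMIT 1;
```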

Store account states as Merkle trees

In order to reduce space consumption and still calculate account differences efficiently, we can store account states as forests of Merkle trees, where each node represents one entity (account main data, asset, storage slot, account code, etc.), each tree root represents an account nonce, and the trees partly share nodes. Here we exploit one of the advantages of Merkle trees: two equivalent subtrees have the same hash (the same ID, in other words).

```mermaid
graph TB
	r1[Account nonce 1] --> root-hash["hash(H1,...,Hn)"]
	root-hash --Main data hash--> main-data[Account main data]
	root-hash --Vault hash--> Vault --Vault root hash--> vault-smt[Smt]
	root-hash --Storage hash--> Storage
	root-hash --Code hash--> Code

	Storage --Slots hash--> Slots --Slots root hash--> slots-smt[Smt]
	Storage --Layout hash--> Layout
	Storage --Maps hash--> Maps
```

For example, if we change only the balance of the account, we insert a new root with a new nonce, and only the subtree corresponding to the account data where the balance is stored is inserted; all remaining subtrees point to the same nodes as the previous root (nonce).

```mermaid
graph TB
	subgraph Nonce 1
		r1[Account nonce 1] --> root-hash1["hash(H1,...,Hn)"]
		root-hash1 --Main data 1 hash--> main-data1[Account main data 1]
		root-hash1 --Vault hash--> Vault --Vault root hash--> vault-smt[Smt]
		root-hash1 --Storage hash--> Storage
		root-hash1 --Code hash--> Code

		Storage --Slots hash--> Slots --Slots root hash--> slots-smt[Smt]
		Storage --Layout hash--> Layout
		Storage --Maps hash--> Maps
	end

	subgraph Nonce 2
		r2[Account nonce 2] --> root-hash2["hash(H1,...,Hn)"]
		root-hash2 --Main data 2 hash--> main-data2[Account main data 2]
		root-hash2 --Vault hash--> Vault
		root-hash2 --Storage hash--> Storage
		root-hash2 --Code hash--> Code
	end
```

Thus, if we want to calculate the delta between account states with nonces, say, 5 and 100, we only need to compute the Merkle update between those two roots and send only the changed information. This is pretty cheap, because we don't even need to apply 95 deltas; we only have to traverse the two trees. Another advantage of this approach is that two different but similar accounts could reuse the same subtrees (for example, if accounts share the same code, they will point to code subtrees with the same hash).
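If we went this route, persistence could be a single content-addressed node table, for example (a sketch with hypothetical names):

```sql
-- Hypothetical node store for the Merkle forest: every node is keyed by
-- its hash, so identical subtrees across nonces (or accounts) are shared.
CREATE TABLE merkle_nodes (
    hash BLOB PRIMARY KEY,
    data BLOB NOT NULL -- serialized child hashes, or the leaf payload
);

CREATE TABLE account_roots (
    account_id BLOB NOT NULL,
    nonce      INTEGER NOT NULL,
    root_hash  BLOB NOT NULL REFERENCES merkle_nodes (hash),
    PRIMARY KEY (account_id, nonce)
);
```

Computing a delta between two nonces is then a parallel walk from the two roots that descends only into subtrees whose hashes differ.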

One of the drawbacks of this solution is that storing/loading an account would require a number of round trips to the DB, and SQLite is not very good for such workloads; RocksDB or another NoSQL store would fit much better, but for now we can use SQLite. We could also make loading of particular account data lazy, on demand, at least for the node (for clients it might be too slow to request all needed information piece by piece; we should probably return all requested data at once).

We also need to think about data granularity, which trades off against access speed: for example, storage might be represented as a single node, as a subtree of individual storage slots, or even of individual values. We may have to choose between a more performant layout and a more compact one.

I'm not sure we have enough resources to follow this approach now. We would have to reimplement all structures for public accounts and their (de)serialization in miden-base. An alternative would be to implement the Merkle-tree account structures and (de)serialization in an additional (SDK) crate, which would be used by both the node and client apps.

Merkle forest of binary data

As a simplification of the previous approach, we could deal not with the account structure but with abstract binary data, like trees of binary blocks. In this case we can use a binary Merkle trie to index chunks of untyped binary data:

```mermaid
graph TB
	r1[Account nonce N] --> root-hash1[Root hash]
	root-hash1 --> node11[Hash 0]
	root-hash1 --> node12[Hash 1]

	node11 --> leafhash1[Hash 0-0] --> leaf1[Chunk: 0..256]
	node11 --> leafhash2[Hash 0-1] --> leaf2[Chunk: 256..512]
	node12 --> leafhash3[Hash 1-0] --> leaf3[Chunk: 512..768]
	node12 --> leafhash4[Hash 1-1] --> leaf4[Chunk: 768..1024]
```

But it's unclear to me how to represent untyped binary data as a Merkle tree effectively: after a change to the account's data, its contents might shift several bytes left or right, which could produce a completely different Merkle tree and ruin the whole idea.

@bobbinth @igamigo, please share your opinions about which approach you think would be better for us. If something is unclear in the description above, please let me know and I will try to improve it.

@Mirko-von-Leipzig
Contributor

I'm not yet familiar enough with storage and its usage to make an informed comment. But my 2c:

If you store things flat and normalized then finding the "delta" is trivial. It becomes a simple SQL query. We have a relational DB - squishing it into a NoSQL/KV shape will be painful and also slow.

```sql
SELECT * FROM xxx WHERE block_num BETWEEN x AND y -- QED
```

It can also remain competitive in terms of storage size - depending, of course. Storing things flat makes each element cost a constant. Normalizing the various hashes (if larger than an int) reduces this constant.

Lookups for trees are also bad:

  • You look up one node at a time, i.e. it's sequential with no look-ahead possible
  • The lookups are random, i.e. random IO access
  • OS file caching is destroyed as well

This results in terrible performance as soon as there is any disk latency, i.e. non-SSD, or any form of network-attached storage.

TL;DR - my uninformed (in terms of Miden) opinion is to stick to relational layouts unless you can prove otherwise.

@bobbinth
Contributor

> The simplest solution would be to store the full account state on each change. This would let us simply calculate the delta between the "from" and "to" nonces when requested. A drawback of this solution is inefficient disk space usage: we would have to store the full account state even for a nonce-only change.

I would actually say that this may not be that simple - especially if account states are large. So, overall, I think we can discard this approach.

> Another approach would be to store the full initial account state and a delta on each modification. This would significantly reduce the space needed, but we would have to load and apply all deltas in order to get the account state at some point in time (and I think the most frequently used operation will be getting the latest account state). We can bound the number of deltas that need to be applied by storing the full account state every 10-50-100 (TBD) changes and always saving the latest state as a full account for better performance.

For the purposes of this endpoint, I don't think we'll ever need to apply deltas to an account - but we will need to merge deltas together to get the "combined delta" (ability to merge deltas was recently implemented in 0xPolygonMiden/miden-base#797). But overall, I agree that once the number of deltas becomes large, merging them together becomes quite cumbersome. I also don't see an easy way to do "checkpoints" here - but I also haven't thought about it too much.

> In order to reduce space consumption and still calculate account differences efficiently, we can store account states as forests of Merkle trees

I tend to agree with @Mirko-von-Leipzig here that using Merkle trees may be overkill and that we can get the benefits of this approach with simpler data structures.

For example, we could store fungible asset updates in a dedicated table which could look something like this:

| account_id  | nonce | faucet_id  | amount_delta |
|-------------|-------|------------|--------------|
| 0x123456... | 1     | 0x56789... | +10          |
| 0x123456... | 3     | 0x56789... | -4           |
| 0x123456... | 10    | 0x56789... | +6           |

Then, computing the delta for fungible assets between any two nonces becomes a very simple query (see the sketch below). We should be able to do something similar for non-fungible assets and account storage components (though the queries there may be somewhat more complex).
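For example, the combined fungible-asset delta between two nonces could be computed with something like this (a sketch; the table name is hypothetical):

```sql
-- Combined fungible asset delta between nonces ?2 (exclusive) and ?3 (inclusive).
SELECT faucet_id, SUM(amount_delta) AS amount_delta
FROM fungible_asset_deltas
WHERE account_id = ?1
  AND nonce > ?2
  AND nonce <= ?3
GROUP BY faucet_id
HAVING SUM(amount_delta) != 0; -- drop assets whose changes cancel out
```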

Overall, my thinking is that for now, we should go with the second approach (i.e., storing deltas and then merging them on the fly). This is by far the easiest approach in my opinion and it will allow us to provide an endpoint which shouldn't change in the future. Then, we can improve the internal implementation by splitting the deltas into smaller individual components which can be merged at the database level.

@polydez
Contributor Author

polydez commented Jul 25, 2024

@bobbinth

> For the purposes of this endpoint, I don't think we'll ever need to apply deltas to an account

I thought we needed to be able to get the full account state at some point in time. If not, then yes, we can continue to store the account's latest state in a separate table for the purposes of transaction execution and serving GetAccountDetails requests, and additionally save deltas for client synchronization.

@Mirko-von-Leipzig @bobbinth thank you for your responses! So let's implement the second approach (storing account deltas) and see what to improve in the future. For the quick and dirty solution, we'll store deltas serialized in binary form and merge them using the new function. In the future, let's think about putting the values into separate fields and doing the calculations in SQL.

@polydez
Contributor Author

polydez commented Jul 25, 2024

@bobbinth

> We don't change the state sync endpoint at all - or maybe change it very minimally. Some options here include:
> a. For public accounts, in addition to the latest state hash, we also send the latest nonce.

One small question: it's unclear to me why the nonce is needed - isn't just the block number enough? We update account states in the DB atomically on block application, so I think it should be enough to get deltas between from_block_num and block_num. Yes, in some cases there will be more than one update of the same account within a single block, but they will all be merged before returning.

@Mirko-von-Leipzig
Contributor

Related to the above ^^ If we query deltas by block number range instead of by nonce, then we can use bloom filters and store an entire block's delta as a single blob. Throw in some caching and you've got fast lookups and highly compressed storage.
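A sketch of what that might look like (hypothetical schema; the bloom filter would live at the application layer and just lets us skip blocks that definitely didn't touch the account):

```sql
-- Hypothetical layout: one serialized delta blob per (account, block) pair,
-- holding all of the account's changes in that block already merged.
CREATE TABLE block_account_deltas (
    account_id BLOB NOT NULL,
    block_num  INTEGER NOT NULL,
    delta      BLOB NOT NULL,
    PRIMARY KEY (account_id, block_num)
);

SELECT delta
FROM block_account_deltas
WHERE account_id = ?1
  AND block_num BETWEEN ?2 AND ?3
ORDER BY block_num;
```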

@polydez
Contributor Author

polydez commented Jul 25, 2024

@Mirko-von-Leipzig nice idea, thanks! I will think about it.

@bobbinth
Contributor

> One small question: it's unclear to me why the nonce is needed - isn't just the block number enough? We update account states in the DB atomically on block application, so I think it should be enough to get deltas between from_block_num and block_num. Yes, in some cases there will be more than one update of the same account within a single block, but they will all be merged before returning.

I think that using block numbers may actually work better than using nonces, as sometimes the nonce may be updated a few times in a single block, and then figuring out what the intermediate deltas are could be quite complicated. But I'll think through this a bit more.

@polydez polydez moved this from In Progress to In Review in User's testnet Jul 26, 2024
@polydez
Contributor Author

polydez commented Jul 27, 2024

Closed by #418
