
Questions on instruction gas cost #154

Open
lxfind opened this issue Mar 27, 2022 · 8 comments

@lxfind
Contributor

lxfind commented Mar 27, 2022

A few questions about how the gas costs for some of the instructions were chosen.
In this table:

```rust
pub fn bytecode_instruction_costs() -> Vec<(Bytecode, GasCost)> {
```

  1. MoveTo writes new data into storage, while MoveFrom removes data from storage. Yet MoveFrom (459) is significantly more expensive than MoveTo (13). Why is MoveTo so cheap? What's the rationale behind this choice? Isn't MoveFrom effectively data deletion, and shouldn't it be encouraged rather than penalized?
  2. Why is MoveFromGeneric (13) significantly cheaper than MoveFrom (459)?
  3. Why is ImmBorrowGlobal (23) more expensive than MutBorrowGlobal (21)? Should mutable borrows be more expensive since we expect mutations?
  4. We don't seem to distinguish the cost between global storage accesses and stack memory accesses. For instance, when we compute the cost of WriteRef, we don't really look at whether the reference points to the stack or to global storage. This makes global storage accesses (both reads and writes) far cheaper than they should be. Is this intentional?
  5. When we compute the gas cost for a native function call:

     ```rust
     let gas_amt = table.native_cost(native_table_idx.into());
     let memory_size = AbstractMemorySize::new(std::cmp::max(1, size) as GasCarrier);
     debug_assert!(memory_size.get() > 0);
     gas_amt.total().mul(memory_size)
     ```

     we first add the computation cost and memory cost, and then multiply the sum by the memory size. Shouldn't we only multiply the memory cost by the memory size? Is this a bug?
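To make the question in (5) concrete, here is a minimal sketch (plain integers and invented per-unit costs, not the VM's actual types) contrasting the charge the snippet computes with the charge the question expects:

```rust
fn main() {
    // Hypothetical per-unit costs for some native function.
    let comp_cost: u64 = 10; // flat computation cost
    let mem_cost: u64 = 2;   // per-unit memory cost
    let memory_size: u64 = 100;

    // What the snippet computes: total() sums both costs,
    // then the sum is scaled by the memory size.
    let charged = (comp_cost + mem_cost) * memory_size;

    // What the question expects: only the memory term scales with size,
    // matching how non-native instructions are charged.
    let expected = comp_cost + mem_cost * memory_size;

    assert_eq!(charged, 1200);
    assert_eq!(expected, 210);
    println!("charged = {}, expected = {}", charged, expected);
}
```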

cc @tzakian

@sblackshear
Contributor

I can comment on (4); the others do indeed seem suspicious to me.

Move does charge more for global storage accesses than for local ones, but ReadRef and WriteRef aren't the mechanism for this:

  • To do a ReadRef on a global value, the global must previously have been loaded into the VM's cache (e.g., via a call to borrow_global or move_from that missed the cache). I believe the charge for global reads happens at the point of the cache miss.
  • To do a WriteRef on a global value, the value will already be sitting in the local cache. The VM actually doesn't have a (convenient) way to determine whether it is writing a local or global value, and computationally the write costs the same. The VM does charge for global writes at the end of the transaction when it derives a WriteSet and then charges for each byte written to global storage.
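The end-of-transaction charging described in the second bullet can be sketched roughly as follows (hypothetical types and a made-up per-byte rate, not the actual Diem VM code): walk the derived WriteSet and charge per byte written to global storage.

```rust
fn main() {
    // Each entry is (access path, serialized value bytes) — a stand-in
    // for the WriteSet the VM derives at the end of the transaction.
    let write_set: Vec<(&str, Vec<u8>)> = vec![
        ("0x1/Account", vec![0u8; 128]),
        ("0x1/Balance", vec![0u8; 16]),
    ];

    let global_write_cost_per_byte: u64 = 4; // hypothetical rate

    // Charge for each byte written to global storage.
    let charge: u64 = write_set
        .iter()
        .map(|(_, bytes)| bytes.len() as u64 * global_write_cost_per_byte)
        .sum();

    assert_eq!(charge, (128 + 16) * 4);
    println!("global write charge: {}", charge);
}
```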

@lxfind
Contributor Author

lxfind commented Mar 27, 2022

The VM does charge for global writes at the end of the transaction when it derives a WriteSet and then charges for each byte written to global storage.

I couldn't find the code for this. Do you happen to know where in code we charge for this?

@tnowacki
Contributor

(3) The gas costs are not prescriptive in this way. They (at least originally) were supposed to be in line with the actual computational load.
(5) Might be a bug.

@tnowacki
Contributor

The VM does charge for global writes at the end of the transaction when it derives a WriteSet and then charges for each byte written to global storage.

I couldn't find the code for this. Do you happen to know where in code we charge for this?

I don't know offhand, but @vgao1996 might know. If not, I can dig. But there is a section of the VM that computes the write set, and my guess is the charging happens there.

@tzakian
Contributor

tzakian commented Mar 28, 2022

As regards the global write charges, this logic can be found here: https://github.com/diem/diem/blob/main/diem-move/diem-vm/src/diem_vm_impl.rs#L609 and here: https://github.com/diem/diem/blob/main/diem-move/diem-vm/src/diem_vm.rs#L267. Note that this logic is largely based on the way Diem writes back to storage and the average size of a Diem account blob at that time.

Regarding the other questions:

  1. No incentives were factored into these gas prices. These gas prices were the result of instrumenting the VM back in 2019/2020 to record the time taken to execute each instruction (w.r.t. data size if applicable) across the Move test suite. The cost for an instruction was then the average time taken to execute that instruction and scaled so that basic operations (e.g., Add) had a cost of 1. My guess here for why MoveFrom is more expensive is that it may incur an overhead to pull the data into the cache, whereas MoveTo will write to the cache.
  2. Not sure about that. As mentioned above, these measurements were taken some time ago; my guess is that there was a change in how the two were implemented after the measurements were taken (e.g., MoveFromGeneric could have resolved to a MoveFrom, so this is only showing the cost of that resolution).
  3. Ditto what Todd said
  4. --
  5. Maybe I'm reading this wrong, but all natives (IIRC) are defined as a cost w.r.t. the size of the data they operate over. So the cost for every native function would be of the form comp_cost * memory_size + mem_cost * memory_size = (comp_cost + mem_cost) * memory_size, so this seems correct to me?
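The calibration described in (1) can be sketched in a few lines (all timings invented for illustration; the real instrumentation is not shown in this thread): average the measured time per instruction across the test suite, then scale so a basic operation like Add costs 1.

```rust
fn main() {
    // (instruction, average measured nanoseconds across the test suite).
    // Numbers are made up, chosen so the result matches the costs
    // quoted in this issue (Add = 1, MoveFrom = 459, MoveTo = 13).
    let measured = [("Add", 5.0_f64), ("MoveFrom", 2295.0), ("MoveTo", 65.0)];

    // Scale everything so that Add is the unit of cost.
    let base = measured[0].1;
    let costs: Vec<(&str, u64)> = measured
        .iter()
        .map(|(name, ns)| (*name, (ns / base).round() as u64))
        .collect();

    assert_eq!(costs, vec![("Add", 1), ("MoveFrom", 459), ("MoveTo", 13)]);
}
```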

@lxfind
Contributor Author

lxfind commented Mar 29, 2022

Thanks @tnowacki and @tzakian !

Regarding 5, why do we multiply comp_cost by memory_size? I would expect memory_size to be only a multiplier of memory_cost (which would be consistent with how we calculate gas cost for non-native instructions).
For instance, when we push a new value onto a vector, we charge using a memory_size equal to the size of the element, which should multiply memory_cost, but the computation is not linear in the size of the new element.

@tnowacki
Contributor

Maybe I'm reading this wrong, but all natives (IIRC) are defined as a cost w.r.t. the size of the data they operate over. So the cost for every native function would be of the form comp_cost * memory_size + mem_cost * memory_size = (comp_cost + mem_cost) * memory_size, so this seems correct to me?

If I understand this correctly, it assumes O(n) for time and memory. Maybe we need a time/mem scaling factor in the table for native costs?
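One way to read the scaling-factor suggestion (names and numbers are hypothetical, not an actual proposal from the table): give each native's time and memory terms independent scaling behavior, so the cost need not be linear in data size for both.

```rust
fn main() {
    // Hypothetical per-native cost entry with a flag saying whether the
    // computation term scales with data size.
    struct NativeCost {
        comp_cost: u64,              // base computation cost
        mem_cost: u64,               // per-unit memory cost
        comp_scales_with_size: bool, // time-term scaling factor
    }

    let cost = |c: &NativeCost, size: u64| -> u64 {
        let comp = if c.comp_scales_with_size {
            c.comp_cost * size // O(n) time, e.g. hashing
        } else {
            c.comp_cost // O(1) time, e.g. a vector push
        };
        comp + c.mem_cost * size
    };

    // A hash-like native: time and memory both linear in input size.
    let hash = NativeCost { comp_cost: 3, mem_cost: 1, comp_scales_with_size: true };
    // A push-like native: constant time, memory linear in element size.
    let push = NativeCost { comp_cost: 5, mem_cost: 2, comp_scales_with_size: false };

    assert_eq!(cost(&hash, 10), 40); // 3*10 + 1*10
    assert_eq!(cost(&push, 10), 25); // 5 + 2*10
}
```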

@lxfind
Contributor Author

lxfind commented Mar 30, 2022

Another question: assuming the VM storage cache is shared across execution sessions (is it?), and we charge at the point of a cache miss when reading a global resource, does that make the gas cost of a transaction non-deterministic and dependent on the current cache state?
