Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

chinedufn · 2021-04-12T13:53:13Z

Right now if a struct has a small number of fields we default to using an enum to encode the delta.

Something like:

// We don't use generics in the generated code, just for illustration.
enum Delta3<A, B, C> {
    NoChange,
    Change_0(A),
    Change_1(B),
    Change_2(C),
    Change_0_1(A, B),
    Change_1_2(B, C),
    Change_0_2(A, C),
    Change_0_1_2(A, B, C),

The idea behind this was to be able to only transmit exactly what has changed at the cost of a single byte of overhead (the enum variant).

The downside of the enum approach is that it can't support more than a few fields. If you had 10 fields you would more than 2^10 (1024) enum variants to encode every combination of fields changing.

There is also some code complexity in needing to generate diffing and patching functions that have match blocks that match on every possible variant.

We can accomplish the 1 byte overhead without the downsides of the enum approach.

Something like:

struct MyStructDelta {
    changed_fields: u8, // Encode changed fields into the bits
    field_a: Option<DeltaTypeForFieldA>,
    field_b: Option<DeltaTypeForFieldB>,
}

We keep a single u8 within which we can use the 8 bits to encode whether or not up to 8 fields in the struct have changed.

If a struct has more than 8 fields we can use a u16 to encode changes.

Technically the enum approach would only need 1 byte overhead even if you have more than 8 fields, but I suspect that compile times would begin to suffer. This needs some investigation so that we can gauge the impact on compile times. We can leave the enum approach around as a non default approach and document our findings on the compile time impact.

From there we would generate a custom Serialize and Deserialize implementation for MyStructDelta. If a field is Option::None, we skip it when serializing the struct to a map.

When deserializing, we look at the changed_fields bits to know which fields to deserialize the map to.

Implementation

We would add a new FieldBatchingStrategy (probably needs to be renamed to something that better communicates that all it is is the way that we want to generate the delta. Perhaps DeltaFormat) variant, perhaps Self::BitflaggedStruct. We'd then add new tests and code to dipa-derive and dipa-derive-test in order to support this new way of generating deltas.

Then we'd update the documentation to explain the format and why it is the default.

The text was updated successfully, but these errors were encountered:

mainrs · 2021-04-12T14:20:11Z

Maybe a downside worth mentioning is that the u* implementation would be limited to 64/128 fields for a given struct. Could be something being mentioned in the docs as a traid-off between compilation time and number of supported fields maybe 🤔

_{I've been looking for a crate like this for some time, even started to work on one myself. I'll just use yours :)}

chinedufn · 2021-04-12T14:31:05Z

That's a good point. There are already some compile time errors when you have too many fields for the enum based deltas such as:

dipa/crates/dipa-derive-test/src/all_tests/ui/max_delta_batch_too_large.stderr

Lines 1 to 7 in adf0586

    
           error: The max_fields_per_batch attribute must be less than or equal to 7. 
        
           This limit is meant to prevent the use of large values, since as max_fields_per_batch grows 
        
           compile times grow exponentially. 
        
            --> $DIR/max_delta_batch_too_large.rs:4:31 
        
             | 
        
           4 | #[dipa(max_fields_per_batch = 8)] 
        
             |                               ^

So yeah we'd have documentation and also a compile time error that tells you to use #[dipa(delta_format = "bitflagged")] or whatever we end up calling it.

chinedufn · 2021-04-13T15:20:33Z

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

toothbrush7777777 · 2021-04-14T21:32:47Z

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

Maybe I'm misunderstanding something here but wouldn't you need only 3 u8s?

chinedufn · 2021-04-14T22:04:43Z

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

toothbrush7777777 · 2021-04-14T22:12:34Z

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

Yes, of course.
That was a mistake; I meant 3 u8s.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

chinedufn commented Apr 12, 2021 •

edited

mainrs commented Apr 12, 2021 •

edited

chinedufn commented Apr 12, 2021

chinedufn commented Apr 13, 2021

toothbrush7777777 commented Apr 14, 2021 •

edited

chinedufn commented Apr 14, 2021

toothbrush7777777 commented Apr 14, 2021

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

Comments

chinedufn commented Apr 12, 2021 • edited

Implementation

mainrs commented Apr 12, 2021 • edited

chinedufn commented Apr 12, 2021

chinedufn commented Apr 13, 2021

toothbrush7777777 commented Apr 14, 2021 • edited

chinedufn commented Apr 14, 2021

toothbrush7777777 commented Apr 14, 2021

chinedufn commented Apr 12, 2021 •

edited

mainrs commented Apr 12, 2021 •

edited

toothbrush7777777 commented Apr 14, 2021 •

edited