Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop defaulting to using enums as deltas, use bitflagged structs with custom Serialize and Deserialize instead #2

Open
chinedufn opened this issue Apr 12, 2021 · 6 comments

Comments

@chinedufn
Copy link
Owner

chinedufn commented Apr 12, 2021

Right now if a struct has a small number of fields we default to using an enum to encode the delta.

Something like:

// We don't use generics in the generated code, just for illustration.
enum Delta3<A, B, C> {
    NoChange,
    Change_0(A),
    Change_1(B),
    Change_2(C),
    Change_0_1(A, B),
    Change_1_2(B, C),
    Change_0_2(A, C),
    Change_0_1_2(A, B, C),

The idea behind this was to be able to only transmit exactly what has changed at the cost of a single byte of overhead (the enum variant).

The downside of the enum approach is that it can't support more than a few fields. If you had 10 fields you would more than 2^10 (1024) enum variants to encode every combination of fields changing.

There is also some code complexity in needing to generate diffing and patching functions that have match blocks that match on every possible variant.

We can accomplish the 1 byte overhead without the downsides of the enum approach.

Something like:

struct MyStructDelta {
    changed_fields: u8, // Encode changed fields into the bits
    field_a: Option<DeltaTypeForFieldA>,
    field_b: Option<DeltaTypeForFieldB>,
}

We keep a single u8 within which we can use the 8 bits to encode whether or not up to 8 fields in the struct have changed.

If a struct has more than 8 fields we can use a u16 to encode changes.

Technically the enum approach would only need 1 byte overhead even if you have more than 8 fields, but I suspect that compile times would begin to suffer. This needs some investigation so that we can gauge the impact on compile times. We can leave the enum approach around as a non default approach and document our findings on the compile time impact.


From there we would generate a custom Serialize and Deserialize implementation for MyStructDelta. If a field is Option::None, we skip it when serializing the struct to a map.

When deserializing, we look at the changed_fields bits to know which fields to deserialize the map to.


Implementation

We would add a new FieldBatchingStrategy (probably needs to be renamed to something that better communicates that all it is is the way that we want to generate the delta. Perhaps DeltaFormat) variant, perhaps Self::BitflaggedStruct. We'd then add new tests and code to dipa-derive and dipa-derive-test in order to support this new way of generating deltas.

Then we'd update the documentation to explain the format and why it is the default.

@mainrs
Copy link

mainrs commented Apr 12, 2021

Maybe a downside worth mentioning is that the u* implementation would be limited to 64/128 fields for a given struct. Could be something being mentioned in the docs as a traid-off between compilation time and number of supported fields maybe 🤔

I've been looking for a crate like this for some time, even started to work on one myself. I'll just use yours :)

@chinedufn
Copy link
Owner Author

That's a good point. There are already some compile time errors when you have too many fields for the enum based deltas such as:

error: The max_fields_per_batch attribute must be less than or equal to 7.
This limit is meant to prevent the use of large values, since as max_fields_per_batch grows
compile times grow exponentially.
--> $DIR/max_delta_batch_too_large.rs:4:31
|
4 | #[dipa(max_fields_per_batch = 8)]
| ^

So yeah we'd have documentation and also a compile time error that tells you to use #[dipa(delta_format = "bitflagged")] or whatever we end up calling it.

@chinedufn
Copy link
Owner Author

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

@toothbrush7777777
Copy link

toothbrush7777777 commented Apr 14, 2021

Actually sorry - there wouldn't be a field limit.

We are totally free to use multiple u*

For example, if there are 22 fields we would use a u16 and a u8

Or, if it made the logic simpler, we may just use only u8s. So 127 fields would just be 16 u8s, for example.

Maybe I'm misunderstanding something here but wouldn't you need only 3 u8s?

@chinedufn
Copy link
Owner Author

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

@toothbrush7777777
Copy link

For the 22 field example?

You need to be able to say which of the 22 fields are included in the serialized format. This would require 22 bits of information. A single u8 would only give us 8 bits of information to work with.

But 1 u8 would be totally fine for structs with 8 or fewer fields. Hopefully that makes sense.

Yes, of course.
That was a mistake; I meant 3 u8s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants