Currently column indexes are u16 and field lengths (for Strings, Binary, etc.) are u32. I expect that in practice column indexes would almost always fit in a 1-byte varint, and field lengths typically in 3 bytes (if not 2).
The properties data is already not random access; it must be processed serially. So there's no loss of functionality there.
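To make the idea concrete, here is a minimal sketch of what a varint-prefixed string property could look like. It assumes unsigned LEB128 varints (7 payload bits per byte, high bit set when more bytes follow); the function names and the exact layout are hypothetical illustrations, not the format's actual API or the prototype's code.

```python
from typing import List, Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("only unsigned values are supported in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_property(column_index: int, text: str) -> bytes:
    """Hypothetical layout: varint column index, varint byte length, UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint(column_index) + encode_varint(len(data)) + data


def decode_string_properties(buf: bytes) -> List[Tuple[int, str]]:
    """Walk the buffer front to back, the same serial access pattern as today."""
    props = []
    pos = 0
    while pos < len(buf):
        column_index, pos = decode_varint(buf, pos)
        length, pos = decode_varint(buf, pos)
        props.append((column_index, buf[pos:pos + length].decode("utf-8")))
        pos += length
    return props
```

The decoder only ever moves forward through the buffer, so variable-width prefixes cost nothing in access pattern compared with fixed-width ones.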
This would be a major breaking change, so I don't expect it to be adopted anytime soon, but if you end up making a breaking format release in #81, you should consider piling this on.
I was working with OpenAddresses data, which is mostly point geometries with short string columns. Using varints for column indexes and field lengths outputs a file 85% the size of the original.
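For a rough feel of where the savings come from, this sketch compares the per-property byte cost of the current fixed-width prefix (u16 index + u32 length) against a varint prefix. It assumes unsigned LEB128 varints; `varint_len` and `property_sizes` are hypothetical helpers, and it counts property bytes only, not geometry, index, or header bytes, so it does not by itself predict a whole-file ratio.

```python
def varint_len(value: int) -> int:
    """Number of bytes an unsigned LEB128 varint needs for value."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


# Current layout per the description above: u16 column index + u32 length.
FIXED_PREFIX_BYTES = 2 + 4


def property_sizes(column_index: int, text: str):
    """Return (fixed-width size, varint size) in bytes for one string property."""
    payload = len(text.encode("utf-8"))
    fixed = FIXED_PREFIX_BYTES + payload
    varint = varint_len(column_index) + varint_len(payload) + payload
    return fixed, varint


# Made-up sample row for illustration only (not measured data).
row = ["123", "Main St", "Springfield", "12345"]
fixed_total = sum(property_sizes(i, v)[0] for i, v in enumerate(row))
varint_total = sum(property_sizes(i, v)[1] for i, v in enumerate(row))
print(fixed_total, varint_total)
```

A 2-byte varint covers lengths up to 16,383 and a 3-byte varint up to 2,097,151, which is why most field lengths should fit in 3 bytes or fewer.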
85%! Ouch... :S I can definitely admit that the properties encoding deserved a bit more thought. I made it quickly after discovering that trying to encode it into a generic FlatBuffers schema was very space-wasteful.
But yeah, a breaking change isn't likely to happen anytime soon, if ever. Might as well make a new format entirely, perhaps with a custom binary encoding. I've been thinking lately, partly because of the discussion at #291, that the primary function of FlatBuffers (and protobuf) is to allow for evolving schemas, but as I see it now that's not an important feature: once a format becomes stable and more or less widespread, there is no room for evolution, even backwards-compatible evolution.
I made a prototype here: https://github.com/michaelkirk/flatgeobuf/tree/mkirk/varint