
[ts,rust] bbox query of a very small FGB (<13kb) will result in overfetch, which could cause HTTP 416 #338

Open
michaelkirk opened this issue Dec 21, 2023 · 2 comments


@michaelkirk
Collaborator

When doing a bbox request, the ts and rust clients (and maybe others?) over-fetch on the initial request, on the likely chance that the extra bytes will make subsequent requests unnecessary.

The math: we guess an overly generous header size (currently 2 KB) and then add the first three levels of the index. We have to assume the index's branching factor, since we haven't fetched the header yet. Altogether this comes to about 13 KB.
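The size estimate can be sketched as follows. This assumes a branching factor of 16 (FlatGeobuf's default node size) and 40-byte index entries (four f64 bbox coordinates plus a u64 offset); the function names are illustrative, not the clients' actual API.

```rust
/// Assumed size of one packed R-tree node entry:
/// four f64 bbox coordinates plus a u64 feature offset = 40 bytes.
const NODE_ITEM_SIZE: usize = 4 * 8 + 8;

/// Bytes covered by the top `layers` levels of the index, assuming every
/// node has exactly `branching_factor` children (1 + b + b^2 + ... nodes).
fn prefetch_index_bytes(branching_factor: usize, layers: u32) -> usize {
    (0..layers)
        .map(|i| branching_factor.pow(i) * NODE_ITEM_SIZE)
        .sum()
}

/// Length of the first range request: header guess plus the top index levels.
fn initial_request_len(header_guess: usize, branching_factor: usize, layers: u32) -> usize {
    header_guess + prefetch_index_bytes(branching_factor, layers)
}
```

Three levels give (1 + 16 + 256) × 40 = 10,920 bytes, and adding a 2,048-byte header guess gives 12,968 bytes, i.e. about 13 KB. Under the same assumptions, the 12,944-byte first request in the log further down would correspond to a header guess of 2,024 bytes.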

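If the actual file is smaller than that, most web servers return all the data up to the end of the file, so there's no problem. (This matches RFC 9110, which says a range whose last position is at or beyond the end of the representation is interpreted as running to the end of the data.)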

Other web servers, however, such as https://static-web-server.net, return an HTTP 416 (Range Not Satisfiable) error, which breaks the client.

Proposed solution

At least for the initial header fetch, where we don't yet know the actual file size, we should fall back gracefully to a more conservative fetch upon receiving a 416. For very small files it would seem to make sense to request the entire file at once, but that's probably a bad idea if it overly complicates all the various request code just to optimize the edge case of tiny files.
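A minimal sketch of such a fallback, assuming a hypothetical fetch callback that takes an offset and a length and reports a 416 as `FetchError::RangeNotSatisfiable`. On a 416 it halves the request (down to a floor) instead of failing. Halving is just one possible "more conservative" strategy, not what the ts or rust clients currently do.

```rust
/// Errors from a hypothetical range-fetching client.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum FetchError {
    /// The server answered HTTP 416 Range Not Satisfiable.
    RangeNotSatisfiable,
    /// Any other failure (network, 5xx, ...), passed through unchanged.
    Other(String),
}

/// Fetch the first `len` bytes of a file. If the server rejects the range
/// with a 416, retry with half the length, stopping once the next attempt
/// would drop below `min_len`. Any other result is returned as-is.
fn fetch_prefix_with_fallback<F>(
    mut fetch: F,
    len: usize,
    min_len: usize,
) -> Result<Vec<u8>, FetchError>
where
    F: FnMut(usize, usize) -> Result<Vec<u8>, FetchError>,
{
    let mut want = len;
    loop {
        match fetch(0, want) {
            // Too big for this server: shrink the request and try again.
            Err(FetchError::RangeNotSatisfiable) if want / 2 >= min_len => want /= 2,
            // Success, a non-416 error, or no room left to shrink.
            other => return other,
        }
    }
}
```

A floor such as 12 bytes (the 8 magic bytes plus the 4-byte header length that begin a FlatGeobuf file) keeps the retries bounded. As an alternative to guessing, RFC 9110 says a server SHOULD include `Content-Range: bytes */<length>` in a 416 response to a byte-range request, so a client could read the file size from there when the server provides it.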

@kylebarron
Contributor

kylebarron commented Feb 26, 2024

FWIW, this doesn't happen only on the initial request. When testing geoarrow/geoarrow-rs#494 against a local file using the object-store LocalFileSystem, it also overfetches. For example, I see:

[src/io/flatgeobuf/reader/object_store_reader.rs:18:9] range = "bytes=0-12943"
[src/io/flatgeobuf/reader/object_store_reader.rs:31:9] start_range = 0
[src/io/flatgeobuf/reader/object_store_reader.rs:32:9] end_range = 12944
[src/io/flatgeobuf/reader/object_store_reader.rs:18:9] range = "bytes=12944-1061519"
[src/io/flatgeobuf/reader/object_store_reader.rs:31:9] start_range = 12944
[src/io/flatgeobuf/reader/object_store_reader.rs:32:9] end_range = 1061520
thread 'io::flatgeobuf::reader::r#async::test::test_countries' panicked at src/io/flatgeobuf/reader/object_store_reader.rs:38:14:
called `Result::unwrap()` on an `Err` value: Generic { store: "LocalFileSystem", source: OutOfRange { path: "/Users/kyle/github/geoarrow/geoarrow-rs/fixtures/flatgeobuf/countries.fgb", expected: 1048576, actual: 192736 } }

This file (https://github.com/geoarrow/geoarrow-rs/blob/6edb9f53e84d1784bd0dd49c68dd340f2f4b1434/fixtures/flatgeobuf/countries.fgb) is only about 200 KB, but the request is for about 1 MB. The second request spans 1061520 - 12944 = 1048576 bytes, exactly 2^20 (1 MiB), so I presume the caching layer of http_range_client itself overfetches. (In which case, maybe my version of the error is actually a bug in http_range_client?)

For my purposes, a workaround is to clamp the upper bound of the range to the size of the file, found via an initial HEAD request.
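The clamping workaround might look like this, assuming the file size is already known (e.g. from the `Content-Length` of a HEAD request). `clamped_range_header` is a hypothetical helper, not part of http_range_client or object_store. HTTP byte ranges are inclusive, so a half-open `[start, end)` becomes `bytes=start-(end-1)`.

```rust
/// Build an HTTP `Range` header value for the half-open byte range
/// `[start, end)`, clamped so it never extends past `file_size`.
/// Returns `None` if nothing is left to fetch (`start >= file_size`).
fn clamped_range_header(start: u64, end: u64, file_size: u64) -> Option<String> {
    // Clamp the exclusive upper bound to the end of the file.
    let end = end.min(file_size);
    if start >= end {
        return None;
    }
    // The last-byte position in a Range header is inclusive.
    Some(format!("bytes={}-{}", start, end - 1))
}
```

Applied to the second request in the log above, clamping `[12944, 1061520)` to the 192,736-byte file yields `bytes=12944-192735`.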

@bjornharrtell
Member

I've also suspected the fix isn't sufficient, for the TS implementation too, but unfortunately I haven't had time to put together a reproduction.
