-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve circular dependency of Rust implemention with geozero #283
Comments
Thanks @pka for an awesome write-up! I agree that option 1 is by far the simplest and safest from the code quality and maintenance perspective. Fundamentally, we have a bad situation of Geozero relies on 3rd party libraries to handle the low level processing of the geo data. For example MVT relies on protobuf, and GeoJSON on serde. Yet, This essentially is the option 2, and I think it is a better path long term. Yet, to get there, most of the refactoring would need to be done in close sync inside a single repo with a common CI. Thus, I would recommend starting with option 1, and possibly try to move to option 2. P.S note that |
Not sure I want to rule out any option at this point, but gut feeling says I'm not fond of option 1. I think I agree with "In theory, flatgeobuf should offer a low level binary stream encoding/decoding" so that the dependency situation can be similar to other formats for geozero. Instead of option 1 "temp" then option 2 "if possible" due to the need to close proximity and common CI, could that not be a (complex) branch here instead possibly including a fork of geozero for the life time of the branch? As a side note to perhaps explain my motivations for the above, I seem to have a preference toward centralizing flatgeobuf into this monorepo and concerted releases and versioning for the cross language support of flatgeobuf. But I do see this approach has some disadvantages too... |
The circular dependency is a bit annoying. It seems like there is some general support for option 2 as an end state, though how we get there might not be for sure. Moving everything to one repo just to move it back later seems not ideal. And a long running branch has it's own problems. I'm wondering if there's some smaller things we can do to move towards the desired end state that would be incrementally mergable/useful. One small piece might be the currently ubiquitous use of
Would folks be interested in a local flatgeobuf::{Error, Result} type that can be used for the non-geozero code? That's something I think we could merge in the nearer term. |
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
I recently grepped for flatgeobuf in the geozero "core" crate and found so few hits, that I think option 3, removing flatgeobuf from the geozero crate is the way to go. The other crates would still have a dependency on flatgeobuf. For example geozero-cli:
As long we have versionend dependencies, this shouldn't be a problem. After a geozero update we have to stay on the old version as dependency for geozero-cli until we've also updated flatgeobuf. That all, or am I missing something? |
Motivation: Currently we have a circular dependency between the geozero and flatgeobuf crates. It seems like we're on board with breaking that cycle, but aren't quite sure how to do that. Some related discussion in flatgeobuf#283 What changed: 1. Introduce a new flatgeobuf::Error type and use it for code that doesn't depend on Geozero. 2. Relocate code that depends on Geozero into separate `geozero_integration` modules. This code still returns GeozeroError. 3. For tests, which might return either flatgeobuf::Error or geozero::GeozeroError, use Box<dyn std::error::Error>.
I think you're right that currently more of geozero lives in FGB than FGB lives in geozero. So if the metric is "What is the least lines of code to move", removing fgb from geozero would probably win. However, I will lay my cards on the table and say:
That said, it's dumb to manage software projects on only instincts and feelings. 😅 I'm willing to put effort into manifesting my vision here, and think I could have it wrapped up in a week or two, but if someone feels like its a bad idea and wants to do it the other way, I won't stop you. I don't want my notion of perfection to stand in the way of your progress. Maybe a middle ground is to scrutinize PR's like #310 under the lens of "What if Michael disappears and the big switch never happens?" would we still want this? I think that I can build up some mostly uncontroversial changes that will make the ultimate switchover fairly easy without doing a bunch of "damage" to the current codebase. |
The reason why geozero exists is that I wanted to have a generic processing API for my flatgeobuf implementation in Rust. That's also the reason why most initial tests and examples in geozero use flatgeobuf. So trying to remove geozero from flatgeobuf feels (!) strange to me, indeed. I could definitely see better APIs and when a geo-iterator API gets real, I'm all for implementing it also for flatgeobuf. So isolating the geozero usage as you do in #310 is fine for me, but it doesn't solve this issue though. My point is, that it doesn't make sense that geozero depends on flatgeobuf, but it makes sense the other way around. Or to formulate it more objectively: An API should not depend on a certain storage format, but a format implementation can use a generic processing API. |
I wonder if there's a misunderstanding here. I'm not advocating for making geozero require flatgeobuf, which would be strange indeed. I'm advocating for flatgeobuf to be "just another format" that geozero optionally supports. Specifically:
Part of this would require extracting, and potentially duplicating, some things that flatgeobuf relies on that currently live in geozero like Duplication isn't great, but I think of it more like decoupling. For instance, currently, if flatgeobuf wanted to add a new data type, all our existing writers would have to support it, which isn't inherently bad, though a little awkward at times. But the same thing won't happen if any other format (like arrow) wanted to add a new data type. To have flatgeobuf's features be the mediator for how all other formats interact in geozero feels a bit like "the tail wagging the dog". |
We have a long standing maintenance problem because of the circular dependency between Rust Flatgeobuf and geozero. I've discussed that with @nyurik and we see the following solutions:
Pros/Cons:
So before going into more detail, is there an option you would rule out, @bjornharrtell?
The text was updated successfully, but these errors were encountered: