RFC - Spatial Queries & Virtualisation in XYFlow #4239

Open
ncthbrt opened this issue May 2, 2024 · 17 comments
Labels
feature request New feature or request

Comments

@ncthbrt

ncthbrt commented May 2, 2024

Summary

This RFC explores the potential of adding additional functionality to XYFlow to allow for more complex graphs while still remaining performant for smaller graphs and the general case.

Motivation

For larger, more complex graphs, it is possible to achieve improved performance when zoomed in by only rendering the nodes that are visible within the viewport. Currently this is achieved by performing a naïve axis-aligned bounding box check on each node and edge. This still requires that all node and edge data be available to the renderer, and it comes with a performance overhead, especially for smaller graphs and when zoomed out.
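To make the naïve check concrete, here is a minimal sketch of a linear axis-aligned bounding box test against the viewport. The `Rect` and `BoxedNode` shapes and the `visibleNodes` helper are assumptions made for illustration; they are not XYFlow's actual types or internals.

```typescript
// Hypothetical minimal shapes; real XYFlow nodes carry many more fields.
interface Rect { x: number; y: number; width: number; height: number }
interface BoxedNode { id: string; rect: Rect }

// Two axis-aligned boxes overlap only if they overlap on both axes.
function rectsIntersect(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    a.x + a.width > b.x &&
    a.y < b.y + b.height &&
    a.y + a.height > b.y
  );
}

// Linear scan: every node is tested on every viewport change, so the
// cost grows with the total node count, not with the visible count.
function visibleNodes(nodes: BoxedNode[], viewport: Rect): BoxedNode[] {
  return nodes.filter((node) => rectsIntersect(node.rect, viewport));
}
```

The scan is cheap per node, but it runs over the whole graph each time, which is the overhead a spatial index is meant to avoid.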

Similarly, the multi-selection rect uses similar logic to find nodes that intersect with the selection rect. The multi-selection rect also does not support direct selection of edges.

For advanced cases with a client/server architecture, it would be desirable to be able to lazily fetch data for a given region of the viewport.

For client architectures, it would be desirable to be able to speed up viewport intersection and selection rect performance, and to potentially offer the ability for direct box selection of edges.

Implementation

There is a class of data structures known as BVHs (Bounding Volume Hierarchies) that allow for fast, coarse-grained intersection testing. BVHs are widely used in systems such as mapping software and in game development for rendering (including ray-tracing) and physics. These work by placing an axis-aligned bounding box around each entity within the system and constructing a tree that contains these boxes in a manner that allows for efficient lookup. Insertion and removal have a time complexity of approximately $O(\log_k n)$, depending upon the data structure used. For larger graphs, this may represent a significant improvement over a naïve approach.
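As a rough worked example of that bound, assume a tree with branching factor $k = 16$ holding $n = 10{,}000$ node boxes (both figures are illustrative, not measured):

$$\log_{16}(10{,}000) = \frac{\ln 10{,}000}{\ln 16} \approx \frac{9.21}{2.77} \approx 3.3$$

Under these assumptions an insertion or removal touches on the order of four tree levels, whereas a linear scan tests all 10,000 boxes.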

However, this is not a given and would require profiling for specific use cases. Additionally, baking in a particular structure and its inherent assumptions would negatively impact bundle size. Therefore the API offers a means for users to provide two functions, potentially asynchronous ones, that fetch the nodes and edges respectively within a particular bounding box. These are an overload of the already existing nodes and edges props. When the viewport is moved, or the selection rect is used, these functions are called. Additionally, the API provides a means of invalidating the cached value of a particular node or group of nodes, or even a region within the viewport. This has precedent in that there is already a useUpdateNodeInternals() hook.
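One possible shape for such an API is sketched below. Every name here (`SpatialSource`, `SpatialCacheControl`, `arraySource`) is hypothetical and does not exist in XYFlow; the sketch only illustrates the idea of region-based fetch functions that may be synchronous or asynchronous, plus an invalidation handle.

```typescript
// All names below are hypothetical; none of them exist in XYFlow today.
interface Rect { x: number; y: number; width: number; height: number }
interface BoxedNode { id: string; rect: Rect }
interface BoxedEdge { id: string; source: string; target: string; rect: Rect }

// A query may answer synchronously (in-memory index) or asynchronously (server).
type MaybePromise<T> = T | Promise<T>;

interface SpatialSource<N, E> {
  nodesIn(bounds: Rect): MaybePromise<N[]>;
  edgesIn(bounds: Rect): MaybePromise<E[]>;
}

// Handle the library could expose so callers can drop stale cached results.
interface SpatialCacheControl {
  invalidateNodes(ids: string[]): void;
  invalidateRegion(bounds: Rect): void;
}

const overlaps = (a: Rect, b: Rect): boolean =>
  a.x < b.x + b.width && a.x + a.width > b.x &&
  a.y < b.y + b.height && a.y + a.height > b.y;

// Simplest possible synchronous source: a linear scan over in-memory arrays.
// A BVH-backed or server-backed source would implement the same interface.
function arraySource(
  nodes: BoxedNode[],
  edges: BoxedEdge[],
): SpatialSource<BoxedNode, BoxedEdge> {
  return {
    nodesIn: (bounds) => nodes.filter((n) => overlaps(n.rect, bounds)),
    edgesIn: (bounds) => edges.filter((e) => overlaps(e.rect, bounds)),
  };
}
```

Keeping the return type as a value-or-promise union would let in-memory sources skip the scheduling cost of a promise while still allowing a server-backed source.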

This offers a third way of interacting with the XYFlow libs, which represents a middle-ground between controlled and uncontrolled flows. In this third way, the data store remains the authoritative source of graph information, while removing the expensive data integration step and allowing users to use intersection tests of their choice.

There is an additional option to optimistically update nodes when moved or connected which helps maintain interactivity in the face of slow-downs.

XYFlow provides adapters for particular BVH implementations and has created a pro example of a client/server architecture that uses these new APIs to lazily load data as needed. An additional pro example shows how to implement collision resolution using these APIs that reuses the BVH provided by the BVH package.

We may also want to add an option to configure padding of the viewport, so that panning and zooming don't always trigger a refetch.
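A minimal sketch of how such padding could suppress refetches, assuming the viewport is expressed as a rect in flow coordinates. The `PaddedRegion` class and its `update` method are hypothetical names, not part of any XYFlow package.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

// Grow a rect by `padding` on every side.
function pad(r: Rect, padding: number): Rect {
  return {
    x: r.x - padding,
    y: r.y - padding,
    width: r.width + 2 * padding,
    height: r.height + 2 * padding,
  };
}

// True when `inner` lies entirely within `outer`.
function contains(outer: Rect, inner: Rect): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// Hypothetical helper: remembers the last fetched (padded) region.
class PaddedRegion {
  private loaded: Rect | null = null;
  private padding: number;

  constructor(padding: number) {
    this.padding = padding;
  }

  // Returns the region to fetch, or null while the viewport is still
  // inside the region that was fetched last time.
  update(viewport: Rect): Rect | null {
    if (this.loaded !== null && contains(this.loaded, viewport)) return null;
    this.loaded = pad(viewport, this.padding);
    return this.loaded;
  }
}
```

Because zooming out enlarges the viewport rect in flow coordinates, the same containment test covers both panning and zooming.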

Drawbacks

  • Adding a third way to interact with XYFlow libs may introduce additional complexity for users.
  • Adding a third way to interact with XYFlow libs may introduce an additional maintenance burden.
  • Maintaining internal state becomes more complex as cache invalidation needs to be handled.
  • Optimistic updates may introduce additional complexity in internals.

Alternatives

Intersection Function

This alternative narrows the scope of the feature to simply providing a function that returns the set of node/edge ids that intersect with a given bounding rect. This would be a lot simpler to implement; however, it has the downside that it may not provide enough of a performance boost for large graphs to make the implementation lift worth it.

Allow nodes and edges to come and go

Implementors could provide a transient set of nodes and edges based on viewport position. To implement this, it would need to be validated that nodes and edges can arbitrarily be removed or added to the set of nodes and edges without catastrophic performance hitches. Additionally this would not solve the problem of maintaining user interactivity in cases where updates to the store are slow or asynchronous. To solve that in user land, it would require a similar effort to implementing it once in the library.

Bake in BVH support into the library

Has similar issues to the intersection function alternative.

Unresolved Questions

Open Question: Ordering may be tricky with optimistic updates and cache handling and invalidation

Open Question: How to document and communicate these changes effectively to users

Open Question: How to effectively support this across all packages

Open Question: MiniMap support?

Open Question: Selection-rect support for edges. Implement parser for edge renderer?

@ncthbrt ncthbrt added the feature request New feature or request label May 2, 2024
@peterkogo
Member

Thanks so much for the RFC! I am just going to dump a couple of thoughts here.

  1. Performance metrics to check out for Spatial Querying implementation: initial build time, single update, bulk update, point query, rect query, memory

  2. I know BVH trees from 3D game engines, where you have a lot of overlapping bounding boxes and the goal is mostly ray intersections. It might be interesting to see how different spatial data structures behave in this regard as flow graphs are usually a little further spaced out. But maybe this becomes irrelevant when edges are also taken into consideration... Do you have a good resource/information/experience on how in our case BVH might be the way to go?

  3. In my head it would work like this:

  • maintain an AABB in internal node

  • build up a BVH tree in adoptUserNodes or updateNodeInternals

  • query the BVH when viewport moves to render visible nodes (make mounting/unmounting of nodes faster to prevent lag)

  • (maybe) only update the tree after node dragging has finished

  • expose interface for querying the BVH tree

    If I look at the performance of JavaScript BVH implementations I don't think an async function is really needed and it might degrade performance as async does not come for free and complicates computation ordering. Plus some of that work has to be done anyway when updating node dimensions/positions.

    Is there really a use case for exposing anything more, or even for having a cache?

  4. I think if you'd want to lazy load nodes you'd have to load all nodes and edges (maybe not all edges) in advance with no information attached to them. And then implement a placeholder node that fetches the information when it gets into view/big enough. You could also create a way to batch the calls from these nodes. But I feel like this can be implemented implicitly and does not require more BVH API surface area.

  5. This would also pave the way for canvas based edges

@moklick
Member

moklick commented May 10, 2024

Hey @ncthbrt

thanks for this detailed RFC!

Even if React Flow wasn't built for huge graphs, I really like the idea of having better support and better performance in general. In v12 we already improved the performance for dragging nodes in bigger graphs, but the onlyRenderVisibleElements implementation wasn't touched.

It was always important to us that React Flow is flexible and adjustable. Maybe it's an option here to expose some functionality, so that users can implement their own intersection algorithms (just yesterday someone created an issue #4272 that goes in that direction). For now we are using getNodesInside for the selection rectangle and onlyRenderVisibleElements. Would it be possible for you to use a BVH if you could overwrite that function somehow? It would be nice if we could find a better implementation than the current naive approach and give users the option to overwrite that functionality. What do you think about that? I would like to start with a small change. In my view it would make sense to concentrate on the virtualization topic first.

Thanks @peterkogo, what do you think about this? Would it be possible to expose some functions to be able to implement a more performant virtualization strategy?

@ncthbrt
Author

ncthbrt commented May 10, 2024

@moklick It would be possible to do that I think. I can work on a POC to that end?

Thanks for the comments @peterkogo

@moklick
Member

moklick commented May 10, 2024

That would be great, but let's wait for @peterkogo feedback here. I know he is also interested in the topic and maybe he already started something!

Would you like to make a POC for a better built-in virtualization or expose functionality or both?

To answer your open questions:

Ordering may be tricky with optimistic updates and cache handling and invalidation

Not needed for now when we focus on virtualization first

How to document and communicate these changes effectively to users

We should try not to introduce breaking changes, but only more options. We could create a page under /learn where we explain how to use a quadtree or something like that with the new API.

How to effectively support this across all packages

Let's focus on React first.

MiniMap support?

I think it would make sense to implement an alternative canvas based minimap at some point but I would like to postpone this task.

Selection-rect support for edges. Implement parser for edge renderer?

In my view a naive approach via nodes is enough (if one connected node is visible, an edge is visible too)

@ncthbrt
Author

ncthbrt commented May 10, 2024

Happy to do that too!

@peterkogo
Member

peterkogo commented May 13, 2024

@ncthbrt I talked to @moklick about how to move forward on this.

You submitted this RFC at an undoubtedly tumultuous time here at xyflow :) Just to give a bit of context, we have changed quite a lot about the inner workings of React Flow for the next release and are in the middle of a rewrite of Svelte Flow. There is one (hopefully) very last thing we would like to get into v12, which would influence (and simplify) this RFC immensely - namely how we handle node origins.

Just as a rough timeframe, I will create a PR with the node origin changes in the next 2 weeks, so we can move forward on this afterwards. I'll keep you posted!

@ncthbrt
Author

ncthbrt commented May 13, 2024 via email

@peterkogo
Member

peterkogo commented Jun 4, 2024

I made a shallow deep dive into acceleration structures for spatial queries so here are my 2 cents.

Due to the declarative nature of React Flow (pass in edges, get out fancy graphs), we pretty much have to expect nodes and edges to change 100% in between updates. Whatever algorithm we choose in the end, fully rebuilding the structure should be fast. To further extend on this point, we can substantially simplify the implementation if we manage to do this in a stateless fashion and without additional heuristics (e.g. when do we just update vs. fully rebuild).

I did some preliminary testing on implementing Quadtrees, a BVH and using the libraries rbush and flatbush.

Quadtrees

  • Pretty fast to construct, but you end up with a lot of inefficient trees for various situations
  • does not work so well if the full extent you'd like to subdivide is not known from the start (which might happen if we don't want to iterate over all nodes before adding)
  • managing a lot of empty space and managing node density are both influenced by tree depth
  • deciding on what depth and how many nodes per leaf to choose can vary depending on type of flow graph
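To illustrate where those trade-offs come from, here is a minimal quadtree sketch. It is not the implementation that was tested; the class, the `maxDepth` and `capacity` defaults, and the rule that boxes straddling a split line stay in the parent are all assumptions chosen for brevity. It also assumes every item lies inside the root bounds, which is exactly the "extent must be known up front" limitation noted above.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface Item { id: string; rect: Rect }

const intersects = (a: Rect, b: Rect): boolean =>
  a.x < b.x + b.width && a.x + a.width > b.x &&
  a.y < b.y + b.height && a.y + a.height > b.y;

const containsRect = (outer: Rect, inner: Rect): boolean =>
  inner.x >= outer.x && inner.y >= outer.y &&
  inner.x + inner.width <= outer.x + outer.width &&
  inner.y + inner.height <= outer.y + outer.height;

class Quadtree {
  private items: Item[] = [];
  private children: Quadtree[] = [];
  private bounds: Rect;
  private maxDepth: number;
  private capacity: number;
  private depth: number;

  // `bounds` must cover every item that will be inserted.
  constructor(bounds: Rect, maxDepth = 6, capacity = 4, depth = 0) {
    this.bounds = bounds;
    this.maxDepth = maxDepth;
    this.capacity = capacity;
    this.depth = depth;
  }

  insert(item: Item): void {
    if (this.children.length > 0) {
      // Descend only if one child fully contains the box; boxes that
      // straddle a split line stay at this level.
      const child = this.children.find((c) => containsRect(c.bounds, item.rect));
      if (child) child.insert(item);
      else this.items.push(item);
      return;
    }
    this.items.push(item);
    if (this.items.length > this.capacity && this.depth < this.maxDepth) this.split();
  }

  private split(): void {
    const { x, y, width, height } = this.bounds;
    const w = width / 2;
    const h = height / 2;
    const corners: [number, number][] = [[x, y], [x + w, y], [x, y + h], [x + w, y + h]];
    for (const [cx, cy] of corners) {
      this.children.push(
        new Quadtree({ x: cx, y: cy, width: w, height: h }, this.maxDepth, this.capacity, this.depth + 1),
      );
    }
    // Redistribute the items that were held by this former leaf.
    const old = this.items;
    this.items = [];
    for (const item of old) this.insert(item);
  }

  query(area: Rect, out: Item[] = []): Item[] {
    if (!intersects(this.bounds, area)) return out; // prune this subtree
    for (const item of this.items) if (intersects(item.rect, area)) out.push(item);
    for (const child of this.children) child.query(area, out);
    return out;
  }
}
```

Both tuning knobs show up directly: a larger `maxDepth` handles dense clusters but creates many empty cells in sparse areas, and large boxes that straddle split lines accumulate near the root, where every query has to test them.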

BVH

  • surprisingly complex and a lot of work involved in optimizing the partition algorithm
  • might take a little longer to construct
  • you end up with a pretty good tree very easily

rbush & flatbush

  • bulk insertion is really fast
  • almost optimal tree
  • it can deal with a lot of weird spatial distributions quite well
  • it's a tested library and just weighs around 3kb
  • flatbush is only really faster for tens of thousands of nodes (when fully rebuilding)

So that's that. I am very inclined to just use rbush. It should be noted, however, that it's currently in a broken state in terms of build dependencies (I already opened a pull request) and ES module support (forking might be the easiest route, not sure if the library will be updated...)

I will release a small benchmark repo for these things soon, however have to deal with js module issues 🥲 first...

If anyone has some libraries I am unaware of or some comments on my preliminary findings feel invited to join the conversation!

@fredericrous

This RFC initiative looks exciting. I have a use case that I wonder could complicate the algorithm:

How about nodes that grow?

For context, in my project I have a diagram that represents the sitemap of a website. I drag and drop elements onto the nodes, which makes the nodes grow, and I then trigger a relayout with d3-flextree and d3-hierarchy.

@peterkogo
Member

@fredericrous this RFC would support dynamic changes to node size & position. Should be no problem.

@ncthbrt
Author

ncthbrt commented Jun 9, 2024

Nice work on the POC @peterkogo.

Still want to experiment with improving performance beyond culling nodes but not in a huge rush to do so right now, so this could be a great intermediate win.

@moklick
Member

moklick commented Jun 10, 2024

Thanks for the deep dive and the detailed explanation @peterkogo!

We could fork rbush and publish it under @xyflow/rbush. If your PR mourner/rbush#138 gets merged, we can replace the dependency with the original rbush package again.

Do I get it right, that the rough process would be:

  • build a new tree when nodes come in (in adoptUserNodes for example)
  • use tree for culling (onlyRenderVisibleElements)
  • use tree for node selections

Questions:

  • do we want to export helpers to query it?

@peterkogo
Member

peterkogo commented Jun 10, 2024

That sounds about right!
We can expose functions for finding intersections for a specific node, finding intersections for an arbitrary rectangle and finding all colliding nodes.
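The three kinds of query mentioned here could look roughly like the sketch below. The names `queryRect`, `queryNode` and `collidingPairs` are hypothetical, and the bodies use plain linear scans as a stand-in; a tree-backed version would keep the same signatures and replace the scans with index lookups.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface BoxedNode { id: string; rect: Rect }

const overlaps = (a: Rect, b: Rect): boolean =>
  a.x < b.x + b.width && a.x + a.width > b.x &&
  a.y < b.y + b.height && a.y + a.height > b.y;

// Nodes intersecting an arbitrary rectangle (e.g. a selection rect).
function queryRect(nodes: BoxedNode[], area: Rect): BoxedNode[] {
  return nodes.filter((n) => overlaps(n.rect, area));
}

// Nodes intersecting a specific node, excluding the node itself.
function queryNode(nodes: BoxedNode[], node: BoxedNode): BoxedNode[] {
  return nodes.filter((n) => n.id !== node.id && overlaps(n.rect, node.rect));
}

// Every colliding pair, each reported once as [idA, idB].
// This stand-in is O(n^2); an index would test only nearby candidates.
function collidingPairs(nodes: BoxedNode[]): [string, string][] {
  const pairs: [string, string][] = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      if (overlaps(nodes[i].rect, nodes[j].rect)) pairs.push([nodes[i].id, nodes[j].id]);
    }
  }
  return pairs;
}
```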

@ncthbrt
Author

ncthbrt commented Jun 10, 2024

Would edges also be included in this?

@moklick
Member

moklick commented Jun 10, 2024

Currently we create a box of the connected nodes. We could do the same here and create a tree for the edges too. wdyt @peterkogo
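A minimal sketch of that idea, assuming node rects are available by id: the edge's box is the union of its source and target node boxes, and that box is what would be inserted into a separate edge tree. The `edgeBounds` helper and the types are hypothetical, not XYFlow's internal code.

```typescript
interface Rect { x: number; y: number; width: number; height: number }
interface EdgeLike { id: string; source: string; target: string }

// Smallest axis-aligned rect that covers both inputs.
function unionRect(a: Rect, b: Rect): Rect {
  const x = Math.min(a.x, b.x);
  const y = Math.min(a.y, b.y);
  const right = Math.max(a.x + a.width, b.x + b.width);
  const bottom = Math.max(a.y + a.height, b.y + b.height);
  return { x, y, width: right - x, height: bottom - y };
}

// Returns undefined when either endpoint is unknown (e.g. not loaded yet).
function edgeBounds(edge: EdgeLike, nodeRects: Map<string, Rect>): Rect | undefined {
  const source = nodeRects.get(edge.source);
  const target = nodeRects.get(edge.target);
  if (source === undefined || target === undefined) return undefined;
  return unionRect(source, target);
}
```

This box is conservative for straight edges between the two nodes; curved edge paths can extend beyond it, so a real implementation might need extra margin.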

@peterkogo
Member

peterkogo commented Jun 10, 2024

@ncthbrt what other ideas do you have?

Still want to experiment with improving performance beyond culling nodes [...]

Some ideas from me: add an inView prop to custom nodes & edges, so it's easier to lazy load things. Maybe have some kind of padding to load things that are slightly out of view as well.

Would edges also be included in this?

I would include edges sooner or later because implementing hover effects on canvas based edges will require this eventually. Though for culling only, having a fast way to determine if either the source or target node is in view would already go a long way. edit: this does not work. Just use a bounding box for edges as well.

Edit: edges will probably have a separate tree for various reasons

@ncthbrt
Author

ncthbrt commented Jun 10, 2024

The biggest bottleneck that I've measured for manipulating large, zoomed out graphs has been the store update logic (I'm using yrs, a Rust port of yjs) and updating a thousand or more nodes in the store within a transaction can be quite costly. My hypothesis is that a tiered/hierarchical approach to updates would be appropriate for slower stores. In this hierarchy, responsiveness would be maintained by optimistically updating properties in an internal store until an appropriate idle point is reached at which point the updates are written back. This would also reduce the amount of time spent on integrating changes as updates would be batched.
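A minimal sketch of that tiered idea, assuming node positions are the only property being buffered. The `OptimisticPositions` class and the `WriteBack` callback are hypothetical names; the slow store (yrs in this case) is represented only by the callback.

```typescript
interface Position { x: number; y: number }

// Called with one batch per flush; stands in for a slow store transaction.
type WriteBack = (batch: Map<string, Position>) => void;

class OptimisticPositions {
  private committed: Map<string, Position>;
  private pending = new Map<string, Position>();
  private writeBack: WriteBack;

  constructor(initial: Map<string, Position>, writeBack: WriteBack) {
    this.committed = new Map(initial);
    this.writeBack = writeBack;
  }

  // Fast path: record the move locally without touching the slow store.
  move(id: string, position: Position): void {
    this.pending.set(id, position);
  }

  // Reads prefer the optimistic value so the UI stays responsive.
  get(id: string): Position | undefined {
    return this.pending.get(id) ?? this.committed.get(id);
  }

  // Slow path: write all buffered moves back as a single batch.
  flush(): void {
    if (this.pending.size === 0) return;
    const batch = new Map(this.pending);
    this.pending.clear();
    for (const [id, position] of batch) this.committed.set(id, position);
    this.writeBack(batch);
  }
}
```

In a browser, `flush` could be scheduled from an idle callback or called when a drag ends; repeated moves of the same node collapse into one entry, so the store sees one write per node per flush.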

Some ideas from me: add an inView prop to custom nodes & edges, so it's easier to lazy load things. Maybe have some kind of padding to load things that are slightly out of view as well.

That is a good idea as well!

My idea is tangential to this issue, however.
