-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
follow-up on #13150: improve the closure-computation algorithm again #13156
base: trunk
Are you sure you want to change the base?
Conversation
This change improves the asymptotic behavior of the closure-computation by ensuring that closure-computation work is shared between nodes instead of being duplicated. Consider for example the relation a -> b b -> c c -> a The previous code would first compute a -> [b, c, a] by traversing the graph, then b -> [c, a, b] by traversing the graph again, then c -> [a, b, c] again. The new code traverses the graph until it finds a cyclic node (starting from [a] this would be [a]), then compute a -> [a, b, c] using the same algorithm as before, and then immediately computes the result for c and c from the result of a, without any additional traversal. Performance results using the benchmark script of Florian Weimer ( https://gitlab.com/gasche-snippets/transitive-closure ). Implementations: - `old` is the code in 5.2 - `new` is ocaml#13150 - `again` is the algorithm in this commit Tests: - `isolated` is a diagonal relation (1 -> 1, 2 -> 2, 3 -> 3, ...) - `cycle` is (1 -> 2, 2 -> 3, 3 -> 4, ..., n -> 1) - `complete` is (1 -> [1, 2, ..., n], 2 -> [1, 2, ..., n], ...) | isolated | Mean [ms] | Relative | |:---------|------------:|------------:| | `old` | 162.1 ± 6.3 | 1.66 ± 0.08 | | `new` | 97.7 ± 2.9 | 1.00 | | `again` | 203.1 ± 4.1 | 2.08 ± 0.07 | | cycle | Mean [s] | Relative | |:--------|--------------:|----------------:| | `old` | 1.952 ± 0.054 | 596.91 ± 154.38 | | `new` | 0.034 ± 0.002 | 10.29 ± 2.71 | | `again` | 0.003 ± 0.001 | 1.00 | | complete | Mean [ms] | Relative | |:---------|-------------:|-------------:| | `old` | 59.3 ± 5.3 | 8.46 ± 1.38 | | `new` | 218.5 ± 15.8 | 31.15 ± 4.83 | | `again` | 7.0 ± 1.0 | 1.00 | The new code has a larger constant factor than the ocaml#13150 algorithm for isolated graphs, which shows as a 2x slowdown. This makes it slightly slower than the "old" code in 5.2 in this trivial case. It performs much better than both trunk and 5.2 in the other cases.
One may reasonably wonder whether it is worth having an algorithm that is faster in the "hard" cases, if it has worse constant factors in the "easy" case (2x slowdown). Let me explain why I think it is worth it.
|
@fweimer-rh I wonder if you would be willing to give a look at the code and give your impression on the complexity/performance tradeoff. |
@gasche I extracted the previously problematic relation graph from the Coccinelle/Menhir build, and your implementation is indeed much faster than what's in trunk today (one second versus 5.7 seconds). I'll try an array-based implementation and will report if it is faster. |
I had some discussions with @chambart earlier this week about this issue (the original #13150, not this new algorithm), and we suspect that for menhir-generated parsers we should simply not compute the transitive closure at all (we expect that very few parameters will be invariant or unused, and for the invariant parameters it's unlikely we will have any opportunity to actually make use of that).
I don't have any code for these suggestions at the moment, but if someone is interested in testing them I could put it on my todo list. |
A note that per-file flags are a big pain for downstream packagers. You have to remember that Dune is not obvious for people who don't work on OCaml daily. I use OCaml a lot but I have no idea how to add a per-file flag via Dune. Someone coming to an OCaml project as an outside downstream packager just thinks Dune is weird. Not to mention there are many other build systems used by OCaml projects, and a lot of half-baked hand-written Makefiles, which makes everything harder. |
The per-file flags could be made to work independently of the build system with the introduction of unit headers (ocaml/RFCs#26). For example menhir would add at the start of the generated file an attribute disabling this optimisation, and the compiler would read this attribute and change its settings accordingly. |
I hadn't thought of this argument for unit headers: "easier than figuring out how to pass per-file flags with Dune. Genius! @lthls hopefully this discussion motivated you to review this PR :-) (The implementation uses persistent maps and sets just like the original code -- I thought that it would offer a fairer comparison. I am happy to start using Hashtbl as suggested by @fweimer-rh to see if it improves constant factors, now that we know that there exists a real program that is affected by those constant factors.) |
(I tried using hashtables for the |
The Flambda backend contains a transitive-closure computation algorithm that misbehaves on certain Menhir-generated programs. #13150 introduced an improved algorithm that fixes the case known to misbehave, but is asymptotically disappointing in other cases -- see the discussions there for the details. The cases where #13150 performs worse than it should have not been found in the wild yet, but were exhibited by the simple relations in the benchmark script proposed by @fweimer-rh. I was nerd-sniped and decided to tweak the algorithm to perform better.
The PR improves the asymptotic behavior of the closure-computation by ensuring that closure-computation work is shared between nodes instead of being duplicated.
Consider for example the relation
a -> b
b -> c
c -> a
The previous code would first compute
a -> [b, c, a]
by traversing the graph, then
b -> [c, a, b]
by traversing the graph again, then
c -> [a, b, c]
again. The new code traverses the graph until it finds a cyclic node (starting from [a] this would be [a]), then compute
a -> [a, b, c]
using the same algorithm as before, and then immediately computes the result for c and b from the result of a, without any additional traversal.
Performance results using the benchmark script of Florian Weimer ( https://gitlab.com/gasche-snippets/transitive-closure ).
Implementations:
old
is the code in 5.2new
is flambda: Improve transitive closure in invariant_params_in_recursion #13150again
is the algorithm in this commitTests:
isolated
is a diagonal relation (1 -> 1, 2 -> 2, 3 -> 3, ...)cycle
is (1 -> 2, 2 -> 3, 3 -> 4, ..., n -> 1)complete
is (1 -> [1, 2, ..., n], 2 -> [1, 2, ..., n], ...)old
new
again
old
new
again
old
new
again
The new code has a larger constant factor than the #13150 algorithm for isolated graphs, which shows as a 2x slowdown. This makes it slightly slower than the "old" code in 5.2 in this trivial case.
It performs much better than both trunk and 5.2 in the other cases.