More performance optimizations #136

Open
9 of 13 tasks
alexdowad opened this issue Sep 3, 2014 · 20 comments
Comments

@alexdowad
Contributor

  • Investigate effect of making SortedSet balance less frequently (or even moving to RB trees rather than AVL trees)
  • Even when there is no need to rebalance, SortedSet sometimes copies more nodes than is necessary. Fix this.
  • SortedSet#delete_at traverses the tree twice. Add a method to AVLNode so it just needs to traverse the tree once.
  • When a SortedSet is created with a 1-param block (to derive sort keys), should the sort keys for each node be cached? (In some cases, the computation to derive the sort key might be expensive.)
  • In Set#union, check if other is a Hamster::Set which is larger than self. If so, flip around self and other. (It will be cheaper to build up the result using the larger trie.)
  • In Set#intersection, if other is significantly smaller than self, but is not a Hamster::Set, then create an empty Trie, iterate through other, and add all the elements which are also in self to the new Trie.
  • In Set#subset? and Set#proper_subset?, if other is not a ::Set or Hamster::Set (i.e. #include? may be slow), and if other is large, it may be faster to build a temporary ::Set out of other first, and use that for testing inclusion. (Don't bother using a Hamster::Set for this purpose, it will be slower.) Another option would be to iterate through other, count the number of elements which are in self, and see if the number is equal to self.size.
  • Likewise in Set#disjoint?: if self.size < other.size but other is not a ::Set or Hamster::Set, testing other.include?(item) may be slow (and may lead to O(n^2) run time). In this case, it may be better to build a temporary ::Set for testing inclusion, or just use the 2nd branch so the inclusion test is done on self, not other.
  • In Trie, add methods for bulk insertion and deletion (which copy fewer nodes than repeated addition/deletion of single elements). Use these for operations like Set#union and so on.
  • Investigate the effect on Vector of using an implementation strategy more like the Persistent::Vector library. That is, all Vector tries would always be composed of full, 32-item nodes. Then, each Vector would have a "tail" array of 0-31 additional items at the end. Vector#push would only have to modify the trie once for every 32 pushes; the rest of the time, it would just be copying the "tail" array. Likewise for Vector#pop. If this proves worthwhile, consider having both "head" and "tail" arrays for odd numbers of items at the front and back. This could also make Vector#shift and #unshift fast. Retrieving an item from a Vector by index would require checking whether it falls within the "head", the "tail", or the vector trie itself.
  • If giving Vectors a "head" and "tail" proves to be a worthwhile and successful strategy, compare its performance as a stack/queue with Deque. If Vector can be made almost as efficient as Deque for that purpose, drop Deque and allow Vector to replace it.
  • Otherwise, if keeping Deque is necessary, try to mitigate the nasty worst-case scenario which happens when a large Deque is built up, and then items are alternately popped from the front and back. (It will keep reversing the list, and reversing it back again.) If, say, we are reversing @rear and moving the items to @front, I would suggest: copy the first 1/3 of the items and keep them on @rear. Reverse the remaining 2/3 and put them on @front.
  • There is another, less deadly worst-case scenario when retrieving the #first or #last item from a Deque with all the items in a long @front or @rear. Perhaps each Deque can keep the last item in @front and @rear in instance variables, so #first and #last can always be returned quickly.
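The size-flip idea in the Set#union bullet can be sketched with Ruby's built-in ::Set standing in for Hamster::Set (flipped_union is a hypothetical helper, not part of the library):

```ruby
require 'set'

# Hypothetical sketch of the size-flip: copy the larger side once and fold
# the smaller side into it, so the number of insertions is min(N, M)
# rather than the size of whichever operand happens to be the receiver.
def flipped_union(a, b)
  small, large = a.size <= b.size ? [a, b] : [b, a]
  small.reduce(large.dup) { |acc, item| acc.add(item) }
end
```

With an immutable trie the same trick pays off even more, since each insertion copies a path of nodes in the result; starting from the larger trie means fewer path copies overall.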
@alexdowad
Contributor Author

Great work, thanks!

@dubek
Contributor

dubek commented Nov 3, 2014

One more possible optimization:

[ ] A quick benchmark shows that Vector#shift is 1.2x - 1.8x faster than Vector#drop(1), although semantically they are identical. Needs investigation with more parameters (e.g. are two shifts faster than drop(2)?).

@alexdowad
Contributor Author

@dubek, good find! I just pushed an optimization based on this: 6813845. Rather than delegating to #shift, I made #drop use an implementation similar to #shift.

@dubek
Contributor

dubek commented Nov 5, 2014

Similar one: SortedSet#drop is way slower than SortedSet#above, especially for large sorted sets. Consider something like (not tested!):

def drop(n)
  # verify n is positive, less than size, etc.
  above(at(n - 1))
end

and maybe something similar for SortedSet#take (implemented in terms of #below). If no one beats me to it, I might get a chance to try implementing it tomorrow.

@alexdowad
Contributor Author

@dubek, thanks for this suggestion. Please have a look at the latest master.

@dubek
Contributor

dubek commented Nov 5, 2014

@alexdowad Looks good. I think you should return clear instead of EmptySortedSet (in SortedSet#drop and SortedSet#take) so the original @comparator is kept. And probably SortedSet#clear should be modified to handle cases when @comparator is set or not (like Hash#clear). Or am I missing something?

Aside: Might be a matter of taste, but I think we would have enjoyed a way to add assertions or invariants to our code (checked during testing). So after each call, verify for example that @trie.size equals the sum of all children and entries, that @comparator / @default_proc is kept, and so on. Not sure if there's a clean way to do it in Ruby; there's this: https://github.com/jorgemanrubia/solid_assert but it's a bit old and I'm not sure about the performance implications (ideally I'd add such an assert on each loop of Trie operations, for example). If you think it's valuable, maybe we should open an issue to look into it.

@alexdowad
Contributor Author

@dubek, the value of @comparator doesn't matter for an empty sorted set; its job is to determine the order of elements, but an empty set has no elements to order. (But if you can see some case where it would make a difference, please let me know.) However, you do have a good point: if called on a subclass, #drop and #take should return an instance of the subclass.

Do you need assertions in the "middle" of methods? Or would checking assertions on method entry/exit be enough? If so, I can think of a way to do it with no performance impact in "production". (Write a class macro which "hooks" a method with pre/post assertions only when the tests are running.)
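A minimal sketch of that class macro, assuming a plain ENV flag as the "tests are running" switch (Invariants, check_invariants, and Counter are all hypothetical names for illustration, not Hamster APIs):

```ruby
# Enabled here for demonstration; in real use only the test helper would set it.
ENV['CHECK_INVARIANTS'] ||= '1'

# Hypothetical macro: wrap a method with a post-condition check, but only
# when the flag is set -- otherwise the method is left completely untouched,
# so there is no overhead in "production".
module Invariants
  def check_invariants(method_name, &invariant)
    return unless ENV['CHECK_INVARIANTS']
    original = instance_method(method_name)
    define_method(method_name) do |*args, &blk|
      result = original.bind(self).call(*args, &blk)
      raise "invariant violated after ##{method_name}" unless instance_exec(&invariant)
      result
    end
  end
end

class Counter
  extend Invariants

  attr_reader :count

  def initialize(start = 0)
    @count = start
  end

  def increment
    @count += 1
    self
  end
  check_invariants(:increment) { @count >= 0 }
end
```

Pre-conditions would hook in the same way, checked before calling `original`; when the flag is absent, `check_invariants` is a no-op.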

@alexdowad
Contributor Author

PS. Just pushed a fix for the problem you noticed with SortedSet#take and #drop.

@dubek
Contributor

dubek commented Nov 5, 2014

Sorry, still buggy (unless I'm missing something) on 435e4ca:

2.1.3 :001 > sorted_set = Hamster::SortedSet.new([4,5,6]) { |a| -a }
 => Hamster::SortedSet[6, 5, 4] 
2.1.3 :002 > sorted_set.drop(3).add(5).add(4).add(6)
 => Hamster::SortedSet[4, 5, 6] 

@alexdowad
Contributor Author

OK, now I see why we need the @comparator even for an empty sorted set. Do you think #clear should do the same?

@dubek
Contributor

dubek commented Nov 5, 2014

I think that:

  • #drop and #take should return clear if they're about to return an empty SortedSet
  • #clear should act like Hash#clear: if there's a @comparator then alloc an empty node with it, otherwise return self.class.empty.
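The two points above can be sketched with a toy class (MiniSortedSet is hypothetical, standing in for Hamster's SortedSet; a sorted Array replaces the AVL nodes):

```ruby
# Toy model: an immutable sorted set whose #clear keeps the comparator when
# one was supplied, and otherwise returns a shared canonical empty instance.
class MiniSortedSet
  def self.empty
    @empty ||= new([])
  end

  attr_reader :items

  def initialize(items, comparator = nil)
    @comparator = comparator
    @items = comparator ? items.sort(&comparator) : items.sort
  end

  def add(item)
    self.class.new(@items + [item], @comparator)
  end

  def clear
    @comparator ? self.class.new([], @comparator) : self.class.empty
  end
end
```

With this shape, #drop and #take can simply return `clear` when they would otherwise return an empty set, and a reverse-ordered set keeps sorting in descending order after being emptied and refilled.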

@alexdowad
Contributor Author

OK. You are absolutely right about that.

@dubek
Contributor

dubek commented Nov 5, 2014

One more possible optimization:

For SortedSet it should be faster to determine #subset?, #disjoint?, #intersect and so on by walking the two SortedSets together, in order. For example, for #intersect I can start at the higher of the two sets' minimums, and stop when I reach the lower of the two sets' maximums (where walking means: advance whichever cursor is pointing at the lower value; when the two cursors match, put the value in the intersection result). So if we have two SortedSets of sizes N and M (N > M), then #intersect should be O(lg N + lg M + M), whereas today it is O(M · lg N).

The problem with this suggested optimization is that it is only valid if the two SortedSets have the same order (comparator). Is there any way we can check that?
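The in-order walk for #subset? can be sketched with sorted Arrays standing in for two SortedSets that share a comparator (sorted_subset? is a hypothetical helper):

```ruby
# Merge-walk: a single pass over both collections, O(N + M) comparisons,
# versus M membership tests of O(lg N) each. Assumes both arrays are
# sorted by the same order and contain no duplicates.
def sorted_subset?(small, large)
  i = 0
  small.each do |x|
    # advance the large-side cursor past everything smaller than x
    i += 1 while i < large.size && large[i] < x
    return false if i == large.size || large[i] != x
    i += 1
  end
  true
end
```

The same walk, with both cursors advancing, gives #disjoint? (stop at the first match) in the same bound.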

@dubek
Contributor

dubek commented Nov 5, 2014

Whoops, my last comment applies to #subset?, not #intersect. I looked at #intersect and it is implemented as AVL keep_only, which performs the divide-and-conquer. Sorry.

@alexdowad
Contributor Author

@dubek, right now there is no way to check if 2 SortedSets have the same order. If you want to do something like that, the best I can think of would be to check if both are using the default order, and switch to the faster code if they are. (Perhaps use a single, canonical Proc object for the default comparator.)

@dubek
Contributor

dubek commented Nov 5, 2014

We need C++ templates ;-)

I found that SortedSet#filter can be made 3x faster with @node.bulk_delete; committing it soon.

dubek added a commit that referenced this issue Nov 5, 2014
@dubek
Contributor

dubek commented Nov 5, 2014

Pushed to master: fix for the SortedSet#drop and #take when clearing the collection, and optimization for SortedSet#filter.

@alexdowad
Contributor Author

I just experimented with loosening the balance requirements for SortedSet. Surprisingly, it makes insertion performance slightly worse.

In retrospect: since our AVL trees are immutable, each insertion means copying, at the very least, all the nodes on the path from the insertion point to the root. Allocating a couple more nodes here and there to rebalance the tree doesn't add much overhead on top of that. And keeping the tree in very strict balance means you don't have to traverse down as far for the next insertion.

@alexdowad
Contributor Author

I've checked off the proposed optimization for SortedSet#delete_at, not because I've done anything about it, but because it is too trivial to worry about.

@alexdowad
Contributor Author

While working on one of these optimizations, I just discovered and fixed 2 bugs in SortedSet!
