More performance optimizations #136

Open
9 of 13 tasks
alexdowad opened this issue Sep 3, 2014 · 20 comments
Comments

@alexdowad
Contributor

  • Investigate effect of making SortedSet balance less frequently (or even moving to RB trees rather than AVL trees)
  • Even when there is no need to rebalance, SortedSet sometimes copies more nodes than is necessary. Fix this.
  • SortedSet#delete_at traverses the tree twice. Add a method to AVLNode so it just needs to traverse the tree once.
  • When a SortedSet is created with a 1-param block (to derive sort keys), should the sort keys for each node be cached? (In some cases, the computation to derive the sort key might be expensive.)
  • In Set#union, check if other is a Hamster::Set which is larger than self. If so, flip around self and other. (It will be cheaper to build up the result using the larger trie.)
  • In Set#intersection, if other is significantly smaller than self, but is not a Hamster::Set, then create an empty Trie, iterate through other, and add all the elements which are also in self to the new Trie.
  • In Set#subset? and Set#proper_subset?, if other is not a ::Set or Hamster::Set (i.e. #include? may be slow), and if other is large, it may be faster to build a temporary ::Set out of other first, and use that for testing inclusion. (Don't bother using a Hamster::Set for this purpose, it will be slower.) Another option would be to iterate through other, count the number of elements which are in self, and see if the number is equal to self.size.
  • Likewise in Set#disjoint?: if self.size < other.size but other is not a ::Set or Hamster::Set, testing other.include?(item) may be slow (and may lead to O(n^2) run time). In this case, it may be better to build a temporary ::Set for testing inclusion, or just use the 2nd branch so the inclusion test is done on self, not other.
  • In Trie, add methods for bulk insertion and deletion (which copy fewer nodes than repeated addition/deletion of single elements). Use these for operations like Set#union and so on.
  • Investigate the effect on Vector of using an implementation strategy more like the Persistent::Vector library. That is, all Vector tries would always be composed of full, 32-item nodes. Then, each Vector would have a "tail" array of 0-31 additional items at the end. Vector#push would only have to modify the trie once for every 32 pushes; the rest of the time, it would just be copying the "tail" array. Likewise for Vector#pop. If this proves worthwhile, consider having both "head" and "tail" arrays for odd numbers of items at the front and back. This could also make Vector#shift and #unshift fast. Retrieving an item from a Vector by index would require checking whether it falls within the "head", the "tail", or the vector trie itself.
  • If giving Vectors a "head" and "tail" proves to be a worthwhile and successful strategy, compare its performance as a stack/queue with Deque. If Vector can be made almost as efficient as Deque for that purpose, drop Deque and allow Vector to replace it.
  • Otherwise, if keeping Deque is necessary, try to mitigate the nasty worst-case scenario which happens when a large Deque is built up, and then items are alternately popped from the front and back. (It will keep reversing the list, and reversing it back again.) If, say, we are reversing @rear and moving the items to @front, I would suggest: copy the first 1/3 of the items and keep them on @rear. Reverse the remaining 2/3 and put them on @front.
  • There is another, less deadly worst-case scenario when retrieving the #first or #last item from a Deque with all the items in a long @front or @rear. Perhaps each Deque can keep the last item in @front and @rear in instance variables, so #first and #last can always be returned quickly.
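The size-flip idea in the Set#union bullet can be sketched with Ruby's built-in ::Set standing in for Hamster::Set (flipped_union is a hypothetical helper, not part of the library):

```ruby
require 'set'

# Hypothetical sketch of the size-flip: copy the larger side once and fold
# the smaller side into it, so the number of insertions is min(N, M)
# rather than the size of whichever operand happens to be the receiver.
def flipped_union(a, b)
  small, large = a.size <= b.size ? [a, b] : [b, a]
  small.reduce(large.dup) { |acc, item| acc.add(item) }
end
```

With an immutable trie the same trick pays off even more, since each insertion copies a path of nodes in the result; starting from the larger trie means fewer path copies overall.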
@alexdowad
Contributor Author

Great work, thanks!

@dubek
Contributor

dubek commented Nov 3, 2014

One more possible optimization:

[ ] A quick benchmark shows that Vector#shift is 1.2x - 1.8x faster than Vector#drop(1), although semantically they are identical. Needs investigation with more parameters (e.g. are two shifts faster than drop(2)?).

@alexdowad
Contributor Author

@dubek, good find! I just pushed an optimization based on this: 6813845. Rather than delegating to #shift, I made #drop use an implementation similar to #shift.

@dubek
Contributor

dubek commented Nov 5, 2014

Similar one: SortedSet#drop is way slower than SortedSet#above, especially for large sorted sets. Consider something like (not tested!):

def drop(n)
  # verify n is positive, less than size, etc.
  above(at(n - 1))
end

and maybe something similar for SortedSet#take (implemented in terms of #below). If no one beats me to it, I might get a chance to try implementing it tomorrow.

@alexdowad
Contributor Author

@dubek, thanks for this suggestion. Please have a look at the latest master.

@dubek
Contributor

dubek commented Nov 5, 2014

@alexdowad Looks good. I think you should return clear instead of EmptySortedSet (in SortedSet#drop and SortedSet#take) so the original @comparator is kept. And probably SortedSet#clear should be modified to handle cases when @comparator is set or not (like Hash#clear). Or am I missing something?

Aside: Might be a matter of taste, but I think we would have enjoyed a way to add assertions or invariants to our code (checked during testing). So after each call, verify for example that @trie.size equals the sum of all children and entries, that @comparator / @default_proc is kept, and so on. Not sure if there's a clean way to do it in Ruby; there's this: https://github.com/jorgemanrubia/solid_assert but it's a bit old and I'm not sure about the performance implications (ideally I'd add such an assert on each loop of Trie operations, for example). If you think it's valuable, maybe we should open an issue to look into it.

@alexdowad
Contributor Author

@dubek, the value of @comparator doesn't matter for an empty sorted set; its job is to determine the order of elements, but an empty set has no elements to order. (But if you can see some case where it would make a difference, please let me know.) However, you do have a good point: if called on a subclass, #drop and #take should return an instance of the subclass.

Do you need assertions in the "middle" of methods? Or would checking assertions on method entry/exit be enough? If so, I can think of a way to do it with no performance impact in "production". (Write a class macro which "hooks" a method with pre/post assertions only when the tests are running.)
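A minimal sketch of that class macro, assuming a plain ENV flag as the "tests are running" switch (Invariants, check_invariants, and Counter are all hypothetical names for illustration, not Hamster APIs):

```ruby
# Enabled here for demonstration; in real use only the test helper would set it.
ENV['CHECK_INVARIANTS'] ||= '1'

# Hypothetical macro: wrap a method with a post-condition check, but only
# when the flag is set -- otherwise the method is left completely untouched,
# so there is no overhead in "production".
module Invariants
  def check_invariants(method_name, &invariant)
    return unless ENV['CHECK_INVARIANTS']
    original = instance_method(method_name)
    define_method(method_name) do |*args, &blk|
      result = original.bind(self).call(*args, &blk)
      raise "invariant violated after ##{method_name}" unless instance_exec(&invariant)
      result
    end
  end
end

class Counter
  extend Invariants

  attr_reader :count

  def initialize(start = 0)
    @count = start
  end

  def increment
    @count += 1
    self
  end
  check_invariants(:increment) { @count >= 0 }
end
```

Pre-conditions would hook in the same way, checked before calling `original`; when the flag is absent, `check_invariants` is a no-op.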

@alexdowad
Contributor Author

PS. Just pushed a fix for the problem you noticed with SortedSet#take and #drop.

@dubek
Contributor

dubek commented Nov 5, 2014

Sorry, still buggy (unless I'm missing something) on 435e4ca:

2.1.3 :001 > sorted_set = Hamster::SortedSet.new([4,5,6]) { |a| -a }
 => Hamster::SortedSet[6, 5, 4] 
2.1.3 :002 > sorted_set.drop(3).add(5).add(4).add(6)
 => Hamster::SortedSet[4, 5, 6] 

@alexdowad
Contributor Author

OK, now I see why we need the @comparator even for an empty sorted set. Do you think #clear should do the same?

@dubek
Contributor

dubek commented Nov 5, 2014

I think that:

  • #drop and #take should return clear if they're about to return an empty SortedSet
  • #clear should act like Hash#clear: if there's a @comparator then alloc an empty node with it, otherwise return self.class.empty.
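The two points above can be sketched with a toy class (MiniSortedSet is hypothetical, standing in for Hamster's SortedSet; a sorted Array replaces the AVL nodes):

```ruby
# Toy model: an immutable sorted set whose #clear keeps the comparator when
# one was supplied, and otherwise returns a shared canonical empty instance.
class MiniSortedSet
  def self.empty
    @empty ||= new([])
  end

  attr_reader :items

  def initialize(items, comparator = nil)
    @comparator = comparator
    @items = comparator ? items.sort(&comparator) : items.sort
  end

  def add(item)
    self.class.new(@items + [item], @comparator)
  end

  def clear
    @comparator ? self.class.new([], @comparator) : self.class.empty
  end
end
```

With this shape, #drop and #take can simply return `clear` when they would otherwise return an empty set, and a reverse-ordered set keeps sorting in descending order after being emptied and refilled.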

@alexdowad
Contributor Author

OK. You are absolutely right about that.

@dubek
Contributor

dubek commented Nov 5, 2014

One more possible optimization:

For SortedSet it should be faster to determine #subset?, #disjoint?, #intersect and so on by walking the two SortedSets together, in order. For example, for #intersect I can start at the higher of the two sets' minimums, and stop when I reach the lower of the two sets' maximums (where walking means: advance whichever cursor is pointing at the lower value; when the two cursors match, put the value in the intersection result). So if we have two SortedSets of sizes N and M (N > M), then #intersect should be O(lg N + lg M + M), whereas today it is O(M · lg N).

The problem with this suggested optimization is that it is only valid if the two SortedSets have the same order (comparator). Is there any way we can check that?
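The in-order walk for #subset? can be sketched with sorted Arrays standing in for two SortedSets that share a comparator (sorted_subset? is a hypothetical helper):

```ruby
# Merge-walk: a single pass over both collections, O(N + M) comparisons,
# versus M membership tests of O(lg N) each. Assumes both arrays are
# sorted by the same order and contain no duplicates.
def sorted_subset?(small, large)
  i = 0
  small.each do |x|
    # advance the large-side cursor past everything smaller than x
    i += 1 while i < large.size && large[i] < x
    return false if i == large.size || large[i] != x
    i += 1
  end
  true
end
```

The same walk, with both cursors advancing, gives #disjoint? (stop at the first match) in the same bound.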

@dubek
Contributor

dubek commented Nov 5, 2014

Whoops, my last comment applies to #subset?, not #intersect. I looked at #intersect and it is implemented as AVL keep_only, which performs the divide-and-conquer. Sorry.

@alexdowad
Contributor Author

@dubek, right now there is no way to check if 2 SortedSets have the same order. If you want to do something like that, the best I can think of would be to check if both are using the default order, and switch to the faster code if they are. (Perhaps use a single, canonical Proc object for the default comparator.)

@dubek
Contributor

dubek commented Nov 5, 2014

We need C++ templates ;-)

I found that SortedSet#filter can be made 3x faster with @node.bulk_delete; committing it soon.

dubek added a commit that referenced this issue Nov 5, 2014
@dubek
Contributor

dubek commented Nov 5, 2014

Pushed to master: fix for the SortedSet#drop and #take when clearing the collection, and optimization for SortedSet#filter.

@alexdowad
Contributor Author

I just experimented with loosening the balance requirements for SortedSet. Surprisingly, it makes insertion performance slightly worse.

In retrospect: since our AVL trees are immutable, each insertion means copying, at the very least, all the nodes on the path from the insertion point to the root. Allocating a couple more nodes here and there to rebalance the tree doesn't add much overhead on top of that. And keeping the tree in very strict balance means you don't have to traverse down as far for the next insertion.

@alexdowad
Contributor Author

I've checked off the proposed optimization for SortedSet#delete_at, not because I've done anything about it, but because it is too trivial to worry about.

@alexdowad
Contributor Author

While working on one of these optimizations, I just discovered and fixed 2 bugs in SortedSet!
