-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Index Result rows rather than to convert them into hashes #51744
Conversation
Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.
Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.
Or maybe it doesn't need to? It's documented as returning a Hash, but it would really be for the better if we always returned |
443e1b4
to
a269936
Compare
Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.
3906232
to
aeb2e5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. And I'm ok changing the implementation of Result#each
to always return a hash like object
Thank you, I wanted a second opinion on that. I'll do it then. |
Using the same benchmark as rails#51726 A significant part of the memory footprint comes from `Result#each`: ``` Total allocated: 4.61 MB (43077 objects) Total retained: 3.76 MB (27621 objects) allocated memory by file ----------------------------------- 391.52 kB activerecord/lib/active_record/result.rb retained memory by file ----------------------------------- 374.40 kB activerecord/lib/active_record/result.rb ``` Rows are initially stored as arrays, but `Result#each` convert them to hashes. Depending on how many elements they contain, hashes use between 2 and 5 times as much memory than arrays. | Length | Hash | Array | Diff | | --------- | --------- | --------- | --------- | | 1 | 160B | 40B | -120B | | 2 | 160B | 40B | -120B | | 3 | 160B | 40B | -120B | | 4 | 160B | 80B | -80B | | 5 | 160B | 80B | -80B | | 6 | 160B | 80B | -80B | | 7 | 160B | 80B | -80B | | 8 | 160B | 80B | -80B | | 9 | 464B | 160B | -304B | | 10 | 464B | 160B | -304B | | 11 | 464B | 160B | -304B | | 12 | 464B | 160B | -304B | | 13 | 464B | 160B | -304B | | 14 | 464B | 160B | -304B | | 15 | 464B | 160B | -304B | | 16 | 464B | 160B | -304B | | 17 | 912B | 160B | -752B | | 18 | 912B | 160B | -752B | | 19 | 912B | 320B | -592B | | 20 | 912B | 320B | -592B | | 21 | 912B | 320B | -592B | | 22 | 912B | 320B | -592B | | 23 | 912B | 320B | -592B | | 24 | 912B | 320B | -592B | | 25 | 912B | 320B | -592B | | 26 | 912B | 320B | -592B | | 27 | 912B | 320B | -592B | | 28 | 912B | 320B | -592B | | 29 | 912B | 320B | -592B | | 30 | 912B | 320B | -592B | | 31 | 912B | 320B | -592B | | 32 | 912B | 320B | -592B | | 33 | 1744B | 320B | -1424B | | 34 | 1744B | 320B | -1424B | | 35 | 1744B | 320B | -1424B | | 36 | 1744B | 320B | -1424B | | 37 | 1744B | 320B | -1424B | | 38 | 1744B | 320B | -1424B | | 39 | 1744B | 640B | -1104B | | 40 | 1744B | 640B | -1104B | | 41 | 1744B | 640B | -1104B | | 42 | 1744B | 640B | -1104B | | 43 | 1744B | 640B | -1104B | | 44 | 1744B | 640B | -1104B | | 45 | 1744B | 640B | -1104B | | 46 | 1744B | 640B | -1104B | | 47 | 1744B | 640B | -1104B | | 48 | 1744B | 640B | -1104B | | 49 | 1744B | 640B | -1104B | | 50 | 1744B | 640B | -1104B | Rather than to convert rows into hashes, we can loopkup the column index into a single Hash common to all rows. To not complexify the code too much, rather than to pass the row array and the column index, we wrap both into an `IndexedRow` object, which uses an extra `40B` object, but that's still less memory even in the worst case. After: ``` Total allocated: 4.32 MB (43079 objects) Total retained: 3.65 MB (29725 objects) allocated memory by file ----------------------------------- 101.66 kB activerecord/lib/active_record/result.rb retained memory by file ----------------------------------- 84.70 kB activerecord/lib/active_record/result.rb ``` As for access speed, it's of course a bit slower, but not by much, it's between `1.5` and `2` times slower, but remains in the 10's of M iterations per second, so I think this overhead is negligible compared to all the work needed to access a model attribute. Also the `LazyAttributeSet` class only access this once, after the attribute is casted, the resulting value is still stored in a regular `Hash`.
Using the samebenchmark as #51726
A significant part of the memory footprint comes from
Result#each
:Rows are initially stored as arrays, but
Result#each
convert them to hashes. Depending on how many elements they contain, hashes use between 2 and 5 times as much memory than arrays.Rather than to convert rows into hashes, we can loopkup the column index into a single hash common to all rows. To not complexify the code too much, rather than to pass the row array and the column index, we wrap both into an
IndexedRow
object, which uses an extra40B
object, but that's still less memory even in the worst case.After:
As for access speed, it's of course a bit slower, but not by much, it's between
1.5
and2
times slower, but remains in the 10's of M iterations per second, so I think this overhead is negligible compared to all the work needed to access a model attribute.Also the
LazyAttributeSet
class only access this once, after the attribute is casted, the resulting value is still stored in a regularHash
.cc @tenderlove @jhawthorn I'd like your opinion on this idea. I think it's worth it. We could potentially only do this when there is more than 1 row and more than 8 columns if we think the gains aren't quite big enough for small models.
Also NB:
Result#each
is public API, I changed it because it's a proof of concept, but this should be an alternate private API.