Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hash_map.zig: Pass self by value and less pointer-int conversion #19989

Merged
merged 1 commit into from
May 27, 2024

Conversation

SeanTheGleaming
Copy link
Contributor

  • Pass self via value instead of reference in numerous places throughout HashMapUnmanged and HashMap (orignally proposed in fix(HashMap): change capacity() to *const Self #19770)
  • Replaced some instances of @intFromPtr and @ptrFromInt with @ptrCast and @alignCast, and pointer arithmetic where appropriate

With this, the only remaining instance of pointer-int conversion in hash_map.zig is in HashMapUnmanaged.removeByPtr, which easily be able to be eliminated once pointer subtraction is supported.

 - Used `Self` instead of `*const Self` where appropriate (orignally proposed in ziglang#19770)
 - Replaced `@intFromPtr` and `@ptrFromInt` with `@ptrCast`, `@alignCast`, and pointer arithmetic where appropriate

With this, the only remaining instance on pointer-int conversion in hash_map.zig is in `HashMapUnmanaged.removeByPtr`, which easily be able to be eliminated once pointer subtraction is supported.
@matklad
Copy link
Contributor

matklad commented May 17, 2024

Would be great to run some benchmarks before/after, to make sure this doesn't regress performance. My understanding is that this should improve performance in the ideal Zig, but also that the current version of Zig isn't quiet ideal yet.

@linusg
Copy link
Sponsor Contributor

linusg commented May 20, 2024

@matklad see #19770 (comment):

If this was done originally because the compiler was choosing to pass the hashmap by value but it was actually faster to pass by reference then that sounds like a compiler bug and we should fix that instead of using a workaround. The standard library is the last place I would expect to see compiler bug workarounds.

and

The above analysis is accurate -- such workarounds are effectively banned in std. There have been (and continue to be) bugs around PRO (Parameter Reference Optimization), but working around them in std is considered bad form.

@matklad
Copy link
Contributor

matklad commented May 20, 2024

Yes, I am aware of that context. The expected value of benchmarking here is still quiet high:

  • benchmark not showing significant perf differences is a baseline result, but still is a useful confirmation that the compiler works as intended.
  • benchmark showing significant improvement would be surprising, welcomed, but not really actionable.
  • benchmark showing a significant regression would be surprising and actionable: hash maps are in general super-hot code, so this would put more priority on fixing the compiler to deal with at least this case more optimally.

Not saying that we shouldn't land this without a benchmark, or that we should block this on compiler improvements should the results be disappointing! Just that with a benchmark this change would be even more valuable than it already is: if it is a performance regression, chasing it after the fact would be quite hard, but also important, because this is hash-map specifically, and not some random cold code!

@SeanTheGleaming
Copy link
Contributor Author

Alright, I've finally gotten around to doing some benchmarks. If you want to run the benchmark on your own machine, all of the code is at this repo, and you can run the benchmark with zig build run. With that out of the way, looking at the benchmarks I've run on my machine, the difference between passing by pointer and value seems to be negligible to the point that it's hard to tell which one is faster.

To be comprehensive, here is my benchmark results for ReleaseFast mode:

benchmark                                   runs     total time     time/run (avg ± σ)     (min ... max)                p75        p99        p995      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
ptr-arg hashmap: Call contains (unmanaged) 1024     840.895ms      821.187us ± 41.79us    (771.046us ... 1.033ms)      835.314us  977.317us  1.005ms   
val-arg hashmap: Call contains (unmanaged) 1024     840.774ms      821.068us ± 54.162us   (771.046us ... 1.337ms)      837.752us  1.028ms    1.111ms   
ptr-arg hashmap: Call contains (managed) 1024     867.909ms      847.567us ± 66.465us   (771.049us ... 1.129ms)      873.053us  1.067ms    1.1ms     
val-arg hashmap: Call contains (managed) 1024     860.379ms      840.214us ± 64.813us   (771.048us ... 1.247ms)      866.001us  1.046ms    1.106ms   
ptr-arg hashmap: Call capacity (unmanaged) 1024     696.319ms      679.999us ± 44.924us   (642.543us ... 1.149ms)      680.966us  877.051us  895.442us 
val-arg hashmap: Call capacity (unmanaged) 1024     709.52ms       692.89us ± 115.045us   (642.543us ... 2.01ms)       680.642us  1.327ms    1.475ms   
ptr-arg hashmap: Call capacity (managed) 1024     864.335ms      844.078us ± 331.547us  (642.544us ... 2.381ms)      771.53us   1.827ms    2.03ms    
val-arg hashmap: Call capacity (managed) 1024     757.228ms      739.48us ± 152.851us   (642.547us ... 2.165ms)      732.957us  1.457ms    1.902ms   
ptr-arg hashmap: Call count (unmanaged) 1024     143.425ms      140.063us ± 17.142us   (128.528us ... 273.527us)    135.506us  220.129us  241.11us  
val-arg hashmap: Call count (unmanaged) 1024     150.1ms        146.582us ± 23.613us   (135.463us ... 382.999us)    143.232us  271.365us  307.654us 
ptr-arg hashmap: Call count (managed) 1024     865.575ms      845.288us ± 112.199us  (771.048us ... 1.891ms)      857.128us  1.467ms    1.76ms    
val-arg hashmap: Call count (managed) 1024     843.41ms       823.642us ± 48.515us   (771.048us ... 1.39ms)       835.33us   971.274us  1.135ms   
ptr-arg hashmap: Call get (unmanaged) 1024     7.415s         7.242ms ± 290.368us    (6.943ms ... 8.522ms)        7.387ms    8.193ms    8.269ms   
val-arg hashmap: Call get (unmanaged) 1024     7.497s         7.321ms ± 348.766us    (6.943ms ... 8.837ms)        7.506ms    8.391ms    8.683ms   
ptr-arg hashmap: Call get (managed) 1024     7.596s         7.418ms ± 303.219us    (7.072ms ... 8.814ms)        7.584ms    8.498ms    8.593ms   
val-arg hashmap: Call get (managed) 1024     7.64s          7.461ms ± 337.37us     (7.072ms ... 8.898ms)        7.664ms    8.504ms    8.63ms    
ptr-arg hashmap: Many calls (unmanaged) 1024     8.281s         8.087ms ± 358.786us    (7.586ms ... 9.798ms)        8.281ms    9.114ms    9.324ms   
val-arg hashmap: Many calls (unmanaged) 1024     8.251s         8.058ms ± 368.044us    (7.586ms ... 9.502ms)        8.297ms    9.192ms    9.219ms   
ptr-arg hashmap: Many calls (managed) 1024     9.095s         8.881ms ± 427.16us     (8.357ms ... 10.752ms)       9.103ms    10.162ms   10.265ms  
val-arg hashmap: Many calls (managed) 1024     9.167s         8.953ms ± 369.686us    (8.357ms ... 10.503ms)       9.149ms    10.083ms   10.202ms

And here is my benchmark results for Debug mode:

benchmark                                   runs     total time     time/run (avg ± σ)     (min ... max)                p75        p99        p995      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
ptr-arg hashmap: Call contains (unmanaged) 1024     7.902s         7.717ms ± 438.753us    (7.11ms ... 12.57ms)         7.828ms    9.485ms    9.824ms   
val-arg hashmap: Call contains (unmanaged) 1024     7.803s         7.62ms ± 298.93us      (7.148ms ... 11.868ms)       7.693ms    8.445ms    8.691ms   
ptr-arg hashmap: Call contains (managed) 1024     9.104s         8.891ms ± 666.825us    (8.109ms ... 16.428ms)       9.044ms    11.423ms   11.9ms    
val-arg hashmap: Call contains (managed) 1024     8.828s         8.621ms ± 592.865us    (8.112ms ... 15.346ms)       8.79ms     11.01ms    12.502ms  
ptr-arg hashmap: Call capacity (unmanaged) 1024     2.805s         2.739ms ± 414.185us    (2.313ms ... 5.698ms)        2.727ms    4.843ms    5.287ms   
val-arg hashmap: Call capacity (unmanaged) 1024     2.728s         2.665ms ± 241.557us    (2.313ms ... 4.915ms)        2.709ms    3.839ms    4.145ms   
ptr-arg hashmap: Call capacity (managed) 1024     8.984s         8.773ms ± 389.197us    (8.223ms ... 12.681ms)       8.916ms    10.193ms   11.243ms  
val-arg hashmap: Call capacity (managed) 1024     9.204s         8.988ms ± 588.288us    (8.21ms ... 13.33ms)         9.158ms    11.257ms   11.629ms  
ptr-arg hashmap: Call count (unmanaged) 1024     2.495s         2.437ms ± 187.103us    (2.153ms ... 4.69ms)         2.489ms    3.179ms    3.55ms    
val-arg hashmap: Call count (unmanaged) 1024     2.478s         2.42ms ± 167.383us     (2.12ms ... 3.41ms)          2.466ms    3.003ms    3.227ms   
ptr-arg hashmap: Call count (managed) 1024     9.129s         8.915ms ± 537.586us    (8.113ms ... 12.996ms)       9.135ms    10.888ms   11.219ms  
val-arg hashmap: Call count (managed) 1024     9.061s         8.849ms ± 432.427us    (8.157ms ... 11.116ms)       9.07ms     10.366ms   10.511ms  
ptr-arg hashmap: Call get (unmanaged) 1024     8.828s         8.621ms ± 350.495us    (8.222ms ... 11.044ms)       8.784ms    9.912ms    10.255ms  
val-arg hashmap: Call get (unmanaged) 1024     9.03s          8.819ms ± 456.875us    (8.223ms ... 12.453ms)       9.004ms    10.3ms     11.177ms  
ptr-arg hashmap: Call get (managed) 1024     12.507s        12.214ms ± 517.908us   (11.543ms ... 18.533ms)      12.364ms   13.943ms   14.856ms  
val-arg hashmap: Call get (managed) 1024     12.413s        12.122ms ± 611.512us   (11.377ms ... 16.251ms)      12.378ms   14.292ms   15.061ms  
ptr-arg hashmap: Many calls (unmanaged) 1024     19.053s        18.607ms ± 827.405us   (17.717ms ... 23.062ms)      19.015ms   21.421ms   22.219ms  
val-arg hashmap: Many calls (unmanaged) 1024     19.445s        18.989ms ± 1.099ms     (17.742ms ... 29.752ms)      19.379ms   22.988ms   25.257ms  
ptr-arg hashmap: Many calls (managed) 1024     37.847s        36.96ms ± 2.908ms      (34.378ms ... 83.93ms)       37.64ms    44.634ms   50.221ms  
val-arg hashmap: Many calls (managed) 1024     37.67s         36.788ms ± 1.363ms     (34.591ms ... 64.526ms)      37.262ms   40.792ms   42.835ms

@Vexu Vexu merged commit c0da92f into ziglang:master May 27, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants