Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce custom hash table data structures. #3940

Merged
merged 23 commits into from
Jun 8, 2024

Conversation

chandlerc
Copy link
Contributor

@chandlerc chandlerc commented May 7, 2024

The hash table design is heavily based on Abseil's "Swiss Tables" design. It uses an array of bytes storing metadata about each entry and an array of entries where each is a pair of key and value. The metadata byte consists of 7-bits of hash of the key (distinct from the bits used to index the table), and one bit indicating the presence of a special entry -- either empty or deleted.

There are a large range of optimizations and other nuanced aspects of this hash table design and implementation, a good point to understand that context is raw_hashtable.h which has an overview of the design and references to various other files for relevant details.

@chandlerc chandlerc requested a review from josh11b May 7, 2024 10:21
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to give feedback on map.h early, before you update set.h

common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/hashing.h Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, definitely useful to get map.h tidied up before I port it to set.h. I think I've responded to all the comments there.

common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/hashing.h Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sending what comments I have so far.

common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback so far, PTAL.

common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/map.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a pass through raw_hashtable.h

common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL!

Beyond inline replies, also was able to simplify the specialization strategy for the table type and improved some other comments spotted as I was working.

Also, think this is far enough along in review I'm moving it out of draft.

common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL!

Beyond inline replies, also was able to simplify the specialization strategy for the table type and improved some other comments spotted as I was working.

Also, think this is far enough along in review I'm moving it out of draft.

@chandlerc chandlerc marked this pull request as ready for review May 24, 2024 01:42
@chandlerc chandlerc requested a review from josh11b May 24, 2024 01:42
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
The hash table design is heavily based on Abseil's ["Swiss
Tables"][swiss-tables] design. It uses an array of bytes storing
metadata about each entry and an array of entries where each is a pair
of key and value. The metadata byte consists of 7-bits of hash of the
key (distinct from the bits used to index the table), and one bit
indicating the presence of a special entry -- either empty or deleted.

[swiss-tables]: https://abseil.io/about/design/swisstables

See the comments in `raw_hashtable.h` for a detailed overview of the
design.

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
@chandlerc
Copy link
Contributor Author

So, I'm most of the way through adding really nice support for handling stateful keys like indices into vectors and such. It turned out to be necessary to even move the toolchain over to these data structures as it also lets us use something other than operator== for equality, and we need that for APFloat keys. There are a few interesting things that emerged:

  • I've ended up removing unnecessary functionality that also makes the stateful case unimplementable.

  • I've found and cleaned up a number of innocuous inconsistencies in the code.

  • I've been able to implement the test you asked for where we control the hashing and force every key to collide. I can implement even more of these if desired.

Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way.

@josh11b
Copy link
Contributor

josh11b commented May 28, 2024

So, I'm most of the way through adding really nice support for handling stateful keys like indices into vectors and such. It turned out to be necessary to even move the toolchain over to these data structures as it also lets us use something other than operator== for equality, and we need that for APFloat keys. There are a few interesting things that emerged:

  • I've ended up removing unnecessary functionality that also makes the stateful case unimplementable.
  • I've found and cleaned up a number of innocuous inconsistencies in the code.
  • I've been able to implement the test you asked for where we control the hashing and force every key to collide. I can implement even more of these if desired.

Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way.

I'm fine either way. If it simplifies the files haven't reviewed yet, it would be a benefit.

This allows customizing both the hash and equality comparison. It also
does so in a way that enables stateful key contexts such as indices into
a specific vector. This works by requiring the context to be packaged on
every API accepting a key. When the context is stateless and can be
default constructed, this works via a default argument. When it is
stateful, the argument must be provided explicitly.

This allows us to add a stress test that forces colliding hashes in
egregious ways, as well as allowing the staeful hashing. There are also
a number of basic cleanups to code that were uncovered while working on
this.
@chandlerc
Copy link
Contributor Author

So, I'm most of the way through adding really nice support for handling stateful keys like indices into vectors and such. It turned out to be necessary to even move the toolchain over to these data structures as it also lets us use something other than operator== for equality, and we need that for APFloat keys. There are a few interesting things that emerged:

  • I've ended up removing unnecessary functionality that also makes the stateful case unimplementable.
  • I've found and cleaned up a number of innocuous inconsistencies in the code.
  • I've been able to implement the test you asked for where we control the hashing and force every key to collide. I can implement even more of these if desired.

Wanted to both let you know @josh11b about my progress there, and also ask whether you'd like me to merge this into this code review or keep it as a follow-up. I can easily manage either way.

I'm fine either way. If it simplifies the files haven't reviewed yet, it would be a benefit.

After discussion, it seems a bit borderline with pros and cons, but on the whole probably simplifies the review. Added and pushed. I did make it a clean follow-up commit so that it is easy to see the diff: 807de9e (#3940)

common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable.h Show resolved Hide resolved
common/raw_hashtable.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a pass on raw_hashtable_metadata_group.h

common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect many of the comments about set.h also apply to map.h

common/set.h Show resolved Hide resolved
common/map.h Show resolved Hide resolved
common/set.h Outdated Show resolved Hide resolved
common/map.h Show resolved Hide resolved
common/set.h Outdated Show resolved Hide resolved
common/set.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through set_test.cpp and started on map_test.cpp

common/set_test.cpp Outdated Show resolved Hide resolved
common/set_test.cpp Show resolved Hide resolved
common/set_test.cpp Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now looked at everything except for files with names containing "benchmark"

common/map_test.cpp Show resolved Hide resolved
common/map_test.cpp Outdated Show resolved Hide resolved
common/map_test.cpp Outdated Show resolved Hide resolved
common/map_test.cpp Show resolved Hide resolved
common/map_test.cpp Outdated Show resolved Hide resolved
common/map_test.cpp Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished a complete pass

common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/map_benchmark.cpp Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Show resolved Hide resolved
common/map_benchmark.cpp Show resolved Hide resolved
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
chandlerc and others added 4 commits June 7, 2024 01:56
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed most of the metadata_group stuff. Sorry my search-and-replace for mask with bits missed so much, I think I must have messed up that search really badly. =[

Still need to go through comments on other benchmark files, but figured I'd post these updates.

common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.h Outdated Show resolved Hide resolved
common/raw_hashtable_benchmark_helpers.cpp Show resolved Hide resolved
common/raw_hashtable_metadata_group_benchmark.cpp Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group_benchmark.cpp Outdated Show resolved Hide resolved
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
common/map_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@chandlerc chandlerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I got all the benchmark comments now as well as a collection of other comments I had missed that were still open. PTAL!

common/map_benchmark.cpp Show resolved Hide resolved
common/map_benchmark.cpp Show resolved Hide resolved
common/set_benchmark.cpp Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Outdated Show resolved Hide resolved
common/set_benchmark.cpp Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
@chandlerc chandlerc requested a review from josh11b June 7, 2024 18:05
common/raw_hashtable_benchmark_helpers.cpp Show resolved Hide resolved
common/raw_hashtable_metadata_group_benchmark.cpp Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a "TODO" in the PR summary, though hopefully that has been largely addressed by now.

Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
common/raw_hashtable_metadata_group.h Outdated Show resolved Hide resolved
@chandlerc chandlerc requested a review from josh11b June 8, 2024 01:01
@josh11b josh11b enabled auto-merge June 8, 2024 01:37
@chandlerc
Copy link
Contributor Author

Huge thanks to Josh for all the review on this ... really extensive PR. Also thanks to all the folks who gave feedback on the design or approach or discussed!

@josh11b josh11b added this pull request to the merge queue Jun 8, 2024
Merged via the queue into carbon-language:trunk with commit 21a81bc Jun 8, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants