This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

Investigate non-throwing overflow resolution #21

Open
martinus opened this issue Apr 23, 2019 · 18 comments

@martinus
Owner

For extremely bad hashes, overflow can happen. Investigate how to resolve this gracefully (and without losing performance!)

@martinus martinus self-assigned this Apr 23, 2019
@martinus martinus added the quality label Apr 23, 2019
@felixguendling

and without losing performance

If this is not possible, I could imagine a solution that lets the user choose between pure performance (throw on overflow, as it is now) and slightly lower performance with graceful error handling. Of course, if it is possible to have both (graceful overflow handling and performance), this option is obsolete.

@asbai

asbai commented Aug 4, 2019

I think we can achieve this by sacrificing time or space efficiency only when the extreme situations described above occur. That would keep optimal performance in most cases while continuing to work under extreme conditions.

@martinus
Owner Author

martinus commented Aug 4, 2019

I use a marker that counts the distance to the original bucket. I currently throw an exception when that marker would overflow. Instead, I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead. That way there should be almost no performance penalty in the normal case, and I could still handle the extreme case.
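
A minimal sketch of that idea, with hypothetical names (kSentinel, nextDistance) rather than robin_hood's actual internals: the per-slot distance marker saturates at a reserved maximum instead of throwing, and probing past a saturated slot degrades to a plain linear scan.

#include <cstdint>
#include <stdexcept>

// Hypothetical one-byte "distance to the original bucket" marker.
// 0 would mean the slot is empty; kSentinel means "the real distance no
// longer fits, keep probing linearly past this slot".
constexpr uint8_t kSentinel = 0xFF;

inline uint8_t nextDistance(uint8_t distance, bool throwOnOverflow) {
    if (distance >= kSentinel - 1) {
        if (throwOnOverflow) {
            // current behaviour: give up with an exception
            throw std::overflow_error("probe distance marker overflowed");
        }
        // proposed behaviour: saturate at the sentinel; lookups that see it
        // lose the fast early-exit and fall back to a plain linear scan
        return kSentinel;
    }
    return static_cast<uint8_t>(distance + 1);
}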

@asbai

asbai commented Aug 5, 2019

I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead

Does this mean that in that case, all items in the container degrade into linear probing? If so, is the container likely to automatically promote itself back to robin hood hashing in the future?

@martinus
Owner Author

martinus commented Aug 8, 2019

Well, it's always robin hood hashing, even when there are lots of collisions. Collision handling is always a linear search; when you remove the colliding elements, there is less linear searching.

@asbai

asbai commented Aug 8, 2019

If I understand correctly, you mean there will be two linear search algorithms: the fast one currently used, and a slower one used to handle the overflow case (when the marker reaches its maximum), right?

So in the new implementation the fast linear search may be replaced by the slower one. Then:

  1. Does this replacement occur globally in the container, or only in the parts of it where collisions are more intense?
  2. Is there a reverse process for this replacement? I.e., if collisions in the container are no longer so intense, can the slow linear search automatically be switched back to the fast one?

@tblut

tblut commented Dec 19, 2019

Is there any update on this? I'm currently using your excellent implementation for sparse grids. The problem is that I sometimes get an overflow exception, which unfortunately forces me back to the much slower and more memory-hungry, but perfectly stable, std::unordered_map ...
I would be very happy if you added an optional setting that makes the map 100% stable. Even if it makes the map a little slower, it would still be much faster than the STL version!

@martinus
Owner Author

What hash are you using? With a reasonably good hash this should never happen. Can you post just the hash code?

@tblut

tblut commented Dec 19, 2019

I'm using your default hash function with values computed as follows:

uint32_t cellX = static_cast<uint32_t>((point.x - bounds.min.x) / bounds.size.x * gridResolution);
uint32_t cellY = static_cast<uint32_t>((point.y - bounds.min.y) / bounds.size.y * gridResolution);
uint32_t cellZ = static_cast<uint32_t>((point.z - bounds.min.z) / bounds.size.z * gridResolution);
uint32_t cellIndex = cellX | (cellY << 10) | (cellZ << 20);

auto& entry = grid[cellIndex]; // grid is a robin_hood::unordered_map<uint32_t, ...>

Edit: I just remembered that I "fixed" the exceptions by changing my key computation to the code above. Before that I used this, which triggered the exception more often:
cellIndex = cellX + cellY * gridResolution + cellZ * gridResolution * gridResolution

Also pretty interesting is that the overflow occurs when I simply copy all the elements from one map into another:

for (const auto& entry : source.grid) {
    target.grid[entry.first] = entry.second;
}

@martinus
Owner Author

Can you share a reproducer? What compiler are you using, and do you compile for 64 or 32 bit?

@tblut

tblut commented Dec 20, 2019

I wrote you an email with the link to the minimal reproducer. I'm using Visual Studio 16.4.2 and only compile for 64 bit. Thanks!

@martinus
Owner Author

Thanks for the reproducer! I could reproduce it; here's the issue:

for (const auto& entry : sourceNode.grid) {
	targetNode.grid[entry.first] = entry.second;
}

You iterate a large hashmap and insert all of its entries into the targetNode.grid map. All hashmaps use the same hash function, so you are basically iterating from the lowest to the highest hash value.

When inserting, since targetNode.grid is small, the inserted elements all collide at first (because their hashes are so small), so eventually robin_hood throws in the towel and raises an exception. std::unordered_map doesn't throw, but insertion will be slow because it boils down to a linear search.

There is no ideal way to fix this problem; in your case you can do this in PointCloudConverter.cpp:

void PointCloudConverter::MergeOctreesInternal(Node& targetNode, Node& sourceNode,
		uint32_t depth, uint32_t partitionDepth, uint32_t partitionIndex) {
	targetNode.grid.reserve(targetNode.grid.size() + sourceNode.grid.size());
	for (const auto& entry : sourceNode.grid) {
		targetNode.grid[entry.first] = entry.second;
	}

With that reserve() call the map already has plenty of capacity for all the data to insert, so it doesn't suffer from the hash collisions caused by the linear insertion order. With that change it works for me.

@martinus
Owner Author

On second thought, this is really an issue of bad hash quality. If you replace robin_hood's hash_int with this code, the issue doesn't happen either (and then there is also no need for reserve()):

inline size_t hash_int(uint64_t obj) noexcept {
#if ROBIN_HOOD(HAS_UMUL128)
    uint64_t h;
    uint64_t l = detail::umul128(obj ^ UINT64_C(0xa0761d6478bd642f), UINT64_C(0xe7037ed1a0b428db), &h);
    return h ^ l;
#elif ROBIN_HOOD(BITNESS) == 32
    uint64_t const r = obj * UINT64_C(0xca4bcaa75ec3f625);
    auto h = static_cast<uint32_t>(r >> 32U);
    auto l = static_cast<uint32_t>(r);
    return h + l;
#else
    // murmurhash 3 finalizer
    uint64_t h = obj;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccd;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53;
    h ^= h >> 33;
    return static_cast<size_t>(h);
#endif
}

@tblut

tblut commented Dec 30, 2019

I finally came around to trying your other hash function, and it seems to work for all the datasets that I have. In my opinion, a fallback to a slower linear search should still be added for absolute stability. The issue is that if my program needs to run for multiple hours, I need to be sure that it won't crash due to a very rare sequence of hashes.

@martinus
Owner Author

You are of course right, a linear search fallback would be important to have. I don't currently have the time for this, though. But I'm at least working on an improved hash.

@martinus
Owner Author

I believe I (finally) have a simple & fast reproducer:

TEST_CASE("overflow_finder_simple") {
    robin_hood::unordered_set<uint64_t> set;

    for (uint64_t i = 0; i < UINT64_C(1000000); ++i) {
        auto h = robin_hood::hash<uint64_t>{}(i);
        if (UINT64_C(0) == (h & UINT64_C(0x1ff))) {
            set.insert(i);
        }
    }
}

This shows that the lower bits of the hash are very bad, which is what can lead to the overflow.
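
A small illustration of why that matters, assuming a power-of-two capacity with bucket = hash & (capacity - 1), which is typical for open-addressing tables; the table size here is made up. Hashes whose low bits are all zero land in the same bucket regardless of their high bits, so probe distances keep growing.

#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t capacity = 512;  // hypothetical power-of-two table size
    const uint64_t hashes[] = {0x000, 0x200, 0x7fe00};  // low 9 bits all zero
    for (uint64_t h : hashes) {
        // every one of these maps to bucket 0
        std::printf("hash 0x%llx -> bucket %llu\n",
                    static_cast<unsigned long long>(h),
                    static_cast<unsigned long long>(h & (capacity - 1)));
    }
    return 0;
}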

Mitigation strategy: create an initial, very simple mixing hash that is part of the robin_hood table and basically multiplies the value with a random odd number; the result is then passed on to the robin_hood hash. When an overflow happens, add an odd constant to the mix multiplier and rehash the map (see the sketch after the list below).

Advantages/disadvantages of this approach:

  • No need to refactor the infobyte array, which would cause some slowdown.
  • Only a single multiplication is needed at hashing => not much performance slowdown.
  • Allows using the hash's address as the initial multiplier, to be a bit more resilient against overflow attacks.
  • Rehashing needs to be implemented, but it can probably be done as a copy of the map.
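
A minimal sketch of that pre-mixing idea, under stated assumptions: MixedHash and rebuildOnOverflow are hypothetical names, and std::unordered_map merely stands in for the robin_hood table, since the real change would live inside it.

#include <cstdint>
#include <cstddef>
#include <functional>
#include <unordered_map>  // stand-in for the robin_hood table in this sketch

// Wraps the regular hash with one extra multiplication by an odd constant.
struct MixedHash {
    uint64_t mix = 0xa24baed4963ee407ULL;  // any odd starting constant
    std::size_t operator()(uint64_t key) const {
        return std::hash<uint64_t>{}(key * mix);  // single multiply in the hot path
    }
};

// On overflow: bump the multiplier (keeping it odd) and rebuild by copying.
template <typename V>
void rebuildOnOverflow(std::unordered_map<uint64_t, V, MixedHash>& map) {
    MixedHash newHash;
    newHash.mix = map.hash_function().mix + 2;  // +2 keeps the multiplier odd
    std::unordered_map<uint64_t, V, MixedHash> copy(map.bucket_count(), newHash);
    for (const auto& kv : map) {
        copy.insert(kv);
    }
    map = std::move(copy);
}

The common case pays only for the extra multiplication in operator(); the copy-based rehash runs only when an overflow is actually detected.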

@roncapat

roncapat commented Jun 4, 2020

Sorry to step in, but I have trouble with hashing grid indices (so, two ints; I'm working on a discrete spatial grid). Should I be able to fix the error by swapping in the hash_int proposed above? @martinus

I get map overflow exceptions. I tried the above solution, but it seems to slow down my algorithm.

Do you know of more specific and efficient hash functions for a continuous two-dimensional range of integers (AKA grid coordinates)?

@cculianu

cculianu commented Dec 9, 2020

I have been getting this exception extremely rarely when inserting sha256-hashed data -- truly random data, really -- into the hash table (I just take the first 64 bits of the 256-bit number)!

Is this still a known issue? I have seen it with so-called "good" hash functions, but extremely rarely.
