
Fix embargo timeout in dandelion++ #9295

Open · wants to merge 1 commit into master

Conversation

@vtnerd (Contributor) commented Apr 20, 2024

Summary

@Boog900 pointed out that the embargo duration in Dandelion++ was incorrect: it was using a Poisson distribution instead of an exponential distribution. I don't recall why I used a Poisson distribution, other than that it takes an "average" parameter, which I took to mean the average embargo timeout. This is not the distribution meant in the Dandelion++ paper.

The primary difference is that the average embargo timeout will drop from ~39s to ~7s. There shouldn't be any loss in privacy as a result of this, because the propagation time to 10 nodes is roughly 1.75s.

Additionally, @Boog900 discovered that the paper stated log but almost certainly meant ln (which also helps bring down the average fluff time).

Fluff probability

It is once again 10%, which should result in longer stem phases. Since the embargo timeout distribution is now much shorter, this shouldn't result in longer flood times.

Fallout

I'm not aware of any fingerprinting that can be done on the existing implementation. The randomized duration should still make it difficult to determine which node in the stem-set fluffed first. Perhaps @Boog900 can share some thoughts on this topic.

Fluff Timers

I reduced the average of the Poisson distribution for the fluff delay from 5s to 1s. This is an arbitrary change, but it was made due to the new reality of much shorter embargo timeouts. @Boog900, thoughts on this portion of the code? Dandelion++ doesn't really specify a randomized flush interval for fluff mode; this comes from inspecting the Bitcoin code.

Poisson Distribution

Poisson is still being used in a few places, but I am not aware of any issues right now. I will dig deeper to see if these need changing:

  • The delay when "forwarding" from i2p/tor to p2p/clearnet uses a Poisson distribution
  • The dandelion++/noise epoch has a minimum time, with a randomized Poisson-distributed duration added
  • The fluff timers use a Poisson distribution for flushing

I'm not aware of these timers violating the Dandelion++ paper (again read above about fluff timers).

Future

I expect some feedback from @Boog900 and possibly others as to the additional changes that need to be made.

@vtnerd (Contributor, Author) commented Apr 20, 2024

I should also mention that in unlucky cases where a blackhole occurs after just one hop, this could result in longer delays than with a Poisson distribution (where the overwhelming majority of values are around 39s).

@Boog900 (Contributor) commented Apr 20, 2024

I should also mention that in unlucky cases where a blackhole occurs after just one hop, this could result in longer delays than with a Poisson distribution

This does bring up an interesting point, using the exponential distribution could make it easier to estimate how many hops the transaction did before it reached the black hole.

If the attacker keeps track of the time it receives a tx and the time it takes for the tx to be broadcast, then it could calculate the probability of that happening for different numbers of hops.

For example, if the tx gets blackholed after one hop, then the average time for that tx to get diffused is 75s, whereas a tx that makes it 9 hops will have an average time of 8.3s. So if the tx takes 300s to get diffused, we can say that is much more likely to happen with 1 hop than with 9. The paper seemingly doesn't mention this.

Fallout

The problem with using the Poisson distribution is that it is not memoryless, so nodes earlier in the stem phase are slightly more likely to fluff first under a black hole attack. How much more likely? I don't know exactly, but off the top of my head I can't imagine it being significant.

Fluff Timers

I feel 1 second is too low; although the previous average was 5 seconds, it was 2.5 seconds for outgoing connections:

constexpr const fluff_duration fluff_average_out{fluff_duration{fluff_average_in} / 2};
this will change it to half a second. I would rather be on the safe side here.

@vtnerd (Contributor, Author) commented Apr 21, 2024

For example, if the tx gets blackholed after one hop, then the average time for that tx to get diffused is 75s, whereas a tx that makes it 9 hops will have an average time of 8.3s. So if the tx takes 300s to get diffused, we can say that is much more likely to happen with 1 hop than with 9. The paper seemingly doesn't mention this.

I'm wondering whether my parameters are too high; we previously lowered the parameters so that diffusion happened more quickly. Should I do the same here? The worst case is both more likely and longer than with the existing Poisson method.

This does bring up an interesting point, using the exponential distribution could make it easier to estimate how many hops the transaction did before it reached the black hole.

This doesn't reveal the origin IP address though. So I think it's still better to go with the paper here.

The problem with using the Poisson distribution is that it is not memoryless, so nodes earlier in the stem phase are slightly more likely to fluff first under a black hole attack. How much more likely? I don't know exactly, but off the top of my head I can't imagine it being significant.

Poisson distribution is also considered memoryless - but it may have different properties making it less suitable.

I feel 1 second is too low, although the previous was 5 seconds it was 2.5 for outgoing connections:

Revert back to 5 seconds? I didn't want to overlap with the blackhole timeout.

@selsta (Collaborator) commented Apr 21, 2024

In the past we had a lot of sybil nodes that were intentionally blackholing transactions, a significantly longer average time to diffusion would be bad for user experience.

I don't know if these sybil nodes are still there.

@Boog900 (Contributor) commented Apr 22, 2024

I'm wondering whether my parameters are too high; we previously lowered the parameters so that diffusion happened more quickly. Should I do the same here? The worst case is both more likely and longer than with the existing Poisson method.

I think so, especially if we have had problems with black holes in the past.

If we were to choose a time under which we would want a chosen percentage of txs to be fluffed, assuming they were immediately black holed, we could find the highest k value possible for a certain ep.

For example, if we were to say we want 90% of txs to be fluffed under 60s with ep=0.1 in a black hole attack where the tx gets dropped immediately, the highest k value we can use is 6, with ~91% of txs on average having a value less than 60s.

I think we could get away with k=8; with ep=0.1 this means our fluff probability would be 0.125. Using this value means ~85% of txs will get fluffed under 90s if they were to be immediately black holed. This is reasonable IMO, considering block time is ~2 mins and this will only affect txs which get immediately black holed.

With k=10, 70% of txs that get immediately black holed will be fluffed under 90s, and with k=9, ~78%.

This doesn't reveal the origin IP address though. So I think it's still better to go with the paper here.

True, just wanted to mention.

Poisson distribution is also considered memoryless

The time between events in a Poisson process is memoryless and can be modeled with the exponential distribution, but I don't think the Poisson distribution itself is memoryless.

Revert back to 5 seconds? I didn't want to overlap with the blackhole timeout.

I think so; I don't think overlapping is too big a concern, given how variable the output of the exponential distribution is.

@vtnerd (Contributor, Author) commented Apr 22, 2024

The new force push has the parameters recommended by @Boog900. I'm a little worried the new timeout may not be aggressive enough, but I'm leaning towards it being acceptable.

@Boog900 (Contributor) commented Apr 25, 2024

We could go lower, but 8 should be fine. More numbers:

Txs fluffed under 180s when immediately black holed:

  • k=9, 95%
  • k=8, 97.9%
  • k=7, 99.4%

This means that if an attacker managed to black hole every transaction immediately, with k=8 85% would be fluffed under 90s and ~98% under 180s. For safety we could add an upper bound on the timer, to prevent an unlucky situation.
