
Fix embargo timeout in dandelion++ #9295

Open · wants to merge 1 commit into master

Conversation

@vtnerd (Contributor) commented Apr 20, 2024

Summary

@Boog900 pointed out that the embargo duration in Dandelion++ was incorrect: it was using a Poisson distribution instead of an exponential distribution. I don't recall why I used a Poisson distribution, other than that it takes an "average" parameter, which I took to mean the average embargo timeout. This is not the distribution meant in the Dandelion++ paper.

The primary difference is that the average embargo timeout will drop from ~39s to ~7s. There shouldn't be any loss in privacy as a result of this, because the propagation time to 10 nodes is roughly 1.75s.

Additionally, @Boog900 discovered that the paper stated log but almost certainly meant ln (which also helps bring down the average fluff time).

Fluff probability

It is once again 10%, which should result in longer stem phases. Since the embargo timeout distribution is now much shorter, this shouldn't result in longer flood times.

Fallout

I'm not aware of any fingerprinting that can be done on the existing implementation. The randomized duration should still make it difficult to determine which node in the stem-set fluffed first. Perhaps @Boog900 can share some thoughts on this topic.

Fluff Timers

I reduced the average of the Poisson distribution for the fluff delay from 5s to 1s. This is an arbitrary change, but it was made due to the new reality of much shorter embargo timeouts. @Boog900, thoughts on this portion of the code? Dandelion++ doesn't really specify a randomized flush interval for fluff mode; this comes from inspecting the Bitcoin code.

Poisson Distribution

Poisson is still being used in a few places, but I am not aware of any issues right now. I will dig deeper to see if these need changing:

  • The delay when "forwarding" from i2p/tor to p2p/clearnet uses a Poisson distribution
  • The dandelion++/noise epoch has a minimum time, with a randomized Poisson-distributed duration added
  • The fluff timers use a Poisson distribution for flushing

I'm not aware of these timers violating the Dandelion++ paper (again read above about fluff timers).

Future

I expect some feedback from @Boog900 and possibly others as to the additional changes that need to be made.

@vtnerd (Contributor, Author) commented Apr 20, 2024

I should also mention that in unlucky cases where a blackhole occurs after just one hop, this could result in longer delays than with a Poisson distribution (where the overwhelming majority of values are around 39s).

@Boog900 (Contributor) commented Apr 20, 2024

I should also mention that in unlucky cases where a blackhole occurs after just one hop, this could result in longer delays than with a Poisson distribution

This does bring up an interesting point, using the exponential distribution could make it easier to estimate how many hops the transaction did before it reached the black hole.

If the attacker keeps track of the time it receives a tx and the time it takes for the tx to be broadcast, then it could calculate the probability of that happening for different numbers of hops.

For example, if the tx gets blackholed after one hop, then the average time for that tx to get diffused is 75s, whereas a tx that makes it 9 hops will have an average time of 8.3s. So if the tx takes 300s to get diffused, we can say that is much more likely to happen with 1 hop than with 9. The paper seemingly doesn't mention this.

Fallout

The problem with using the Poisson distribution is that it is not memoryless, so nodes earlier in the stem phase are slightly more likely to fluff first under a black hole attack. How much more likely? I don't know exactly, but off the top of my head I can't imagine it being significant.

Fluff Timers

I feel 1 second is too low; although the previous average was 5 seconds, it was 2.5 seconds for outgoing connections:

constexpr const fluff_duration fluff_average_out{fluff_duration{fluff_average_in} / 2};
this will change it to half a second. I would rather be on the safe side here.

@vtnerd (Contributor, Author) commented Apr 21, 2024

For example, if the tx gets blackholed after one hop, then the average time for that tx to get diffused is 75s, whereas a tx that makes it 9 hops will have an average time of 8.3s. So if the tx takes 300s to get diffused, we can say that is much more likely to happen with 1 hop than with 9. The paper seemingly doesn't mention this.

I'm wondering whether my parameters are too high; we previously lowered the parameters so that diffusion happened more quickly. Should I do the same here? The worst case is both more likely and longer than with the existing Poisson method.

This does bring up an interesting point, using the exponential distribution could make it easier to estimate how many hops the transaction did before it reached the black hole.

This doesn't reveal the origin IP address though. So I think it's still better to go with the paper here.

The problem with using the Poisson distribution is that it is not memoryless, so nodes earlier in the stem phase are slightly more likely to fluff first under a black hole attack. How much more likely? I don't know exactly, but off the top of my head I can't imagine it being significant.

Poisson distribution is also considered memoryless - but it may have different properties making it less suitable.

I feel 1 second is too low, although the previous was 5 seconds it was 2.5 for outgoing connections:

Revert back to 5 seconds? I didn't want to overlap with the blackhole timeout.

@selsta (Collaborator) commented Apr 21, 2024

In the past we had a lot of sybil nodes that were intentionally blackholing transactions, a significantly longer average time to diffusion would be bad for user experience.

I don't know if these sybil nodes are still there.

@Boog900 (Contributor) commented Apr 22, 2024

I'm wondering whether my parameters are too high; we previously lowered the parameters so that diffusion happened more quickly. Should I do the same here? The worst case is both more likely and longer than with the existing Poisson method.

I think so, especially if we have had problems with black holes in the past.

If we were to choose a time under which we would want a chosen percentage of txs to be fluffed, assuming they were immediately black holed, we could find the highest k value possible for a certain ep.

For example, if we were to say we want 90% of txs to be fluffed under 60s with ep=0.1 in a black hole attack where the tx gets dropped immediately, the highest k value we can use is 6, with ~91% of txs on average having a value less than 60s.

I think we could get away with k=8; with ep=0.1 this means our fluff probability would be 0.125. Using this value means ~85% of txs will get fluffed under 90s if they were to be immediately black holed. This is reasonable IMO, considering block time is ~2 mins and this will only affect txs which get immediately black holed.

With k=10, 70% of txs that get immediately black holed will be fluffed under 90s, and with k=9, ~78%.

This doesn't reveal the origin IP address though. So I think it's still better to go with the paper here.

True, just wanted to mention.

Poisson distribution is also considered memoryless

The time between events in a Poisson process is memoryless and can be modeled with the exponential distribution, but I don't think the Poisson distribution itself is memoryless.

Revert back to 5 seconds? I didn't want to overlap with the blackhole timeout.

I think so; I don't think overlapping is too big a concern, given how variable the output of the exponential distribution is.

@vtnerd (Contributor, Author) commented Apr 22, 2024

The new force push has the parameters recommended by @Boog900. I'm a little worried the new timeout may not be aggressive enough, but I'm leaning towards it being acceptable.

@Boog900 (Contributor) commented Apr 25, 2024

We could go lower, but 8 should be fine. More numbers:

Txs fluffed under 180s when immediately black holed:

  • k=9, 95%
  • k=8, 97.9%
  • k=7, 99.4%

This means that if an attacker managed to black hole every transaction immediately, with k=8 85% would be fluffed under 90s and ~98% under 180s. For safety we could add an upper bound on the timer, to prevent an unlucky situation.
