
A lot of 150/2 transactions in the txpool causes memory spike / OOM #9317

Open
selsta opened this issue May 3, 2024 · 34 comments

@selsta
Collaborator

selsta commented May 3, 2024

I have noticed a couple of times now that monerod gets OOM-killed when there are a lot of 150/2 transactions in the txpool. This happened on VPSes with both 8GB and 16GB of RAM.

@thilool

thilool commented May 3, 2024

I had a similar issue, which has now happened for the second time since I upgraded from v0.18.3.2 to v0.18.3.3 (looks more like issue #9315).
I run a 32GB RAM standalone server on a fiber connection with an SSD, so I would rule out any hardware limitations.

here is what monerod shows:
2024-04-28 14:51:07.835 I [batch] DB resize needed
2024-04-28 14:51:08.435 I LMDB Mapsize increased. Old: 215675MiB, New: 216699MiB
2024-04-28 15:23:50.517 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3137335
2024-04-28 15:23:50.518 I id: <13f334bd761b12b8d382c7a9f2fed8708c198270906ba82621ee990740245fa6>
2024-04-28 15:23:50.518 I PoW: <20a551bdbc8ac6a1744da38033033e53e6fd297915c9bd77845e940200000000>
2024-04-28 15:23:50.518 I difficulty: 257613739237
2024-04-29 08:17:34.150 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3137818
2024-04-29 08:17:34.154 I id: <6d056d352284efea4a3f21a46d6fbcb98194fc0ab994c470626cfd9737e6b946>
2024-04-29 08:17:34.154 I PoW:
2024-04-29 08:17:34.155 I difficulty: 239388303636
2024-04-29 14:21:49.674 E Failed to parse transaction from blob
2024-04-29 18:05:08.287 E Failed to parse transaction from blob
2024-04-30 06:11:25.791 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3138489
2024-04-30 06:11:25.797 I id:
2024-04-30 06:11:25.797 I PoW: <3fa592639c5da5a00ce082f0bb99efaa33a4e819db90a9994587e30100000000>
2024-04-30 06:11:25.797 I difficulty: 253900650451
2024-05-01 04:07:39.429 W ge_frombytes_vartime failed at 457
2024-05-01 08:55:18.591 I [69.234.65.55:46556 INC] Sync data returned a new top block candidate: 3139275 -> 3139276 [Your node is 1 blocks (2.0 minutes) behind]
2024-05-01 08:55:18.593 I SYNCHRONIZATION started
2024-05-01 13:31:16.704 W ge_frombytes_vartime failed at 457
2024-05-01 19:51:21.045 I Recalculating difficulties from height 3102800 to height 3139634
2024-05-02 05:34:52.597 E Failed to parse transaction from blob
2024-05-02 05:58:59.706 E Failed to parse transaction from blob
2024-05-02 05:59:21.923 E Failed to parse transaction from blob
2024-05-02 06:01:45.093 W ge_frombytes_vartime failed at 457
2024-05-02 06:05:17.435 E Failed to parse transaction from blob
Killed

Hopefully this helps. Something in the latest version seems to cause this issue.
The version v0.18.3.2 was running for months straight. This is now the second crash within a week.

Greetings!

@selsta
Collaborator Author

selsta commented May 3, 2024

Something in the latest version seems to cause this issue. The version v0.18.3.2 was running for months straight. This is now the second crash within a week.

This is not related to the latest version. The reason this only happened in the last week is that, before then, no one had ever added a ton of 150/2 transactions to the txpool.

@thilool

thilool commented May 3, 2024

What exactly are 150/2 transactions?

Can you point me to any documentation?

Thanks!

@selsta
Collaborator Author

selsta commented May 3, 2024

150/2 means transactions with 150 inputs and 2 outputs. It's basically someone consolidating a lot of inputs. Such transactions are large and take a while to verify. Someone sent a lot of these at the same time, which caused the txpool to fill up.
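
For a rough sense of scale, a back-of-the-envelope estimate (the per-input, per-output and overhead byte figures below are assumptions for illustration, not measured values):

#include <cstdint>
#include <iostream>

int main()
{
  // Assumed sizes, roughly: ring offsets + key image + signature per input,
  // output key + commitment per output, range proof + header as overhead.
  const std::uint64_t inputs = 150, outputs = 2;
  const std::uint64_t bytes_per_input = 700, bytes_per_output = 100, overhead = 2000;
  const std::uint64_t tx_bytes = inputs * bytes_per_input + outputs * bytes_per_output + overhead;
  std::cout << "one 150/2 tx: ~" << tx_bytes / 1024.0 << " KiB\n";
  std::cout << "1000 such txs: ~" << (1000 * tx_bytes) / (1024.0 * 1024.0) << " MiB\n";
}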

@thilool

thilool commented May 3, 2024

When you say 'takes time to verify', would a server that is also used for mining benefit from leaving a few extra threads available for such transaction spikes? Or is the CPU not really the bottleneck, but rather RAM or SSD?

Interestingly, a manual restart of monerod always works without any problems: it syncs back up and runs as if nothing ever happened.
Is there an option to execute an automated restart in case it crashes?

@Column01

Column01 commented May 3, 2024

I just checked my node today and it reported being 2 months ahead after the issue I had that was linked here.

2024-05-03 01:14:10.069	    7ed3293fe6c0	INFO	msgwriter	src/common/scoped_message_writer.h:102	Height: 3140496/3140496 (100.0%) on mainnet, not mining, net hash 2.03 GH/s, v16, 32(out)+63(in) connections, uptime 0d 5h 30m 27s
2024-05-03 03:20:49.070	[P2P9]	INFO	global	src/cryptonote_protocol/cryptonote_protocol_handler.inl:409	[94.16.112.22:45386 INC] Sync data returned a new top block candidate: 3140561 -> 3097104 [Your node is 43457 blocks (2.0 months) ahead] 

This is the output of diff

2024-05-03 16:23:55.530	    7ed3293fe6c0	INFO	msgwriter	src/common/scoped_message_writer.h:102	BH: 3140966, TH: bfe1f2f2add49f50204d41ec0c9aa55d12d52fbfc7dd651f9e54071c5f76dd77, DIFF: 246857749344, CUM_DIFF: 350313291089291902, HR: 2057147911 H/s

@selsta
Collaborator Author

selsta commented May 3, 2024

@Column01 You can ignore that message; it means a different peer claimed that 3097104 is the correct height, which isn't true. Your node doesn't have any issues.

@selsta
Collaborator Author

selsta commented May 3, 2024

When you say 'takes time to verify', would a server that is also used for mining benefit from leaving a few extra threads available for such transaction spikes? Or is the CPU not really the bottleneck, but rather RAM or SSD?

Leaving a couple threads available is recommended for good node performance, but I don't think it would have helped in this specific case.

Is there an option to execute an automated restart in case it crashed?

Nothing built in, but you can use, for example, systemd for this.
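
For example, a minimal systemd unit along these lines will restart monerod automatically if it gets killed (binary path, user and flags are placeholders, adjust to your setup):

[Unit]
Description=Monero daemon
After=network-online.target

[Service]
User=monero
ExecStart=/usr/local/bin/monerod --non-interactive
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target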

@HardenedSteel

HardenedSteel commented May 3, 2024

Some people reported the same issue on Reddit; copy & pasting it here:

During this attack that has been going on for about 3 hours, my node (v0.18.3.3) has been killed twice by Linux OOM killer, due to too high memory usage.

It seems someone has found out how to make nodes use massive amounts of memory.

can you make a bug report? https://github.com/monero-project/monero

I have nothing concrete to write in a bug report, other than the fact that my node consumed ~14GB more memory than usual, was killed by Linux OOM-killer and then was restarted by systemd, due to my unit being configured with "Restart=always".
I have no idea how it was triggered or if it would've even been a problem on a host with more memory available.

Mine was killed last night too

@Column01

Column01 commented May 3, 2024

Someone reported the same issue on Reddit

That is very interesting indeed. I do not think this is a coincidence, then

@thilool

thilool commented May 4, 2024

I will adjust the monerod process priority based on this documentation and see what happens.

https://www.baeldung.com/linux/memory-overcommitment-oom-killer

@thilool

thilool commented May 4, 2024

Most of my processes are running with an oom_score of '666'.
monerod was running above 700, so it was one of the first candidates to be killed.

I adjusted the oom_score_adj to -700 (sudo choom -p MONEROD_PROCESSID -n -700), which made it number 2 in line after the unkillable processes with an oom_score of 0.

If it gets killed now with 32GB of RAM we have a much bigger issue concerning this attack vector.

I still cannot grasp how so much memory can be used in this case. Can anyone point me to some simple math on how we get into the 10+GB range of memory usage?
Thanks!

@Boog900
Contributor

Boog900 commented May 5, 2024

This could be because of how txs are relayed.

When we receive txs in the fluff stage, we add them to our fluff queue and set a timer on each connection except the one that sent the txs to us. Then when the timer is up we make a message and the txs get sent on that connection:

for (auto& connection : connections)
{
  std::sort(connection.first.begin(), connection.first.end()); // don't leak receive order
  make_payload_send_txs(*zone_->p2p, std::move(connection.first), connection.second, zone_->pad_txs, true);
}

bool make_payload_send_txs(connections& p2p, std::vector<blobdata>&& txs, const boost::uuids::uuid& destination, const bool pad, const bool fluff)
{
  epee::byte_slice blob = make_tx_message(std::move(txs), pad, fluff).finalize_notify(NOTIFY_NEW_TRANSACTIONS::ID);
  return p2p.send(std::move(blob), destination);
}

So it seems that for every tx that's broadcasted, the bytes of the tx are copied for each connection that receives the tx. So if we have 96 connections, then each tx will be fluffed to 95 peers (1 of them sent the tx to us).

It's easy to see how this can add up: 10GB / 95 = ~108MB which means we only need around 108MB of txs for this to potentially use 10GB with 96 connections (I don't know how big the tx pool was when the crashes happened).
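
As an illustration of that arithmetic, a rough worst case only, assuming every connection ends up holding its own full copy of the queued tx blobs at the same time:

#include <iostream>

int main()
{
  const double queued_tx_mib = 108.0; // ~108 MiB of queued txs, as in the estimate above
  const int connections = 96;         // example peer count; one peer sent us the txs
  const double worst_case_gib = queued_tx_mib * (connections - 1) / 1024.0;
  std::cout << "worst case: ~" << worst_case_gib << " GiB of duplicated blobs\n"; // ~10 GiB
}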

It's more nuanced than I laid out there: each connection has a randomized timer, so not all 95 will fluff at once, for example, and I haven't actually tested this, so I might be missing something.

@vtnerd what do you think?

@selsta
Collaborator Author

selsta commented May 6, 2024

@Boog900 The txpool reached around 100MB, but those transactions got propagated over the course of about an hour; they didn't get submitted all at once.

@Jayd603

Jayd603 commented May 7, 2024

[1842267.939549] Out of memory: Killed process 37137 (monerod) total-vm:292086252kB, anon-rss:57011624kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:507476kB oom_score_adj:0

How much memory do I need? sheesh. : - )

@thilool

thilool commented May 8, 2024

Do I see correctly that your VM killed the process with 292GB available? Or did monerod use that much memory?
Did you run other memory-intensive processes and monerod was just the first to be killed?

Obviously seems to be a more widespread issue.

Is there anything I can do to help? Logging certain processes, for example, to help isolate the problem or find a fix?

@selsta
Collaborator Author

selsta commented May 8, 2024

@thilool @Jayd603 how many incoming / outgoing connections do you approximately have?

I'm curious if there was anyone who had this issue with no incoming connections.

@thilool

thilool commented May 9, 2024

Currently I am running around 40-60(out) and 100-150(in).
My OOM_adj seems to have worked so far, and now I see monerod sitting at 10.2GB of memory usage. I'll keep you posted on that.

monerod managed to get through 2 failed 'parse transaction from blob' errors, which before was often the trigger for an OOM kill.

see the status below:

2024-05-06 21:42:51.443 I difficulty: 271257517664
2024-05-06 21:42:51.499 I ###### REORGANIZE on height: 3143297 of 3143297 with cum_difficulty 350902947730358755
2024-05-06 21:42:51.499 I alternative blockchain size: 2 with cum_difficulty 350903219324713192
2024-05-06 21:42:52.986 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3143297
2024-05-06 21:42:52.986 I id: <7a53b19e2e794c2d012cbfb5e7ee79830668d893c9f4fe660065c76f5fd8cae8>
2024-05-06 21:42:52.986 I PoW:
2024-05-06 21:42:52.986 I difficulty: 271257517664
2024-05-06 21:42:54.319 I REORGANIZE SUCCESS! on height: 3143297, new blockchain size: 3143299
2024-05-06 21:42:54.372 I Synced 3143299/3143299
2024-05-06 22:27:54.860 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3143314
2024-05-06 22:27:54.860 I id: <74b6e103c62b0c44389a2c175fbc8cd9b31b33f3d692930963e883197d0d7814>
2024-05-06 22:27:54.860 I PoW:
2024-05-06 22:27:54.860 I difficulty: 266444980508
2024-05-07 10:47:23.348 E Failed to parse transaction from blob
2024-05-07 15:19:42.640 E Failed to parse transaction from blob
2024-05-07 23:42:56.943 I [batch] DB resize needed
2024-05-07 23:42:57.534 I LMDB Mapsize increased. Old: 216699MiB, New: 217723MiB
2024-05-08 14:42:51.569 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3144474
2024-05-08 14:42:51.575 I id:
2024-05-08 14:42:51.575 I PoW:
2024-05-08 14:42:51.575 I difficulty: 235033673331
2024-05-08 14:44:57.949 I ###### REORGANIZE on height: 3144474 of 3144474 with cum_difficulty 351195806642158375
2024-05-08 14:44:57.949 I alternative blockchain size: 2 with cum_difficulty 351196041538715801
2024-05-08 14:44:58.442 I ----- BLOCK ADDED AS ALTERNATIVE ON HEIGHT 3144474
2024-05-08 14:44:58.442 I id: <15d5f24014d151a542c2b27c5a78d0d89bb853a06aeabc23c7a042593d990dc0>
2024-05-08 14:44:58.442 I PoW: <8d2495dd5ba535538b17c3ced5b515ea43737744285ce21cbeaae00200000000>
2024-05-08 14:44:58.442 I difficulty: 235033673331
2024-05-08 14:44:59.533 I REORGANIZE SUCCESS! on height: 3144474, new blockchain size: 3144476
2024-05-09 01:21:02.822 W Attempt to get block from height 3144803 failed -- block not in db
2024-05-09 06:47:44.252 E Transaction not found in pool
status
Height: 3145017/3145017 (100.0%) on mainnet, smart mining at 0 H/s, net hash 1.98 GH/s, v16, 43(out)+112(in) connections, uptime 5d 16h 18m 6s

@selsta
Collaborator Author

selsta commented May 9, 2024

monerod managed to get through 2 failed 'parse transaction from blob' errors, which before was often the trigger for an OOM kill.

"parse transaction from blob" is unrelated

@Jayd603

Jayd603 commented May 9, 2024

I had 64GB RAM in a VM; monerod was running under Docker, got OOM killed, and the Docker container wasn't set to auto-restart. Now it is, so at least when it crashes it can restart itself.

@Jayd603

Jayd603 commented May 10, 2024

@thilool @Jayd603 how many incoming / outgoing connections do you approximately have?

I'm curious if there was anyone who had this issue with no incoming connections.

root@ce015ec34800:~/xmr# netstat -na | egrep -c EST
1224

I've seen it a good amount higher.

@chaserene

I'm curious if there was anyone who had this issue with no incoming connections.

@selsta I had an unusually low number of incoming connections during the May 2 consolidation flood, which couldn't be helped even with multiple restarts and a good number of --add-priority-nodes. My node was syncing so slowly that it looked like it had stopped syncing. Meanwhile, monerod's resource utilization was ridiculously low (350 MB RAM, 0-1% CPU), so it's possible that for some reason I couldn't connect to any good peers. Some people suggested that my node may have been stuck on a bad block, so I popped the height back to before the flood, but this didn't help either. So I just waited and eventually it synced. Once it was syncing properly, the peer count was also healthy again.

While stalled, my txpool was empty the whole time, and the best known height alternated between my local height and the actual height I saw on block explorers.

@selsta
Collaborator Author

selsta commented May 11, 2024

@chaserene What about the amount of incoming and outgoing connections before your node "crashed" / got killed for the first time, not afterwards?

@spackle-xmr

Is limiting in/out peers the best potential recommendation for node operators at this time?
On that topic, can anyone confirm that lowering --max-txpool-weight from the default will cause disconnections?

@selsta
Collaborator Author

selsta commented May 11, 2024

@spackle-xmr Limiting in / out is currently a theory; it hasn't been confirmed to work yet.

On that topic, can anyone confirm that lowering --max-txpool-weight from the default will cause disconnections?

I have seen that behaviour, yes. I had trouble getting incoming connections.

@Boog900
Contributor

Boog900 commented May 12, 2024

The txpool reached around 100MB, but those transactions got propagated over the course of about an hour; they didn't get submitted all at once.

Ah ok, transactions are periodically re-broadcasted after being in the txpool for too long, so this could still happen if a lot of tx re-broadcasts line up.

On that topic, can anyone confirm that lowering --max-txpool-weight from the default will cause disconnections?

I have seen that behaviour, yes. I had trouble getting incoming connections.

I mentioned this in -dev a while ago, but I'll put it here again for visibility. I think this is also because of how txs are relayed.

If you receive a tx, add it to your fluff queue, drop the tx from the txpool, then receive it again, it is added to the fluff queue a second time. When the fluff timer fires we would then broadcast a message with the same tx twice causing the peer to disconnect.

I have tested this and I have managed to make monerod broadcast a message with the same tx twice.

It makes sense that incoming connections are affected more as they have a longer average fluff timer, 5 seconds instead of 2.5 seconds for outbound. Longer time to receive the tx again and add it to the queue.

@selsta
Collaborator Author

selsta commented May 12, 2024

@Boog900

Ah ok, transactions are periodically re-broadcasted after being in the txpool for too long, so this could still happen if a lot of tx re-broadcasts line up.

The txpool was empty again after around 12h, would you hat cause re-broadcasts? Also any ideas to improve efficiency here?

If you receive a tx, add it to your fluff queue, drop the tx from the txpool, then receive it again, it is added to the fluff queue a second time. When the fluff timer fires we would then broadcast a message with the same tx twice causing the peer to disconnect.

Would it be possible to add some check to the fluff queue to disallow duplicate transactions?

It makes sense that incoming connections are affected more as they have a longer average fluff timer, 5 seconds instead of 2.5 seconds for outbound. Longer time to receive the tx again and add it to the queue.

I had trouble keeping both outgoing and incoming connections. I'm not sure if incoming had more issues, but what you are saying seems plausible.

@Boog900
Contributor

Boog900 commented May 12, 2024

The txpool was empty again after around 12h, would you hat cause re-broadcasts?

sorry, I don't know what you mean.

Also any ideas to improve efficiency here?

Yes, changing how txs are broadcasted around the network. Instead of sending the whole tx blob to every connection (except the one that sent the tx to us first), send the tx hash, and the peer can send a request for that tx if it doesn't already have it.

This would require a couple of new p2p messages; adding support for these can be done before the changes to how txs are broadcasted, which is how I would recommend doing it.

I am planning to make a proposal for this in a separate issue as I am pretty sure this would significantly reduce the amount of bandwidth used for nodes but I want to test the amount of bandwidth wasted first to see just how much we can save before I do.

For people interested, the way I am planning to test how much bandwidth is wasted is by recording the data sent and received by monerod in a fully synced state. Then, for each link, I will monitor the txs sent along it; if the same tx is sent twice (in either direction), both tx messages will count as wasted, as it means the first of the 2 messages was not the first time the node received that tx.
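
As a sketch of that waste measurement (illustrative only, not monerod code): per link, once a tx id is seen a second time in either direction, both that copy and the original copy count as wasted bytes.

#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <string>

struct link_stats
{
  std::set<std::string> seen;                       // tx ids observed on this link
  std::map<std::string, std::uint64_t> first_size;  // size of the first copy, not yet counted
  std::uint64_t wasted = 0;

  void on_tx(const std::string& id, std::uint64_t bytes)
  {
    if (seen.insert(id).second) { first_size[id] = bytes; return; }               // first sighting
    auto it = first_size.find(id);
    if (it != first_size.end()) { wasted += it->second; first_size.erase(it); }   // count the first copy retroactively
    wasted += bytes;                                                              // every duplicate copy counts
  }
};

int main()
{
  link_stats link;
  link.on_tx("tx1", 100000);  // first copy: not counted yet
  link.on_tx("tx1", 100000);  // duplicate: both copies now count
  std::cout << "wasted bytes: " << link.wasted << "\n";  // prints 200000
}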

Would it be possible to add some check to the fluff queue to disallow duplicate transactions?

Probably, but the queue is kept as raw bytes, so we would need to keep a separate list of the tx ids in the queue or compare the raw bytes of every tx in the queue to the one we are about to broadcast.

If we adopted the new way to broadcast, this would be a lot easier, as the queue would hold tx ids, not raw bytes.
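
As a minimal sketch of the separate-list-of-ids option (not the actual monerod queue; std::hash stands in for a real tx id):

#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

struct fluff_queue
{
  std::vector<std::string> blobs;    // raw tx bytes, as the current queue stores them
  std::set<std::size_t> queued_ids;  // ids of what is already queued

  bool push(const std::string& tx_blob)
  {
    if (!queued_ids.insert(std::hash<std::string>{}(tx_blob)).second)
      return false;                  // duplicate, drop it instead of queueing it again
    blobs.push_back(tx_blob);
    return true;
  }
};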

@selsta
Collaborator Author

selsta commented May 12, 2024

sorry, I don't know what you mean.

Sorry I mistyped and didn't proofread. I wanted to ask if transactions are typically re-broadcasted within the first 12h of being in the txpool, because that's usually the timeframe of the txpool clearing up again.

I am planning to make a proposal for this in a separate issue as I am pretty sure this would significantly reduce the amount of bandwidth used for nodes but I want to test the amount of bandwidth wasted first to see just how much we can save before I do.

That sounds like a good idea!


Also regarding the drop peer issue with --max-txpool-weight, is #8916 related? Because the change was only done recently, maybe it makes sense to revert it?

@Boog900
Contributor

Boog900 commented May 12, 2024

I wanted to ask if transactions are typically re-broadcasted within the first 12h of being in the txpool, because that's usually the timeframe of the txpool clearing up again.

Yes, txs are re-broadcasted after 5 mins, then 10, then 15, so after every broadcast the wait between them increases by 5 mins, up to a cap of 4 hours.
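
As illustrative arithmetic only, assuming that schedule, a tx sitting in the pool for 12h would be re-broadcast roughly 16 times:

#include <algorithm>
#include <iostream>

int main()
{
  int wait = 5, elapsed = 0, rebroadcasts = 0;  // minutes
  const int cap = 4 * 60, window = 12 * 60;     // 4h cap, 12h window
  while (elapsed + wait <= window)
  {
    elapsed += wait;
    ++rebroadcasts;
    wait = std::min(wait + 5, cap);             // wait grows by 5 min each time, capped at 4h
  }
  std::cout << rebroadcasts << " re-broadcasts in the first 12h\n";  // prints 16
}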

Also regarding the drop peer issue with --max-txpool-weight, is #8916 related? Because the change was only done recently, maybe it makes sense to revert it?

If people need to have an extremely low txpool we could temporarily revert it, but we would need people to update to fix this issue, not just the people actually having the issue. I think it makes more sense to just recommend that people not set the txpool weight too low until we fix monerod sending duplicate txs.

@chaserene

@chaserene What about the amount of incoming and outgoing connections before your node "crashed" / got killed for the first time, not afterwards?

@selsta I don't remember how many connections my monerod had before I first restarted it, but I checked my logs and I see quite a few of the following around that time:

  • monerod is now disconnected from the network
  • There were 0 blocks in the last 90 minutes, there might be large hash rate changes, or we might be partitioned, cut off from the Monero network or under attack, or your computer's time is off. Or it could be just sheer bad luck.
  • Host _._._._ blocked.

@selsta
Collaborator Author

selsta commented May 13, 2024

@chaserene I'm not sure if your issue is the same as the issue I opened here. I did not see any of these messages on my nodes. I only had issues with high RAM usage.

What kind of hardware do you have? What daemon config do you use?

@vtnerd
Contributor

vtnerd commented May 17, 2024

If you receive a tx, add it to your fluff queue, drop the tx from the txpool, then receive it again, it is added to the fluff queue a second time. When the fluff timer fires we would then broadcast a message with the same tx twice causing the peer to disconnect.

Is it possible to trigger this (the "drop the tx from the txpool" part) remotely, or do you have to manually drop the tx from the txpool?

@selsta
Collaborator Author

selsta commented May 17, 2024

@vtnerd for this to happen there has to be a large txpool backlog (for example 1MB) and the node has to be set to a smaller txpool limit with --max-txpool-weight. Then the node drops / prunes transactions once the limit is reached.
