
SyncSubmitBlock #4245

Open
ruseinov opened this issue Apr 19, 2024 · 15 comments

Comments

@ruseinov
Contributor

ruseinov commented Apr 19, 2024

The block submission to the swarm is done.

The sanity checks are not, as this requires introducing some extra abstractions and infra.

@alt-ivan

alt-ivan commented May 8, 2024

Hi!

Unfortunately, it seems the chain head still doesn't get updated, and I'm not sure why. I've set tracing to debug (info for the noisiest modules) and I'm attaching a log; hopefully it's at least a little helpful.

forestlog.txt

The commit hash from which Forest was built for testing is 3a1291d4655fb641a85d25dd288cd3f19d5c2c2d .
The testing was performed on a local 2K devnet, with Forest bootstrapped and synced via a Lotus node.

Testing was done with a custom miner that has been verified against Lotus and works on the same devnet when pointed at the Lotus node.

@ruseinov
Contributor Author

ruseinov commented May 8, 2024

@alt-ivan how do I reproduce this?
I’d love to see the steps both for Lotus and Forest to find the culprit. Thank you in advance!

@alt-ivan

alt-ivan commented May 8, 2024

@ruseinov

Hi Roman,

I don't currently have a great method of reproducing to share. I tried quickly jury-rigging lotus-miner to work with Forest as a means of creating a repro, but I haven't had much success yet.

Here's the RPC sequence used by the custom miner:

  1. ChainHead
  2. MinerGetBaseInfo -> Beacon entries & is eligible for mining
  3. WalletSign -> VRFProof for computing election proof
  4. MinerGetBaseInfo -> Sector info for winning PoST
  5. WalletSign -> Signing computed block ticket hash
  6. MpoolSelect -> Messages to include in block
  7. MinerCreateBlock
  8. SyncSubmitBlock -> With result from MinerCreateBlock
  9. ChainHead

The only change I made is to hard-code block_delay_secs from StateGetNetworkParams as that RPC isn't implemented in Forest yet. I used the same value returned by the Lotus node.
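For reference, one step of the sequence above as a raw JSON-RPC envelope. This is only a sketch: the params shape shown for SyncSubmitBlock is a simplified stand-in for the GossipBlock that MinerCreateBlock returns, and no HTTP call is performed.

```rust
// Build the JSON-RPC 2.0 request envelope a miner would POST to the node.
// `params_json` is passed through verbatim.
fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params_json
    )
}

fn main() {
    // Simplified block payload: the real one carries the full header and messages.
    let params = r#"[{"Header":{},"BlsMessages":[],"SecpkMessages":[]}]"#;
    let req = jsonrpc_request(1, "Filecoin.SyncSubmitBlock", params);
    assert!(req.contains(r#""method":"Filecoin.SyncSubmitBlock""#));
    println!("{}", req);
}
```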

I have also manually run each of these against both Lotus/Forest synced to the same tipset/head and all RPCs listed here return identical responses on both Forest and Lotus nodes for the same inputs.

If you would find it helpful, I can share the request/response for each of these as well.

Edit:

  1. MinerCreateBlock
  2. SyncSubmitBlock -> With result from MinerCreateBlock

These should be enough for manual testing if you have Forest/Lotus synced to the same head. I suppose you could at that point take MinerCreateBlock result from lotus-miner and manually feed it into both nodes' SyncSubmitBlock RPCs

@ruseinov
Contributor Author

@alt-ivan
Thanks for lining that up for me, it's helpful. The requests and responses would be nice to see as well if possible.

With the sequence above - how can we be sure that the head is never set?

Bear in mind that the way this works in Forest is that a block is sent to a message queue for processing, so there are no guarantees as to exactly when it actually gets processed, though it should be quite fast. Have you tried verifying that the given block never makes it into the chain?

@ruseinov
Contributor Author

ruseinov commented May 10, 2024

3a1291d

This does not seem to exist anymore; could you try the latest main? Though as long as it includes my latest SyncSubmitBlock change, it should not matter.

Edit: I have double checked the files there and it seems correct.

@alt-ivan

@ruseinov

The requests and responses would be nice to see as well if possible.

Seems like I didn't save them after all 🤦 ... I'll redo the sequence and send the requests/responses. Sorry for the delay.

With the sequence above - how can we be sure that the head is never set?

Calling ChainHead returns the same tipset as before the SyncSubmitBlock call. Also, SyncSubmitBlock can be called over and over again with the same block and it will return a successful result, but the block seems to get "eaten" somewhere.

Bear in mind that the way this works in forest is that a block is sent to the message queue for processing, so there are no guarantees as to exactly when it actually gets processed, though it should be quite fast. Have you tried verifying that the given block never makes it into the chain?

I figured as much when looking at the codebase. I only stepped through the SyncSubmitBlock impl and there I saw that the created tipset doesn't get a CID assigned to it at that point in time. After that I didn't know where to follow the propagation of the block to see if it gets recorded anywhere using GetChainBlock, because I couldn't find its CID if it did.

@ruseinov
Contributor Author

Seems like I didn't save them after all 🤦 ... I'll redo the sequence and send the requests/responses. Sorry for the delay.

Please do!

@ruseinov
Contributor Author

I figured as much when looking at the codebase. I only stepped through the SyncSubmitBlock impl and there I saw that the created tipset doesn't get a CID assigned to it at that point in time. After that I didn't know where to follow the propagation of the block to see if it gets recorded anywhere using GetChainBlock, because I couldn't find its CID if it did.

That is helpful, let me check..

@ruseinov
Contributor Author


@alt-ivan In fact, the key is lazily initialized on the first fn key() call.

forest/src/blocks/tipset.rs

Lines 301 to 304 in 4a6ad5c

pub fn key(&self) -> &TipsetKey {
    self.key
        .get_or_init(|| TipsetKey::from(self.headers.iter_ne().map(|h| *h.cid()).collect_vec()))
}

So you can basically check the CIDs of headers to see which ones to monitor.
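To illustrate the lazy initialization, here is a simplified std-only sketch with stand-in types (plain strings instead of Forest's actual CID/TipsetKey types): the key does not exist at construction time, which is why stepping through SyncSubmitBlock shows no CID assigned to the freshly created tipset.

```rust
use std::sync::OnceLock;

// Stand-in for a block header identified by its CID string.
struct Header {
    cid: String,
}

// Stand-in tipset: the key is derived lazily from the headers' CIDs.
struct Tipset {
    headers: Vec<Header>,
    key: OnceLock<Vec<String>>, // TipsetKey stand-in: the list of header CIDs
}

impl Tipset {
    // Mirrors the lazy pattern in Tipset::key(): computed and cached on
    // first call, not when the tipset is constructed.
    fn key(&self) -> &Vec<String> {
        self.key
            .get_or_init(|| self.headers.iter().map(|h| h.cid.clone()).collect())
    }
}

fn main() {
    let ts = Tipset {
        headers: vec![
            Header { cid: "header-cid-a".into() }, // illustrative values only
            Header { cid: "header-cid-b".into() },
        ],
        key: OnceLock::new(),
    };
    // Not initialized yet: no key exists right after construction.
    assert!(ts.key.get().is_none());
    let k = ts.key();
    assert_eq!(k.len(), 2);
    // Subsequent calls return the same cached key.
    assert!(std::ptr::eq(k, ts.key()));
}
```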

For further debugging:

  1. Here's where the tipset channel is being consumed:

    match self.tipsets.as_mut().poll_next(cx) {

  2. Here's where the rest of the logic happens:

    match self.state {
        TipsetProcessorState::Idle => {
            // Set the state to FindRange if we have a tipset to sync towards.
            // Consume the tipsets received, start syncing the heaviest tipset
            // group, and discard the rest.
            if let Some(((epoch, parents), heaviest_tipset_group)) = grouped_tipsets
                .into_iter()
                .max_by(|(_, a), (_, b)| a.weight_cmp(b))
            {
                trace!("Finding range for tipset epoch = {}", epoch);
                self.state = TipsetProcessorState::FindRange {
                    epoch,
                    parents,
                    range_finder: self.find_range(heaviest_tipset_group),
                    current_sync: None,
                    next_sync: None,
                };
            }
        }
        TipsetProcessorState::FindRange {
            ref mut epoch,
            ref mut parents,
            ref mut current_sync,
            ref mut next_sync,
            ..
        } => {
            // Add tipsets to the current sync cache
            if let Some(tipset_group) = grouped_tipsets.remove(&(*epoch, parents.clone())) {
                match current_sync {
                    Some(cs) => {
                        // This check is redundant
                        if cs.is_mergeable(&tipset_group) {
                            cs.merge(tipset_group);
                        }
                    }
                    None => *current_sync = Some(tipset_group),
                }
            }
            // Update or replace the next sync
            if let Some(heaviest_tipset_group) = grouped_tipsets
                .into_iter()
                .max_by(|(_, a), (_, b)| a.weight_cmp(b))
                .map(|(_, group)| group)
            {
                // Find the heaviest tipset group and either merge it with the
                // tipset group in the next_sync or replace it.
                match next_sync {
                    None => *next_sync = Some(heaviest_tipset_group),
                    Some(ns) => {
                        if ns.is_mergeable(&heaviest_tipset_group) {
                            // Both tipset groups have the same epoch & parents, so merge them
                            ns.merge(heaviest_tipset_group);
                        } else if heaviest_tipset_group.is_heavier_than(ns) {
                            // The tipset group received is heavier than the one saved, replace it.
                            *next_sync = Some(heaviest_tipset_group);
                        }
                        // Otherwise, drop the heaviest tipset group
                    }
                }
            }
        }
        TipsetProcessorState::SyncRange {
            ref mut range_syncer,
            ref mut next_sync,
        } => {
            // Add tipsets to the current tipset range syncer
            if let Some(tipset_group) = grouped_tipsets.remove(&(
                range_syncer.proposed_head_epoch(),
                range_syncer.proposed_head_parents(),
            )) {
                tipset_group.tipsets().into_iter().for_each(|ts| {
                    let tipset_key = ts.key().clone();
                    match range_syncer.add_tipset(ts) {
                        Ok(added) => {
                            if added {
                                trace!("Successfully added tipset [key = {:?}] to running range syncer", tipset_key);
                            }
                        }
                        Err(why) => {
                            error!("Adding tipset to range syncer failed: {}", why);
                        }
                    }
                });
            }
            // Update or replace the next sync
            if let Some(heaviest_tipset_group) = grouped_tipsets
                .into_iter()
                .max_by(|(_, a), (_, b)| a.weight_cmp(b))
                .map(|(_, group)| group)
            {
                // Find the heaviest tipset group and either merge it with the
                // tipset group in the next_sync or replace it.
                match next_sync {
                    None => *next_sync = Some(heaviest_tipset_group),
                    Some(ns) => {
                        if ns.is_mergeable(&heaviest_tipset_group) {
                            // Both tipset groups have the same epoch & parents, so merge them
                            ns.merge(heaviest_tipset_group);
                        } else if heaviest_tipset_group.is_heavier_than(ns) {
                            // The tipset group received is heavier than the one saved, replace it.
                            *next_sync = Some(heaviest_tipset_group);
                        } else {
                            // Otherwise, drop the heaviest tipset group
                            trace!("Dropping collected tipset groups");
                        }
                    }
                }
            }
        }
    }
    // Drive underlying futures to completion
    loop {
        match self.state {
            TipsetProcessorState::Idle => {
                // The tipsets stream must have already registered with the task driver
                // to wake up this task when a new tipset is available.
                // Otherwise, the future can stall.
                return Poll::Pending;
            }
            TipsetProcessorState::FindRange {
                ref mut range_finder,
                ref mut current_sync,
                ref mut next_sync,
                ..
            } => match range_finder.take() {
                Some(mut range_syncer) => {
                    debug!(
                        "Determined epoch range for next sync: [{}, {}]",
                        range_syncer.current_head.epoch(),
                        range_syncer.proposed_head.epoch(),
                    );
                    // Add current_sync to the yielded range syncer.
                    // These tipsets match the range's [epoch, parents].
                    if let Some(tipset_group) = current_sync.take() {
                        tipset_group.tipsets().into_iter().for_each(|ts| {
                            if let Err(why) = range_syncer.add_tipset(ts) {
                                error!("Adding tipset to range syncer failed: {}", why);
                            }
                        });
                    }
                    self.state = TipsetProcessorState::SyncRange {
                        range_syncer: Box::pin(range_syncer),
                        next_sync: next_sync.take(),
                    };
                }
                None => {
                    self.state = TipsetProcessorState::Idle;
                }
            },
            TipsetProcessorState::SyncRange {
                ref mut range_syncer,
                ref mut next_sync,
            } => {
                let proposed_head_epoch = range_syncer.proposed_head.epoch();
                let current_head_epoch = range_syncer.current_head.epoch();
                // Drive the range_syncer to completion
                match range_syncer.as_mut().poll(cx) {
                    Poll::Ready(Ok(_)) => {
                        metrics::HEAD_EPOCH.set(proposed_head_epoch);
                        info!(
                            "Successfully synced tipset range: [{}, {}]",
                            current_head_epoch, proposed_head_epoch,
                        );
                    }
                    Poll::Ready(Err(why)) => {
                        metrics::TIPSET_RANGE_SYNC_FAILURE_TOTAL.inc();
                        error!(
                            "Syncing tipset range [{}, {}] failed: {}",
                            current_head_epoch, proposed_head_epoch, why,
                        );
                    }
                    Poll::Pending => return Poll::Pending,
                }
                // Move to the next state
                match next_sync.take() {
                    // This tipset group is the heaviest that has been received while
                    // running this tipset range syncer, so start syncing it
                    Some(tipset_group) => {
                        self.state = TipsetProcessorState::FindRange {
                            epoch: tipset_group.epoch(),
                            parents: tipset_group.parents(),
                            range_finder: self.find_range(tipset_group),
                            current_sync: None,
                            next_sync: None,
                        };
                    }
                    None => {
                        self.state = TipsetProcessorState::Idle;
                    }
                }
            }
        }
    }
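The control flow above boils down to three states. Here is a distilled, synchronous sketch of the transitions (hypothetical, simplified names; the real code interleaves this with async polling and tipset-group merging):

```rust
// Simplified TipsetProcessor states: the real versions carry range finders,
// syncers, and pending tipset groups.
#[derive(Debug, PartialEq, Clone)]
enum State {
    Idle,
    FindRange { epoch: u64 },
    SyncRange { epoch: u64 },
}

// One step of the machine. On a new heaviest tipset, Idle starts finding a
// range; a finished FindRange becomes SyncRange; a finished SyncRange starts
// the next range if another tipset arrived meanwhile, else returns to Idle.
fn step(state: State, new_heaviest: Option<u64>) -> State {
    match state {
        State::Idle => match new_heaviest {
            Some(epoch) => State::FindRange { epoch },
            None => State::Idle,
        },
        State::FindRange { epoch } => State::SyncRange { epoch },
        State::SyncRange { .. } => match new_heaviest {
            Some(epoch) => State::FindRange { epoch },
            None => State::Idle,
        },
    }
}

fn main() {
    let mut s = State::Idle;
    s = step(s, Some(100)); // a block arrives (e.g. via SyncSubmitBlock)
    assert_eq!(s, State::FindRange { epoch: 100 });
    s = step(s, None); // range determined
    assert_eq!(s, State::SyncRange { epoch: 100 });
    s = step(s, None); // range synced, nothing queued
    assert_eq!(s, State::Idle);
}
```

The key observation for this bug: if the processor is never driven (the node never enters follow mode), submitted tipsets pile up on the channel and the head is never advanced.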

@alt-ivan

alt-ivan commented May 14, 2024

@ruseinov

I tried to adapt lotus-miner so I could give you easy steps to reproduce (stopping at MinerCreateBlock so you can feed its output into SyncSubmitBlock), but I ran into so many speed bumps that I gave up for now :/

Some new interesting info:

So you can basically check the CIDs of headers to see which ones to monitor.

When I connect Forest/Lotus and execute SyncSubmitBlock on Forest, it propagates that block to Lotus and Lotus now has the correct chain head with the latest tipset. Forest also receives this block, seemingly back from Lotus via gossip, because I can query it via GetChainBlock and see it. However, Forest's ChainHead still points to the previous epoch tipset.

When I don't connect Forest/Lotus and execute SyncSubmitBlock on Forest, then that block doesn't get stored and I cannot query it via GetChainBlock

Here's where tipset channel is being consumed:

I added tracing into these locations and nothing happens. Doing some more eye-level examination I saw this:

    fn follow(&self, tipset_opt: Option<FullTipset>) -> ChainMuxerFuture<(), ChainMuxerError> {
        // Instantiate a TipsetProcessor

But in devnet it seems that Forest never enters follow mode (#3089).
From what I saw, tipset_receiver only gets consumed here via the TipsetProcessor? I might be wrong though.

This may also explain other sync issues with Lotus in devnet i.e. Forest often doesn't sync fully with Lotus with similar "symptoms" (ChainHead returns earlier tipset but later one exists in store)

@ruseinov
Contributor Author

@alt-ivan Thanks a lot for looking into this, now it starts to make a lot of sense!

When I don't connect Forest/Lotus and execute SyncSubmitBlock on Forest, then that block doesn't get stored and I cannot query it via GetChainBlock

That's probably due to the fact that the actual network send works, but the TipsetProcessor part does not work.

But in devnet it seems that Forest never enters follow mode #3089

You are correct, it seems it never does. Good spot! And logically, a submitted block can't be processed into the head if the node never considers itself "in sync". I guess we'll have to deal with that to fix this.

@ruseinov
Contributor Author

@alt-ivan Can you please try going through the process using this PR: #4328?
It's going to be merged to master automatically in a little bit. This should fix the follow issue.

@alt-ivan

@ruseinov
I will be testing shortly and reporting back here. Thanks!

@alt-ivan

alt-ivan commented May 15, 2024

@ruseinov
Good news! I tried it on the latest commit and mining works!

Couple of notes:

  1. It seems that Lotus processes the block received via SyncSubmitBlock synchronously, while Forest does it asynchronously, leading to a small delay between SyncSubmitBlock returning and the tipset being processed / the chain head being updated. This shouldn't be a big deal in practice because I've just added a small delay to the miner to work around it, but it could potentially be a problem since the delay isn't deterministic. In fact, I can just tweak the miner to wait for the tipset at the expected height, so you can ignore this altogether :D

  2. The Forest node now enters follow mode and properly syncs up with Lotus even without active miners on Lotus, hooray!

  3. Now we have a situation where Lotus fails some validation when syncing to Forest in devnet :) I'll paste the Lotus logs below for posterity, but I won't expand on this here since it's outside the scope of this issue.

lotus  | 2024-05-15T15:22:37.565Z       ERROR   chain   chain/sync_manager.go:255       error during sync in [bafy2bzaceajmcnxkqbwwe7fzw5ildicadw57kw2eyu2sprwynhqeklxlbaazs]: collectChain failed: chain linked to block marked previously as bad ([bafy2bzaceajmcnxkqbwwe7fzw5ildicadw57kw2eyu2sprwynhqeklxlbaazs], bafy2bzacedt364lbbw4xavfnmbka2k4r27245g3zc3kepzy2ejeyafyx7rio4) (reason: linked to bafy2bzaceaewkhk2ikz7d46ilbc7j2tbcfydcrl4u3qykgkank4th4x2xwu3q caused by: [bafy2bzacecf2hjz7sarl3qfphjn5u5mnbfmfbmeefepqp7atnzbc2gnrqkcz4] 1 error occurred:
lotus  |        * failed to validate blocks random beacon values:
lotus  |     github.com/filecoin-project/lotus/chain/consensus/filcns.(*FilecoinEC).ValidateBlock.func4
lotus  |         /lotus/chain/consensus/filcns/filecoin.go:248
lotus  |   - unexpected beacon round 7660175, expected 7660174 for epoch 180:
lotus  |     github.com/filecoin-project/lotus/chain/beacon.ValidateBlockValues
lotus  |         /lotus/chain/beacon/beacon.go:99

@ruseinov
Contributor Author

ruseinov commented May 15, 2024

Good news! I tried it on the latest commit and mining works!

Nice!

  1. It seems that Lotus processes the block received via SyncSubmitBlock synchronously while Forest does it asynchronously, leading to a small delay between SyncSubmitBlock and the tipset being processed/chain head getting updated. This shouldn't be a big deal in practice because I've just added a small delay to the miner to tackle this issue, but I assume it could potentially be a problem since this delay isn't deterministic

Perhaps the best option is to have some sort of retry mechanism there. It should not take long, but we currently have no means of making this synchronous. However, we're working on a new state machine that might allow us to solve this issue (cc @lemmih). I'm happy to contribute that part of the codebase if need be, but for now we have to make do with the current setup.
Ah yeah, if you can wait for a tipset, all good then. I think forcing this to be a synchronous operation is not necessary.
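For what it's worth, the "wait for the tipset at the expected height" approach on the miner side can be sketched as a simple poll-with-deadline loop (hypothetical helper; fetch_head stands in for a real ChainHead RPC call):

```rust
use std::thread;
use std::time::{Duration, Instant};

// Poll the head epoch until it reaches `target` or `timeout` elapses.
// `fetch_head` is a stand-in for querying ChainHead over JSON-RPC.
fn wait_for_epoch(
    mut fetch_head: impl FnMut() -> u64,
    target: u64,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<u64, &'static str> {
    let deadline = Instant::now() + timeout;
    loop {
        let epoch = fetch_head();
        if epoch >= target {
            return Ok(epoch);
        }
        if Instant::now() >= deadline {
            return Err("timed out waiting for chain head to advance");
        }
        thread::sleep(poll_interval);
    }
}

fn main() {
    // Simulate a head that advances one epoch per poll.
    let mut epoch = 10;
    let head = wait_for_epoch(
        || {
            epoch += 1;
            epoch
        },
        13,
        Duration::from_secs(1),
        Duration::from_millis(1),
    )
    .unwrap();
    assert_eq!(head, 13);
}
```

A deadline keeps a stuck node from hanging the miner forever, which matters precisely because the submit-to-head delay isn't deterministic.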

3. Now we have a situation where Lotus fails some validation when syncing to Forest in devnet :) I'll paste Lotus logs below for posterity, but I won't expand on this here since it's outside the scope of this issue

Gotta investigate. It would be great to have a semi-automated reproducible case.

Status: Implemented