Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: better error handling #1751

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from
Draft

fix: better error handling #1751

wants to merge 1 commit into from

Conversation

vmx
Copy link
Contributor

@vmx vmx commented Apr 11, 2024

#1748 fixed a problem
where errors (asserts) where not properly propagated upwards. There were
threads just hanging without making any progress.

I usually follow the development paradigm of "Make it work, make it right,
make it fast" 1, where I interpret the "make it right" as "make it simple".
With "simple" I mean things like clear code, easy to follow, no new paradigms,
following the style of the existing code base.

I'd like to be more concrete about that for this change, on why I find this
version "simpler".

The core issue was an assert within a spawned thread in a yastl pool. Someone
new to the code base coould re-introduce such a problem easily, hence it would
be best if we can actually prevent that. With removing the yastl thread pool,
where it's not really needed for better performance, we can easily prevent
that.

We see three difference uses of the yastl thread pool within the proof.rs.
One is using it to pipeline operations, another one is for a highly parallizing
an operation. Usually for data parallism we use Rayon in this code base. In
the pipelining use case, there is Rayon used within pipeline. The yastl thread
pool will spawn only a limited set of threads, so this looks alright. For the
highly parallelised case, we only use yastl and not Rayon (although we should
look into just Rayon instead).

The third use, where we mix yastl and Rayon for highly parallel operations is
removed with this PR. This is intentional. Having two thread pools, which by
default use as many threads as cores could easily overprovision a system. This
could potentially lead to unintended slow downs. Another benefit of just using
Rayon is, that the number of threads can be controlled with an environment
variable. This gives more control when several instances of this code is run
in parallel, which is the case for some storage providers.

Switching back from errors to asserts. I don't know the reason, why the
asserts were changed to errors. Hence I'm switching it back to asserts, as
now it's easily possible. If one looks at the diff between the version prior
to #1728 and this, then the diff is pretty minimal an straight forward. Also
one verification again runs only in debug mode and not also in release mode.
Though if errors are desired, they can be easily be introduce by switching the
asserts to anyhow's ensure!() macro.

Following the error handling needs less context. With this change, the asserts
are happening in the code right away, it works the way people coding Rust are
used to. Prior to this change, more context is needed. Taking the "invalid
comm_d"-error as an example. It happens on line 456 2. Now reading through
the code what it means: first look for the invalid_comm_d variable. It
defines an instance of the InvalidChallengeCoordinate struct, which we take
a quick look at and see that's a local one, for specifcally error handling.
That instance is wrapped in an Arc which is interesting at a first glance,
as in Rust shared state is usually tried to be avoided where possible. When
looking into the usage of invalid_comm_d it becomes clear that we need the
Arc, as the might assign a different value to it in the error case. We need
a mutable reference here, but again usually in Rust immutable types are
preferred. So if we can avoid the Arc as well some mutability, it's a win
for being more Rust idiomatic, hence easier for people familiar with Rust.

For me that falls under the "code is written once, but read many times"
category. So making it easy to read is a win.

In the lower part of the change, the yastl usage for high data parallelism is
removed in favour of Rayon, see above for some of the reasons. Also using
Rayon here seems to use the thread pool more efficiently, at least on the
machine I've tested it on (with 64 threads).

When looking at the diff between this change and the commit prior to #1748

git diff https://github.com/filecoin-project/rust-fil-proofs/commit/8f5bd86be36a55e33b9b293ba22ea13ca1f28163.. -w -- storage-proofs-porep/src/stacked/vanilla/proof.rs

or via GitHub.

Then the changes are very minimal, which I also count as a sign for being
"simpler".

@vmx
Copy link
Contributor Author

vmx commented Apr 11, 2024

This PR is easiest review without whitespace: https://github.com/filecoin-project/rust-fil-proofs/pull/1751/files?diff=split&w=1

#1748 fixed a problem
where errors (asserts) where not properly propagated upwards. There were
threads just hanging without making any progress.

I usually follow the development paradigm of "Make it work, make it right,
make it fast" [1], where I interpret the "make it right" as "make it simple".
With "simple" I mean things like clear code, easy to follow, no new paradigms,
following the style of the existing code base.

I'd like to be more concrete about that for this change, on why I find this
version "simpler".

The core issue was an assert within a spawned thread in a yastl pool. Someone
new to the code base coould re-introduce such a problem easily, hence it would
be best if we can actually prevent that. With removing the yastl thread pool,
where it's not really needed for better performance, we can easily prevent
that.

We see three difference uses of the yastl thread pool within the `proof.rs`.
One is using it to pipeline operations, another one is for a highly parallizing
an operation. Usually for data parallism we use Rayon in this code base. In
the pipelining use case, there is Rayon used within pipeline. The yastl thread
pool will spawn only a limited set of threads, so this looks alright. For the
highly parallelised case, we only use yastl and not Rayon (although we should
look into just Rayon instead).

The third use, where we mix yastl and Rayon for highly parallel operations is
removed with this PR. This is intentional. Having two thread pools, which by
default use as many threads as cores could easily overprovision a system. This
could potentially lead to unintended slow downs. Another benefit of just using
Rayon is, that the number of threads can be controlled with an environment
variable. This gives more control when several instances of this code is run
in parallel, which is the case for some storage providers.

Switching back from errors to asserts. I don't know the reason, why the
asserts were changed to errors. Hence I'm switching it back to asserts, as
now it's easily possible. If one looks at the diff between the version prior
to #1728 and this, then the diff is pretty minimal an straight forward. Also
one verification again runs only in debug mode and not also in release mode.
Though if errors are desired, they can be easily be introduce by switching the
asserts to anyhow's `ensure!()` macro.

Following the error handling needs less context. With this change, the asserts
are happening in the code right away, it works the way people coding Rust are
used to. Prior to this change, more context is needed. Taking the "invalid
comm_d"-error as an example. It happens on line 456 [2]. Now reading through
the code what it means: first look for the `invalid_comm_d` variable. It
defines an instance of the `InvalidChallengeCoordinate` struct, which we take
a quick look at and see that's a local one, for specifcally error handling.
That instance is wrapped in an Arc which is interesting at a first glance,
as in Rust shared state is usually tried to be avoided where possible. When
looking into the usage of `invalid_comm_d` it becomes clear that we need the
`Arc`, as the might assign a different value to it in the error case. We need
a mutable reference here, but again usually in Rust immutable types are
preferred. So if we can avoid the `Arc` as well some mutability, it's a win
for being more Rust idiomatic, hence easier for people familiar with Rust.

For me that falls under the "code is written once, but read many times"
category. So making it easy to read is a win.

In the lower part of the change, the yastl usage for high data parallelism is
removed in favour of Rayon, see above for some of the reasons. Also using
Rayon here seems to use the thread pool more efficiently, at least on the
machine I've tested it on (with 64 threads).

When looking at the diff between this change and the commit prior to #1748

    git diff 8f5bd86.. -w -- storage-proofs-porep/src/stacked/vanilla/proof.rs

Then the changes are very minimal, which I also count as a sign for being
"simpler".

[1]: https://wiki.c2.com/?MakeItWorkMakeItRightMakeItFast
[2]: https://github.com/filecoin-project/rust-fil-proofs/blob/3f018b51b6327b135830899d237a7ba181942d7e/storage-proofs-porep/src/stacked/vanilla/proof.rs#L456-L457
@vmx vmx changed the base branch from remove-wrong-assert to master April 15, 2024 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant