Accelerate R1CSShape::multiply_vec_uniform #297
Current thinking is that we can use some preprocessing on the uniform A, B, C matrices to determine what they'll end up reading and writing. A given sparse entry's column in {A, B, C} determines which witness segment it reads.

Currently we parallelize across rows because it is easy:

```rust
result.iter_mut().enumerate().for_each(|(i, out)| {
    *out += mul_0_1_optimized(&val, &full_witness_vector[witness_offset + i]);
});
```

My thinking is that we can instead group by column, which would allow preprocessing the witness segment accesses. Here are the column accesses by the different sparse small matrices {A, B, C}:

If these do not intersect with the rows too much, we could get good speedups by switching the direction of parallelism. Further, 50–90% of the values involved are 0 or 1, which is why `mul_0_1_optimized` is used.
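To make the direction-of-parallelism idea concrete, here is a minimal sketch contrasting the two traversal orders over a sparse matrix. The names (`SparseEntry`, `multiply_by_rows`, `multiply_by_cols`) are illustrative, not Jolt's actual API, and plain `u64` stands in for a field element; the point is only that grouping entries by column lets each witness value be loaded once per group.

```rust
// Illustrative sketch only: `SparseEntry` and both multiply functions are
// hypothetical names, and u64 stands in for a field element.

#[derive(Clone, Copy)]
struct SparseEntry {
    row: usize,
    col: usize,
    val: u64,
}

/// Row-oriented traversal (the current approach): out[row] += val * z[col].
fn multiply_by_rows(entries: &[SparseEntry], z: &[u64], num_rows: usize) -> Vec<u64> {
    let mut out = vec![0u64; num_rows];
    for e in entries {
        out[e.row] += e.val * z[e.col];
    }
    out
}

/// Column-oriented traversal: bucket entries by column once (preprocessing),
/// then read each witness element a single time per bucket.
fn multiply_by_cols(
    entries: &[SparseEntry],
    z: &[u64],
    num_rows: usize,
    num_cols: usize,
) -> Vec<u64> {
    // Preprocessing step: group sparse entries by the column they read.
    let mut col_groups: Vec<Vec<SparseEntry>> = vec![Vec::new(); num_cols];
    for e in entries {
        col_groups[e.col].push(*e);
    }
    let mut out = vec![0u64; num_rows];
    for (col, group) in col_groups.iter().enumerate() {
        let z_val = z[col]; // single witness read for the whole group
        for e in group {
            out[e.row] += e.val * z_val;
        }
    }
    out
}

fn main() {
    let entries = [
        SparseEntry { row: 0, col: 0, val: 1 },
        SparseEntry { row: 1, col: 0, val: 2 },
        SparseEntry { row: 1, col: 2, val: 3 },
    ];
    let z = [5u64, 7, 11];
    // Both traversal orders must agree on the result.
    assert_eq!(
        multiply_by_rows(&entries, &z, 2),
        multiply_by_cols(&entries, &z, 2, 3)
    );
    println!("{:?}", multiply_by_rows(&entries, &z, 2));
}
```

With column buckets, the outer loop over columns is also a natural unit for parallelism (e.g. with rayon), since each bucket touches a disjoint witness element, though rows written by different buckets can still collide and would need per-thread accumulators or a reduction.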
`R1CSShape::multiply_vec_uniform` (here) generates the `Az`/`Bz`/`Cz` vectors for Spartan. `Z` is the full witness; A, B, C are the R1CS matrices. Performance seems to vary greatly across machines and architectures, but this function's cost scales slightly super-linearly with cycle count.
In the case of our instance of Spartan, A, B, C are highly uniform: they have a "sparse diagonal" construction. Because the same R1CS circuit is run per step of the CPU, we have a concept of a "single step shape" (represented by a, b, c, d below).
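The "single step shape" idea can be sketched as follows: the full matrix is the per-step matrix repeated along the diagonal, once per CPU step, so the product can be computed without ever materializing the full matrix. `StepShape` and this simplified `multiply_vec_uniform` are illustrative names under that assumption, with `u64` standing in for a field element; they are not Jolt's actual types.

```rust
// Hypothetical sketch of exploiting the uniform (block-diagonal) structure.
// `StepShape` is an illustrative type, not Jolt's actual representation.

struct StepShape {
    rows: usize,                        // rows in one step's constraint block
    cols: usize,                        // witness columns consumed per step
    entries: Vec<(usize, usize, u64)>,  // (row, col, value) within one step
}

/// Multiply the block-diagonal expansion of `shape` by the full witness `z`
/// without materializing the full matrix: step `s` touches only the output
/// rows [s*rows, (s+1)*rows) and the witness segment [s*cols, (s+1)*cols).
fn multiply_vec_uniform(shape: &StepShape, z: &[u64], num_steps: usize) -> Vec<u64> {
    let mut out = vec![0u64; shape.rows * num_steps];
    for step in 0..num_steps {
        let row_off = step * shape.rows;
        let wit_off = step * shape.cols;
        for &(r, c, v) in &shape.entries {
            out[row_off + r] += v * z[wit_off + c];
        }
    }
    out
}

fn main() {
    // One constraint row per step, summing two witness columns.
    let shape = StepShape {
        rows: 1,
        cols: 2,
        entries: vec![(0, 0, 1), (0, 1, 1)],
    };
    let z = [1u64, 2, 3, 4]; // two steps' witness segments
    let az = multiply_vec_uniform(&shape, &z, 2);
    assert_eq!(az, vec![3, 7]); // step 0: 1+2, step 1: 3+4
    println!("{:?}", az);
}
```

Because the per-step blocks are disjoint in both their output rows and witness segments, the outer loop over steps parallelizes with no write conflicts, which is what makes the row-parallel strategy described above easy.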