Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor errors in v3 FullMath.mulDiv #122

Open
barakman opened this issue May 30, 2021 · 10 comments
Open

Minor errors in v3 FullMath.mulDiv #122

barakman opened this issue May 30, 2021 · 10 comments

Comments

@barakman
Copy link

barakman commented May 30, 2021

I'm not permitted to open issues on the v3 repo for some reason, so posting it here instead.

In FullMath.mulDiv, there are several minor errors, mostly in the documentation.

I understand that this code was taken from https://xn--2-umb.com/21/muldiv (as is, I suppose), but you might still want to attend these errors.

There is a comment stating that variable twos is "Always >= 1".

Further below it, there is another comment stating that "If twos is zero, then it becomes one".

One of these two comments is obviously wrong - and that would be the second one.

But let's start with the first one - "Always >= 1":

At the point where it is mentioned, we know that twos >= 1 because we know that denominator > 0.

The expression denominator & -denominator is evaluated to 0 if and only if denominator == 0.

Hence twos, the highest power of 2 that denominator is divisible by, is at least 1.

It is worth mentioning that briefly, for example, "Always >= 1, because denominator > 0".

As for "If twos is zero, then it becomes one", it should actually be the opposite: "If twos is one, then it becomes zero".

More precisely (if you will), "If twos is 1, then its inverse (2**256 / twos) is equal to 2 ** 256, hence 0".

I would go further and use a new variable for this - something like twosInv, because there doesn't seem to be any good reason to "reuse" it (which just obfuscates the code IMO).

Finally, a minor coding error is that the function is declared as returns (uint256 result), and then ends with return result.

I understand the "incentive" for this, as the function consists of several return statements.

But specifically for that reason, I would actually remove the variable name from the function declaration, and just declare it inside the function.

I believe that the general idea for declaring the return-variable name, is when the function consists of a single return point (at the end of the function, obviously).

@barakman
Copy link
Author

barakman commented Jun 2, 2021

Another point to improve in this method - this time an actual performance improvement (and not just coding or documentation) - is that following this:

        assembly {
            prod1 := sub(prod1, gt(remainder, prod0))
            prod0 := sub(prod0, remainder)
        }

You can (once again) do this:

        // Handle non-overflow cases, 256 by 256 division
        if (prod1 == 0) {
            assembly {
                require(denominator > 0);
                result := div(prod0, denominator)
            }
            return result;
        }

This can save the gas used in the remaining part of the function, for cases where a * b is larger than 256 bits but a * b - a * b % denominator is not.

@recmo
Copy link

recmo commented Nov 28, 2021

@barakman Another point to improve in this method - this time an actual performance improvement (and not just coding or documentation) - is that following this:

I am aware and considered this when I wrote the function.

This can save the gas used in the remaining part of the function, for cases where a * b is larger than 256 bits but a * b - a * b % denominator is not.

And it would add more gas in all other cases, which are (under reasonable assumptions) overwhelmingly more likely. So it would be a net loss.

Optimization is about the expected gas cost, which depends on the probability distribution of your input values. This is context specific, and that's why it's important to benchmark with real data (or have a good understanding of the context) before deciding which early-exits are worth it, vs which ones are a net loss.

@recmo
Copy link

recmo commented Nov 28, 2021

It is worth mentioning that briefly, for example, "Always >= 1, because denominator > 0".

It is there in my blog post, which is linked from the code. Not sure why the comment is not in the code.

@recmo
Copy link

recmo commented Nov 28, 2021

As for "If twos is zero, then it becomes one", it should actually be the opposite: "If twos is one, then it becomes zero".

It should not, the comment is correct as is and documents how the code handles a division-by-zero (which is an important exceptional case to document for a division routine, although here it won't happen because twos != 0).

        // Shift in bits from prod1 into prod0. For this we need
        // to flip `twos` such that it is 2**256 / twos.
        // If twos is zero, then it becomes one
        assembly {
            twos := add(div(sub(0, twos), twos), 1)
        }

The evaluation for twos = 0 are

  • twos := add(div(sub(0, 0), 0), 1)
  • twos := add(div(0, 0), 1)
  • twos := add(0, 1)
  • twos := 1

So if twos = 0 then the result, the new value of twos will be 1, as documented.

@recmo
Copy link

recmo commented Nov 28, 2021

I would go further and use a new variable for this - something like twosInv, because there doesn't seem to be any good reason to "reuse" it (which just obfuscates the code IMO).

New variables allocate stack space and cost more gas.

@recmo
Copy link

recmo commented Nov 28, 2021

Finally, a minor coding error is that the function is declared as returns (uint256 result), and then ends with return result.

Matter of style; I strongly dislike implicit returns because they obfuscate what is happening.

But specifically for that reason, I would actually remove the variable name from the function declaration, and just declare it inside the function.

It changes the order in which Solidity allocates variables on the stack which (in this case) leads to gas savings.

So not a coding error but another stack-allocation-order optimization.

@barakman
Copy link
Author

barakman commented Nov 28, 2021

If twos is zero

Well, first off, I don't see how div(0, 0) would not revert on division by 0, although perhaps assembly div works differently than how I'd expect it, so I'm not going to argue on that.

However (and as I wrote in my initial post), there is a previous comment stating that variable twos is "Always >= 1".

And that comment is accurate (as I've also explained in my initial post, though I'm pretty sure that you know it already).

So even if div(sub(0, twos), twos) somehow doesn't revert for twos == 0, I don't quite understand the point in explaining how the code works for twos == 0, when we already know that twos >= 1 (hence never == 0).

Finally, a quick examination of the code behavior for twos == 1:

add(div(sub(0, twos), twos), 1) ==

add(div(sub(0, 1), 1), 1) ==

add(div(max_uint, 1), 1) ==

add(max_uint, 1) ==

0

Therefore, the comment "If twos is one, then it becomes zero" is the correct comment here IMO.

BTW, at the beginning of the code where you recalculate twos (and where the comment "If twos is zero, then it becomes one" appears), twos represents the highest power of 2 which divides denominator. So the value of twos at that point cannot be zero even on the mathematical aspect, as every integer is divisible by at least 1, which is a power of 2. In fact, 0 is not even a power of 2 to begin with, unless you consider minus infinity as a legitimate value in this context (more precisely, 2^x tends to 0 as x tends to minus infinity).

@barakman
Copy link
Author

barakman commented Nov 28, 2021

Finally, a minor coding error is that the function is declared as returns (uint256 result), and then ends with return result.

Matter of style; I strongly dislike implicit returns because they obfuscate what is happening.

But specifically for that reason, I would actually remove the variable name from the function declaration, and just declare it inside the function.

It changes the order in which Solidity allocates variables on the stack which (in this case) leads to gas savings.

So not a coding error but another stack-allocation-order optimization.

I am well aware of the stack reordering issue, and of the fact that a "pre-declared return variable" may help (though AFAIK, this is mostly used for working around the "stack too deep" issue rather than reducing gas cost).

However, I'm pretty sure that you want to use either one of the two options, and not a "mixture" of them.

As I wrote, I believe that the general idea for declaring the return-variable name, is when the function consists of a single return point (at the end of the function, obviously).

But this is a mere coding suggestion, so feel free to ignore it ;)

@barakman
Copy link
Author

barakman commented Nov 28, 2021

P.S.: This function is useful to an unimaginable extent, as it allows for a significant increase in the supported input range for a vast number of applications.

The alternative long-division version of it (which I once wrote), would allow for the output itself to be larger than 256 bits, but at the expense of a much higher gas consumption (while such output is not really needed in most cases).

P.S.2: With regards to the comment about "twos" being "always >= 1", you might want to add an explanation of why that is the case.

As I wrote in my initial post, at the point where this comment is mentioned, we know that twos >= 1 because we know that denominator > 0, and we set twos := denominator & -denominator.

The expression denominator & -denominator is evaluated to 0 if and only if denominator == 0.

Hence twos != 0, or in other words, twos >= 1.

So I believe that it is worth mentioning it briefly, for example, "twos is always >= 1, because denominator > 0".

@barakman
Copy link
Author

barakman commented Nov 28, 2021

I would go further and use a new variable for this - something like twosInv, because there doesn't seem to be any good reason to "reuse" it (which just obfuscates the code IMO).

New variables allocate stack space and cost more gas.

My 2 cents:

Gas cost of an additional variable is minor, while using the same variable in this specific case has a significant negative impact on readability.

Keep in mind that twos == 0 is an "invalid state" even on the pure mathematical aspect.

Every integer is divisible by at least 1, which is a power of 2, so twos can be 1, 2, 4, 8... but it can never be 0.

That's why I think that you should use a new variable at the point where you recalculate twos to be the inverse of itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants