yul literal value as struct of u256 + optional formatting hint #15112
base: develop
e1c8cd3 to 3ca7a0a
Not a complete review yet. Halfway through reading it I started realizing that I thought I knew how the hint and the unlimited flag worked but I actually didn't. I'm more and more convinced that this flag is not exactly the right solution here and we need to change it a bit. It's not really a property of the literal itself. But see my comments for details.
And some style nitpicking :)
libyul/AST.h
Outdated
LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):
	m_data(_data), m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr), m_unlimited(_unlimited) {}
Can we assert here that the hint matches the data if present? formatLiteral() ignores m_data when m_hint is present so we must ensure that these two are always consistent.
And I'd recommend this style:
-LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):
-	m_data(_data), m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr), m_unlimited(_unlimited) {}
+LiteralValue(Data const& _data, std::optional<std::string> const& _hint = std::nullopt, bool _unlimited = false):
+	m_data(_data),
+	m_hint(_hint ? std::make_shared<std::string>(*_hint) : nullptr),
+	m_unlimited(_unlimited)
+{}
Hmm, asserting it in this place is going to be difficult without knowing the LiteralKind. That is what determines the validity in the end. I have added a check to formatLiteral in the utilities, though, so that the hint isn't blindly taken but is compared to the stored u256 value.
Alternatively, we could think about storing the LiteralKind in the LiteralValue as well. Then an assert in the constructor is possible and failure would occur earlier.
Yeah, it really feels like LiteralKind should be inside LiteralValue, because the value is ambiguous without it. It would let us get rid of a lot of redundant asserts that we now have to do at every step to ensure the values are valid.
But I think it would require changing a lot more code, so perhaps it's better to leave it as is for now and just finish the PR in the current state.
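For illustration only, here is a minimal sketch of what a constructor-time check could look like if the kind were stored next to the value. KindedLiteralValue and consistentBool are hypothetical names, uint64_t stands in for u256, and only the boolean case is checked; none of this is taken from libyul.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

enum class LiteralKind { Number, Boolean, String };

// Hypothetical stand-in for u256; the real type is 256 bits wide.
using Word = std::uint64_t;

// Returns true if the (value, hint) pair is consistent for a boolean literal.
inline bool consistentBool(Word _value, std::optional<std::string> const& _hint)
{
	if (_value > 1)
		return false;
	return !_hint || *_hint == (_value == 1 ? "true" : "false");
}

// Hypothetical variant of LiteralValue that also carries the kind.
class KindedLiteralValue
{
public:
	KindedLiteralValue(LiteralKind _kind, Word _value, std::optional<std::string> _hint = std::nullopt):
		m_kind(_kind),
		m_value(_value),
		m_hint(std::move(_hint))
	{
		// With the kind available, an inconsistent hint is caught at construction.
		// Only the boolean case is checked in this sketch.
		assert(m_kind != LiteralKind::Boolean || consistentBool(m_value, m_hint));
	}

	LiteralKind kind() const { return m_kind; }
	Word value() const { return m_value; }
	std::optional<std::string> const& hint() const { return m_hint; }

private:
	LiteralKind m_kind;
	Word m_value;
	std::optional<std::string> m_hint;
};
```

With this shape, constructing a boolean with value 1 and hint "false" would trip the assert immediately instead of surfacing later in a formatting or printing step.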
libyul/Utilities.cpp
Outdated
if (_literal.kind == LiteralKind::Boolean)
{
	yulAssert(_literal.value() == 0 || _literal.value() == 1, "Could not format boolean literal");
	result = _literal.value() == 1 ? "true" : "false";
}
else
	result = _literal.value().str();
This looks like you're formatting strings and integers the same way. How can that work? I.e. how does it preserve the distinction between 1234 and "1234"?
Actually, does the hint include the quotes? Because if it does not, then I can't see how we can distinguish values like '1234' and "1234" or '1234' and hex'1234'.
Also, if there's no distinction, the value cannot be unambiguously recovered from the hint (which matters if we want to enforce that they match).
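To make the ambiguity concrete, here is a small sketch under stated assumptions: a 32-byte big-endian array stands in for u256, numbers are stored right-aligned and strings left-aligned with zero padding. wordFromNumber and wordFromString are hypothetical helpers written for this example, not libyul functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 32 bytes, big-endian; a hypothetical stand-in for u256.
using Word = std::array<std::uint8_t, 32>;

// A number is stored right-aligned (only 64-bit values are supported in this sketch).
inline Word wordFromNumber(std::uint64_t _number)
{
	Word word{};
	for (std::size_t i = 0; i < 8; ++i)
		word[31 - i] = static_cast<std::uint8_t>(_number >> (8 * i));
	return word;
}

// A string of at most 32 bytes is stored left-aligned and padded with zeros.
// The left alignment is an assumption of this sketch.
inline Word wordFromString(std::string const& _bytes)
{
	assert(_bytes.size() <= 32);
	Word word{};
	for (std::size_t i = 0; i < _bytes.size(); ++i)
		word[i] = static_cast<std::uint8_t>(_bytes[i]);
	return word;
}
```

Under these assumptions the unquoted text 1234 maps to three different words depending on whether it is read as a number, as the string "1234", or as the two bytes of hex"1234", so a hint without quotes or kind cannot be checked against the stored value.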
By the way, it would be good to have those examples as test cases if we don't have them somewhere already.
I'd also add something like this:
object "A" {
    code {
        pop(datasize(hex'616263'))
    }
    data 'abc' "1234"
}
I actually only now realized that we're allowing anything other than plain strings for literal arguments, not sure we even have it covered (only for arguments though, not in data).
The quoting and escaping of string literals is done in AsmPrinter, so the LiteralKind is also always required to unambiguously print and/or recover a literal. In your example you'd get
object "A" {
    code {
        pop(datasize(hex'616263'))
    }
    data 'abc' "1234"
}
// ----
// step: disambiguator
//
// object "A" {
//     code { pop(datasize("abc")) }
//     data "abc" hex"31323334"
// }
As for the almost-ambiguous literals:
{
    let a := 1234
    let a_hex := 0x4d2
    let b := "1234"
    let b_hex := "0x4d2"
    let c := '1234'
    let c_hex := '0x4d2'
    let d := hex"1234"
    let d_hex := hex"04d2"
}
// ----
// step: disambiguator
//
// {
//     let a := 1234
//     let a_hex := 0x4d2
//     let b := "1234"
//     let b_hex := "0x4d2"
//     let c := "1234"
//     let c_hex := "0x4d2"
//     let d := "\x124"
//     let d_hex := "\x04\xd2"
// }
Oh, so we're actually not preserving the representation perfectly? That's surprising. I had the impression that the AST allowed us to mostly reproduce the original source aside from comments and whitespace and that the representation hint was meant to help with that, but apparently not. I see from your output that it doesn't even keep hex numbers as hex so it seems kinda useless.
I wonder what we'd really lose by removing it. I wouldn't do it in this PR, but maybe we should consider that later?
Yeah, agree, this can probably be removed and hopefully make for a much cleaner data structure without losing too much
b9373bf to f712d53
libyul/Utilities.cpp
Outdated
+bool solidity::yul::validBoolLiteral(solidity::yul::Literal const& _literal)
 {
-	switch (_literal.kind)
+	if (_literal.kind == LiteralKind::Boolean && !_literal.value.unlimited())
 	{
-	case LiteralKind::Number:
-		return valueOfNumberLiteral(_literal);
-	case LiteralKind::Boolean:
-		return valueOfBoolLiteral(_literal);
-	case LiteralKind::String:
-		return valueOfStringLiteral(_literal);
-	default:
-		yulAssert(false, "Unexpected literal kind!");
+		return _literal.value.value() == true || _literal.value.value() == false;
 	}
+	return false;
 }
You're not validating the hint for boolean literals.
I see lots of checks for equality with "true" or "false" in the old code so I'm not sure why a lack of this check is not breaking anything. Are we missing test coverage?
The LiteralValue instances are only created via valueOfLiteral calls, which would take care of it already. But I agree, this should be checked!
libyul/Utilities.cpp
Outdated
else
if (_literal.kind == LiteralKind::Boolean)
{
	yulAssert(_literal.value.value() == 0 || _literal.value.value() == 1, "Could not format boolean literal");
formatLiteral() only has this single assert but I think there are more things you should have sanity checks against. For example that an unlimited literal is of the string kind or that the hint for boolean is actually true or false.
Or, if you define a validLiteral() function that executes the right validation based on the kind you could simply assert it at the beginning of the function. I've seen quite a few places where such a combined check would simplify assertions. AsmPrinter::operator()(Literal) is one example.
You might also use it at the end of valueOfLiteral() (or maybe better in each specific function, since they can be called individually). I see that you removed the original assertions from them but they seem still relevant to me. Or are the functions now used on invalid literals as well?
I was indeed using them on invalid literals so that AsmAnalysis would produce proper errors for them instead of failing asserts: having an invalid literal and then trying to format it into a string, which then asserts validity, is bound to go wrong.
But yeah, I have added a validLiteral method as you suggested and it does make the code cleaner :) Because of AsmAnalysis, I made it an optional (but on by default) assert in formatLiteral.
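A rough sketch of the shape described here, not the PR's actual code: a single validLiteral dispatching on the kind, and a formatLiteral whose validity assert can be switched off by callers that need to print invalid literals in error messages. The Literal struct is simplified (uint64_t instead of u256, a plain bool for the unlimited flag), and the individual validity rules are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class LiteralKind { Number, Boolean, String };

// Simplified literal: uint64_t replaces u256, and `unlimited` marks string
// arguments of builtins that are exempt from the 32-byte limit.
struct Literal
{
	LiteralKind kind;
	std::uint64_t value = 0;
	bool unlimited = false;
};

inline bool validBoolLiteral(Literal const& _l) { return !_l.unlimited && _l.value <= 1; }
inline bool validNumberLiteral(Literal const& _l) { return !_l.unlimited; }
// Length check omitted: this sketch does not model string contents.
inline bool validStringLiteral(Literal const&) { return true; }

// Runs the validation that matches the literal's kind.
inline bool validLiteral(Literal const& _l)
{
	switch (_l.kind)
	{
	case LiteralKind::Number: return validNumberLiteral(_l);
	case LiteralKind::Boolean: return validBoolLiteral(_l);
	case LiteralKind::String: return validStringLiteral(_l);
	}
	return false;
}

// Validation is on by default; the analysis phase would pass false so that it
// can still print an invalid literal as part of a proper error message.
inline std::string formatLiteral(Literal const& _l, bool _validate = true)
{
	if (_validate)
		assert(validLiteral(_l));
	if (_l.kind == LiteralKind::Boolean && _l.value <= 1)
		return _l.value == 1 ? "true" : "false";
	// String quoting and escaping are left out of this sketch.
	return std::to_string(_l.value);
}
```

A caller reporting an error would use formatLiteral(literal, false); every other caller keeps the default and gets the sanity check for free.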
if (_lhs.value.unlimited() || _rhs.value.unlimited())
{
	if (_lhs.value.unlimited() && _rhs.value.unlimited())
		return _lhs.value.builtinStringLiteralValue() == _rhs.value.builtinStringLiteralValue();
	else
	{
		bool const valid = validStringLiteral(_lhs) && validStringLiteral(_rhs);
		// the string literals are both valid, ie <32 chars
		return valid && _lhs.value.value() == _rhs.value.value();
	}
}

return _lhs.value == _rhs.value;
This compares value() between normal and unlimited strings. For syntactic equality, shouldn't we be comparing the string representations instead? I think that the previous version would have failed for different string representations of the same value and yours won't.
And value() comparison will never be true between a normal and an unlimited string (assuming the value of an unlimited string is nullopt). In fact an assert will fail if you run it on an unlimited string.
I wonder why none of this resulted in test failures. We're probably missing coverage here and if so, we should add a test that will trigger this code path.
Also, shouldn't invalid strings longer than 32 chars be syntactically equal to unlimited strings with the same representation? They will look identical in the source and have identical AST representation. And you already do consider two identical invalid strings to be equal so it would make sense to do the same thing for invalid compared with valid unlimited.
Can you even get an invalid string here? Maybe this should be an assert instead? Such a situation may only happen if you take an AST where a function call argument is a normal string on a position where an unlimited string is expected or the other way around. The parser after your changes won't ever produce anything like this so this could only happen if someone somehow imported doctored Yul AST with this exact node configuration - and we don't support Yul AST import right now. And if we did, I'd expect such AST to fail validation during import. We should probably just have a validLiteral() assert at the beginning of this function.
I am a bit confused about this one. There's a test like
{
    f()
    g()
    function f() { mstore(0x01, mload(0x00)) }
    function g() { mstore(1, mload(0)) }
}
// ----
// step: equivalentFunctionCombiner
//
// {
//     f()
//     f()
//     function f()
//     { mstore(0x01, mload(0x00)) }
//     function g()
//     { mstore(1, mload(0)) }
// }
which uses SyntacticallyEqual under the hood. Now clearly f and g are not syntactically equal. If I change the implementation to compare the syntax (somewhat, at least) by return formatLiteral(_lhs) == formatLiteral(_rhs);, this test will fail as the expectation is no longer met.
This component is not checking for strict syntactical equality but rather checks whether two things are semantically equal while evaluating syntactical equality - but doesn't really check full semantic equality either. :D I have added a statement in the header that literals are compared based on their value and not based on their string representation.
It's really a syntax-based comparison that may only compare equal for things that are semantically equivalent :-). (While comparing unequal for semantically equivalent cases is not invalid, although the more semantically equivalent cases compare equal the better.)
Or more precisely it's syntactic comparison modulo renamings and literal formatting.
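As a sketch of "modulo literal formatting": the comparison below looks only at kind and value and ignores the hint, so 0x01 and 1 compare equal, which is what the equivalentFunctionCombiner expectation above relies on. The Literal struct and literalsEqual are simplified, hypothetical stand-ins (uint64_t instead of u256), and the kind comparison and the validity precondition are assumptions based on the discussion in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

enum class LiteralKind { Number, Boolean, String };

struct Literal
{
	LiteralKind kind;
	std::uint64_t value;             // stand-in for u256
	std::optional<std::string> hint; // original spelling, e.g. "0x01"
};

// Minimal validity rule for this sketch: booleans must be 0 or 1.
inline bool validLiteral(Literal const& _l)
{
	return _l.kind != LiteralKind::Boolean || _l.value <= 1;
}

// Equal if kind and value agree; the hint (the formatting) is deliberately ignored.
inline bool literalsEqual(Literal const& _lhs, Literal const& _rhs)
{
	// Precondition suggested in the review: both operands are valid literals.
	assert(validLiteral(_lhs) && validLiteral(_rhs));
	return _lhs.kind == _rhs.kind && _lhs.value == _rhs.value;
}
```

Comparing formatLiteral output instead would distinguish 0x01 from 1 and break the test quoted earlier.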
libyul/AsmAnalysis.cpp
Outdated
 std::vector<YulString> AsmAnalyzer::operator()(Literal const& _literal)
 {
 	expectValidType(_literal.type, nativeLocationOf(_literal));
-	if (_literal.kind == LiteralKind::String && _literal.value.str().size() > 32)
+	if (_literal.kind == LiteralKind::String && !validStringLiteral(_literal))
 		m_errorReporter.typeError(
 			3069_error,
 			nativeLocationOf(_literal),
-			"String literal too long (" + std::to_string(_literal.value.str().size()) + " > 32)"
+			"String literal too long (" + std::to_string(formatLiteral(_literal).size()) + " > 32)"
 		);
-	else if (_literal.kind == LiteralKind::Number && bigint(_literal.value.str()) > u256(-1))
+	else if (_literal.kind == LiteralKind::Number && !validNumberLiteral(_literal))
 		m_errorReporter.typeError(6708_error, nativeLocationOf(_literal), "Number literal too large (> 256 bits)");
 	else if (_literal.kind == LiteralKind::Boolean)
-		yulAssert(_literal.value == "true"_yulstring || _literal.value == "false"_yulstring, "");
+		yulAssert(validBoolLiteral(_literal));
One small problem here is that if we ever add any extra checks to those validation functions, the messages will no longer match the errors.
We could make the messages more generic but maybe a better way to do this would be to keep the original checks and just assert validLiteral(_literal) after them to make sure that nothing slipped through?
Yeah that makes sense. I pulled them out into the utilities functions at some iteration of the PR because they became a bit lengthy but now it doesn't seem too bad.
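The pattern agreed on here could look roughly like the following. It is a sketch with hypothetical, simplified types: Literal keeps the source text instead of a u256, and analyze returns messages instead of using an error reporter. The specific check keeps its own condition so the message always matches what fired, and validLiteral only serves as a catch-all assertion afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class LiteralKind { Number, Boolean, String };

struct Literal
{
	LiteralKind kind;
	std::string text; // source spelling without quotes; stand-in for the real value
};

// Catch-all validity predicate; it may grow extra checks over time.
inline bool validLiteral(Literal const& _l)
{
	switch (_l.kind)
	{
	case LiteralKind::String: return _l.text.size() <= 32;
	case LiteralKind::Boolean: return _l.text == "true" || _l.text == "false";
	case LiteralKind::Number: return !_l.text.empty();
	}
	return false;
}

// Returns the user-facing errors for a literal.
inline std::vector<std::string> analyze(Literal const& _l)
{
	std::vector<std::string> errors;
	// Specific check, so the message always matches the condition that fired.
	if (_l.kind == LiteralKind::String && _l.text.size() > 32)
		errors.push_back("String literal too long (" + std::to_string(_l.text.size()) + " > 32)");
	// Anything not reported above must be valid; a failure here is a compiler bug.
	if (errors.empty())
		assert(validLiteral(_l));
	return errors;
}
```

If validLiteral later gains a new rule without a matching user-facing check, the assert fires in testing rather than an unrelated error message being shown to the user.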
libyul/AsmAnalysis.cpp
Outdated
-			if (std::get<Literal>(arg).value.empty())
+			static u256 const empty {valueOfStringLiteral("").value()};
+			auto const& literalValue = std::get<Literal>(arg).value;
+			if ((literalValue.unlimited() && literalValue.builtinStringLiteralValue().empty()) || (!literalValue.unlimited() && literalValue.value() == empty))
 				m_errorReporter.typeError(
 					1844_error,
 					nativeLocationOf(arg),
 					"The \"verbatim_*\" builtins cannot be used with empty bytecode."
 				);
 		}

-		argTypes.emplace_back(expectUnlimitedStringLiteral(std::get<Literal>(arg)));
+		bool const isUnlimitedLiteralArgument = [&]() {
+			if (BuiltinFunction const* f = m_dialect.builtin(_funCall.functionName.name))
+				return f->literalArguments.size() >= i && f->literalArguments.at(i-1).has_value();
+			return false;
+		}();
+		argTypes.emplace_back(expectStringLiteral(std::get<Literal>(arg), isUnlimitedLiteralArgument));
 		continue;
I see that you're handling both normal and unlimited strings here even though here we know we're analyzing a call with unlimited literal arguments. Given that in the parser you made sure to use unlimited LiteralValue for such calls, it would be perfectly fine to just assert here that the argument is such a value and make the code simpler with that assumption.
In fact, if the parsing is wrong in some way and misses some cases, being too liberal here will let them slip through and we won't discover them easily.
BTW, isUnlimitedLiteralArgument is redundant anyway. It will always be true here because we're inside this overcomplicated condition that already checked that this is the case:
if (
	auto literalArgumentKind = (literalArguments && i <= literalArguments->size()) ?
		literalArguments->at(i - 1) :
		std::nullopt
)
Nice, done that :)
…limited' - unlimited meaning a specific string literal argument of a builtin function
8658ad2 to 617efd2
Changed the definition of solidity::yul::Literal to carry its value not as a YulString but as a LiteralValue struct, consisting of a u256 and an optional string representation hint. Upon converting from its data back to a string representation, it is first checked whether the hint is not empty and, in that case, whether value == parseValue(hint).
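A minimal sketch of that check for number literals, under stated assumptions: uint64_t stands in for u256, parseValue here is a hypothetical helper that only understands decimal and 0x-prefixed hexadecimal numbers, and a mismatching hint is treated as an internal error via assert. The real implementation may handle a mismatch differently.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

using Word = std::uint64_t; // hypothetical stand-in for u256

struct LiteralValue
{
	Word value;
	std::optional<std::string> hint;
};

// Parses a decimal or 0x-prefixed hexadecimal number; nullopt if malformed.
inline std::optional<Word> parseValue(std::string const& _hint)
{
	bool const isHex = _hint.rfind("0x", 0) == 0;
	char const* begin = _hint.data() + (isHex ? 2 : 0);
	char const* end = _hint.data() + _hint.size();
	Word result = 0;
	auto const [ptr, ec] = std::from_chars(begin, end, result, isHex ? 16 : 10);
	if (ec != std::errc() || ptr != end)
		return std::nullopt;
	return result;
}

// True if there is no hint or the hint parses back to the stored value.
inline bool hintMatches(LiteralValue const& _literal)
{
	return !_literal.hint || parseValue(*_literal.hint) == _literal.value;
}

// Prefers the hint so that e.g. hex spelling survives, but never trusts it blindly.
inline std::string formatNumber(LiteralValue const& _literal)
{
	assert(hintMatches(_literal));
	return _literal.hint ? *_literal.hint : std::to_string(_literal.value);
}
```

For example, a value of 1234 with the hint "0x4d2" is printed as 0x4d2, while the same value without a hint is printed as 1234.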