tree-sitter-python is overly permissive with newlines #178

Open
agirardeau opened this issue Oct 28, 2022 · 3 comments

@agirardeau

The following code produces a syntax error in Python because of the line break before the colon, but tree-sitter-python parses it as valid code:

def foo(x)
:
    return x + 2

This happens because \s is included in the extras parameter[1], which tells tree-sitter to ignore whitespace (and therefore newlines) between any two tokens.
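Both halves of that claim can be checked from Python itself. The sketch below is only an illustration: `is_valid_python` is a hypothetical helper, and `EXTRAS` is the regex from footnote [1] transcribed into Python's `re` syntax, which is assumed to be close enough to the JavaScript original for this purpose.

```python
import re

# The snippet from the report: a line break before the colon.
BROKEN_DEF = "def foo(x)\n:\n    return x + 2\n"


def is_valid_python(source: str) -> bool:
    """Return True if CPython's own parser accepts the source."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# Footnote [1], transcribed for Python's `re` module (an approximation
# of the JavaScript regex used in grammar.js, not the grammar itself).
EXTRAS = re.compile(r"[\s\f\uFEFF\u2060\u200B]|\\\r?\n")

print(is_valid_python(BROKEN_DEF))    # False: CPython rejects the snippet
print(bool(EXTRAS.fullmatch("\n")))   # True: a bare newline counts as an extra
```

Because a bare newline matches the extras pattern, the grammar never sees the line break that CPython objects to.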

Replacing \s by \t in extras causes tree-sitter-python to correctly reject newlines such as the one above[2]. However, after doing so it no longer handles newlines correctly inside brackets. Consider the following valid Python:

a = (
  1 +
  2
)

This fails to parse because tree-sitter does not expect newlines at the end of lines 1 and 2. The scanner.cc logic that ignores line breaks inside bracket expressions depends on a closing bracket being a valid token[3], which is not the case immediately after an open paren or the plus operator.
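For comparison, CPython's own tokenizer separates the two kinds of line break explicitly. The sketch below uses the standard `tokenize` module, which reports a line break inside brackets as `NL` and the end of a logical line as `NEWLINE`. The helper name `newline_kinds` is made up for this example.

```python
import io
import tokenize

# The bracketed expression from the report.
BRACKETED = "a = (\n  1 +\n  2\n)\n"


def newline_kinds(source: str) -> list[str]:
    """List the NL / NEWLINE tokens CPython's tokenizer emits, in order."""
    kinds = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NL, tokenize.NEWLINE):
            kinds.append(tokenize.tok_name[tok.type])
    return kinds


print(newline_kinds(BRACKETED))  # ['NL', 'NL', 'NL', 'NEWLINE']
```

The three line breaks inside the parentheses are non-terminating, and only the one after the closing paren ends the statement. That is the distinction the external scanner would need to reproduce.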

Is disallowing arbitrary newlines in general while permitting them inside brackets something that is possible to accomplish with tree-sitter?

[1]

/[\s\f\uFEFF\u2060\u200B]|\\\r?\n/

[2] To avoid rejecting all empty lines we'd also have to replace module: $ => repeat($._statement) with something like module: $ => repeat(choice($._statement, /\r?\n/))

[3]

bool within_brackets = valid_symbols[CLOSE_BRACE] || valid_symbols[CLOSE_PAREN] || valid_symbols[CLOSE_BRACKET];

@agirardeau
Author

It seems like the correct approach would be to track open parens/braces/brackets in the delimiter stack.
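A rough sketch of that idea, written in Python rather than in the scanner's C++ and not taken from scanner.cc: keep a stack of expected closing brackets and treat a newline as significant only when the stack is empty. The names `OPENERS`, `CLOSERS` and `classify_newlines` are hypothetical, and string literals and comments are deliberately ignored to keep the example short.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())


def classify_newlines(source: str) -> list[bool]:
    """For each line break in source, return True if it lies outside all brackets.

    Simplification: brackets inside string literals and comments are not
    recognised, which a real scanner would have to handle.
    """
    stack = []        # closing brackets we are still waiting for
    significant = []  # one entry per line break, in order
    for ch in source:
        if ch in OPENERS:
            stack.append(OPENERS[ch])
        elif ch in CLOSERS:
            # Pop only on a matching closer; mismatches are left alone.
            if stack and stack[-1] == ch:
                stack.pop()
        elif ch == "\n":
            significant.append(not stack)
    return significant


print(classify_newlines("def foo(x)\n:\n    return x + 2\n"))  # [True, True, True]
print(classify_newlines("a = (\n  1 +\n  2\n)\n"))             # [False, False, False, True]
```

With this classification, the line break before the colon in the first example is significant (so the grammar could reject it), while the line breaks inside the parentheses of the second example can be skipped.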

@amaanq
Member

amaanq commented Aug 16, 2023

Tracking brackets/parentheses in the delimiter stack sounds a bit complex, but it might also solve this case:

def foo(a):
    return (a.
bar)

PR welcome :)
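For reference, CPython accepts the snippet above, because the line break and the missing indentation fall inside parentheses. A minimal check with the standard `ast` module (the variable names are only for this example):

```python
import ast

# The function from the comment above, with the attribute access split
# across a line break inside parentheses.
SOURCE = "def foo(a):\n    return (a.\nbar)\n"

tree = ast.parse(SOURCE)
ret = tree.body[0].body[0]  # the `return` statement inside foo

print(type(ret.value).__name__, ret.value.attr)  # Attribute bar
```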

@theHamsta
Contributor

I thought the policy of tree-sitter was to parse all valid code, but also to make the most sense out of invalid code. Your examples show, though, that valid code also gets rejected. If there is a solution for this issue that fixes the failure cases but still parses some invalid cases, maybe that could be favored over something that would require more stateful scanner logic.

module: $ => repeat(choice($._statement, /\r?\n/))

At the moment, ; is ignored. A ; can be an alternative to a newline before the next statement.
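As a quick illustration of the CPython side only (it says nothing about how the grammar rule above treats `;`), the standard `ast` module shows two statements on a single line when they are separated by a semicolon:

```python
import ast

# Two simple statements on one physical line, separated by a semicolon.
tree = ast.parse("a = 1; b = 2\n")

print(len(tree.body))                      # 2
print([type(s).__name__ for s in tree.body])  # ['Assign', 'Assign']
```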
