Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query doesn't work correctly when using read_callable #169

Open
MrPrezident opened this issue Oct 19, 2023 · 11 comments
Open

Query doesn't work correctly when using read_callable #169

MrPrezident opened this issue Oct 19, 2023 · 11 comments
Labels
documentation Missing or incorrect documentation
Milestone

Comments

@MrPrezident
Copy link

The read_callable feature doesn't seem to be working correctly. When I try to use it, the queries get messed up. Here is an example.

import tree_sitter

tree_sitter.Language.build_library(
  # Store the library in the `build` directory
  'build/my-languages.so',

  # Include one or more languages
  [
    'vendor/tree-sitter-python'
  ]
)

# Initialize parser and load Python grammar
parser = tree_sitter.Parser()
python_language = tree_sitter.Language('build/my-languages.so', "python")
parser.set_language(python_language)

# Initialize test query (match identifier with all caps)
query_text = '((identifier) @test (#match? @test "^[A-Z]$"))'
python_query = python_language.query(query_text)

# Original Source Code
orig_src  = b'def myfunc():\n    pass\n'

# Parse original tree and query
orig_tree = parser.parse(orig_src)
orig_captures = python_query.captures(orig_tree.root_node)

# Print results
print(f"orig_sexp:{orig_tree.root_node.sexp()}")
print(f"orig_text:{orig_tree.root_node.text}")
print(f"orig_captures:{orig_captures}")

# Edit Source Code (insert newline)
# new_src = b'def myfunc():\n\n    pass\n'
new_src = orig_src[:14] + b'\n' + orig_src[14:]

# Edit Tree
orig_tree.edit(start_byte=14, old_end_byte=14, new_end_byte=15,
               start_point=(1,0), old_end_point=(1,0), new_end_point=(1,1))

# Define parse callback to read source code
def read_callable_bytes(byte_offset, point):
    byte = new_src[byte_offset:byte_offset + 1]
    return byte

# Parse changes with full source code
new1_tree = parser.parse(new_src, orig_tree)
# Parse changes using callback
new2_tree = parser.parse(read_callable_bytes, orig_tree)

# Query
new1_captures = python_query.captures(new1_tree.root_node)
new2_captures = python_query.captures(new2_tree.root_node)

# Print results
print(f"new1_sexp:{new1_tree.root_node.sexp()}")
print(f"new1_text:{new1_tree.root_node.text}")
print(f"new1_captures:{new1_captures}")
print(f"new2_sexp:{new2_tree.root_node.sexp()}")
print(f"new2_text:{new2_tree.root_node.text}")
print(f"new2_captures:{new2_captures}")

Here is the output. The expectation is that there will be no matches on any of the queries. Notice that the new2_captures matched "myfunc" even though it is not capitalized. This only happens when using the read_callable callback.

orig_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
orig_text:b'def myfunc():\n    pass\n'
orig_captures:[]
new1_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
new1_text:b'def myfunc():\n\n    pass\n'
new1_captures:[]
new2_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
new2_text:None
new2_captures:[(<Node type=identifier, start_point=(0, 4), end_point=(0, 10)>, 'test')]
@MrPrezident
Copy link
Author

Actually no need to do the tree edit. I don't know why I included that in the example. Here is a simpler example showing an incorrect match:

# Initialize test query (match identifier with all caps)
query_text = '((identifier) @test (#match? @test "^[A-Z]$"))'
python_query = python_language.query(query_text)

# Original Source Code
orig_src  = b'def myfunc():\n    pass\n'

def read_callable_bytes(byte_offset, point):
    byte = orig_src[byte_offset:byte_offset + 1]
    return byte

# Parse original tree and query
orig_tree = parser.parse(read_callable_bytes)
orig_captures = python_query.captures(orig_tree.root_node)

# Print results
print(f"orig_sexp:{orig_tree.root_node.sexp()}")
print(f"orig_text:{orig_tree.root_node.text}")
print(f"orig_captures:{orig_captures}")

Output:

orig_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
orig_text:None
orig_captures:[(<Node type=identifier, start_point=(0, 4), end_point=(0, 10)>, 'test')]

@amaanq
Copy link
Member

amaanq commented Oct 21, 2023

In this case - a read callable does not have the ability to index/remember the source input, so a query match predicate can't execute on that

@MrPrezident
Copy link
Author

Is there a fundamental reason why read callable is not used for query? It seems inefficient to have to pass in the entire source code on every keystroke.

@MrPrezident
Copy link
Author

Also, can you point me to where the code is for handling the match predicate?

@amaanq
Copy link
Member

amaanq commented Oct 22, 2023

Is there a fundamental reason why read callable is not used for query? It seems inefficient to have to pass in the entire source code on every keystroke.

You cannot get the text at a specific point for the match to execute unlike using a buffer, sure, some read callbacks might be able to, but it's not a guarantee hence why it's omitted

Also, can you point me to where the code is for handling the match predicate?

// if there is no source, ignore the text predicates

MrPrezident added a commit to MrPrezident/py-tree-sitter that referenced this issue Oct 29, 2023
@MrPrezident
Copy link
Author

@amaanq let me know what you think of this PR. I tested this out and it works great for me. I don't see why this wouldn't work for any type of read callback.

MrPrezident added a commit to MrPrezident/py-tree-sitter that referenced this issue Oct 29, 2023
@MrPrezident
Copy link
Author

@amaanq any love or feedback on this PR?

@amaanq
Copy link
Member

amaanq commented Nov 13, 2023

Hey @MrPrezident I believe upstream (Rust/Wasm) doesn't keep text from sources that use a "read_callable" as well, so I'd like to implement it there first before here - I'm also a bit concerned about the performance cost about keeping the text in certain cases when conditionally switching the input based on external programmatic sources (e.g. a test in Rust will run forever until a timeout is reached - wouldn't that be expensive to keep? https://github.com/tree-sitter/tree-sitter/blob/6019b7c84cd5a7f580d57bb981315afd9afec826/cli/src/tests/parser_test.rs#L676

@MrPrezident
Copy link
Author

I believe upstream (Rust/Wasm) doesn't keep text from sources that use a "read_callable" as well, so I'd like to implement it there first before here

Hmm.. ok. I can try to look into that.

I'm also a bit concerned about the performance cost about keeping the text in certain cases when conditionally switching the input based on external programmatic sources

but it's not keeping the text, it's just keeping the callback, or am I missing something?

@amaanq
Copy link
Member

amaanq commented Nov 13, 2023

Well it collects the bytes from the callable right, that's what I meant sorry for not clarifying, like in that timeout test it's possible that it might accumulate a lot of bytes before the timeout is hit.

Maybe a flag in core to enable keeping the source text when using a callable would be nice

@MrPrezident
Copy link
Author

Yeah but that test you are talking about is doing parsing, not querying. I didn't make any changes that would affect what happens to the callback during parsing.

@ObserverOfTime ObserverOfTime added the bug Something isn't working label Feb 26, 2024
@ObserverOfTime ObserverOfTime added this to the 0.23 milestone May 10, 2024
@ObserverOfTime ObserverOfTime added documentation Missing or incorrect documentation and removed bug Something isn't working labels May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Missing or incorrect documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants