Query doesn't work correctly when using read_callable #169

MrPrezident · 2023-10-19T03:38:32Z

The read_callable feature doesn't seem to be working correctly. When I try to use it, the queries get messed up. Here is an example.

import tree_sitter

tree_sitter.Language.build_library(
  # Store the library in the `build` directory
  'build/my-languages.so',

  # Include one or more languages
  [
    'vendor/tree-sitter-python'
  ]
)

# Initialize parser and load Python grammar
parser = tree_sitter.Parser()
python_language = tree_sitter.Language('build/my-languages.so', "python")
parser.set_language(python_language)

# Initialize test query (match identifier with all caps)
query_text = '((identifier) @test (#match? @test "^[A-Z]$"))'
python_query = python_language.query(query_text)

# Original Source Code
orig_src  = b'def myfunc():\n    pass\n'

# Parse original tree and query
orig_tree = parser.parse(orig_src)
orig_captures = python_query.captures(orig_tree.root_node)

# Print results
print(f"orig_sexp:{orig_tree.root_node.sexp()}")
print(f"orig_text:{orig_tree.root_node.text}")
print(f"orig_captures:{orig_captures}")

# Edit Source Code (insert newline)
# new_src = b'def myfunc():\n\n    pass\n'
new_src = orig_src[:14] + b'\n' + orig_src[14:]

# Edit Tree
orig_tree.edit(start_byte=14, old_end_byte=14, new_end_byte=15,
               start_point=(1,0), old_end_point=(1,0), new_end_point=(1,1))

# Define parse callback to read source code
def read_callable_bytes(byte_offset, point):
    byte = new_src[byte_offset:byte_offset + 1]
    return byte

# Parse changes with full source code
new1_tree = parser.parse(new_src, orig_tree)
# Parse changes using callback
new2_tree = parser.parse(read_callable_bytes, orig_tree)

# Query
new1_captures = python_query.captures(new1_tree.root_node)
new2_captures = python_query.captures(new2_tree.root_node)

# Print results
print(f"new1_sexp:{new1_tree.root_node.sexp()}")
print(f"new1_text:{new1_tree.root_node.text}")
print(f"new1_captures:{new1_captures}")
print(f"new2_sexp:{new2_tree.root_node.sexp()}")
print(f"new2_text:{new2_tree.root_node.text}")
print(f"new2_captures:{new2_captures}")

Here is the output. The expectation is that there will be no matches on any of the queries. Notice that the new2_captures matched "myfunc" even though it is not capitalized. This only happens when using the read_callable callback.

orig_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
orig_text:b'def myfunc():\n    pass\n'
orig_captures:[]
new1_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
new1_text:b'def myfunc():\n\n    pass\n'
new1_captures:[]
new2_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
new2_text:None
new2_captures:[(<Node type=identifier, start_point=(0, 4), end_point=(0, 10)>, 'test')]

The text was updated successfully, but these errors were encountered:

MrPrezident · 2023-10-19T03:46:26Z

Actually no need to do the tree edit. I don't know why I included that in the example. Here is a simpler example showing an incorrect match:

# Initialize test query (match identifier with all caps)
query_text = '((identifier) @test (#match? @test "^[A-Z]$"))'
python_query = python_language.query(query_text)

# Original Source Code
orig_src  = b'def myfunc():\n    pass\n'

def read_callable_bytes(byte_offset, point):
    byte = orig_src[byte_offset:byte_offset + 1]
    return byte

# Parse original tree and query
orig_tree = parser.parse(read_callable_bytes)
orig_captures = python_query.captures(orig_tree.root_node)

# Print results
print(f"orig_sexp:{orig_tree.root_node.sexp()}")
print(f"orig_text:{orig_tree.root_node.text}")
print(f"orig_captures:{orig_captures}")

Output:

orig_sexp:(module (function_definition name: (identifier) parameters: (parameters) body: (block (pass_statement))))
orig_text:None
orig_captures:[(<Node type=identifier, start_point=(0, 4), end_point=(0, 10)>, 'test')]

amaanq · 2023-10-21T19:11:56Z

In this case - a read callable does not have the ability to index/remember the source input, so a query match predicate can't execute on that

MrPrezident · 2023-10-21T19:57:31Z

Is there a fundamental reason why read callable is not used for query? It seems inefficient to have to pass in the entire source code on every keystroke.

MrPrezident · 2023-10-21T21:18:20Z

Also, can you point me to where the code is for handling the match predicate?

amaanq · 2023-10-22T05:16:48Z

Is there a fundamental reason why read callable is not used for query? It seems inefficient to have to pass in the entire source code on every keystroke.

You cannot get the text at a specific point for the match to execute unlike using a buffer, sure, some read callbacks might be able to, but it's not a guarantee hence why it's omitted

Also, can you point me to where the code is for handling the match predicate?

py-tree-sitter/tree_sitter/binding.c

Line 1883 in da6716b

// if there is no source, ignore the text predicates

fixes tree-sitter#169

MrPrezident · 2023-10-29T16:40:28Z

@amaanq let me know what you think of this PR. I tested this out and it works great for me. I don't see why this wouldn't work for any type of read callback.

fixes tree-sitter#169

MrPrezident · 2023-11-09T15:32:35Z

@amaanq any love or feedback on this PR?

amaanq · 2023-11-13T04:46:11Z

Hey @MrPrezident I believe upstream (Rust/Wasm) doesn't keep text from sources that use a "read_callable" as well, so I'd like to implement it there first before here - I'm also a bit concerned about the performance cost about keeping the text in certain cases when conditionally switching the input based on external programmatic sources (e.g. a test in Rust will run forever until a timeout is reached - wouldn't that be expensive to keep? https://github.com/tree-sitter/tree-sitter/blob/6019b7c84cd5a7f580d57bb981315afd9afec826/cli/src/tests/parser_test.rs#L676

MrPrezident · 2023-11-13T05:24:22Z

I believe upstream (Rust/Wasm) doesn't keep text from sources that use a "read_callable" as well, so I'd like to implement it there first before here

Hmm.. ok. I can try to look into that.

I'm also a bit concerned about the performance cost about keeping the text in certain cases when conditionally switching the input based on external programmatic sources

but it's not keeping the text, it's just keeping the callback, or am I missing something?

amaanq · 2023-11-13T05:25:55Z

Well it collects the bytes from the callable right, that's what I meant sorry for not clarifying, like in that timeout test it's possible that it might accumulate a lot of bytes before the timeout is hit.

Maybe a flag in core to enable keeping the source text when using a callable would be nice

MrPrezident · 2023-11-13T15:44:58Z

Yeah but that test you are talking about is doing parsing, not querying. I didn't make any changes that would affect what happens to the callback during parsing.

MrPrezident mentioned this issue Oct 21, 2023

Can't get #match? to work tree-sitter/tree-sitter#2718

Closed

MrPrezident added a commit to MrPrezident/py-tree-sitter that referenced this issue Oct 29, 2023

keep read_callable callback for queries

87e0add

fixes tree-sitter#169

MrPrezident mentioned this issue Oct 29, 2023

keep read_callable callback for queries #172

Closed

MrPrezident added a commit to MrPrezident/py-tree-sitter that referenced this issue Oct 29, 2023

keep read_callable callback for queries

8d1b7ba

fixes tree-sitter#169

ObserverOfTime assigned amaanq Feb 26, 2024

ObserverOfTime added the bug Something isn't working label Feb 26, 2024

ObserverOfTime added this to the 0.23 milestone May 10, 2024

ObserverOfTime added documentation Missing or incorrect documentation and removed bug Something isn't working labels May 28, 2024

ObserverOfTime unassigned amaanq May 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query doesn't work correctly when using read_callable #169

Query doesn't work correctly when using read_callable #169

MrPrezident commented Oct 19, 2023

MrPrezident commented Oct 19, 2023

amaanq commented Oct 21, 2023 •

edited

MrPrezident commented Oct 21, 2023

MrPrezident commented Oct 21, 2023

amaanq commented Oct 22, 2023

MrPrezident commented Oct 29, 2023

MrPrezident commented Nov 9, 2023

amaanq commented Nov 13, 2023

MrPrezident commented Nov 13, 2023

amaanq commented Nov 13, 2023 •

edited

MrPrezident commented Nov 13, 2023

Query doesn't work correctly when using read_callable #169

Query doesn't work correctly when using read_callable #169

Comments

MrPrezident commented Oct 19, 2023

MrPrezident commented Oct 19, 2023

amaanq commented Oct 21, 2023 • edited

MrPrezident commented Oct 21, 2023

MrPrezident commented Oct 21, 2023

amaanq commented Oct 22, 2023

MrPrezident commented Oct 29, 2023

MrPrezident commented Nov 9, 2023

amaanq commented Nov 13, 2023

MrPrezident commented Nov 13, 2023

amaanq commented Nov 13, 2023 • edited

MrPrezident commented Nov 13, 2023

amaanq commented Oct 21, 2023 •

edited

amaanq commented Nov 13, 2023 •

edited