Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cyclic path in the parse tree #119

Open
casouri opened this issue Dec 13, 2022 · 6 comments
Open

Cyclic path in the parse tree #119

casouri opened this issue Dec 13, 2022 · 6 comments

Comments

@casouri
Copy link

casouri commented Dec 13, 2022

I encountered a case where a node's sibling's parent is the node itself. I don't know if this is a tree-sitter bug or a tree-sitter-c bug, but here it is:

For a program like this:

#ifdef A
struct a{
};

the parse tree is

(preproc_ifdef #ifdef name: (identifier)
 (struct_specifier struct name: (type_identifier)
  body: (field_declaration_list { }))
 ; #endif)

And if we go from the semicolon, first find its next sibling, which is the #endif, then the parent of #endif, and we are back at the semicolon.

To reproduce, try the following python script. (libtree-sitter-c.dylib needs to be in the same directory as the script.) I'm using tree-sitter-c built from latest master, and python's tree-sitter binding on PyPI, so 0.20.1. I first observed this in Emacs, and it uses tree-sitter v0.20.7.

from tree_sitter import Language, Parser
import tree_sitter

LANG = Language('libtree-sitter-c.dylib', 'c')
parser = Parser()
parser.set_language(LANG)

text = '''#ifdef A
struct a{
};
'''

tree = parser.parse(bytes(text, "utf8"))

root_node = tree.root_node

query = LANG.query('";" @cap')

captures = query.captures(tree.root_node)

semicolon = captures[0][0]

semicolon.next_sibling.parent == semicolon

which evaluates to True.

@maxbrunsfeld
Copy link
Contributor

maxbrunsfeld commented Dec 14, 2022

I think this is a bug in the ts_node_parent method, that occurs when a node has zero extent (such as the missing #endif token in this tree). It's pretty bad; it'll require making ts_node_parent a bit more complex to fix.

For now, my suggested workaround is to not use the ts_node APIs that walk up the syntax tree (the sibling and parent APIs). They have never been very efficient anyway, and for code that needs to be performant, you should walk the tree using a tree cursor.

I've also considered simply removing the parent and sibling APIs, so that the tree cursor would be the only way to walk the tree in an upward fashion.

@casouri
Copy link
Author

casouri commented Dec 14, 2022

Thanks. Pardon my ignorance, but I wonder how could you traverse the parse tree backwards with a cursor? That would be useful for, eg, searching for a function_definition node before the cursor.

BTW, just to make sure I'm not missing anything, if I want to traverse the parse tree and find a function_definition node, I would write something like this, right? Ie, you need to create a node with ts_tree_cursor_current_node in every step.

TSNode node;
...traverse in a loop {
  node = ts_tree_cursor_current_node(cursor)
  (check for node's type)
}

@maxbrunsfeld
Copy link
Contributor

Yes, exactly. The tree cursor lets you retrieve nodes as you walk the tree.

@casouri
Copy link
Author

casouri commented Dec 14, 2022

Thanks, but how about traversing backwards in the tree with a cursor? That would be useful for, eg, searching for a function_definition node before the cursor.

@casouri
Copy link
Author

casouri commented Dec 14, 2022

Ok, I looked at the source of ts_node_prev_sibling, it seems the only way to traverse backwards is to traverse forwards and stop at the node just before the starting node. I don't have any questions now. Thanks :-)

@jdtsmith
Copy link

In at least one context — emacs with python-ts-mode, navigating from a node at point up to root — ts_node_parent is quite a bit faster and scales much better (logarithmic with line number) than ts_tree_cursor_goto_parent, the latter growing in time used by over 100x from beginning to end of a large (>8k line) file.

Test results here. Perhaps the current emacs 29 implementation of node-parent using cursors is sub-optimal in some way?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants