Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(language-support): add more tree-sitter grammars #1589

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

CrossR
Copy link
Member

@CrossR CrossR commented Apr 10, 2020

This brings in the in-progress tree-sitter updates, as well as some new parsers.

I'm still getting a build error for reason-tree-sitter on Linux (not Mac) and I'm not so sure why, since it compiles and runs fine on my Manjaro box. I should give my Ubuntu VM a try I guess...

Like my Fzy one, this is more a test of the pipeline than a complete PR right now. Once the upstream stuff is more concrete, I can start working on this more.

This does technically work right now, but the syntax highlights look a little bare in Python and C files (I should compare against Atom thinking about it...)

EDIT: Similar to the C bits, need to check/update licenses for the tree-sitter CSON files.

Comment on lines 36 to 47
let _hasTreeSitterScope = (configuration, scope: string) => {
let treeSitterEnabled =
Core.Configuration.getValue(c => c.experimentalTreeSitter, configuration);

if (!treeSitterEnabled) {
false;
} else {
switch (scope) {
| "source.json" => true
/* | "source.c" => true
| "source.cpp" => true */
| _ => false
};
};
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped since I don't think we need this and a getParser (or something similar), so we may as well just do the config look up inline. The config lookup in the current version is actually duplicated for some reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ya, I just saw this too (working on #1626). Seems reasonable to me, and should be easy to merge with #1626 👍

Comment on lines +36 to +48
let getParserFromScope = language =>
switch (language) {
| "source.json" => Some(Parser.json())
| "source.c" => Some(Parser.c())
| "source.cpp" => Some(Parser.cpp())
| "source.python" => Some(Parser.python())
| "source.js" => Some(Parser.javascript())
| "source.ts" => Some(Parser.typescript())
| "source.tsx" => Some(Parser.tsx())
| _ => None
};

let create = (~theme, ~scopeConverter, ~parser, lines: array(string)) => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inline right now more just for ease of debugging, since I can't link against reason-tree-sitter (it complains about inconsistent usage of editor-core-types).

This doesn't feel like the place for this to live...but also since the scopes aren't tree-sitter scopes (that I can see? They seem to come from the VScode extensions), I don't especially know a nice place for them to live.

It could go into reason-tree-sitter and be renamed to something more specific i.e. getParserFromXScope where X is whatever this scope is, unless someone has a better idea than me for this.

I originally was going to have it in reason-tree-sitter just so the updates for parser could be self-contained, and adding a parser was just a case of adding it to the lib and bumping the version in Oni2....that got ruined when I realised regardless you need to add the corresponding scope converter in this repo anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could go into reason-tree-sitter and be renamed to something more specific i.e. getParserFromXScope where X is whatever this scope is, unless someone has a better idea than me for this.

I agree - I think, in retrospect, since the only place we can link this is in esy-tree-sitter... it makes sense to have them in reason-tree-sitter. The nice thing is we could add tests for the particular syntax parsers and scopes.

that got ruined when I realised regardless you need to add the corresponding scope converter in this repo anyways.

Right! I was thinking... for now... we could also embed the scope converters as just JSON in reason-tree-sitter, like:

let jsonScopeConverter = {|
{ .. }
|};

And then your getParserFromXScope function could know about both the actual parser, and the scope converter. Plus we could add tests for individual languages right in reason-tree-sitter.

I also don't think this would preclude having dynamically-loaded parsers down the road... we could have a global method in reason-tree-sitter that could be like:

registerTreeSitter(~scopeName, ~scopeConverterJSON, ~parserDynamicLibraryPath)

and just add to our built-in set of tree-sitters...

@CrossR
Copy link
Member Author

CrossR commented Apr 10, 2020

Yeah, it looks like some part of the grammar or theme is missing (tested across gruvbox and the default theme):

image

The colour for actual data types in the C grammar seems to be wrong, if compared to the atom one. (I'm using code from one of the tree-sitter talks, and atom matches the talk to my knowledge).

I guess the thing to do here is go back to reason-tree-sitter and check the tests are complete enough to pick this sort of stuff up, to see if its grammar or theme issue.

The changes I've made to the C grammar are just updating to the latest tree-sitter (at the time, I'm probably behind a bit now), and pulling in the latest grammar at the time.

EDIT: Actually, I think the fact that the same bits are missing from all (i.e. every type), the parser must be working. I should check that the scope matching is happy (both that its working and what its linking from/to), and check the themes we have do have the thing its linking to.

@CrossR
Copy link
Member Author

CrossR commented Apr 11, 2020

When I look at the matching code (which we take the CSON from Atom and convert to JSON for here) I see the following in the C one:

    "type_identifier": "support.storage.type",

...which doesn't appear in any themes we use. storage.type does appear, but I can't see if/where the conversion between the two would occur.

@CrossR
Copy link
Member Author

CrossR commented Apr 15, 2020

Looking around, I can't tell why there are certain parts missing from some themes, especially when its something so basic as types. The textmate grammars must be giving more specific scopes?

@bryphe, do you have any idea what would be causing the issue I'm seeing (missing syntax hls for lots of groups, in the case of the C grammar function defs and types)? I can only assume that the scopes are different for the textmate bits, and/or we need to do something more specific than just convert the atom CSON files blindly.

Or perhaps they are fine, and we need to update the scope matching to be more forgiving? I'm a bit confused, so wasn't sure if someone who has poked around the themes more would have a better idea.

Barring that (and the linux build issue I need to debug), hooking up alternative tree-sitter parsers is pretty easy.

@bryphe
Copy link
Member

bryphe commented Apr 17, 2020

@bryphe, do you have any idea what would be causing the issue I'm seeing (missing syntax hls for lots of groups, in the case of the C grammar function defs and types)? I can only assume that the scopes are different for the textmate bits, and/or we need to do something more specific than just convert the atom CSON files blindly.

There might be some cases where we're not properly converting scopes correctly, ie, from the C file, I'm not sure we handle this right:

  '''
  call_expression > identifier,
  call_expression > field_expression > field_identifier,
  function_declarator > identifier
  ''': 'entity.name.function'

I think what might make this easier to troubleshoot is if we had a nice way of testing the flow of block of source code -> tree sitter parsing -> syntax highlights.

I think we could get there by:

  • Moving the tree-sitter scopes file https://github.com/onivim/oni2/blob/master/src/Syntax/TreeSitterScopes.re to reason-tree-sitter (this might mean reason-tree-sitter needs to be depend on reason-textmate, but that should be OK)
  • Move the scope mapping JSON files in reason-tree-sitter along with the parser
  • Add a test case that parses the source code into syntax tokens, and then maps into textmate, and then validate the textmate tokens.

Those sort of test cases might help us isolate and figure out exactly what rule is going wrong.

One other tool that would be helpful is a way to visualize the tokens at a cursor position...

@bryphe
Copy link
Member

bryphe commented Apr 17, 2020

Oh, another thing we could look at is the new syntax highlighting support directly in tree-sitter: https://tree-sitter.github.io/tree-sitter/syntax-highlighting

I haven't had a chance to dive into that much, though, but looks promising

@CrossR
Copy link
Member Author

CrossR commented Apr 19, 2020

I've started to move over the bits you've said over now in the matching PR over in reason-tree-sitter: onivim/reason-tree-sitter#16

I'm planning on adding some basic tests, one to target the code snippet in my screenshots/a subset of it, and then some to target the weirder logic we may not be parsing correctly.

@glennsl glennsl changed the title [WIP] Tree-sitter Updates [WIP] feat(language-support): add more tree-sitter grammars May 5, 2020
@glennsl
Copy link
Member

glennsl commented May 5, 2020

I've updated the title to be more descriptive and end-user friendly, to prepare it for being included in the change log. Not entirely sure how accurate it is though.

@glennsl glennsl added the WIP label May 13, 2020
@glennsl glennsl changed the title [WIP] feat(language-support): add more tree-sitter grammars feat(language-support): add more tree-sitter grammars May 13, 2020
bryphe added a commit that referenced this pull request Aug 17, 2020
This moves in `reason-tree-sitter` to this Onivim 'monorepo' - the goal being to bring in `EditorCoreTypes` so that we can better represent byte/character indices in the type system.

This will also let us test tree-sitter changes more easily against Onivim - sorry this collides with #1589 @CrossR - hopefully it won't be too bad to merge (and make it easier for us to iterate / test tree-sitter changes in Onivim)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants