
Issue with handling preprocessor attributes #13

Open
genotrance opened this issue Nov 20, 2018 · 9 comments

Comments

@genotrance

For example, from libclipboard.h:

LCB_API clipboard_c *LCB_CC clipboard_new(clipboard_opts *cb_opts);

Here, LCB_API is a preprocessor attribute:

#ifdef LIBCLIPBOARD_BUILD_SHARED
#  ifdef clipboard_EXPORTS /* Defined by CMake when we're compiling the shared library */
#    define LCB_API LIBCLIPBOARD_DLL_EXPORT
#  else
#    define LCB_API LIBCLIPBOARD_DLL_IMPORT
#  endif
#  define LCB_LOCAL LIBCLIPBOARD_DLL_LOCAL
#else
#  define LCB_API
#  define LCB_LOCAL
#endif

This results in an ERROR:

(translation_unit 0 71
 (declaration 0 19
  (type_identifier 0 7)
  (identifier 8 19)
 )
 (expression_statement 20 67
  (pointer_expression 20 66
   (ERROR 21 27                             <=====
    (identifier 21 27)
   )
   (call_expression 28 66
    (identifier 28 41)
    (argument_list 41 66
     (math_expression 42 65
      (identifier 42 56)
      (identifier 58 65)
     )
    )
   )
  )
 )
)

LCB_API gets treated as a type rather than a marker.
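For context, here is a sketch of what these macros typically stand for; the definitions below are illustrative guesses, not copied from libclipboard (LCB_CC is presumably a calling-convention macro that is empty on most platforms). In a static, non-shared build both LCB_API and LCB_CC expand to nothing, so the preprocessed declaration is a plain prototype - but tree-sitter never runs the preprocessor, so it sees LCB_API as an ordinary identifier:

/* Illustrative only - the real definitions live in libclipboard's other headers. */
#if defined(_WIN32)
#  define LIBCLIPBOARD_DLL_EXPORT __declspec(dllexport)
#  define LIBCLIPBOARD_DLL_IMPORT __declspec(dllimport)
#else
#  define LIBCLIPBOARD_DLL_EXPORT __attribute__((visibility("default")))
#  define LIBCLIPBOARD_DLL_IMPORT
#endif

/* In a static build, LCB_API and LCB_CC expand to nothing, and the
 * preprocessed declaration is simply: */
clipboard_c *clipboard_new(clipboard_opts *cb_opts);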

@maxbrunsfeld
Contributor

maxbrunsfeld commented Mar 5, 2019

Yeah, this is a tough one. We don't have a very good way to handle this kind of preprocessor usage without both

  1. maintaining a symbol table, and
  2. evaluating #include directives and doing file I/O during parsing

I don't think either of these is compatible with incremental parsing. Maybe we could add some heuristic, where all-caps identifiers are treated specially in cases like these?
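For illustration, here is the kind of code such an all-caps heuristic would have to deal with (the last two declarations are made up):

/* Helped by the heuristic: the all-caps identifiers are macros that expand
 * to attributes or to nothing, so skipping them lets the declaration parse. */
LCB_API clipboard_c *LCB_CC clipboard_new(clipboard_opts *cb_opts);

/* Hurt by the heuristic: FILE is all-caps but is a real type (a typedef
 * from <stdio.h>), as is the common Windows typedef BOOL. */
FILE *open_log(const char *path);
BOOL  is_ready(void);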

@maxbrunsfeld
Contributor

Oops, I closed the wrong issue with that commit.

@calixteman
Collaborator

Macros are really a big, big pain... and can give some very messy ASTs.
I was wondering if it would be possible to combine extras and an external scanner:

  • a macro can appear almost anywhere (so extras is a good place for it)
  • each time a candidate macro name is encountered, query a registered callback to find out whether it is actually a macro (a rough sketch of this is at the end of this comment)
  • if it is a macro and is followed by a pair of parentheses, try to parse the contents; if that fails, just emit a macro_content node. If there are no parentheses, just emit a token of type macro.
  • if it is not a macro, just treat it as a "normal" token

And as you proposed, it could be interesting to at least have the possibility to set callbacks on #include: parse the included file with another parser, collect all the #defines, etc., and then feed a data structure which could be used by the macro callbacks.
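A rough sketch of what the callback idea above could look like, written against tree-sitter's external-scanner interface. The MACRO token, the is_known_macro callback, and the ability to register such a callback at all are hypothetical - tree-sitter has no such hook today - so this only shows the shape of the approach (the create/destroy/serialize functions an external scanner also needs are omitted):

#include <tree_sitter/parser.h>
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical: supplied by the embedding application, e.g. populated from
 * the #defines collected out of the included headers. */
extern bool is_known_macro(const char *name, size_t len);

enum TokenType { MACRO };  /* would be listed under externals (and extras) in grammar.js */

bool tree_sitter_c_external_scanner_scan(void *payload, TSLexer *lexer,
                                         const bool *valid_symbols) {
  (void)payload;
  if (!valid_symbols[MACRO]) return false;

  /* Read one identifier-shaped token. */
  if (!iswalpha((wint_t)lexer->lookahead) && lexer->lookahead != '_') return false;
  char name[256];
  size_t len = 0;
  while ((iswalnum((wint_t)lexer->lookahead) || lexer->lookahead == '_') &&
         len < sizeof(name) - 1) {
    name[len++] = (char)lexer->lookahead;
    lexer->advance(lexer, false);
  }
  name[len] = '\0';

  /* Ask the registered callback whether this identifier is a macro; if so,
   * emit a MACRO token that the grammar treats as an ignorable extra. */
  if (is_known_macro(name, len)) {
    lexer->result_symbol = MACRO;
    return true;
  }
  return false;  /* not a macro: let the normal lexer handle the identifier */
}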

@calixteman
Collaborator

calixteman commented Jun 7, 2019

Another idea I had:

  • preprocess the file to replace each macro name with "$" * len(name) (see the sketch after this list):
    • the preprocessing could be done with a parser that gets valid identifiers from the code (outside comments & strings), gets the AST, decides whether each identifier is a macro, and if so replaces it with the correct number of "$" characters (while tracking the spans of such identifiers)
  • run the tree-sitter-c/cpp parser with /\$+/ added to extras (it could be a "dollars" token)
  • when the AST is post-processed, the dollars tokens can be ignored or, if the spans have been tracked, mapped back to the correct macro identifiers.
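A minimal sketch of the replacement step (names are made up). Using the same number of "$" characters as the original identifier is what keeps every byte offset intact, so the spans in the resulting parse tree still line up with the original source:

#include <stddef.h>
#include <string.h>

/* Overwrite the identifier at [start, start + len) with '$' characters, in
 * place. Because the length is unchanged, all other tokens keep their spans,
 * and a dollars token can later be mapped back to the macro name it hides. */
static void blank_out_macro(char *source, size_t source_len,
                            size_t start, size_t len) {
  if (start > source_len || len > source_len - start) return;  /* bad span */
  memset(source + start, '$', len);
}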

Anyway, it could be interesting in general (not necessarily limited to C/C++) to have this kind of magic token.
@maxbrunsfeld wdyt?

@genotrance
Author

I do something similar in nimterop - I run the file through gcc -E and get the defines using -dD.

https://github.com/nimterop/nimterop/blob/9387de8638d7805130c2c9dc53e83541eb399f6b/nimterop/getters.nim#L201

@vdcow

vdcow commented Nov 3, 2020

Is it possible to change the current implementation to at least cover cases like this

PROJECT_API some_type_t functionname(int param);

assuming that the token closest to the function name is the type? That's how Emacs, for example, deals with this situation. While it wouldn't fix the general problem from the description, it would at least cover the most common use case.
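For illustration (made-up declarations), treating only the identifier closest to the function name as the type would make patterns like these parse, with any earlier identifiers assumed to be macros; it would not, on its own, cover a macro sitting between the * and the function name, as LCB_CC does in the original report:

/* Only some_type_t, the identifier nearest the declarator, is the type;
 * the leading all-caps identifiers are assumed to be marker macros. */
PROJECT_API some_type_t functionname(int param);
PROJECT_API PROJECT_DEPRECATED some_type_t other_function(void);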

@maxbrunsfeld
Contributor

Yeah, good point @vladimir-didenko; I agree that this is the most common case where the current grammar breaks. If I get a chance, I may try to figure out a way to handle this specific case better.

@ofrank123

Hi, has any progress been made on this special case? If not, I'd be happy to look into it.

@apodolsk

As an Emacs user trying out tree-sitter support:

Unfortunately, I think that the crufty codebases that would most benefit from tree-sitter-derived performance improvements are also those that are most likely to have grown some non-ident-compatible macros. One common use case in our codebase is a macro for __attribute__((__warn_unused__)) on functions that return an error code.
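For concreteness, such a macro usually looks something like this (the macro name is hypothetical; warn_unused_result is the GCC/Clang function attribute typically used for this purpose):

/* A project-local marker that expands to a function attribute. */
#define MUST_CHECK __attribute__((__warn_unused_result__))

/* To a preprocessor-unaware parser, MUST_CHECK is just an unknown
 * identifier sitting in front of the return type. */
MUST_CHECK int read_config(const char *path);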

A heuristic assumption that upper-case names are macros would work well for this codebase, as bad as it feels. Thinking about alternatives: I would probably be happier teaching my editor about problem macro names as I encounter them than I would be teaching it how to correctly preprocess sources and handle #includes for the usual non-heuristic solution (and then thinking about the ccache-esque caching system that it'd probably need to have to avoid tanking edit perf).

For example, to get rtags set up, we maintain rtags binaries and .el files capable of running in our intentionally-ancient docker build env, postprocess our compile_commands.json to avoid things that its command parsing doesn't understand, and write glue code to manage actually running and talking to the rtags server from editors running outside our build container. Only a subset of people use rtags, so small problems with the setup wind up going unfixed for a long time, encouraging even fewer people to use it.
