** behaves like * inside or expression #104

Open
conartist6 opened this issue Mar 10, 2022 · 3 comments

@conartist6

Expected:
pm('(**|x)')('a') is true
pm('(**|x)')('a/b') is true

Actual:
pm('(**|x)')('a') is true
pm('(**|x)')('a/b') is false

Most of the discussion of the technical aspects of this issue is in the now-closed #88.

@conartist6
Author

In my particular case I was merging multiple glob expressions with `(${globs.join('|')})`. I was able to work around the issue by simply passing the array of patterns to picomatch, but ultimately I think this is still a bug that needs to be fixed. I may eventually try to fix it by rewriting the picomatch parser, which seems to have a variety of internal inconsistencies at present.
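The difference between the two approaches can be sketched without picomatch itself. In the snippet below, `matchAny` and `toyCompile` are hypothetical names, and `toyCompile` is a deliberately tiny stand-in for a glob compiler (it only understands `*` and `**`), not picomatch's parser. The point is only that each pattern is compiled on its own, so `**` never ends up inside an alternation group.

```javascript
// Hypothetical stand-in for a glob compiler: handles only `*` and `**`.
function toyCompile(glob) {
  const source = glob
    // Escape regex metacharacters other than `*`.
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    // `**` may cross path separators, `*` may not.
    .replace(/\*\*|\*/g, (m) => (m === '**' ? '.*' : '[^/]*'));
  const re = new RegExp(`^${source}$`);
  return (input) => re.test(input);
}

// Compile each glob independently and accept the input if any matcher accepts it.
function matchAny(globs, compile) {
  const matchers = globs.map((glob) => compile(glob));
  return (input) => matchers.some((isMatch) => isMatch(input));
}

const isMatch = matchAny(['**', 'x'], toyCompile);
console.log(isMatch('a'), isMatch('a/b'));
```

With picomatch, the equivalent is the workaround described above: pass the array of globs directly rather than a joined string.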

@jonschlinkert
Member

> I may eventually try to fix it by rewriting the picomatch parser, which seems to have a variety of internal inconsistencies at present.

PR would be welcome, as long as it's passing all unit tests. I think there are several thousand from bash, minimatch, etc.

> but ultimately I think this is still a bug

There are opportunities to improve some of the matching with extglobs, since negative and positive lookbehinds were not available when I wrote this parser. My recommendation is that you take the patterns from your examples and show what they should look like if they were pure regular expressions. There are many limitations in ES regular expressions: we can't do atomic groups, we can't do proper conditionals, etc. That makes it more challenging. Also, the longer and more complicated the regex, the more branches we have and the more susceptible it is to catastrophic backtracking.
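As a starting point for that exercise, here is a hand-written sketch of what `(**|x)` might look like as a pure regular expression, next to a regex that reproduces the reported behavior. Both are simplified assumptions for illustration: they ignore dotfile handling and every other picomatch rule, and neither is picomatch's actual generated output.

```javascript
// Assumed "expected" translation: `**` may match across `/`.
const globstarCrossesSlash = /^(?:.*|x)$/;

// Assumed translation matching the reported behavior: `**` acts like `*`
// and stops at `/`.
const globstarAsStar = /^(?:[^/]*|x)$/;

for (const input of ['a', 'a/b']) {
  console.log(
    input,
    globstarCrossesSlash.test(input),
    globstarAsStar.test(input)
  );
}
```

The two regexes agree on `a` and differ only on `a/b`, which is the discrepancy reported at the top of this issue.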

@conartist6
Author

conartist6 commented Mar 10, 2022

Yep, it's great that there are so many tests, it will really make my life easier if I get into making the changes.

As for the regex itself, I don't really see how ** is much different from * from that perspective. I mean, I do: it can potentially consume a lot more input before having to backtrack, but I think that's still more or less on the writer of the pattern.

Now I don't know if it is of any interest to you, but I actually wrote the only non-native (i.e. scripted) non-backtracking regex engine currently in the ecosystem: @iter-tools/regex. It's a bit sluggish though as it is scripted and doesn't implement the DFA optimization, which is to say that it may be in more than one state at a time.
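To illustrate what "in more than one state at a time" means, here is a minimal sketch of a non-backtracking matcher for a tiny glob dialect (literal characters, `*`, and `**` only). It is not how @iter-tools/regex or picomatch is implemented; the names `tokenize`, `addState`, and `matchGlob` are hypothetical. The matcher tracks a set of positions in the pattern and advances all of them for each input character, so the work is bounded by input length times pattern length.

```javascript
// Tokenize a tiny glob dialect: literal characters, `*`, and `**`.
function tokenize(glob) {
  const tokens = [];
  for (let i = 0; i < glob.length; i++) {
    if (glob[i] === '*') {
      if (glob[i + 1] === '*') {
        tokens.push({ type: 'globstar' });
        i++; // consume the second `*`
      } else {
        tokens.push({ type: 'star' });
      }
    } else {
      tokens.push({ type: 'char', value: glob[i] });
    }
  }
  return tokens;
}

// Add state `i` plus every state reachable without consuming input.
// A wildcard may match the empty string, so it can be skipped.
function addState(tokens, states, i) {
  while (!states.has(i)) {
    states.add(i);
    if (i < tokens.length && tokens[i].type !== 'char') i++;
    else break;
  }
}

function matchGlob(glob, input) {
  const tokens = tokenize(glob);
  let states = new Set();
  addState(tokens, states, 0);
  for (const ch of input) {
    const next = new Set();
    for (const i of states) {
      if (i === tokens.length) continue; // accepting state has no transitions
      const token = tokens[i];
      if (token.type === 'char') {
        if (token.value === ch) addState(tokens, next, i + 1);
      } else if (token.type === 'globstar' || ch !== '/') {
        addState(tokens, next, i); // wildcard consumes `ch` and stays active
      }
    }
    states = next;
  }
  return states.has(tokens.length);
}

console.log(matchGlob('**', 'a/b'), matchGlob('*', 'a/b'));
```

Because every active state is advanced together, there is nothing to backtrack into; the cost is the bookkeeping for the state set, which is the overhead a DFA construction would remove.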
