Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certain globs ending with "non-word" characters fail to match #18

Open
pe8ter opened this issue Nov 18, 2018 · 7 comments
Open

Certain globs ending with "non-word" characters fail to match #18

pe8ter opened this issue Nov 18, 2018 · 7 comments
Labels

Comments

@pe8ter
Copy link

pe8ter commented Nov 18, 2018

Please describe the minimum necessary steps to reproduce this issue:

Run this Node.js script:

const nanomatch = require('nanomatch');
const reg = nanomatch.makeRe('é/**/*');
console.log(reg.test('é/foo.txt'));

What is happening (but shouldn't):

Output is false because the RegExp test fails.

What should be happening instead?

Output is true because the RegExp test succeeds.

What's happening

Here is the RegExp produced by nanomatch:

/^(?:(?:\.[\\\/](?=.))?é[\\\/]?\b(?!\.)(?:(?!(?:[\\\/]|^)\.).)*?[\\\/](?!\.)(?=.)[^\\\/]*?(?:[\\\/]|$))$/
                               **

The word boundary matcher (starred) is the culprit. This matcher requires that the end of the first part of the glob é is a word boundary. There are two problems with the matcher:

  1. According to ECMA-262, the set of characters that constitutes a word boundary is quite small, which is why é gets rejected as a word boundary. One solution is to add the Unicode flag u to the end of the RegExp. This is only a partial solution because...
  2. Directory names can end in odd characters like # for example. If you replace the é in this example with #, the test fails even with the Unicode flag.

Another odd behavior with this RegExp is that the first test here fails but the second test passes:

reg.test('é/foo.txt'); // false
reg.test('é/a/foo.txt'); // true

The Unicode flag would be a good addition to un-break certain consumers of this library (see gulpjs/gulp#2153), but given the above odd behavior and above problem (2), it seems there might be some other consideration necessary.

@jonschlinkert
Copy link
Member

thank you for looking into this and figuring out the issue! I've been working on getting these matching libs updated for the past few days, I'll get this fixed.

thanks!

@pe8ter
Copy link
Author

pe8ter commented Nov 19, 2018

Are you thinking about adding a "Unicode" option to makeRe, or will all of the generated RegExps have the Unicode u flag by default? Or are you doing something entirely different?

I wanted to get a sense of your thoughts because, depending on the solution, all the packages in the dependency chain from Gulp down to nanomatch may need to be updated.

@jonschlinkert
Copy link
Member

jonschlinkert commented Nov 19, 2018 via email

@pe8ter
Copy link
Author

pe8ter commented Nov 20, 2018

Could you give me high-level explanation about why there are so many globbing packages: anymatch, micromatch, nanomatch, picomatch, etc? What's different about them?

@jonschlinkert
Copy link
Member

No, I dont have time to do that. But the projects each have descriptions that answer your question, and there are readme documents that took a long time to write and were created for that purpose.

@pe8ter
Copy link
Author

pe8ter commented Nov 20, 2018

Fair enough.

@Silic0nS0ldier
Copy link

Using a unicode aware regex polyfill modified to include the exceptions noted seems like a feasible solution for this.

Given that the fix will greatly expand the allowed characters, locking this behind a configuration switch or semver major release would be a good idea (particularly given the widespread usage of nanomatch). Semver major would prevent accumulation of technical debt, however dependent libraries might be better off with the switch option (lots of semver major version bumps otherwise).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants