Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

import hook incompatible with tsx #12011

Open
3 tasks done
nwalters512 opened this issue May 13, 2024 · 24 comments
Open
3 tasks done

import hook incompatible with tsx #12011

nwalters512 opened this issue May 13, 2024 · 24 comments

Comments

@nwalters512
Copy link

Is there an existing issue for this?

How do you use Sentry?

Sentry Saas (sentry.io)

Which SDK are you using?

@sentry/node

SDK Version

8.0.0

Framework Version

No response

Link to Sentry event

No response

SDK Setup

Sentry.init({
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
});

Steps to Reproduce

I've created a minimal reproduction here: https://github.com/nwalters512/sentry-v8-tsx-error-repro

  1. Clone the repository
  2. Install dependencies with yarn
  3. Run yarn tsx src/index.ts
  4. Observe that the process fails with the error TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file

Several observations that are hopefully helpful to y'all:

  • This only occurs when "allowJs": true is set in tsconfig.json. As Sentry isn't concerned with this, that made me think this problem might actually be in tsx. However...

  • When adding instrumentation directly with the @opentelemetry/* packages, everything works fine. This is reproducible by making the following change to src/index.ts:

    -import './instrument-sentry.js';
    +import './instrument-opentelemetry.js';

    Given this, I'm strongly inclined to believe that this is an issue with the way in which Sentry is using OpenTelemetry.

  • This only breaks for core Node modules like util, fs, etc. Importing other modules works fine. For instance, the following change to src/index.ts makes it work without erroring:

    -await import('util');
    +await import('zod');
  • This only breaks for dynamic imports. For instance, the following change to src/index.ts makes it work without erroring:

    -await import('util');
    +import 'util';

Expected Result

I would expect the process to complete successfully and log util imported!.

Actual Result

The process errors out while importing util and does not print util imported!.

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

The full stack trace is:

TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at new NodeError (node:internal/errors:406:5)
    at fileURLToPath (node:internal/url:1393:11)
    at finalizeResolution (node:internal/modules/esm/resolve:234:42)
    at moduleResolve (node:internal/modules/esm/resolve:845:10)
    at defaultResolve (node:internal/modules/esm/resolve:1043:11)
    at nextResolve (node:internal/modules/esm/hooks:833:28)
    at y (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:2079)
    at j (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:3198)
    at nextResolve (node:internal/modules/esm/hooks:833:28)
    at resolve (/Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/@opentelemetry/instrumentation/node_modules/import-in-the-middle/hook.js:238:23) {
  code: 'ERR_INVALID_URL_SCHEME'
}

And here is the bottom of the call stack.

Maybe the loader hook added by import-in-the-middle is somehow clashing with the loader hooks added by tsx?

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Looking at the stack trace, although import-in-the-middle is at the bottom, further up we see frames from tsx:

    at y (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:2079)
    at j (file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/tsx/dist/esm/index.mjs?1715684968701:2:3198)

I guess it's not handling the params the iitm adds to the url?

So I've opened an issue there:
privatenumber/tsx#554

@timfish timfish changed the title Sentry breaks dynamic import of core Node modules when running in tsx import hook incompatible with tsx May 14, 2024
@nwalters512
Copy link
Author

@timfish thanks for investigating this; your small reproduction that used just import-in-the-middle was very helpful. I'm relatively confident that the issue here is in fact with import-in-the-middle and the fact that it appends query parameters to the URL it returns from its resolvers. That is, consider this line:

https://github.com/DataDog/import-in-the-middle/blob/00b01fff1f5b69dd25e307593ec54d1d8abb4844/hook.js#L259

Changing that to just url: url.url makes your reproduction complete without error (though I have no idea what effect that has on import-in-the-middle actually working).

Specifically, it looks like the issue is that this URL with ?iitm=true ends up as the parentURL in a subsequent resolution, which you can see by adding console.log(specifier, context) immediately after https://github.com/privatenumber/tsx/blob/e2afc60bbcc299e09a5bf0e0c8909b6766b633a2/src/esm/hook/resolve.ts#L39:

file:///Users/nathan/git/tsx/node_modules/.pnpm/import-in-the-middle@1.7.4/node_modules/import-in-the-middle/lib/register.ts {
  conditions: [ 'node', 'import', 'node-addons' ],
  importAttributes: {},
  parentURL: 'node:util?iitm=true'
}

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Tracing the resolving of the otel hooking, it looks like ?iitm=true is used so that iitm knows that the resolving is actually coming from its hook rather than the original location.

Adding the query to the parent URL should be fine as this is still a valid URL.

For example, the following code runs fine under both node and tsx:

import { register } from "node:module";

register(
  new URL(
    `data:application/javascript;base64,${Buffer.from(
      `
    export async function resolve(specifier, context, nextResolve) {
      if (context.parentURL) {
        context.parentURL += "?some=query";
      }
      console.log("resolve", specifier, context);
      return nextResolve(specifier, context);
    }
`
    ).toString("base64")}`
  ),
  import.meta.url
);

await import("node:util");

and outputs:

resolve node:util {
  conditions: [ 'node', 'import', 'node-addons' ],
  importAssertions: {},
  parentURL: 'file:///Users/tim/Documents/repro/test.ts?some=query'
}

node:util?iitm=true is also a valid URL so I'm not fully convinced that's causing the issue yet!

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Changing that to just url: url.url makes your reproduction complete without error (though I have no idea what effect that has on import-in-the-middle actually working).

Ah, I just re-read this and it makes sense that this is the cause.

@nwalters512
Copy link
Author

I think you make have just realized this, but I'll finish this comment just so we're both on the same page. Your example is subtly different from what's happening in the failure case. In your example, parentURL is indeed still a valid file:// URL. However, in the failure case, it is not. This was easy to prove to myself:

const { fileURLToPath } = require('url');
fileURLToPath('node:util?foo=bar');

The above errors out exactly the same as the failure case with import-in-the-middle:

Uncaught TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at fileURLToPath (node:internal/url:1463:11) {
  code: 'ERR_INVALID_URL_SCHEME'
}

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

You're right, but fileURLToPath('node:util') also throws with the same error!

I think the issue isn't the query string, it's more that import-in-the-middle results in the parentURL being the node built-in which I guess this should never normally be possible because it's not a valid URL.

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

For example, when the minimal reproduction is run through plain old Node:

import { register } from "node:module";

register("import-in-the-middle/hook.mjs", import.meta.url);
await import("node:util");

The iitm resolve function with this added:

    console.log('resolve', [specifier, context.parentURL])

Results in:

resolve [
  'node:util',
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/test.js'
]
resolve [
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  'node:util?iitm=true'
]
resolve [ 
  'node:util?iitm=true', 
  'node:util?iitm=true' 
]

@nwalters512
Copy link
Author

All great observations! I managed to make a very isolated reproduction: https://github.com/nwalters512/register-hook-playground

It uses neither tsx nor import-in-the-middle; it copies the bare minimum amount of behavior from import-in-the-middle to be able to reproduce:

node:internal/modules/run_main:129
    triggerUncaughtException(
    ^
TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at fileURLToPath (node:internal/url:1463:11)
    at finalizeResolution (node:internal/modules/esm/resolve:266:42)
    at moduleResolve (node:internal/modules/esm/resolve:933:10)
    at defaultResolve (node:internal/modules/esm/resolve:1157:11)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at resolve (file:///Users/nathan/git/register-hook-playground/hook2.mjs:4:21)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:304:30)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:196:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:825:20) {
  code: 'ERR_INVALID_URL_SCHEME'
}

It turns on that the query params were a red herring of sorts. The presence of it actually triggers another behavior in import-in-the-middle that produces synthetic source code for the module being loaded (in this case, node:util?iitm=true):

https://github.com/DataDog/import-in-the-middle/blob/00b01fff1f5b69dd25e307593ec54d1d8abb4844/hook.js#L266

This puts us in a situation that Node would normally never expect to encounter: parentURL refers to a builtin (node:import). You can reproduce this even more simply with the following hook code (in my repro as hook-simpler.mjs):

export async function resolve(specifier, context, parentResolve) {
  specifier = 'file:///dost-not-exist.mjs';
  context.parentURL = 'node:util';
  const url = await parentResolve(specifier, context, parentResolve);
}

This is obviously very contrived, but it reproduces the problem very simply: Node's parentResolve chokes when specifier is a a file:// URL and context.parentURL is not a file:// URL. This wasn't reproducible in your above example because Node seems to short-circuit if specifier is a core module, it never even tries to use context.parentURL in that case.

I'm feeling pretty confident that this is a bug in import-in-the-middle, as that's the one that ends up generating synthetic code for node:util that ends up trying to import some file:// thing. Do you want me to move this issue there?

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

What I don't yet fully understand is why iitm works with Node as it's using the same parentURLs.

For example these are the logs from resolve in import-in-the-middle calling down to the default resolver:

defaultResolve [
  'node:util',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/test.js'
  }
]
result [Object: null prototype] { url: 'node:util' }
defaultResolve [
  'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'node:util?iitm=true'
  }
]
result [Object: null prototype] {
  url: 'file:///Users/tim/Documents/Repositories/sentry-v8-tsx-error-repro/node_modules/import-in-the-middle/lib/register.js',
  format: 'commonjs'
}
defaultResolve [
  'node:util?iitm=true',
  {
    conditions: [ 'node', 'import', 'node-addons' ],
    importAttributes: {},
    parentURL: 'node:util?iitm=true'
  }
]
result [Object: null prototype] { url: 'node:util' }

It appears to handle the node: URLs without error!

From the stack trace, we can see that when tsx is involved, it's called after iitm, ie. iitm > tsx > default.

However, my console logs from tsx resolve aren't being outputted before the exception so it's really hard to see if/how the parameters are being modified before they hit the default resolver.

My best guess is that they're not getting flushed from the loader hooks thread and there's no obvious way to debug it otherwise.

@nwalters512
Copy link
Author

nwalters512 commented May 14, 2024

I think I have an answer: this only happens when the file that we're trying to resolve with node:util as its parent does not exist! You can make the following change to hook.mjs in https://github.com/nwalters512/register-hook-playground to demonstrate this:

-  const iitmURL = new URL('lib/register.mjs', import.meta.url).toString();
+  const iitmURL = new URL('register.mjs', import.meta.url).toString();

Note that the updated version points to a file that actually exists; when that happens, there's no error!

So... maybe this is a bug in Node? If nothing else, it's unexpected behavior. I think he expected behavior in the does-not-exist case is that we'd get a ERR_MODULE_NOT_FOUND, not ERR_INVALID_URL_SCHEME. This can be demonstrated with the following hook:

export async function resolve(specifier, context, parentResolve) {
  await parentResolve('file:///does-not-exist.mjs', context, parentResolve);
}

Which errors with the following:

Error [ERR_MODULE_NOT_FOUND]: Cannot find module '/does-not-exist.mjs' imported from /Users/nathan/git/register-hook-playground/index.mjs
    at finalizeResolution (node:internal/modules/esm/resolve:265:11)
    at moduleResolve (node:internal/modules/esm/resolve:933:10)
    at defaultResolve (node:internal/modules/esm/resolve:1157:11)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at resolve (file:///Users/nathan/git/register-hook-playground/hook.mjs:2:9)
    at nextResolve (node:internal/modules/esm/hooks:866:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:304:30)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:196:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:825:20)
    at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28) {
  code: 'ERR_MODULE_NOT_FOUND',
  url: 'file:///does-not-exist.mjs'
}

Aside: tsx actually relies on ERR_MODULE_NOT_FOUND being thrown so that it can tell when files don't exist.

@nwalters512
Copy link
Author

nwalters512 commented May 14, 2024

So to summarize, the combination of tsx and import-in-the-middle means that we end in a situation with the following properties:

  • We're trying to resolve a file:///... URL
  • The file referenced by the file:/// URL does not exist
  • The context.parentURL is a node: (core) module

This is a perfect storm that produces unexpected behavior in Node's own resolution process.

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Ah got it, the error is being thrown from here. It's throwing while it's trying to create the error message:
https://github.com/nodejs/node/blob/87b87a8f6017545e2c9d6048d7b081f73c8a1072/lib/internal/modules/esm/resolve.js#L265

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Node tries to check if the parentURL is valid here, but it doesn't guard against node:* URLs and later just throws it into fileURLToPath to create the error message.

If we report this to Node they might want to throw on all node: parentURLs which would break import-in-the-middle 🤔

I'll open an issue there first and explain the detail...

@nwalters512
Copy link
Author

Excellent find! Sounds good; I'll check out the issue once you make it.

@timfish
Copy link
Collaborator

timfish commented May 14, 2024

Link to the node issues:
nodejs/node#52987

@nwalters512
Copy link
Author

I opened a PR to fix this in import-in-the-middle: DataDog/import-in-the-middle#76

@lilouartz
Copy link

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package. DDs shipping cadedance has not been particularly frequent, and this is going to be a major bottleneck for ESM adoption.

@timfish
Copy link
Collaborator

timfish commented May 21, 2024

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org:
nodejs/admin#858

@lilouartz
Copy link

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org: nodejs/admin#858

Node org is equally slow to make releases.

If this is going to be a bottleneck to Sentry v8 adoption, you want to have more control over it.

It is small enough package to make little difference even if there are multiple versions of it maintained, i.e. a copy can be incubated under Sentry and if Node.js wants to merge upstream, let them do it.

@mydea
Copy link
Member

mydea commented May 21, 2024

Maybe it would be good idea for Sentry to fork iitm and launch it as their own package

The plan is for it to move to the Node org: nodejs/admin#858

Node org is equally slow to make releases.

If this is going to be a bottleneck to Sentry v8 adoption, you want to have more control over it.

It is small enough package to make little difference even if there are multiple versions of it maintained, i.e. a copy can be incubated under Sentry and if Node.js wants to merge upstream, let them do it.

Normally, I'd agree, it would be nice if we could fork it. But sadly, in this case we can't do this because import-in-the-middle is a dependency of all the other opentelemetry instrumentation, which would not use our fork :( So we need to make fixes upstream there and live with the slower cadence 😬

@timfish

This comment was marked as outdated.

@nwalters512
Copy link
Author

@timfish what ultimately fixed things for me was actually #12043; we don't use performance instrumentation, so the error went away when Sentry stopped trying to unnecessarily instrument modules. This is of course not a general solution, and yours might very well be a reasonable bandaid for anyone who is using both tsx and Sentry tracing while we wait for import-in-the-middle maintainers to do their thing!

@timfish
Copy link
Collaborator

timfish commented May 24, 2024

There are multiple outstanding PRs for import-in-the-middle (including the one from @nwalters512 🙏) that combined hopefully fix a wide range of issues.

Until they make it to a release, I've combined these into a patch that can be used with patch-package:
#12242 (comment)

If anyone can confirm that this fixes their issues that would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

6 participants