Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

global module code that is required is executed twice #430

Open
luxzeitlos opened this issue Feb 18, 2018 · 4 comments
Open

global module code that is required is executed twice #430

luxzeitlos opened this issue Feb 18, 2018 · 4 comments

Comments

@luxzeitlos
Copy link
Contributor

This is a rather complex problem, and I've spend hours debugging it. But basically the problem is, that if a dependency is requireing a file in the app folder, this file will get executed twice, with a different context, and so this will also break equality.

I've encountered the problem while trying to integrate TypeORM. Basically you specify a glob pattern for TypeORM to find the entities, and then it will call require on them. This will now execute a decorator on each entity (because the decorator is executed when the class is declared) which writes metadata to TypeORM and saves it on the global object.
Now when importing (not requireing) this entity from within a denali action, this will execute the same file again. This means that a === check done by TypeORM on the class saved to the global object and the class now avaliable in my denali action will fail. This is actually logically because we now have two identical classes, but they are not the same!

I'm also not sure if one of the classes is from the splash-server.bundle.js file and one not.

I'm also not 100% sure if this can be fixed by denali. It would mean that mean that the loader may never call the native require, but instead shim it over all dependencies, and if it ever encounters a path inside the app module inject its non-standard behaviour. The only alternative I see is not to use a custom loader at all.

I think for my TypeORM integration I can tweak around this, but if this behaviour remains, it should be a big red warning somewhere in the docs. Also it could be possible, for debug purpose only, to inject some kind of check into every file. Something like this allows to detect a double-load:

{
  const fileSymbol = Symbol.for('/app/models/bar');
  if(global[fileSymbol]) {
    throw 'double loading file';
  }
  global[fileSymbol] = true;
}
@davewasmer
Copy link
Collaborator

Interesting. Yeah, I hadn't anticipated the use case of have a third party dependency try to directly require stuff from the app. The issue you're hitting, as it seems you've realized, is that when you execute your app code, you are actually executing a compiled bundle file, not the individual files in your dist folder. So it looks like the same file is executing twice, but actually, one is the real, on-disk file, and the other is the copy that has been inlined into your bundle.

The main reason for the bundle is to allow for parallelized tests. Parallelizing tests is notoriously difficult because any shared state leads to confusing errors as one test ends up asserting against another test's state.

The bundle lets us skirt the problem because it turns your entire app source code into a factory function. Each time we spin up a new test, we spin up a new copy of your source code by executing the factory function again. That ensures that it's borderline impossible to share state, making parallel tests automatic and out-of-the-box.

But it also means that you end up with these kinds of problems. And while your Symbol-based double load detection would be awesome, unfortunately, it would likely break tests (since we actually want to allow the same "file" to be executed multiple times per process).

I see a few options going forward:

  1. We leave things as-is, and simply say things like TypeORM's requireing aren't supported. We'd basically be saying that Denali completely "owns" the file system structure / build process, and if you want to use something with Denali, it must be able to operate without making assumptions about the on-disk locations / representations of things. In TypeORM's case, that would mean that TypeORM must be able to accept a Map of { filepaths -> modules } rather than a glob pattern.

  2. We ditch the bundler. We go back to traditional separate files, and we lose automatic test parallelization. It would suck to lose that benefit, but it would remove a lot of complexity from Denali's build process, and it would remove this kind of surprising gotcha (I'm sure there are more edge cases to using bundled files than just this one we've encountered here). For this particular problem - TypeORM would be able to require as normal.

  3. We hook into Node's require.extensions. This is officially "deprecated" and not recommended, but also is not going anywhere, and is how lots of language preprocessors work (i.e. babel-register, @std/esm, etc). We'd patch the regular loader to try to load from the bundle if it looks like it's a bundle file. I'm not 100% this is possible (can we reliably detect that a requested file is part of a bundle?), and in some ways introduces even more trickery (it's much harder to figure out what's going on this way, vs. our current approach which at least makes it obvious when you step through the debugger). In this case, TypeORM would do it's normal require call, but we'd essentially intercept and return the bundled copy of the module.

  4. Something else? Suggestions welcome!

@luxzeitlos
Copy link
Contributor Author

luxzeitlos commented Feb 19, 2018

Interesting. I understood mostly what happened but not why.
First, for this exact edge case I dont think this is really a Problem. I was able to hook the etities loaded into TypeORM to basically use the denali container instead of glob patterns which works. Except for the migrations, where TypeORM fully operates on the file system, but this is no problem.

Ans so I dont really have a problem with options 1. Giving denali the full control of the build and loading process of the app gives it great power. However then I really think we should inject some kind of check. We could still disable this for tests. Also we could detect if we are inside the bundle or not, and maybe crash if we are not in the bundle but denali started the process? I'm sure we will figure something out.

However I'm more concerned about other global state leaking during tests. Even if the entire app is a function, global and symbols are still global (as I've shown), and so state will leak. For Unit tests TypeORM this is probably not a big problem, since I assume that you want to mock away your database. However other things that use global may not be a database driver.

Why do you even parallelize tests at all? I dont understand why there is a performance benefit. In the end you only have one thread, and every async operation that actually needs time (mostly network, databases and hardware) should be mocked anyway.

However could it be a solution to use real processes to parallelize? Like start one node process per thread as executor, use a coordinator, dispatch tests to the processes and unify the result? How do other tools (ember?) handle this?

Your point 3. looks like an elegant way to fix this for things like TypeORM, and may be a cool thing to have when sticking with bundles, but I dont think its something we need just now. Just something to keep in mind.

@davewasmer
Copy link
Collaborator

Also we could detect if we are inside the bundle or not, and maybe crash if we are not in the bundle but denali started the process? I'm sure we will figure something out.

Yea, it's probably doable to detect if Denali is trying to load from outside a bundle, and warn/throw on that.

However I'm more concerned about other global state leaking during tests. Even if the entire app is a function, global and symbols are still global (as I've shown), and so state will leak.

Very true - however, Denali doesn't use anything like this itself, and I've rarely seen global used by library code anywhere in the Node ecosystem. This is ultimately something Denali can't 100% (hence the ever-present "almost impossible" qualifier). If you really want to, you can leak state, but your going to have to go outside Denali to do it.

If you must share global state, you can always configure the tests to run serially rather than in parallel.

Why do you even parallelize tests at all?
However could it be a solution to use real processes to parallelize?

Heh - we use Ava as our test runner, which does exactly this. Each test file runs in a separate worker process, and each test in a test file is run asynchronously.

How do other tools (ember?) handle this?

Their tests aren't parallelized. They run serially, so concurrent shared state is not a concern. They have code that "resets the world" after each test.

every async operation that actually needs time (mostly network, databases and hardware) should be mocked anyway.

This point is worth addressing separately. It goes to the heart of Denali's philosophy:

We need to meet developers where they are.

Yes, in an ideal world, everything would be mocked and stubbed and isolated. But real world teams have constraints: limited time, limited experience, etc. Denali isn't a framework for ideal applications. It's for real world use cases.

For lots of teams, those constraints mean that things like fully mocked test dependencies are a luxury they can't afford. So Denali needs to meet them where they are, and offer the right set of tools to deal with the problems they face, while trying to offer them easy ways to level up and do it "the right way".

@luxzeitlos
Copy link
Contributor Author

luxzeitlos commented Feb 21, 2018

Yea, it's probably doable to detect if Denali is trying to load from outside a bundle, and warn/throw on that.

then we probably should do this. At least we should track this.

Each test file runs in a separate worker process, and each test in a test file is run asynchronously.

Just to clarify: So we have one process per test file, and asynchronicity inside this one file. This would mean that even when stopping with bundles we still could run every test inside its own process, and just loose the asynchronicity inside this file. Right?

For lots of teams, those constraints mean that things like fully mocked test dependencies are a luxury they can't afford.

I agree. I just really have many examples when its easier to not mock heavy IO in tests without leaking state. And thats the only thing where asynchronicity is possible gives a speed improvement.


In general my opinion is that we should add this check and ensure that its documented well, but stick with the current behaviour. Its just important that developers realize it soon when they go on undefined land.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants