
Investigate possible bundle size limit #23

Open
ovflowd opened this issue Aug 19, 2022 · 9 comments
Labels: good first issue · help wanted · research

Comments

@ovflowd (Member) commented Aug 19, 2022

Within the realm of the reasonable, there are always limitations. Reducing bundle sizes is one of the oldest, yet still active, challenges in software distribution.

One thing I love about bundlers like webpack is that they have "conscious" performance checks, and they automatically fall back to the default optimizations that the current loaders support.

It would be interesting to establish practices for optimising the resulting bundle, checking whether it would somehow hit current hardware constraints (leaks?), et cetera!

This is just a wild shot, but I believe we should keep performance and optimization in mind 🤔

@jviotti (Member) commented Aug 22, 2022

@dsanders11 @robertgzr @RaisinTen Can you lead this investigation for Mach-O, PE, ELF and XCOFF and document this as a Markdown file on the repo itself?

jviotti added the help wanted, research, and good first issue labels on Aug 22, 2022
@jviotti (Member) commented Aug 22, 2022

Anybody else is welcome to help :)

@RaisinTen (Contributor) commented

I think here we are limited by the maximum size of a segment that we are allowed to embed in a particular file format. I was working on a test in the postject repo (nodejs/postject#7) that tries to embed a segment of random data into a binary, which then prints the embedded data. After comparing the embedded data to the printed data, it seems that on Mach-O it is capped at around 1 KB. Seems kinda low. 🤔
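
For concreteness, here is a sketch of the "print the embedded data" side of such a test on macOS. The segment and section names are made up for illustration (postject's actual names may differ), and it writes the payload with fwrite() for reasons that come up later in this thread:

```c
/* Hypothetical reader: prints a section injected into this very
 * executable. "__POSTJECT"/"__payload" are assumed names. */
#include <stdint.h>
#include <stdio.h>
#include <mach-o/dyld.h>
#include <mach-o/getsect.h>

int main(void) {
    unsigned long size = 0;
    /* Image index 0 is the main executable. */
    const struct mach_header_64 *hdr =
        (const struct mach_header_64 *)_dyld_get_image_header(0);
    uint8_t *data = getsectiondata(hdr, "__POSTJECT", "__payload", &size);
    if (data == NULL) {
        fprintf(stderr, "section not found\n");
        return 1;
    }
    /* fwrite(), not printf(): random payloads contain NUL bytes. */
    fwrite(data, 1, size, stdout);
    return 0;
}
```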

@jviotti (Member) commented Sep 19, 2022

@RaisinTen I don't think 1 KB is the limit. Compilers definitely allocate much larger data. @dsanders11, do you have more insights here? I quickly skimmed through the Mach-O reference, and section sizes are described by a uint32_t, which would give you ~4 GB?
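
For reference, this is the 32-bit section header from <mach-o/loader.h>; its uint32_t size field is what would cap a section at 2^32 bytes (~4 GB). The 64-bit variant, struct section_64, widens addr and size to uint64_t:

```c
struct section {            /* for 32-bit architectures */
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint32_t addr;          /* memory address of this section */
    uint32_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes) */
    uint32_t reserved1;     /* reserved (for offset or index) */
    uint32_t reserved2;     /* reserved (for count or sizeof) */
};
```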

@RaisinTen (Contributor) commented

I actually made a boo-boo. I used printf() to print the embedded data, and that was causing problems because the data gets truncated at null bytes. It seems to produce sensible results now that I've switched to fwrite() instead.
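
A minimal illustration of the pitfall: printf("%s", ...) treats the buffer as a C string and stops at the first NUL byte, while fwrite() writes every byte:

```c
#include <stdio.h>

int main(void) {
    /* Binary payload with an embedded NUL byte. */
    const unsigned char data[] = {'a', 'b', '\0', 'c', 'd'};
    printf("%s", (const char *)data);      /* prints only "ab" */
    fwrite(data, 1, sizeof(data), stdout); /* writes all 5 bytes */
    return 0;
}
```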

The limitation is actually not the maximum size of a segment that's allowed by the file format. It's probably the maximum memory a process is allowed to use.

For Mach-O on macOS, the limit seems to be around 3 GB; sometimes the process was getting terminated because it was running out of memory.

For ELF on Linux, it seems to be around 1 GB on my Intel NUC, but it seems to be less than 1 GB on CircleCI (I haven't found the limit yet).

Postject doesn't do any special handling of these limits. It tries to fit in as much data as possible, and when it comes across something that's too big, Python throws a MemoryError exception and terminates the process.
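
If the binding constraint really is process memory rather than the file format, getrlimit(2) is one way to inspect the ceiling; a minimal sketch (note the kernel can still kill an out-of-memory process well before this limit is reached):

```c
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    /* RLIMIT_AS: the cap on the process's virtual address space. */
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("address space (soft limit): unlimited\n");
    else
        printf("address space (soft limit): %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}
```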

@jviotti (Member) commented Sep 20, 2022

> Python throws a MemoryError exception and terminates the process.

Ah, interesting. Maybe the limitation comes from LIEF, then? I wonder if it's trying to load the entire blob into memory (potentially multiple times) before injecting it?

@RaisinTen (Contributor) commented

@jviotti I wouldn't consider that part to be much of a problem. The limits imposed by Python are higher than what the executable can handle, which is why the process gets terminated even when LIEF succeeds in embedding a really large resource.

@ovflowd (Member, Author) commented Sep 20, 2022

Reasonably, it feels like it's better to exit with an error code (ending execution) than to try to gracefully handle enormous resources... It might be worth documenting, for people using node-sea in the future, that they should be reasonable when bundling assets.

@RaisinTen (Contributor) commented

> it's better to exit with an error code (ending execution) than to try to gracefully handle enormous resources

Do you mean setting an artificial limit? What limit do we set and how would that help?

> It might be worth documenting, for people using node-sea in the future, that they should be reasonable when bundling assets.

Yea, I'll try to send a PR to document my findings after I'm done experimenting on Windows.
