Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early stopping #17

Open
Pr0methean opened this issue May 27, 2023 · 2 comments
Open

Early stopping #17

Pr0methean opened this issue May 27, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Pr0methean
Copy link
Contributor

Pr0methean commented May 27, 2023

I suspect that without too much difficulty, theorems along the lines of the following could be proven mathematically for the algorithm and debug_asserted for the implementation:

  • When an iteration of Zopfli hasn't reduced the file size, subsequent iterations won't do so either.
  • If a file's uncompressed size is N bytes, the minimum compressed size will be found within cN + d iterations for some small constants c and d (probably c < 10 and d < 10).

It would be helpful to have these applied to limit the number of iterations for small blocks, which would help with fuzz testing (where a very large iteration count and a very small file can be properties of a corner case that needs to be tested, even if having them happen in production would indicate a wrong assumption), especially given cargo fuzz's bias toward very small Vec<u8>s.

@AlexTMjugador
Copy link
Member

Your hypotheses sound sensible and useful to me, although I'm not sure right now on how to go about formally proving them. I'd need to understand the applied mathematics behind the algorithm more than I currently do to make a definitive statement.

Out of curiosity, do you happen to know of any good resources to learn how exactly Zopfli works? The Zopfli whitepaper can be summarized as "we made this compressor, tested it, and it turned out to work well in practice", which is not very helpful. On the other hand, the books and papers I've found on LZ77 and compression algorithms in general tend to be somewhat old and disconnected from the considerations and refinements made by state-of-the-art implementations like Zopfli.

@andrews05
Copy link

While I don't know anything myself, I have noticed libdeflate's code is super well documented - could be helpful to read over it. I think it borrows a lot of concepts from zstd but the near-optimal algorithm likely has some similarities with zopfli.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants