Build an automated test matrix across John/Hashcat/name-that-hash #62

Open
bburky opened this issue Mar 16, 2021 · 3 comments

Comments

@bburky
Contributor

bburky commented Mar 16, 2021

It would be nice to automatically test sample hashes against name-that-hash, and then verify that the same hashes work against both John and Hashcat using the modes we have in our DB.

Similarly, in #59 we discovered that John and Hashcat don't always accept the same formats for what seems to be the same hash type. If any of our regexes in name-that-hash are too permissive (e.g. they allow optional values), some hashes will not work against both John and Hashcat even though we claim they should. Where such differences exist, we should break them out into separate hash types in name-that-hash.

It would be really nice if there's an existing DB anywhere mapping John modes to Hashcat, but to my knowledge name-that-hash is the only example of it.

We have a few hashes in test_main.py, but it would be awesome if we could test every hash. Actual code coverage measurements won't work because our DB is a Python object, not code. But we should use something similar to ensure every entry is tested.

Data

We could add sample hashes to each entry in our hashes DB perhaps? And use them for automated testing?

We should pull in sample hashes from both John and Hashcat and use them for our tests:

  • Hashcat example hashes. This may actually be comprehensive, I'm not sure. But it only includes one hash per mode; if any hashcat mode treats parts of the hash as optional, more than one test is needed for that mode.
  • John sample hashes. Unlike hashcat's list, this one is far from comprehensive.
  • John unit tests. I think --test on john might use these, but I can't see any way to print them all automatically. There are individual tests inside each John mode's source file though.
    • TODO: see if there's a nice way to automatically extract all these test hashes
    • John is GPL2+ and we are GPL3, which means we can pull things from their source.

Automated testing

John's --show=formats makes it easy to check which formats it matched against each hash. We could probably test every mode at the same time?

./john   --show=formats /tmp/hashes.txt
[{"lineNo":1,"ciphertext":"$krb5pa$17$hashcat$HASHCATDOMAIN.COM$a17776abe5383236c58582f515843e029ecbff43706d177651b7b6cdb2713b17597ddb35b1c9c470c281589fd1d51cca125414d19e40e333","rowFormats":[]},
{"lineNo":2,"ciphertext":"$krb5pa$17$user1$EXAMPLE.COM$$c5461873dc13665771b98ba80be53939e906d90ae1ba79cf2e21f0395e50ee56379fbef4d0298cfccfd6cf8f907329120048fd05e8ae5df4","rowFormats":[{"label":"krb5pa-sha1","prepareEqCiphertext":true,"canonHash":["$krb5pa$17$user1$EXAMPLE.COM$EXAMPLE.COMuser1$c5461873dc13665771b98ba80be53939e906d90ae1ba79cf2e21f0395e50ee56379fbef4d0298cfccfd6cf8f907329120048fd05e8ae5df4"]}]}]
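A rough sketch of how a test could consume that output, assuming it is always a JSON array shaped like the example above. The function name `formats_by_line` is made up for illustration, and the ciphertexts in the sample are shortened placeholders.

```python
import json
from typing import Dict, List


def formats_by_line(show_formats_output: str) -> Dict[int, List[str]]:
    """Map each input line number to the John format labels that accepted it."""
    rows = json.loads(show_formats_output)
    return {
        row["lineNo"]: [fmt["label"] for fmt in row["rowFormats"]]
        for row in rows
    }


# Same shape as the output above, with the long hashes replaced by placeholders.
sample_output = """[{"lineNo":1,"ciphertext":"<hash 1>","rowFormats":[]},
{"lineNo":2,"ciphertext":"<hash 2>","rowFormats":[{"label":"krb5pa-sha1","prepareEqCiphertext":true,"canonHash":["<canonical hash 2>"]}]}]"""

matched = formats_by_line(sample_output)
# Lines with an empty list are hashes that no John format accepted.
rejected_lines = [line_no for line_no, labels in matched.items() if not labels]
```

A test could then compare the labels for each line against the John mode our DB claims for that hash, and fail on any line that ends up in `rejected_lines`.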

Hashcat appears to have nothing similar. It has no "dry run, parse hashes only" mode that I know of, but it will log something like this if it can't parse a hash. Also, this would likely require running hashcat once per mode we want to test, unlike john's --show=formats, which would allow testing everything at once.

Hashfile '/tmp/hashes.txt' on line 2 ($krb5p...cfd6cf8f907329120048fd05e8ae5df4): Token length exception
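If we go down this route, the test harness would need to pick those messages out of hashcat's output. A minimal sketch, assuming every parse failure follows the format of the single line quoted above (other hashcat messages may well look different); `parse_hashcat_error` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Pattern derived only from the one log line quoted above.
HASHCAT_PARSE_ERROR = re.compile(
    r"^Hashfile '(?P<path>.+)' on line (?P<line>\d+) "
    r"\((?P<hash>.*)\): (?P<reason>.+)$"
)


def parse_hashcat_error(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (line number, reason) for a parse failure, or None for other lines."""
    match = HASHCAT_PARSE_ERROR.match(log_line.strip())
    if match is None:
        return None
    return int(match.group("line")), match.group("reason")


log_line = (
    "Hashfile '/tmp/hashes.txt' on line 2 "
    "($krb5p...cfd6cf8f907329120048fd05e8ae5df4): Token length exception"
)
failure = parse_hashcat_error(log_line)
```

Here `failure` carries the line number in the hash file and hashcat's stated reason, which is enough to tell which sample hash a given mode refused.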

Then of course, test all of them against name-that-hash and see if they are correctly detected. We should also have some kind of coverage to see if we have any hash regexes that never matched any of the test hashes.
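The "coverage" idea could be as simple as recording which regexes matched at least one test hash. A sketch under the assumption that the DB can be reduced to a name-to-regex mapping; the two patterns below are illustrative stand-ins, not the actual name-that-hash regexes, and the sample is the MD5 of the empty string.

```python
import re
from typing import Dict, List


def unmatched_patterns(patterns: Dict[str, str], sample_hashes: List[str]) -> List[str]:
    """Return the names of patterns that matched none of the sample hashes."""
    compiled = {name: re.compile(rx, re.IGNORECASE) for name, rx in patterns.items()}
    return sorted(
        name
        for name, rx in compiled.items()
        if not any(rx.match(sample) for sample in sample_hashes)
    )


# Hypothetical stand-ins for DB entries.
patterns = {
    "MD5": r"^[a-f0-9]{32}$",
    "SHA-1": r"^[a-f0-9]{40}$",
}
samples = ["d41d8cd98f00b204e9800998ecf8427e"]  # MD5 of the empty string

never_matched = unmatched_patterns(patterns, samples)
```

A test asserting that `never_matched` is empty would fail as soon as someone adds a regex without any sample hash that exercises it.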

@bee-san
Member

bee-san commented Mar 16, 2021

We could add sample hashes to each entry in our hashes DB perhaps? And use them for automated testing?

We can do this for sure. Using a dataclass means we can have fields with a default value, such as None, like here:
https://github.com/HashPals/Name-That-Hash/blob/main/name_that_hash/hashes.py#L17

We could similarly do it for all the hashes while we build up the DB of example hashes?
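Roughly what that could look like, as a sketch only. `HashEntry` and its field names are hypothetical and not copied from hashes.py; the point is that a defaulted field lets existing entries stay untouched while examples get filled in over time.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HashEntry:
    """Hypothetical DB entry; field names are assumptions, not the real schema."""
    name: str
    hashcat: Optional[int] = None
    john: Optional[str] = None
    # Defaults to empty so entries without examples need no changes.
    examples: Tuple[str, ...] = ()


old_style = HashEntry(name="MD5", hashcat=0, john="raw-md5")
with_example = HashEntry(
    name="MD5",
    hashcat=0,
    john="raw-md5",
    examples=("d41d8cd98f00b204e9800998ecf8427e",),  # MD5 of the empty string
)
```

A tuple is used rather than a single string so that modes with optional parts can carry more than one sample hash.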

I'm thinking we can start off with:

  1. Scrape Hashcat example hashes
  2. Find matching hashcat mode in our DB
  3. Insert the example hash into the code

That way we'll fill up most of our DB with examples.
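Steps 2 and 3 could be sketched like this, assuming the scrape from step 1 has already produced a mapping from hashcat mode to example hash (the scraping itself is left out, since it depends on the page layout). The dict-based entries and the `attach_examples` name are placeholders for whatever the DB really uses.

```python
from typing import Dict, List, Tuple


def attach_examples(
    db: List[dict], examples_by_mode: Dict[int, str]
) -> Tuple[List[dict], List[int]]:
    """Copy each DB entry, adding an example where its hashcat mode was scraped.

    Also returns the scraped modes that matched no DB entry, so gaps are visible.
    """
    known_modes = {entry.get("hashcat") for entry in db}
    enriched = [
        {**entry, "example": examples_by_mode.get(entry.get("hashcat"))}
        for entry in db
    ]
    orphan_modes = sorted(mode for mode in examples_by_mode if mode not in known_modes)
    return enriched, orphan_modes


# Hypothetical data: one entry with a hashcat mode, one without.
db = [{"name": "MD5", "hashcat": 0}, {"name": "Some John-only hash", "hashcat": None}]
scraped = {0: "d41d8cd98f00b204e9800998ecf8427e", 99999: "<example for an unknown mode>"}

enriched, orphan_modes = attach_examples(db, scraped)
```

The `orphan_modes` list is a useful by-product: it shows hashcat modes we have an example for but no DB entry.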

Thoughts? 😄

@bburky
Contributor Author

bburky commented Mar 16, 2021

Yes, that can get us pretty far. I may try out implementing that.

I'm trying to decide if we should actually put it in the DB, or make the test code join the example hashes with our patterns at runtime.

I think we should commit the results we scrape into git. Like maybe parse the example hashes into JSON and commit that. (Mostly so we can detect when it changes, not require internet access to run the tests, etc.) But should we merge this JSON with our patterns at runtime, or just put it directly in the DB? I may play with both options and see how it looks.
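For the "commit JSON and load it at test time" option, a minimal sketch, assuming the file maps hashcat mode to a list of example hashes. The function names and file layout are made up for illustration. JSON object keys are always strings, so modes are converted back to integers on load.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_examples(path: Path, examples: Dict[int, List[str]]) -> None:
    """Write examples in numeric mode order, one item per line, for readable git diffs."""
    ordered = {str(mode): hashes for mode, hashes in sorted(examples.items())}
    path.write_text(json.dumps(ordered, indent=2) + "\n", encoding="utf-8")


def load_examples(path: Path) -> Dict[int, List[str]]:
    """Read the committed file back, restoring integer mode keys."""
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {int(mode): hashes for mode, hashes in raw.items()}
```

Sorting by the integer mode before serializing keeps the output deterministic, so re-running the scraper only produces a diff when the upstream examples actually change.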

@Knowledge-Wisdom-Understanding

My unrequested 2 cents. 😅
I think you should rewrite this project in Go.
That way it works on all platforms, plus it will be faster. Not noticeably faster to the naked eye, but technically faster and probably better. It seems like you're mostly just using regular expressions for hash detection anyway, so I'm not sure there are any real advantages to keeping the code base in Python. There also seems to be a lot of duplicate code that could be simplified. I already notice structs and context usage in the code base, so Go just seems like the natural next step / choice.
Also ❤️ python3 🐍 by the way. Awesome project! 👍


3 participants