"encountered an invalid instruction" when operating on PE32/PE32+ #9

Open
autumnontape opened this issue Aug 18, 2020 · 10 comments
Labels: help wanted (Extra attention is needed)

@autumnontape
Contributor

autumnontape commented Aug 18, 2020

I've tried running steg86 profile against several EXEs and DLLs, both PE32 and PE32+, and every time, it has produced an error like this:

Fatal: encountered an invalid instruction at text offset 3678 (file offset 4702)

It seems like this should be easy to reproduce, but I can upload an example file if not. I've had no such problems with ELF files.

@autumnontape autumnontape changed the title from "encountered an invalid instruction" to ""encountered an invalid instruction" when operating on PE32/PE32+" on Aug 18, 2020
@woodruffw
Owner

Interesting. I only tested against some small PEs, but I haven't seen that. Would you mind uploading an example?

PEs in the wild are interesting things, so it wouldn't surprise me if many include data in their text sections; that would trip steg86 up. The general fix here is an unsolved one (CFG recovery/code-data disambiguation for arbitrary binaries), but steg86 could do a few things to make the happy path simpler:

  • Allow a --text-map or similar option, which describes "holes" in the text section where data occurs. This kind of input could be generated by hand, or by a reverse engineering tool.

  • Allow an --unsafe-skip-invalid-instructions or similar flag, which simply skips bytes when they don't decode properly. This would be incredibly dangerous: an initial decode error is a pretty good indicator that whatever decodes afterwards may actually be data, so we could end up inadvertently modifying data instead of code and silently breaking the program. But that's sometimes okay, as a nuclear option.
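As a rough sketch of the first option: once a list of "holes" is available, the decodable regions are just the complement of those holes within the text section. The function name `usable_spans` and the use of `Range<usize>` are assumptions made for illustration, not part of steg86.

```rust
use std::ops::Range;

/// Given the length of the text section and a list of "holes" (byte ranges
/// known to contain data), return the complementary ranges that could be
/// decoded as code. Holes may arrive unsorted or overlapping.
/// Hypothetical helper; not part of steg86.
pub fn usable_spans(text_len: usize, holes: &[Range<usize>]) -> Vec<Range<usize>> {
    let mut holes: Vec<Range<usize>> = holes.to_vec();
    holes.sort_by_key(|h| h.start);

    let mut spans = Vec::new();
    let mut cursor = 0usize;
    for hole in holes {
        // Clamp each hole to the section so out-of-range input is harmless.
        let start = hole.start.min(text_len);
        let end = hole.end.min(text_len);
        if end <= start {
            continue; // Empty or inverted hole: nothing to cut out.
        }
        if start > cursor {
            spans.push(cursor..start);
        }
        cursor = cursor.max(end);
    }
    if cursor < text_len {
        spans.push(cursor..text_len);
    }
    spans
}
```

With a 100-byte section and holes at 10..20 and 50..60, this yields the spans 0..10, 20..50 and 60..100.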

@autumnontape
Contributor Author

I can't really upload files right now but will later -- in the meantime, if you have any Unity-based games for Windows downloaded, the game executables should trigger this error.

Another possible option would be to act as if the text section ended at the first illegal instruction, which would still be dangerous because data may accidentally look like valid instructions, but less so than powering through.
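A minimal sketch of that idea, assuming the decoder can be abstracted as a function that returns the length of the next instruction or `None` when the bytes don't decode. `decode_len` is a hypothetical stand-in for a real x86 decoder, and `decodable_prefix_len` is an illustrative name.

```rust
/// Walk `text` with a caller-supplied decoder and return the length of the
/// prefix that decoded cleanly. `decode_len` is a hypothetical stand-in for a
/// real x86 decoder: it returns the byte length of the instruction at the
/// start of the slice, or `None` if the bytes do not decode.
pub fn decodable_prefix_len<F>(text: &[u8], decode_len: F) -> usize
where
    F: Fn(&[u8]) -> Option<usize>,
{
    let mut offset = 0;
    while offset < text.len() {
        match decode_len(&text[offset..]) {
            // Guard against zero-length or overlong results from the decoder.
            Some(len) if len > 0 && offset + len <= text.len() => offset += len,
            // First invalid instruction: treat the text section as ending here.
            _ => break,
        }
    }
    offset
}
```

Everything past the returned offset would simply be left untouched. As noted, data that happens to decode as valid instructions before that point is still at risk.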

I think this program is cute and might use it for some little easter eggs in the future, but not for any binaries that I'm not compiling and linking myself, so if the problem is data in the text section, it shouldn't affect me! But I tried like five files, and they all had this problem, so I guess MSVC must like doing this or something.

@woodruffw
Owner

I don't have any Unity games, but I do have a Windows VM -- I'll see if I can find a testcase 🙂

> Another possible option would be to act as if the text section ended at the first illegal instruction, which would still be dangerous because data may accidentally look like valid instructions, but less so than powering through.

Yeah, this would be a good third option to have!

> I guess MSVC must like doing this or something.

Yeah, quite possibly. I would have expected it to be a little more discerning since mixing code and data makes the CPU's L1I/L1D and ITLB/DTLB work harder, but it's always a mystery with MSVC.

@autumnontape
Contributor Author

putty.zip

For an example of an erroring input, there's putty.exe, which is under the MIT license, and which triggers this error message when I run steg86 profile on it:

Fatal: encountered an invalid instruction at text offset 460395 (file offset 461419)

@autumnontape
Contributor Author

psftp.zip

This is the PuTTY SFTP client, which also triggers the error and is smaller:

Fatal: encountered an invalid instruction at text offset 319873 (file offset 320897)

@autumnontape
Contributor Author

I'm not great with reverse engineering tools, but I opened up psftp.exe in Cutter, and the address reported in the error is near the start of a jump table (at 0x0044f181):

[Image: disassembly from psftp.exe; there's a jump table in the text section at address 0x0044f17e]
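For anyone matching the reported offsets against a disassembler, the numbers in this thread are consistent with a simple mapping: in all three errors the file offset is the text offset plus 1024, and for psftp.exe the text offset 319873 (0x4E181) lands on 0x0044f181 if the text section is loaded at 0x00401000. Both constants below are inferred from those numbers rather than read from the PE headers, so treat them as assumptions.

```rust
/// Offset of the text section's raw data within the file. Inferred from the
/// errors in this thread, where file offset minus text offset is always 1024.
pub const TEXT_RAW_OFFSET: u64 = 0x400;

/// Assumed load address of the text section in psftp.exe (an image base of
/// 0x400000 plus a section RVA of 0x1000). Inferred, not read from the file.
pub const TEXT_VIRTUAL_ADDRESS: u64 = 0x0040_1000;

/// Convert an offset within the text section to an offset within the file.
pub fn file_offset(text_offset: u64) -> u64 {
    TEXT_RAW_OFFSET + text_offset
}

/// Convert an offset within the text section to the address a disassembler
/// would show, under the assumed load address above.
pub fn virtual_address(text_offset: u64) -> u64 {
    TEXT_VIRTUAL_ADDRESS + text_offset
}
```

Under these assumptions, `virtual_address(319873)` is 0x0044f181, three bytes past the 0x0044f17e address shown in the disassembly.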

@woodruffw
Owner

Yep, that looks right to me. Most compilers that I'm aware of would use a pseudo-instruction to place the jump table in .data or .rodata (or whatever), but maybe MSVC isn't bright enough or something else interfered.

@woodruffw
Owner

In the meantime, I'd be happy to accept a PR that adds support for punching "holes" in the Text structure. It's something that I can do on my own, but if you'd like to get a head start on it, feel free 🙂

@woodruffw woodruffw added the "help wanted" (Extra attention is needed) label Aug 19, 2020
@autumnontape
Contributor Author

Sure, it seems interesting to work on. Here are my thoughts on how to implement it, let me know what you think.

At least to begin with, the input format can be plain CSV, which won't require any dependencies. The two input columns are an offset into the text section and a length, both in bytes, and each row describes a span of instructions that may be used for steganography.
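A dependency-free parser for such a file could look roughly like the sketch below. The details it assumes (no header row, decimal numbers, one `offset,length` pair per line, blank lines ignored) are not settled anywhere in this thread, and `parse_span_map` is an illustrative name.

```rust
use std::ops::Range;

/// Parse an allowlist of usable spans from CSV text. Each non-empty line is
/// `offset,length`, both in bytes relative to the start of the text section.
/// The exact syntax (no header row, decimal numbers) is an assumption.
pub fn parse_span_map(csv: &str) -> Result<Vec<Range<usize>>, String> {
    let mut spans = Vec::new();
    for (idx, line) in csv.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let (offset, length) = line
            .split_once(',')
            .ok_or_else(|| format!("line {}: expected `offset,length`", idx + 1))?;
        let offset: usize = offset
            .trim()
            .parse()
            .map_err(|e| format!("line {}: bad offset: {}", idx + 1, e))?;
        let length: usize = length
            .trim()
            .parse()
            .map_err(|e| format!("line {}: bad length: {}", idx + 1, e))?;
        // Reject spans whose end would not fit in a usize.
        let end = offset
            .checked_add(length)
            .ok_or_else(|| format!("line {}: span overflows", idx + 1))?;
        spans.push(offset..end);
    }
    Ok(spans)
}
```

A real implementation would probably also check that spans are sorted, non-overlapping, and within the text section before using them.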

A map of this same information can then be optionally embedded inband with the message to make it possible to extract the message without having to pass the CSV file around like a decoder ring. There could be a dedicated bit to distinguish between mapped and mapless modes, or they could be distinguished by different magic numbers. The inband map uses varints and counts the lengths of usable spans in terms of semantic pairs and unusable spans in terms of bytes.
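To make the inband encoding concrete, here is a sketch that assumes "varint" means unsigned LEB128 and that the map is a flat sequence of alternating run lengths, usable span first. Neither choice is specified in this thread, and the function names are illustrative.

```rust
/// Append `value` to `out` as an unsigned LEB128 varint (7 bits per byte,
/// high bit set on every byte except the last).
pub fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Read one varint from the front of `input`, returning the value and the
/// number of bytes consumed, or `None` if no terminating byte is found
/// within the first 10 bytes.
pub fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encode alternating run lengths: usable spans counted in semantic pairs,
/// unusable spans counted in bytes. The ordering (usable first) and the
/// absence of a run count or terminator are assumptions.
pub fn encode_map(runs: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    for &(usable_pairs, unusable_bytes) in runs {
        write_varint(usable_pairs, &mut out);
        write_varint(unusable_bytes, &mut out);
    }
    out
}
```

Small run lengths cost a single byte each this way, which matters because the map competes with the message for embedding capacity.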

@woodruffw
Owner

woodruffw commented Aug 21, 2020

> At least to begin with, the input format can be plain CSV, which won't require any dependencies. The two input columns are an offset into the text section and a length, both in bytes, and each row describes a span of instructions that may be used for steganography.

👍, that sounds very reasonable to me. Having it be an explicit allowlist rather than "holes" also makes more sense, now that I think about it.

> There could be a dedicated bit to distinguish between mapped and mapless modes, or they could be distinguished by different magic numbers. The inband map uses varints and counts the lengths of usable spans in terms of semantic pairs and unusable spans in terms of bytes.

Different magic numbers sounds good to me: we could do the current magic incremented by 1 (b'x') to indicate special treatment. The encoding you propose also makes sense to me.
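Roughly, that dispatch could look like the sketch below. It assumes the existing magic is the single byte b'w', which is only implied by describing b'x' as the current magic incremented by 1. The constant, enum, and function names are made up for illustration.

```rust
/// Magic byte for the existing mapless format. Assumed to be b'w', since
/// b'x' is described as the current magic incremented by 1.
pub const MAGIC_MAPLESS: u8 = b'w';

/// Magic byte announcing that an inband map follows.
pub const MAGIC_MAPPED: u8 = MAGIC_MAPLESS + 1;

/// How the rest of the embedded header should be interpreted.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Mode {
    Mapless,
    Mapped,
}

/// Pick the mode from the first extracted byte, or `None` if the byte is
/// not a recognized magic (for example, when no message is embedded).
pub fn mode_from_magic(magic: u8) -> Option<Mode> {
    match magic {
        MAGIC_MAPLESS => Some(Mode::Mapless),
        MAGIC_MAPPED => Some(Mode::Mapped),
        _ => None,
    }
}
```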


2 participants