Volume 0x0b, Issue 0x3f, Phile #0x0d of 0x14

|=------=[ cryptexec: Next-generation runtime binary encryption ]=-------=|
|=------=[         using on-demand function extraction          ]=-------=|
|=-----------------------------------------------------------------------=|
|=----------------=[ Zeljko Vrba <zvrba@globalnet.hr> ]=-----------------=|
|=-----------------------------------------------------------------------=|

ABSTRACT
|
|
|
|
Please excuse my awkward English, it is not my native language.
|
|
|
|
What is binary encryption and why encrypt at all? For the answer to
this question the reader is referred to Phrack #58 [1] and the article
therein titled "Runtime binary encryption". This article describes a
method to control the target program that does not rely on
any assistance from the OS kernel or processor hardware. The method
is implemented in x86-32 GNU AS (AT&T syntax). Once the controlling
method is devised, it is relatively trivial to include on-the-fly
code decryption.
|
|
|
|
1 Introduction
2 OS- and hardware-assisted tracing
3 Userland tracing
  3.1 Provided API
  3.2 High-level description
  3.3 Actual usage example
  3.4 XDE bug
  3.5 Limitations
  3.6 Porting considerations
4 Further ideas
5 Related work
  5.1 ELFsh
  5.2 Shiva
  5.3 Burneye
  5.4 Conclusion
6 References
7 Credits
A Appendix: source code
  A.1 crypt_exec.S
  A.2 cryptfile.c
  A.3 test2.c
|
|
Note: Footnotes are marked by # and followed by the number. They are
|
|
listed at the end of each section.
|
|
|
|
--[ 1.0 - Introduction
|
|
|
|
First let me introduce some terminology used in this article so that
|
|
the reader is not confused.
|
|
|
|
o The attributes "target", "child" and "traced" are used interchangeably
|
|
(depending on the context) to refer to the program being under the
|
|
control of another program.
|
|
|
|
o The attributes "controlling" and "tracing" are used interchangeably to
|
|
refer to the program that controls the target (debugger, strace, etc.)
|
|
|
|
|
|
--[ 2.0 - OS- and hardware-assisted tracing
|
|
|
|
Current debuggers (both under Windows and UNIX) use x86 hardware
|
|
features for debugging. The two most commonly used features are the trace
|
|
flag (TF) and INT3 instruction, which has a convenient 1-byte encoding of
|
|
0xCC.
|
|
|
|
TF resides in bit 8 of the EFLAGS register. When it is set to 1, the
processor generates exception 1 (debug exception) after each instruction
is executed. When INT3 is executed, the processor generates exception 3
(breakpoint).
|
|
|
|
The traditional way to trace a program under UNIX is the ptrace(2)
|
|
syscall. The program doing the trace usually does the following
|
|
(shown in pseudocode):
|
|
|
|
fork()
|
|
child: ptrace(PT_TRACE_ME)
|
|
execve("the program to trace")
|
|
parent: controls the traced program with other ptrace() calls
|
|
|
|
Another way is to do ptrace(PT_ATTACH) on an already existing process.
|
|
Other operations that the ptrace() interface offers are reading/writing
target instruction/data memory, reading/writing registers or continuing the
execution (continually or up to the next system call - this capability is
used by the well-known strace(1) program).
|
|
|
|
Each time the traced program receives a signal, it stops and the
controlling program is notified (through wait(2)). When the TF is turned
on, the traced program receives a SIGTRAP after each instruction. The TF is
usually not turned on by the traced program#1, but by the controlling
program through ptrace(PT_STEP).
|
|
|
|
Breakpoints work differently from TF: the controlling program places the
0xCC opcode at strategic#2 places in the code. The first byte of the
instruction is replaced with 0xCC and the controlling program stores both
the address and the original opcode. When execution comes to that address,
SIGTRAP is delivered and the controlling program regains control. Then it
replaces (again using ptrace()) 0xCC with the original opcode and
single-steps the original instruction. After that the original opcode is
usually replaced with 0xCC again.
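
The breakpoint bookkeeping described above can be sketched in C. This is
only an illustration on a local byte buffer: a real debugger would do the
reads and writes through ptrace(), and the names struct breakpoint, bp_arm
and bp_disarm are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

/* Hypothetical record: where the breakpoint is and which byte it replaced. */
struct breakpoint {
    size_t  addr;   /* offset of the patched instruction in the code image */
    uint8_t saved;  /* original first byte of that instruction */
    int     armed;  /* nonzero while 0xCC is in place */
};

/* Arm: remember the original byte, then overwrite it with INT3. */
void bp_arm(uint8_t *code, struct breakpoint *bp, size_t addr)
{
    bp->addr = addr;
    bp->saved = code[addr];
    bp->armed = 1;
    code[addr] = INT3_OPCODE;
}

/* Disarm: restore the original byte so the instruction can be stepped. */
void bp_disarm(uint8_t *code, struct breakpoint *bp)
{
    assert(bp->armed);
    code[bp->addr] = bp->saved;
    bp->armed = 0;
}
```

After the single step, calling bp_arm() again with the same address
re-inserts the breakpoint.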
|
|
|
|
Although powerful, ptrace() has several disadvantages:
|
|
|
|
1. The traced program can be ptrace()d only by one controlling program.
|
|
|
|
2. The controlling and traced program live in separate address spaces,
|
|
which makes changing traced memory awkward.
|
|
|
|
3. ptrace() is a system call: it is slow if used for full-blown tracing
|
|
of larger chunks of code.
|
|
|
|
I won't go deeper into the mechanics of ptrace(); there are tutorials
available [2] and the man page is pretty self-explanatory.
|
|
|
|
__
|
|
#1 Although nothing prevents it from doing so - it is in the user-modifiable
portion of EFLAGS.
|
|
#2 Usually the person doing the debugging decides what is strategic.
|
|
|
|
|
|
--[ 3.0 - Userland tracing
|
|
|
|
The tracing can be done solely from user mode: the instructions
are executed natively, except control-transfer instructions (CALL, JMP,
Jcc, RET, LOOP, JCXZ). The background of this idea is explained
nicely in [3] on the primitive 1960's MIX computer designed by Knuth.
|
|
|
|
Features of the method I'm about to describe:
|
|
|
|
o It allows only portions of the executable file to be encrypted.
|
|
|
|
o Different portions of the executable can be encrypted with different
|
|
keys provided there is no cross-calling between them.
|
|
|
|
o It allows encrypted code to freely call non-encrypted code. In this
|
|
case the non-encrypted code is also executed instruction by instruction.
|
|
When called outside of encrypted code, it still executes without
|
|
tracing.
|
|
|
|
o There is never more than 24 bytes of encrypted code held in memory in
|
|
plaintext.
|
|
|
|
o OS- and language-independent.
|
|
|
|
The rest of this section explains the provided API, gives a high-level
description of the implementation, shows a usage example and discusses
the XDE bug, limitations and porting considerations. Here are the details
of my own implementation.
|
|
|
|
|
|
----[ 3.1 - Provided API
|
|
|
|
|
|
No "official" header file is provided. Because of the sloppy and
|
|
convenient C parameter passing and implicit function declarations, you
|
|
can get away with no declarations whatsoever.
|
|
|
|
The decryption API consists of one typedef and one function.
|
|
|
|
typedef (*decrypt_fn_ptr)(void *key, unsigned char *dst, const unsigned
|
|
char *src);
|
|
|
|
This is the generic prototype that your decryption routine must fit. It
|
|
is called from the main decryption routine with the following arguments:
|
|
|
|
o key: pointer to decryption key data. Note that in most cases this is
NOT the raw key but a pointer to some kind of "decryption context".
|
|
|
|
o dst: pointer to destination buffer
|
|
|
|
o src: pointer to source buffer
|
|
|
|
Note that there is no size argument: the block size is fixed to 8
|
|
bytes. The routine should not read more than 8 bytes from the src and NEVER
|
|
output more than 8 bytes to dst.
|
|
|
|
Another unusual constraint is that the decryption function MUST NOT
|
|
modify its arguments on the stack. If you need to do this, copy the stack
|
|
arguments into local variables. This is a consequence of how the routine
|
|
is called from within the decryption engine - see the code for details.
|
|
|
|
There are no constraints whatsoever on the kind of encryption which can
|
|
be used. ANY bijective function which maps 8 bytes to 8 bytes is suitable.
|
|
Encrypt the code with the function, and use its inverse for the
decryption. If you use the identity function, then decryption becomes
simple single-stepping with no hardware support -- see section 5 for
related work.
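
As an illustration of the contract above, here is a minimal sketch of a
routine pair that fits the prototype. It is a toy byte-wise add/subtract
bijection, not a real cipher. The void return type, struct toy_ctx and both
function names are assumptions made for this example; the article's typedef
does not state a return type.

```c
#include <stddef.h>

#define BLOCK_SIZE 8

/* The `void` return type is an assumption; the original typedef omits it. */
typedef void (*decrypt_fn_ptr)(void *key, unsigned char *dst,
                               const unsigned char *src);

/* Hypothetical "decryption context": here simply 8 key bytes. */
struct toy_ctx {
    unsigned char k[BLOCK_SIZE];
};

/* Forward map: add the key byte to each of the 8 source bytes (mod 256). */
void toy_encrypt(void *key, unsigned char *dst, const unsigned char *src)
{
    const struct toy_ctx *ctx = key;
    size_t i;

    for (i = 0; i < BLOCK_SIZE; i++)
        dst[i] = (unsigned char)(src[i] + ctx->k[i]);
}

/* Inverse map: subtract the key byte. The parameters themselves are never
 * reassigned, in line with the "do not modify stack arguments" rule. */
void toy_decrypt(void *key, unsigned char *dst, const unsigned char *src)
{
    const struct toy_ctx *ctx = key;
    size_t i;

    for (i = 0; i < BLOCK_SIZE; i++)
        dst[i] = (unsigned char)(src[i] - ctx->k[i]);
}
```

Both routines read exactly 8 bytes and write exactly 8 bytes, so either
direction is a bijection on 8-byte blocks.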
|
|
|
|
The entry point to the decryption engine is the following function:
|
|
|
|
int crypt_exec(decrypt_fn_ptr dfn, const void *key, const void *lo_addr,
|
|
const void *hi_addr, const void *F, ...);
|
|
|
|
The decryption function#3 has the capability to switch between executing
encrypted and plain-text code. The encrypted code can call plain-text
code and plain-text code can return into the encrypted code. But for that
to be possible, it needs to know the address bounds of the encrypted code.
|
|
|
|
Note that this function is not reentrant! It is not allowed for ANY
|
|
kind of code (either plain-text or encrypted) running under the crypt_exec
|
|
routine to call crypt_exec again. Things will break BADLY because the
|
|
internal state of previous invocation is statically allocated and will
|
|
get overwritten.
|
|
|
|
The arguments are as follows:
|
|
|
|
o dfn: Pointer to decryption function. The function is called with the
|
|
key argument provided to crypt_exec and the addresses of destination
|
|
and source buffers.
|
|
|
|
o key: This is usually NOT the raw key bytes, but the initialized
decryption context. See the example code for the test2 program: first
the user-provided raw key is loaded into the decryption context and then
the address of the _context_ is given to the crypt_exec function.
|
|
|
|
o lo_addr, hi_addr: These are low and high addresses that are encrypted
|
|
under the same key. This is to facilitate calling non-encrypted code
|
|
from within encrypted code.
|
|
|
|
o F: pointer to the code which should be executed under the decryption
|
|
engine. It can be an ordinary C function pointer. Since the tracing
|
|
routine was written with 8-byte block ciphers in mind, the F function
|
|
must be at least 8-byte aligned and its length must be a multiple of 8.
|
|
This is easier to achieve (even with standard C) than it sounds. See the
|
|
example below.
|
|
|
|
o ...: the remaining arguments are passed on to the called function F.
|
|
|
|
crypt_exec arranges for function F to be called with the arguments
provided in the varargs list. When crypt_exec returns, its return value is
what F returned. In short, the call
|
|
|
|
x = crypt_exec(dfn, key, lo_addr, hi_addr, F, ...);
|
|
|
|
has exactly the same semantics as
|
|
|
|
x = F(...);
|
|
|
|
would have, were F plain-text.
|
|
|
|
Currently, the code is tailored to use the XDE disassembler. Other
disassemblers can be used, but the code which accesses the results must be
changed in a few places (all references to the disbuf variable).
|
|
|
|
The crypt_exec routine provides a private stack of 4kB. If you use your
|
|
own decryption routine and/or disassembler, take care not to consume too
|
|
much stack space. If you want to enlarge the local stack, look for the
|
|
local_stk label in the code.
|
|
|
|
__
|
|
#3 In the rest of this article I will call this interchangeably tracing
|
|
or decryption routine. In fact, this is a tracing routine with added
|
|
decryption.
|
|
|
|
|
|
----[ 3.2 - High-level description
|
|
|
|
|
|
The tracing routine maintains two contexts: the traced context and
|
|
its own context. The context consists of 8 32-bit general-purpose
|
|
registers and flags. Other registers are not modified by the routine.
|
|
Both contexts are held on the private stack (that is also used for
|
|
calling C).
|
|
|
|
The idea is to fetch, one at a time, instructions from the traced
program and execute them natively. The Intel instruction set has a rather
irregular encoding, so the XDE [5] disassembler engine is used to find
both the real opcode and the total instruction length. During experiments
on FreeBSD (which uses a LOCK-prefixed MOV instruction in its dynamic
loader) I discovered a bug in XDE which is described and fixed below.
|
|
|
|
We maintain our own EIP in traced_eip, round it down to the next lower
|
|
8-byte boundary and then decrypt#4 24 bytes#5 into our own buffer. Then
|
|
the disassembly takes place and the control is transferred to emulation
|
|
routines via the opcode control table. All instructions, except control
|
|
transfer, are executed natively (in traced context which is restored at
|
|
appropriate time). After single instruction execution, the control is
|
|
returned to our tracing routine.
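
The fetch step can be sketched in C as follows. This is not the assembly
from crypt_exec.S, only an illustration of the rounding and the three-block
window; fetch_window, copy_block and the block_fn type are hypothetical
names, and copy_block stands in for a real cipher.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE    8
#define WINDOW_BLOCKS 3                        /* 24 bytes, see footnote #5 */
#define WINDOW_SIZE   (BLOCK_SIZE * WINDOW_BLOCKS)

typedef void (*block_fn)(void *key, unsigned char *dst,
                         const unsigned char *src);

/* Stand-in for a real cipher: copies one block unchanged. */
void copy_block(void *key, unsigned char *dst, const unsigned char *src)
{
    (void)key;
    memcpy(dst, src, BLOCK_SIZE);
}

/* Decrypt the three blocks that cover the instruction at offset `eip` of
 * `code` into `window` (WINDOW_SIZE bytes). Returns the offset of the
 * instruction inside the window, which is always in the range 0..7. */
size_t fetch_window(block_fn fn, void *key, const unsigned char *code,
                    size_t eip, unsigned char *window)
{
    size_t base = eip & ~(size_t)(BLOCK_SIZE - 1);  /* round down to 8 */
    size_t i;

    for (i = 0; i < WINDOW_BLOCKS; i++)
        fn(key, window + i * BLOCK_SIZE, code + base + i * BLOCK_SIZE);
    return eip - base;
}
```

Since the returned offset is at most 7 and an x86 instruction is at most 15
bytes long, the instruction always ends within the 24-byte window.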
|
|
|
|
In order to prevent losing control, the control transfer instructions#6
are emulated. The big problem was (until I solved it) emulating indirect
JMP and CALL instructions (which can appear with any kind of complex EA
that i386 supports). The problem is solved by replacing the CALL/JMP
opcode with the MOV-to-register opcode, and modifying bits 3-5 (reg
field) of the modR/M byte to set the target register (in the CALL/JMP case
this field holds part of the opcode). Then we let the processor
calculate the EA for us.
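
The opcode rewrite can be sketched in C. On x86-32 the indirect CALL and
JMP are encoded as FF /2 and FF /4, and MOV r32, r/m32 is 8B /r, so only the
opcode byte and the reg field change while mod, r/m, SIB and displacement
stay untouched. The function name and its return convention are assumptions
made for this example, not the code from crypt_exec.S.

```c
#include <assert.h>
#include <stdint.h>

/* `insn` points at the opcode byte (any prefixes already skipped).
 * Rewrites an indirect CALL/JMP into "MOV target_reg, r/m32" in place and
 * returns the original opcode extension: 2 for CALL, 4 for JMP. */
int rewrite_indirect_branch(uint8_t *insn, unsigned target_reg)
{
    unsigned ext;

    assert(insn[0] == 0xFF);            /* group 5 opcode */
    assert(target_reg < 8);             /* 0 = EAX, 1 = ECX, ... 7 = EDI */
    ext = (insn[1] >> 3) & 7u;          /* bits 3-5 of modR/M */
    assert(ext == 2 || ext == 4);       /* near indirect CALL or JMP only */

    insn[0] = 0x8B;                     /* MOV r32, r/m32 */
    insn[1] = (uint8_t)((insn[1] & ~0x38u) | (target_reg << 3));
    return (int)ext;
}
```

For example, FF 55 08 (call *8(%ebp)) becomes 8B 45 08 (mov 8(%ebp),%eax)
when target_reg is 0. Executing the rewritten instruction leaves the branch
target in the chosen register.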
|
|
|
|
Of course, a means is needed to stop the encrypted execution and to
enable encrypted code to call plaintext code:
|
|
|
|
1. On entering, the tracing engine pops the return address and its
|
|
private arguments and then pushes the return address back to the
|
|
traced stack. At that moment:
|
|
o The stack frame is good for executing a regular C function (F).
|
|
o The top of stack pointer (esp) is stored into end_esp.
|
|
|
|
2. When the tracing routine encounters a RET instruction it first checks
|
|
the traced_esp. If it equals end_esp, it is a point where the F
|
|
function would have ended. Therefore, we restore the traced context
|
|
and do not emulate RET, but let it execute natively. This way the
|
|
tracing routine loses control and normal instruction execution
|
|
continues.
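
The RET handling in step 2 can be sketched in C on a simulated stack. The
struct and function names are hypothetical; the real routine works on the
saved traced context in assembly.

```c
#include <stdint.h>

/* Hypothetical, simplified traced context: only EIP and ESP. */
struct traced_ctx {
    uint32_t        eip;
    const uint32_t *esp;   /* points at the current top of the traced stack */
};

/* Returns 1 if this RET ends the traced function F (the caller should then
 * restore the traced context and let RET run natively), or 0 after
 * emulating the RET by popping the return address into the traced EIP. */
int handle_ret(struct traced_ctx *ctx, const uint32_t *end_esp)
{
    if (ctx->esp == end_esp)
        return 1;              /* F itself is returning: stop tracing */
    ctx->eip = *ctx->esp;      /* pop the return address ... */
    ctx->esp++;                /* ... the stack grows down, so ESP goes up */
    return 0;
}
```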
|
|
|
|
In order to allow encrypted code to call plaintext code, there are
|
|
lo_addr and hi_addr parameters. These parameters determine the low and high
|
|
boundary of encrypted code in memory. If the traced_eip falls out of
|
|
[lo_addr, hi_addr) range, the decryption routine pointer is swapped with
|
|
the pointer to a no-op "decryption" that just copies 8 bytes from source
|
|
to destination. When the traced_eip again falls into that interval, the
|
|
pointers are again swapped.
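
The range test can be sketched in C. Note one deliberate difference: the
article swaps two stored pointers when the boundary is crossed, while this
sketch simply selects the routine on every call, which has the same visible
effect. All names here are hypothetical, and demo_decrypt is only a
placeholder for a real cipher.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8

typedef void (*decrypt_fn_ptr)(void *key, unsigned char *dst,
                               const unsigned char *src);

/* No-op "decryption": copies 8 bytes from source to destination. */
void identity_copy(void *key, unsigned char *dst, const unsigned char *src)
{
    (void)key;
    memcpy(dst, src, BLOCK_SIZE);
}

/* Placeholder for the real cipher (here: XOR with a constant). */
void demo_decrypt(void *key, unsigned char *dst, const unsigned char *src)
{
    int i;

    (void)key;
    for (i = 0; i < BLOCK_SIZE; i++)
        dst[i] = (unsigned char)(src[i] ^ 0xAA);
}

struct tracer_state {
    decrypt_fn_ptr cipher_decrypt;  /* routine given to the tracer */
    uintptr_t      lo_addr;         /* first encrypted address */
    uintptr_t      hi_addr;         /* one past the last encrypted address */
};

/* Pick the routine for the current traced EIP: the half-open interval
 * [lo_addr, hi_addr) is encrypted, everything else is plaintext. */
decrypt_fn_ptr select_decrypt(const struct tracer_state *st,
                              uintptr_t traced_eip)
{
    if (traced_eip >= st->lo_addr && traced_eip < st->hi_addr)
        return st->cipher_decrypt;
    return identity_copy;
}
```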
|
|
|
|
__
|
|
#4 The decryption routine is called indirectly for reasons described
|
|
later.
|
|
#5 The number comes from worst-case considerations: an instruction that
begins at an offset of 7 (mod 8) and has the maximum instruction
length of 15 bytes ends 22 bytes into the buffer, which requires 3 blocks
(24 bytes). The buffer has 32 bytes in order to accommodate an additional
indirect JMP instruction after the traced instruction. The JMP jumps
indirectly to the place in the tracing routine where execution should
continue.
|
|
#6 INT instructions are not considered control transfers. After (if)
the OS returns from the invoked trap, the program execution continues
sequentially with the instruction right after INT. So no special
measures need to be taken.
|
|
|
|
|
|
----[ 3.3 - Actual usage example
|
|
|
|
|
|
Given the encrypted execution engine, how do we test it? For this purpose
I have written a small utility named cryptfile that encrypts a portion of
the executable file ($ is the UNIX prompt):
|
|
|
|
$ gcc -c cast5.c
|
|
$ gcc cryptfile.c cast5.o -o cryptfile
|
|
$ ./cryptfile
|
|
USAGE: ./cryptfile <-e_-d> FILE KEY STARTOFF ENDOFF
|
|
KEY MUST be 32 hex digits (128 bits).
|
|
|
|
The parameters are as follows:
|
|
|
|
o -e,-d: one of these is MANDATORY and stands for encryption
|
|
or decryption.
|
|
|
|
o FILE: the executable file to be encrypted.
|
|
|
|
o KEY: the encryption key. It must be given as 32 hex digits.
|
|
|
|
o STARTOFF, ENDOFF: the starting and ending offset in the file that should
|
|
be encrypted. They must be a multiple of block size (8 bytes). If not,
|
|
the file will be correctly encrypted, but the encrypted execution will
|
|
not work correctly.
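
The KEY argument has to be turned from 32 hex digits into 16 raw key bytes.
A minimal sketch of such a conversion follows. It is not the code from
cryptfile.c or test2.c, and the names hex_digit and parse_hex_key are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define KEY_BYTES 16   /* 128-bit key = 32 hex digits */

/* Value of one hex digit, or -1 if the character is not a hex digit. */
int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Convert exactly 32 hex digits into 16 bytes. Returns 0 on success and
 * -1 if the string has the wrong length or contains a non-hex character. */
int parse_hex_key(const char *hex, unsigned char *key)
{
    size_t i;

    if (strlen(hex) != 2 * KEY_BYTES)
        return -1;
    for (i = 0; i < KEY_BYTES; i++) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);

        if (hi < 0 || lo < 0)
            return -1;
        key[i] = (unsigned char)((hi << 4) | lo);
    }
    return 0;
}
```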
|
|
|
|
The whole package is tested on a simple program, test2.c. This program
|
|
demonstrates that encrypted functions can call both encrypted and plaintext
|
|
functions as well as return results. It also demonstrates that the engine
|
|
works even when calling functions in shared libraries.
|
|
|
|
Now we build the encrypted execution engine:
|
|
|
|
$ gcc -c crypt_exec.S
|
|
$ cd xde101
|
|
$ gcc -c xde.c
|
|
$ cd ..
|
|
$ ld -r cast5.o crypt_exec.o xde101/xde.o -o crypt_monitor.o
|
|
|
|
I'm using patched XDE. The last step is to combine several relocatable
|
|
object files in a single relocatable file for easier linking with other
|
|
programs.
|
|
|
|
Then we proceed to build the test program. We must ensure that
functions that we want to encrypt are aligned to 8 bytes. I'm specifying
16, just in case. Therefore:
|
|
|
|
$ gcc -falign-functions=16 -g test2.c crypt_monitor.o -o test2
|
|
|
|
We want to encrypt functions f1 and f2. How do we map from function
names to offsets in the executable file? Fortunately, this can be simply
done for ELF with the readelf utility (that's why I chose such an awkward
way - I didn't want to bother with yet another ELF 'parser').
|
|
|
|
$ readelf -s test2
|
|
|
|
Symbol table '.dynsym' contains 23 entries:
|
|
Num: Value Size Type Bind Vis Ndx Name
|
|
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
|
|
1: 08048484 57 FUNC GLOBAL DEFAULT UND printf
|
|
2: 08050aa4 0 OBJECT GLOBAL DEFAULT ABS _DYNAMIC
|
|
3: 08048494 0 FUNC GLOBAL DEFAULT UND memcpy
|
|
4: 08050b98 4 OBJECT GLOBAL DEFAULT 20 __stderrp
|
|
5: 08048468 0 FUNC GLOBAL DEFAULT 8 _init
|
|
6: 08051c74 4 OBJECT GLOBAL DEFAULT 20 environ
|
|
7: 080484a4 52 FUNC GLOBAL DEFAULT UND fprintf
|
|
8: 00000000 0 NOTYPE WEAK DEFAULT UND __deregister_frame..
|
|
9: 0804fc00 4 OBJECT GLOBAL DEFAULT 13 __progname
|
|
10: 080484b4 172 FUNC GLOBAL DEFAULT UND sscanf
|
|
11: 08050b98 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
|
|
12: 080484c4 0 FUNC GLOBAL DEFAULT UND memset
|
|
13: 0804ca64 0 FUNC GLOBAL DEFAULT 11 _fini
|
|
14: 080484d4 337 FUNC GLOBAL DEFAULT UND atexit
|
|
15: 080484e4 121 FUNC GLOBAL DEFAULT UND scanf
|
|
16: 08050b98 0 NOTYPE GLOBAL DEFAULT ABS _edata
|
|
17: 08050b68 0 OBJECT GLOBAL DEFAULT ABS _GLOBAL_OFFSET_TABLE_
|
|
18: 08051c78 0 NOTYPE GLOBAL DEFAULT ABS _end
|
|
19: 080484f4 101 FUNC GLOBAL DEFAULT UND exit
|
|
20: 08048504 0 FUNC GLOBAL DEFAULT UND strlen
|
|
21: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
|
|
22: 00000000 0 NOTYPE WEAK DEFAULT UND __register_frame_info
|
|
|
|
Symbol table '.symtab' contains 145 entries:
|
|
Num: Value Size Type Bind Vis Ndx Name
|
|
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
|
|
1: 080480f4 0 SECTION LOCAL DEFAULT 1
|
|
2: 08048110 0 SECTION LOCAL DEFAULT 2
|
|
3: 08048128 0 SECTION LOCAL DEFAULT 3
|
|
4: 080481d0 0 SECTION LOCAL DEFAULT 4
|
|
5: 08048340 0 SECTION LOCAL DEFAULT 5
|
|
6: 08048418 0 SECTION LOCAL DEFAULT 6
|
|
7: 08048420 0 SECTION LOCAL DEFAULT 7
|
|
8: 08048468 0 SECTION LOCAL DEFAULT 8
|
|
9: 08048474 0 SECTION LOCAL DEFAULT 9
|
|
10: 08048520 0 SECTION LOCAL DEFAULT 10
|
|
11: 0804ca64 0 SECTION LOCAL DEFAULT 11
|
|
12: 0804ca80 0 SECTION LOCAL DEFAULT 12
|
|
13: 0804fc00 0 SECTION LOCAL DEFAULT 13
|
|
14: 08050aa0 0 SECTION LOCAL DEFAULT 14
|
|
15: 08050aa4 0 SECTION LOCAL DEFAULT 15
|
|
16: 08050b54 0 SECTION LOCAL DEFAULT 16
|
|
17: 08050b5c 0 SECTION LOCAL DEFAULT 17
|
|
18: 08050b64 0 SECTION LOCAL DEFAULT 18
|
|
19: 08050b68 0 SECTION LOCAL DEFAULT 19
|
|
20: 08050b98 0 SECTION LOCAL DEFAULT 20
|
|
21: 00000000 0 SECTION LOCAL DEFAULT 21
|
|
22: 00000000 0 SECTION LOCAL DEFAULT 22
|
|
23: 00000000 0 SECTION LOCAL DEFAULT 23
|
|
24: 00000000 0 SECTION LOCAL DEFAULT 24
|
|
25: 00000000 0 SECTION LOCAL DEFAULT 25
|
|
26: 00000000 0 SECTION LOCAL DEFAULT 26
|
|
27: 00000000 0 SECTION LOCAL DEFAULT 27
|
|
28: 00000000 0 SECTION LOCAL DEFAULT 28
|
|
29: 00000000 0 SECTION LOCAL DEFAULT 29
|
|
30: 00000000 0 SECTION LOCAL DEFAULT 30
|
|
31: 00000000 0 SECTION LOCAL DEFAULT 31
|
|
32: 00000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
|
|
33: 08050b54 0 OBJECT LOCAL DEFAULT 16 __CTOR_LIST__
|
|
34: 08050b5c 0 OBJECT LOCAL DEFAULT 17 __DTOR_LIST__
|
|
35: 08050aa0 0 OBJECT LOCAL DEFAULT 14 __EH_FRAME_BEGIN__
|
|
36: 08050b64 0 OBJECT LOCAL DEFAULT 18 __JCR_LIST__
|
|
37: 0804fc08 0 OBJECT LOCAL DEFAULT 13 p.0
|
|
38: 08050b9c 1 OBJECT LOCAL DEFAULT 20 completed.1
|
|
39: 080485b0 0 FUNC LOCAL DEFAULT 10 __do_global_dtors_aux
|
|
40: 08050ba0 24 OBJECT LOCAL DEFAULT 20 object.2
|
|
41: 08048610 0 FUNC LOCAL DEFAULT 10 frame_dummy
|
|
42: 00000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
|
|
43: 08050b58 0 OBJECT LOCAL DEFAULT 16 __CTOR_END__
|
|
44: 08050b60 0 OBJECT LOCAL DEFAULT 17 __DTOR_END__
|
|
45: 08050aa0 0 OBJECT LOCAL DEFAULT 14 __FRAME_END__
|
|
46: 08050b64 0 OBJECT LOCAL DEFAULT 18 __JCR_END__
|
|
47: 0804ca30 0 FUNC LOCAL DEFAULT 10 __do_global_ctors_aux
|
|
48: 00000000 0 FILE LOCAL DEFAULT ABS test2.c
|
|
49: 08048660 75 FUNC LOCAL DEFAULT 10 f1
|
|
50: 080486b0 58 FUNC LOCAL DEFAULT 10 f2
|
|
51: 08050bb8 16 OBJECT LOCAL DEFAULT 20 key.0
|
|
52: 080486f0 197 FUNC LOCAL DEFAULT 10 decode_hex_key
|
|
53: 00000000 0 FILE LOCAL DEFAULT ABS cast5.c
|
|
54: 0804cba0 1024 OBJECT LOCAL DEFAULT 12 s1
|
|
55: 0804cfa0 1024 OBJECT LOCAL DEFAULT 12 s2
|
|
56: 0804d3a0 1024 OBJECT LOCAL DEFAULT 12 s3
|
|
57: 0804d7a0 1024 OBJECT LOCAL DEFAULT 12 s4
|
|
58: 0804dba0 1024 OBJECT LOCAL DEFAULT 12 s5
|
|
59: 0804dfa0 1024 OBJECT LOCAL DEFAULT 12 s6
|
|
60: 0804e3a0 1024 OBJECT LOCAL DEFAULT 12 s7
|
|
61: 0804e7a0 1024 OBJECT LOCAL DEFAULT 12 sb8
|
|
62: 0804a3c0 3734 FUNC LOCAL DEFAULT 10 key_schedule
|
|
63: 0804b408 0 NOTYPE LOCAL DEFAULT 10 identity_decrypt
|
|
64: 08051bf0 0 NOTYPE LOCAL DEFAULT 20 r_decrypt
|
|
65: 08051be8 0 NOTYPE LOCAL DEFAULT 20 key
|
|
66: 08050bd4 0 NOTYPE LOCAL DEFAULT 20 lo_addr
|
|
67: 08050bd8 0 NOTYPE LOCAL DEFAULT 20 hi_addr
|
|
68: 08050bcc 0 NOTYPE LOCAL DEFAULT 20 traced_eip
|
|
69: 08050be0 0 NOTYPE LOCAL DEFAULT 20 end_esp
|
|
70: 08050bd0 0 NOTYPE LOCAL DEFAULT 20 traced_ctr
|
|
71: 0804b449 0 NOTYPE LOCAL DEFAULT 10 decryptloop
|
|
72: 08050bc8 0 NOTYPE LOCAL DEFAULT 20 traced_esp
|
|
73: 08051be4 0 NOTYPE LOCAL DEFAULT 20 stk_end
|
|
74: 0804b456 0 NOTYPE LOCAL DEFAULT 10 decryptloop_nocontext
|
|
75: 0804b476 0 NOTYPE LOCAL DEFAULT 10 .store_decrypt_ptr
|
|
76: 08051bec 0 NOTYPE LOCAL DEFAULT 20 decrypt
|
|
77: 0804fc35 0 NOTYPE LOCAL DEFAULT 13 insn
|
|
78: 08051bf4 0 NOTYPE LOCAL DEFAULT 20 disbuf
|
|
79: 08051be4 0 NOTYPE LOCAL DEFAULT 20 ilen
|
|
80: 080501f0 0 NOTYPE LOCAL DEFAULT 13 continue
|
|
81: 0804fdf0 0 NOTYPE LOCAL DEFAULT 13 control_table
|
|
82: 0804fc20 0 NOTYPE LOCAL DEFAULT 13 _unhandled
|
|
83: 0804fc21 0 NOTYPE LOCAL DEFAULT 13 _nonjump
|
|
84: 0804fc33 0 NOTYPE LOCAL DEFAULT 13 .execute
|
|
85: 0804fc55 0 NOTYPE LOCAL DEFAULT 13 _jcc_rel8
|
|
86: 0804fc5e 0 NOTYPE LOCAL DEFAULT 13 _jcc_rel32
|
|
87: 0804fc65 0 NOTYPE LOCAL DEFAULT 13 ._jcc_rel32_insn
|
|
88: 0804fc71 0 NOTYPE LOCAL DEFAULT 13 ._jcc_rel32_true
|
|
89: 0804fc6b 0 NOTYPE LOCAL DEFAULT 13 ._jcc_rel32_false
|
|
90: 0804fc72 0 NOTYPE LOCAL DEFAULT 13 rel_offset_fixup
|
|
91: 0804fc7d 0 NOTYPE LOCAL DEFAULT 13 _retn
|
|
92: 0804fca6 0 NOTYPE LOCAL DEFAULT 13 ._endtrace
|
|
93: 0804fcbe 0 NOTYPE LOCAL DEFAULT 13 _loopne
|
|
94: 0804fce0 0 NOTYPE LOCAL DEFAULT 13 ._loop_insn
|
|
95: 0804fcd7 0 NOTYPE LOCAL DEFAULT 13 ._doloop
|
|
96: 0804fcc7 0 NOTYPE LOCAL DEFAULT 13 _loope
|
|
97: 0804fcd0 0 NOTYPE LOCAL DEFAULT 13 _loop
|
|
98: 0804fcec 0 NOTYPE LOCAL DEFAULT 13 ._loop_insn_true
|
|
99: 0804fce2 0 NOTYPE LOCAL DEFAULT 13 ._loop_insn_false
|
|
100: 0804fcf6 0 NOTYPE LOCAL DEFAULT 13 _jcxz
|
|
101: 0804fd0a 0 NOTYPE LOCAL DEFAULT 13 _callrel
|
|
102: 0804fd0f 0 NOTYPE LOCAL DEFAULT 13 _call
|
|
103: 0804fd38 0 NOTYPE LOCAL DEFAULT 13 _jmp_rel8
|
|
104: 0804fd41 0 NOTYPE LOCAL DEFAULT 13 _jmp_rel32
|
|
105: 0804fd49 0 NOTYPE LOCAL DEFAULT 13 _grp5
|
|
106: 0804fda4 0 NOTYPE LOCAL DEFAULT 13 ._grp5_continue
|
|
107: 08050bdc 0 NOTYPE LOCAL DEFAULT 20 our_esp
|
|
108: 0804fdc9 0 NOTYPE LOCAL DEFAULT 13 ._grp5_call
|
|
109: 0804fdd0 0 NOTYPE LOCAL DEFAULT 13 _0xf
|
|
110: 08050be4 0 NOTYPE LOCAL DEFAULT 20 local_stk
|
|
111: 00000000 0 FILE LOCAL DEFAULT ABS xde.c
|
|
112: 0804b419 0 NOTYPE GLOBAL DEFAULT 10 crypt_exec
|
|
113: 08048484 57 FUNC GLOBAL DEFAULT UND printf
|
|
114: 08050aa4 0 OBJECT GLOBAL DEFAULT ABS _DYNAMIC
|
|
115: 08048494 0 FUNC GLOBAL DEFAULT UND memcpy
|
|
116: 0804b684 4662 FUNC GLOBAL DEFAULT 10 xde_disasm
|
|
117: 08050b98 4 OBJECT GLOBAL DEFAULT 20 __stderrp
|
|
118: 0804fc04 0 OBJECT GLOBAL HIDDEN 13 __dso_handle
|
|
119: 0804b504 384 FUNC GLOBAL DEFAULT 10 reg2xset
|
|
120: 08048468 0 FUNC GLOBAL DEFAULT 8 _init
|
|
121: 0804c8bc 364 FUNC GLOBAL DEFAULT 10 xde_asm
|
|
122: 08051c74 4 OBJECT GLOBAL DEFAULT 20 environ
|
|
123: 080484a4 52 FUNC GLOBAL DEFAULT UND fprintf
|
|
124: 00000000 0 NOTYPE WEAK DEFAULT UND __deregister_frame..
|
|
125: 0804fc00 4 OBJECT GLOBAL DEFAULT 13 __progname
|
|
126: 08048520 141 FUNC GLOBAL DEFAULT 10 _start
|
|
127: 0804b258 431 FUNC GLOBAL DEFAULT 10 cast5_setkey
|
|
128: 080484b4 172 FUNC GLOBAL DEFAULT UND sscanf
|
|
129: 08050b98 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
|
|
130: 080484c4 0 FUNC GLOBAL DEFAULT UND memset
|
|
131: 080487c0 318 FUNC GLOBAL DEFAULT 10 main
|
|
132: 0804ca64 0 FUNC GLOBAL DEFAULT 11 _fini
|
|
133: 080484d4 337 FUNC GLOBAL DEFAULT UND atexit
|
|
134: 080484e4 121 FUNC GLOBAL DEFAULT UND scanf
|
|
135: 08050200 2208 OBJECT GLOBAL DEFAULT 13 xde_table
|
|
136: 08050b98 0 NOTYPE GLOBAL DEFAULT ABS _edata
|
|
137: 08050b68 0 OBJECT GLOBAL DEFAULT ABS _GLOBAL_OFFSET_TABLE_
|
|
138: 08051c78 0 NOTYPE GLOBAL DEFAULT ABS _end
|
|
139: 08049660 3421 FUNC GLOBAL DEFAULT 10 cast5_decrypt
|
|
140: 080484f4 101 FUNC GLOBAL DEFAULT UND exit
|
|
141: 08048900 3421 FUNC GLOBAL DEFAULT 10 cast5_encrypt
|
|
142: 08048504 0 FUNC GLOBAL DEFAULT UND strlen
|
|
143: 00000000 0 NOTYPE WEAK DEFAULT UND _Jv_RegisterClasses
|
|
144: 00000000 0 NOTYPE WEAK DEFAULT UND __register_frame_info
|
|
|
|
We see that function f1 has address 0x8048660 and size 75 = 0x4B.
Function f2 has address 0x80486B0 and size 58 = 0x3A. A simple calculation
shows that they are in fact consecutive in memory, so we don't have to
encrypt them separately but can encrypt them as a single block ranging
from 0x8048660 to 0x80486F0.
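
The "simple calculation" is rounding each function's end address up to the
16-byte alignment requested with -falign-functions=16. A small sketch, with
align_up as a hypothetical helper name:

```c
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}
```

With the numbers above, align_up(0x8048660 + 75, 16) gives 0x80486B0, the
start of f2, and align_up(0x80486B0 + 58, 16) gives 0x80486F0, the end of
the block to encrypt.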
|
|
|
|
$ readelf -l test2
|
|
|
|
Elf file type is EXEC (Executable file)
|
|
Entry point 0x8048520
|
|
There are 6 program headers, starting at offset 52
|
|
|
|
Program Headers:
  Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR     0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
  INTERP   0x0000f4 0x080480f4 0x080480f4 0x00019 0x00019 R   0x1
      [Requesting program interpreter: /usr/libexec/ld-elf.so.1]
  LOAD     0x000000 0x08048000 0x08048000 0x06bed 0x06bed R E 0x1000
  LOAD     0x006c00 0x0804fc00 0x0804fc00 0x00f98 0x02078 RW  0x1000
  DYNAMIC  0x007aa4 0x08050aa4 0x08050aa4 0x000b0 0x000b0 RW  0x4
  NOTE     0x000110 0x08048110 0x08048110 0x00018 0x00018 R   0x4
|
|
|
|
Section to Segment mapping:
|
|
Segment Sections...
|
|
00
|
|
01 .interp
|
|
02 .interp .note.ABI-tag .hash .dynsym .dynstr .rel.dyn .rel.plt
|
|
.init .plt .text .fini .rodata
|
|
03 .data .eh_frame .dynamic .ctors .dtors .jcr .got .bss
|
|
04 .dynamic
|
|
05 .note.ABI-tag
|
|
|
|
From this we see that both addresses (0x8048660 and 0x80486F0) fall into
the first LOAD segment, which is loaded at VirtAddr 0x8048000 and is placed
at offset 0 in the file. Therefore, to map a virtual address to a file
offset we simply subtract 0x8048000 from each address, giving 0x660 = 1632
and 0x6F0 = 1776.
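
The same mapping can be written as a small C routine that works for any
LOAD segment, not only one placed at file offset 0. The struct and function
names are hypothetical. The values used in the note after the code are
taken from the readelf -l output above.

```c
#include <stdint.h>

/* The three LOAD segment fields needed for the mapping. */
struct load_segment {
    uint32_t offset;   /* Offset column: position in the file */
    uint32_t vaddr;    /* VirtAddr column */
    uint32_t filesz;   /* FileSiz column */
};

/* Map a virtual address to a file offset. Returns 0 and stores the offset
 * in *off on success, or -1 if no segment's file image contains vaddr. */
int vaddr_to_offset(const struct load_segment *segs, int nsegs,
                    uint32_t vaddr, uint32_t *off)
{
    int i;

    for (i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].vaddr &&
            vaddr - segs[i].vaddr < segs[i].filesz) {
            *off = segs[i].offset + (vaddr - segs[i].vaddr);
            return 0;
        }
    }
    return -1;
}
```

Fed with the two LOAD segments above, 0x8048660 maps to 1632 and 0x80486F0
maps to 1776, matching the manual subtraction.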
|
|
|
|
If you obtain ELFsh [7] then you can make your life much easier. The
|
|
following transcript shows how ELFsh can be used to obtain the same
|
|
information:
|
|
|
|
$ elfsh
|
|
|
|
Welcome to The ELF shell 0.51b3 .::.
|
|
|
|
.::. This software is under the General Public License
|
|
.::. Please visit http://www.gnu.org to know about Free Software
|
|
|
|
[ELFsh-0.51b3]$ load test2
|
|
|
|
[*] New object test2 loaded on Mon Jun 13 20:45:33 2005
|
|
|
|
[ELFsh-0.51b3]$ sym f1
|
|
|
|
[SYMBOL TABLE]
|
|
[Object test2]
|
|
|
|
[059] 0x8048680 FUNCTION f1
|
|
size:0000000075 foffset:001632 scope:Local sctndx:10 => .text + 304
|
|
|
|
[ELFsh-0.51b3]$ sym f2
|
|
|
|
[SYMBOL TABLE]
|
|
[Object test2]
|
|
|
|
[060] 0x80486d0 FUNCTION f2
|
|
size:0000000058 foffset:001776 scope:Local sctndx:10 => .text + 384
|
|
|
|
[ELFsh-0.51b3]$ exit
|
|
|
|
[*] Unloading object 1 (test2) *
|
|
|
|
Good bye ! .::. The ELF shell 0.51b3
|
|
|
|
The field foffset gives the symbol offset within the executable, while
|
|
size is its size. Here all the numbers are decimal.
|
|
|
|
Now we are ready to encrypt a part of the executable with a very
|
|
'imaginative' password and then test the program:
|
|
|
|
$ echo -n "password" | openssl md5
|
|
5f4dcc3b5aa765d61d8327deb882cf99
|
|
$ ./cryptfile -e test2 5f4dcc3b5aa765d61d8327deb882cf99 1632 1776
|
|
$ chmod +x test2.crypt
|
|
$ ./test2.crypt
|
|
|
|
At the prompt enter the same hex string and then enter numbers 12 and
|
|
34 for a and b. The result must be 1662, and esp before and after must be
|
|
the same.
|
|
|
|
Once you are sure that the program works correctly, you can strip(1)
|
|
symbols from it.
|
|
|
|
|
|
----[ 3.4 - XDE bug
|
|
|
|
|
|
During development I found a bug in the XDE disassembler
engine: it didn't correctly handle the LOCK (0xF0) prefix. Because of the
bug, XDE claimed that 0xF0 is a single-byte instruction. This is the
patch needed to correct the disassembler:
|
|
|
|
--- xde.c Sun Apr 11 02:52:30 2004
|
|
+++ xde_new.c Mon Aug 23 08:49:00 2004
|
|
@@ -101,6 +101,8 @@
|
|
if (c == 0xF0)
|
|
{
|
|
if (diza->p_lock != 0) flag |= C_BAD; /* twice */
|
|
+ diza->p_lock = c;
|
|
+ continue;
|
|
}
|
|
|
|
break;
|
|
|
|
I also needed to remove __cdecl on functions, a 'feature' of Win32 C
|
|
compilers not needed on UNIX platforms.
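
The point of the patch is that a prefix byte must be recorded and the scan
must continue with the next byte, instead of falling out of the prefix
loop. The following sketch shows that general shape. It is NOT XDE code:
the function names are hypothetical and it only counts prefix bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if c is one of the x86 legacy prefix bytes. */
int is_legacy_prefix(uint8_t c)
{
    switch (c) {
    case 0xF0:                          /* LOCK */
    case 0xF2: case 0xF3:               /* REPNE, REP/REPE */
    case 0x2E: case 0x36: case 0x3E:    /* CS, SS, DS segment overrides */
    case 0x26: case 0x64: case 0x65:    /* ES, FS, GS segment overrides */
    case 0x66: case 0x67:               /* operand-size, address-size */
        return 1;
    default:
        return 0;
    }
}

/* Number of prefix bytes in front of the opcode. Each prefix is skipped
 * and scanning continues; it is never treated as an instruction itself. */
size_t count_prefixes(const uint8_t *insn, size_t len)
{
    size_t n = 0;

    while (n < len && is_legacy_prefix(insn[n]))
        n++;
    return n;
}
```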
|
|
|
|
|
|
----[ 3.5 - Limitations
|
|
|
|
|
|
o The XDE engine (probably) can't handle newer instructions (SSE, MMX,
etc.). It certainly can't handle 3DNow! instructions because they begin
with 0x0F 0x0F, a byte sequence that XDE reports as an invalid instruction
encoding.
|
|
|
|
o The tracer shares the same memory with the traced program. If the traced
|
|
program is so badly broken that it writes to (random) memory it doesn't
|
|
own, it can stumble upon and overwrite portions of the tracing routine.
|
|
|
|
o Each form of tracing has its own speed impacts. I didn't measure how
|
|
much this method slows down program execution (especially compared to
|
|
ptrace()).
|
|
|
|
o It doesn't even handle all 386 instructions (most notably far calls/jumps
and RET imm16). In this case the tracer stops with HLT, which should
cause a GPF under any OS that runs user processes in rings other than 0.
|
|
|
|
o The block size of 8 bytes is hardcoded in many places in the program.
|
|
The source (both C and ASM) should be parametrized by some kind of
|
|
BLOCKSIZE #define.
|
|
|
|
o The tracing routine is not reentrant! This means that any code being
executed by crypt_exec can't call crypt_exec again because it would
overwrite its own context!
|
|
|
|
o The code itself isn't optimal:
|
|
- identity_decrypt could use 4-byte moves.
|
|
- More registers could be used to minimize memory references.
|
|
|
|
|
|
----[ 3.6 - Porting considerations
|
|
|
|
|
|
This is as heavy as it gets - there isn't a single piece of
machine-independent code in the main routine that could be used on another
processor architecture. I believe that porting shouldn't be too difficult;
it is mostly a matter of rewriting the mechanics of the current program.
Some points to watch out for include:
|
|
|
|
o Be sure to handle all control flow instructions.
|
|
|
|
o Move instructions could affect processor flags.
|
|
|
|
o Write a disassembly routine. Most RISC architectures have regular
|
|
instruction set and should be far easier to disassemble than x86 code.
|
|
|
|
o This is self-modifying code: flushing the instruction prefetch queue
|
|
might be needed.
|
|
|
|
o Handle delayed jumps and loads if the architecture provides them. This
|
|
could be tricky.
|
|
|
|
o You might need to get around page protections before calling the
|
|
decryptor (non-executable data segments).
|
|
|
|
Due to unavailability of non-x86 hardware I wasn't able to implement
|
|
the decryptor on another processor.
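
Two of the points above (page protections and flushing stale instructions)
can be sketched in C for a POSIX system. This is a hedged illustration, not
part of the published code: the names page_floor, page_span and
prepare_code_buffer are hypothetical, __builtin___clear_cache is a GCC/Clang
builtin, and a hardened kernel (PaX, for example) may refuse the
writable-and-executable mapping altogether.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of its page (pagesize is a power of 2). */
static uintptr_t page_floor(uintptr_t addr, uintptr_t pagesize)
{
    assert((pagesize & (pagesize - 1)) == 0);
    return addr & ~(pagesize - 1);
}

/* Size of the whole-page region that covers [addr, addr + len). */
static size_t page_span(uintptr_t addr, size_t len, uintptr_t pagesize)
{
    uintptr_t start = page_floor(addr, pagesize);
    uintptr_t end = page_floor(addr + len + pagesize - 1, pagesize);
    return (size_t)(end - start);
}

/*
 * Make the buffer that holds the decrypted instruction readable, writable
 * and executable, then discard stale instruction-cache contents for it.
 * Returns 0 on success and -1 on failure.
 */
static int prepare_code_buffer(void *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    uintptr_t start;

    if (ps <= 0 || len == 0)
        return -1;
    start = page_floor((uintptr_t)buf, (uintptr_t)ps);
    if (mprotect((void *)start,
                 page_span((uintptr_t)buf, len, (uintptr_t)ps),
                 PROT_READ | PROT_WRITE | PROT_EXEC) != 0)
        return -1;
    /* Needed on architectures whose instruction cache is not coherent
       with data writes. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return 0;
}
```

A port would call something like prepare_code_buffer once for the
instruction buffer at startup, and repeat only the cache flush after each
newly decrypted instruction is written.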
|
|
|
|
|
|
--[ 4 - Further ideas
|
|
|
|
|
|
o Better encryption scheme. ECB mode is bad, especially with a small
block size of 8 bytes. A possible alternative is the following:
|
|
|
|
1. Round the traced_eip down to a multiple of 8 bytes.
|
|
2. Encrypt the result with the key.
|
|
3. Xor the result with the instruction bytes.
|
|
|
|
That way the encryption depends on the location in memory. Decryption
works the same way. However, it would complicate the cryptfile.c program.
|
|
|
|
o Encrypted data. Devise a transparent (for the C programmer) way to
access the encrypted data. At least two approaches come to mind:
1) play with page mappings and handle read/write faults, or
2) use XDE to decode all accesses to memory and perform encryption
or decryption, depending on the type of access (read or write).
The first approach seems too slow (many context switches per data
read) to be practical.
|
|
|
|
o New instruction sets and architectures. Expand XDE to handle new x86
instructions. Port the routine to architectures other than i386 (AMD64
comes to mind first, then ARM, SPARC...).
|
|
|
|
o Perform decryption on a smart card. This is slow, but there is no
danger of key compromise.
|
|
|
|
o Polymorphic decryption engine.
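
The position-dependent scheme from the first item above (round the address
down, encrypt it, xor with the instruction bytes) can be sketched in C as
shown below. The sketch makes several assumptions: XTEA stands in for CAST5
only because it fits in a few lines, the function names are hypothetical,
and the keystream byte order follows the host, so the encrypting utility
and the tracer would have to run with the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCKSIZE 8

/* XTEA encryption of one 64-bit block with a 128-bit key (32 rounds).
   Stand-in for CAST5; any 64-bit block cipher would do. */
static void xtea_encrypt(const uint32_t key[4], uint32_t v[2])
{
    uint32_t v0 = v[0], v1 = v[1], sum = 0;
    const uint32_t delta = 0x9E3779B9u;
    int i;

    for (i = 0; i < 32; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += delta;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/*
 * Encrypt or decrypt (the operation is its own inverse) the block that
 * contains address addr. dst and src may point to the same buffer.
 */
static void pos_crypt_block(const uint32_t key[4], uint32_t addr,
                            unsigned char *dst, const unsigned char *src)
{
    uint32_t ctr[2];
    unsigned char ks[BLOCKSIZE];
    int i;

    assert(sizeof ctr == BLOCKSIZE);
    ctr[0] = addr & ~(uint32_t)(BLOCKSIZE - 1); /* 1. round eip down     */
    ctr[1] = 0;
    xtea_encrypt(key, ctr);                     /* 2. encrypt the result */
    memcpy(ks, ctr, BLOCKSIZE);
    for (i = 0; i < BLOCKSIZE; i++)             /* 3. xor with the bytes */
        dst[i] = src[i] ^ ks[i];
}
```

Because the keystream depends on the block address, identical instruction
bytes at different addresses encrypt differently, which is exactly what ECB
fails to provide. The price is the one noted above: cryptfile.c would have
to know the run-time address of each block, not just its file offset.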
|
|
|
|
|
|
--[ 5 - Related work
|
|
|
|
|
|
This section gives a brief overview of existing work, either because of
|
|
similarity in coding techniques (ELFsh and tracing without ptrace) or
|
|
because of the code protection aspect.
|
|
|
|
|
|
5.1 ELFsh
|
|
---------
|
|
|
|
See the ELFsh crew's article on elfsh and e2dbg [7], also in this Phrack
issue. A common point in our work is the approach to program tracing
without using ptrace(2). Their latest work is a scriptable embedded ELF
debugger, e2dbg. They also get around PaX protections, an issue I
didn't even take into account.
|
|
|
|
|
|
5.2 Shiva
|
|
---------
|
|
|
|
The Shiva binary encryptor [8] was released in binary-only form. It tries
really hard to prevent reverse engineering by including features such as
trap flag detection, ptrace() defense, demand-mapped blocks (so that a
fully decrypted image can't be dumped via /proc), using int3 to emulate
some instructions, and encryption in layers. The second, password-
protected layer is optional and encrypted using 128-bit AES. Layer 3
encryption uses TEA, the Tiny Encryption Algorithm.
|
|
|
|
According to the analysis in [9], "for sufficiently large programs, no
more than 1/3 of the program will be decrypted at any given time". This
is a MUCH larger amount of decrypted program text than in my case: 24
bytes, independent of any external factors. Also, Shiva is heavily
tied to the ELF format, while my method is not tied to any operating
system or executable format (although the current code IS limited to
the 32-bit x86 architecture).
|
|
|
|
|
|
5.3 Burneye
|
|
-----------
|
|
|
|
There are actually two tools released by team-teso: burneye and burneye2
|
|
(objobf) [10].
|
|
|
|
Burneye is a powerful binary encryption tool. Similarly to Shiva, it has
|
|
three layers: 1) obfuscation, 2) password-based encryption using RC4 and
|
|
SHA1 (for generating the key from passphrase), and 3) the fingerprinting
|
|
layer.
|
|
|
|
The fingerprinting layer is the most interesting one: data about the
target system is collected (e.g. the amount of memory) and made into
a 'fingerprint'. The executable is encrypted taking the fingerprint into
account so that the resulting binary can be run only on the host with the
given fingerprint. There are two fingerprinting options:
|
|
|
|
o Tolerance: a fingerprint tolerance can be specified so that small
deviations are allowed. That way, for example, the memory can be
upgraded on the target system and the executable will still work. If
the number of differences in the fingerprint is too large, the program
won't work.
|
|
|
|
o Seal: the program produced with this option will run on any system.
However, the first time it is run, it creates a fingerprint of the
host and 'seals' itself to that host. The original seal binary is
securely deleted afterwards.
|
|
|
|
The encrypted binary can also be made to delete itself when a certain
|
|
environment variable is set during the program execution.
|
|
|
|
objobf is just a relocatable object obfuscator. There is no encryption
layer. The input is an ordinary relocatable object and the output is
transformed, obfuscated, and functionally equivalent code. Code
transformations include: inserting junk instructions, randomizing the
order of basic blocks, and splitting basic blocks at random points.
|
|
|
|
|
|
5.4 Conclusion
|
|
--------------
|
|
|
|
Highlights of the distinguishing features of the code encryption
|
|
technique presented here:
|
|
|
|
o Very small amount of plaintext code in memory at any time - only 24
|
|
bytes. Other tools leave much more plain-text code in memory.
|
|
|
|
o No special loaders or executable format manipulations are needed. There
|
|
is one simple utility that encrypts the existing code in-place. It is
|
|
executable format-independent since its arguments are function offsets
|
|
within the executable (which map to function addresses in runtime).
|
|
|
|
o The code is tied to the 32-bit x86 architecture; however, it should be
portable without changes to any operating system running on x86-32.
Special arrangements for setting up page protections may be necessary
if PaX or NX is in effect.
|
|
|
|
On the downside, the current version of the engine is very vulnerable
to reverse engineering. It can be easily recognized by scanning for fixed
sequences of instructions (the decryption routine). Once the decryptor is
located, it is easy to monitor a few fixed memory addresses to obtain both
the EIP and the original instruction residing at that EIP. The key
material is also easy to obtain, but this is the case in ANY approach
using in-memory keys.
|
|
|
|
However, the decryptor in its current form has one advantage: since it is
|
|
ordinary code that does no special tricks, it should be easy to combine
|
|
it with a tool that is more resilient to reverse-engineering, like Shiva
|
|
or Burneye.
|
|
|
|
|
|
--[ 6 - References
|
|
|
|
|
|
1. Phrack magazine.
|
|
http://www.phrack.org
|
|
|
|
2. ptrace tutorials:
|
|
http://linuxgazette.net/issue81/sandeep.html
|
|
http://linuxgazette.net/issue83/sandeep.html
|
|
http://linuxgazette.net/issue85/sandeep.html
|
|
|
|
3. D. E. Knuth: The Art of Computer Programming, vol.1: Fundamental
|
|
Algorithms.
|
|
|
|
4. Fenris.
|
|
http://lcamtuf.coredump.cx/fenris/whatis.shtml
|
|
|
|
5. XDE.
|
|
http://z0mbie.host.sk
|
|
|
|
6. Source code for described programs. The source I have written is
|
|
released under MIT license. Other files have different licenses. The
|
|
archive also contains a patched version of XDE.
|
|
http://www.core-dump.com.hr/software/cryptexec.tar.gz
|
|
|
|
7. ELFsh, the ELF shell. A powerful program for manipulating ELF files.
|
|
http://elfsh.devhell.org
|
|
|
|
8. Shiva binary encryptor.
|
|
http://www.securereality.com.au
|
|
|
|
9. Reverse Engineering Shiva.
|
|
http://blackhat.com/presentations/bh-federal-03/bh-federal-03-eagle/
|
|
bh-fed-03-eagle.pdf
|
|
|
|
10. Burneye and Burneye2 (objobf).
|
|
http://packetstormsecurity.org/groups/teso/indexsize.html
|
|
|
|
|
|
--[ 7 - Credits
|
|
|
|
Thanks go to mayhem, who reviewed this article. His suggestions were
very helpful, making the text much more mature than the original.
|
|
|
|
|
|
--[ A - Appendix: Source code
|
|
Here I'm providing only my own source code. The complete source package
|
|
can be obtained from [6]. It includes:
|
|
|
|
o All source listed here,
|
|
o the patched XDE disassembler, and
|
|
o the source of the CAST5 cryptographic algorithm.
|
|
|
|
|
|
----[ A.1 - The tracer source: crypt_exec.S
|
|
|
|
|
|
/*
|
|
Copyright (c) 2004 Zeljko Vrba
|
|
|
|
Permission is hereby granted, free of charge, to any person obtaining
|
|
a copy of this software and associated documentation files (the
|
|
"Software"), to deal in the Software without restriction, including
|
|
without limitation the rights to use, copy, modify, merge, publish,
|
|
distribute, sublicense, and/or sell copies of the Software, and to permit
|
|
persons to whom the Software is furnished to do so, subject to the
|
|
following conditions:
|
|
|
|
The above copyright notice and this permission notice shall be included
|
|
in all copies or substantial portions of the Software.
|
|
|
|
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
|
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
|
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
|
|
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
|
|
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
|
|
OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
|
|
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
*/
|
|
|
|
.text
|
|
|
|
/************************************************************************
|
|
* void *crypt_exec(
|
|
* decrypt_fn_ptr dfn, const void *key,
|
|
* const void *lo_addr, const void *hi_addr,
|
|
* const void *addr, ...)
|
|
* typedef void (*decrypt_fn_ptr)(
|
|
* void *key, unsigned char *dst, const unsigned char *src);
|
|
*
|
|
* - dfn is a pointer to the decryption function
* - key is a pointer to the crypt routine key data
* - lo_addr and hi_addr bound the encrypted address range; code outside
* this range is traced without decryption
* - addr is the address where execution should begin. Due to the way the
* code is decrypted and executed, it MUST be aligned to 8 (BLOCKSIZE)
* bytes!!
|
|
* - the rest are arguments to called function
|
|
*
|
|
* The crypt_exec stops when the stack pointer becomes equal to what it
|
|
* was on entry, and executing 'ret' would cause the called function to
|
|
* exit. This works assuming normal C compiled code.
|
|
*
|
|
* Returns the value the function would normally return.
|
|
*
|
|
* This code calls:
|
|
* int xde_disasm(unsigned char *ip, struct xde_instr *outbuf);
|
|
* XDE disassembler engine is compiled and used with PACKED structure!
|
|
*
|
|
* It is assumed that the encryption algorithm uses 64-bit block size.
|
|
* Very good protection could be done if decryption is executed on the
|
|
* SMART CARD.
|
|
*
|
|
* Some terminology:
|
|
* 'Traced' refers to the original program being executed instruction by
|
|
* instruction. The technique used resembles Knuth's tracing routine (and
|
|
* indeed, we get true tracing when decryption is dropped).
|
|
*
|
|
* 'Our' refers to our data stack, etc.
|
|
*
|
|
* TODOs and limitations:
|
|
* - some instructions are not emulated (FAR CALL/JMP/RET, RET NEAR imm16)
|
|
* - LOOP* and JCXZ opcodes haven't been tested
|
|
* - _jcc_rel32 has been tested only indirectly by _jcc_rel8
|
|
***********************************************************************/
|
|
|
|
/*
|
|
Offsets into xde_instr struct.
|
|
*/
|
|
#define OPCODE 23
|
|
#define OPCODE2 24
|
|
#define MODRM 25
|
|
|
|
/*
|
|
Set up our stack and save traced context. The context is saved at the end
|
|
of our stack.
|
|
*/
|
|
#define SAVE_TRACED_CONTEXT \
|
|
movl %esp, traced_esp ;\
|
|
movl $stk_end, %esp ;\
|
|
pusha ;\
|
|
pushf
|
|
|
|
/*
|
|
Restore traced context from the current top of stack. After that restores
|
|
traced stack pointer.
|
|
*/
|
|
#define RESTORE_TRACED_CONTEXT \
|
|
popf ;\
|
|
popa ;\
|
|
movl traced_esp, %esp
|
|
|
|
/*
|
|
Identity decryption routine. This just copies 8 bytes (BLOCKSIZE) from
|
|
source to destination. Has normal C calling convention. Is not global.
|
|
*/
|
|
identity_decrypt:
|
|
movl 8(%esp), %edi /* destination address */
|
|
movl 12(%esp), %esi /* source address */
|
|
movl $8, %ecx /* 8 bytes */
|
|
cld
|
|
rep movsb
|
|
ret
|
|
|
|
crypt_exec:
|
|
.globl crypt_exec
|
|
.extern xde_disasm
|
|
|
|
/*
|
|
Fetch all arguments. We are called from C and not expected to save
|
|
registers. This is the stack on entry:
|
|
[ ret_addr dfn key lo_addr hi_addr addr ...args ]
|
|
*/
|
|
popl %eax /* return address */
|
|
popl r_decrypt /* real decryption function pointer */
|
|
popl key /* encryption key */
|
|
popl lo_addr /* low traced eip */
|
|
popl hi_addr /* high traced eip */
|
|
popl traced_eip /* eip to start tracing */
|
|
pushl %eax /* put return addr to stack again */
|
|
|
|
/*
|
|
Now the stack frame looks as if the inner function (starting at
traced_eip) had been called by the normal C calling convention (the
vararg arguments follow the return address).
|
|
*/
|
|
movl %esp, end_esp /* this is used to stop tracing. */
|
|
movl $0, traced_ctr /* reset counter of insns to 0 */
|
|
|
|
decryptloop:
|
|
/*
|
|
This loop traces a single instruction.
|
|
|
|
The CONTEXT at the start of each iteration:
|
|
traced_eip: points to the next instruction in traced program
|
|
|
|
The first thing we do is switch to our own stack and store the traced
program's registers, including eflags.
|
|
|
|
Instructions are encrypted in ECB mode in blocks of 8 bytes.
Therefore, we must always start decryption at the lower 8-byte
boundary. A total of three blocks (24 bytes) is decrypted for one
instruction. This is due to alignment and maximum instruction length
constraints: an instruction that begins at an address congruent to
7 mod 8 and has the maximum length of 16 bytes (given some slack)
spans three blocks.

Yeah, I know ECB sucks, but this is currently just a proof of
concept. Design something better for yourself if you need it.
|
|
*/
|
|
SAVE_TRACED_CONTEXT
|
|
|
|
decryptloop_nocontext:
|
|
/*
|
|
This loop entry point does not save the traced context. It is used
from control transfer instruction emulation, where we do all the work
ourselves and don't use the traced context.
|
|
|
|
The CONTEXT upon entry is the same as for decryptloop.
|
|
|
|
First decide whether to decrypt or just trace the plaintext code.
|
|
*/
|
|
movl traced_eip, %eax
|
|
movl $identity_decrypt, %ebx /* assume no decryption */
|
|
cmpl lo_addr, %eax
|
|
jb .store_decrypt_ptr /* traced_eip < lo_addr */
|
|
cmpl hi_addr, %eax
|
|
ja .store_decrypt_ptr /* traced_eip > hi_addr */
|
|
movl r_decrypt, %ebx /* in bounds, do decryption */
|
|
.store_decrypt_ptr:
|
|
movl %ebx, decrypt
|
|
|
|
/*
|
|
Decrypt three blocks starting at eax, reusing arguments on the stack
|
|
for the total of 3 calls. WARNING! For this to work properly, the
|
|
decryption function MUST NOT modify its arguments!
|
|
*/
|
|
andl $-8, %eax /* round down traced_eip to 8 bytes */
|
|
pushl %eax /* src buffer */
|
|
pushl $insn /* dst buffer */
|
|
pushl key /* key data pointer */
|
|
call *decrypt /* 1st block */
|
|
addl $8, 4(%esp) /* advance dst */
|
|
addl $8, 8(%esp) /* advance src */
|
|
call *decrypt /* 2nd block */
|
|
addl $8, 4(%esp) /* advance dst */
|
|
addl $8, 8(%esp) /* advance src */
|
|
call *decrypt /* 3rd block */
|
|
addl $12, %esp /* clear args from stack */
|
|
|
|
/*
|
|
Obtain the real start of instruction in the decrypted buffer. The
|
|
traced eip is taken modulo blocksize (8) and added to the start
|
|
address of decrypted buffer. Then XDE is called (standard C calling
|
|
convention) to get necessary information about the instruction.
|
|
*/
|
|
movl traced_eip, %eax
|
|
andl $7, %eax /* traced_eip mod 8 */
|
|
addl $insn, %eax /* offset within decrypted buffer */
|
|
pushl $disbuf /* address to disassemble into */
|
|
pushl %eax /* insn offset to disassemble */
|
|
call xde_disasm /* disassemble and return len */
|
|
movl %eax, ilen /* store instruction length */
|
|
popl %eax /* decrypted insn start */
|
|
popl %ebx /* clear remaining arg from stack */
|
|
|
|
/*
|
|
Calculate the offset in control table of the instruction handling
|
|
routine. Non-control transfer instructions are just executed in
|
|
traced context, other instructions are emulated.
|
|
|
|
Before executing the instruction, the traced eip is advanced by
|
|
instruction length, and the number of executed instructions is
|
|
incremented. We also append indirect 'jmp *continue' after the
|
|
instruction, to continue execution at appropriate place in our
|
|
tracing. The JMP indirect opcodes are 0xFF 0x25.
|
|
*/
|
|
movl ilen, %ebx
|
|
addl %ebx, traced_eip /* advance traced eip */
|
|
incl traced_ctr /* increment counter */
|
|
movw $0x25FF, (%eax, %ebx) /* JMP indirect; little-endian! */
|
|
movl $continue, 2(%eax, %ebx) /* store address */
|
|
movzbl OPCODE+disbuf, %esi /* load instruction byte */
|
|
jmp *control_table(,%esi,4) /* execute by appropriate handler */
|
|
|
|
.data
|
|
/*
|
|
Emulation routines start here. They are in data segment because code
|
|
segment isn't writable and we are modifying our own code. We don't
|
|
want yet to mess around with mprotect(). One day (non-exec page table
|
|
support on x86-64) it will have to be done anyway..
|
|
|
|
The CONTEXT upon entry on each emulation routine:
|
|
eax : start of decrypted (CURRENT) insn addr to execute
|
|
ilen : instruction length in bytes
|
|
stack top -> [traced: eflags edi esi ebp esp ebx edx ecx eax]
|
|
traced_esp : original program's esp
|
|
traced_eip : eip of next insn to execute (NOT of CURRENT insn!)
|
|
*/
|
|
|
|
_unhandled:
|
|
/*
|
|
Unhandled opcodes not normally generated by compiler. Once proper
|
|
emulation routine is written, they become handled :)
|
|
|
|
Executing privileged instruction, such as HLT, is the easiest way to
|
|
terminate the program. %eax holds the address of the instruction we
|
|
were trying to trace so it can be observed from debugger.
|
|
*/
|
|
hlt
|
|
|
|
_nonjump:
|
|
/*
|
|
Common emulation for all non-control transfer instructions.
|
|
Instruction buffer (insn) is already filled with decrypted blocks.
|
|
|
|
Decrypted instruction can begin in the middle of insn buffer, so the
|
|
relative jmp instruction is adjusted to jump to the traced insn,
|
|
skipping 'junk' at the beginning of insn.
|
|
|
|
When the instruction is executed, our execution continues at the
location 'continue' points to. Normally, this is decryptloop, but
occasionally it is temporarily changed (e.g. in _grp5).
|
|
*/
|
|
subl $insn, %eax /* insn begin within insn buffer */
|
|
movb %al, .execute+1 /* update jmp instruction */
|
|
RESTORE_TRACED_CONTEXT
|
|
.execute:
|
|
jmp insn /* relative, only offset adjusted */
|
|
insn:
|
|
.fill 32, 1, 0x90
|
|
|
|
_jcc_rel8:
|
|
/*
|
|
Relative 8-bit displacement conditional jump. It is handled by
|
|
relative 32-bit displacement jump, once offset is adjusted. Opcode
|
|
must also be adjusted: short jumps are 0x70-0x7F, long jumps are 0x0F
|
|
0x80-0x8F. (conditions correspond directly). Converting short to long
|
|
jump needs adding 0x10 to 2nd opcode.
|
|
*/
|
|
movsbl 1(%eax), %ebx /* load sign-extended offset */
|
|
movb (%eax), %cl /* load instruction */
|
|
addb $0x10, %cl /* adjust opcode to long form */
|
|
/* drop processing to _jcc_rel32 as 32-bit displacement */
|
|
|
|
_jcc_rel32:
|
|
/*
|
|
Emulate 32-bit conditional relative jump. We pop the traced flags,
|
|
let the Jcc instruction execute natively, and then adjust traced eip
|
|
ourselves, depending whether Jcc was taken or not.
|
|
|
|
CONTEXT:
|
|
ebx: jump offset, sign-extended to 32 bits
|
|
cl : real 2nd opcode of the instruction (1st is 0x0F escape)
|
|
*/
|
|
movb %cl, ._jcc_rel32_insn+1 /* store opcode to instruction */
|
|
popf /* restore traced flags */
|
|
|
|
._jcc_rel32_insn:
|
|
/*
|
|
Explicit coding of 32-bit relative conditional jump. It is executed
|
|
with the traced flags. Also the jump offset (32 bit) is supplied.
|
|
*/
|
|
.byte 0x0F, 0x80
|
|
.long ._jcc_rel32_true - ._jcc_rel32_false
|
|
|
|
._jcc_rel32_false:
|
|
/*
|
|
The Jcc condition was false. Just save traced flags and continue to
|
|
next instruction.
|
|
*/
|
|
pushf
|
|
jmp decryptloop_nocontext
|
|
|
|
._jcc_rel32_true:
|
|
/*
|
|
The Jcc condition was true. Traced flags are saved, and then the
|
|
execution falls through to the common eip offset-adjusting routine.
|
|
*/
|
|
pushf
|
|
|
|
rel_offset_fixup:
|
|
/*
|
|
Common entry point to fix up traced eip for relative control-flow
|
|
instructions.
|
|
|
|
CONTEXT:
|
|
traced_eip: already advanced to the would-be next instruction. this
|
|
is done in decrypt_loop before transferring control to
|
|
any insn-handler.
|
|
ebx : sign-extended 32-bit offset to add to eip
|
|
*/
|
|
addl %ebx, traced_eip
|
|
jmp decryptloop_nocontext
|
|
|
|
_retn:
|
|
/*
|
|
Near return (without imm16). This is the place where the end-of-trace
condition is checked. If, at this point, esp equals end_esp, this
means that crypt_exec would return to its caller.
|
|
*/
|
|
movl traced_esp, %ebp /* compare curr traced esp to esp */
|
|
cmpl %ebp, end_esp /* when crypt_exec caller's return */
|
|
je ._endtrace /* address was on top of the stack */
|
|
|
|
/*
|
|
Not equal, emulate ret.
|
|
*/
|
|
movl %esp, %ebp /* save our current stack */
|
|
movl traced_esp, %esp /* get traced stack */
|
|
popl traced_eip /* pop return address */
|
|
movl %esp, traced_esp /* write back traced stack */
|
|
movl %ebp, %esp /* restore our current stack */
|
|
jmp decryptloop_nocontext
|
|
|
|
._endtrace:
|
|
/*
|
|
Here the traced context is completely restored and RET is executed
|
|
natively. Our tracing routine is no longer in control after RET.
|
|
Regarding C calling convention, the caller of crypt_exec will get
|
|
the return value of traced function.
|
|
|
|
One detail we must watch for: the stack now looks like this:
|
|
|
|
stack top -> [ ret_addr ...args ]
|
|
|
|
but we have been called like this:
|
|
|
|
stack top -> [ ret_addr dfn key lo_addr hi_addr addr ...args ]
|
|
|
|
and this is what compiler expects when popping arg list. So we must
|
|
fix the stack. The stack pointer can be just adjusted by -20 instead
|
|
of reconstructing the previous state because C functions are free to
|
|
modify their arguments.
|
|
|
|
CONTEXT:
|
|
ebp: current traced esp
|
|
*/
|
|
movl (%ebp), %ebx /* return address */
|
|
subl $20, %ebp /* fake 5 extra args */
|
|
movl %ebx, (%ebp) /* put ret addr on top of stack */
|
|
movl %ebp, traced_esp /* store adjusted stack */
|
|
RESTORE_TRACED_CONTEXT
|
|
ret /* return without regaining control */
|
|
|
|
/*
|
|
LOOPNE, LOOPE and LOOP instructions are executed from the common
|
|
handler (_doloop). Only the instruction opcode is written from
|
|
separate handlers.
|
|
|
|
28 is the offset of traced ecx register that is saved on our stack.
|
|
*/
|
|
_loopne:
|
|
movb $0xE0, ._loop_insn /* loopne opcode */
|
|
jmp ._doloop
|
|
_loope:
|
|
movb $0xE1, ._loop_insn /* loope opcode */
|
|
jmp ._doloop
|
|
_loop:
|
|
movb $0xE2, ._loop_insn /* loop opcode */
|
|
._doloop:
|
|
/*
|
|
* Get traced context that is relevant for LOOP* execution: signed
|
|
* offset, traced ecx and traced flags.
|
|
*/
|
|
movsbl 1(%eax), %ebx
|
|
movl 28(%esp), %ecx
|
|
popf
|
|
|
|
._loop_insn:
|
|
/*
|
|
Explicit coding of loop instruction and offset.
|
|
*/
|
|
.byte 0xE0 /* LOOP* opcodes: E0, E1, E2 */
|
|
.byte ._loop_insn_true - ._loop_insn_false
|
|
|
|
._loop_insn_false:
|
|
/*
|
|
LOOP* condition false. Save only modified context (flags and ecx)
|
|
and continue tracing.
|
|
*/
|
|
pushf
|
|
movl %ecx, 28(%esp)
|
|
jmp decryptloop_nocontext
|
|
|
|
._loop_insn_true:
|
|
/*
|
|
LOOP* condition true. Save only modified context, and jump to the
|
|
rel_offset_fixup to fix up traced eip.
|
|
*/
|
|
pushf
|
|
movl %ecx, 28(%esp)
|
|
jmp rel_offset_fixup
|
|
|
|
_jcxz:
|
|
/*
|
|
JCXZ. This is easier to simulate than to natively execute.
|
|
*/
|
|
movsbl 1(%eax), %ebx /* get signed offset */
|
|
cmpl $0, 28(%esp) /* test traced ecx for 0 */
|
|
jz rel_offset_fixup /* if so, fix up traced EIP */
|
|
jmp decryptloop_nocontext
|
|
|
|
_callrel:
|
|
/*
|
|
Relative CALL.
|
|
*/
|
|
movb $1, %cl /* 1 indicates relative call */
|
|
movl 1(%eax), %ebx /* get offset */
|
|
|
|
_call:
|
|
/*
|
|
CALL emulation.
|
|
|
|
CONTEXT:
|
|
cl : relative/absolute indicator.
|
|
ebx: absolute address (cl==0) or relative offset (cl!=0).
|
|
*/
|
|
movl %esp, %ebp /* save our stack */
|
|
movl traced_esp, %esp /* push traced eip onto */
|
|
pushl traced_eip /* traced stack */
|
|
movl %esp, traced_esp /* write back traced stack */
|
|
movl %ebp, %esp /* restore our stack */
|
|
testb %cl, %cl /* if not zero, then it is a */
|
|
jnz rel_offset_fixup /* relative call */
|
|
movl %ebx, traced_eip /* store dst eip */
|
|
jmp decryptloop_nocontext /* continue execution */
|
|
|
|
_jmp_rel8:
|
|
/*
|
|
Relative 8-bit displacement JMP.
|
|
*/
|
|
movsbl 1(%eax), %ebx /* get signed offset */
|
|
jmp rel_offset_fixup
|
|
|
|
_jmp_rel32:
|
|
/*
|
|
Relative 32-bit displacement JMP.
|
|
*/
|
|
movl 1(%eax), %ebx /* get offset */
|
|
jmp rel_offset_fixup
|
|
|
|
_grp5:
|
|
/*
|
|
This is the case for 0xFF opcode which escapes to GRP5: the real
|
|
instruction opcode is hidden in bits 5, 4, and 3 of the modR/M byte.
|
|
*/
|
|
movb MODRM+disbuf, %bl /* get modRM byte */
|
|
shr $3, %bl /* shift bits 3-5 to 0-2 */
|
|
andb $7, %bl /* and test only bits 0-2 */
|
|
cmpb $2, %bl /* < 2, not control transfer */
|
|
jb _nonjump
|
|
cmpb $5, %bl /* > 5, not control transfer */
|
|
ja _nonjump
|
|
cmpb $3, %bl /* CALL FAR */
|
|
je _unhandled
|
|
cmpb $5, %bl /* JMP FAR */
|
|
je _unhandled
|
|
movb %bl, %dl /* for future reference */
|
|
|
|
/*
|
|
The reg field of modR/M equals 2 or 4 (near CALL or JMP).
In this case the reg field (bits 3-5) is part of the instruction
opcode.
|
|
|
|
Replace instruction byte 0xFF with 0x8B (MOV r/m32 to reg32 opcode).
|
|
Replace reg field with 3 (ebx register index).
|
|
*/
|
|
movb $0x8B, (%eax) /* replace with MOV_to_reg32 opcode */
|
|
movb 1(%eax), %bl /* get modR/M byte */
|
|
andb $0xC7, %bl /* mask bits 3-5 */
|
|
orb $0x18, %bl /* set them to 011=3: ebx reg index */
|
|
movb %bl, 1(%eax) /* set MOV target to ebx */
|
|
|
|
/*
|
|
We temporarily update continue location to continue execution in
|
|
this code instead of jumping to decryptloop. We execute MOV in TRACED
|
|
context because it must use traced registers for address calculation.
|
|
Before that we save OUR esp so that original TRACED context isn't
|
|
lost (MOV updates ebx, traced CALL wouldn't mess with any registers).
|
|
|
|
First we save OUR context, but after that we must restore TRACED ctx.
|
|
In order to do that, we must adjust esp to point to traced context
|
|
before restoration.
|
|
*/
|
|
movl $._grp5_continue, continue
|
|
movl %esp, %ebp /* save traced context pointer into ebp */
|
|
pusha /* store our context; eflags irrelevant */
|
|
movl %esp, our_esp /* our context pointer */
|
|
movl %ebp, %esp /* adjust traced context pointer */
|
|
jmp _nonjump
|
|
|
|
._grp5_continue:
|
|
/*
|
|
This is where execution continues after MOV calculates effective
|
|
address for us.
|
|
|
|
CONTEXT upon entry:
|
|
ebx: target address where traced execution should continue
|
|
dl : opcode part (bits 3-5) of modR/M, shifted to bits 0-2
|
|
*/
|
|
movl $decryptloop, continue /* restore continue location */
|
|
movl our_esp, %esp /* restore our esp */
|
|
movl %ebx, 16(%esp) /* so that ebx is restored anew */
|
|
popa /* our context along with new ebx */
|
|
cmpb $2, %dl /* CALL near indirect */
|
|
je ._grp5_call
|
|
movl %ebx, traced_eip /* JMP near indirect */
|
|
jmp decryptloop_nocontext
|
|
._grp5_call:
|
|
xorb %cl, %cl /* mark: addr in ebx is absolute */
|
|
jmp _call
|
|
|
|
_0xf:
|
|
/*
|
|
0x0F opcode escape for two-byte opcodes. Only the 0F 0x80-0x8F range
holds Jcc rel32 instructions. The others are normal instructions.
|
|
*/
|
|
movb OPCODE2+disbuf, %cl /* extended opcode */
|
|
cmpb $0x80, %cl
|
|
jb _nonjump /* < 0x80, not Jcc */
|
|
cmpb $0x8F, %cl
|
|
ja _nonjump /* > 0x8F, not Jcc */
|
|
movl 2(%eax), %ebx /* load 32-bit offset */
|
|
jmp _jcc_rel32
|
|
|
|
control_table:
|
|
/*
|
|
This is the jump table for instruction execution dispatch. When the
|
|
real opcode of the instruction is found, the tracer jumps indirectly
|
|
to execution routine based on this table.
|
|
*/
|
|
.rept 0x0F /* 0x00 - 0x0E */
|
|
.long _nonjump /* normal opcodes */
|
|
.endr
|
|
.long _0xf /* 0x0F two-byte escape */
|
|
|
|
.rept 0x60 /* 0x10 - 0x6F */
|
|
.long _nonjump /* normal opcodes */
|
|
.endr
|
|
|
|
.rept 0x10 /* 0x70 - 0x7F */
|
|
.long _jcc_rel8 /* relative 8-bit displacement */
|
|
.endr
|
|
|
|
.rept 0x10 /* 0x80 - 0x8F */
|
|
.long _nonjump /* long displ jump handled from */
|
|
.endr /* _0xf opcode escape */
|
|
|
|
.rept 0x0A /* 0x90 - 0x99 */
|
|
.long _nonjump
|
|
.endr
|
|
.long _unhandled /* 0x9A: far call to full pointer */
|
|
.rept 0x05 /* 0x9B - 0x9F */
|
|
.long _nonjump
|
|
.endr
|
|
|
|
.rept 0x20 /* 0xA0 - 0xBF */
|
|
.long _nonjump
|
|
.endr
|
|
|
|
.long _nonjump, _nonjump /* 0xC0, 0xC1 */
|
|
.long _unhandled /* 0xC2: retn imm16 */
|
|
.long _retn /* 0xC3: retn */
|
|
.rept 0x06 /* 0xC4 - 0xC9 */
|
|
.long _nonjump
|
|
.endr
|
|
.long _unhandled, _unhandled /* 0xCA, 0xCB : far ret */
|
|
.rept 0x04
|
|
.long _nonjump
|
|
.endr
|
|
|
|
.rept 0x10 /* 0xD0 - 0xDF */
|
|
.long _nonjump
|
|
.endr
|
|
|
|
.long _loopne, _loope /* 0xE0, 0xE1 */
|
|
.long _loop, _jcxz /* 0xE2, 0xE3 */
|
|
.rept 0x04 /* 0xE4 - 0xE7 */
|
|
.long _nonjump
|
|
.endr
|
|
.long _callrel /* 0xE8 */
|
|
.long _jmp_rel32 /* 0xE9 */
|
|
.long _unhandled /* far jump to full pointer */
|
|
.long _jmp_rel8 /* 0xEB */
|
|
.rept 0x04 /* 0xEC - 0xEF */
|
|
.long _nonjump
|
|
.endr
|
|
|
|
.rept 0x0F /* 0xF0 - 0xFE */
|
|
.long _nonjump
|
|
.endr
|
|
.long _grp5 /* 0xFF: group 5 instructions */
|
|
|
|
.data
|
|
continue: .long decryptloop /* where to continue after 1 insn */
|
|
|
|
.bss
|
|
.align 4
|
|
traced_esp: .long 0 /* traced esp */
|
|
traced_eip: .long 0 /* traced eip */
|
|
traced_ctr: .long 0 /* incremented by 1 for each insn */
|
|
lo_addr: .long 0 /* low encrypted eip */
|
|
hi_addr: .long 0 /* high encrypted eip */
|
|
our_esp: .long 0 /* our esp... */
|
|
end_esp: .long 0 /* esp when we should stop tracing */
|
|
local_stk: .fill 1024, 4, 0 /* local stack space (to call C) */
|
|
stk_end = . /* we need this.. */
|
|
ilen: .long 0 /* instruction length */
|
|
key: .long 0 /* pointer to key data */
|
|
decrypt: .long 0 /* USED decryption function */
|
|
r_decrypt: .long 0 /* REAL decryption function */
|
|
disbuf: .fill 128, 1, 0 /* xde disassembly buffer */
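
For readers who find the jump table hard to follow, the dispatch decisions
made by control_table, _0xf and _grp5 can be restated in C. This is an
illustrative sketch only: the enum, the function names and the merging of
the three LOOP* handlers into one value are my own simplifications, and the
two helper functions mirror the opcode and modR/M adjustments performed in
_jcc_rel8 and _grp5.

```c
#include <assert.h>
#include <stdint.h>

enum handler {
    H_NONJUMP, H_UNHANDLED, H_JCC_REL8, H_JCC_REL32, H_RETN, H_LOOP,
    H_JCXZ, H_CALL_REL, H_JMP_REL8, H_JMP_REL32, H_CALL_IND, H_JMP_IND
};

/* Classify an instruction by its opcode bytes and modR/M byte. */
static enum handler classify(uint8_t op, uint8_t op2, uint8_t modrm)
{
    unsigned reg = (modrm >> 3) & 7;       /* bits 3-5 of modR/M */

    if (op == 0x0F)                        /* two-byte escape */
        return (op2 >= 0x80 && op2 <= 0x8F) ? H_JCC_REL32 : H_NONJUMP;
    if (op >= 0x70 && op <= 0x7F)
        return H_JCC_REL8;
    switch (op) {
    case 0x9A: case 0xEA:                  /* far call/jmp, full pointer */
    case 0xC2:                             /* retn imm16 */
    case 0xCA: case 0xCB:                  /* far ret */
        return H_UNHANDLED;
    case 0xC3: return H_RETN;
    case 0xE0: case 0xE1: case 0xE2: return H_LOOP;
    case 0xE3: return H_JCXZ;
    case 0xE8: return H_CALL_REL;
    case 0xE9: return H_JMP_REL32;
    case 0xEB: return H_JMP_REL8;
    case 0xFF:                             /* group 5 */
        if (reg == 2) return H_CALL_IND;   /* CALL near indirect */
        if (reg == 4) return H_JMP_IND;    /* JMP near indirect */
        if (reg == 3 || reg == 5) return H_UNHANDLED; /* far variants */
        return H_NONJUMP;                  /* INC, DEC, PUSH */
    default:
        return H_NONJUMP;
    }
}

/* Short Jcc opcode (0x70-0x7F) to the 2nd byte of the long form (0F 8x). */
static uint8_t jcc_short_to_long(uint8_t op)
{
    assert(op >= 0x70 && op <= 0x7F);
    return (uint8_t)(op + 0x10);
}

/* Rewrite the modR/M of FF /2 or FF /4 so that opcode 8B (MOV r32, r/m32)
   loads the branch target into ebx (register index 3). */
static uint8_t grp5_modrm_to_mov_ebx(uint8_t modrm)
{
    return (uint8_t)((modrm & 0xC7) | 0x18);
}
```

For example, the indirect call FF 10 (call *(%eax)) is classified as
H_CALL_IND, and its modR/M byte becomes 0x18, so the rewritten instruction
8B 18 is mov (%eax), %ebx.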
|
|
|
|
|
|
|
|
----[ A.2 - The file encryption utility source: cryptfile.c

/*
Copyright (c) 2004 Zeljko Vrba

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

/*
 * This program encrypts a portion of a file and writes the result to a
 * new file with .crypt appended to the name. The permissions (execute,
 * et al) are NOT preserved! The blocksize of 8 bytes is hardcoded.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include "cast5.h"

#define BLOCKSIZE 8
#define KEYSIZE 16

typedef void (*cryptblock_f)(void*, u8*, const u8*);

static unsigned char *decode_hex_key(char *hex)
{
    static unsigned char key[KEYSIZE];
    int i;

    if(strlen(hex) != KEYSIZE << 1) {
        fprintf(stderr, "KEY must have EXACTLY %d hex digits.\n",
            KEYSIZE << 1);
        exit(1);
    }

    for(i = 0; i < KEYSIZE; i++, hex += 2) {
        unsigned int x;
        char old = hex[2];

        /* NUL-terminate the current digit pair while parsing, then
           restore the overwritten character */
        hex[2] = 0;
        if(sscanf(hex, "%02x", &x) != 1) {
            fprintf(stderr, "non-hex digit in KEY.\n");
            exit(1);
        }
        hex[2] = old;
        key[i] = x;
    }

    return key;
}

static void docrypt(
    FILE *in, FILE *out,
    long startoff, long endoff,
    cryptblock_f crypt, void *ctx)
{
    unsigned char buf[BLOCKSIZE], enc[BLOCKSIZE];
    long curroff = 0;
    size_t nread = 0;

    while((nread = fread(buf, 1, BLOCKSIZE, in)) > 0) {
        long diff = startoff - curroff;

        if((diff < BLOCKSIZE) && (diff > 0) && (nread > (size_t)diff)) {
            size_t skip = (size_t)diff; /* plaintext bytes before startoff */
            size_t keep = nread - skip; /* bytes already read past startoff */

            /*
            this handles the following mis-alignment (each . is 1 byte)
            ...[..|......]....
               ^  ^      ^ curroff+BLOCKSIZE
               |  startoff
               curroff
            */
            if(fwrite(buf, 1, skip, out) < skip) {
                perror("fwrite");
                exit(1);
            }
            /* shift the tail down and refill, so that buf starts at
               startoff; a short read here means end of file */
            memmove(buf, buf + skip, keep);
            nread = keep + fread(buf + keep, 1, skip, in);
            curroff = startoff;
        }

        if((curroff >= startoff) && (curroff < endoff)) {
            crypt(ctx, enc, buf);
        } else {
            memcpy(enc, buf, BLOCKSIZE);
        }
        if(fwrite(enc, 1, nread, out) < nread) {
            perror("fwrite");
            exit(1);
        }
        curroff += nread;
    }
}

int main(int argc, char **argv)
{
    FILE *in, *out;
    long startoff, endoff;
    char outfname[256];
    unsigned char *key;
    struct cast5_ctx ctx;
    cryptblock_f mode;

    if(argc != 6) {
        fprintf(stderr, "USAGE: %s <-e|-d> FILE KEY STARTOFF ENDOFF\n",
            argv[0]);
        fprintf(stderr, "KEY MUST be 32 hex digits (128 bits).\n");
        return 1;
    }

    if(!strcmp(argv[1], "-e")) {
        mode = cast5_encrypt;
    } else if(!strcmp(argv[1], "-d")) {
        mode = cast5_decrypt;
    } else {
        fprintf(stderr, "invalid mode (must be either -e or -d)\n");
        return 1;
    }

    startoff = atol(argv[4]);
    endoff = atol(argv[5]);
    key = decode_hex_key(argv[3]);

    if(cast5_setkey(&ctx, key, KEYSIZE) < 0) {
        fprintf(stderr, "error setting key (maybe invalid length)\n");
        return 1;
    }

    if((endoff - startoff) & (BLOCKSIZE-1)) {
        fprintf(stderr, "STARTOFF and ENDOFF must span an exact multiple"
            " of %d bytes\n", BLOCKSIZE);
        return 1;
    }
    if((endoff - startoff) < BLOCKSIZE) {
        fprintf(stderr, "STARTOFF and ENDOFF must span at least"
            " %d bytes\n", BLOCKSIZE);
        return 1;
    }

    /* bounded formatting: a long FILE name must not overflow outfname */
    if(snprintf(outfname, sizeof(outfname), "%s.crypt", argv[2])
        >= (int)sizeof(outfname)) {
        fprintf(stderr, "FILE name is too long\n");
        return 1;
    }
    if(!(in = fopen(argv[2], "rb"))) {
        fprintf(stderr, "fopen(%s): %s\n", argv[2], strerror(errno));
        return 1;
    }
    if(!(out = fopen(outfname, "wb"))) {
        fprintf(stderr, "fopen(%s): %s\n", outfname, strerror(errno));
        return 1;
    }

    docrypt(in, out, startoff, endoff, mode, &ctx);

    fclose(in);
    if(fclose(out) != 0) {
        perror("fclose");
        return 1;
    }
    return 0;
}

----[ A.3 - The test program: test2.c

/*
Copyright (c) 2004 Zeljko Vrba

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to permit
persons to whom the Software is furnished to do so, subject to the
following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT
OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "cast5.h"

#define BLOCKSIZE 8
#define KEYSIZE 16

/*
 * f1 and f2 are encrypted with the following 128-bit key:
 * 5f4dcc3b5aa765d61d8327deb882cf99 (MD5 of the string 'password')
 */

static int f1(int a)
{
    int i, s = 0;

    for(i = 0; i < a; i++) {
        s += i*i;
    }
    printf("called plaintext code: f1 = %d\n", a);
    return s;
}

static int f2(int a, int b)
{
    int i;

    a = f1(a);
    for(i = 0; i < b; i++) {
        a += b;
    }
    return a;
}

static unsigned char *decode_hex_key(char *hex)
{
    static unsigned char key[KEYSIZE];
    int i;

    if(strlen(hex) != KEYSIZE << 1) {
        fprintf(stderr, "KEY must have EXACTLY %d hex digits.\n",
            KEYSIZE << 1);
        exit(1);
    }

    for(i = 0; i < KEYSIZE; i++, hex += 2) {
        unsigned int x;
        char old = hex[2];

        /* NUL-terminate the current digit pair while parsing, then
           restore the overwritten character */
        hex[2] = 0;
        if(sscanf(hex, "%02x", &x) != 1) {
            fprintf(stderr, "non-hex digit in KEY.\n");
            exit(1);
        }
        hex[2] = old;
        key[i] = x;
    }

    return key;
}

int main(int argc, char **argv)
{
    int a, b, result;
    char hex[256];
    void *esp;
    struct cast5_ctx ctx;

    printf("enter decryption key: ");
    if(scanf("%255s", hex) != 1) {
        fprintf(stderr, "error reading key.\n");
        return 1;
    }
    if(cast5_setkey(&ctx, decode_hex_key(hex), KEYSIZE) < 0) {
        fprintf(stderr, "error setting key.\n");
        return 1;
    }

    printf("a b = ");
    if(scanf("%d %d", &a, &b) != 2) {
        fprintf(stderr, "expected two integers.\n");
        return 1;
    }

    /* esp is printed before and after the call to show that tracing
       leaves the stack pointer unchanged */
    asm("movl %%esp, %0" : "=m" (esp));
    printf("esp=%p\n", esp);
    /* run f2(a, b) under the tracer; f1 and decode_hex_key bound the
       encrypted address range (lo_addr and hi_addr) */
    result = crypt_exec(cast5_decrypt, &ctx, f1, decode_hex_key,
        f2, a, b);
    asm("movl %%esp, %0" : "=m" (esp));
    printf("esp=%p\n", esp);
    printf("result = %d\n", result);

    return 0;
}