In reverse engineering, there are two approaches for examining the target: "live" or active examination, in which a program is run under the careful scrutiny of a) a debugger, b) a system monitor, or c) a capture utility that filters disk/file/memory access, API calls, messages, etc.; and "dead" or passive examination, in which a program is opened in a hex editor or disassembler, with the end result being a .lst or .asm file containing a close approximation of the original source code of the program. Most passive utilities will produce an "assembly-language rendering" of the target, which then must be reviewed and corrected by the engineer and finally--if at all--translated into C/C++, Visual Basic, Pascal, Fortran, Java, or whichever language the target is presumed to have been written in. What follows is an introduction to a number of "dead-listing" tools which, once learned by the engineer, will prove invaluable in retrieving an accurate assembly language rendering from the target binary file.
Synopsis: HED v.1.78 is a 370K (installed) "hacking tool disguised as a hex editor", running in a non-resizable DOS box. Its features include traditional hex/ASCII or disassembly mode, multiple-file editing, excellent search and replace capabilities, macro recording, Win32 API function name resolution, integral expression calculator and ASCII table, branch following, 10 bookmarks, and imports/exports/internal references tables. All in all, a pretty sophisticated hex editor at 193K with a 176K imports.dat file.
Usage: General hex/text editor with Win32 import/export information.
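For readers who have not used a hex editor before, the "hex/ASCII" view that HED and the other editors below provide can be imitated in a few lines of Python. This is only a rough sketch of the display idea, not HED's actual code; the function name hex_dump and its parameters are invented for this illustration.

```python
def hex_dump(data: bytes, base: int = 0, width: int = 16) -> str:
    """Render bytes as an offset column, a hex column, and an ASCII column."""
    hex_width = width * 3 - 1  # two hex digits per byte plus separating spaces
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + start:08X}  {hex_part:<{hex_width}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_dump(b"MZ\x90\x00Hello from a hex view"))
```

The left column is the file offset, which is what commands such as Goto Offset operate on.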
Esc | Activate Menu | Alt-O | Open File | |
Alt-P | Previous File | Alt-N | Next File | |
Alt-Q | Close File | Alt-X | Exit Program | |
F2 | Save File | Alt-F2 | Save As | |
F9 | DOS Shell | Alt-F9 | Execute a program | |
Ctrl-F4 | Calculator | F11 | ASCII Table | |
Ctrl-M | Record Macro | Alt-F | Text Filter | |
Ctrl-Ins | Copy to HED clipboard | Shift-Ins | Paste from HED clipboard | |
Ctrl-G | Goto Offset | Ctrl-B | Goto Previous Position | |
Alt-Shift-(0...9) | Save Position (Bookmark) 0-9 | Alt-(0...9) | Goto Position (Bookmark) 0-9 | |
F5 | Find Number | F6 | Find Text String | |
F7 | Find Hex Data | Ctrl-F6 | Find in ASM text | |
Ctrl-F8 | Find reference | Alt-F8 | SuperFind & Replace | |
Shift-F7 | Find Again | Alt-V | Toggle Hex/ASM view |
Notes: HED is a freeware (or, rather, "emailware") hex editor written for OS/2 by Dimitris Kotsonis, and ported to Win32 by Malakoudis S. Panagiotis. It is written in Visual C++ (GNU C for OS/2 version) and takes advantage of the Win32 DOS console interface...the result being that the code of the .exe is interesting to scroll through, but the program itself can slow down in certain functions. The "file open" dialog box, as it does not alphabetize the file names, is tedious to use in large directories such as C:\WINDOWS...creating a PIF file that takes a filename parameter or simply drag'n'dropping the target on the HED icon is recommended. HED will also open multiple files, so that typing C:\WINDOWS\*.* in the "file open" box will open every file in the Windows directory...
Synopsis: HIEW v.5.66 is a 177K (installed) hex editor that runs in a DOS box and takes multiple file names as its startup parameters. Features include MZ/PE file header parsing, multiple file editing, hex/ASCII/disassembly views, file search and replace, saved jump table, reference calling, bookmarks, Win32 API function name resolution, built-in 80386 assembler, and cryptographic/XOR functions. HIEW has three different viewing modes (Asm, Hex, and Text, or A, H, and T); from A and H modes the user can enter the "Edit" mode (E). The PE file header summary is particularly effective, allowing the user to jump to locations in the file (such as the .text or .rsrc sections) referenced by the PE Header, its Directory Table, or its Object Table.
Usage: General hex/text editor with PE file header and assembler capabilities.
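To give a feel for what the PE header summary is reading, here is a minimal Python sketch that walks from the MZ header to the PE signature and lists the entries of the section table (what HIEW calls the Object Table). It follows the published PE/COFF layout rather than anything taken from HIEW itself, and the function name list_pe_sections is made up for this example.

```python
import struct


def list_pe_sections(image: bytes):
    """Return (name, virtual_address, raw_offset, raw_size) for each section."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew at offset 3Ch holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes).
    _, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", image, pe_offset + 4)
    table = pe_offset + 4 + 20 + opt_size  # section table follows the optional header
    sections = []
    for i in range(n_sections):
        # Each entry is 40 bytes; only the first five fields are needed here.
        name, _vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", image, table + 40 * i)
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        sections.append((label, vaddr, raw_ptr, raw_size))
    return sections
```

Calling it on the bytes of an executable, for instance list_pe_sections(open(path, "rb").read()) with a path of your choosing, gives the raw offset of each section, which is the location a hex editor would jump to.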
Crypting Operations: The HIEW manual gives the following explanation for its cryptographic functions:
Crypt operations are using for crypting/decrypting the code/data. Crypt algorithm is very simple. Code/data will be crypted by the bytes/words (to change the size ot the unit, press F2). Crypting routine must be terminated with "LOOP numberLine" operator.
Available commands:
  Reg mode    : neg,mul,div
  Reg-Reg mode: mov,xor,add,sub,rol,ror,xchg
  Reg-Imm mode: mov,xor,add,sub,rol,ror
  Imm mode    : loop
All 8/16 bit registers are available, except AL/AX that will be filled with (de)crypted byte/word.
The differences from standard assembler:
  there are no jumps; 'loop' means 'jmp/stop'
  the operands of 'rol/ror' commands must have the same size, i.e. ROL AX,CL not allowed.
Example:
  a. XOR byte with 0AAh:
     1. XOR al,0aah
     2. LOOP 1
  b. XOR word with mask increment
     1. MOV dx,0
     2. XOR ax,dx
     3. ADD dx,1
     4. LOOP 2
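The two examples from the manual can be mimicked in Python to check what a crypt routine should produce before it is applied to a file. This is a sketch under two assumptions that the manual does not spell out: that "LOOP 2" in example b returns to line 2, so the mask in DX is initialized only once, and that words are read little-endian as on any x86 machine. The function names are invented here.

```python
import struct


def xor_bytes(data: bytes, key: int = 0xAA) -> bytes:
    """Example a: XOR every byte with a fixed key (XOR al,0aah / LOOP 1)."""
    return bytes(b ^ key for b in data)


def xor_words_incrementing(data: bytes) -> bytes:
    """Example b: XOR each 16-bit word with a mask that starts at 0 and grows by 1."""
    if len(data) % 2:
        raise ValueError("word mode needs an even number of bytes")
    out = bytearray()
    mask = 0                                        # MOV dx,0 (assumed to run once)
    for (word,) in struct.iter_unpack("<H", data):  # AX is loaded with each word
        out += struct.pack("<H", word ^ mask)       # XOR ax,dx
        mask = (mask + 1) & 0xFFFF                  # ADD dx,1, wrapping at 16 bits
    return bytes(out)


if __name__ == "__main__":
    print(xor_bytes(b"HIEW").hex())
    print(xor_words_incrementing(bytes(6)).hex())
```

Both routines are their own inverse, so running the same routine over the encrypted block restores the original bytes.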
Enter | Toggle View Mode | Alt-H | Help | |
F1 | File Info | F2 | Wrap/Unwrap (T) Assemble (E) | |
F3 | Edit (A,H) Undo (E) | F4 | Mode | |
F5 | Goto (A,H) | F6 | Linefeed (T) Find reference on current position (A) | |
F7 | Search(A,H,T) Crypt (E) | F8 | Header (A,H) XLAT (T) XOR (E) | |
F9 | Open Files (A,H,T) Update File (E) | F10 | Exit (A,H,T) Truncate File (E) | |
Alt-P | Save screen to file | Alt-R | Reload file | |
Ctrl-F3 | Search and Replace | Ctrl-F7 | Ctrl-Enter | Search Next | |
Ctrl-F8 | Previous File | Ctrl-F9 | Next File | |
+ | Bookmark | Alt-(1...8) | Goto Bookmark | |
Alt- - | Clear current bookmark | Alt-0 | Clear all bookmarks | |
1...9 | A...Y | Jump to target/save jump | 0 | Z | Return from jump |
Synopsis: IDA v. 3.7 is a 15.9 MB interactive disassembler (hence its name), similar in a way to the old Bubble Chamber disassembler: a file is loaded, disassembled by the program, then the user is given a chance to modify code and data interpretations before saving the final output file. IDA takes this method to the extreme, re-interpreting the program after each change the user makes...basically saving the user a lot of work. Features include multiple file editing, integral byte patcher (creates an .exe file for DOS files, or a .dif difference file for other formats), integral calculator, extensive macro language, integral text editor/viewer, full navigational and code interpretation facilities.
Usage: IDA is different from other disassemblers in that the user is intended to modify the disassembled file "interactively" with the program until an adequate approximation of the original source code is produced. Obtaining a full disassembled listing therefore requires that the user take part in three distinct processes: loading the file, correcting IDA's interpretation of code and data, and commenting the result.
The first and the third processes are pretty simple: the "Load File Of New Format" window provides plenty of options for the user to configure (be sure to set the DLL directory to c:\windows\system and not c:\windows; also uncheck "Rename DLLs" and check "Load Resources" and "Make Imports Section"), and typing ":" allows the user to enter comments that stand out in bright white (and are therefore easily distinguished from the brown IDA-generated comments).
The second process is the hardest, the most time consuming, and the one that requires the most technical knowledge. The user can use the C command to change data into code, and the D command to do the opposite--note that each of these commands will cause changes throughout the file, for all relevant bytes beneath the changed line will be converted to data or code as well. This means basically that the user must have very intimate knowledge of the program itself and the structure of the file format they are working on in order to get full use out of IDA.
Not all files require this much work to disassemble, however; with Windows files in particular, IDA does a good job on its own and usually provides the user with a more than adequate disassembly that only needs a little commenting and data modification. For cases like this, IDA provides excellent navigational commands (summarized in the Shortcuts section below) as well as the ability to change the data representation on the current line to hexadecimal (Q), ASCII (R), octal, binary (B), or decimal (H). The user can also rename (N) functions or variables defined by IDA, and can even patch the file from within the IDA environment.
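The representation commands only change how an operand is displayed; the underlying byte stays the same. A small Python sketch makes the point by rendering one value in each of the five forms. The h/o/b suffixes are the usual Intel assembler notation, and the exact text IDA prints may differ; the helper name operand_views is invented for the example.

```python
def operand_views(value: int) -> dict:
    """Render one integer in the five representations named above."""
    hex_digits = f"{value:X}"
    if hex_digits[0] in "ABCDEF":
        hex_digits = "0" + hex_digits  # assemblers want a leading digit, e.g. 0AAh
    return {
        "hex": hex_digits + "h",
        "char": repr(chr(value)) if 32 <= value < 127 else None,
        "octal": f"{value:o}o",
        "binary": f"{value:b}b",
        "decimal": str(value),
    }


if __name__ == "__main__":
    for name, text in operand_views(0x41).items():
        print(f"{name:8} {text}")
```

Choosing the right view is mostly a matter of context: a value compared against user input usually reads best as a character, a bit mask as binary or hex.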
A more thorough examination of IDA Pro's functions, including FLIRT and IDC, can be found on this summation of the IDA Pro web site.
Load as...
  * Portable executable
  _ MSDOS .exe
  _ Binary file
Loading segment: 0x1000 (Exe & Bin) (paragraph where file will be loaded...only for exe/bin)
Loading Offset: 0x0 (bin) (binary only...offset of first byte from start of first segment)
  * Create Segments (bin)
  * Load Resources
  _ Rename DLL entries (when unchecked, makes repeatable comments for entries imported by ordinal...else renames 2nd occurrence)
  _ Manual Load (NE, LE, LX...IDA will ask for loading addresses/selectors for each object in file)
  _ Fill Segment Gaps (NE)
  * Make Imports Section (PE) (convert .idata section to extrn directives)
  _ Don't align segments (OMF)
  _ IBM Object Table (OMF)
DLL directory: c:\windows\system
First thing: save the database by going to File->SaveDatabase (or pressing Ctrl-W); this will allow you to come back to your work later simply by loading the .IDB file instead of an executable when IDA starts up.
Next, scroll through the code to get the lay of the land...this is a relatively small file. Note that at offset 00401416 there starts a continuous sequence of add [eax], al repeating over and over. Toggling to hex mode via F4 or just examining the bytes after the offset will show that this is just a continuous block of 00's (the byte pair 00 00 disassembles as add [eax], al), terminating at 4015FE with the end of the .text segment--meaning that these 00's are padding to fit the File Alignment "magic number"; the code segment therefore really ends at offset 00401416.
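The padding arithmetic can be checked by hand or with a short Python sketch. Two assumptions go into it: that the .text section starts at 00401000 (the lowest function address in this file), so the real code is 416h bytes long if it ends at 00401416, and that the File Alignment is the common value of 200h, which the listing itself does not state. Both helper names are invented here.

```python
def align_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def padding_start(section: bytes) -> int:
    """Offset at which the trailing run of 00 bytes begins (len(section) if none)."""
    return len(section.rstrip(b"\x00"))


if __name__ == "__main__":
    code_size = 0x401416 - 0x401000           # bytes of real code, under the assumption above
    raw_size = align_up(code_size, 0x200)     # assumed File Alignment
    print(hex(code_size), hex(raw_size), hex(0x401000 + raw_size))
```

Under those assumptions the section is rounded up to 600h bytes and ends at 00401600, which agrees with the last two-byte add [eax], al sitting at 4015FE. Note that padding_start is only a heuristic: real code or data that happens to end in 00 bytes would be counted as padding too.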
IDA has produced one anomaly in this block of padding: at offset 00401464 it has generated the comment CODE XREF: .text:004013F5^j, meaning that this address is referenced by a jump at 4013F5. Press ENTER while the cursor is over the cross-reference "jump-to" address and IDA will switch to this line of code: jnz short near ptr loc_401464+1. The operand loc_401464+1 is an address, not a value: it is one byte past the label, or 00401465, which lies in the middle of one of the two-byte add [eax], al padding instructions. A jump into zero padding is unlikely to be genuine, so the bytes at 4013F5 are probably data that IDA has interpreted as code rather than part of a subroutine.
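It helps to work out where a short jump really lands. A short conditional jump is two bytes long (opcode plus a signed 8-bit displacement), and the displacement is counted from the address of the next instruction. The Python sketch below applies that rule to the jump at 4013F5; the listing does not show the raw bytes, so the displacement is derived here from the addresses rather than read from the file, and the function names are invented.

```python
def short_jump_target(address: int, rel8: int) -> int:
    """Target of a 2-byte short jump located at address with displacement byte rel8."""
    if not 0 <= rel8 <= 0xFF:
        raise ValueError("rel8 must be a single byte")
    displacement = rel8 - 0x100 if rel8 >= 0x80 else rel8  # sign-extend the byte
    return address + 2 + displacement


def short_jump_rel8(address: int, target: int) -> int:
    """Displacement byte a short jump at address would need to reach target."""
    displacement = target - (address + 2)
    if not -128 <= displacement <= 127:
        raise ValueError("target out of range for a short jump")
    return displacement & 0xFF


if __name__ == "__main__":
    rel8 = short_jump_rel8(0x4013F5, 0x401464 + 1)
    print(hex(rel8), hex(short_jump_target(0x4013F5, rel8)))
```

The jump lands at 00401465, one byte past the label, which is why IDA writes the operand as loc_401464+1 instead of inventing a new label in the middle of an instruction.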
Okay, on to work. Just what does this program do? Go to the View menu and choose names; this will show the imports used by the program and give you a brief overview: 28 names, all standard functions such as lstrcpyA, wsprintf, MessageBoxA, and LoadIconA, plus library functions like LoadLibraryA, FreeLibrary, and GetProcAddress that one would expect due to the nature of this program.
The .text section in this small program is only 416 lines...easy enough to track through manually using IDA:
Go to the program entry point by pressing Ctrl-E; you will start off at address 401028 which, as is standard for the start of a program or function, will prepare a stack frame. From here you can create a "skeleton" outline of the code by noting the "flow of execution", taking down relevant jumps and calls and any imports from the Windows API:

Start:
  401028 Start of Program
  40102F API: GetCommandLine, store pointer in esi
  401075 API: GetStartupInfo, store STARTUPINFO structure in ebp+var_44
  401090 API: GetModuleHandle..either 0Ah or ebp+var_14 (address of module to return handle for)
  401097 Call 401322
  40109F API: ExitProcess

Type G 401322 or double-click/press enter on the address 401322 in line 401097:

Main:
  401334 API: SetErrorMode mask:8001h
  401344 Call 4010AC
  401352 Call 40124F (RegisterClassA_CreateWindowExA function)
  401373 Call J_SHELL32_122
  40137A Call 402010 (Bad Call: .data segment)
  401380 Call 4012F8 (DestroyWindow_FreeLibrary)
  401385 RET (end subroutine)

Using the same method, investigate each of the called subroutines:

Call from Main #1:
  ...to 4010AC...
  *****Function 4010AC*****
  4010DD Call 401000 (CharNextA function, parameters 20h, esi)
  4010EF Call 401000 (CharNextA function, parameters 2Fh, esi)
  4010F8 Jcc 401101
  4010FC JMP 40120B (RET)
  ...to 401000...
  *****Function 401000*****
  40101A API: CharNextA
  401025 RET
  ...to 401101...
  401106 API: LoadLibrary
  401113 Jcc 4011C7
  40111A Call Kernel32_35
  401128 Jcc 401182
  40112C Call Kernel32_37
  401139 Jcc 401161
  401149 Call 40138D (LoadString_wsprintfA_MessageBox function)
  401154 Call Kernel32_36
  40115C JMP 40120B (RET)
  ...to 4011C7...
  4011CE API: GetProcAddress
  4011DB Jcc 40116B
  4011F6 API: FreeLibrary
  4011FE JMP 40120B (RET)
  ...to 401182...
  401192 API: GetLastError
  4011A0 API: FormatMessageA
  4011BE Call 40138D (LoadString_wsprintfA_MessageBox function)
  4011C5 JMP 40120B (RET)
  ...to 401161...
  40116D Jcc 401200 (RET)
  401177 API: lstrcpy
  40117D JMP 401206 (RET)
  ...to 401208...
  401211 RET
  ...to 40138D...
  *****Function 40138D*****
  4013AB API: LoadStringA
  4013C9 API: wsprintfA
  4013E1 API: MessageBox
  4013EA RET

Call From Main #2:
  ...to 40124F...
  *****Function 40124F*****
  401277 API: Call LoadIconA
  401286 API: Call LoadCursorA
  401290 API: Call GetStockObject
  4012A7 API: Call RegisterClassA
  4012DF API: Call CreateWindowExA
  4012F5 RET

Call From Main #3:
  ...to 4013EE...
  *****Function J_SHELL32_122*****
  4013EE API: Shell32.122 (Unknown, poss ExtractAssociatedIconExW)

Call From Main #4:
  ...to 402010...
  .data segment
  402010 db 00 00 00 00

Call From Main #5:
  ...to 4012F8...
  *****Function 4012F8*****
  4012FE API: DestroyWindow
  401313 API: Kernel32.36 (unknown)
  40131B API: FreeLibrary
  401321 RET

Comparing the above abstract with the list of internal routines in View->Functions shows that all 8 of Rundll32.exe's routines have been accounted for. While this source code still has a few mysteries that could be cleaned up, its functionality is relatively clear: this is simply a "loader" program that takes the name of a .DLL file as its startup parameter, then loads that .DLL using the LoadLibrary/GetProcAddress combo that is used in many applications for loading their own .DLLs. Not very mysterious at all...more like a patch than a utility.
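The "all 8 routines accounted for" check can also be done mechanically once the calls have been written down. The Python sketch below copies the internal call edges from the outline and walks them from the entry point; it is an illustration of the bookkeeping only, and it rests on one reading of the outline: that the J_SHELL32_122 thunk at 4013EE counts as one of the eight routines, while the bad call into .data at 402010 does not.

```python
from collections import deque

# Internal call edges taken from the outline; addresses are kept as hex strings.
CALLS = {
    "401028": ["401322"],                                            # Start -> Main
    "401322": ["4010AC", "40124F", "4013EE", "402010", "4012F8"],    # Main
    "4010AC": ["401000", "40138D"],                                  # command-line/loader routine
}
NOT_CODE = {"402010"}  # the "Bad Call" target lies in the .data segment


def reachable_functions(entry, calls, not_code):
    """Breadth-first walk of the call graph, skipping targets known not to be code."""
    seen, queue = set(), deque([entry])
    while queue:
        address = queue.popleft()
        if address in seen or address in not_code:
            continue
        seen.add(address)
        queue.extend(calls.get(address, []))
    return sorted(seen)


if __name__ == "__main__":
    functions = reachable_functions("401028", CALLS, NOT_CODE)
    print(len(functions), functions)
```

Any address that appears in View->Functions but not in this list would be a routine that is never called directly, which is worth a second look (callbacks such as window procedures often show up that way).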
//-------------------------------------------------------------------------
//
//      Analysis parameters
//
//-------------------------------------------------------------------------
ENABLE_ANALYSIS = YES           // Background analysis is enabled
SHOW_INDICATOR  = YES           // Show background analysis indicator
#define AF_FIXUP    0x0001      // Create offsets and segments using fixup info
#define AF_MARKCODE 0x0002      // Mark typical code sequences as code
#define AF_UNK      0x0004      // Delete instructions with no xrefs
#define AF_CODE     0x0008      // Trace execution flow
#define AF_PROC     0x0010      // Create functions if call is present
#define AF_USED     0x0020      // Analyse and create all xrefs
#define AF_FLIRT    0x0040      // Use flirt signatures
#define AF_PROCPTR  0x0080      // Create function if data xref data->code32 exists
#define AF_JFUNC    0x0100      // Rename jump functions as j_...
#define AF_NULLSUB  0x0200      // Rename empty functions as nullsub_...
#define AF_LVAR     0x0400      // Create stack variables
#define AF_TRACE    0x0800      // Trace stack pointer
#define AF_ASCII    0x1000      // Create ascii string if data xref exists
#define AF_IMMOFF   0x2000      // Convert 32bit instruction operand to offset
#define AF_DREFOFF  0x4000      // Create offset if data xref to seg32 exists
#define AF_FINAL    0x8000      // Final pass of analysis
                                // See also ANALYSIS2, bit AF2_DODATA
ANALYSIS = 0xFFFF               // This value is combination of the defined
                                // above bits.
#define AF2_JUMPTBL 0x0001      // Locate and create jump tables
#define AF2_DODATA  0x0002      // Coagulate data segs in the final pass
ANALYSIS2 = 0x0001
//-------------------------------------------------------------------------
//
//      Text representation
//
//-------------------------------------------------------------------------
OPCODE_BYTES             = 6    // don't display bytes of instruction/data
INDENTION                = 0    // Indention of instructions
COMMENTS_INDENTION       = 30   // Indention for on-line comments
MAX_TAIL                 = 16   // Tail depth
MAX_XREF_LENGTH          = 80   // Maximal length of line with cross-references
MAX_DATALINE_LENGTH      = 70   // Data directives (db,dw, etc):
                                // max length of argument string
SHOW_AUTOCOMMENTS        = YES  // Don't show silly comments
SHOW_BAD_INSTRUCTIONS    = NO   // Don't bother about instruction lengthes
SHOW_BORDERS             = YES  // Borders between data/code
SHOW_EMPTYLINES          = NO   // Generate empty line to make
                                // text more readable
SHOW_LINEPREFIXES        = YES  // Show line prefixes (1000:0000)
SHOW_SEGMENTS            = YES  // Show segments in addresses
USE_SEGMENT_NAMES        = YES  // Show segment names instead of numbers
SHOW_REPEATABLE_COMMENTS = YES  // Of course, use repeatable comments
                                // Disabling this increases IDA speed.
SHOW_VOIDS               = NO   // Don't display <void> marks
SHOW_XREFS               = 100  // Show 2 cross-references
SHOW_XREF_VALUES         = YES  // If not, xrefs are displayed
                                // as "..."
SHOW_SEGXREFS            = YES  // Show segment part of addresses
                                // in cross-references
SHOW_SOURCE_LINNUM       = YES  // Show source line numbers
                                // (used in .obj files and java)
SHOW_ASSUMES             = YES  // Generate 'assume' directives
SHOW_ORIGINS             = YES  // Generate 'org' directives
USE_TABULATION           = YES  // Use '\t' in output file
//-------------------------------------------------------------------------
//      Proccesor specific parameters
//-------------------------------------------------------------------------
#ifdef __PC__                   // INTEL 80x86 PROCESSORS
USE_FPP = YES                   // Floating Point Processor
                                // instructions are enabled
WINDIR = "c:\\windows\\system"  // Default directory to look up for
                                // DLL files
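The ANALYSIS and ANALYSIS2 values are bit masks built from the AF_ and AF2_ constants, so it can be handy to decode a value back into names before editing the configuration. The Python sketch below does that using the constants exactly as listed in the configuration excerpt; the decode_flags helper is invented for the example and is not part of IDA.

```python
ANALYSIS_FLAGS = {
    0x0001: "AF_FIXUP",   0x0002: "AF_MARKCODE", 0x0004: "AF_UNK",
    0x0008: "AF_CODE",    0x0010: "AF_PROC",     0x0020: "AF_USED",
    0x0040: "AF_FLIRT",   0x0080: "AF_PROCPTR",  0x0100: "AF_JFUNC",
    0x0200: "AF_NULLSUB", 0x0400: "AF_LVAR",     0x0800: "AF_TRACE",
    0x1000: "AF_ASCII",   0x2000: "AF_IMMOFF",   0x4000: "AF_DREFOFF",
    0x8000: "AF_FINAL",
}
ANALYSIS2_FLAGS = {0x0001: "AF2_JUMPTBL", 0x0002: "AF2_DODATA"}


def decode_flags(value: int, table: dict) -> list:
    """Names of the flags whose bit is set in value, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if value & bit]


if __name__ == "__main__":
    print(decode_flags(0xFFFF, ANALYSIS_FLAGS))   # ANALYSIS as configured here
    print(decode_flags(0x0001, ANALYSIS2_FLAGS))  # ANALYSIS2 as configured here
```

With the values shown, every first-set analysis option is switched on and only the jump-table search is enabled in the second set; to turn a single option off, subtract its bit from the mask (for example 0xFFFF - 0x0004 to stop deleting instructions with no xrefs).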
Alt-Z | DOS Shell | Alt-X | Exit | |
Ctrl-W | Save Database | Ctrl-F10 | Produce .exe file | |
Alt-F10 | Produce .asm file | Shift-F10 | Produce .map file | |
F1 | Help | F2 | IDC File | |
F3 | Open Window | F4 | Toggle Hex/Asm view | |
Shift-F6 | Previous Window | F6 | Next Window | |
F7 | Tile Windows | F8 | Cascade Windows | |
F5 | Zoom | F10 | Activate Menu | |
C | Current line=Code | D | Current line=Data | |
A | Display current line in ASCII | N | Name current line | |
: | Add comment | Alt-M | Mark Position | |
Q | Operand=Hex | H | Operand=Decimal | |
B | Operand=Binary | R | Operand=Character | |
Enter | Jump to location under cursor | Esc | Return from jump | |
G | Goto Address | Ctrl-L | Goto Name | |
Ctrl-P | Goto Function | Ctrl-S | Goto Segment | |
Ctrl-M | Goto Marked Position | Ctrl-X | Goto Cross Reference | |
Ctrl-E | Goto Entry Point | Alt-T | Search for text | |
Ctrl-C | Search for next code | Ctrl-D | Search for next data | |
? | Calculate expression | Shift-F2 | Run IDC command |
Synopsis: Sourcer v.7.0 is a DOS mode disassembler that uses a Windows pre-processor (essentially a script that calls resdump, dumppe, impdump, dumplx, and dumpne, then formats their output for use by Sr.exe); together the whole package is 1.79 MB. Output is a .lst file containing the asm source code for the original file; the goal of Sourcer is to provide source code that is re-compilable for the target assembler.
Usage: Sourcer is non-interactive; the user sets options for disassembly, then runs Sourcer--when it has finished, they can peruse the .lst or .asm file at their leisure in a standard text editor. Windows programs are first run through the winp.exe preprocessor, which produces a .r and .wdf file as input for sr.exe (the main Sourcer executable).
Windows Preprocessor:
Sourcer:
Synopsis: W32DASM v.8.9 is a combined disassembler/debugger that totals up to 2.13MB. The disassembler allows viewing of one file at a time; starting a debug process allows the disassembled file to be run and patched in memory (debug-mode commands are marked with D, below). Features include import and export function tables, reference tables for strings, menus, and dialog boxes, hex dumps of data and code segments, and jump/call branching. The debugger is standard fare with the added features of in-memory code patching and Windows API call "detailing"--a valuable feature that gives the parameters and returns of any API call made by the program.
Usage:
Debugger:
Ctrl-L | Load Process | Ctrl-T | Terminate Process (D) | |
F5 | Auto Step Into(D) | F6 | Auto Step Over(D) | |
F7 | Step Into(D) | F8 | Step Over(D) | |
F9 | Run Process(D) | Space | Pause Process(D) | |
F2 | Breakpoint Toggle (D) | Ctrl-C | Copy Selection | |
Ctrl-S,F | Find Text | F3 | Find Next | |
Ctrl-S | Goto Code Start | F10 | Goto Entry Point | |
F11 | Goto Page | F12 | Goto Code Location | |
Lft Arrow | Execute Jump | Ctrl Rt Arrow | Return From Jump | |
Lft Arrow | Execute Call | Rt Arrow | Return From Call |