Reverse engineering the Linux OS, a first approach
(disassembling Linux)
by SiuL+Hacky
(15 October 1997)
Courtesy of fravia's page
of reverse engineering
Well, another VERY remarkable essay, that I am proud to present.
SiuL+Hacky tackles NEW, UNCOVERED ground here, and teaches all of you the
first elements of Linux reverse engineering... you would have thought, as I did, that
such reversing would be useless, since the main characteristic of Linux (and
of the whole GNU initiative) is to give away the source code of every program. Yet
the deficiencies of Windoze are today so evident that more and more "commercial"
programmers are turning to Linux, despite all the efforts of Gates's lackeys. And when you
say "commercial" you of course mean limited, egotistical, pusillanimous minds that
introduce their banal protection schemes even into the Linux world,
which was uncontaminated until yesterday.
Enjoy this GREAT essay/tutorial by SiuL+Hacky,
let's hope that he will send us more essays on this subject!
BTW, you'll find inside here dasm: a disassembler for
Linux *WRITTEN* by SiuL+Hacky himself!
I. Linux Introduction.
-------------------------
Probably all of you know about linux, but I don't know how many people
have linux installed on their computers. I have (as many people do)
both operating systems on different partitions of my hard disk. Sometimes
people treat operating systems as religions (it usually happens with
editors too), so I'm not gonna tell you: INSTALL IT if you want your soul
to be saved ! If you are not sure, I think that after reading this
document you will know what to do.
A friend of mine once told me a joke that compares operating systems
with airlines. When you travel with Microsoft Airlines, you
may find beautiful women at the check-in desk, you may enjoy amazing
entertainment shows before departure, and when you board the aeroplane
it is really comfortable and full of charming stewardesses. Ok, after
take-off the aeroplane explodes and nobody knows why. When
travelling with Unix Airlines you travel safely, but the passengers
must carry the pieces of the aeroplane themselves.
Unix is for you if you feel at home working with DOS boxes under
Windows, if you are used to working in network environments, if you want
speed and safety back (your brand-new Pentium acts like a Pentium, not
like a 386) and if you find configuring W95 programs lacking in
excitement. You may recover that bittersweet feeling of being in the
middle of a desert island when things go wrong. But if you hate
command line programs with thousands of switches, unix is not for you.
One of the main characteristics of linux is that it's a "free
environment". The applications (and the kernel itself) are developed by
people who offer them to "the world" completely free. Most
applications are developed (more or less) under the GNU License. Moreover,
a lot of the programs are provided with the source code (and you
compile it). Though linux has been ported to several platforms, it is
especially popular on x86 computers, and many users come from DOS.
II. A Cracker inside Linux world.
---------------------------------
Linux is cool for hacking, but I had never heard anything about
cracking in linux. As I told you, software is free and there's no
"bunch of shareware programmers". Imagine ... protecting a program and
giving you the source code, really nonsense.
But wait, Linux is not perfect, and its programs are not always beautiful
and user-friendly. One of the problems I found with linux from the start
is multimedia. Multimedia is new in the Dos/Windows world, so the old unix
dinosaur, which hasn't changed in the last twenty years (though if you
look inside the "new" operating systems they are not that different), was
not supposed to have a lot of multimedia support. I have a cheap
Soundblaster clone, and I cannot make it "cry" through my speakers. I
am not waiting for Dennis Ritchie to say "bye bye" when I log out,
but I like to "play" with sound algorithms and other stuff.
Surprisingly, in just one day I downloaded two sound programs with the
same nasty protections as their DOS brothers. It is really strange,
and I don't know if it is going to be usual in the future; probably it
will depend on Microsoft (once more), and on whether it finally gets into
the Linux world (for now it is just a rumour). Anyway, I decided to crack
them.
In Linux, people usually program in C (the Linux kernel is written in C)
and I found practically no assembler references. I had no idea whether
cracking linux was gonna be easy or not, but the fact was that I had to
start practically from scratch. Most of the utilities I found are binary
utilities that come with GCC (GNU C compiler), and that every linux
user may find in the different distributions or elsewhere on the Web.
I didn't know of their existence, but I had them on my computer. Well,
this is for you.
III. Tools of the trade.
-------------------------
Here you'll find some tools that I have found or made myself, and that
will make cracking easier. Most of them have "Windoze" brothers. First of
all, a slight difference: mnemonics are written in a different way. I would
say it's even better (Sacrilegious !), but anyway you'll have no
problem getting used to these changes. You just have to be careful with
operands, especially in mov instructions, because they are reversed. I
mean:
mov source, destination
instead of the usual DOS:
mov destination, source
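To make the operand swap concrete, here is a minimal Python sketch that rewrites a few linux-style (AT&T) instructions in the DOS (Intel) order. The helper name att_to_intel and the small mnemonic table are my own illustration, and it only handles plain register and immediate operands; memory operands are rejected rather than guessed at.

```python
# Only a handful of mnemonics are mapped; the table is illustrative.
MNEMONICS = {"movl": "mov", "addl": "add", "subl": "sub", "cmpl": "cmp"}

def att_to_intel(line):
    """Rewrite 'op source,destination' (AT&T) as 'op destination, source' (Intel)."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [operand.strip() for operand in rest.split(",")]
    # Memory operands such as 0x10(%eax) need real parsing, so refuse them.
    simple = all(operand and "(" not in operand for operand in operands)
    if mnemonic not in MNEMONICS or len(operands) != 2 or not simple:
        raise ValueError("unsupported instruction: " + line)
    # Drop the $ (immediate) and % (register) prefixes used by the linux syntax.
    source, destination = (operand.lstrip("$%") for operand in operands)
    return "%s %s, %s" % (MNEMONICS[mnemonic], destination, source)

print(att_to_intel("movl $0x3,%eax"))
print(att_to_intel("cmpl %ecx,%edx"))
```

The first call prints "mov eax, 0x3": the immediate is the source and the register is the destination in both notations, only the order on the line changes.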
1) GDB. GNU Debugger.
The GNU Compiler has its own debugger; it's called gdb and it even has a
front-end for X Windows. It is neither Softice nor DOS Debug: it
is designed to work with source code and executables that carry debug
information. You can debug a program at the assembler level, but
it is not comfortable. For example, by default you see neither the
current assembler instruction nor the registers. This is not meant to be a
replacement for the gdb man page. There is lots of useful information
in books and INFO documents, but here you'll get some useful clues for
getting started.
It has some features that you cannot find in Softice; for instance,
you can debug a program that is already running ! You may use the
"attach" command for that. Gdb runs in a virtual console, so you may run
your favorite programs while debugging.
Assembler instructions are executed with the "stepi" and "nexti"
commands, but you cannot start the program with these commands. The
programs are interrupted with Control-C, but you will not "surf" through
every instruction of kernel code. Usually you'll stop the program (for
instance while it waits for a key) in a system call. Programs do not
usually make system calls directly, because a kernel update could
make them crash. They call C functions, and the C libraries (more or less
like DLLs) make the system calls. If you want to see a
disassembled listing, use the "disassemble" command ("disas" will do
also) + an address (0xaddress), though that address is just used to
pick a function (the function that owns the instruction at the given
address) and gdb shows you the whole listing of that function from the
start. That's not cool, you know, life is tough. At least you can see the
current instruction with "display/i $eip". After interrupting the program
use "continue" to resume execution.
The "display" command is also good for showing the value of a
particular register (don't forget the $ sign), but if you want to show all
registers use "info registers". Finally, if you want to change a
value use "set $eax=3", for instance.
There's a wide range of breakpoints. You can set the usual breakpoints
("br *address"), clear them, disable them, use conditional breakpoints
(YES!), hardware breakpoints ...
And finally the "backtrace" command is more or less like Softice's
"stack", and "finish" should act like 'p ret', but do not trust it very
much. Well, there are lots of commands; study them, but after realizing
the power of the dead listing approach, I'm sure you will not want gdb
anymore.
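If you want to fire a fixed set of these commands at a running program without typing them each time, a small Python sketch like the one below builds the command line for you. It assumes a reasonably recent gdb that understands the -batch, -p and -ex options; the function name gdb_batch_argv and the process id 1234 are made up for illustration.

```python
def gdb_batch_argv(pid, commands):
    """Build an argument list that attaches gdb to `pid` and runs `commands`."""
    # -batch: exit when the commands are done; -p: attach to a running process.
    argv = ["gdb", "-batch", "-p", str(pid)]
    for command in commands:
        argv += ["-ex", command]          # one -ex per gdb command, run in order
    return argv

# 1234 is a hypothetical process id; the commands are the ones from the text.
argv = gdb_batch_argv(1234, ["info registers", "display/i $eip", "backtrace"])
print(" ".join(argv))
# To really run it: import subprocess; subprocess.run(argv)
```

Attaching to a process that is not yours normally needs root, so the sketch only prints the command instead of launching it.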
2) STRACE
This is a really nice tool, especially for spying on a program and its
behaviour. It logs every system call made by a program, WITH
PARAMETERS and in a way you'll love, as I'll show you afterwards. I
like to use it this way:
strace -oOUTPUT_FILE -i TARGET_FILE
where OUTPUT_FILE is the file where you want the log to be dumped.
-i: prints the value of eip at the time each call was made. It seems like
bliss, but be careful: LIBRARIES USUALLY MAKE THE SYSTEM CALLS, not
programs.
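A strace log gets long very quickly, and the interesting lines are usually the calls that fail. The following Python sketch filters them out. The regular expression is my own guess at the usual "name(args) = result ERRNO" layout, with an optional "[eip]" prefix for logs made with -i, so treat it as a starting point, not as a full parser of strace output.

```python
import re

CALL = re.compile(
    r"^(?:\[\s*([0-9a-fA-F]+)\]\s+)?"      # optional [eip] written by strace -i
    r"(\w+)\((.*)\)\s*=\s*(-?\d+|\?)"       # name(args) = result
    r"(?:\s+(E[A-Z]+))?"                    # optional errno name, e.g. ENOENT
)

def failed_calls(lines):
    """Return (eip, syscall, args, errno_name) for calls that returned -1."""
    failures = []
    for line in lines:
        match = CALL.match(line.strip())
        if match and match.group(4) == "-1":
            eip, name, args, _result, errno_name = match.groups()
            failures.append((eip, name, args, errno_name))
    return failures

sample = [
    'open("./l3dec", O_RDONLY) = 4',
    'close(4) = 0',
    'open("./register.inf", O_RDONLY)=-1 ENOENT (No such file or directory)',
]
for eip, name, args, errno_name in failed_calls(sample):
    print(name, args, errno_name)
```

A file that the program looks for and does not find is often the first clue about where a registration scheme keeps its data.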
3) STRINGS
It should be a great tool, because it shows you the strings inside a
binary file, and then you can identify the evil program that is punishing
you, but there's a simpler and easier way to do it using the amazing
"grep" command. For example, if you are looking for strings such as
"Register", run this:
grep Register *
and it'll show you all the files in the current directory containing
the string "Register". The first argument of this command is a
general PATTERN, so it may be an exact match or a match as complicated
as you want (learn REGULAR EXPRESSIONS for that).
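For those who prefer a script, here is a rough Python equivalent of both ideas: one function lists the printable strings in a blob of bytes (what strings does), the other lists the files of a directory that contain a given byte sequence (what grep does here). The function names and the minimum length of 4 are my own choices, not something taken from either tool.

```python
import os
import re

def printable_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least `min_length` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

def files_containing(directory, needle):
    """Return the names of regular files in `directory` whose bytes contain `needle`."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                if needle in handle.read():
                    hits.append(name)
    return hits

print(printable_strings(b"\x00\x01Register\x00ab\x00"))
# For the current directory: print(files_containing(".", b"Register"))
```

Unlike grep, this only does exact byte matches; for patterns you would swap the "in" test for a compiled regular expression.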
4) HEX EDITORS
What is a crack without a hex editor ? ("mental" cracking is hard,
for now). There are very few of them for Unix (that I know of). Get
one of them at:
ftp://vieta.math.uni-sb.de/pub/misc/hexer-0.1.4c.tar.gz
It is "VI"-style. You know, vi is the "official" editor in Unix. It
seems that every "cool-unix-guy" must love it, or he'll be an
"amateur". I prefer JOE, which looks like old WordStar and old
WordPerfect, and you'll know how to quit the first time you run it :-).
Anyway, you may use, as I do, good DOS hex editors like Norton Diskedit
(version 4 or 5). I'm not kidding, a DOS emulator (DOSEMU) is
available for Linux, and works fine with real mode and DOS4GW programs.
There's a Windows emulator, but it has long been in "an early alpha
stage". Don't try it.
5) OBJDUMP
Well, at last a candle in the middle of the darkness. If it is difficult
to find assembler references, finding disassembling references is like
looking for Money 3.0 (perhaps FidoNet has the answer again :-). I
found only one switch in this program that gives a "dump disassembly".
This program gives you the information and data of the different
sections (more about sections later) of a linux object (executable)
file. It is possible to get the assembler listing of a program you
have written (there's a switch in the compiler), but objdump is the only
program I found that disassembles an arbitrary executable. The
problem is that there's no analysis information in the disassembled
file. Some switches of objdump:
-d: Displays the assembler mnemonics contained in the code sections.
Note that mnemonics are displayed in the "linux-way". Something like
this:
0804a37a repnz scasb %es:(%edi),%al
0804a37c notl %ecx
0804a37e movl %ecx,0xfffffc0c(%ebp)
0804a384 movb $0x0,0xfffffc16(%ebp,%ecx,1)
Download dasm.txt here!
I programmed it in PERL. Why ? Well, since my very first steps in perl
I realized it was perfect for processing text files (I knew nothing
about sed, awk ...). The syntax is not very beautiful or
high-level-looking; it's an interpreted language, so it is not the
fastest. Anyway, it always has the tools you are looking for (or that you
always dreamt of) and lets you do a lot of things at the same
time. It's very popular for CGI scripts. I learnt perl and CGI from a
very good book by Eric Herrmann. Sorry, I tried not to make it very
cryptic, but PERL is PERL, and if you don't know perl you probably
won't understand it. For this reason I'll explain how it works.
BTW a perl interpreter (perl 5.0) may be found in any LINUX
distribution, though interpreters for DOS are available too. Well,
let's start with jmp/call processing:
- The (DYNAMIC) SYMBOL TABLE is read and the elements are put into an
associative array indexed by address. For instance:
$st_element{"0xprint_address"}="print";
- Then all call / jmp instructions are processed into another
associative array, in this way:
$jumping{"jump_to_address"}="jump_from_address";
- After this, the addresses of the disassembled instructions (from the
.text section) are checked against the $jumping elements, and if an entry
exists, the reference is written.
- In the same pass, call instructions are processed, and if they call
a function from the symbol table, that is also written.
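The steps above can be sketched in a few lines. This is NOT the dasm source (dasm is perl), just a Python illustration of the same two associative arrays; the simplified "address mnemonic operands" line layout, the function name annotate and the sample symbol table are assumptions of mine.

```python
import re
from collections import defaultdict

# One instruction per line: 8 hex digits, mnemonic, operands (simplified layout).
INSTRUCTION = re.compile(r"^([0-9a-f]{8})\s+(\w+)\s*(.*)$")
BRANCH = re.compile(r"^(j\w+|call)$")

def annotate(listing, symbols):
    """Insert dasm-style reference comments into a disassembly listing."""
    parsed = [INSTRUCTION.match(line.strip()) for line in listing]
    jumping = defaultdict(list)            # target address -> source addresses
    for match in parsed:
        if match and BRANCH.match(match.group(2)):
            address, _mnemonic, target = match.groups()
            jumping[target.strip()].append(address)
    output = []
    for line, match in zip(listing, parsed):
        if match:
            address, mnemonic, operand = match.groups()
            if address in jumping:         # somebody jumps or calls here
                sources = " ; ".join(jumping[address])
                output.append("Referenced from jump/call at %s ;" % sources)
            if mnemonic == "call" and operand.strip() in symbols:
                output.append("Reference to function : " + symbols[operand.strip()])
        output.append(line)
    return output

sample = ["08052117 jne 08052150", "08052120 call 08049138",
          "08052150 movl $0x1,0xfffffd84(%ebp)"]
print("\n".join(annotate(sample, {"08049138": "printf"})))
```

Two passes are needed because a jump may point forward: the target line can only be annotated once every instruction has been seen.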
For string processing, we need further knowledge of how
executables are built in linux. The most common format is 32-bit ELF
(Executable and Linkable Format). The structure of the object is :
* ELF HEADER
* PROGRAM HEADER TABLE
* SECTION 1
* ...
* SECTION N
* SECTION HEADER TABLE
These sections become "segments" when the program is executed. Some
important sections are .init (initialization code), .fini
(termination code), .data (pretty obvious), .text (code), .rodata
(read-only data), and so on. Do you remember lesson 8.1 and Win32
exe files ? Don't you think it's pretty much the same ?
These are ELF-TYPES:
Elf32_Addr 4 bytes unsigned
Elf32_Half 2 bytes unsigned
Elf32_Off 4 bytes unsigned
Elf32_Sword 4 bytes signed
Elf32_Word 4 bytes unsigned
And ELF Header is something like this:
typedef struct {
    unsigned char e_ident[16];
    Elf32_Half    e_type;
    Elf32_Half    e_machine;
    Elf32_Word    e_version;
    Elf32_Addr    e_entry;
    Elf32_Off     e_phoff;
    Elf32_Off     e_shoff;
    Elf32_Word    e_flags;
    Elf32_Half    e_ehsize;
    Elf32_Half    e_phentsize;
    Elf32_Half    e_phnum;
    Elf32_Half    e_shentsize;
    Elf32_Half    e_shnum;
    Elf32_Half    e_shstrndx;
} Elf32_Ehdr;
For us, the important member is e_shoff, which holds
the file offset of the Section Header Table. The SHT is an array of
Elf32_Shdr structures. The member e_shnum gives the number of entries
in the SHT, and e_shentsize gives the size in bytes of each entry.
This is the Elf32_Shdr:
typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off  sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;
The offset of each section is taken from its sh_offset member. The
name of each section is a little bit more complicated, because sh_name
is an index into the section header string table section. Well, stop,
I don't want you to get confused. Fortunately, objdump gives us that
information. Strings are located in the .rodata section (for obvious
reasons), and objdump gives the file offset of the section. If you
want complete information on the ELF format, there's a PostScript document
for you:
ftp://tsx-11.mit.edu/pub/linux/packages/GCC/ELF.doc.tar.gz
There (or on any mirror), you'll find a lot of interesting things.
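If you would rather not trust objdump, the two structures are enough to walk the Section Header Table yourself. The following Python sketch assumes a little-endian 32-bit ELF file (the usual x86 case); the function name read_sections is mine, and the struct format strings simply mirror the member order of Elf32_Ehdr and Elf32_Shdr.

```python
import struct

def read_sections(image):
    """Return {section_name: (sh_addr, sh_offset, sh_size)} for an ELF32 LSB image."""
    # e_ident: magic, then class (1 = 32 bits) and data encoding (1 = little-endian).
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    # The Elf32_Ehdr members that follow e_ident[16], in declaration order.
    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from("<HHIIIIIHHHHHH", image, 16)
    headers = []
    for index in range(e_shnum):
        # Each Elf32_Shdr is ten 4-byte members.
        headers.append(struct.unpack_from("<10I", image, e_shoff + index * e_shentsize))
    strtab_offset = headers[e_shstrndx][4]     # sh_offset of the section name table
    sections = {}
    for sh_name, _type, _flags, sh_addr, sh_offset, sh_size, *_rest in headers:
        start = strtab_offset + sh_name        # sh_name indexes into that table
        end = image.index(b"\x00", start)      # names are null terminated
        sections[image[start:end].decode("ascii")] = (sh_addr, sh_offset, sh_size)
    return sections

# Usage, with a file of your own:
#   with open("some_elf_file", "rb") as handle:
#       print(read_sections(handle.read())[".rodata"])
```

The tuple returned for .rodata holds its address in memory, its offset in the file and its size in bytes.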
Ok, then for string processing, dasm reads the file offset of the .rodata
section and gets its content from the binary file. We have the starting
address and size of the .rodata section, so string processing goes like
this:
- The whole .rodata section is read into a variable.
- Dasm looks for immediate operands (with the $ prefix) and checks whether
they belong to the .rodata section.
- If so, the (null terminated) string is extracted from the .rodata
section, and the reference is written.
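Again as an illustration only (the real dasm is perl), the string lookup can be sketched in Python like this. The function name string_references is mine, and the section start address 0x806fc00 used in the example is invented so that the numbers are easy to follow.

```python
import re

IMMEDIATE = re.compile(r"\$0x([0-9a-fA-F]+)")   # immediates carry the $ prefix

def string_references(instruction, rodata, rodata_address):
    """Return the .rodata strings addressed by the immediates of one instruction."""
    found = []
    for digits in IMMEDIATE.findall(instruction):
        offset = int(digits, 16) - rodata_address
        if 0 <= offset < len(rodata):                 # the value points into .rodata
            end = rodata.find(b"\x00", offset)        # strings are null terminated
            if end == -1:
                end = len(rodata)
            found.append(rodata[offset:end].decode("ascii", "replace"))
    return found

# Hypothetical section: 8 padding bytes, then one string, loaded at 0x806fc00.
rodata = b"\x00" * 8 + b"License expired: %02d/%04d\x00"
print(string_references("pushl $0x806fc08", rodata, 0x806fc00))
```

Any immediate that happens to fall inside the section is reported, even if the program uses it as a plain number, which is why the references can only be called "possible".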
The rest is dirty details about format processing. The program calls
objdump, and you just have to use it this way:
dasm exec_file processed_output_file
I've tested it with several programs, but if you find any bug or problem,
or you have any question, suggestion or whatever, report it to me
at:
lluisote@hotmail.com
NOTE: In dasm, I don't use the hex values of the instructions (switch
--show-raw-insn), because the output is not tabbed and it wastes disk
space. When we need this data, I'll show you how to get it easily.
IV. THE CRACKS
---------------
To apply all this theory, we're gonna crack the couple of programs
I told you about. I chose them because they are very different and
appropriate for beginners, as you'll see. The first one is a disabled
program with password registration; the second one is a trial with two
levels of time protection and the same nasty behaviour as its windows
brothers.
1) ftp://ftp.fhg.de/pub/layer3/l3v270.linux.tar.gz
What the hell is this ? Well, it's an encoder/decoder for MPEG layer
III. If you don't know about it, it's a standard for audio compression
(a really exciting subject). Every time you run the decoder you're
asked to enter a registration code, because sample rates and
other features are restricted to "registered users".
Let's have some fun with the new tools: "strace -oSalida l3dec" will
dump the system calls into a file called Salida. Do it, answer that you
don't want to enter a registration code, and you get something like this
(filtered by me):
write(2, "\n*** l3dec V2.70 ISO/MPEG Au"..., 71) = 71
write(2, "| "..., 71) = 71
write(2, "| copyright Fraunhofer"..., 71) = 71
write(2, "| "..., 71) = 71
<<<< Look! It is writing the file header
open("./l3dec", O_RDONLY) = 4 <<<< get current directory
close(4) = 0
open("./register.inf", O_RDONLY)=-1 ENOENT (No such file or directory)
<<< FILE sndconf
seconds of evaluation time left -> FILE modules/soundbase
The second file is not an executable; it is a "relocatable ELF file" (a
module). No problem. It is logical: for a countdown, the protection
must dwell in a resident program. This protection is a little bit more
complicated than the first one, but it is not a tough protection at all.
Dasm sndconf, and look for "License expired" (Be indulgent with this
long listing, trust me, it's easy):
08052101 cmpl %esi,0x10(%eax); <<<< some comparing
08052104 jl 08052110; <<<< if not less flag=0
08052106 movl $0x0,0xfffffd84(%ebp)
Referenced from jump/call at 080520f3 ; 08052104 ;
08052110 cmpl $0x0,0xfffffd84(%ebp); <<< flag=1 seems to be good
08052117 jne 08052150; <<< jump somewhere
08052119 pushl %ebx; <<< the game is over outlaw!
0805211a pushl %edi
Possible reference to string:
"License expired: %02d/%04d"
0805211b pushl $0x806fc08
Reference to function : printf
08052120 call 08049138
Possible reference to string:
"Please download a fresh version from http://www.4front-tech.com"
08052125 pushl $0x806fb97
Reference to function : printf
0805212a call 08049138
0805212f pushl %ebx
08052130 pushl %edi
Possible reference to string:
"License expired: %02d/%04d"
08052131 pushl $0x806fc08; <<<< I love this formatted strings
08052136 pushl $0x807e6d0
Reference to function : fprintf
0805213b call 08049368
08052140 addl $0x20,%esp
08052143 pushl $0xffffffff
Reference to function : exit
08052145 call 08049598; <<<< beggar off
0805214a leal 0x0(%esi),%esi
Referenced from jump/call at 08052117 ;
<<< Do you remember the flag ?
08052150 movl $0x1,0xfffffd84(%ebp); <<< jump here if above flag=1
0805215a movl 0xfffffd94(%ebp),%eax
08052160 movl %eax,0xfffffd80(%ebp)
08052166 decl %eax
08052167 movl %eax,0xfffffd94(%ebp)
0805216d movl 0xfffffd80(%ebp),%esi
08052173 decl %esi
08052174 jns 08052186
08052176 decl 0xfffffd90(%ebp)
0805217c movl $0xb,0xfffffd94(%ebp)
Referenced from jump/call at 08052174 ;
08052186 movl 0xfffffd7c(%ebp),%eax
0805218c movl 0x14(%eax),%edx
0805218f movl 0xfffffd90(%ebp),%ecx
08052195 cmpl %ecx,%edx
08052197 jle 080521a3; <<< jumping flag=0
08052199 movl $0x0,0xfffffd84(%ebp);<<< flag=0 BAD GUY !
Referenced from jump/call at 08052197 ;
080521a3 cmpl %edx,%ecx
080521a5 jne 080521c2
080521a7 movl 0xfffffd94(%ebp),%eax
080521ad movl 0xfffffd7c(%ebp),%esi
080521b3 cmpl %eax,0x10(%esi)
080521b6 jl 080521c2; <<< again
080521b8 movl $0x0,0xfffffd84(%ebp)
Referenced from jump/call at 080521a5 ; 080521b6 ;
080521c2 pushl %ebx
080521c3 pushl %edi
Possible reference to string:
"License will expire after: %02d/%04d"
080521c4 pushl $0x806fc24
Ejem, if flag=1 your license doesn't expire, and there are a lot of
places that set flag=0. Pretty obvious. Use your favorite dos/unix
hex editor (or copy the file to your dos partition, reboot and run the
damned Windoze hex editor) and do a global Search/Replace:
(... objdump -d --show-raw-insn sndconf | grep 080521b)
Every
c7 85 84 fd ff ff 00 00 00 00 movl $0x0,0xfffffd84(%ebp)
changes to:
c7 85 84 fd ff ff 01 00 00 00 movl $0x1,0xfffffd84(%ebp);ALWAYS GOOD!
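If you have no hex editor at hand, the same global search and replace takes a few lines of Python. The two byte strings are exactly the ones shown above; the function name patch_all and the idea of writing to a copy instead of the original file are my own additions.

```python
OLD = bytes.fromhex("c78584fdffff00000000")   # movl $0x0,0xfffffd84(%ebp)
NEW = bytes.fromhex("c78584fdffff01000000")   # movl $0x1,0xfffffd84(%ebp)

def patch_all(image, old=OLD, new=NEW):
    """Replace every occurrence of `old` with `new`; return (patched, count)."""
    if len(old) != len(new):
        raise ValueError("a patch must not change the file size")
    return image.replace(old, new), image.count(old)

# Usage sketch (the output file name is only an example):
#   with open("sndconf", "rb") as handle:
#       patched, count = patch_all(handle.read())
#   with open("sndconf.patched", "wb") as handle:
#       handle.write(patched)
#   print(count, "instructions patched")
```

Checking the count before trusting the result is worthwhile: zero hits means you are looking at a different version of the file.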
You'll notice that even the message disappears. But we must get rid of
the countdown too. Dasm soundbase and look for "seconds" (you may see
that this file has line information):
Possible reference to string:
"OSS: The evaluation time has elapsed. Please reload the driver."
<<<< if you're executing this part
<<<< you are a really bad guy
00005901 <sound_open_sw+71> pushl $0x944
RELOC: 00005902 R_386_32 .rodata; << look! objdump sometimes helps
00005906 <sound_open_sw+76> call 00005907 <sound_open_sw+77>
0000590b <sound_open_sw+7b> movl $0xffffffed,%eax
Possible reference to string:
"d: Driver partially removed. Can't open device" <<<< String references sometimes fail
00005910 <sound_open_sw+80> addl $0x4,%esp
00005913 <sound_open_sw+83> popl %ebx
00005914 <sound_open_sw+84> popl %esi
00005915 <sound_open_sw+85> ret
00005916 <sound_open_sw+86> leal 0x0(%esi),%esi
00005919 <sound_open_sw+89> leal 0x0(%esi,1),%esi
Referenced from jump/call at 000058ff ;
00005920 <sound_open_sw+90> movl 0x0,%eax
RELOC: 00005921 R_386_32 jiffies_R2f7c7437
00005925 <sound_open_sw+95> subl %eax,%edx
00005927 <sound_open_sw+97> movl %edx,%eax
Possible reference to string:
"en configured"
00005929 <sound_open_sw+99> movl $0x64,%ecx
0000592e <sound_open_sw+9e> xorl %edx,%edx
00005930 <sound_open_sw+a0> divl %ecx,%eax
00005932 <sound_open_sw+a2> pushl %eax
Possible reference to string:
"OSS: %d seconds of evaluation time left" <<< Here you are a not so good guy
00005933 <sound_open_sw+a3> pushl $0x99e
RELOC: 00005934 R_386_32 .rodata
00005938 <sound_open_sw+a8> call 00005939 <sound_open_sw+a9>
RELOC: 00005939 R_386_PC32 printk_Rad1148ba; << printing what?
Possible reference to string:
"river partially removed. Can't open device"
0000593d <sound_open_sw+ad> addl $0x8,%esp
Referenced from jump/call at 000058e8 ; 000058ec ; 000058f6 ;
00005940 <sound_open_sw+b0> movl %ebx,%eax; <<<I want to jump here !
Look at this before seeing the rest of the code:
- If you are a not so good guy you come from 58ff
- You bypass the countdown message if you come from 58e8, 58ec or 58f6
- If you take none of these jumps you are a really bad guy.
It seems to be a REAL HOT AREA. Ok, you cannot wait any longer, I'll show you:
000058e0 <sound_open_sw+50> movl 0x1148,%edx
RELOC: 000058e2 R_386_32 .data
000058e6 <sound_open_sw+56> testl %edx,%edx
000058e8 <sound_open_sw+58> je 00005940; <<< FIRST OPPORTUNITY
000058ea <sound_open_sw+5a> testl %ebx,%ebx
000058ec <sound_open_sw+5c> je 00005940; <<< SECOND ONE
000058ee <sound_open_sw+5e> movl %ebx,%eax
Possible reference to string:
"artially removed. Can't open device"
000058f0 <sound_open_sw+60> andl $0xf,%eax
Possible reference to string:
" Driver partially removed. Can't open device"
000058f3 <sound_open_sw+63> cmpl $0x6,%eax
000058f6 <sound_open_sw+66> je 00005940; <<< THIRD ONE
000058f8 <sound_open_sw+68> movl 0x0,%eax
RELOC: 000058f9 R_386_32 jiffies_R2f7c7437
000058fd <sound_open_sw+6d> cmpl %edx,%eax
000058ff <sound_open_sw+6f> jbe 00005920; <<< LAST ONE EVEN BEING
<<< A NOT S.G. GUY
To be honest, I don't like this variety. If you look for references to the
FIRST key variable 0x1148 (apparently 0x1148=0 is a good thing), it
is never (directly) set to 0. I don't like that; perhaps patching it works,
but I prefer the other two options (which deal with the same thing).
Change:
000058f0 <sound_open_sw+60> 83 e0 0f andl $0xf,%eax
000058f3 <sound_open_sw+63> 83 f8 06 cmpl $0x6,%eax
000058f6 <sound_open_sw+66> 74 48 je 00005940
to:
000058f0 <sound_open_sw+60> 83 e0 0f andl $0xf,%eax
000058f3 <sound_open_sw+63> 83 f8 06 cmpl $0x6,%eax
000058f6 <sound_open_sw+66> eb 48 jmp 00005940
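Two small checks are worth doing before touching the byte: that the displacement really leads to 5940, and that the byte you are about to overwrite really is the je opcode. A Python sketch of both follows; the function names are mine, and the file offset of the instruction is left for you to work out from the offset of the code section in the file.

```python
def short_jump_target(address, encoding):
    """Target of a 2-byte short jump: next instruction + signed 8-bit displacement."""
    displacement = encoding[1] - 256 if encoding[1] >= 128 else encoding[1]
    return address + 2 + displacement

def force_jump(image, file_offset):
    """Turn `je rel8` (opcode 0x74) at `file_offset` into `jmp rel8` (opcode 0xeb)."""
    if image[file_offset] != 0x74:
        raise ValueError("expected a je opcode (0x74) at this offset")
    patched = bytearray(image)
    patched[file_offset] = 0xEB            # same displacement, unconditional jump
    return bytes(patched)

# je at 58f6 encoded as 74 48: 0x58f6 + 2 + 0x48 = 0x5940
print(hex(short_jump_target(0x58F6, bytes.fromhex("7448"))))
# The three instructions of the listing, with the je starting at index 6.
print(force_jump(bytes.fromhex("83e00f83f8067448"), 6).hex())
```

Only the opcode changes, so the jump still lands on 5940 and the file keeps its size.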
It apparently works, and I say apparently 'cause I told you before that
this buggy module doesn't work anyhow :-)
Well, easy cracks for a new area. Good linuxing !
SiuL+Hacky
(c) SiuL+Hacky 1997. All rights reversed