+HCU papers
courtesy of fravia's page of reverse engineering
SLH
FUTURE VISION
The suppression and resurrection of assembler programming.
Historical perspective
~~~~~~~~~~~~~~~~~~~~~~
A long time ago, in a world far away, the designers of the millennium bug
scribbled up flow charts in ancient ciphers like COBOL and FORTRAN, sent them
to the girls in the punch card room who dutifully typed out the punch cards,
and sent the result to an air conditioned boiler room downstairs, where they
were loaded into a card reading machine and run overnight.
This was a cosy corporate world of high profits, high salaries and endless
accolades for being on the leading edge of the computer age. The writing on
the wall started to be read when the oil price rises of the seventies took
their toll on the profitability of the corporate sector, and the cost of
running and upgrading mainframes generated the need for a cheaper alternative.
The birth of the PC concept was an act of economic necessity and it heralded
a major turn in the computer world where ordinary people were within
reasonable cost range of owning a functional computer. When Big Blue
introduced the original IBM PC, it put a blazingly fast 8088 processor
screaming along at 4.77 megahertz into the hands of ordinary people.
The great success of the early PCs came from empowering ordinary people, on
a technical level, to do complex things in a way that they could understand.
Early Programming
~~~~~~~~~~~~~~~~~
The early tools available for development on PCs were very primitive by
modern standards, yet they were a great improvement over what was available
earlier. For anyone who had seen the electronic hardware guys hand coding
instructions for EPROMs in hex, the introduction of a development tool like
DEBUG was a high tech innovation.
PCs came with an ancient dialect of ROM BASIC: if you switched the
computer on without a floppy disk in the boot drive, it started up in
BASIC. This allowed ordinary users to dabble with simple programs that
did useful things without the need for a room full of punch card
typists and an air conditioned boiler room downstairs with an array of
operators feeding in the necessary bits to keep a mainframe going.
The early forms of assembler were reasonably hard going, and software
tended to take months of hard work using rather primitive tools. This gave
birth to the need for a powerful low level language for the PC that would
improve the output.
C filled this gap: it had the power to write at operating system level,
and as the language improved, it gained the capacity to write assembler
directly inline with the C code.
If the runtime library functions could not do what you wanted, you simply
added an asm block,
    asm
    {
        instruction ...
        instruction ...
        instruction ...
    }
and compiled it directly into your program.
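As an illustration only, a concrete block under one of the 16 bit DOS C
compilers might have looked something like the sketch below. The keyword and
syntax varied between vendors (Borland spelt it asm, Microsoft _asm), and the
helper name here is invented, so treat the details as assumptions rather than
a recipe.
/*--------------------------------------------------------------*/
void bang(void)                 /* hypothetical helper function    */
{
    asm                         /* Borland spelling; _asm under MS */
    {
        mov ah, 0x02            /* DOS function 02h, display char  */
        mov dl, 0x21            /* ASCII 21h is '!'                */
        int 0x21                /* call DOS                        */
    }
}

int main(void)
{
    bang();                     /* prints a single '!' on screen   */
    return 0;
}
/*--------------------------------------------------------------*/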
As the tools improved, driven by market demand, the idea of a
common object file format emerged, which dramatically increased the power
that programmers had available.
Different languages had different strengths which could be exploited to
deliver ever more powerful and useful software.
C had the architecture to write anything up to an operating system.
Pascal had developed into a language with a very rich function set that
was used by many games developers.
Basic had emerged from its spaghetti code origins into a compiler that
had advanced capacity in the area of dynamic memory allocation and string
handling.
The great unifying factor in mixed language programming was the ability to
fix or extend each language by writing modules in assembler.
Modern Programming
~~~~~~~~~~~~~~~~~~
By the early nineties, modern assemblers came with integrated development
environments, multi language support in calling conventions and powerful
and extendable macro capacities which allowed high level simulations of
functions without the overhead associated with high level languages.
To put some grunt into a deliberately crippled language like Quick Basic,
you wrote a simple assembler module like the following,
;--------------------------------------------------------------
        .Model Medium, Basic
        .Code

fWrite  Proc handle:WORD, Address:WORD, Length:WORD

        mov ah, 40h             ; DOS function 40h, write to file or device.
        mov bx, handle          ; the file, screen or printer handle.
        mov cx, Length          ; the number of bytes to write.
        mov dx, Address         ; DS:DX points to the data to write.
        int 21h                 ; get DOS to execute the function.

        ret                     ; Return to Basic.

fWrite  Endp

End
;--------------------------------------------------------------
Change the memory model to [ .Model Small, C ] and you had a printf
replacement with one tenth the overhead.
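As an illustration only, the C side of that arrangement might look something
like the sketch below, assuming the module above has been reassembled with
[ .Model Small, C ] and linked into a small model DOS program. The string and
names are invented; on success DOS leaves the number of bytes written in AX,
which the C caller sees as the return value.
/*--------------------------------------------------------------*/
/* fWrite comes from the assembler module, now using the C      */
/* calling convention; in the small model a near data pointer   */
/* fits into a plain 16 bit unsigned int.                       */
extern unsigned int fWrite(unsigned int handle,
                           unsigned int address,
                           unsigned int length);

static char msg[] = "Hello from fWrite\r\n";

int main(void)
{
    /* handle 1 is the screen (standard output), handle 4 the printer */
    fWrite(1, (unsigned int) msg, sizeof msg - 1);
    return 0;
}
/*--------------------------------------------------------------*/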
Code as simple as this allowed you to write to files, the screen or a
printer, just by passing the correct handle to the function.
Simply by specifying the calling convention, the programmer could extend
C, Pascal, Basic, Fortran or any other language they wrote in so that it
delivered the capacity they wanted.
This was close to the heyday of flexible and powerful software
development in the hands of non-corporate developers. The monster looming
on the horizon came to fruition as a consequence of corporate greed on one
hand and expediency on the other.
The Decline
~~~~~~~~~~~
Legal wrangling about the ownership of UNIX in the early nineties crippled
its development for long enough to leave the door open for the early version
of Windows to gain enough popularity to be developed further. With the
advent of the upgraded version 3.1, DOS users had a protected mode, graphics
mode add on that offered extended functionality beyond the old DOS 640k limit.
The great divide started by stealth: development tools for version 3.1 were
thin on the ground for a long time, and the technical data necessary to write
protected mode software was proprietary and very expensive.
Even after parting with a reasonably large amount of hard currency, the
version of C and the SDK that was supposed to be the be all and end all
came with a development environment that crashed and crashed and crashed.
The documentation could only be classed as poor, and it dawned on most who
bothered that the proprietor couldn't care less.
The sales were theirs, and they no longer needed the little guys who had
supported them on the way up.
The Fall
~~~~~~~~
Over the duration of 16 bit Windows, the little guys made a reasonable
comeback and produced some often very good and reliable software, but the
die had been cast. The reins of proprietary control drew tighter and
tighter while the support for the expensive software became poorer and
poorer.
The problem for the corporate giants was that the world was finite and
market saturation was looming over their heads in the very near future.
Their solution was to gobble up the smaller operators to increase their
market share and block out the little guys by controlling the access to
the development tools.
The Great Divide
~~~~~~~~~~~~~~~~
Many would say, why would anyone bother to write in assembler when we have
Objects, OLE, DDE, Wizards and Graphical User Interfaces? The answer is
simple: EVERYTHING is written in assembler, and the things that pretend to
be development software are only manipulating someone else's assembler.
Market control of the present computer industry is based on the division
between those who produce useful and powerful software and those who are
left to play with the junk that is passed off on the market as development
tools.
Most programmers these days are just software consumers to the Corporate
sector and are treated as such. As the development tools get buggier and
their vendors spend more time writing their Licence Agreements than they
appear to spend debugging their products, the output gets slower and more
bloated and the throughput of finished software is repeatedly crippled by
the defects in these "objects".
A simple example of market control in development software occurs in the
Visual Basic environment.
Visual Basic has always had the capacity to pass pointers to its variables.
This is done by passing the value by REFERENCE rather than by VALUE. The
problem is that the VB developer does not have access to the pointer and
has to depend on expensive aftermarket add ons to do simple things.
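As a purely illustrative sketch, the sort of helper that the add-on vendors
charge for can be a one liner. The example below assumes 32 bit Visual Basic
calling a StdCall routine exported from a DLL; the function name, DLL name
and Declare line are hypothetical, and name decoration may mean you need a
.def file or an Alias clause to make the export visible under that name.
/*--------------------------------------------------------------*/
/* Because VB passes the variable by REFERENCE, the "value"     */
/* that arrives here is already the pointer - the helper just   */
/* hands it straight back to the caller.                        */
__declspec(dllexport) unsigned long __stdcall VarAddress(void *var)
{
    return (unsigned long) var;
}

/* On the VB side (hypothetical):                               */
/*   Declare Function VarAddress Lib "ptrhelp.dll" _            */
/*       (var As Any) As Long                                   */
/*   addr& = VarAddress(SomeVariable)                           */
/*--------------------------------------------------------------*/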
Visual Basic has been deliberately crippled for commercial reasons.
This is something like downloading and running a function crippled piece
of shareware, except that you have already paid for it. There are times
when listening to the hype about enterprise solutions is no more than a
formula for an earache.
Why would a language as powerful as C, and its successor C++, ever need to
use a runtime DLL? The answer again is simple: programs that have a startup
size of over 200k are not a threat to corporate software vendors who are
in a position to produce C and assembler based software internally.
The great divide is a THEM and US distinction between who has the power to
produce useful things and who is left to play with the "cipher" that passes
as programming languages.
In an ideal world, a computer would be a device that knew what you thought
and prepared information on the basis of what you needed. The problem is
that the hardware is just not up to the task. It will be a long time into
the future before processors do anything more than linear number crunching.
The truth function calculus that processors use through the AND, OR, NOT
instructions is a useful but limited logic. A young Austrian mathematician
by the name of Kurt Godel produced a proof in 1931 that axiomatic systems
developed from the symbolic logic of Russell and Whitehead had boundaries
in their capacity to deliver true statements.
This became known as Godel's incompleteness theorem, and it helps to put
much of the hype about computers into perspective. The MAC user who asks
the question "Why won't this computer do what I think?" reveals a disposition
related to believing the hype rather than anything intrinsic about the
68000 series Motorola processors.
Stripped of the hype surrounding processors and operating systems, the
unsuspecting programmer is left barefoot, naked and at the mercy of large
greedy corporations using their market muscle to extract more and more money
by ruthlessly exploiting the need to produce software that is useful.
Computer processors for the foreseeable future will continue to be no more
than electric idiots that switch patterns of zeros and ones fast enough
to be useful. The computer programmer who will survive into the future is
the one who grasps this limitation and exploits it by learning the most
powerful of all computer languages, the processor's NATIVE language.
The Empire Fights Back
~~~~~~~~~~~~~~~~~~~~~~
The Internet is the last great bastion of freedom of thought and this is
where the first great battle has been won.
The attempt to make the Internet into a corporate controlled desktop has
been defeated for the moment. Choose your browser carefully or you may
help shoot yourself in the foot by killing off the alternative.
Control of knowledge is the last defence of the major software vendors and
it is here that they are losing the battle. The Internet is so large and
uncontrollable that the dispossessed who frequent its corridors have started
to publish a vast array of information.
Assembler is the "spanner in the works" of the greedy corporate sector.
There are some excellent technical works that have been contributed by
many different authors in the area of assembler. The very best in this
field are those who have honed their skills by cracking games and other
commercial software.
It should be noted that the hacking and cracking activities of the fringe
of computing are a different phenomenon from cracking the protection schemes
of games and commercial software. The fringe play with fire when they attack
security information and the like, and complain when they get their fingers
burnt. The attempt by the major software vendors to place reverse
engineering activities in the same class as the fringe is deliberate
disinformation.
These authors are at the leading edge of software research and, like most
highly skilled people, their knowledge is given out freely and is not
tainted by the pursuit of money. It comes as no surprise that the corporate
sector is more interested in suppressing this knowledge than it is in
suppressing the WAREZ sites that give away its software for free.
The Comeback Trail
~~~~~~~~~~~~~~~~~~
Start with the collection of essays by the +ORC. You will find an incisive
mind that gives away this knowledge without cost. Start looking for some
of the excellent tools that can be found on the Internet, ranging from
dis-assemblers to software in-circuit emulators (SoftIce).
There are some brilliant essays written by _mammon on how to use SoftIce
which are freely available.
Dis-assemblers are a supply of enormous quantities of code to start
learning how to read and write assembler. The best starting point is the
nearly unlimited supply of DOS COM files, and for good reason: they are
simple in structure, being memory images, and are usually very small in
size.
The other factor is an eye to the future. COM files are an escapee from
early eighties DOS programming, when most PCs only had 64k of memory. This
means that they are free of the later and far more complex segment
arithmetic that DOS and 16 bit Windows EXE files are cursed with.
The emerging generation of 32 bit files are called Portable Executables and
they are written in what is called the FLAT memory model, where there is no
64k limit. COM files were restricted to 64k absolute but could directly read
and write anything in their address space.
A portable executable file has a very similar capacity except that in 32 bit
it can theoretically read and write anything within a 4 gigabyte address
space. In a very crude sense, PE files are 32 bit COM files but without
some of the other limitations.
A very good dis-assembler for COM files is SOURCER 7. Particularly in the
early stages of exploring the structure of COM files, its capacity to
add comments to the reconstructed source code makes the code much easier to
read.
To start making progress, you will need an assembler. Although they are
getting a bit harder to find, you can still obtain either MASM or TASM and
start writing your own COM files. The generic "Hello World" example comes
with a lot less code than many would think.
;----------------------- Hello.ASM ----------------------------
com_seg segment byte public             ; define the ONLY segment.
        assume cs:com_seg, ds:com_seg   ; both code & data in same segment.
        org 100h                        ; COM files load at offset 100h,
                                        ; just past the PSP.
start:
        mov ah, 40h                     ; the DOS function number (write).
        mov bx, 1                       ; the screen handle.
        mov cx, 13                      ; the length of the text, CR/LF included.
        mov dx, offset Greeting         ; the address of the text.
        int 21h                         ; get DOS to execute the function.
        mov ax, 4C00h                   ; function 4Ch, TERMINATE process,
                                        ; with return code 0.
        int 21h                         ; call DOS again to EXIT.
Greeting db "Hello World",13,10         ; specify the text as byte data.
com_seg ends                            ; define the end of the segment.
        end start
;----------------------------------------------------------------
This tiny program assembles at 31 bytes long, and it makes the point that
when you write something in assembler you get only what you write, without
a mountain of junk attached to it. Even in C, putting printf in a bare
main function with the same text will compile at over 2k. The humorous
part is that if you dump the executable, printf uses DOS function 40h to
output to the screen.
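For comparison, the bare C program referred to above would look something
like the sketch below. The 2k-plus comes from the C runtime startup code and
the printf machinery that the linker drags in, and the exact figure depends
on the compiler and memory model; the assembler version above gives you
exactly what you asked for and nothing more.
/*--------------------------------------------------------------*/
#include <stdio.h>

int main(void)
{
    printf("Hello World\r\n");  /* ends up calling DOS fn 40h anyway */
    return 0;
}
/*--------------------------------------------------------------*/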
Once you assemble a simple program of this type, immediately dis-assemble
it and have a look at your program as it has been converted from binary back
to code again. This will train your eye in the relationship between your
written code and the results of dis-assembly.
This will help you develop the skill to dis-assemble programs and read them
when you don't have the source code. Once you start on the mountain of DOS
COM files available, you will find that much of the code is very similar to
what you have written yourself, and you get to see an enormous quantity of
well written code that you can learn from without having to pay one brass
razoo for the privilege.
Some people are slightly bemused by the +ORC's reference to Zen yet if it is
understood in the sense that the human brain processes data at a rate that
makes fast computer processors look like snails racing down a garden bed,
you will start to get the idea of "feeling" the code rather than just
munching through it like a computer does.
As you read and write more code, your brain will start "pattern matching"
against other bits of code that you have already digested, and larger blocks
of code will start to become very clear.
Once you go past a particular threshold, the process of "data mapping" and
"model fitting" starts to occur. This is where you know enough to project
a model of what is happening and then test it to see if it works the way
you have modelled it. The rest is just practice and a willingness to keep
learning.
Once you get the swing of manipulating data in assembler, you will start to
comprehend the power and precision that it puts in your hands. Contrary to
the "buzz word" area of software where logic is couched in "Boolean" terms,
the foundation of logic is called "The law of excluded middle". In layman's
terms, something either is or it ain't but it can't be both.
George Boole and others like Augustus De Morgan developed parts of logic
during the nineteenth century, but it was not until Russell and Whitehead
published "Principia Mathematica" shortly before the first world war that
logic became a complete and proven system. Russell based much of this
milestone in reasoning on a very important distinction, the difference
between EXTENSIONAL and INTENSIONAL truth.
Something that is spatio-temporally "extended" in the world is subject to
the normal method of inductive proof, whereas things that are "intensional"
cannot be either proven or disproven.
Logic had been held back for centuries by the assumption that it was a
branch of metaphysics, until Russell and Whitehead delivered the proof
that logic is "hard wired" into the world.
This is important to computing in very fundamental ways. The devices that
programming is about controlling are very "hard wired" in the way that they
work. Recognise the distinction between what the devices ARE as against
what some would like them to BE, or worse, the bullshit that is peddled
to an unsuspecting public about the "wonders" of computers, and you have
made one of the great leaps forward.
The devices are in fact very powerful at manipulating data at very high
speed and can be made very useful to the most powerful of all processors,
the conceptual apparatus of the brain using it.
The only reason why this distinction has ever been inverted is through the
greed and arrogance of corporate software vendors and their desire to
extract yet another quick and dirty buck.
In this sense, the large commercial vendors are in the same class as the
proliferation of low class smut vendors clogging up the Internet: they lure
with the promise of fulfilling the "lusts of the flesh", yet when they
extract the money that they are after, they leave their victims both poorer
and unfulfilled.
Most ordinary computer users part with reasonably large sums of money when
they buy a computer and load it with software yet the promise of fun,
convenience and usefulness is more often than not followed by the bugs,
crashes and defects in the software. A Faustian bargain where the hidden
cost is not seen until the money is handed over.
The EXIT clause for the programmers who are wise enough to see that their
skills are being deliberately eroded for marketing reasons is the most
powerful tool of all, the direct processor instructions that assembler
puts in their hands.
The time to make the move to learning assembler is not open ended. DOS is
a passing world, and without the background of starting in the simpler world
that DOS has given us for so many years, the move to assembler will be
much harder and less accessible. There are probably only a couple of years
left.
If you are not robust enough to use the +ORC's formula for martinis, pure
malt has a glow to it that surpasses all understanding.