crunchi7.htm: Little essay about the various methods and viewpoints of crunching

+HCU papers

Little essay about the various methods and viewpoints of crunching
~ version December 1998 ~
by Joa ----- Part. VII

Courtesy of fravia's page of reverse engineering

Well, Joa continues his fundamental paper on crunching, this is part VII
enjoy!

12 December 98 Joa ~ crunchi1.htm Little essay about the various methods and viewpoints of crunching papers ~ fra_0126

10 June 98 Joa ~ crunchi2.htm Little essay about the various methods and viewpoints of crunching II papers ~ fra_0129

17 June 98 Joa ~ crunchi3.htm Little essay about the various methods and viewpoints of crunching III papers ~ fra_012E

17 June 98 Joa ~ crunchi4.htm Little essay about the various methods and viewpoints of crunching IV papers ~ fra_012F

17 June 98 Joa ~ crunchi5.htm Little essay about the various methods and viewpoints of crunching V papers ~ fra_012F

17 June 98 Joa ~ crunchi6.htm Little essay about the various methods and viewpoints of crunching VI papers ~ fra_012F


Little essay about the various methods and viewpoints of crunching.

Part VII: Arithmetic Crunching (crunching bits apart...)



By Joa Koester (JoKo2000@HotMail.Com)

This essay is free to be copied and read and spread as long as you are decent enough to leave my name in it. If you have corrections, comments, better examples than the ones used in this essay etc. - drop me a line. hello, hello, well, shave my legs and call me Grandpa if that wasn't a long delay, but i had some serious difficulties accessing the Internet recently. I hope that i can access you more frequently in the near future. And to sweet up the time till then a little bit here my next essay about crunching methods: Arithmetic Crunching! What is Arithmetic Crunching? Arithmetic Crunching is based on the observation that a certain floating point number decrements to a much smaller degree the higher the factor is with which you multiply it. For us that means, that that arithmetic crunching a based on probabilities like Huffman. But as you will see, much better. Wanna example? Well, imagine a floating point number. Say, we imagine 1,00. Then imagine we have a multiplicating factor like 0,9. And now watch what happens when we multiply the 1,00 with 0,9 and then multiply the result with 0,9 again and again and ... and that some times: 1,00 * 0,9 = 0,9 0,9 * 0,9 = 0,81 0,81 * 0,9 = 0,729 ... as we continue we see that the value of the number decrements. But it decrements SLOWLY. And it decrements with the factor 0,9. This is the first key element for arithmetic crunching: you multiply a number with a factor. The higher the factor the slower the number decrements, the better the crunching. You want a practical example? Coming up... Have a look at the following byte sequence: aaaaaaaaaabbbbcccddd which consists of four chars. Counting the appearences of the char reveals the following: 10 x a 4 x b 3 x c 3 x d ------- 20 bytes As we know that Arithmetic Crunching has something to do with probabilities, let's build up a table with the probabilites of the chars: a = 50% b = 20% c = 15% d = 15% And to explain the mechanics, let's start at the most basic implementation. The basic idea is to transform the probabilities into slots, starting - for now - in floating point mode in the range from 0,0 to 1,0. So let's scale this table into a range from 0.0 to 1.0, giving the lowest value the starting range 0.0: Byte | % | Low | High a | 50% | 0.50 | 0.999999(999...) b | 20% | 0.30 | 0.499999(999...) c | 15% | 0.15 | 0.299999(999...) d | 15% | 0.00 | 0.149999(999...) So we have the char 'a' with the range from 0.5 till 0.99999. But what does that bring us? Well, we have now TWO values and we automatically have a RANGE, namely the range of the two actual calculated values. That means we have three values that are calculated: the actual range, the actual Low and the actual High. The actual range is the range that is later saved bytewise to disk, the actual High and Low are the parts that change according to the crunched byte. Just watch as we step thru the byte-sequence: (We start with Low = 0.0 and High with 1.0) Low: 0.0 High: 1.0 Range: (High - Low) = 1.0 char: a LowOfChar: 0.5 HighOfChar: 0.99999(999...) High= Low + (Range * HighOfChar)= 0.0 + (1.0 * 0.99999) = 0.99999 Low = Low + (Range * LowOfChar) = 0.0 + (1.0 * 0.5) = 0.5 What happened here? Well at first we calculated the actual RANGE. This range is our arithmetic number, yielding the values of the chars we are crunching. Then we calculate the High and then the Low. These are build upon the slots (the ranges) of the actual char. Because we are having TWO values we have to calculate also two new values. The High is to be calculated with the high position of the actual slot (0.99999 for the char a, 0.49999 for the char b, etc.). We multiply the high and low position with the actual range, because this range is recreated dynamically with every byte we crunch. At first the range is 1,0. But with every byte the range becomes smaller and smaller until a byte or so has to emitted to give space again and the range is shifted upwards again. If for example your range is 0.00025 that would mean that your High and Low are only 0.00025 away. At this point we have to emit the first number(s) of Low to disk and shift the rest up until we are behind the decimal point again. So, if Low was 0.12325 and High was 0.12350, our RANGE would be 0.00025. Too small to continue. First we have to emit something and then shift the rest upwards again. We emit for example 0.12 to disk and subtract this from Low and High. Low and High would then be 0.00325 / 0.00350. Then we have to shift the values up again until they are behind the decimal point: 0.325 / 0.350. The next bytes you crunch will more or less slowly bring the values nearer again and then we emit again some values, probably 0.3 and shift the rest up again, and so on. How many decimal points you emit to disk depends on your personal ideas. You could send as much decimal numbers to disk until there is a difference in High and Low (so in the example 0.12350 and 0.12325 we could have sent 0.123 to disk because these are the values that aren't different) or you could send a certain number of numbers to disk and shift upwards until a certain difference is ensured. Or something completely different. It's up to you. We calculate the High first, because the High-Value is calculated on the actual Low-Value. If we would calculate the Low-Value first, we couldn't calculate our High-Value anymore. When we calculate these values we take the actual range, multiply it with the factor(s) of the actual char and give the result back into High or Low. What that mechanism will do will become clearer when we watch the crunching of the second byte: Low: 0.5 High: 0.99999 Range: (High - Low) = 0.49999 char: a LowOfChar: 0.5 HighOfChar: 0.999999 High= Low + (Range * HighOfChar)= 0.5 + (0.49999 * 0.99999) = 0.999985 Low = Low + (Range * LowOfChar) = 0.5 + (0.49999 * 0.5) = 0.749995 What you see is that the number is still in the range of the char 'a'. What will happen when we crunch the next 8 a's ? I reckon that the number will get nearer to each other. But still it will be in the range of 0.5 - 0.99999. What we could use now is an example consisting of just 4 chars: aabc Ok. Now build the probability table: a = 50% b = 25% c = 25% And next the transforming-table from the percentages into the range of 0.0 - 1.0. Byte | % | Low | High a | 50% | 0.50 | 0.99999 b | 25% | 0.25 | 0.49999 c | 25% | 0.00 | 0.24999 Now again, watch the ranges as we skip the first two crunching steps as the result is identical with the first example. Let's watch what happens when we now reach the 'b': Low: 0.749995 High: 0.999985 Range: (High - Low) = 0.24999 char: b LowOfChar: 0.25 HighOfChar: 0.49999 High= Low + (Range * HighOfChar)= 0.749995 + (0.24999 * 0.49999) = 0.8749875 Low = Low + (Range * LowOfChar) = 0.749995 + (0.24999 * 0.25) = 0.8124925 Phew. As you can see the values made some heavy jumps. The High sank from 0.999985 down to 0.8749875, while the Low jumped from 0,749995 up to 0,8124925. I think i will not surprise you when i say the the next step will go in the same direction: Low: 0.8124925 High: 0.8749875 Range: (High - Low) = 0.062495 char: b LowOfChar: 0.0 HighOfChar: 0.24999 High= Low + (Range * HighOfChar)= 0.8124925 + (0.062495 * 0.24999) = 0.828115625 Low = Low + (Range * LowOfChar) = 0.8124925 + (0.062495 * 0.0) = 0.8124925 Oh boy, oh boy, oh boy. Look, what we have here. The Low and the High are damn near to each other. But the bytes are thru. The crunching is over. And what now? Well, all we have to do now is to emit the Low and that's all. We are done. The sequence aabc is coded into 0.8124925! It is most important to understand that we are not emitting a number, but a collection of RANGES!!! When we decrunch the number it will become even clearer: (For now the decoder is supposed to know the model) Byte | % | Low | High a | 50% | 0.50 | 0.99999 b | 25% | 0.25 | 0.49999 c | 25% | 0.00 | 0.24999 0.8124925 is in the range of the char a. So we output the char 'a'. And then? It's simple. We subtract the Low of the char and shift the rest upwards by multiplying it with the RANGE: .99999 - .5 = .49999 = Range .8124925 - .5 (=LowOfChar) = .3124925 .3124925 / .49999 (=Range) = .6249975 (Remember: dividing by a fraction is the same as multiplying with it's reciprocal. Dividing by .49999 is nearly the same as multiplying with 2.0, meaning for us the shifting we searched for.) .6249975 is in the range of the char a. So we output the char 'a'. And proceed as before: .99999 - .5 = .49999 = Range .6249975 - .5 (=LowOfChar) = .1249975 .1249975 / .49999 (=Range) = .25 Well, 0.25 is clearly in the range of b and so we emit a 'b'. Then we proceed: .49999 - .25 = .24999 = Range .25 - .25 (=LowOfChar) = .0 .0 / .24999 (=Range) = .0 0.0 is in the range of c and we emit a 'c'. After proceeding we recognize that we are finished and stop working. Again: It is most important to notice that in the crunching process we emit RANGES, or maybe better formulated, interpretations of ranges. Therefore we emit only the Low of the range in question. As a consequence when we decrunch the 'number' we subtract the LowOfRange of the char in question. After that we can shift the 'number' up by the RANGE of the char. Now two problems are emerging: Is it necessary to emit the model to the decruncher and second: How do we implement a floating point number that can be as long as some hundreds of kilobyte when we only have registers of maybe 80 or 128 bits? To answer the first question: Well, what about an adaptive model? It will not crunch efficiently in the first few hundreds bytes but then we won't crunch just texts of 200 bytes, right? The answer to the second question is a little bit more complex. Of course we can't emit a floating point number consisting of thousands of decimal numbers. But with a mathematical trick we can fake this process. I'm sure that nearly all of you once tried to programm some 3D stuff. And after some time you came up with the idea to shift up the floating point numbers by, say, 16384, and calculate then with these now integer values. After the calculation you'd then shift the result down again by 16384 and would have speeded up your floating point calculations by the factor 1000 or so. Now, based on this we could also say that the number 1.0 could be same as 0x010000 and 0.0 could be 0x0000. We would then transform the whole calculations into the realm of integer values. We can only hope that 16 bits are enough to simulate the floating point calculations, but let's think our example thru again: Byte | % | Low | High a | 50% | 0x008000 | 0x00ffff b | 25% | 0x004000 | 0x007fff c | 25% | 0x000000 | 0x003fff First byte: Low: 0x000000 High: 0x010000 Range: (High - Low) = 0x010000 char: a LowOfChar: 0x008000 HighOfChar: 0x00ffff High= Low + (Range * HighOfChar)= 0 + (0x010000 * 0x00ffff) = BANG Ooops. It seems that we have a little overload here. Hm, but there must be a way to calculate those values in integer format. It it most necessary for to output single bytes to disk. What do we do? Well, there are certain ways out of this. I will tell you one i find extremely elegant and also has the advantage of being extensible and able to crunch SIMILAR values, too. This solution comes from Gordon V. Cormack University of Waterloo cormack@uwaterloo.ca and is build upon the idea that you can see the bytes you crunch also as a sequence of BITS you crunch. Now when you observe the 8 bits of a byte and would log the countings of the 1-bit and 0-bit states of the byte you would have a High and a Low. They would just be transformed into 256 entries of two arrays. In C/C++ these array would be declared: int one[256], int zero[256]. When we then would examine one single bit of a byte we would then have to fetch the correct entry of the array and calculate the actual fraction with the values of the entries in the arrays one and zero. What we would get out of this calculation would be the factor to multiply with the actual range. Well, i include the sources for both the cruncher and the decruncher at the end of this essay (they were not formatted by me). Let's examine the source together: int max = 0x1000000, min = 0, mid, index, c, i, bytes = 0, obytes = 3; int bit; int one[256], zero[256]; for (i=0;i<256;i++) { one[i] = 1; zero[i] = 1; } Ok, some ints are created and the arrays are build and initialized. Why they are initialized with 1 is due to the fact that they are used to be divided with. Were they not initialized it would create an error "Division by Zero". for(;;){ c = getchar(); if (c == EOF) { min = max-1; fprintf(stderr,"compress done bytes in %d bytes out %d ratio %f\n", bytes,obytes, (float)obytes/bytes); break; } ... So, forever (or until End Of File) we read a char and if it's EOF we break the loop and land here: putchar(min>>16); putchar((min>>8) & 0xff); putchar(min & 0x00ff); } So we at least output three bytes. Again you can see that we output from the min-value (which is of course our Low). But that is the starting and the end of the loop. What happens in the normal course: bytes++; for (i=0;i<8;i++){ bit = (c << i) & 0x80; index = (1<> (8-i)); mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index])); if (mid == min) mid++; if (mid == (max-1)){ /* should never happen, but with floating pt? */ mid--; } if (bit) { min = mid; one[index]++; } else { max = mid; zero[index]++; } while ((max-min) <256) { max--; putchar(min >> 16); obytes++; min = (min <8) & 0xffff00; max = ((max << 8) & 0xffff00) ; if (min >= max) max = 0x1000000; } } Uh, looks complicated. Let's examine it step by step and first up reformat the source: /*-----------------------------------------------------------------------------*/ bytes++; for (i=0;i<8;i++) { bit = (c << i) & 0x80; index = (1<> (8-i)); /*--------------------------------------------------------------------------*/ mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index])); if (mid == min) mid++; if (mid == (max-1)) { /* should never happen, but with floating pt? */ mid--; } /*--------------------------------------------------------------------------*/ if (bit) { min = mid; one[index]++; } else { max = mid; zero[index]++; } /*--------------------------------------------------------------------------*/ while ((max-min) <256) { max--; putchar(min >> 16); obytes++; min = (min <8) & 0xffff00; max = ((max << 8) & 0xffff00) ; if (min >= max) { max = 0x1000000; } } /*--------------------------------------------------------------------------*/ } Ok, let's examine the source from easy to heavy: while ((max-min) <256) { max--; putchar(min >> 16); obytes++; min = (min <8) & 0xffff00; max = ((max << 8) & 0xffff00) ; if (min >= max) { max = 0x1000000; } } should mean that, if the difference between High and Low is less than 256 we emit the highest byte and shift up the rest. For the case that this difference is still present after one shifting the code is packaged into a loop. max is ensured to be greater than min. Next one: if (bit) { min = mid; one[index]++; } else { max = mid; zero[index]++; } means that if the actual bit of the actual byte is set, the Low is set to the variable mid and the specified entry of the array one is incremented by one. If however the bit is not set we set the High down to mid and increment the specified entry in the array zero. The value for mid is calculated here: for (i=0;i<8;i++) { bit = (c << i) & 0x80; index = (1<> (8-i)); /*--------------------------------------------------------------------------*/ mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index])); if (mid == min) mid++; if (mid == (max-1)) { /* should never happen, but with floating pt? */ mid--; } /* ... */ } Again, let's examine this code backwards: if (mid == min) mid++; if (mid == (max-1)) { /* should never happen, but with floating pt? */ mid--; } are just insurances that further calculations won't go beyond the boundaries of the cruncher. mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index])); based on min (Low) we calculate a new value. The formula becomes clearer when we do it in two steps: float factor = ((float)zero[index]) / (one[index] + zero[index]); mid = min + (max-min-1) * factor; So the factor is the actual count of the zero-bits of this actual bit of the actual char and some similar chars divided by the sum of the actual count of the zero-bits plus the one-bits of the actual bit of the actual char and similar chars. Sounds heavy, eh? This factor is one key-element of the actual implementation. If you would create another formula for a better factor, you could reach a better compression, as the prof already said. But back to the code. This factor is multiplied with the actual range. Then this result is added to the actual min and is put into the variable called mid which is then used to alter the value of min or max as shown above. bit = (c > (8-i)); are examples why C/C++ is loved and hated by it's programmers / opponents. Well, the first line is pretty easy. It's a bit-test to check, whether the i. bit of the char is set or not. In Assembler you could do a simple "bt reg, i" - however after that instruction the variable bit contains a 1 or a 0. The second line is a little bit more complicated. It can only really be understood if we simulate a char in the loop. Let's say we crunch the char 'a' and we would only have this line in our loop. The value for index would change as follows: char | i | index | | 01000001 = 65 | 0 | (1<0) - 1 + (65 >> (8-0)) = 0 + (65 >> (8)) = 0 01000001 = 65 | 1 | (1<1) - 1 + (65 >> (8-1)) = 1 + (65 >> (7)) = 1 01000001 = 65 | 2 | (1<2) - 1 + (65 >> (8-2)) = 3 + (65 >> (6)) = 4 01000001 = 65 | 3 | (1<3) - 1 + (65 >> (8-3)) = 7 + (65 >> (5)) = 9 01000001 = 65 | 4 | (1<4) - 1 + (65 >> (8-4)) = 15 + (65 >> (4)) = 19 01000001 = 65 | 5 | (1<5) - 1 + (65 >> (8-5)) = 31 + (65 >> (3)) = 39 01000001 = 65 | 6 | (1<6) - 1 + (65 >> (8-6)) = 63 + (65 >> (2)) = 79 01000001 = 65 | 7 | (1<7) - 1 + (65 >> (8-7)) = 127 + (65 >> (1)) = 159 a 'b' would deliver the following results: char | i | index | | 01000010 = 65 | 0 | (1<0) - 1 + (66 >> (8-0)) = 0 + (66 >> (8)) = 0 01000010 = 66 | 1 | (1<1) - 1 + (66 >> (8-1)) = 1 + (66 >> (7)) = 1 01000010 = 66 | 2 | (1<2) - 1 + (66 >> (8-2)) = 3 + (66 >> (6)) = 4 01000010 = 66 | 3 | (1<3) - 1 + (66 >> (8-3)) = 7 + (66 >> (5)) = 9 01000010 = 66 | 4 | (1<4) - 1 + (66 >> (8-4)) = 15 + (66 >> (4)) = 19 01000010 = 66 | 5 | (1<5) - 1 + (66 >> (8-5)) = 31 + (66 >> (3)) = 39 01000010 = 66 | 6 | (1<6) - 1 + (66 >> (8-6)) = 63 + (66 >> (2)) = 79 01000010 = 66 | 7 | (1<7) - 1 + (66 >> (8-7)) = 127 + (66 >> (1)) = 160 a 'c' would deliver the following results: char | i | index | | 01000011 = 67 | 0 | (1<0) - 1 + (67 >> (8-0)) = 0 + (67 >> (8)) = 0 01000011 = 67 | 1 | (1<1) - 1 + (67 >> (8-1)) = 1 + (67 >> (7)) = 1 01000011 = 67 | 2 | (1<2) - 1 + (67 >> (8-2)) = 3 + (67 >> (6)) = 4 01000011 = 67 | 3 | (1<3) - 1 + (67 >> (8-3)) = 7 + (67 >> (5)) = 9 01000011 = 67 | 4 | (1<4) - 1 + (67 >> (8-4)) = 15 + (67 >> (4)) = 19 01000011 = 67 | 5 | (1<5) - 1 + (67 >> (8-5)) = 31 + (67 >> (3)) = 39 01000011 = 67 | 6 | (1<6) - 1 + (67 >> (8-6)) = 63 + (67 >> (2)) = 79 01000011 = 67 | 7 | (1<7) - 1 + (67 >> (8-7)) = 127 + (67 >> (1)) = 160 and a 'd' would give us this: char | i | index | | 01000100 = 68 | 0 | (1<0) - 1 + (68 >> (8-0)) = 0 + (68 >> (8)) = 0 01000100 = 68 | 1 | (1<1) - 1 + (68 >> (8-1)) = 1 + (68 >> (7)) = 1 01000100 = 68 | 2 | (1<2) - 1 + (68 >> (8-2)) = 3 + (68 >> (6)) = 4 01000100 = 68 | 3 | (1<3) - 1 + (68 >> (8-3)) = 7 + (68 >> (5)) = 9 01000100 = 68 | 4 | (1<4) - 1 + (68 >> (8-4)) = 15 + (68 >> (4)) = 19 01000100 = 68 | 5 | (1<5) - 1 + (68 >> (8-5)) = 31 + (68 >> (3)) = 39 01000100 = 68 | 6 | (1<6) - 1 + (68 >> (8-6)) = 63 + (68 >> (2)) = 80 01000100 = 68 | 7 | (1<7) - 1 + (68 >> (8-7)) = 127 + (68 >> (1)) = 161 Now you see why i emphasized so much on the similar chars. The index is in a lot of cases identical and gives us a great range of possibilities. In combination with the (non-)setting of the corresponding bit the zero or the one-array is incremented exactly at this index. So, the more letters we have the more the entries 0, 1, 4, 9 and 19 will increment. The rest will be incremented according to the actual char value. But the values in these entries will grow and grow and lead to a big fraction - and so to a slower nearing of the Low and High values. If you for example crunch a normal text then the bit 7 will never be set because of the ASCII-Value of the normal letter values. Personally this way of scanning thru bytes inspired me a lot. I hope it does the same for you. This implementation is pretty neat and you should give it a try in a debugger. It's very interesting watching the value of mid, min and max grow and shrink until the first byte of min is emitted and the whole stuff shifted upwards. The decruncher is also extremely interesting as it recreates the first byte from the settings of the first three bits emitted. It's a little bit like magic, but purely mathematics. It can't be stressed enough that the Artithmetic Crunching just emits the lower interpretation of a range and that this range is then shifted upwards to give enough 'space' again for the next bytes to be crunched. So, we are not dealing with a number but with a folded and shifted heap of RANGES of certain chars. Well, uhm, i have a question... Yes, Watson? If Arithmetic Crunching is superior to Huffman, why isn't it more often used? I only see Huffman but never saw AC somewhere in file formats or so. Good question, Watson. The answer is: the basic algorithm is patented. If you ever implement a file format using AC and this file format is used in a commercial way you have to pay license fees to certain patent holders. If this was not the case our JPGs would be around 10% to 20% smaller. But, what the heck, this here is just for research purposes, isn't it? :) OK, enough talking, next time we talk about the most recent developement in crunching business: the Burrows-Wheeler-Transformation (tataaa) Till then hasta la pasta, Joa /*----------------------------------------------------------------------------*/ /* Arithmetic Coding Implementation - Compression version 0.0.0 Copyright Feb. 1993 Gordon V. Cormack Feb. 1993 University of Waterloo cormack@uwaterloo.ca All rights reserved. This code and the algorithms herein are the property of Gordon V. Cormack. Neither the code nor any algorithm herein may be included in any software, device, or process which is sold, exchanged for profit, or for which a licence or royalty fee is charged. Permission is granted to use this code for educational, research, or commercial purposes, provided this notice is included, and provided this code is not used as described in the above paragraph. */ /* This code uses a one-byte finite state predictor to drive an arithmetic coder for data compression. It should give compression nearly identical to one-byte huffman coding. Find a better predictor, and you'll have a better compressor! It handles end-of-file properly, which requires more than 5 minutes thought. */ #include main(){ int max = 0x1000000, min = 0, mid, index, c, i, bytes = 0, obytes = 3; int bit; int one[256], zero[256]; for (i=0;i<256;i++) { one[i] = 1; zero[i] = 1; } for(;;){ c = getchar(); if (c == EOF) { min = max-1; fprintf(stderr,"compress done bytes in %d bytes out %d ratio %f\n", bytes,obytes, (float)obytes/bytes); break; } bytes++; for (i=0;i<8;i++){ bit = (c << i) & 0x80; index = (1<> (8-i)); mid = min + ((max-min-1)*((float)zero[index]) / (one[index]+zero[index])); if (mid == min) mid++; if (mid == (max-1)){ /* should never happen, but with floating pt? */ mid--; } if (bit) { min = mid; one[index]++; } else { max = mid; zero[index]++; } while ((max-min) <256) { max--; putchar(min >> 16); obytes++; min = (min <8) & 0xffff00; max = ((max << 8) & 0xffff00) ; if (min >= max) max = 0x1000000; } } } putchar(min>>16); putchar((min>>8) & 0xff); putchar(min & 0x00ff); } /* Arithmetic Coding Implementation - Expansion version 0.0.0 Copyright Feb. 1993 Gordon V. Cormack Feb. 1993 University of Waterloo cormack@uwaterloo.ca All rights reserved. This code and the algorithms herein are the property of Gordon V. Cormack. Neither the code nor any algorithm herein may be included in any software, device, or process which is sold, exchanged for profit, or for which a licence or royalty fee is charged. Permission is granted to use this code for educational, research, or commercial purposes, provided this notice is included, and provided this code is not used as described in the above paragraph. */ #include main(){ int max = 0x1000000, min = 0, mid, val, index, i; int bit; char c; int one[256], zero[256]; for (i=0;i<256;i++){ one[i] = 1; zero[i] = 1; } val = getchar()<<16; val += getchar()<<8; val += getchar(); while(1) { c = 0; if (val == (max-1)) { fprintf(stderr,"expand done\n"); break; } for (i=0;i<8;i++){ index = (1<= mid) { bit = 1; min = mid; one[index]++; } else { bit = 0; max = mid; zero[index]++; } c = c + c + bit; while ((max-min) <256) { max--; val = (val << 8) & 0xffff00 | (getchar()& 0xff); min = (min << 8) & 0xffff00; max = ((max << 8) & 0xffff00) ; if (min >= max) max = 0x1000000; } } putchar(c); } }

homepage

links

anonymity

+ORC

tools

bots wars

mail_fravia

Is reverse engineering legal?

12 December 98	Joa	~	crunchi1.htm	Little essay about the various methods and viewpoints of crunching	papers	~	fra_0126
10 June 98	Joa	~	crunchi2.htm	Little essay about the various methods and viewpoints of crunching II	papers	~	fra_0129
17 June 98	Joa	~	crunchi3.htm	Little essay about the various methods and viewpoints of crunching III	papers	~	fra_012E
17 June 98	Joa	~	crunchi4.htm	Little essay about the various methods and viewpoints of crunching IV	papers	~	fra_012F
17 June 98	Joa	~	crunchi5.htm	Little essay about the various methods and viewpoints of crunching V	papers	~	fra_012F
17 June 98	Joa	~	crunchi6.htm	Little essay about the various methods and viewpoints of crunching VI	papers	~	fra_012F