# 03 - And a whole byte is not enough ... | 2B or not 2B

So, friends, on April 1st it's time to lay our cards on the table: what exactly is “2B or not 2B”? This is a joint text by the author of the work, jin_x, and the grandfather you already know, unbeliever.

Be sure to download the archive with the work from Pouet and read the introductory (April Fools') article, as well as the comments on it. Watch the first video, a practical demonstration of how two-byte code works on x86. Only then try to work through the text below.


Yes, 2B or not 2B really is an environment for launching various sizecoding works: a very simple one and perhaps the smallest in existence. However, it has its own requirements and limitations.

In case anyone has not caught on yet: the 2b.com tool is launched under DOS (DOSBox, FreeDOS, MS-DOS) and jumps into the command-line area (offset $82* of the PSP segment), thereby running the code that was passed to it on the command line in binary form. In fact, this code may well take the form of a string that can be typed on a keyboard (i.e., consist of ASCII characters with codes from 33 to 126), but more on that later.

* We write hexadecimal numbers in Pascal notation, $XX: it is convenient, and fasm accepts it.

What is important to know?


Zeroth, we recommend fasm as the assembler: all of our tools are written specifically for it.

Firstly, the main code can be at most 125 bytes long (this is the limit on the length of the command line) and starts like a regular COM program, only at offset $82 instead of the usual $100. A carriage return (CR) character with code 13 ($0D) is automatically appended right after the main code, and the instruction jmp short $82 ($EB, $80) sits at $100.
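The two bytes of that jump and the 125-byte limit can be checked with a little arithmetic. The sketch below is in Python purely for convenience (it is not part of the authors' fasm toolset); the function name is ours, and it assumes the usual PSP layout in which the command-line buffer ends at $FF.

```python
def short_jump(at, target):
    """Encode `jmp short target` placed at address `at` (x86, 2 bytes)."""
    rel = target - (at + 2)          # displacement is counted from the next instruction
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])

stub = short_jump(0x100, 0x82)
print(stub.hex())                    # eb80

# Room for code: it starts at $82, and the trailing CR may sit at $FF at the latest.
max_code = 0xFF - 0x82
print(max_code)                      # 125
```

The jump lands exactly at the lower end of the short-jump range (-128), which is why $82 is reachable from $100 with a 2-byte instruction.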

Secondly, since the code is meant to be launched from a BAT file (or from the command interpreter), it must not contain certain characters. First of all, these are the I/O redirection characters ("<", ">" and "|") and the character used for parameter and environment-variable substitution ("%"). On some systems (including Windows versions that run DOS programs in V86 mode), the characters "&" and "^" also have a special meaning. Special characters with codes up to 32 are not supported by every DOS, and some are supported by none or almost none (DOSBox accepts a particularly meager set), so we exclude all of these characters too.
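A rough way to apply this rule is to scan the assembled bytes for excluded values. The Python sketch below is only an illustration, not the authors' checker; it assumes that every code from 0 to 32 is excluded (the text leaves the status of the space character open), and the sample bytes are the hand-assembled fragment discussed just below, with `je $+10` as the jump.

```python
# Characters the text excludes: redirection, substitution, and (on some systems) & and ^.
SPECIAL = set(b'<>|%&^')

def forbidden_offsets(code, base=0x82):
    """Return (address, byte) pairs for every byte the command line may not carry."""
    return [(base + i, b) for i, b in enumerate(code)
            if b <= 32 or b in SPECIAL]   # assumption: codes 0..32 are all excluded

# mov ah,0 / int $16 / cmp al,27 / je $+10
sample = bytes([0xB4, 0x00, 0xCD, 0x16, 0x3C, 0x1B, 0x74, 0x08])
for addr, b in forbidden_offsets(sample):
    print(f"${addr:02X}: ${b:02X}")
```

Bytes above 126 (such as $B4 or $CD here) pass the check: they cannot be typed easily, but a BAT file can carry them.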

Thirdly, the starting values of all registers and flags are the same as when a COM program starts. In the vast majority of DOS versions this means: ax = bx = 0 (almost always), cx = $FF, dx = cs = ds = es = ss, si = $100, di = sp = $FFFE (given enough RAM), bp = $9XX (the low byte differs everywhere, but its high nibble is usually 1), flags cf = df = 0. Whether to rely on this is up to you.

The “second” point is the most troubling one here, isn't it?
Suppose we need to write:

   mov ah,0
   int $16
   cmp al,27
   je x

That immediately gives us 5 forbidden characters: 0 in mov ah,0; $16 in int $16; $3C (the "<" character) and 27 ($1B) in cmp al,27; and some number with a code below 32 in je x, if x is located somewhere nearby in the code.

What to do? Whatever can be replaced with other instructions, we replace:

  • instead of mov ah,0 we write xor ah,ah or even cbw (when possible);
  • instead of cmp al,27 we write not al + sub al,not 27 or xor al,not 27 + inc al, or, even better (since here we need to wait for a key press and compare the received code with the ESC key code), dec ah.
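That the two replacement pairs set the zero flag exactly when al = 27 can be verified by brute force over all 256 values. A minimal Python sketch (only ZF is modelled; the helper names are ours):

```python
def zf_after_not_sub(al):
    al = ~al & 0xFF                    # not al
    al = (al - (~27 & 0xFF)) & 0xFF    # sub al,not 27
    return al == 0                     # ZF

def zf_after_xor_inc(al):
    al ^= ~27 & 0xFF                   # xor al,not 27
    al = (al + 1) & 0xFF               # inc al
    return al == 0                     # ZF

for al in range(256):
    assert zf_after_not_sub(al) == (al == 27)
    assert zf_after_xor_inc(al) == (al == 27)

print(hex(~27 & 0xFF))                 # 0xe4
```

The immediate becomes $E4 instead of $1B, which is outside the excluded range. Unlike cmp, both sequences destroy the value in al, so they fit only when al is not needed afterwards.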

int $16 is trickier, but on reflection the xor ah,ah + int $16 construction can be replaced, for example, with mov ah,$83 + rol ah,1 + int $21.
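The choice of $83 is easy to check: a left rotation by one bit turns it into 7, the value noted in the listing that follows, whereas a right rotation would give $C1. A small Python sketch with hypothetical helper names; the instruction bytes are hand-assembled, and the exclusion rule assumes codes 0 to 32 are all off limits:

```python
def rol8(value, count=1):
    """Rotate an 8-bit value left."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF

def ror8(value, count=1):
    """Rotate an 8-bit value right."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF

print(hex(rol8(0x83)), hex(ror8(0x83)))   # 0x7 0xc1

# mov ah,$83 / rol ah,1 / int $21
seq = bytes([0xB4, 0x83, 0xD0, 0xC4, 0xCD, 0x21])
print(all(b > 32 and b not in b'<>|%&^' for b in seq))   # True
```

With ah = 7, int $21 waits for a character and returns it in al without echoing it.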

That leaves je $+10. There are at least 2 ways: either jump backward (far enough) and go forward from there, or patch the byte in the code. For example, you can write z: je ($*2+3)-x and, somewhere above it, not byte [si-($100-(z+1))].

As a result, we get:

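   not byte [si-($100-(z+1))]  ; invert the 2nd byte (the offset) of je; si=$100
   mov ah,$83
   rol ah,1  ; ah=7
   int $21  ; wait for a key press, its code is returned in al
   not al
   sub al,not 27  ; cmp al,27
z: je ($*2+3)-x  ; jump to x (once the offset byte is inverted)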
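The expression ($*2+3)-x looks odd, but it is simply the jump target whose offset byte is the bitwise inverse of the real one. The Python sketch below works through the arithmetic; the address chosen for z is hypothetical and the function name is ours.

```python
def je_bytes(z, x):
    """Offset byte of `z: je ($*2+3)-x` as assembled, and after `not byte [z+1]`."""
    stored_target = 2 * z + 3 - x
    stored = (stored_target - (z + 2)) & 0xFF   # rel8 written into the file
    patched = ~stored & 0xFF                    # rel8 after the run-time `not`
    return stored, patched

z = 0x90                 # hypothetical address of the je instruction
x = z + 10               # target of the original je $+10
stored, patched = je_bytes(z, x)
print(hex(stored), hex(patched))     # 0xf7 0x8
assert z + 2 + patched == x          # the patched jump lands on x

# disp8 of the patching instruction: not byte [si+disp8] with si = $100
disp8 = ((z + 1) - 0x100) & 0xFF
print(hex(disp8))                    # 0x91
```

The file carries $F7 instead of $08, and the displacement of the patching instruction is always at least $82, so neither byte falls into the excluded range.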

Alternative solutions


Of course, a finished intro of 100+ bytes can contain quite a lot of forbidden characters (15-20 or even more), and going through such manipulations every time is a rather dreary task; moreover, they often make the code longer.

So you can resort to encryption, either of the whole code or of individual places. In the 2b_life.asm example, we encrypt all of the code by adding $AC to each byte. After this first encryption, about 4 forbidden characters remained, which we could work around by replacing them with other instructions. Of course, choosing the encryption method (add, sub, xor, not, etc.) and the key also takes time, but it is the lesser evil. The decoder takes only 8 bytes, which is quite acceptable in this situation. Encryption is done automatically with the directives repeat, load and store (i.e. the assembled output is already encrypted).
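Picking the key can be automated by trying every candidate and counting the forbidden bytes that survive. The Python sketch below illustrates that search for the add method only; it is not the authors' fasm macro. It assumes codes 0 to 32 are all excluded and that the key itself must be an allowed byte, since it would appear as an immediate in the decoder. The sample is the hand-assembled four-instruction fragment from earlier in the text.

```python
SPECIAL = set(b'<>|%&^')

def is_forbidden(b):
    return b <= 32 or b in SPECIAL      # assumption: codes 0..32 are all excluded

def encrypt(code, key):
    """Add `key` to every byte, modulo 256 (the decoder would subtract it back)."""
    return bytes((b + key) & 0xFF for b in code)

def decrypt(data, key):
    return bytes((b - key) & 0xFF for b in data)

def rank_keys(code):
    """Allowed keys, sorted by how many forbidden bytes remain after encryption."""
    score = lambda k: sum(map(is_forbidden, encrypt(code, k)))
    return sorted((k for k in range(256) if not is_forbidden(k)), key=score)

# mov ah,0 / int $16 / cmp al,27 / je $+10
code = bytes([0xB4, 0x00, 0xCD, 0x16, 0x3C, 0x1B, 0x74, 0x08])
best = rank_keys(code)[0]
packed = encrypt(code, best)
assert decrypt(packed, best) == code
print(hex(best), sum(map(is_forbidden, packed)))
```

For a real intro, the same loop would be repeated for sub, xor and not, and the cheapest combination of decoder size and leftover fixes would win.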


Individual places are encrypted in the 2b_note.asm example. Here, again with the help of repeat, load and store, the value $3D is added to some bytes, and the list of addresses of these bytes is stored separately (1 address byte for each such byte). In total, we encrypt 20 bytes, and the decoder takes another 13 bytes. Yes, the first method was more economical :)
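The selective scheme can be sketched the same way: add the constant only to the offending bytes and remember where they were. This Python illustration rests on the same assumptions as before (codes 0 to 32 excluded, code loaded at $82) and uses hypothetical names; the real work in 2b_note.asm is done by fasm directives.

```python
SPECIAL = set(b'<>|%&^')

def is_forbidden(b):
    return b <= 32 or b in SPECIAL      # assumption: codes 0..32 are all excluded

def patch_selected(code, delta=0x3D, base=0x82):
    """Add `delta` to each forbidden byte; collect the 1-byte addresses of patched bytes."""
    out, addrs = bytearray(code), []
    for i, b in enumerate(code):
        if is_forbidden(b):
            out[i] = (b + delta) & 0xFF
            addrs.append(base + i)
    # Bytes that are still forbidden after patching need a different fix.
    leftovers = [base + i for i, b in enumerate(out) if is_forbidden(b)]
    return bytes(out), bytes(addrs), leftovers

# mov ah,0 / int $16 / cmp al,27 / je $+10
code = bytes([0xB4, 0x00, 0xCD, 0x16, 0x3C, 0x1B, 0x74, 0x08])
patched, addrs, leftovers = patch_selected(code)
print(patched.hex(), addrs.hex(), leftovers)   # b43dcd5379587445 8385868789 []
```

The leftovers list matters because a fixed constant does not clean every value: a byte equal to 1, for instance, would turn into $3E (">"). The address bytes themselves are always $82 or higher, so they are safe to store.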

At the beginning of the article we promised to talk about code that takes the form of a string of ASCII characters with codes from 33 to 126 (so that, for example, it can be typed on a keyboard without much trouble). This is possible, for example, if the code is encoded with hexadecimal digits or something similar. Yes, this is wasteful, but with a BASE64-style encoding the cost may turn out even higher, because the decoder itself must also consist only of such characters.


Tools


To make writing code for “2B or not 2B” more convenient, 4 files were created:

  • 2b.draft.asm – 128- BAT-, 2b ( , - , ). , .
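  • 2b.draft44.asm – a template for 44-byte code that consists only of ASCII characters with codes from 33 to 126. Each byte is stored as two characters: one nibble + «A» (characters «A»...«P») and the other nibble + «K» (characters «K»...«Z»). The decoder takes 37 bytes (+ 2 bytes for pusha + popa, if they are needed). Hence the limit: (125 - 37) / 2 = 44 bytes (or 43 with pusha + popa) :). Examples: 2b_snow.asm and 2b_hello.asm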
  • 2b.check.inc – include- . , , ( BAT-).
  • 2b.debug.inc – include- ( COM- BIN- ).
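The character ranges and the 44-byte figure given for 2b.draft44.asm can be reproduced with a short sketch. It is written in Python only to show the arithmetic. Which nibble gets «A» and which gets «K», and the order of the two characters, are our assumptions; the actual template may differ.

```python
def encode(code):
    """Two printable characters per byte: one nibble + 'A', the other + 'K'."""
    out = bytearray()
    for b in code:
        out.append((b >> 4) + ord('A'))    # 'A'..'P'  (assumed: high nibble first)
        out.append((b & 15) + ord('K'))    # 'K'..'Z'
    return out.decode('ascii')

def decode(text):
    raw = text.encode('ascii')
    return bytes(((hi - ord('A')) << 4) | (lo - ord('K'))
                 for hi, lo in zip(raw[0::2], raw[1::2]))

text = encode(bytes([0xCD, 0x21]))         # int $21
print(text)                                # MXCL
assert decode(text) == bytes([0xCD, 0x21])

DECODER = 37                               # decoder size quoted in the list above
print((125 - DECODER) // 2)                # 44
print((125 - DECODER - 2) // 2)            # 43, with pusha + popa
```

Both ranges consist of capital letters only, so the encoded payload never contains any of the excluded characters.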

Why?


The set of existing platforms for which intros are written has remained virtually unchanged for many years, if you do not count the Wild category (AON, soldering irons, cotton buds). We offer you ... not exactly a new platform, but at least some variety, with its own limitations. Limitations and overcoming them are the essence of the demoscene as a process. It would be nice to see a whole competition built around this concept at the next demoparty, where different authors could try their hand at the “2B or not 2B compo” :)

--- EOF ---

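#FF - And a whole byte is not enough ... | Pilot
#00 - And a whole byte is not enough ... | Invitation to Revision Online 2020
#01 - And a whole byte is not enough ... | What are intros?
#02 - And a whole byte is not enough ... | The Cross of Changes
#03 - And a whole byte is not enough ... | 2B or not 2B
#04 - And a whole byte is not enough ... | We take BC by the horns
#05 - And a whole byte is not enough ... | Anime
#06 - And a whole byte is not enough ... | Meteorisms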

Grandfather's entertainment channel on Telegram: teleg.run/bornded

There is a chat attached to the channel. There you can try asking questions about the demoscene, assembly, pixel art, tracker music and other aspects of the process. You may get an answer or be pointed to other, more topical chats.

So they won - so we win!
