Befunge compiler in Python

In preparation for the course "Fundamentals of Compilers" for 4th year students, I studied various esoteric programming languages. Here is a good article on this topic . The Befunge language (Chris Press, 1993) seemed to me the most interesting in the article, I especially note three of its features:

  1. The program field is a two-dimensional torus, i.e. physically, this is a rectangular matrix of symbol commands closed along the upper (lower) boundary and along the left (right) column. The command pointer moves around the field (each command is a certain character with x, y coordinates), executes the command, and moves on. The movement can be in all 4 directions (by default, right from the point 0,0), and when you go beyond the "field" the pointer appears on the opposite side.
  2. There are two commands (p, g) in the language that change the field itself, i.e. the program is "self-rewriting" during execution. The program code at the start may not be equal to the code at the finish. The program “123pgpg ## @” started, and the program “ABC @ 1 @ 2 @ 3.14” (not a correct example) finished working.
  3. Chris Pressy noted that he wanted to create a language that was as complex as possible to compile. De facto, it’s true, creating a compiler that makes exe files hellishly difficult for the program, I found information that someone could do it in C ... It’s best to create a translator from the language to the Python code, which I still call compiler for simplicity.


The 1993 program field consists of 25 lines of 80 characters each. There are 36 teams in the language, each of which is an ASCII table character. For more information, see Wikipedia , I will give a brief description from there:

Move commands (9):

> Move right
<Move left
^ Move up
v Move down
_ Move right if the top of the stack is 0, otherwise left.
| Move down if on top of stack 0, otherwise up.
? Move in a random direction
# Skip the next cell (“springboard”)
@ End of program

Stack manipulation commands (3)

:: Place a copy of the vertex on the stack
\ Swap vertex and vertex
$ Delete vertex

Commands modification of the program code (2):
p “PUT”: cell coordinates and ASCII code of the character that is placed at these coordinates
are extracted from the stack g “GET”: cell coordinates are extracted from the stack; The ASCII code of a symbol at these coordinates is pushed onto the stack.

Constant commands (2):

0-9 Place a number on the
Start / End of symbol mode stack , in which ASCII codes of all current program characters are

pushed onto the stack . Arithmetic operations (5):

+ Addition of a vertex and a vertex
- Subtraction of a vertex and a vertex
* Multiplication of a vertex and a vertex
/ Integer division
% Remainder of division

Commands for the stack and logical operations (2)

:! Negation: zero on the vertex is replaced by 1, nonzero value is replaced by 0.
Comparison “greater than”: if the vertex is larger than the vertex, 1 is placed on the stack, otherwise 0

I / O commands (4):

& Request the user for a number and place it on the stack
~ Ask the user for a character and put on the stack its ASCII code
. Print the top of the stack as an integer
, Print the character corresponding to the ASCII code on the top of the stack

I decided to write a Befunge compiler (interpreter) in Python according to the rules of 1993 with a couple of restrictions: 1) the field is not 25x80 characters, but the minimum in width and height of the text block, 2) the field is not looped into a torus, i.e. going beyond borders with jumping to the opposite side is not processed. This is not laziness (though, whom am I kidding?), For small examples everything works fine, and to finish the field to a real torus is quite simple, there would be a desire.

The code came out in some places unnecessarily “on the forehead”, but this is due to the fact that it is for students and its task is to be as clear as possible, and not shortened to a couple of lines in Chinese.

Part 1


The code is provided from beginning to end with one exception (which will be specified), it can be copied to a file and run. The full text is available at the link rentry.co/ivansedov-befunge , it’s too early for me to give all the best on GitHub. By the way, there are about 20 Befunge language implementations there, but the code is either in C (not my language) or Python, but so complicated that I did not dare to dive. However, there you can take sample programs for tests, for example, here github.com/causal-agent/befungee

from sys import *
import time
import random

Import the libraries:

  1. The sys library is needed in order to get the name of the program file. My compiler was called bbb.py, a test example in the same directory 1.bf, Python version 3.7 itself, and so the program call in the console looked like this: python3 bbb.py 1.bf
  2. time , , 0,5-1,0 .
  3. random «?» ( ), . -, Befunge - , « » (1 , 2 , 3 , 4 ). « » .

class Pointer:
    def __init__(self, x=0, y=0, vector=2, value=None):
        self.x = x
        self.y = y
        self.vector = vector
        self.value = value
        self.stack = []
        self.stack_sf = 0

    def __str__(self):
        return 'Point ({},{}) vektor:{} value:{} stack_sf:{} stack:{}'.format(self.x, self.y, self.vector, self.value, self.stack_sf, self.stack)

    def step(self):
        if self.vector == 1:
            self.x -= 1
        elif self.vector == 2:
            self.y += 1
        elif self.vector == 3:
            self.x += 1
        elif self.vector == 4:
            self.y -= 1

    def action(self):
	#   ,   

The main class used in the program: Pointer (pointer), has 6 properties and 2 methods. Properties: 1) x coordinate (initially = 0), 2) y coordinate (initially = 0), 3) vector (direction of movement, initially = 2 to the right), 4) value (field value, which is located at the x, y coordinate, initially = None), this is, in fact, the command to be executed, 5) stack (program stack, initially = []) and 6) stack_st (flag for entering lines, initially = 0).

The class has two methods: 1) step () the pointer step, without any checks and tests, changes the x, y coordinates depending on the direction in vector and 2) action () is the heart of the compiler, executing the current program command. I will give the action () code in the second part, so that it does not distract from the logic.

# ===============================
# =                      =
# ===============================

def open_file(name):
    data = open(name, 'r').read()
    return data

def field_print():
    for row in A:
        for elem in row:
            print(elem, end='')
        print()  

Two auxiliary functions: 1) open_file (name) opens the file by the submitted name, reads it and returns the contents, and 2) field_print () prints the contents of array A, in which the program characters are located. Creating array A is shown below.

# ===========================================
# =                       =
# ===========================================

numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
operators = ['+', '-', '*', '/', '%']
point = Pointer()                           #  
text = open_file(argv[1]).split("\n")       #  
n = len(text)                               # n =  
m = 0                                       # m =  

#    (   m)
for line in text:
    if len(line) > m:
        m = len(line)

#   ( n  m )
A = [' '] * n
for i in range(n):
    A[i] = [' '] * m

#    
for i in range(len(text)):
    for j in range(len(text[i])):
        A[i][j] = text[i][j]

Basic program settings. In the numbers list we put all the numbers from 0 to 9, which can be written in the program (10 will no longer fit, two characters). In the list of operators we put the basic arithmetic operators. Create a point = Pointer object (default settings), which we will work with all the time ...

In the variable text we put the read text of the executable program, broken by the “new line” symbol. As a result, text consists of several lines of text of different lengths, which must be placed in a rectangular array of spaces in their places. It is easy to find the number of lines n = len (text), but the number of columns must be calculated based on the maximum length of the lines included in text. I did not find another way to do this forehead: go through all the lines and find the one with the maximum length. Having on hand n and m (the number of rows and the number of columns of the future field), you can make a two-dimensional array, fill it with spaces, and then go over text to place the characters in their places.

The result is a rectangle of characters between which spaces and all this is a two-dimensional matrix n by m. After these settings, you can call the field_print () function and make sure that everything looks beautiful, nothing has floated and has not violated the situation.

# ==========================================
# =                       =
# ==========================================

# field_print()

while point.value != '@':
    try:
        point.value = A[point.x][point.y]       # 1)    point
        point.action()                          # 2)  
        # print(point)                            # 3) :  
        # time.sleep(0.5)                         # 4) :  0.5 c
    except IndexError:                          #     =  
        print('     ')
        break

# field_print()
print()

Everything ends with the main program (cycle), before and after which you can display the field (sometimes useful). The cycle spins until the pointer points to the “@” symbol (ampersand, dog, end of program). Inside the cycle, 4 actions are done each time:

  1. In the point.value property, the character that is located in array A at the coordinates point.x, point.y is read
  2. The point.action () method is called, in which the current (just read) command is executed
  3. A pointer is displayed on the screen (all properties)
  4. A delay is made before the next iteration (0.1 sec - 0.5 sec)

Items 3 and 4 are completely optional (they are even commented out), but I recommend using them for testing. All actions inside the loop occur with catching an IndexError error (error exceeding the limits of the index), this allows you to intercept two main compiler errors:

  1. We turned to the stack, and there are no values
  2. We accidentally went beyond the program (array) in width or height

The last empty print () is needed so that after the output is displayed, the console works from a new line.

Part 2


Now it's time for the most important code - the contents of the point.action () method of the Pointer class. Everything below should be inserted where it was written:

    def action(self):
	#   ,   

Pay attention to the indentation:

        if self.value == '"' and self.stack_sf == 0:                # ,  "
            self.stack_sf = 1
        elif self.value == '"' and self.stack_sf == 1:
            self.stack_sf = 0
        elif self.stack_sf == 1:                                    # "Hello"   
            self.stack.append(ord(self.value))
        elif self.value in numbers and self.stack_sf == 0:
            # 123   
            self.stack.append(int(self.value))

In fact, the code inside action () is a lot of conditions, each of which is executed when this command is under the pointer. It all starts with the condition "if the current command = quotation mark and the flag of the beginning of the line stack_sf = 0", in this case the flag is raised to 1. We entered the line.

(Otherwise) if the current command = quotation mark and the flag is raised to 1, then this means that the quotation mark was found a second time and you must stop entering the string (the stack_sf flag is lowered to 0). We are out of line.

(Otherwise) if the first two conditions did not work and the flag stack_sf = 1, then we are “located inside the line” and we need to add the code of the current symbol to the stack. Not the character itself, but its ASCII code.

(Otherwise) if the current character is among the elements of numbers and the flag is stack_sf = 0, then, firstly, it is a digit and, secondly, we are not inside the line, we need to add the current character = digit to the stack. Add not the character code, but the character itself. Still remember that there is a number 1, and she has a code = 49. So, if we are inside the line, then we need to add 49 to the stack, and if it's just in the program, then command 1 should add to the stack 1.

Hereinafter, all the conditions will be elif (otherwise, if ...), so I will write them simply "if". In addition, all conditions are double, you need to check the current character for equality to the command and the fact that we are not inside the string (inside the string, all characters are processed differently). You could write all this in a more optimized way, but this solution allows you to focus attention on this forehead.

        elif self.value in operators and self.stack_sf == 0:
            b = self.stack.pop()
            a = self.stack.pop()
            if self.value == '+':
                res = a + b                                         # a+b  
            elif self.value == '-':
                res = a - b                                         # a-b  
            elif self.value == '*':
                res = a * b                                         # a*b  
            elif self.value == '/':
                if b == 0:
                    res = 0
                else:
                    res = a // b                                    # a//b  
            elif self.value == '%':
                res = a % b                                         # a%b  
            self.stack.append(res)

If the current character is among operators (and stack_sf = 0), then it means we got into an arithmetic operation. All of them are exactly the same: 1) the number b is removed (with deletion), 2) the number a is removed (with deletion), 3) res = the value of the operation between a and b, 4) res is pushed onto the stack. When dividing by 0, the answer is 0, although the author of the language provided for the choice of 0 or 1.

        elif self.value == '!' and self.stack_sf == 0:
            a = self.stack.pop()
            if a == 0:
                a = 1
            else:
                a = 0
            # 0->1, 1->0
            self.stack.append(a)

        elif self.value == '`' and self.stack_sf == 0:
            a = self.stack.pop()        # 
            b = self.stack.pop()        # 
            if b > a:
                res = 1
            else:
                res = 0
            # b>a -> 1|0
            self.stack.append(res)

If the current character is “!”, Then you need to replace the head (vertex) of the stack: it was 0 - it will become 1, there was something different from 0 - it will be 1. If the current character is “» ”(apostrophe), then you need to check the top and spine : 1) if the vertex is larger than the vertex, then 1, 2) if the vertex is less than (or equal to) the vertex, then 0 is placed on the stack. Please note that when removing items for comparison, they are removed (deleted), not are copied.

        elif self.value == '?' and self.stack_sf == 0:
            # ? ( )
            a = random.randint(1, 4)
            self.vector = a

        elif self.value == ':' and self.stack_sf == 0:              #  
            last = self.stack.pop()
            self.stack.append(last)
            self.stack.append(last)

        elif self.value == '\\' and self.stack_sf == 0:             # ab => ba
            a = self.stack.pop()
            b = self.stack.pop()
            self.stack.append(a)
            self.stack.append(b)

If the current character is "?", Then you need to choose a random direction and go along it. We use the random.randint (1, 4) function, which generates numbers 1,2,3,4 and puts the new value in point.vector

If the current character is “:”, then we put on the stack a copy of the top of the stack, that is read it, and then add it to the stack twice.

If the current character is "\\" (backslash), then you need to swap the vertex and sub-vertex. We get two numbers, put them on the stack in the reverse order.

        elif self.value == '#' and self.stack_sf == 0:              #  ""
            self.step()

        elif self.value == ',' and self.stack_sf == 0:              # =65=A
            value = self.stack.pop()
            print(chr(value), end='')

        elif self.value == '.' and self.stack_sf == 0:              # Print 
            a = self.stack.pop()
            print(a, end='')

If the current symbol is "#" (pound), then you must jump over the next (in direction) command. Note that at the end of action () there is an unconditional jump forward self.step (), it allows you to move forward to the next command. Having written self.step () into the “#” processing, we actually make two jumps and “skip” the next command after the “#”.

If the current character is “,” (comma), then you need to print a character whose ASCII code is on top of the stack. If the number 65 is there, then “A” should be displayed.

If the current character is "." (dot), then you need to print the number that lies on the top of the stack, just like a number. If there lies 65, then you need to display "65". In both cases, the end = '' parameter is set during the output so that there is no new line break.

        elif self.value == '_' and self.stack_sf == 0:              #  "_"
            test = self.stack.pop()
            if test == 0:
                #  = 0, (2)
                self.vector = 2
            else:
                #  !=0, (4)
                self.vector = 4

        elif self.value == '|' and self.stack_sf == 0:              #  "|"
            test = self.stack.pop()
            if test == 0:
                self.vector = 3
            else:
                self.vector = 1

If the current character is "_" (underscore), then check horizontally. We remove from the stack the number to check (test), if it = 0, then we move to the right (vector = 2), if it! = 0, then we move to the left (vector = 4).

If the current character = "|" (vertical bar), then you need to do a vertical check. We remove the number (test) from the stack, if it = 0, then we move down (vector = 3), otherwise we move up (vector = 1).

        elif self.value == '$' and self.stack_sf == 0:              #  
            self.stack.pop()

        elif self.value == '~' and self.stack_sf == 0:              # Input: A => 65
            val = input(' : ')
            self.stack.append(ord(val[0]))

        elif self.value == '&' and self.stack_sf == 0:              # Input: 65 => 65
            val = int(input(' : '))
            self.stack.append((val))

        elif self.value == 'p' and self.stack_sf == 0:              # x, y, symcode
            x = self.stack.pop()                                    # A(x,y) = symcode
            y = self.stack.pop()
            symcode = self.stack.pop()
            A[x][y] = chr(symcode)

        # x, y, value=A(x,y)
        elif self.value == 'g' and self.stack_sf == 0:
            x = self.stack.pop()                                    # ord(value) => 
            y = self.stack.pop()
            value = A[x][y]
            self.stack.append(ord(value))

If the current character = "$", then you need to remove the vertex. Make simple pop ().

If the current character = "~" (tilde), then we ask the user for a character and put their ASCII code on the stack. User "A" (English) sent, we must put 65 on the stack. Just in case, we will put val [0], otherwise the user can enter "Apple" and translate it into code does not work.

If the current character = "&" (ampersand), then we ask the user for a number and put the number on the stack. You entered 65, you need to put 65 on the stack.

Now the two most difficult teams.

If the current character = "p", then you need to extract the cell coordinates and the ASCII code of the character from the stack, and then put this character in these coordinates on field A. Suppose that 1.2.65 was on the stack and we got (1.2) and 65, we should put the symbol “A” in cell (1.2). Once again, I note: we got three numbers, and put a symbol in the coordinates.

If the current character = "g", then the cell coordinates are retrieved from the stack, the cell is searched for on the field, the character is taken from there and its ASCII code is pushed onto the stack. Suppose, the symbol “B” was lying on the field in cell (2,3), the current team got “g” and we got 2,3 from the stack. In this case, we go along the coordinates (2,3), get the symbol “B” from there, translate it into number 66 (symbol code B) and put 66 on the stack.

        elif self.value == '>' and self.stack_sf == 0:              # >
            self.vector = 2
        elif self.value == '<' and self.stack_sf == 0:              # <
            self.vector = 4
        elif self.value == '^' and self.stack_sf == 0:              # ^
            self.vector = 1
        elif self.value == 'v' and self.stack_sf == 0:              # v
            self.vector = 3
        self.step()                                                 #  

Well, and the last lines of code: the organization of moving the pointer across the field. Everything is simple here: we look at the symbol and change the direction (vector). At the very end of the action () function is self.step (), which takes one step in the current direction. Thus, action () is both the execution of the action and the step for the next character.

Conclusion


Writing this compiler was a pleasure. And how much joy brought the moments when you throw a certain code into the program, and it is executed (correctly!) And you observe it! The site befungius.aurlien.net helped a lot when working , on which the author posted the Befunge online interpreter (in JavaScript). Here is his video from some conference www.youtube.com/watch?v=oCPT3L33848 , in which he talks about the language.

For testing, you can use almost any program on Befunge, except those that require a field as a torus, i.e. have transitions in different directions of the world. For some programs, you need to see the field itself (for example, the counter program changes the coordinate A [0] [1]), so either insert the output of this coordinate on the screen or display the entire matrix A after each step.

In any case, this is not a program licked to shine, but a training code that can be added and edited. Typos are possible, although I have tested it many times. Criticism is not welcome, but not forbidden. Good luck to everyone and good code.

Hello world!

0"!dlrow olleH">:#,_@

Bazz

0> 1+:3%v
>^  v%5:_:5% v
,v.:_v     v0_0"zzub"v
"v         #
     >0"zzub"v
"   v"fizz"<         <
^<         $<>:#,_v
    >      #^^#   <

Fibonacci

62*1+v>01p001>+v>\:02p\:02gv
     0       ^             <
     .         :p
     "         .1
        v 0," "<0
     "  >1g12-+:|
     ,          @
     >^

Count

>91+:9`v
p   v  _v
    >$0 v
^ 01+*68<

But this does not work, going out of the field

0>:00p58*`#@_0>:01p78vv$$<
@^+1g00,+55_v# !`\+*9<>4v$
@v30p20"?~^"< ^+1g10,+*8<$
@>p0\>\::*::882**02g*0v >^
`*:*" d":+*:-*"[Z"+g3 < |<
v-*"[Z"+g30*g20**288\--\<#
>2**5#>8*:*/00g"P"*58*:*v^
v*288 p20/**288:+*"[Z"+-<:
>*%03 p58*:*/01g"3"* v>::^
   \_^#!:-1\+-*2*:*85<^

Links and additional materials:

  1. Full code
  2. Online version
  3. She's in JavaScript
  4. Documentation (English)
  5. Wiki article (English)

All Articles