Umka: a new statically typed scripting language


The first version of Umka, a statically typed embeddable scripting language I developed, has just been released. It aims to combine the flexibility of familiar scripting languages with protection against type errors at the stage of compilation into bytecode. The main idea of the language, "Explicit is better than implicit", is borrowed from the "Zen of Python", but here it should acquire a slightly different and more obvious meaning.

However private and subjective the impressions that prompted me to develop the language, I hope the plan was not naive. Below, I briefly describe the capabilities of the language and the motives for creating it.

Motives


The first virtue usually attributed to dynamic typing is a shorter development/debugging cycle that saves programmer time. At the risk of public displeasure, I must admit that my own experience does not confirm this at all. Every time I make a minor correction to my neural network training script in Python, I have to wait for Python, NumPy, and PyTorch to load, read a large array of data from files, transfer it to the GPU, and start processing, only to discover that PyTorch expected a tensor of size (1, 1, m, n, 3) instead of (1, m, n, 3).

I readily admit that many people prefer dynamically typed languages for their own reasons. It is even possible that liking or disliking dynamic typing is a phenomenon of the same order as one's attitude toward olives and tomato juice. Attempts to study this question objectively apparently lead to inconclusive results.

At the same time, the popularity of TypeScript, the introduction of type annotations in Python, and heated discussions on Reddit and Habr suggest that the de facto equation of scripting languages with dynamically typed languages is not a dogma at all but a coincidence, and that a statically typed scripting language has every right to exist.

And so a language appeared, named after a cat that was named after a bear.

Language


The syntax of the language as a whole was inspired by Go. Examples of syntax constructs can be found on the project page. When declaring variables, a shorthand notation with type inference can be used. A notable deviation from Go is the pointer syntax. The creators of Go acknowledged that literally following the example of C led to unnecessarily complicated syntax here, and that it would have been more reasonable to introduce a postfix dereferencing operator like Pascal's p^ instead of *p. That is exactly what Umka does.
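For readers coming from Go, the sketch below performs the same pointer operations in Go, with the Umka spelling given in comments. The Umka forms (`^int` for a pointer type, `&a` for taking an address, `p^` for dereferencing) follow the fiber sample later in this article; the function name `increment` is hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// increment adds one to the integer that p points to.
// Umka equivalent (assumed from the article's syntax):
//   fn increment(p: ^int) { p^ = p^ + 1 }
func increment(p *int) {
	*p = *p + 1 // Go: prefix dereference *p; Umka: postfix p^
}

func main() {
	a := 0         // short declaration with type inference, written the same way in Umka
	increment(&a)  // the address-of operator is &a in both languages
	fmt.Println(a) // prints 1
}
```

The postfix form reads strictly left to right, which matters most in longer chains such as a field access through a pointer stored in an array.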

Translator. Umka compiles to bytecode, which is then executed by a stack-based virtual machine. All type checks are done at compile time, so the data on the stack no longer carries any type information. The translator comes as a dynamic library with its own API, plus a small "wrapper" executable. The source code is written in portable C99. Builds for x86-64 (Windows and Linux) have been released.

Memory management. For now, memory management is based on reference counting. If the language attracts interest and people try to use it, it will make sense to implement a more advanced garbage collector. The language supports the classic composite data types (arrays and structures), which are placed on the stack, and dynamic arrays, which are placed on the heap. Any classic array or structure can also be placed on the heap with an explicit call to new().
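The following Go sketch models reference counting in general; it is not Umka's runtime, and the `Object` type with its `New`, `Retain`, `Release`, and `SetChild` methods is hypothetical. It also shows the well-known weakness of plain reference counting, namely that objects referring to each other in a cycle are never freed, which is one common reason to move to a tracing garbage collector.

```go
package main

import "fmt"

// Object is a heap block managed by reference counting (hypothetical model).
type Object struct {
	refs  int
	Child *Object // a counted pointer to another object, if any
	Freed bool
}

// New allocates an object holding one reference, as a call to new() would.
func New() *Object { return &Object{refs: 1} }

// Retain records an additional reference, e.g. when a pointer is copied.
func (o *Object) Retain() { o.refs++ }

// Release drops one reference; at zero the object is freed and the
// reference it holds to its child is released in turn.
func (o *Object) Release() {
	o.refs--
	if o.refs == 0 {
		o.Freed = true
		if o.Child != nil {
			o.Child.Release()
			o.Child = nil
		}
	}
}

// SetChild stores a counted pointer from o to c.
func (o *Object) SetChild(c *Object) {
	c.Retain()
	o.Child = c
}

func main() {
	// A chain a -> b is reclaimed as soon as the last outside reference goes away.
	a, b := New(), New()
	a.SetChild(b)
	b.Release()                   // drop our own reference to b; a still holds one
	a.Release()                   // a reaches zero, which releases b as well
	fmt.Println(a.Freed, b.Freed) // prints true true

	// A cycle x -> y -> x is never reclaimed: each keeps the other's count at 1.
	x, y := New(), New()
	x.SetChild(y)
	y.SetChild(x)
	x.Release()
	y.Release()
	fmt.Println(x.Freed, y.Freed) // prints false false
}
```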

Polymorphism is provided by Go-style interfaces. There are no concepts of class, object, or inheritance.

Multitasking. Multitasking is based on the concept of "fibers": lightweight threads that run within the same virtual machine and explicitly transfer control to each other. In essence, fibers are synonymous with coroutines. Since the way these coroutines are used deviates somewhat from the Go tradition and is closer to Lua and Wren, it is worth giving a code sample:

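import "std.um"

// Child fiber: receives its parent fiber and a pointer to a shared buffer
fn childFunc(parent: std.Fiber, buf: ^int) {
    for i := 0; i < 5; i++ {
        std.println("Child : i=" + std.itoa(i) + " buf=" + std.itoa(buf^))
        buf^ = i * 3          // write through the pointer (postfix dereference)
        fibercall(parent)     // explicitly switch back to the parent fiber
    }
}

fn parentFunc() {
    a := 0
    child := fiberspawn(childFunc, &a)    // create the child fiber, sharing a
    for i := 0; i < 10; i++ {
        std.println("Parent: i=" + std.itoa(i) + " buf=" + std.itoa(a))
        a = i * 7
        if fiberalive(child) {            // switch to the child only while it is still running
            fibercall(child)
        }
    }
    fiberfree(child)                      // release the child fiber
}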
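For contrast, here is a sketch of the same exchange in Go. Goroutines are scheduled by the Go runtime rather than switched explicitly, so the explicit handoff that fibercall performs has to be emulated, here with a pair of unbuffered channels. The names `resume`, `yield`, `record`, and `logf` are hypothetical, and the sketch assumes that once the Umka child function returns, fiberalive reports false and control goes back to the parent. It only illustrates the difference in style, not how Umka fibers are implemented.

```go
package main

import "fmt"

// record collects output lines in order. Access is race-free because each
// goroutine touches shared data only between unbuffered channel handoffs.
var record []string

func logf(format string, args ...interface{}) {
	record = append(record, fmt.Sprintf(format, args...))
}

// child mirrors childFunc: it runs only after the parent resumes it and
// hands control back on yield, much like fibercall(parent).
func child(buf *int, resume <-chan struct{}, yield chan<- struct{}) {
	<-resume // like a freshly spawned fiber, wait for the first call
	for i := 0; i < 5; i++ {
		logf("Child : i=%d buf=%d", i, *buf)
		*buf = i * 3
		yield <- struct{}{} // hand control back to the parent
		<-resume            // wait until the parent calls us again
	}
	close(yield) // the function has returned: the "fiber" is no longer alive
}

// parent mirrors parentFunc, resuming the child on each iteration while it is alive.
func parent() {
	a := 0
	resume := make(chan struct{})
	yield := make(chan struct{})
	go child(&a, resume, yield)
	alive := true
	for i := 0; i < 10; i++ {
		logf("Parent: i=%d buf=%d", i, a)
		a = i * 7
		if alive {
			resume <- struct{}{} // switch to the child
			_, alive = <-yield   // wait until it yields; false once it has finished
		}
	}
}

func main() {
	parent()
	for _, line := range record {
		fmt.Println(line)
	}
}
```

In Umka the switch is a single call in each direction; in Go the same discipline needs two channels and a convention that only one side runs at a time.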

Examples


As a basic example demonstrating the capabilities of the language, I recommend looking at a renderer for three-dimensional scenes based on backward ray tracing. It uses recursion, interfaces, and dynamic arrays quite naturally; the use of multitasking is somewhat more artificial.

The source code of the translator's own executable wrapper serves as an example of embedding the Umka translator in a C project. There is also a sample of extending Umka with external functions written in C.
