An Introduction to the Lena Data Analysis Architectural Framework

Hello, Habr! I will talk about the architectural framework that I am developing.

Architecture determines the most general structure of a program and how its components interact. As a framework, Lena implements a specific architecture for data analysis (described below) and provides classes and functions that are useful within this architecture.

Lena is written in the popular Python language and works with Python 2, Python 3 and PyPy. It is published under the free Apache License, version 2, here. The framework is still under development, but the features described in this manual are already in use and tested (the test coverage of the whole framework is about 90%), and they are unlikely to change. Lena originated in the analysis of experimental data in neutrino physics and is named after the great Siberian river.

Architectural issues usually arise in medium-sized and large projects. If you are considering this framework, here is a brief overview of its goals and advantages.

From a programming point of view:

  • modularity and loose coupling: algorithms can easily be added, replaced or reused.
>>> from __future__ import print_function
>>> from lena.core import Sequence
>>> s = Sequence(
...     lambda i: pow(-1, i) * (2 * i + 1),
... )
>>> results = s.run([0, 1, 2, 3])
>>> for res in results:
...     print(res, end=" ")
1 -3 5 -7
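To illustrate the idea of composing elements, here is a minimal sketch of a hypothetical `MiniSequence` class. It is not Lena's actual implementation; it only assumes two kinds of elements: plain callables, applied to each value of the flow, and objects with a `run(flow)` method, which receive the whole flow. It reproduces the doctest above without depending on Lena.

```python
class MiniSequence(object):
    """Hypothetical, simplified sequence (not Lena's real Sequence)."""

    def __init__(self, *elements):
        self._elements = elements

    def run(self, flow):
        for el in self._elements:
            if hasattr(el, "run"):
                # an element with run() consumes and produces a whole flow
                flow = el.run(flow)
            else:
                # a plain callable is applied lazily to each value
                flow = self._map(el, flow)
        return flow

    @staticmethod
    def _map(func, flow):
        # a separate function binds func now, avoiding late binding in the loop
        for val in flow:
            yield func(val)


s = MiniSequence(lambda i: pow(-1, i) * (2 * i + 1))
print(list(s.run([0, 1, 2, 3])))  # [1, -3, 5, -7]
```

Because every element only sees an iterable and returns an iterable, replacing or reordering elements does not require changing the others, which is what the loose coupling above refers to.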

An element can also be any object with a method run, which takes the whole flow:

class Sum():
    # consumes the whole flow and yields a single sum
    def run(self, flow):
        s = 0
        for val in flow:
            s += val
        yield s
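Since run is an ordinary method that receives an iterable and yields results, such an element can be tried in plain Python. The sketch below repeats the Sum class to be self-contained and feeds it the values of the function from the first example; `alternating_odd` is just an illustrative name, and chaining the steps by hand only approximates what a sequence does.

```python
class Sum(object):
    """Sum all values of the flow and yield the total once."""

    def run(self, flow):
        s = 0
        for val in flow:
            s += val
        yield s


def alternating_odd(i):
    # the same function as in the Sequence example above
    return pow(-1, i) * (2 * i + 1)


# Apply the function to each value lazily, then let Sum consume the flow.
flow = (alternating_odd(i) for i in [0, 1, 2, 3])
print(list(Sum().run(flow)))  # [-4]
```

Note that Sum yields only after the whole flow is exhausted, which is why it needs run rather than a per-value function.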

from __future__ import print_function

import os

from lena.core import Sequence, Source
from lena.math import mesh
from lena.output import HistToCSV, Writer, LaTeXToPDF, PDFToPNG
from lena.output import MakeFilename, RenderLaTeX
from lena.structures import Histogram

from read_data import ReadData

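def main():
    data_file = os.path.join("..", "data", "normal_3d.csv")
    s = Sequence(
        # ReadData (from the local read_data module) turns file names
        # into (data, context) values
        ReadData(),
        # keep only the x coordinate and preserve the context
        lambda dt: (dt[0][0], dt[1]),
        # fill a histogram with 10 bins on [-10, 10]
        Histogram(mesh((-10, 10), 10)),
    )
    results = s.run([data_file])
    for res in results:
        print(res)

if __name__ == "__main__":
    main()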
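The script imports ReadData from a local read_data module that is not shown here. Judging by the lambda `dt: (dt[0][0], dt[1])`, the values reaching it are pairs (data, context), where data is a point whose first coordinate is x. A hypothetical minimal reader, assuming a comma-separated file without a header, three numbers per line, and a context that only records the file name (the "input_file" key is an invented example), could look like this:

```python
import csv


class ReadData(object):
    """Hypothetical reader: yield ((x, y, z), context) for each CSV row."""

    def run(self, flow):
        # the flow contains file names, such as [data_file]
        for filename in flow:
            # the "input_file" context key is an illustrative assumption
            context = {"input_file": filename}
            with open(filename) as fil:
                for row in csv.reader(fil):
                    if not row:
                        continue  # skip empty lines
                    point = tuple(float(coord) for coord in row)
                    yield (point, context)
```

With such a reader as the first element, the lambda selects the x coordinate and passes the context along, so the histogram receives (x, context) values.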

1. This feature may be added in the future.
2. Jinja documentation
3. Using Jinja for LaTeX layout was proposed here and here; the template syntax was taken from the original article.


Ruffus is a computational pipeline library for Python used in science and bioinformatics. It connects program components by writing and reading files.

