Hello, Habr! Today we want to tell you about the work of the Laboratory of Nuclear Physics Experiment Methods, part of JetBrains Research. Where is JetBrains and where is nuclear physics, you may ask. We found common ground through a love of Kotlin, although in this post we will not talk about it. Our group focuses on data analysis, modeling, and writing software for scientists, and therefore gravitates toward cooperation and knowledge sharing with IT companies. In this article we want to describe the method of statistical regularization that we popularize, proposed by V. F. Turchin in the 1970s, and its implementation as code in Python and Julia. The presentation will be quite detailed, so those already familiar with inverse problems can go straight to the examples and read the theory in this article.

Occurrence of the problem: why should anyone regularize at all?
To a rough approximation, any measurement in an experiment can be described as follows: there is a certain device that registers the spectrum or signal of some process and, as the result of the measurement, outputs a set of numbers. Our task as researchers is, looking at these numbers and knowing the structure of the device, to understand what the measured spectrum or signal was. That is, we are facing what is called an inverse problem. Written mathematically, we get the following equation (which, incidentally, is called the Fredholm equation of the first kind):

$$f(y) = \int\limits_a^b K(x, y)\, \varphi(x)\, dx$$
In fact, this equation describes the following: our measuring device is represented by its apparatus function $K(x, y)$, which acts on the studied spectrum or other input signal $\varphi(x)$, so that the researcher observes the output signal $f(y)$. The researcher's goal is to reconstruct the signal $\varphi$ from the known $f$ and $K$. The same expression can also be formulated in matrix form, replacing the functions with vectors and matrices:

$$f_m = K_{mn} \varphi_n$$
It would seem that signal reconstruction is not a difficult task, since both the Fredholm equation and the system of linear equations (even an overdetermined one) have an exact solution. So let's try it. Let the measured signal be described as the sum of two Gaussians:

$$\varphi(x) = 4\,\mathcal{N}(x \mid \mu = 2, \sigma = 0.4) + 2\,\mathcal{N}(x \mid \mu = 4, \sigma = 0.5)$$
As the instrument we take the simplest integrator: a kernel built from the Heaviside step function, which maps our signal into a cumulative sum:

$$K(x, y) = \theta(x - y)$$
The shapes of the measured signal and of our instrument, as well as the measurement result, are shown in the graph.
It is important that any real measurement contains errors, so let us spoil the result a little by adding normally distributed noise corresponding to a five percent measurement error. We then try to restore the signal by the method of least squares:
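A minimal self-contained sketch of this naive attempt (our own illustration; the grid sizes and random seed are arbitrary, and this is not the exact code used for the figures):

import numpy as np

rng = np.random.default_rng(42)

# The true signal: a sum of two Gaussians on [0, 5]
x = np.linspace(0, 5, 50)
phi = 4*np.exp(-(x - 2)**2/(2*0.4**2))/(0.4*np.sqrt(2*np.pi)) \
    + 2*np.exp(-(x - 4)**2/(2*0.5**2))/(0.5*np.sqrt(2*np.pi))

# The integrating instrument: K[m, n] = theta(x_n - x_m) * dx
dx = x[1] - x[0]
K = np.heaviside(x[None, :] - x[:, None], 1.0)*dx

f_true = K @ phi
sig = 0.05*f_true + 0.01                 # five percent error plus a small floor
f = f_true + rng.normal(0, sig)          # noisy "measurement"

# Naive least squares: the inverse of an integrator is a differentiator,
# so the noise is amplified and swamps the reconstruction
phi_lsq, *_ = np.linalg.lstsq(K, f, rcond=None)
print(np.max(np.abs(phi_lsq - phi)))     # error comparable to the signal itself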
And as a result we get a reconstruction that has nothing in common with the true signal: the amplified noise completely drowns out the solution.
Actually, we could end the article here, having once again convinced ourselves of the helplessness of the idealistic methods of mathematics in the face of harsh and ruthless physical reality, and gone off to warm up our soldering irons. But first, let's figure out why this failure happened to us. Obviously, the cause is measurement errors, but what exactly do they affect? The fact is that Jacques Hadamard (the same one who added the dash to the Cauchy–Hadamard formula) divided all problems into well-posed and ill-posed ones.

Remembering the classics — "It makes no sense to look for a solution if it exists. The question is how to deal with a problem that has no solution. This is a deeply fundamental question..." — we will not discuss well-posed problems and will immediately take up the ill-posed ones. Fortunately, we have already met one: the Fredholm equation written above is an ill-posed inverse problem. Even for infinitesimally small fluctuations in the input data (and our measurement errors are far from infinitesimal), a solution of the equation obtained in an exact analytical way can differ arbitrarily strongly from the true one. A proof of this statement can be found in the first chapter of the classic work of academician A. N. Tikhonov, "Methods for Solving Ill-Posed Problems". That book offers advice on what to do with ill-posed problems, but the technique presented there has a number of drawbacks, which are fixed in Turchin's method.

First, let us describe the general principle of working with ill-posed problems: what should you do if you come across one? Since the problem itself cannot offer us anything, we will have to commit a small crime: supplement the problem with data so that it becomes well-posed — in other words, introduce some *additional a priori information about the problem* (this process is called regularization of the problem). In contrast to the classical Tikhonov method, based on introducing parametrized regularizing functionals, Turchin's method proposes to use the Bayesian approach.

Theoretical Description of Statistical Regularization
Strategy
Let us formulate our problem in terms of mathematical statistics: given a realization $f$ known from the experiment, we need to estimate the value of the parameter $\varphi$. The functional $\hat{\varphi}(f)$ that computes $\varphi$ from $f$ we will call a strategy. To determine which strategies are more optimal, we introduce the quadratic loss function. The real loss function could be anything, so why do we choose the quadratic one? Because any loss function in the vicinity of its minimum can be approximated by a quadratic function:

$$L(\hat{\varphi}, \varphi) = \|\hat{\varphi} - \varphi\|^2$$
where $\varphi$ is the best (true) solution. The losses for our chosen strategy are then determined by the risk function:

$$R_{\hat{\varphi}}(\varphi) = \int L\big(\hat{\varphi}(f), \varphi\big)\, P(f \mid \varphi)\, df$$
Here $P(f \mid \varphi)$ is the probability density of our ensemble, over which the averaging of the losses is performed. This ensemble is formed by a hypothetical multiple repetition of the measurements of $f$ for a given $\varphi$; thus, $P(f \mid \varphi)$ is the known probability density of obtaining $f$ in the experiment. According to the Bayesian approach, $\varphi$ is treated as a random variable with a prior probability density $P(\varphi)$ expressing the plausibility of different solutions of our problem, determined on the basis of information available before the experiment. Then the choice of the optimal strategy is based on minimizing the posterior risk:

$$r_{\hat{\varphi}}(f) = \int L\big(\hat{\varphi}(f), \varphi\big)\, P(\varphi \mid f)\, d\varphi$$
In this case, the optimal strategy is well known — it is the posterior mean:

$$\hat{\varphi}(f) = \int \varphi\, P(\varphi \mid f)\, d\varphi$$
where the posterior density $P(\varphi \mid f)$ is determined by Bayes' theorem:

$$P(\varphi \mid f) = \frac{P(f \mid \varphi)\, P(\varphi)}{\int P(f \mid \varphi)\, P(\varphi)\, d\varphi}$$
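(That the posterior mean is indeed optimal for the quadratic loss is easy to check: differentiating the posterior risk with respect to $\hat{\varphi}$ and using the normalization of $P(\varphi \mid f)$,

$$\frac{\partial}{\partial \hat{\varphi}} \int (\hat{\varphi} - \varphi)^2\, P(\varphi \mid f)\, d\varphi = 2\hat{\varphi} - 2\int \varphi\, P(\varphi \mid f)\, d\varphi = 0,$$

we recover exactly the expression above.)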
This approach also allows us to determine the dispersion (correlation function) of the obtained solution:

$$D(x_1, x_2) = \int \big(\varphi(x_1) - \hat{\varphi}(x_1)\big)\big(\varphi(x_2) - \hat{\varphi}(x_2)\big)\, P(\varphi \mid f)\, d\varphi$$
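As a toy illustration of these formulas, here is a one-dimensional sketch of ours (unrelated to the statreg API): the measurement model is $f = \varphi + \text{noise}$ with unit variance, the prior on $\varphi$ is a Gaussian with $\sigma = 2$, and the posterior mean and dispersion are computed by direct numerical integration.

import numpy as np
from scipy.integrate import quad

f_measured = 1.5
likelihood = lambda p: np.exp(-0.5*(f_measured - p)**2)   # P(f | phi), up to a constant
prior = lambda p: np.exp(-0.5*p**2/2**2)                  # P(phi), up to a constant

# Bayes' theorem: posterior = likelihood * prior / evidence
evidence = quad(lambda p: likelihood(p)*prior(p), -np.inf, np.inf)[0]
posterior = lambda p: likelihood(p)*prior(p)/evidence

# Optimal (posterior-mean) strategy and its dispersion
phi_hat = quad(lambda p: p*posterior(p), -np.inf, np.inf)[0]
D_hat = quad(lambda p: (p - phi_hat)**2*posterior(p), -np.inf, np.inf)[0]
print(phi_hat, D_hat)   # 1.2 and 0.8: the estimate is pulled toward the prior mean 0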
So, we have obtained the optimal solution of our problem by introducing a prior density $P(\varphi)$. Can we say anything about the world of functions described by this prior density? If the answer is no, we would have to consider all possible $\varphi$ equally probable and return to the unregularized solution. Therefore, we must answer this question in the affirmative. This is precisely the essence of the method of statistical regularization: regularizing the solution by introducing additional a priori information about $\varphi$. If the researcher already has some a priori information (a prior density $P(\varphi)$), they can simply compute the integral and obtain the answer. If there is no such information, the next section describes what minimal information a researcher may have and how to use it to obtain a regularized solution.

A priori information
As British scientists have shown, everyone in the world likes to differentiate. And while a mathematician will raise questions about the legality of this operation, a physicist optimistically believes that the laws of nature are described by "good", that is, smooth, functions. In other words, smoother functions should be assigned a higher prior probability density. So let us try to introduce a prior probability based on smoothness. To do this, recall that introducing a priori information is a certain violence against the world, forcing the laws of nature to look convenient for us. This violence should be minimized: when introducing the prior probability density, we require that the Shannon information about $\varphi$ contained in $P(\varphi)$ be minimal. Formalizing the above, we derive the form of the prior density from the smoothness of the function. To do this, we look for a conditional extremum of the information:

$$I[P(\varphi)] = \int \ln P(\varphi)\, P(\varphi)\, d\varphi \to \min$$
under the following conditions:

- The smoothness condition. Let $\Omega$ be a certain matrix characterizing the smoothness of the function. Then we require that a certain value of the smoothness functional be achieved:

$$\int (\varphi, \Omega \varphi)\, P(\varphi)\, d\varphi = \omega$$
An attentive reader will ask how the value of the parameter $\omega$ is to be determined. The answer will be given further in the text.
- Normalization of the probability to unity:

$$\int P(\varphi)\, d\varphi = 1$$
Under these conditions, the minimum of the functional is delivered by the following density:

$$P_\alpha(\varphi) \propto \exp\left(-\frac{\alpha}{2}\, (\varphi, \Omega \varphi)\right)$$
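To build intuition about what this prior prefers, one can draw random functions from it. A sketch of ours, with $\Omega$ assembled from second finite differences (so that $(\varphi, \Omega\varphi)$ penalizes curvature) and the pseudo-inverse handling the zero modes of $\Omega$:

import numpy as np

n = 50
# D @ phi approximates the second derivative, so Omega = D.T @ D
D = np.diff(np.eye(n), n=2, axis=0)
Omega = D.T @ D

alpha = 100.0
# Prior covariance (alpha*Omega)^{-1}; pinv because Omega is singular
cov = np.linalg.pinv(alpha*Omega)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(n), cov, size=5, check_valid='ignore')
# The larger alpha is, the smoother (less curved) the sampled functions are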
The parameter $\alpha$ is related to $\omega$, but since we have no information about specific values of the smoothness functional, there is no point in figuring out exactly how they are related. What, then, should we do with $\alpha$, you ask. Three paths open up before you:
- Choose the value of the parameter $\alpha$ by hand, thereby effectively switching to Tikhonov regularization
- Average (integrate) over all possible $\alpha$, considering all values of $\alpha$ equally probable
- Choose the most probable $\alpha$ from its posterior probability density $P(\alpha \mid f)$. This approach is valid because it gives a good approximation of the integral when the experimental data contain enough information about $\alpha$.
The first case is of little interest to us. In the second case, we have to compute this rather ugly integral:

$$\hat{\varphi}(f) = \frac{\int \varphi\, P(f \mid \varphi)\, P_\alpha(\varphi)\, d\varphi\, d\alpha}{\int P(f \mid \varphi)\, P_\alpha(\varphi)\, d\varphi\, d\alpha}$$
In the third case, the value of the integral can be obtained analytically for Gaussian noise in the experiment (we will consider this one section later). Note also that nowhere have we used the fact that $\Omega$ is a smoothness operator: in principle, any other operator (or a linear combination of operators) can be used here; smoothness of the function is simply the most obvious form of a priori information available to us.

Discretization
So far we have been talking about functions, but any real device cannot measure even a countable set of points, let alone a continuum. We always measure at a finite set of points, so we are forced to carry out a discretization procedure and pass from the integral equation to a matrix one. In the method of statistical regularization we proceed as follows: we expand $\varphi(x)$ over some system of functions $\{T_n\}$:

$$\varphi(x) = \sum_{n=1}^{N} \varphi_n T_n(x)$$
Thus the expansion coefficients form a vector $\vec{\varphi}$, which represents $\varphi$ in the function space. As that space we can take a Hilbert space or, for example, a space of polynomials; the choice of basis in these spaces is limited only by your imagination (we have tried the trigonometric Fourier basis, Legendre polynomials, and cubic splines). The elements of the matrix $K$ are then calculated as:

$$K_{mn} = \int\limits_a^b K(x, y_m)\, T_n(x)\, dx$$
where $y_m$ are the points at which the measurements were made. The elements of the matrix $\Omega$ we calculate by the formula (for smoothness of order $p$; in the code below $p = 2$):

$$\Omega_{ij} = \int\limits_a^b \frac{d^p T_i(x)}{dx^p}\, \frac{d^p T_j(x)}{dx^p}\, dx$$
where $a$ and $b$ are the boundaries of the interval on which the function $\varphi$ is defined. To recalculate the errors we use the formula for the variance of a linear combination of random variables:

$$D[\varphi(x)] = \sum_{i,j} T_i(x)\, \Sigma_{ij}\, T_j(x)$$

where $\Sigma$ is the covariance matrix of the coefficients $\vec{\varphi}$.
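Here is a sketch of this discretization (our own illustration with simple triangular "hat" basis functions for brevity; statreg itself uses the bases mentioned above):

import numpy as np
from scipy.integrate import quad

a, b, N = 0.0, 5.0, 20
nodes = np.linspace(a, b, N)
h = nodes[1] - nodes[0]

def T(n):
    # Piecewise-linear "hat" basis function centered at nodes[n]
    return lambda x: np.maximum(0.0, 1.0 - np.abs(x - nodes[n])/h)

kernel = lambda x, y: np.heaviside(x - y, 1)
y_meas = np.linspace(a, b, 30)

# K_mn = integral over [a, b] of K(x, y_m) * T_n(x) dx
Kmn = np.array([[quad(lambda x: kernel(x, ym)*T(n)(x), a, b)[0]
                 for n in range(N)] for ym in y_meas])

def phi_variance(x, Sigma):
    # D[phi(x)] = sum_ij T_i(x) * Sigma_ij * T_j(x)
    Tx = np.array([T(i)(x) for i in range(N)])
    return Tx @ Sigma @ Tx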
It should be kept in mind that in some cases representing a function by a finite-dimensional vector leads to a partial loss or distortion of information. In fact, this algebraization can itself be considered a kind of regularization, though a weak one, insufficient to turn an ill-posed problem into a well-posed one. Be that as it may, we have now passed from searching for the function $\varphi(x)$ to searching for the vector $\vec{\varphi}$, and in the next section we find it.

Gaussian Noise Case
The case when the experimental errors are distributed according to Gauss is remarkable in that an analytical solution of our problem can be obtained. Since both the a priori information and the errors have Gaussian form, their product is also Gaussian, and the ugly integral written above is then taken easily. The solution and its error are:

$$\hat{\vec{\varphi}} = \left(K^T \Sigma^{-1} K + \alpha^* \Omega\right)^{-1} K^T \Sigma^{-1} \vec{f}$$

$$\Sigma_{\hat{\varphi}} = \left(K^T \Sigma^{-1} K + \alpha^* \Omega\right)^{-1}$$

where $\Sigma$ is the covariance matrix of the measurement noise.
Here $\Sigma_{\hat{\varphi}}$ is the covariance matrix of the resulting multidimensional Gaussian distribution, and $\alpha^*$ is the most probable value of the parameter $\alpha$, determined from the condition of the maximum of its posterior probability density:

$$\alpha^* = \underset{\alpha}{\operatorname{argmax}}\; P(\alpha \mid f)$$
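To make these formulas concrete, here is a compact NumPy sketch of ours for the Gaussian case (an illustration of the formulas above, not the statreg internals); $\alpha^*$ is found by a crude grid scan of $\ln P(\alpha \mid f)$, assuming a flat prior on $\alpha$:

import numpy as np

def solve_gauss(K, f, sig, Omega, alphas=np.logspace(-8, 2, 200)):
    # K: (M, N) discretized kernel; f: (M,) data; sig: (M,) noise sigmas;
    # Omega: (N, N) smoothness matrix
    Sinv = np.diag(1.0/sig**2)      # inverse noise covariance
    A = K.T @ Sinv @ K
    b = K.T @ Sinv @ f
    rank = np.linalg.matrix_rank(Omega)

    def log_p_alpha(alpha):
        B = A + alpha*Omega
        # ln P(alpha | f) up to an alpha-independent constant:
        # 0.5*Rg(Omega)*ln(alpha) - 0.5*ln det(B) + 0.5*b^T B^{-1} b
        _, logdet = np.linalg.slogdet(B)
        return 0.5*rank*np.log(alpha) - 0.5*logdet + 0.5*b @ np.linalg.solve(B, b)

    alpha_best = max(alphas, key=log_p_alpha)
    cov = np.linalg.inv(A + alpha_best*Omega)   # posterior covariance of phi
    return cov @ b, cov, alpha_best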
And what if my errors are not Gaussian?
The second part of the article will be devoted to this, but for now let us outline the essence of the problem.
The main problem is that the terrible integral written above is, firstly, multidimensional and, secondly, taken over infinite limits. Moreover, its dimensionality is high: the vector $\vec{\varphi}$ can easily have a large dimension, while grid methods for computing integrals have a cost that grows exponentially with the dimension and are therefore inapplicable here. Monte Carlo integration, on the other hand, works well for multidimensional integrals. Furthermore, since our limits are infinite, we have to use importance sampling, and then we need to choose a sampling function. To make everything more automatic, one should use Markov chain Monte Carlo (MCMC), which can adapt the sampling function to the integral being computed on its own. We will discuss the application of MCMC in the next article.

Practical part
The first implementation of the statistical regularization method was written back in the 1970s in Algol and was successfully used for calculations in atmospheric physics. Although we still have the handwritten sources of that algorithm, we decided to add a bit of modernity and made an implementation in Python, and then in Julia.

Python
Installation
Install via pip:

pip install statreg

or download the source code.

Examples
As an example, let us consider how to use the statreg module to reconstruct a signal from matrix and integral equations.

We import the necessary scientific packages:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.integrate import quad
%matplotlib inline
We define the true signal that we will reconstruct:

a = 0
b = 5
phi = lambda x: 4*norm.pdf(x-2, scale=0.4) + 2*norm.pdf(x-4, scale=0.5)
x = np.linspace(a, b, 100)
plt.plot(x, phi(x));
We define the kernel and the convolution of functions (note: np.convolve works only on arrays, so we integrate numerically with quad instead):

kernel = lambda x, y: np.heaviside(x-y, 1)
convolution = np.vectorize(lambda y: quad(lambda x: kernel(x, y)*phi(x), a, b)[0])
We generate the measured data and add normally distributed noise:

y = np.linspace(a, b, 50)
ftrue = convolution(y)
sig = 0.05*ftrue + 0.01
f = norm.rvs(loc=ftrue, scale=sig)
plt.errorbar(y, f, yerr=sig);

We solve the integral equation
We import the solver and an auxiliary class for discretization:

from statreg.model import GaussErrorUnfolder
from statreg.basis import CubicSplines
As the functional basis for discretization we use cubic splines, and as an additional condition we require the function to vanish at the boundaries:

basis = CubicSplines(y, boundary='dirichlet')
model = GaussErrorUnfolder(basis, basis.omega(2))
We solve the equation:

phi_reconstruct = model.solve(kernel, f, sig, y)
We plot the solution:

plt.plot(x, phi(x))
phir = phi_reconstruct(x)
phiEr = phi_reconstruct.error(x)
plt.plot(x, phir, 'g')
plt.fill_between(x, phir-phiEr, phir + phiEr, color='g', alpha=0.3);
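As a quick sanity check (a sketch of ours using only the objects defined above, and assuming phi_reconstruct also accepts scalar arguments), we can fold the reconstructed solution back through the kernel and compare it with the measurements:

# Forward-fold the reconstruction and overlay it on the noisy data
f_check = np.vectorize(lambda t: quad(lambda s: kernel(s, t)*phi_reconstruct(s), a, b)[0])(y)
plt.errorbar(y, f, yerr=sig)
plt.plot(y, f_check);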

We solve the matrix equation
We import the solver and an auxiliary class for discretization:

from statreg.model import GaussErrorMatrixUnfolder
from statreg.basis import CubicSplines
To obtain the matrices we use our functional basis, though clearly the matrices could be obtained in any other way:

cubicSplines = CubicSplines(y, boundary='dirichlet')
omega = cubicSplines.omega(2)
Kmn = cubicSplines.discretizeKernel(kernel,y)
We solve the matrix equation:

model = GaussErrorMatrixUnfolder(omega)
result = model.solve(Kmn, f, sig)
We plot the result:

phir = lambda x: sum([p*bf(x) for p, bf in zip(result.phi, cubicSplines.basisFun)])
plt.plot(x, phir(x))
plt.plot(x, phi(x));

Julia
As we mentioned, further development of the technique requires advanced Monte Carlo integration. We could use some module in Python (for example, we have worked with PyMC3), but, among other things, we participate in a joint project with the Max Planck Institute in Munich. This project is called the Bayesian Analysis Toolkit. Its goal is to create a framework with tools for Bayesian analysis, first of all including tools for MCMC. The team is now working on the second version of the framework, written in Julia (the first was written in not-the-best C++). One of our group's tasks is to demonstrate the capabilities of this framework on the example of statistical regularization, so we wrote an implementation in Julia.

using PyCall
include("../src/gauss_error.jl")
include("../src/kernels.jl")
a = 0.
b = 6.
function phi(x::Float64)
mu1 = 1.
mu2 = 4.
n1 = 4.
n2 = 2.
sig1 = 0.3
sig2 = 0.5
norm(n, mu, sig, x) = n / sqrt(2 * pi*sig^2) * exp(-(x - mu)^2 / (2 * sig^2))
return norm(n1, mu1, sig1, x) + norm(n2, mu2, sig2, x)
end
x = collect(range(a, stop=b, length=300))
import PyPlot.plot
myplot = plot(x, phi.(x))
savefig("function.png", dpi=1000)
This time we use a different kernel: instead of the integrating step we take a convolution with a Gaussian, which effectively "blurs" our data:

function kernel(x::Float64, y::Float64)
return getOpticsKernels("gaussian")(x, y)
end
convolution = y -> quadgk(x -> kernel(x,y) * phi(x), a, b, maxevals=10^7)[1]
y = collect(range(a, stop = b, length=50))
ftrue = convolution.(y)
sig = 0.05*abs.(ftrue) .+ 0.01
using Compat, Random, Distributions
noise = []
for sigma in sig
n = rand(Normal(0., sigma), 1)[1]
push!(noise, n)
end
f = ftrue + noise
plot(y, f)
As before, we take the basis of splines fixed at the endpoints:

basis = CubicSplineBasis(y, "dirichlet")
Kmn = discretize_kernel(basis, kernel, y)
model = GaussErrorMatrixUnfolder([omega(basis, 2)], "EmpiricalBayes", nothing, [1e-5], [1.], [0.5])
result = solve(model, Kmn, f, sig)
phivec = PhiVec(result, basis)
x = collect(range(a, stop=b, length=5000))
plot(x, phi.(x))
phi_reconstructed = phivec.phi_function.(x)
phi_reconstructed_errors = phivec.error_function.(x)
plot(x, phi_reconstructed)
fill_between(x, phi_reconstructed - phi_reconstructed_errors, phi_reconstructed + phi_reconstructed_errors, alpha=0.3)

Real-world example
As an example of the analysis of real data, we will reconstruct the electron scattering spectrum of a hydrogen-deuterium mixture. In the experiment the integral spectrum was measured (that is, the number of electrons with energies above a given value), and we need to reconstruct the differential spectrum. For these data the spectrum had previously been reconstructed by fitting, so we have a reference for checking the correctness of our algorithm. This is how the initial integral spectrum looks:
And this is the result of the reconstruction:
Analysis based on a fit has several significant drawbacks, chief among them its model dependence. Statistical regularization avoids these problems and provides a model-independent result with measurement errors. The solution obtained by regularization is in good agreement with the fitted curve. Note the two small peaks at 25 and 30 eV. It is known that the peak at 25 eV arises from double scattering, and it was reproduced by the fit, since it was explicitly included in the fitting function. The peak at 30 eV may be a statistical anomaly (the errors at that point are quite large) or may indicate the presence of additional dissociative scattering.

Conclusions and announcement of the next part
We have told you about a useful technique that can be adapted to a wide range of data analysis problems (including machine learning) and allows one to obtain an honest estimate of the answer: the most rational solution of the equation given the uncertainty caused by measurement errors. As a pleasant bonus, we also obtain error values for the solution. Those who want to take part in the development or to apply statistical regularization can contribute in the form of code in Python, Julia, or something else.

In the next part we will talk about:

- Using MCMC
- Cholesky decomposition
- As a practical example, we consider the use of regularization for processing a signal from a model of an orbital detector of protons and electrons
Posted by Mikhail Zeleny, researcher at the Laboratory of Nuclear Physics Experiment Methods at JetBrains Research.