Bipolar morphological networks: a neuron without multiplication

Nowadays it is hard to find a problem that someone has not yet proposed to solve with neural networks, and in many problems other methods are no longer even considered. In such a situation it is only natural that, chasing the β€œsilver bullet”, researchers and technologists keep offering new modifications of neural network architectures that promise β€œhappiness for everyone, for free, and let no one go away offended!” In industrial tasks, however, it often turns out that the accuracy of a model depends mainly on the cleanliness, size and structure of the training set, and on top of the neural network you still need a sensible interface (for example, it is awkward when the answer should logically be a list of variable length).


Performance and speed are another matter: here the dependence on the architecture is direct and quite predictable. Not every researcher cares about it, though. It is much more pleasant to think in terms of epochs and centuries, mentally aiming at an age when computing power will be unimaginable and energy will be extracted from thin air. But there are enough down-to-earth people too, and for them it matters that neural networks be more compact, faster and more energy-efficient right now. For example, this is important when running on mobile devices and in embedded systems, where there is no powerful GPU and the battery has to be spared. A lot has been done in this direction: compact integer (quantized) neural networks, pruning of redundant neurons, tensor decompositions, and much more.


We managed to remove multiplications from the computations inside the neuron, replacing them with additions and taking the maximum, while keeping the option of using multiplications and nonlinear operations in the activation function. We called the proposed model the bipolar morphological neuron.







Making networks lighter, however, is painstaking work; as Virgil put it, Labor omnia vΔ«cit improbus et dΕ«rΔ«s urgΔ“ns in rΔ“bus egestās: persistent labor and pressing need conquer all.


The idea of building a neuron on additions and maximums instead of multiplications is not new: so-called morphological neural networks appeared back in the 1990s [1, 2]. Later came morphological perceptrons with dendritic structure [3], a lattice-algebra view of single-neuron computation [4], and training methods for dendrite morphological neurons, including stochastic gradient descent [5, 6]. However, these models are organized quite differently from the classical neuron: they cannot simply be dropped into familiar architectures in place of ordinary layers, and they have not seen wide practical use.


In this article we describe the bipolar morphological neuron itself, a way to obtain networks of such neurons from pre-trained classical ones, and experiments on MNIST and on recognition of machine-readable zone (MRZ) characters.



Let us start with the classical neuron, which computes the following function:


$$y(x, w) = \sigma\left(\sum_{i=1}^{N} w_i x_i + w_{N+1}\right)$$


where $x$ is the input vector, $w$ is the weight vector (with $w_{N+1}$ playing the role of the bias), and $\sigma$ is a nonlinear activation function.


We want to replace the multiplications $x_i w_i$ with additions of logarithms. The logarithm, however, is defined only for positive numbers, while both the inputs and the weights can have arbitrary signs. Therefore we first split the sum into 4 parts according to the signs of the inputs and the weights:


$$\sum_{i=1}^{N} x_i w_i = \sum_{i=1}^{N} p_i^{00}\, x_i w_i - \sum_{i=1}^{N} p_i^{01}\, x_i |w_i| - \sum_{i=1}^{N} p_i^{10}\, |x_i| w_i + \sum_{i=1}^{N} p_i^{11}\, |x_i| |w_i|,$$



where
$$p_i^{kj} = \begin{cases} 1, & (-1)^k x_i > 0 \text{ and } (-1)^j w_i > 0 \\ 0, & \text{otherwise.} \end{cases}$$
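Before approximating anything, it is easy to see that this decomposition is exact. A tiny NumPy check of my own (the vector values are arbitrary, not from the article):

```python
import numpy as np

x = np.array([0.5, -1.2, 2.0, -0.3])
w = np.array([-0.7, 0.4, 1.5, 0.9])

# p^{kj}_i selects the terms with (-1)^k x_i > 0 and (-1)^j w_i > 0;
# within each group the corresponding term equals |x_i| * |w_i|.
parts = [(((-1) ** k * x > 0) & ((-1) ** j * w > 0)) * np.abs(x) * np.abs(w)
         for k, j in [(0, 0), (0, 1), (1, 0), (1, 1)]]

lhs = parts[0].sum() - parts[1].sum() - parts[2].sum() + parts[3].sum()
assert np.isclose(lhs, x @ w)   # the four sums reproduce the dot product exactly
```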


Each of the four sums now contains only non-negative terms, so we can work with it in the logarithmic domain. Let us introduce the notation:


$$M = \max_j (x_j w_j), \qquad k = \frac{\sum_{i=1}^{N} x_i w_i}{M} - 1.$$


Then:


$$
\begin{aligned}
\sum_{i=1}^{N} x_i w_i &= \exp\left\{\ln \sum_{i=1}^{N} x_i w_i\right\} = \exp\left\{\ln M(1+k)\right\} = (1+k)\exp \ln M = \\
&= (1+k)\exp\left\{\ln\left(\max_j (x_j w_j)\right)\right\} = (1+k)\exp \max_j \ln (x_j w_j) = \\
&= (1+k)\exp \max_j (\ln x_j + \ln w_j) = (1+k)\exp \max_j (y_j + v_j) \approx \exp \max_j (y_j + v_j),
\end{aligned}
$$


where $y_j = \ln x_j$ are the logarithms of the inputs and $v_j = \ln w_j$ are the new weights. The approximation is accurate when the largest term dominates the sum, that is, when $k \ll 1$. In general $0 \le k \le N - 1$: the lower bound is reached when a single term determines the whole sum ($k = 0$), the upper one when all terms are equal ($k = N - 1$), so the approximate value never differs from the exact sum by more than a factor of $N$. We apply this approximation to each of the four sign-separated sums. The logarithms of the weights can be computed once in advance, and the logarithm of the inputs can be taken right after the previous layer, so inside the neuron itself only additions, maximums and exponents remain. This is how the bipolar morphological (BM) neuron appears.
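To get a feel for how rough the last step is, here is a small NumPy illustration of my own (not from the article). For positive inputs and weights, i.e. for one of the four sign-separated sums, the approximation returns exactly the largest term $\max_j (x_j w_j)$, so it underestimates the exact sum by a factor of $(1 + k)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 1.0, size=9)    # positive inputs, e.g. one 3x3 window after ReLU
w = rng.uniform(0.1, 1.0, size=9)    # positive weights

exact = np.sum(x * w)                            # the usual dot product
approx = np.exp(np.max(np.log(x) + np.log(w)))   # exp max_j (y_j + v_j) = max_j (x_j w_j)

k = exact / np.max(x * w) - 1                    # exact = (1 + k) * approx
print(f"exact = {exact:.3f}, approx = {approx:.3f}, k = {k:.3f}")
```

How small $k$ is in practice depends on the statistics of the weights and activations; this is one reason why the networks below are additionally trained after conversion rather than used as-is.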


The structure of the resulting bipolar morphological neuron is shown in Fig. 1. ReLU here separates the positive and negative parts of the input, after which the computation splits into 4 branches: each branch adds the logarithms of its inputs to its weights, takes the maximum and applies an exponent. The branch outputs are then combined with the signs $+, -, -, +$ and passed through the usual activation function.


Note that the logarithms and exponents are applied to whole input and output vectors rather than to individual products, so the number of such operations is proportional to the number of neurons and not to the number of connections (for a convolutional layer, for example, it does not grow with the kernel size). Nonlinear operations and, if needed, multiplications remain only in the activation function; the bulk of the computation consists of additions and maximums.



Fig. 1. The structure of the bipolar morphological neuron.


Formally, the bipolar morphological neuron computes the following function:


$$\mathrm{BM}(x, w) = \exp \max_j \left(\ln \mathrm{ReLU}(x_j) + v_j^0\right) - \exp \max_j \left(\ln \mathrm{ReLU}(x_j) + v_j^1\right) - \exp \max_j \left(\ln \mathrm{ReLU}(-x_j) + v_j^0\right) + \exp \max_j \left(\ln \mathrm{ReLU}(-x_j) + v_j^1\right),$$



where
$$v_j^k = \begin{cases} \ln |w_j|, & (-1)^k w_j > 0 \\ -\infty, & \text{otherwise.} \end{cases}$$


Here $\ln 0$ is taken to be $-\infty$ and $\exp(-\infty) = 0$, so the branches that have no inputs or weights of the corresponding sign simply contribute nothing. There are no multiplications left inside the neuron: only additions, maximums, exponents and logarithms. Of course, the BM neuron only approximates the classical one, so simply converting the weights of a trained network does not guarantee the same quality; how much accuracy this costs, and how to get it back, is discussed below.
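Putting the pieces together, below is a compact NumPy sketch of the BM neuron (my own illustration of the formula above, not the authors' code), together with the conversion of classical weights $w$ into the logarithmic weights $v^0$ and $v^1$. The helper names are mine.

```python
import numpy as np

def to_bm_weights(w):
    """v^k_j = ln|w_j| if (-1)^k w_j > 0, else -inf (k = 0: positive weights, k = 1: negative)."""
    with np.errstate(divide="ignore"):
        logs = np.log(np.abs(w))            # -inf where w_j == 0, which is what we want
    v0 = np.where(w > 0, logs, -np.inf)
    v1 = np.where(w < 0, logs, -np.inf)
    return v0, v1

def bm_neuron(x, v0, v1):
    """Bipolar morphological neuron: only additions, maximums and exponents on the data path."""
    with np.errstate(divide="ignore"):      # log(0) = -inf is intended here
        pos = np.log(np.maximum(x, 0.0))    # ln ReLU(x)
        neg = np.log(np.maximum(-x, 0.0))   # ln ReLU(-x)
    return (np.exp(np.max(pos + v0)) - np.exp(np.max(pos + v1))
            - np.exp(np.max(neg + v0)) + np.exp(np.max(neg + v1)))

x = np.array([0.5, -1.2, 2.0, -0.3])
w = np.array([-0.7, 0.4, 1.5, 0.9])
v0, v1 = to_bm_weights(w)
print(np.dot(x, w), bm_neuron(x, v0, v1))   # exact sum vs BM approximation
```

On these example vectors the classical neuron gives 1.90 and the BM neuron about 2.17: close, but not exact, which is the price of removing the multiplications.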



A neuron model by itself is not enough: the network still has to be trained, and here, as often happens, a surprise was waiting for us. Spoiler: training a BM network from scratch in the usual way did not work! So we took a different route: start from a trained classical network and convert it into a bipolar morphological one gradually, layer by layer, with additional training along the way; we tried two variants of this procedure (method 1 and method 2). What exactly goes wrong with straightforward training, and how the two methods differ, is explained below.


The difficulty with training from scratch lies in how the gradient flows through the BM neuron: the maximum passes the gradient to only one input at a time, so at every step only a small fraction of the weights is updated, and weights that have drifted towards very large negative values of $v$ effectively stop learning. The exponents and logarithms do not make optimization any easier either: a BM network trained from random initialization converges slowly and to a noticeably worse solution than its classical counterpart.


There is, however, a natural workaround in the spirit of incremental learning: do not train the BM network from scratch, but grow it out of a good classical one. We take a trained classical network and convert its layers into bipolar morphological ones one by one, using the formula for $v_j^k$ above; after each conversion the network is additionally trained. The two methods differ in what is trained at that point: we can β€œfreeze” the converted BM layers and train only the remaining classical ones (method 1), or fine-tune the whole network, converted layers included (method 2). β€œFreezing” is attractive because the remaining part is an ordinary network that can be trained with standard tools; fine-tuning everything, on the other hand, lets the BM layers adapt, and, as we will see, this matters.
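For concreteness, here is what this layer-by-layer procedure could look like in PyTorch-style code. This is only my sketch of the scheme described above: to_bm stands for a hypothetical function that builds a BM layer from a trained classical one, and finetune for ordinary training on the task data; neither is shown in the article.

```python
import torch.nn as nn

def convert_layer_by_layer(model: nn.Sequential, to_bm, finetune, freeze_converted: bool):
    """Convert conv/fc layers to BM form one at a time, fine-tuning after each step.

    freeze_converted=True  -> converted BM layers are frozen, only the rest is trained (method 1);
    freeze_converted=False -> the whole network, BM layers included, is trained (method 2).
    """
    for i in range(len(model)):
        layer = model[i]
        if not isinstance(layer, (nn.Conv2d, nn.Linear)):
            continue                          # only layers with multiplications get converted
        model[i] = to_bm(layer)               # hypothetical: build a BM layer from the trained weights
        if freeze_converted:
            for p in model[i].parameters():
                p.requires_grad_(False)       # method 1: converted weights stay fixed
        finetune(model)                       # additional training ("+ training" in the tables below)
    return model
```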



MNIST


MNIST is the classical dataset of handwritten digits: 60,000 training images of size 28Γ—28 pixels and 10,000 test images. We set aside 10% of the training set for validation and trained on the rest. Examples are shown in Fig. 2.



Fig. 2. Examples of images from the MNIST dataset.


We will use the following notation for the layers:


conv(n, w_x, w_y) β€” convolutional layer with n filters of size w_x Γ— w_y;
fc(n) β€” fully connected layer with n neurons;
maxpool(w_x, w_y) β€” max-pooling layer with window w_x Γ— w_y;
dropout(p) β€” dropout layer with drop probability p;
relu β€” ReLU activation, ReLU(x) = max(x, 0);
softmax β€” softmax activation.


For MNIST we took two architectures:


CNN1: conv1(30, 5, 5) β€” relu1 β€” dropout1(0.2) β€” fc1(10) β€” softmax1.


CNN2: conv1(40, 5, 5) β€” relu1 β€” maxpool1(2, 2) β€” conv2(40, 5, 5) β€” relu2 β€” fc1(200) β€” relu3 β€” dropout1(0.3) β€” fc2(10) β€” softmax1.
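To make the notation concrete, here is the smaller baseline, CNN1, written out as an ordinary (classical, not yet BM) PyTorch module. The layer sizes and the 28Γ—28 input come from the description above; the single input channel and the absence of padding are my assumptions about details the article does not spell out.

```python
import torch
import torch.nn as nn

class CNN1(nn.Module):
    """conv1(30, 5, 5) - relu1 - dropout1(0.2) - fc1(10) - softmax1."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 30, kernel_size=5)   # 30 filters of size 5x5, no padding
        self.dropout1 = nn.Dropout(0.2)
        self.fc1 = nn.Linear(30 * 24 * 24, 10)          # 28 - 5 + 1 = 24 after the convolution

    def forward(self, x):                               # x: (batch, 1, 28, 28)
        x = torch.relu(self.conv1(x))                   # relu1
        x = self.dropout1(x)                            # dropout1(0.2)
        x = self.fc1(torch.flatten(x, 1))               # fc1(10)
        return torch.softmax(x, dim=1)                  # softmax1
```

CNN2 differs only in having a second convolution with pooling and a 200-neuron fully connected layer before the output.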


The results are collected in Table 1. The β€œconverted layers” column lists the layers that were replaced with bipolar morphological ones; for each method the accuracy is given right after conversion and after additional training.


Table 1. Recognition accuracy on MNIST, %. The β€œ+ training” columns show the accuracy after additional training; a dash means the value is not applicable.


| Network | Converted layers | Method 1 | Method 1, + training | Method 2 | Method 2, + training |
|---|---|---|---|---|---|
| CNN1 | (none) | 98.72 | β€” | 98.72 | β€” |
| CNN1 | conv1 | 42.47 | 98.51 | 38.38 | 98.76 |
| CNN1 | conv1 β€” relu1 β€” dropout1 β€” fc1 | 26.89 | β€” | 19.86 | 94.00 |
| CNN2 | (none) | 99.45 | β€” | 99.45 | β€” |
| CNN2 | conv1 | 94.90 | 99.41 | 96.57 | 99.42 |
| CNN2 | conv1 β€” relu1 β€” maxpool1 β€” conv2 | 21.25 | 98.68 | 36.23 | 99.37 |
| CNN2 | conv1 β€” relu1 β€” maxpool1 β€” conv2 β€” relu2 β€” fc1 | 10.01 | 74.95 | 17.25 | 99.04 |
| CNN2 | conv1 β€” relu1 β€” maxpool1 β€” conv2 β€” relu2 β€” fc1 β€” dropout1 β€” relu3 β€” fc2 | 12.91 | β€” | 48.73 | 97.86 |

As the table shows, converting the first convolutional layer costs almost nothing: after additional training the accuracy is restored (for CNN1 with method 2 it even slightly exceeds the original). The deeper into the network the conversion goes, the more the accuracy drops right after it, and the harder it is for method 1, which never updates the converted layers, to recover; once the whole network is converted, method 1 has nothing left to train at all. Method 2, which fine-tunes the entire network, performs much better, although the fully converted networks still lose roughly 1.5 to 5 percentage points of accuracy.


The intermediate takeaway: the convolutional part of a network can be converted into the bipolar morphological form with almost no loss, while the final fully connected layers are noticeably harder. Now for a more practical test: recognition of machine-readable zone (MRZ) characters.


MRZ


The MRZ task is recognition of characters from the machine-readable zone of identity documents (see Fig. 3). The dataset contains 280,000 images of size 21Γ—17 pixels belonging to 37 classes of characters that occur in MRZ lines.



Fig. 3. Examples of MRZ character images.


CNN3: conv1(8, 3, 3) β€” relu1 β€” conv2(30, 5, 5) β€” relu2 β€” conv3(30, 5, 5) β€” relu3 β€” dropout1(0.25) β€” fc1(37) β€” softmax1.


CNN4: conv1(8, 3, 3) β€” relu1 β€” conv2(8, 5, 5) β€” relu2 β€” conv3(8, 3, 3) β€” relu3 β€” dropout1(0.25) β€” conv4(12, 5, 5) β€” relu4 β€” conv5(12, 3, 3) β€” relu5 β€” conv6(12, 1, 1) β€” relu6 β€” fc1(37) β€” softmax1.


The results for MRZ are collected in Table 2, in the same format: the β€œconverted layers” column lists the layers replaced with bipolar morphological ones, and the accuracy is given right after conversion and after additional training.


The picture is the same as on MNIST: the convolutional layers convert almost painlessly, the final fully connected layer is the most problematic, and without additional training the accuracy of the converted network drops sharply. Method 2, with fine-tuning of the whole network, again clearly outperforms method 1.


Table 2. Recognition accuracy on MRZ characters, %. The β€œ+ training” columns show the accuracy after additional training; a dash means the value is not applicable.


| Network | Converted layers | Method 1 | Method 1, + training | Method 2 | Method 2, + training |
|---|---|---|---|---|---|
| CNN3 | (none) | 99.63 | β€” | 99.63 | β€” |
| CNN3 | conv1 | 97.76 | 99.64 | 83.07 | 99.62 |
| CNN3 | conv1 β€” relu1 β€” conv2 | 8.59 | 99.47 | 21.12 | 99.58 |
| CNN3 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 | 3.67 | 98.79 | 36.89 | 99.57 |
| CNN3 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 β€” relu3 β€” dropout1 β€” fc1 | 12.58 | β€” | 27.84 | 93.38 |
| CNN4 | (none) | 99.67 | β€” | 99.67 | β€” |
| CNN4 | conv1 | 91.20 | 99.66 | 93.71 | 99.67 |
| CNN4 | conv1 β€” relu1 β€” conv2 | 6.14 | 99.52 | 73.79 | 99.66 |
| CNN4 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 | 23.58 | 99.42 | 70.25 | 99.66 |
| CNN4 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 β€” relu3 β€” dropout1 β€” conv4 | 29.56 | 99.04 | 77.92 | 99.63 |
| CNN4 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 β€” relu3 β€” dropout1 β€” conv4 β€” relu4 β€” conv5 | 34.18 | 98.45 | 17.08 | 99.64 |
| CNN4 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 β€” relu3 β€” dropout1 β€” conv4 β€” relu4 β€” conv5 β€” relu5 β€” conv6 | 5.83 | 98.00 | 90.46 | 99.61 |
| CNN4 | conv1 β€” relu1 β€” conv2 β€” relu2 β€” conv3 β€” relu3 β€” dropout1 β€” conv4 β€” relu4 β€” conv5 β€” relu5 β€” conv6 β€” relu6 β€” fc1 | 4.70 | β€” | 27.57 | 95.46 |


Overall, layer-by-layer conversion with fine-tuning of the whole network (method 2) makes it possible to replace all convolutional layers with bipolar morphological ones while keeping the accuracy within a fraction of a percent of the original. Converting the final fully connected layers as well is possible, but costs several percentage points more. This holds both for MNIST and for MRZ characters.


What is the practical benefit? Inside the BM neuron there are only additions, maximums and a few exponents, and an adder is much simpler and more energy-efficient than a multiplier. On general-purpose processors (desktop or mobile) the gain is modest, since hardware multipliers are there anyway. The real payoff is expected on specialized hardware, accelerators in the spirit of TPU, FPGA and ASIC designs, where getting rid of multipliers means smaller circuits, lower power consumption and potentially higher throughput.


So we now have a neuron without multiplications and a recipe for training networks built from it: not yet a silver bullet, but a real step towards fast and energy-efficient networks.


P.S. A more detailed and formal account was presented at ICMV 2019:
E. Limonova, D. Matveev, D. Nikolaev and V. V. Arlazarov, β€œBipolar morphological neural networks: convolution without multiplication,” Proc. SPIE 11433, Twelfth International Conference on Machine Vision (ICMV 2019), W. Osten, D. Nikolaev, J. Zhou, Eds., 114333J, pp. 1–8, 2020. DOI: 10.1117/12.2559299.



  1. G. X. Ritter and P. Sussner, β€œAn introduction to morphological neural networks,” Proceedings of 13th International Conference on Pattern Recognition 4, 709–717 vol.4 (1996).
  2. P. Sussner and E. L. Esmi, Constructive Morphological Neural Networks: Some Theoretical Aspects and Experimental Results in Classification, 123–144, Springer Berlin Heidelberg, Berlin, Heidelberg (2009).
  3. G. X. Ritter, L. Iancu, and G. Urcid, β€œMorphological perceptrons with dendritic structure,” in The 12th IEEE International Conference on Fuzzy Systems, 2003. FUZZ ’03., 2, 1296–1301 vol.2 (May 2003).
  4. G. X. Ritter and G. Urcid, β€œLattice algebra approach to single-neuron computation,” IEEE Transactions on Neural Networks 14, 282–295 (March 2003).
  5. H. Sossa and E. Guevara, β€œEfficient training for dendrite morphological neural networks,” Neurocomputing 131, 132–142 (05 2014).
  6. E. Zamora and H. Sossa, β€œDendrite morphological neurons trained by stochastic gradient descent,” in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 1–8 (Dec 2016).
