How relational databases work (Part 1)

Hello, Habr! I present to you the translation of the article "How does a relational database work".


When it comes to relational databases, I can't help but think that something is missing. They are used everywhere. There are many different databases, from the small and useful SQLite to the powerful Teradata. But there are only a few articles that explain how a database works. You can search for "how does a relational database work" to see how few results there are. Moreover, these articles are short. If you look for the latest trendy technologies (Big Data, NoSQL or JavaScript), you will find more in-depth articles explaining how they work.


Are relational databases too old and too boring to be explained outside of university courses, research papers, and books?


image


, . 40 , . , - , . , . , , .


, , , . , , CRUD; . , , .


, (BigO). , , . , , : SQL . , , .


, , , . ; .


, 3 :




( , ...), , . , .


, . .


O(1) vs O(n^2)


… !


( ) , . , ! , . (cost based optimization).



, . , . , , .


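For example, when I say "this algorithm is in O(some_function())", it means that for a certain amount of data the algorithm needs some_function(a_certain_amount_of_data) operations to do its job.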


**, , ** . , .


image


. , . , 1 1 . , :


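  • O(1), or constant complexity, stays constant (otherwise it would not be called constant complexity).
  • O(log(n)) stays low even with billions of data items.
  • The worst complexity is O(n^2), where the number of operations explodes quickly.
  • The two other complexities grow quickly.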


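With a small amount of data, the difference between O(1) and O(n^2) is negligible. For example, say you have an algorithm that needs to process 2000 elements.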


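  • An O(1) algorithm costs 1 operation
  • An O(log(n)) algorithm costs 7 operations
  • An O(n) algorithm costs 2 000 operations
  • An O(n*log(n)) algorithm costs 14 000 operations
  • An O(n^2) algorithm costs 4 000 000 operations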

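The difference between O(1) and O(n^2) looks huge (4 million operations), but you will lose at most 2 ms, about the time it takes to blink. Current processors can handle hundreds of millions of operations per second, which is why performance and optimization are not an issue in many IT projects.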


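As I said, it is still important to know this concept when you face a huge amount of data. Suppose this time the algorithm needs to process 1 000 000 elements (which is not that big for a database):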


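  • An O(1) algorithm costs 1 operation
  • An O(log(n)) algorithm costs 14 operations
  • An O(n) algorithm costs 1 000 000 operations
  • An O(n*log(n)) algorithm costs 14 000 000 operations
  • An O(n^2) algorithm costs 1 000 000 000 000 operations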

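I did not do the math, but I would say that with the O(n^2) algorithm you have time for a coffee (even a second one!). Add another 0 to the amount of data and you will have time for a long nap.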
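
To make the two lists above easy to recompute, here is a minimal Python sketch. The function name `operation_counts` is hypothetical. The logarithm figures quoted above (7 and 14) appear to correspond to a loosely rounded natural logarithm; that is an observation about the numbers, not something the article states, so the sketch prints both the natural and the base-2 logarithm.

```python
import math

def operation_counts(n):
    """Approximate number of operations for common complexity classes."""
    return {
        "O(1)": 1,
        "O(log(n)) natural": math.log(n),   # ln(n)
        "O(log(n)) base 2": math.log2(n),   # log2(n)
        "O(n)": n,
        "O(n^2)": n ** 2,
    }

for n in (2_000, 1_000_000):
    print(f"n = {n}")
    for name, count in operation_counts(n).items():
        print(f"  {name}: {count:,.1f}")
```

Whatever base is used, the logarithmic cost stays tiny next to n and n^2, which is the point of the comparison.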



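To give you an idea:


  • A search in a good hash table finds an element in O(1).
  • A search in a well-balanced tree gives a result in O(log(n)).
  • A search in an array gives a result in O(n).
  • The best sorting algorithms have O(n*log(n)) complexity.
  • A bad sorting algorithm has O(n^2) complexity.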

: .


:



.


, :


  • /

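Of course, there are complexities worse than n^2, such as:


  • n^4: that is bad! Some of the algorithms I will mention have this complexity.
  • 3^n: even worse! One of the algorithms we will see later has this complexity (and it is really used in many databases).
  • factorial n: you will never get your results, even with a small amount of data.
  • n^n: if you end up with this complexity, you should ask yourself whether IT is really your field...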

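Note: I did not give you the real definition of big O notation, just the idea. You can read the Wikipedia article for the real (asymptotic) definition.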


Merge Sort


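What do you do when you need to sort a collection? What? You call the sort() function… OK, good answer… But for a database, you have to understand how this sort() function works.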


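There are several good sorting algorithms, so I will focus on the most important one: merge sort. You might not see right now why sorting data is useful, but you should after the part on query optimization. Moreover, understanding merge sort will help us later to understand a common database join operation called the merge join.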


Merge


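Like many useful algorithms, merge sort is based on a trick: merging 2 sorted arrays of size N/2 into a sorted N-element array costs only N operations. This operation is called a merge.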


, :


image


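In this figure, you can see that to build the final sorted array of 8 elements, you only need to iterate once over the 2 4-element arrays. Since both 4-element arrays are already sorted:


  • 1) compare the current elements of both arrays (current = first at the start),
  • 2) take the smaller one and put it in the 8-element array,
  • 3) move to the next element in the array from which you took the smaller element,
  • repeat 1, 2, 3 until you reach the last element of one of the arrays.
  • Then take the remaining elements of the other array and put them in the 8-element array.

This works because both 4-element arrays are sorted, so you never need to "go back" in these arrays.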
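
The merge steps above can be sketched in Python as follows. This is a minimal illustration rather than the article's own code; the function name `merge` and the sample values are assumptions, since the original figure is not available here.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list.

    Each element is looked at once, so the cost is len(left) + len(right).
    """
    result = []
    i = j = 0
    # Steps 1-3: compare the current elements, take the smaller, advance.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted: copy what remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

# Hypothetical sample data: two sorted 4-element arrays.
print(merge([1, 4, 6, 9], [2, 3, 7, 8]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

Note that neither index ever moves backwards, which is why the inputs must already be sorted.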


Now that we understand this trick, here is my pseudocode for merge sort:


array mergeSort(array a)
   //base case: an array of 0 or 1 element is already sorted
   if(length(a)<=1)
      return a;
   end if

   //recursive calls: sort each half independently
   [left_array right_array] := split_into_2_equally_sized_arrays(a);
   array new_left_array := mergeSort(left_array);
   array new_right_array := mergeSort(right_array);

   //merging the 2 small ordered arrays into a big one
   array result := merge(new_left_array,new_right_array);
   return result;
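
For readers who want something runnable, here is one possible Python rendering of the pseudocode. It is a sketch under assumptions: the name `merge_sort` is mine, and the merge step is delegated to the standard library's `heapq.merge`, which lazily merges already-sorted inputs, rather than to a hand-written merge.

```python
from heapq import merge

def merge_sort(a):
    """Return a new sorted list; the input is left untouched."""
    # Base case: 0 or 1 element is already sorted.
    if len(a) <= 1:
        return list(a)
    # Division: split into two halves and sort each recursively.
    mid = len(a) // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # Merge the two sorted halves into one sorted list.
    return list(merge(left, right))

print(merge_sort([5, 2, 8, 1, 9, 3, 7, 4]))  # [1, 2, 3, 4, 5, 7, 8, 9]
```

This version allocates new lists at every level, so it is not the in-place variant.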

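Merge sort breaks the problem into smaller problems, then solves the smaller problems to get the result of the initial problem (note: this kind of algorithm is called divide and conquer). If you do not understand this algorithm, do not worry; I did not understand it the first time I saw it. If it helps, I see it as a two-phase algorithm:


  • the division phase, where the array is divided into smaller arrays,
  • the sorting phase, where the small arrays are put together (using the merge) to form a bigger array.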

Division phase


image


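During the division phase, the array is divided into unitary arrays in 3 steps. Formally, the number of steps is log(N) (since N=8, log(N) = 3).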


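How do I know that?


I'm a genius! In one word: mathematics. Each step divides the size of the initial array by 2. The number of steps is the number of times you can divide the initial array by two. This is the exact definition of a logarithm (in base 2).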


Sorting phase


image


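In the sorting phase, you start with the unitary arrays. During each step, you apply several merges, and the overall cost of a step is N = 8 operations:


  • In the first step, you have 4 merges that cost 2 operations each
  • In the second step, you have 2 merges that cost 4 operations each
  • In the third step, you have 1 merge that costs 8 operations

Since there are log(N) steps, the overall cost is N * log(N) operations.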
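
The per-step arithmetic above can be checked with a few lines of Python. This is only a worked example, assuming N is a power of two; the function name `merge_cost_per_step` is hypothetical.

```python
def merge_cost_per_step(n):
    """For n a power of two, list (number_of_merges, cost_per_merge) per step."""
    steps = []
    size = 2  # size of the arrays produced by the merges of the current step
    while size <= n:
        steps.append((n // size, size))
        size *= 2
    return steps

steps = merge_cost_per_step(8)
print(steps)                                     # [(4, 2), (2, 4), (1, 8)]
print(sum(merges * cost for merges, cost in steps))  # 24, i.e. 8 * log2(8)
```

Every step costs N in total, and there are log2(N) steps, which gives N * log(N).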


The power of merge sort


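Why is this algorithm so powerful?


Because:


  • You can modify it to reduce the memory footprint, so that you do not create new arrays but directly modify the input array.

Note: this kind of algorithm is called in-place.


  • You can modify it to use disk space and a small amount of memory at the same time without a huge disk I/O penalty. The idea is to load into memory only the parts that are currently being processed. This matters when you need to sort a multi-gigabyte table with only a 100-megabyte memory buffer.

Note: this kind of algorithm is called external sorting.


  • You can modify it to run on multiple processes / threads / servers.

For example, distributed merge sort is one of the key components of Hadoop (the well-known Big Data framework).


  • This algorithm can turn lead into gold (true fact!).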

( ) , . , , .


Array, tree and hash table


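Now that we understand the idea behind time complexity and sorting, I have to tell you about 3 data structures. They matter because they are the backbone of modern databases. I will also introduce the notion of a database index.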



The two-dimensional array is the simplest data structure. A table can be seen as an array. For example:


image


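This 2-dimensional array is a table with rows and columns:


  • Each row represents a subject, and the columns hold the features that describe it.
  • Each column stores a certain type of data (integer, string, date …).

Although this is great for storing and visualizing data, it is poor when you need to look for a specific value.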


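For example, if you want to find all the people who work in the UK, you have to look at each row to check whether it belongs to the UK. This costs N operations, where N is the number of rows. That is not bad, but could there be a faster way? This is where trees come into play.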


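Note: most modern databases provide advanced arrays to store tables efficiently, such as heap-organized tables or index-organized tables. But this does not change the problem of quickly searching for a specific condition on a group of columns.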



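A binary search tree is a binary tree with a special property. The key in each node must be:


  • greater than all keys stored in the left subtree,
  • smaller than all keys stored in the right subtree.

Let's see what this means visually.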



image


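This tree has N = 15 elements. Let's say I am looking for 208:


  • I start with the root, whose key is 136. Since 136<208, I look at the right subtree of node 136.
  • 398>208, so I look at the left subtree of node 398
  • 250>208, so I look at the left subtree of node 250
  • 200<208, so I look at the right subtree of node 200. But 200 has no right subtree, so the value does not exist (if it did exist, it would be in the right subtree of 200).

Now, let's say I am looking for 40


  • I start with the root, whose key is 136. Since 136 > 40, I look at the left subtree of node 136.
  • 80 > 40, so I look at the left subtree of node 80
  • 40 = 40, the node exists. I extract the row id stored inside the node (it is not shown in the figure) and look up that row id in the table.
  • Knowing the row id tells me precisely where the data is in the table, so I can get it instantly.

In the end, both searches cost me the number of levels in the tree. If you read the part on merge sort carefully, you should see that there are log(N) levels. So the cost of the search is log(N), not bad!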
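
The two searches above can be reproduced with a small binary search tree in Python. This is a sketch under assumptions: the original figure is not available, so the tree below contains only the keys named in the text plus a few made-up ones (100, 450, 300), and the names `Node`, `insert` and `search` are mine. The tree is not self-balancing.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the tree rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Return (found, path), where path lists the keys visited."""
    path = []
    node = root
    while node is not None:
        path.append(node.key)
        if key == node.key:
            return True, path
        node = node.left if key < node.key else node.right
    return False, path

root = None
for k in [136, 80, 398, 40, 100, 250, 450, 200, 300]:  # partly hypothetical keys
    root = insert(root, k)

print(search(root, 208))  # (False, [136, 398, 250, 200])
print(search(root, 40))   # (True, [136, 80, 40])
```

The length of each path is bounded by the number of levels, which is where the log(N) cost comes from when the tree is balanced.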



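This is all very abstract, so let's go back to our problem. Instead of a plain integer, imagine the string that represents someone's country in the previous table. Suppose you have a tree that contains the "country" column (column 3) of the table:


  • If you want to know who works in the UK,
  • you search the tree to get the node that represents the UK,
  • inside the "UK node" you find the locations of the rows of the UK workers.

This search costs only log(N) operations instead of N operations if you use the array directly. What you have just imagined is a database index.


You can build a tree index for any group of columns (a string, an integer, 2 strings, an integer and a string, a date …) as long as you have a function to compare the keys (i.e. the groups of columns), so that you can establish an order among the keys (which is the case for any basic type in a database).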
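
As an illustration of the index idea, here is a minimal Python sketch. To stay short it uses a sorted list of keys with binary search (`bisect`) as a stand-in for the tree; both give O(log(N)) lookups, but this is not how a database implements an index. The table contents and the names `build_index` and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical table: (first_name, last_name, country)
table = [
    ("Alice", "Smith", "UK"),
    ("Bruno", "Martin", "FR"),
    ("Chen", "Li", "CN"),
    ("Dana", "Jones", "UK"),
    ("Emil", "Berg", "SE"),
]

def build_index(rows, column):
    """Map each distinct value of a column to the positions of its rows."""
    groups = {}
    for position, row in enumerate(rows):
        groups.setdefault(row[column], []).append(position)
    keys = sorted(groups)  # the order among keys is what makes the search fast
    return keys, [groups[k] for k in keys]

def lookup(index, key):
    """Binary search on the sorted keys: O(log N) comparisons."""
    keys, positions = index
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return positions[i]
    return []

country_index = build_index(table, column=2)
print([table[p][0] for p in lookup(country_index, "UK")])  # ['Alice', 'Dana']
```

The index stores row positions, not the rows themselves, which mirrors the "UK node" holding the locations of the matching rows.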


B+Tree Index


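Although this tree works well for getting a specific value, there is a BIG problem when you need to get multiple elements between two values. It costs O(N), because you have to look at each node in the tree and check whether it lies between these 2 values (for example, with an in-order traversal of the tree). Moreover, this operation is not disk I/O friendly, since you have to read the full tree. We need a way to do a range query efficiently. To solve this problem, modern databases use a modified version of the previous tree called a B+Tree. In a B+Tree:


  • only the lowest nodes (the leaves) store information (the location of the rows in the associated table),
  • the other nodes are just there to route to the right node during the search.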

image


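As you can see, there are more nodes (twice as many). Indeed, you have additional nodes, the "decision nodes", that help you find the right node (the one that stores the location of the rows in the associated table). But the search complexity is still O(log(N)) (there is just one more level). The big difference is that the lowest nodes are linked to their successors.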


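With this B+Tree, if you are looking for values between 40 and 100:


  • You just have to look for 40 (or the closest value after 40 if 40 does not exist), as you did with the previous tree.
  • Then gather the successors of 40, using the direct links to the successors, until you reach 100.

Let's say you found M successors and the tree has N nodes. The search for a specific node costs log(N), as with the previous tree. But once you have this node, you get the M successors in M operations through the links to their successors. This search costs only M + log(N) operations versus N operations with the previous tree. Moreover, you do not need to read the full tree (just M + log(N) nodes), which means less disk usage. If M is small (say, 200 rows) and N is large (1 000 000 rows), it makes a BIG difference.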
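
The M + log(N) pattern can be illustrated without building a full B+Tree. The sketch below is an assumption-laden stand-in: a sorted Python list plays the role of the linked leaf level, `bisect_left` plays the role of the log(N) descent, and the forward scan plays the role of following successor links. The keys and the name `range_query` are hypothetical.

```python
from bisect import bisect_left

def range_query(sorted_keys, low, high):
    """Return all keys k with low <= k <= high from a sorted list."""
    # Descent: find low, or the closest key after it (about log N comparisons).
    i = bisect_left(sorted_keys, low)
    result = []
    # Leaf scan: follow successors until we pass high (M steps).
    while i < len(sorted_keys) and sorted_keys[i] <= high:
        result.append(sorted_keys[i])
        i += 1
    return result

leaves = [9, 18, 29, 40, 52, 78, 99, 100, 136, 200, 250, 398]  # hypothetical keys
print(range_query(leaves, 40, 100))  # [40, 52, 78, 99, 100]
print(range_query(leaves, 41, 100))  # [52, 78, 99, 100]
```

Only the keys inside the range are touched after the initial search, which is what keeps the cost at M + log(N) instead of N.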


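But there are new problems (again!). If you add or remove a row in a database (and therefore in the associated B+Tree index):


  • you have to keep the order between the nodes inside the B+Tree, otherwise you will not be able to find nodes in the mess.
  • you have to keep the number of levels in the B+Tree as low as possible, otherwise the O(log(N)) time complexity becomes O(N).

In other words, the B+Tree needs to be self-ordered and self-balanced. Thankfully, this is possible with smart deletion and insertion operations. But this comes at a cost: insertion and deletion in a B+Tree are in O(log(N)). This is why some of you have heard that using too many indexes is not a good idea. Indeed, you slow down the fast insertion / update / deletion of a row in a table, since the database has to update the table's indexes with a costly O(log(N)) operation per index. Moreover, adding indexes means more workload for the transaction manager (we will see this manager at the end of the article).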


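For more details, you can look at the Wikipedia article about B+Tree. If you want an example of a B+Tree implementation in a database, look at the articles from a core developer of MySQL. They focus on how InnoDB (the MySQL engine) handles indexes.


Note: a reader told me that, because of low-level optimizations, the B+Tree needs to be fully balanced.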


Hash table


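Our last important data structure is the hash table. It is very useful when you want to look up values quickly. Moreover, understanding the hash table will help us later to understand a common database join operation called the hash join. This data structure is also used by a database to store some internal things (for example, the lock table or the buffer pool).


A hash table is a data structure that quickly finds an element by its key. To build a hash table, you need to define:


  • a hash function for the keys. The computed hashes of the keys give the locations of the elements (called buckets).
  • a function to compare the keys. Once you have found the right bucket, you have to find the element you are looking for inside the bucket using this comparison.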


:


image


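This hash table has 10 buckets. Since I am lazy, I drew only 5 buckets, but I know you are smart, so I let you imagine the other 5. The hash function I used is the modulo 10 of the key. In other words, I keep only the last digit of an element's key to find its bucket:


  • if the last digit is 0, the element ends up in bucket 0,
  • if the last digit is 1, the element ends up in bucket 1,
  • if the last digit is 2, the element ends up in bucket 2,
  • …

The compare function I used is simply equality between 2 integers.


Let's say you want to get element 78:


  • The hash table computes the hash code for 78, which is 8.
  • It looks in bucket 8, and the first element it finds is 78.
  • It gives you back element 78
  • The search costs only 2 operations (one to compute the hash value, the other to find the element inside the bucket).

Now, let's say you want to get element 59:


  • The hash table computes the hash code for 59, which is 9.
  • It looks in bucket 9, and the first element it finds is 99. Since 99!=59, element 99 is not the right element.
  • Using the same logic, it looks at the second element (9), the third (79), …, and the last (29).
  • The element does not exist.
  • The search costs 7 operations.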

A good hash function


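As you can see, depending on the value you are looking for, the cost is not the same!


If I now change the hash function to the modulo 1 000 000 of the key (i.e. taking the last 6 digits), the second search costs only 1 operation, because there are no elements in bucket 000059. The real challenge is to find a good hash function that creates buckets containing a very small number of elements.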
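
The lookups of 78 and 59, and the effect of changing the modulus, can be sketched with a small chained hash table in Python. This is an illustration under assumptions: the class name `ChainedHashTable` is mine, and the contents of the buckets are partly made up (the text names 78, 99, 9, 79 and 29; the keys 18, 19 and 89 are hypothetical fillers chosen so that bucket 9 holds six elements).

```python
class ChainedHashTable:
    def __init__(self, bucket_count):
        self.bucket_count = bucket_count
        self.buckets = {}  # bucket number -> list of keys, created on demand

    def _hash(self, key):
        # Hash function: modulo of the key, i.e. its last digits.
        return key % self.bucket_count

    def add(self, key):
        self.buckets.setdefault(self._hash(key), []).append(key)

    def find(self, key):
        """Return (found, operations): 1 for hashing plus 1 per element compared."""
        operations = 1
        for candidate in self.buckets.get(self._hash(key), []):
            operations += 1
            if candidate == key:  # compare function: integer equality
                return True, operations
        return False, operations

keys = [78, 18, 99, 9, 79, 19, 89, 29]  # partly hypothetical contents

small = ChainedHashTable(10)
large = ChainedHashTable(1_000_000)
for k in keys:
    small.add(k)
    large.add(k)

print(small.find(78))  # (True, 2)
print(small.find(59))  # (False, 7)
print(large.find(59))  # (False, 1)
```

With modulo 10, every key ending in 9 lands in the same bucket; with modulo 1 000 000 each of these keys gets its own bucket, so a miss is detected right after hashing.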


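In my example, finding a good hash function is easy. But this is a simple example; finding a good hash function is harder when the key is:


  • a string (for example, the last name of a person)
  • 2 strings (for example, the last name and the first name of a person)
  • 2 strings and a date (for example, the last name, the first name and the birth date of a person)
  • …

With a good hash function, a search in a hash table is in O(1).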


Array vs hash table


Why not use an array?


Hm, good question.


  • A hash table can be partially loaded into memory, and the remaining buckets can stay on disk.
  • With an array, you must use contiguous space in memory. If you are loading a large table, it is very difficult to find enough contiguous space.
  • With a hash table, you can choose the key you want (for example, the country and the last name of a person).
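
The last point, choosing a composite key, is easy to show with Python's built-in `dict`, which is itself a hash table. The data and the name `find_row` below are hypothetical; the tuple `(country, last_name)` serves as the key.

```python
# Hypothetical index: (country, last_name) -> row position in some table
employees = {
    ("UK", "Smith"): 0,
    ("FR", "Martin"): 1,
    ("UK", "Jones"): 3,
}

def find_row(country, last_name):
    """Look up a row position by a composite key; None if absent."""
    return employees.get((country, last_name))

print(find_row("UK", "Jones"))  # 3
print(find_row("SE", "Berg"))   # None
```

An array indexed by position offers no such choice: the only "key" it supports directly is the integer offset.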

For more information, you can read the article on Java HashMap, which is an efficient implementation of a hash table; you do not need to understand Java to understand the concepts presented in that article.

Source: https://habr.com/ru/post/undefined/

