Slurm SRE: a full-scale experiment with experts from Booking.com and Google

Our team loves to experiment. Each Slurm is not a static repeat of the previous ones: we reflect on our experience and move from good to better. With Slurm SRE, though, we decided to try a completely new format and put participants in conditions as close to “combat” as possible.


In short, the intensive comes down to: “We build, break, repair, and learn.” SRE is worth little as bare theory; what counts is practice, real solutions, and real problems.


The participants were divided into teams so that a healthy competitive spirit would keep anyone from dozing off or launching Angry Birds on an iPhone, following the example of Dmitry Anatolyevich.


Four mentors supplied the participants with problems, glitches, bugs and tasks: Ivan Kruglov, Principal Developer at Booking.com (Netherlands); Ben Tyler, Principal Developer at Booking.com (USA); Eduard Medvedev, CTO at Tungsten Labs (Germany); and Eugene Varavva, a generalist developer at Google (San Francisco).


The teams compete with each other. Interested?







SLO, SLI, SLA
SLI: service level indicator. SLO: service level objective. SLA: service level agreement.

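SLA is an ITIL term for a formal agreement between a service customer and its provider that describes the service, the rights and obligations of the parties and, most importantly, the agreed level of service quality.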

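SLO is a service level objective: a target value, or range of values, for a service level measured by an SLI. A natural form for an SLO is “SLI ≤ target” or “lower bound ≤ SLI ≤ upper bound”.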

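SLI is a service level indicator. An SLI is a carefully defined quantitative measure of one aspect of the level of service provided. Typical SLIs include request latency, error rate, throughput and availability.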
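To make the two SLO forms concrete, here is a minimal sketch in Python. The function names, the request counts and the thresholds are all hypothetical, chosen only for illustration; the choice to treat "no traffic" as fully available is likewise an assumption rather than a rule.

```python
from typing import Optional


def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully (a hypothetical metric)."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


def meets_slo(sli: float,
              lower: Optional[float] = None,
              upper: Optional[float] = None) -> bool:
    """SLO check of the form: lower bound <= SLI <= upper bound.

    Either bound may be omitted, which covers the one-sided form
    "SLI <= target" (or "SLI >= target").
    """
    if lower is not None and sli < lower:
        return False
    if upper is not None and sli > upper:
        return False
    return True


# Availability-style SLO: only a lower bound (illustrative numbers).
sli = availability_sli(good_requests=99_950, total_requests=100_000)
print(meets_slo(sli, lower=0.999))    # True: 0.9995 >= 0.999

# Latency-style SLO of the form "SLI <= target" (values in milliseconds).
print(meets_slo(420.0, upper=300.0))  # False: 420 ms exceeds the 300 ms target
```

Keeping the SLI computation separate from the SLO check mirrors the definitions: the indicator is a measurement, while the objective is only a bound placed on it.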



Source: https://habr.com/ru/post/undefined/

