Hello everyone, my name is Sasha, and I lead backend testing at FunCorp. Like many others, we have adopted a service-oriented architecture. On the one hand, this simplifies the work, because each service is easier to test separately; on the other hand, we also have to test how the services interact with each other, which usually happens over the network.

In this article, I will talk about two utilities that let you check basic scenarios of how an application behaves when the network misbehaves.

Mimicking network problems

Software is usually tested on test servers with a good network connection. In the harsh conditions of production things may not be so smooth, so sometimes you need to check how a program behaves over a poor connection. On Linux, the tc utility helps simulate such conditions.

tc (short for Traffic Control) lets you configure how network packets are transmitted in the system. The utility is very powerful; you can read more about it in the tc documentation. Here I will cover only a small part of it: we are interested in traffic scheduling, for which tc uses qdiscs (queueing disciplines), and since we need to emulate an unstable network, we will use the classless qdisc netem.

Start an echo server (I used nmap-ncat for this):

ncat -l 127.0.0.1 12345 -k -c 'xargs -n1 -i echo "Response: {}"'
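If ncat is not available, a roughly equivalent echo server can be sketched in Python. This is only an illustration, assuming Python 3; the names EchoHandler, EchoServer and start_echo_server are made up for this example, and the reply format merely imitates the ncat one-liner above.

```python
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Reply to every received line with the same text prefixed by 'Response: '."""

    def handle(self):
        # rfile yields one line at a time until the client closes the connection.
        for line in self.rfile:
            self.wfile.write(b"Response: " + line)


class EchoServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True  # avoid "Address already in use" on quick restarts
    daemon_threads = True       # open connections do not block interpreter exit


def start_echo_server(host="127.0.0.1", port=12345):
    """Start the server in a background thread and return the server object."""
    server = EchoServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Inside a test, call start_echo_server() before the client runs and server.shutdown() afterwards. For a standalone server, EchoServer(("127.0.0.1", 12345), EchoHandler).serve_forever() keeps the process in the foreground.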

To print detailed timestamps at each step of the client-server interaction, I wrote a simple Python script that sends the request Test to our echo server.

Client source code

#!/usr/bin/env python3

import socket
import time

HOST = '127.0.0.1'
PORT = 12345
BUFFER_SIZE = 1024
MESSAGE = b"Test\n"  # sockets send bytes, not str

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
t1 = time.time()
print("[time before connection: %.5f]" % t1)
s.connect((HOST, PORT))  # returns once the three-way handshake is complete
print("[time after connection, before sending: %.5f]" % time.time())
s.sendall(MESSAGE)  # only hands the data to the kernel, does not wait for the ACK
print("[time after sending, before receiving: %.5f]" % time.time())
data = s.recv(BUFFER_SIZE)  # blocks until the server's reply arrives
print("[time after receiving, before closing: %.5f]" % time.time())
s.close()
t2 = time.time()
print("[time after closing: %.5f]" % t2)
print("[total duration: %.5f]" % (t2 - t1))

print(data.decode())

Run it and watch the traffic on the lo interface, port 12345:

[user@host ~]# python client.py
[time before connection: 1578652979.44837]
[time after connection, before sending: 1578652979.44889]
[time after sending, before receiving: 1578652979.44894]
[time after receiving, before closing: 1578652979.45922]
[time after closing: 1578652979.45928]
[total duration: 0.01091]
Response: Test

Traffic dump

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:42:59.448601 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [S], seq 3383332866, win 43690, options [mss 65495,sackOK,TS val 606325685 ecr 0,nop,wscale 7], length 0
10:42:59.448612 IP 127.0.0.1.12345 > 127.0.0.1.54054: Flags [S.], seq 2584700178, ack 3383332867, win 43690, options [mss 65495,sackOK,TS val 606325685 ecr 606325685,nop,wscale 7], length 0
10:42:59.448622 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 606325685 ecr 606325685], length 0
10:42:59.448923 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 606325685 ecr 606325685], length 5
10:42:59.448930 IP 127.0.0.1.12345 > 127.0.0.1.54054: Flags [.], ack 6, win 342, options [nop,nop,TS val 606325685 ecr 606325685], length 0
10:42:59.459118 IP 127.0.0.1.12345 > 127.0.0.1.54054: Flags [P.], seq 1:15, ack 6, win 342, options [nop,nop,TS val 606325696 ecr 606325685], length 14
10:42:59.459213 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [.], ack 15, win 342, options [nop,nop,TS val 606325696 ecr 606325696], length 0
10:42:59.459268 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [F.], seq 6, ack 15, win 342, options [nop,nop,TS val 606325696 ecr 606325696], length 0
10:42:59.460184 IP 127.0.0.1.12345 > 127.0.0.1.54054: Flags [F.], seq 15, ack 7, win 342, options [nop,nop,TS val 606325697 ecr 606325696], length 0
10:42:59.460196 IP 127.0.0.1.54054 > 127.0.0.1.12345: Flags [.], ack 16, win 342, options [nop,nop,TS val 606325697 ecr 606325697], length 0

Everything is standard: a three-way handshake, then PSH/ACK answered by ACK twice (the request and the response exchanged between the client and the server), and finally FIN/ACK answered by ACK twice (the connection teardown).

Packet delay

Now set the delay to 500 milliseconds:

tc qdisc add dev lo root netem delay 500ms

We start the client and see that the script now takes 2 seconds to run:

[user@host ~]# ./client.py
[time before connection: 1578662612.71044]
[time after connection, before sending: 1578662613.71059]
[time after sending, before receiving: 1578662613.71065]
[time after receiving, before closing: 1578662614.72011]
[time after closing: 1578662614.72019]
[total duration: 2.00974]
Response: Test

What about the traffic? Let's look:

Traffic dump

13:23:33.210520 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [S], seq 1720950927, win 43690, options [mss 65495,sackOK,TS val 615958947 ecr 0,nop,wscale 7], length 0
13:23:33.710554 IP 127.0.0.1.12345 > 127.0.0.1.58694: Flags [S.], seq 1801168125, ack 1720950928, win 43690, options [mss 65495,sackOK,TS val 615959447 ecr 615958947,nop,wscale 7], length 0
13:23:34.210590 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 615959947 ecr 615959447], length 0
13:23:34.210657 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 615959947 ecr 615959447], length 5
13:23:34.710680 IP 127.0.0.1.12345 > 127.0.0.1.58694: Flags [.], ack 6, win 342, options [nop,nop,TS val 615960447 ecr 615959947], length 0
13:23:34.719371 IP 127.0.0.1.12345 > 127.0.0.1.58694: Flags [P.], seq 1:15, ack 6, win 342, options [nop,nop,TS val 615960456 ecr 615959947], length 14
13:23:35.220106 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [.], ack 15, win 342, options [nop,nop,TS val 615960957 ecr 615960456], length 0
13:23:35.220188 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [F.], seq 6, ack 15, win 342, options [nop,nop,TS val 615960957 ecr 615960456], length 0
13:23:35.720994 IP 127.0.0.1.12345 > 127.0.0.1.58694: Flags [F.], seq 15, ack 7, win 342, options [nop,nop,TS val 615961457 ecr 615960957], length 0
13:23:36.221025 IP 127.0.0.1.58694 > 127.0.0.1.12345: Flags [.], ack 16, win 342, options [nop,nop,TS val 615961957 ecr 615961457], length 0

You can see the expected half-second lag in the interaction between the client and the server. The system behaves much more interestingly with a larger lag: the kernel starts resending some TCP packets. Change the delay to 1 second and look at the traffic (I won't show the client's output; the total duration is the expected 4 seconds):

tc qdisc change dev lo root netem delay 1s

Traffic dump

13:29:07.709981 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [S], seq 283338334, win 43690, options [mss 65495,sackOK,TS val 616292946 ecr 0,nop,wscale 7], length 0
13:29:08.710018 IP 127.0.0.1.12345 > 127.0.0.1.39306: Flags [S.], seq 3514208179, ack 283338335, win 43690, options [mss 65495,sackOK,TS val 616293946 ecr 616292946,nop,wscale 7], length 0
13:29:08.711094 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [S], seq 283338334, win 43690, options [mss 65495,sackOK,TS val 616293948 ecr 0,nop,wscale 7], length 0
13:29:09.710048 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 616294946 ecr 616293946], length 0
13:29:09.710152 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 616294947 ecr 616293946], length 5
13:29:09.711120 IP 127.0.0.1.12345 > 127.0.0.1.39306: Flags [S.], seq 3514208179, ack 283338335, win 43690, options [mss 65495,sackOK,TS val 616294948 ecr 616292946,nop,wscale 7], length 0
13:29:10.710173 IP 127.0.0.1.12345 > 127.0.0.1.39306: Flags [.], ack 6, win 342, options [nop,nop,TS val 616295947 ecr 616294947], length 0
13:29:10.711140 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 616295948 ecr 616293946], length 0
13:29:10.714782 IP 127.0.0.1.12345 > 127.0.0.1.39306: Flags [P.], seq 1:15, ack 6, win 342, options [nop,nop,TS val 616295951 ecr 616294947], length 14
13:29:11.714819 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [.], ack 15, win 342, options [nop,nop,TS val 616296951 ecr 616295951], length 0
13:29:11.714893 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [F.], seq 6, ack 15, win 342, options [nop,nop,TS val 616296951 ecr 616295951], length 0
13:29:12.715562 IP 127.0.0.1.12345 > 127.0.0.1.39306: Flags [F.], seq 15, ack 7, win 342, options [nop,nop,TS val 616297952 ecr 616296951], length 0
13:29:13.715596 IP 127.0.0.1.39306 > 127.0.0.1.12345: Flags [.], ack 16, win 342, options [nop,nop,TS val 616298952 ecr 616297952], length 0

You can see that the client sent the SYN packet twice, and the server sent the SYN/ACK twice.
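The numbers in both experiments follow from simple arithmetic, which can be sketched as below. The sketch assumes an initial retransmission timeout of 1 second (the value recommended by RFC 6298 and, as far as I know, the Linux default) and ignores the server's processing time unless it is passed explicitly; the function names are invented for this illustration.

```python
INITIAL_RTO = 1.0  # seconds; assumed initial retransmission timeout


def expected_total_duration(one_way_delay, server_time=0.0):
    """Connecting costs one round trip, the request/response pair another one."""
    round_trip = 2 * one_way_delay
    return 2 * round_trip + server_time


def syn_is_retransmitted(one_way_delay, initial_rto=INITIAL_RTO):
    """The SYN is resent when the SYN/ACK cannot arrive before the timer fires."""
    return 2 * one_way_delay > initial_rto
```

With a 500 ms delay this gives 2 seconds and no SYN retransmission, with a 1 s delay it gives 4 seconds and a retransmitted SYN, which matches the dumps. When the round trip is exactly equal to the timeout, the outcome depends on timer granularity, so treat that boundary as undefined.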

In addition to a constant value, you can specify a deviation, a distribution function, and a correlation (with the value for the previous packet) for the delay. This is done as follows:

tc qdisc change dev lo root netem delay 500ms 400ms 50 distribution normal

Here we set a delay of 500 ms with a deviation of 400 ms, so most values fall roughly between 100 and 900 milliseconds; the values are drawn from a normal distribution, with a 50 percent correlation with the delay of the previous packet.

You may have noticed that the first command used add and the later ones used change. The meaning of these commands is obvious, so I'll just add that there is also del, which removes the configuration.
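In automated tests it is easy to forget the del step and leave the interface degraded. One possible way to guard against that is a small context manager that always removes the qdisc. This is a sketch, not a finished tool: tc_netem_command and netem are hypothetical helper names, the commands need root, and the run parameter exists only so the helper can be tried without touching the system.

```python
import subprocess
from contextlib import contextmanager


def tc_netem_command(action, dev, netem_args=()):
    """Build a `tc qdisc <action> dev <dev> root [netem ...]` argument list."""
    if action not in ("add", "change", "del"):
        raise ValueError("unsupported action: %r" % action)
    command = ["tc", "qdisc", action, "dev", dev, "root"]
    if action != "del":
        command += ["netem", *netem_args]
    return command


@contextmanager
def netem(dev, *netem_args, run=subprocess.check_call):
    """Apply a netem configuration and always remove it afterwards (needs root)."""
    run(tc_netem_command("add", dev, netem_args))
    try:
        yield
    finally:
        # Runs even if the test body raises, so the interface is restored.
        run(tc_netem_command("del", dev))
```

A test body would then look like: with netem("lo", "delay", "500ms"): followed by the code that runs the client.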

Packet loss

Now let's try packet loss. According to the documentation, there are three ways to do it: drop packets randomly with a given probability, use a Markov chain with 2, 3 or 4 states to decide which packets are lost, or use the Gilbert-Elliott model. In this article I will cover the first (simplest and most obvious) method; you can read about the others in the netem documentation.

Let's drop 50% of packets with a correlation of 25% (if the netem qdisc from the previous section is still attached to lo, delete it first with del or use change instead of add):

tc qdisc add dev lo root netem loss 50% 25%

Unfortunately, tcpdump cannot show us the packet loss directly, so we will simply assume that it really works. What helps us verify it is the increased and unstable run time of the client.py script (it can finish instantly, or take 20 seconds), as well as the growing number of retransmitted segments:

[user@host ~]# netstat -s | grep retransmited; sleep 10; netstat -s | grep retransmited
    17147 segments retransmited
    17185 segments retransmited
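The same counter can be read from a test script. The sketch below assumes that the value netstat prints comes from the RetransSegs field of the Tcp section in /proc/net/snmp, which is my understanding of how it works on Linux; the function names are made up for this example.

```python
def parse_retrans_segs(snmp_text):
    """Return the TCP RetransSegs counter from the contents of /proc/net/snmp."""
    # The file holds pairs of lines per protocol: a header line with field
    # names and a line with values, both starting with the protocol prefix.
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp section found")
    names, values = tcp_lines[0], tcp_lines[1]
    return int(values[names.index("RetransSegs")])


def read_retrans_segs(path="/proc/net/snmp"):
    """Read the current counter value from the running system (Linux only)."""
    with open(path) as snmp_file:
        return parse_retrans_segs(snmp_file.read())
```

Calling read_retrans_segs() before and after running the client and comparing the two values gives the number of segments retransmitted in between (by the whole system, not only by the client).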

Adding noise to packets

In addition to losing packets, you can simulate corrupting them: noise is introduced at a random position in the packet. Let's corrupt packets with a probability of 50 percent and no correlation:

tc qdisc change dev lo root netem corrupt 50%

We run the client script (nothing interesting there, except that it took 2 seconds) and look at the traffic:

Traffic dump

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:20:54.812434 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [S], seq 2023663770, win 43690, options [mss 65495,sackOK,TS val 1037001049 ecr 0,nop,wscale 7], length 0
10:20:54.812449 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [S.], seq 2104268044, ack 2023663771, win 43690, options [mss 65495,sackOK,TS val 1037001049 ecr 1037001049,nop,wscale 7], length 0
10:20:54.812458 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 1037001049 ecr 1037001049], length 0
10:20:54.812509 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1037001049 ecr 1037001049], length 5
10:20:55.013093 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1037001250 ecr 1037001049], length 5
10:20:55.013122 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [.], ack 6, win 342, options [nop,nop,TS val 1037001250 ecr 1037001250], length 0
10:20:55.014681 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [P.], seq 1:15, ack 6, win 342, options [nop,nop,TS val 1037001251 ecr 1037001250], length 14
10:20:55.014745 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [.], ack 15, win 340, options [nop,nop,TS val 1037001251 ecr 1037001251], length 0
10:20:55.014823 IP 127.0.0.1.43666 > 127.0.0.5.12345: Flags [F.], seq 2023663776, ack 2104268059, win 342, options [nop,nop,TS val 1037001251 ecr 1037001251], length 0
10:20:55.214088 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [P.], seq 1:15, ack 6, win 342, options [nop,unknown-65 0x0a3dcf62eb3d,[bad opt]>
10:20:55.416087 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [F.], seq 6, ack 15, win 342, options [nop,nop,TS val 1037001653 ecr 1037001251], length 0
10:20:55.416804 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [F.], seq 15, ack 7, win 342, options [nop,nop,TS val 1037001653 ecr 1037001653], length 0
10:20:55.416818 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [.], ack 16, win 343, options [nop,nop,TS val 1037001653 ecr 1037001653], length 0
10:20:56.147086 IP 127.0.0.1.12345 > 127.0.0.1.43666: Flags [F.], seq 15, ack 7, win 342, options [nop,nop,TS val 1037002384 ecr 1037001653], length 0
10:20:56.147101 IP 127.0.0.1.43666 > 127.0.0.1.12345: Flags [.], ack 16, win 342, options [nop,nop,TS val 1037002384 ecr 1037001653], length 0

You can see that some packets were resent and that one packet has broken metadata: options [nop,unknown-65 0x0a3dcf62eb3d,[bad opt]>. But the main thing is that in the end everything worked correctly: TCP coped with its task.
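TCP copes because every segment carries a checksum: a receiver that sees a mismatch discards the segment, and the sender eventually retransmits it. The sketch below illustrates the idea with the 16-bit one's-complement checksum from RFC 1071. It is a simplification: the real TCP checksum also covers the header and a pseudo-header, and flip_bit is only my imitation of the kind of single-bit noise netem introduces.

```python
def internet_checksum(data):
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def flip_bit(data, bit_index):
    """Return a copy of data with a single bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)
```

Comparing internet_checksum(b"Test\n") with internet_checksum(flip_bit(b"Test\n", 3)) shows that even a one-bit change produces a different checksum, which is how the receiver notices the damage.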

Packet duplication

What else can netem do? For example, it can simulate the opposite of packet loss: packet duplication. This option also takes two arguments: probability and correlation.

tc qdisc change dev lo root netem duplicate 50% 25%

Reordering packets

You can reorder packets, and in two ways.

In the first, some packets are sent immediately and the rest with the specified delay. An example from the documentation:

tc qdisc change dev lo root netem delay 10ms reorder 25% 50%

With a probability of 25% (and a correlation of 50%) a packet is sent immediately; the rest are sent with a delay of 10 milliseconds.

In the second, every Nth packet is sent immediately with the given probability (and correlation), and the rest with the given delay. An example from the documentation:

tc qdisc change dev lo root netem delay 10ms reorder 25% 50% gap 5

Every fifth packet with a probability of 25% will be sent without delay.

Bandwidth change

For this, people are usually pointed to TBF, but netem can also change the interface bandwidth:

tc qdisc change dev lo root netem rate 56kbit

This command makes a trip around localhost as painful as surfing the Internet over a dial-up modem. Besides the bit rate, you can also emulate the link-layer protocol model: set the per-packet overhead, the cell size, and the per-cell overhead. For example, this is how you can simulate ATM at a bit rate of 56 kbit/s:

tc qdisc change dev lo root netem rate 56kbit 0 48 5
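To get a feeling for what these four numbers mean, here is a rough model of how long one packet occupies the link. It reflects my reading of the rate parameters (packet overhead, cell size, cell overhead) and is not taken from the netem source; transmit_time is a hypothetical name, and the rate is given in bits per second, so 56kbit becomes 56000.

```python
import math


def transmit_time(packet_len, rate_bps, overhead=0, cell_size=0, cell_overhead=0):
    """Seconds needed to put one packet of packet_len bytes on the wire."""
    wire_len = packet_len + overhead
    if cell_size:
        # The packet is cut into fixed-size cells; a partly filled cell
        # still costs a full one, and every cell carries its own header.
        cells = math.ceil(wire_len / cell_size)
        wire_len = cells * (cell_size + cell_overhead)
    return wire_len * 8 / rate_bps
```

Under this model a 1500-byte packet takes about 0.214 s at a plain 56 kbit/s and about 0.242 s with the ATM parameters, because it is split into 32 cells of 53 bytes each.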

Simulating a connection timeout

Another important item in a software acceptance test plan is timeouts. This matters because in distributed systems, when one of the services goes down, the others must fall back in time or return an error to the client; under no circumstances should they simply hang, waiting for a response or for a connection to be established.

There are several ways to do this: for example, use a mock that never responds, or attach to the process with a debugger, set a breakpoint in the right place and stop the process (this is probably the most convoluted way). But one of the most obvious is to firewall ports or hosts. iptables will help us with this.

For the demonstration, we will firewall port 12345 and run our client script. You can filter outgoing packets to this port on the sender's side or incoming packets on the receiver's side. In my examples, incoming packets are filtered (using the INPUT chain and the --dport option). Such packets can be handled with DROP, with REJECT, or with REJECT with the TCP RST flag, and REJECT with ICMP host unreachable is also possible (in fact, the default behavior is icmp-port-unreachable, and you can also reply with icmp-net-unreachable, icmp-proto-unreachable, icmp-net-prohibited and icmp-host-prohibited).

DROP

If there is a rule with DROP, the packets will simply “disappear”.

iptables -A INPUT -p tcp --dport 12345 -j DROP

We start the client and see that it hangs at the stage of connecting to the server. Let's look at the traffic:

Traffic dump

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
08:28:20.213506 IP 127.0.0.1.32856 > 127.0.0.1.12345: Flags [S], seq 3019694933, win 43690, options [mss 65495,sackOK,TS val 1203046450 ecr 0,nop,wscale 7], length 0
08:28:21.215086 IP 127.0.0.1.32856 > 127.0.0.1.12345: Flags [S], seq 3019694933, win 43690, options [mss 65495,sackOK,TS val 1203047452 ecr 0,nop,wscale 7], length 0
08:28:23.219092 IP 127.0.0.1.32856 > 127.0.0.1.12345: Flags [S], seq 3019694933, win 43690, options [mss 65495,sackOK,TS val 1203049456 ecr 0,nop,wscale 7], length 0
08:28:27.227087 IP 127.0.0.1.32856 > 127.0.0.1.12345: Flags [S], seq 3019694933, win 43690, options [mss 65495,sackOK,TS val 1203053464 ecr 0,nop,wscale 7], length 0
08:28:35.235102 IP 127.0.0.1.32856 > 127.0.0.1.12345: Flags [S], seq 3019694933, win 43690, options [mss 65495,sackOK,TS val 1203061472 ecr 0,nop,wscale 7], length 0

You can see that the client sends SYN packets with an exponentially increasing timeout. So we have found a small bug in the client: it should use the settimeout() method to limit how long it tries to connect to the server.

Delete the rule right away:
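One way the fix could look is sketched below, assuming Python 3. Instead of calling settimeout() by hand it uses socket.create_connection, which sets the same timeout on the socket before connecting; send_request and the 3-second default are illustrative choices, not part of the original client.

```python
import socket


def send_request(host, port, message, timeout=3.0):
    """Send one request and return the reply, or None if any step times out."""
    try:
        # The timeout applies to connect() and stays set on the returned
        # socket, so sendall() and recv() are limited by it as well.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(message)
            return conn.recv(1024)
    except socket.timeout:
        return None
```

With the DROP rule in place, send_request("127.0.0.1", 12345, b"Test\n") should give up after about three seconds instead of hanging. Other failures, such as a refused connection, still raise OSError and are left to the caller.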

iptables -D INPUT -p tcp --dport 12345 -j DROP

You can also delete all the rules at once (note that this flushes every rule in the filter table, not just ours):

iptables -F

If you use Docker and need to firewall all traffic going to a container, you can do it like this:

iptables -I DOCKER-USER -p tcp -d CONTAINER_IP -j DROP
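Since a forgotten rule can break later tests, it may be convenient to pair each -A with its matching -D automatically. The following sketch shows one way to do that; iptables_command and firewall_rule are hypothetical names, the commands need root, and the run parameter is there only so the helper can be exercised without changing the firewall.

```python
import subprocess
from contextlib import contextmanager


def iptables_command(flag, rule):
    """Build `iptables -A ...` (append) or `iptables -D ...` (delete) for a rule."""
    if flag not in ("-A", "-D"):
        raise ValueError("expected -A or -D, got %r" % flag)
    return ["iptables", flag, *rule]


@contextmanager
def firewall_rule(*rule, run=subprocess.check_call):
    """Append a rule, let the test body run, then delete exactly that rule."""
    run(iptables_command("-A", rule))
    try:
        yield
    finally:
        # Deleting by specification removes the rule we added and nothing else.
        run(iptables_command("-D", rule))
```

For the DROP example above the test body would be wrapped as: with firewall_rule("INPUT", "-p", "tcp", "--dport", "12345", "-j", "DROP"): followed by the client call.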

REJECT

Now add a similar rule, but with REJECT:

iptables -A INPUT -p tcp --dport 12345 -j REJECT

The client exits after a second with the error [Errno 111] Connection refused. Let's look at the ICMP traffic:

[user@host ~]# tcpdump -i lo -nn icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
08:45:32.871414 IP 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 tcp port 12345 unreachable, length 68
08:45:33.873097 IP 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 tcp port 12345 unreachable, length 68

You can see that the client received port unreachable twice and then exited with an error.

REJECT with tcp-reset

Let's try adding the --reject-with tcp-reset option:

iptables -A INPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset

In this case, the client exits with an error immediately, because it received an RST packet in reply to its very first packet:

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
09:02:52.766175 IP 127.0.0.1.60658 > 127.0.0.1.12345: Flags [S], seq 1889460883, win 43690, options [mss 65495,sackOK,TS val 1205119003 ecr 0,nop,wscale 7], length 0
09:02:52.766184 IP 127.0.0.1.12345 > 127.0.0.1.60658: Flags [R.], seq 0, ack 1889460884, win 0, length 0

REJECT with icmp-host-unreachable

Let's try another use case for REJECT:

iptables -A INPUT -p tcp --dport 12345 -j REJECT --reject-with icmp-host-unreachable

The client exits after a second with the error [Errno 113] No route to host, and in the ICMP traffic we see ICMP host 127.0.0.1 unreachable.

You can try the other REJECT parameters yourself; I will stop at these :)
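In a test it is usually more convenient to check the kind of failure than the exact message. The sketch below groups the exceptions seen in this section; the numbers 111 and 113 are the Linux values of errno.ECONNREFUSED and errno.EHOSTUNREACH, and the category names returned by classify_connect_error are my own invention.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map an exception raised while connecting to a coarse failure category."""
    # Timeouts first: packets were probably dropped silently (the DROP case).
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, OSError):
        if exc.errno == errno.ECONNREFUSED:
            return "refused"      # port unreachable or RST in reply to the SYN
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"  # ICMP host or network unreachable
    return "other"
```

A test for the plain REJECT rule could then assert that the client fails with the category "refused" within the expected time.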

Simulating a request timeout

Another situation is when the client was able to connect to the server but cannot send a request to it. How do we filter packets so that the filtering does not kick in right away? If you look at the traffic of any client-server exchange, you will notice that only the SYN and ACK flags are used while the connection is being established, whereas during the data exchange the last packet of the request carries the PSH flag. It is set automatically to avoid buffering. You can use this to build a filter: it passes all packets except those with the PSH flag. As a result, the connection is established, but the client cannot send data to the server.
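The filter relies on the --tcp-flags option of iptables, which takes two lists: the flags to examine (the mask) and the flags that must be set among them. A small model of that matching logic is sketched below; the flag bit values are the standard TCP ones, while the function names are made up for illustration.

```python
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def flags_to_bits(names):
    """Combine flag names into the bit field used in the TCP header."""
    bits = 0
    for name in names:
        bits |= TCP_FLAGS[name]
    return bits


def tcp_flags_match(packet_flags, mask, comp):
    """Mimic `--tcp-flags mask comp`: examine only `mask`, require exactly `comp`."""
    return (flags_to_bits(packet_flags) & flags_to_bits(mask)) == flags_to_bits(comp)
```

With mask and comp both equal to PSH, the handshake packets (SYN, then ACK) do not match, while the request packet with PSH and ACK does, so only the latter is filtered.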

DROP

For DROP, the command will look like this:

iptables -A INPUT -p tcp --tcp-flags PSH PSH --dport 12345 -j DROP

We run the client and watch the traffic:

Traffic dump

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:02:47.549498 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [S], seq 2166014137, win 43690, options [mss 65495,sackOK,TS val 1208713786 ecr 0,nop,wscale 7], length 0
10:02:47.549510 IP 127.0.0.1.12345 > 127.0.0.1.49594: Flags [S.], seq 2341799088, ack 2166014138, win 43690, options [mss 65495,sackOK,TS val 1208713786 ecr 1208713786,nop,wscale 7], length 0
10:02:47.549520 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 1208713786 ecr 1208713786], length 0
10:02:47.549568 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1208713786 ecr 1208713786], length 5
10:02:47.750084 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1208713987 ecr 1208713786], length 5
10:02:47.951088 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1208714188 ecr 1208713786], length 5
10:02:48.354089 IP 127.0.0.1.49594 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1208714591 ecr 1208713786], length 5

We see that the connection is established, but the client cannot send data to the server.

REJECT

In this case the behavior is the same: the client cannot send the request, but it receives ICMP 127.0.0.1 tcp port 12345 unreachable and exponentially increases the time between attempts to resend the request. The command looks like this:

iptables -A INPUT -p tcp --tcp-flags PSH PSH --dport 12345 -j REJECT

REJECT with tcp-reset

The command is as follows:

iptables -A INPUT -p tcp --tcp-flags PSH PSH --dport 12345 -j REJECT --reject-with tcp-reset

We already know that with --reject-with tcp-reset the client receives an RST packet in response, so we can predict the behavior: receiving an RST packet on an established connection means the socket was unexpectedly closed on the other side, so the client should get Connection reset by peer. We run our script and make sure this is the case. And here is what the traffic looks like:

Traffic dump

[user@host ~]# tcpdump -i lo -nn port 12345
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:22:14.186269 IP 127.0.0.1.52536 > 127.0.0.1.12345: Flags [S], seq 2615137531, win 43690, options [mss 65495,sackOK,TS val 1209880423 ecr 0,nop,wscale 7], length 0
10:22:14.186284 IP 127.0.0.1.12345 > 127.0.0.1.52536: Flags [S.], seq 3999904809, ack 2615137532, win 43690, options [mss 65495,sackOK,TS val 1209880423 ecr 1209880423,nop,wscale 7], length 0
10:22:14.186293 IP 127.0.0.1.52536 > 127.0.0.1.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 1209880423 ecr 1209880423], length 0
10:22:14.186338 IP 127.0.0.1.52536 > 127.0.0.1.12345: Flags [P.], seq 1:6, ack 1, win 342, options [nop,nop,TS val 1209880423 ecr 1209880423], length 5
10:22:14.186344 IP 127.0.0.1.12345 > 127.0.0.1.52536: Flags [R], seq 3999904810, win 0, length 0

REJECT with icmp-host-unreachable

I think it is already obvious to everyone what the command will look like :) The client's behavior in this case differs slightly from the plain REJECT case: the client does not increase the timeout between attempts to resend the packet.

[user@host ~]# tcpdump -i lo -nn icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
10:29:56.149202 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:56.349107 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:56.549117 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:56.750125 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:56.951130 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:57.152107 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65
10:29:57.353115 IP 127.0.0.1 > 127.0.0.1: ICMP host 127.0.0.1 unreachable, length 65

Conclusion

You don't have to write a mock to test how a service interacts with a hung client or server; sometimes the standard utilities available on Linux are enough.

The utilities discussed in this article have many more features than I described, so you can come up with your own ways to use them. Personally, what I wrote about has always been enough for me (in fact, even less). If you use these or similar utilities for testing at your company, please write how. If not, I hope your software gets better if you decide to test it under network problems using the proposed methods.
