Geo-chat, harmful bots and steganography: enriching knowledge about Telegram


What do you know about telegrams in Telegram? And can you spot steganography in a VideoNote (popularly known as a "circle")? Let's break down the NeoQUEST-2020 task that generated the most questions and exclamations to our support. Spoiler: yes, and there will also be a bit of crypto here :)

In the NeoQUEST-2020 legend, we find a link to the Instagram profile of a traveling robot. Nothing out of the ordinary, right? That's what we thought too, but the task still has to be solved, so we carefully examine all the pictures in the profile, looking for at least some clue. After a little meditation on a beautiful photo of Lake Baikal, we realize that the clue is in the last post:


The picture tells us that we need to somehow connect Baikal (Shaman Rock) and Telegram ("U can join my ..." - does that remind you of anything?). We deliberately avoided giving the participants a direct hint about the geo-chat (which is exactly what it is!), and many of them still completed the task using an emulator or a mobile device with the ability to spoof geolocation. Time for some shamanism of our own: we set the coordinates to (53.20074, 107.349426) (roughly, by eye) near the Shamanka rock and brace ourselves for the hardest part - waiting. Telegram handles geo-positioning in a peculiar way and takes about an hour to pull up nearby contacts and chats. Our diligence and patience are fully rewarded: the desired chat appears under Contacts -> Find people nearby -> Groups nearby.


Voila, we're in business!


The bot greets us with a task in the form of a file some.bytes with unidentified contents, in which we can make out the strings "Decrypt me" and "Apocalypse Spares Nobody".

The first line is self-explanatory, but what does the second one mean?.. Here the participants split into two camps: some wrote to us by mail because they had hit a dead end, while others looked closely at the phrase "Apocalypse Spares Nobody" and noticed... what? Right! The good old ASN.1 format (we have already written here about how to parse it).


Let's figure it out. Inside there are two structures. In one, we find a set of bytes marked "Decrypt me", so we assume it is the ciphertext. In the second structure, we see two numbers. It is unlikely that this is a private key generously handed to us along with the ciphertext, so most likely we are dealing with a public key. All the gathered information leads us to the obvious conclusion - why not try RSA?
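For those who want to peek inside such a file without a full ASN.1 library, a minimal DER TLV walker is enough (a sketch: it assumes definite-length encodings and low tag numbers; the sample blob below is made up, not the challenge file):

```python
# Minimal DER TLV walker (sketch): handles definite lengths and
# single-byte tags only -- enough to peek inside simple structures.
def parse_der(data):
    items, i = [], 0
    while i < len(data):
        tag = data[i]
        length = data[i + 1]
        i += 2
        if length & 0x80:  # long form: low 7 bits = number of length bytes
            n = length & 0x7F
            length = int.from_bytes(data[i:i + n], "big")
            i += n
        value = data[i:i + length]
        i += length
        if tag & 0x20:  # constructed type (e.g. SEQUENCE): recurse
            items.append((tag, parse_der(value)))
        else:
            items.append((tag, value))
    return items

# A made-up SEQUENCE { INTEGER 65537, INTEGER 5 } in DER
blob = bytes.fromhex("30080203010001020105")
print(parse_der(blob))  # a SEQUENCE of two INTEGERs (65537 and 5)
```

Real-world files are better served by pyasn1 or openssl asn1parse, but the structure is the same: tag, length, value, recursively.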

So, we have a modulus and a public exponent, which, by the way, is rather large. After a frantic review of RSA theory and some thought, we conclude that the private exponent must be small - and what does that mean? Bingo! We can play the bad guys and use Wiener's attack.

We thought of those who don't enjoy cryptography, too - you could use a ready-made implementation of the attack, for example, this one.
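The attack itself fits in a couple of dozen lines: expand e/n into a continued fraction and test each convergent k/d as a candidate for the private exponent (a self-contained sketch; the numbers at the bottom are a classic textbook toy example, not the challenge values):

```python
from math import isqrt

def continued_fraction(a, b):
    """Partial quotients of a/b."""
    while b:
        q, r = divmod(a, b)
        yield q
        a, b = b, r

def convergents(quotients):
    """Successive convergents p/q of a continued fraction."""
    p0, p1, q0, q1 = 0, 1, 1, 0
    for a in quotients:
        p0, p1 = p1, a * p1 + p0
        q0, q1 = q1, a * q1 + q0
        yield p1, q1

def wiener(e, n):
    """Recover a small private exponent d from (e, n), if one exists."""
    for k, d in convergents(continued_fraction(e, n)):
        if k == 0 or (e * d - 1) % k:
            continue
        phi = (e * d - 1) // k
        # p and q would be roots of x^2 - (n - phi + 1)x + n = 0,
        # so the discriminant must be a perfect square
        s = n - phi + 1
        disc = s * s - 4 * n
        if disc >= 0 and isqrt(disc) ** 2 == disc:
            return d
    return None

# Toy example: n = 239 * 379, small d = 5 (not the challenge numbers)
print(wiener(17993, 90581))  # -> 5
```

The attack only works because d < n^(1/4)/3, which is exactly the "small private exponent" situation we deduced above.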

We obtain the private exponent d = 40553818206320299896275948250950248823966726834704013657854904761789429403771 and decrypt the ciphertext: key=nq2020faAeFeGUCBjYf7UDrH9FapFCdFPa4u;pass=passCxws3jzYhp0HD5Fy84.

We get the key "nq2020faAeFeGUCBjYf7UDrH9FapFCdFPa4u" for the first part of the task and the password "passCxws3jzYhp0HD5Fy84", which needs to be fed to the bot's representative. It can be found among the chat participants under the name @neoquestbot.

Riding the wave of positivity from obtaining the first key, we don't immediately realize that the bot is picky about communication and keeps saying that it cannot see its interlocutor:


But the bot happily accepts round VideoNote messages and even replies to them ... in the same round form:


The video and sound seem identical, but only at first glance. What if the bot is giving us secret signs? To find out, we save the original video and the bot's response and compare them. For this and the following steps, the FFmpeg package serves us well. So, let's see what we have:



Format aac -> flac, sample rate 44100 Hz -> 98000 Hz. Got it - we continue working with the audio.

With a deft movement of the hands, we pull the audio track out of the video:


We do the same with our original message so that we can compare the two later. For clarity, we open both tracks in Audacity.


The jump in amplitude in the bot's audio response immediately catches the eye (especially strange given that we were silent the whole time). On closer inspection, we notice clear interval boundaries in the alternation of "wave-silence":


We suggest putting everything aside and doing a little counting. We analyze fragment by fragment:

0 - 0.005 - silence
0.005 - 0.01 - wave
0.01 - 0.025 - silence
0.025 - 0.04 - wave
0.04 - 0.045 - silence

The smallest interval is 0.005 s, and all the other intervals are multiples of 0.005 s.
We take the presence of a wave for 0.005 s as 1 and silence as 0 - and we get nothing other than binary code!
We recall that the sample rate has changed, and take a look at the spectrum (Analyze -> Plot Spectrum):


We see that the strongest signal sits at a frequency of ~44100 Hz, which is ultrasound.
So, from here on, we should work only with the high frequencies.

In fact, the bot overlays its signal on top of the original audio, which occupies the audible spectrum. Participants whose original video contained sound could notice this in Audacity.

We cut off the low frequencies with a high-pass filter, either in Audacity or in the same ffmpeg:


So, we have a 16-bit mono WAV file. It consists of a header, an uncompressed audio stream, and metadata. The audio stream itself is divided into frames (a frame can store several samples, but that's a whole other story), in our case 16 bits each (this is what pcm_s16 in the screenshots indicates). Frames are sequences of bits that describe the amplitude of the wave at a point in time for one or more channels (in our case, one). The sample rate of the audio stream is 98000 frames per second, which gives 490 frames per 0.005 s interval.
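These parameters are easy to verify with Python's standard wave module; as a sketch, we fabricate one 5 ms interval of silence in memory (not the actual challenge file) and read the parameters back:

```python
import io
import struct
import wave

# Write 490 zero samples (one 5 ms bit interval) to an in-memory WAV
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples -> pcm_s16
    w.setframerate(98000)  # 98000 frames per second
    w.writeframes(struct.pack("<490h", *([0] * 490)))

buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getframerate(), r.getsampwidth() * 8, r.getnframes())
    # -> 98000 16 490, i.e. 490 / 98000 = exactly one 0.005 s interval
```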

Therefore, we proceed with a simple algorithm: read 490 frames, determine whether they contain a wave or silence, and set the bit to 1 or 0 accordingly. We will use Python and the wave package for parsing WAV files.

If the error "wave.Error: unknown format: 65534" occurs when opening the file, replace the wFormatTag field in the header from 'FE FF' to '01 00':

# Patch wFormatTag (offset 20) from WAVE_FORMAT_EXTENSIBLE to plain PCM
fh = open(input_file, "r+b")
fh.seek(20)
fh.write(b'\x01\x00')
fh.close()

So, we open the file, process 490 frames at a time, and compute the average value:

import wave

bits = []
file = wave.open(input_file, "r")
for i in range(1, int(file.getnframes() / 490) + 1):
    frames = file.readframes(490)
    bit = 0
    total = 0
    for k in range(0, 490):  # average over all 490 frames of the interval
        frame_bytes = frames[k * 2:k * 2 + 2]
        total += int.from_bytes(frame_bytes, "big")
    if total / 490 > 16000:
        bit = 1
    bits.append(bit)

It is possible that where there should be silence (compare with the Audacity picture), some noise remains. Therefore, we set a threshold (let it be 16000); if the average exceeds it, we treat the signal as a 1.

Then we group the bits into bytes:

bytes = []
for i in range(1, int(len(bits) / 8) + 1):
    b1 = bits[i * 8 - 8]
    b2 = bits[i * 8 - 7]
    b3 = bits[i * 8 - 6]
    b4 = bits[i * 8 - 5]
    b5 = bits[i * 8 - 4]
    b6 = bits[i * 8 - 3]
    b7 = bits[i * 8 - 2]
    b8 = bits[i * 8 - 1]
    byte = (b1 << 7) | (b2 << 6) | (b3 << 5) | (b4 << 4) | (b5 << 3) | (b6 << 2) | (b7 << 1) | b8
    bytes.append(byte.to_bytes(1, byteorder='big'))

If everything is done correctly, the result is the string "Givemethepassword". Since the bot communicates via steganography in circles, it is only logical to slip it the password (which we received together with the key when decrypting) in the same format.
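As a sanity check on the bit order, here is a quick round trip (a sketch using the recovered string itself as the payload): expand the text into bits MSB-first - the order the bot transmits - and regroup them with the same shifts the decoder uses:

```python
text = "Givemethepassword"

# Expand to bits, most significant bit first (the order the decoder assumes)
bits = [(byte >> shift) & 1
        for byte in text.encode()
        for shift in range(7, -1, -1)]

# Regroup 8 bits -> 1 byte, mirroring the decoder's shifts
decoded = bytes(
    sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
    for i in range(0, len(bits), 8)
)
print(decoded.decode())  # -> Givemethepassword
```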

To start, we compose an audio track containing the password. To do this, we use the data gathered while analyzing the bot's message: a sample rate of 98000 Hz; a signal duration of 5 ms per bit; and a signal frequency corresponding to the bit value "1" - as we saw from the graphs, around 44100 Hz.

Now we need to "generate" the silence. We do this by simply filling the fragment with zeros:

sample_rate = 98000.0
def generate_silence(duration_milliseconds=5):
    fragment = []
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)): 
        fragment.append(0.0)
    return fragment

To generate the sound, we use a sine wave (background reading here):

import math

def generate_sinewave(
        freq=41000.0,
        duration_milliseconds=5,
        volume=0.5):
    fragment = []
    amplitude = volume * 32767.0
    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        fragment.append(amplitude * math.sin(2 * math.pi * freq * (x / sample_rate)))
    return fragment

Only a small matter remains: convert the password into bits, and the bits into sound.

Note: as mentioned earlier, the bot overlays its message on the audio track of the incoming video. Therefore, you need to append a few zero bytes after the password in order to shake the whole key out of the bot, not just its beginning (the key length is 36 bytes).

Sound generation

audio = []
f = open(input_file, 'rb')
for character in f.read():
    # emit the bits of each byte MSB first: sine burst for 1, silence for 0
    for shift in range(7, -1, -1):
        if (character >> shift) & 1:
            audio += generate_sinewave()
        else:
            audio += generate_silence()


Now we create the finished WAV file:

import struct
import wave

wav_file = wave.open(file_name, "w")
nchannels = 1
sampwidth = 2
nframes = len(audio)
comptype = "NONE"
compname = "not compressed"
wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))
for sample in audio:
    wav_file.writeframes(struct.pack('h', int(sample)))
wav_file.close()
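Before bothering the bot, the whole scheme can be exercised end to end in memory. A simplified sketch (it uses signed samples and an absolute-amplitude threshold instead of the unsigned averaging and header patch above; the payload is hypothetical):

```python
import math

SAMPLE_RATE = 98000
SAMPLES_PER_BIT = 490  # 5 ms per bit

def encode_byte(byte):
    """5 ms sine burst per 1-bit, 5 ms of zeros per 0-bit, MSB first."""
    samples = []
    for shift in range(7, -1, -1):
        if (byte >> shift) & 1:
            samples += [
                0.5 * 32767 * math.sin(2 * math.pi * 41000 * t / SAMPLE_RATE)
                for t in range(SAMPLES_PER_BIT)
            ]
        else:
            samples += [0.0] * SAMPLES_PER_BIT
    return samples

def decode(samples):
    """Average absolute amplitude per 490-sample window -> bits -> bytes."""
    bits = []
    for i in range(0, len(samples), SAMPLES_PER_BIT):
        window = samples[i:i + SAMPLES_PER_BIT]
        avg = sum(abs(s) for s in window) / len(window)
        bits.append(1 if avg > 1000 else 0)
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )

payload = b"pass"  # hypothetical stand-in for the real password
audio = [s for byte in payload for s in encode_byte(byte)]
print(decode(audio))  # -> b'pass'
```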

We save our track as, say, pass.wav. Along the way, we check with our stego decoder that the password is recognized. If all is well, we produce a new video with the password from the original my_video.mp4 by replacing its audio track:


Now we need to turn this into a VideoNote. You can look for existing working bots (some participants, for example, found @TelescopyBot), or you can write your own bot using the Telegram API.


Either way, off to our bot:


We get a new circle with congratulations (after such a job, well deserved!), decode the audio using the already-familiar procedure, and obtain the key: "nq2020SyOMK7SnnJP1sNlvbTs8zt35vUrrsD"

Indeed, steganography is not considered one of the most difficult areas of cybersecurity for nothing - just try to guess all these tricks! But the NeoQUEST participants showed great ingenuity and a sense of humor while solving this task, so we send them (along with the bot's congratulations) our sincere admiration!
