频率分析:我们分析了NeoQUEST-2020在线阶段参与者最喜欢的任务之一


今天,我们发布了Wright-Up作业NeoQUEST-2020,我们的参与者在该作业中写道:“ 4个任务超酷”和“奥林匹克最佳任务”。
尽管看起来很简单,但是只有8个人完成了此任务!现在,您可以更好地了解他了-欢迎来到猫下!


因此,邀请参与者下载录音。一分钟的聆听时间足以理解摩尔斯电码在这里甚至都不会闻到,就像在录音中也没有看到有意义的单词一样。因此,我们通过倾听来结束折磨,开始更详细地研究声音。例如,使用良好的旧(免费)Audacity。我们打开录音并注意其结构:两次击键之间绝对保持静音(有人很幸运,因为工作条件好,您永远没有电话和同事聊天!),这意味着通过过滤部分来提取每个击键并不困难。信号幅度为零。



好吧,让我们开始吧。现在,我们有两件事:一个“数组”,其中有100,500次点击,并且希望这些声音中有重复的声音–只有这样,才有可能说“ X键被按下N次,Y键被按下M次”。因此,所有的初始数据都经过了分析,仍然有一个问题要回答:“如果录音持续10分钟,这组语句将带给我们什么?”是的,没错,是时候揭开经典的频率分析了。好吧,我们使用python,numpy和scikits.audiolab测试了我们的理论(要处理声音,您可以选择其他任何具有相同成功的包甚至语言)。
要设置环境,您可以运行以下命令(在Ubuntu 19.10上肯定可以运行):

$ sudo apt-get install -y python python-dev python-pip python-setuptools libsndfile-dev python-numpy
$ pip install --upgrade pip
$ pip install scikits.audiolab


现在我们在几行代码中执行假设的测试,这些代码使用最原始的方式来识别点击的声音-通过相应的numpy数组的值之和:

get list of separated ‘clicks’
data, fs, enc = scikits.audiolab.wavread(filename)
data = data.ravel()
clicks = [list(g) for k, g in groupby(data, lambda x: x != 0) if k]

# calculate number of unique ‘clicks’ and their frequences
freqs = dict() 
cypher_test = list()
for click in clicks:
    value = np.sum(np.array(click), dtype = np.float32)
    cypher_test.append(value)
    if value not in freqs:
        freqs[value] = 1
    else:
        freqs[value] += 1


图表上显示的代码执行结果-就像核冬天中间篝火的热量一样-不仅温暖了灵魂:至少27次独特的点击,甚至分布的出现也激发了人们对频率分析不会徒劳的信心。



根据Wikipedia和其他更可靠的消息来源,以降序出现的顺序排列了26个英语字母,形成etaoinshrdlcumwfgypbvkjxqz系列。另外,当然还有差距,该差距处于领先地位。你知道该做什么!同样,我们使用几行原始代码来比较每种声音,方法是单击其字母(按出现的频率),然后进行初始替换:

COMMON_FREQ_ALPHABET = { 0:' ', 1:'e', 2:'t', 3:'a', 4:'o', 5:'i', 6:'n', 7:'s', 8:'h', 9:'r', 10:'d', 11:'l', 12:'c', 13:'u', 14:'m',15:'w', 16:'f', 17:'g', 18:'y', 19:'p', 20:'b', 21:'v', 22:'k', 23:'j', 24:'x', 25:'q', 26:'z' }

# calculate key
key = dict()
index = 0
for freq in sorted(freqs.items(), key=operator.itemgetter(1),reverse=True):
    key[freq[0]] = COMMON_FREQ_ALPHABET[index]
    index +=1

# replace sounds according the key
partly_plain_text = ""
for elem in cypher_test:
	partly_plain_text += key[elem]


替换频率分析后,我们得到以下文本:

onlineayvinf saty stw vltn selg kird moaher csctl ncmertl we tnswer utn metscre gtll saory retl enocfh worb ltrfe tvvetr tftin vltue scn satra zero gtua t gew rotd sveuitl vroklem etsa ketcay idet who lef ipere saone retuh vrodcua uca fipe reuord scre aoob jcesaion oca dcrinf vtss dry ahen show low lisa kltub okxeuas low lifha old frow wtauh aowtrd suhool my frow well heta norah mile mcsa resa tdd tmonf yocnf kesa kird beev kird gooa smtll wts bnow oca utme uca vltna some when bnow awo how dripe tftin vca while oaher some ocr letrn satr mincae digger men scrgtues infor retd hetrd lisa rtin utll vora yes whta idet ftme bind new him old deev eta keftn keftn or socah ahe some mtv tkle sonf i metn saory yetr fold mile dtrb eqtmvle world gish tfe geel ahocstnd one mile his ky ahe tkope ahose uhildren rtin gcll gtll tnimtl ury vocnd vltue ahose shora jciub where ahocfh saood fopern sorry i gorfoa i vromised ao lend yoc my uode fea ia grom my fiahck neoscveruoder htnd ahese binf wind utrb new vossikle tdd grom saood lifha down lipe mincae rotd foa satra gire ahocfh ahose some scn aia iaeai aet iaeai aia iaeai aet iaeai aia iaeai aet iaeai aia iaeai aet iaeai aia iaeai aet iaeai aia iaeai aet iaeai ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa ucn shoa shtbe wtpe ltke nibe shtbe wtpe ltke nibe shtbe wtpe ltke nibe shtbe wtpe ltke nibe shtbe wtpe ltke nibe shtbe wtpe htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn mtd dou htn dou gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe yet gipe

看起来有点像有意义的东西,但是肯定会猜出单个单词。顺序查找显然应该替换的单词:
  1. moaher->母亲和小伙子->回答,即 a和t必须互换;
  2. larfe→大而→自我,即 f和g必须互换;
  3. →鸟,即 用b和onlinetyving→onlinetyping代替k,即 v用p替换;
  4. 衡量→测量和计算→足够,即 u和c必须互换。

这些操作的结果是,文本发生了巨大的变化,他的第一个单词解释了缺乏连接性:在键盘模拟器上练习排版时记录了录音。嗯,钥匙的主人显然没有为世界末日做准备!。我们可以为他感到高兴,也许现在他正在快速指法,刮起挡住他的避难所的雪。但这对我们来说是什么?我们将目光转向文字,也许还有一些有用的东西。

onlinetyping stay saw plan self bird mother usual numeral we answer can measure fall story real enough worb large appear again place sun start zero fact a few road special problem east beauty idea who leg ipere stone reach product cut gipe record sure toob juestion out during pass dry then show low list blacb obxects low light old grow watch toward school my grow well heat north mile must rest add among young best bird beep bird foot small was bnow out came cut plant some when bnow two how dripe again put while other some our learn star minute differ men surfaces ingor read heard list rain call port yes what idea game bind new him old deep eat began began or south the some map able song i mean story year gold mile darb eqample world fish age feel thousand one mile his by the abope those children rain full fall animal cry pound place those short juicb where though stood gopern sorry i forgot i promised to lend you my code get it from my github neosupercoder hand these bing wind carb new possible add from stood light down lipe minute road got start fire though those some sun tit iteti tea iteti tit iteti tea iteti tit iteti tea iteti tit iteti tea iteti tit iteti tea iteti tit iteti tea iteti cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot cun shot shabe wape labe nibe shabe wape labe nibe shabe wape labe nibe shabe wape labe nibe shabe wape labe nibe shabe wape han mad doc han mad doc han mad doc han mad doc han mad doc han mad doc han mad doc han mad doc han mad doc han doc fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe yea fipe

似乎我们很幸运:在培训期间,某人分心了,并可能在某些Messenger中向他的朋友写了一条救助消息:对不起,我忘了我答应我将我的代码借给你,请从我的github neosupercoder中获取。阅读别人的书信当然不好,但是有一个百分百的例外,我们为此花了足够的钱。我们转到github,并通过用户名查找存储库:



成功!只能从存储库“ neo-medicine”中读取唯一的文件了:



万岁!现在我们是钥匙的快乐主人!
提前-对其他任务的分析NeoQUEST-2020-我们不会让读者对自我孤立感到无聊!

All Articles