Stepping on rakes in an open field, or how to collect the MAC addresses of nearby Wi-Fi devices

I start all my public talks (fortunately, there are not many of them) by mentioning, explicitly or implicitly, the thesis: “Our industry is complex; problems can surface at any step, even the most obvious one, and it is naive to assume that everything will be simple and easy.” Oddly enough, this simple idea, which I arrived at over many years of learning the hard way, sometimes comes as a revelation even to experienced professionals, although one would think their frantic enthusiasm and faith in the infallibility of their own ideas and practices should have worn off long ago. Here is a story that illustrates it: a project that looked simple at first sight.




One fine day, a friend sent me a link to an interesting startup. Its founders offered small businesses in the service and retail sectors to set up an access point (with a captive portal) that gives their customers Internet access while simultaneously collecting the MAC addresses of the smartphones of people passing by. The purpose is simple: many advertising networks let you target ads at a list of device addresses, so by aiming an advertising campaign at passers-by we will most likely get new visitors (because the place is nearby and “I've seen it somewhere already”). In other words, it is a kind of virtual flyer distribution. My friend asked how this is done and whether we could replicate it.

A quick search revealed how this kind of data collection works. A Wi-Fi adapter is put into monitor mode and hops across channels, capturing packets, analyzing them, and aggregating the results. There are also ready-made open-source tools for this, for example airodump-ng from the aircrack-ng suite. So, to replicate it, we just need to run this utility, preferably on a separate compact, portable device, and push the collected data into a database, from which we can then export ready-made lists of MAC addresses for ad networks. The task seemed simple, solvable in one or at most two evenings of leisurely work, since almost everything was already available.

Of course, it was nothing of the sort.

As the saying goes, what is permitted to Jupiter is not permitted to the ox. When you work with single-board computers, you first have to sacrifice computing resources and memory. The sacrifice of resources is then followed by the sacrifice of development and debugging convenience: not every system lets you bring a build toolchain onto the device.

Initially, we wanted to use something simple and cheap, such as Orange Pi Zero boards, install airodump-ng on them, and forward the data the utility produces to a server, where it could be safely stored in a database. I had experience with similar distributed systems built around a dedicated central server (although there the workhorses were virtual machines spun up on demand through a cloud API by that same central server, but that is beside the point), so part of the code migrated successfully to the new project.

The data-forwarding tool was a very simple Erlang application that was supposed to read data from the wireless sniffer (by parsing its output), serialize it (using Erlang's native serialization), and send it to the server over a WebSocket on HTTPS (so as not to arouse the suspicion of DPI systems and not to invent a custom protocol). The Allwinner H2+ processors used in the Orange Pi are powerful enough to build and debug directly on the device. Again, in theory everything was fine.

Then practice began.

1. As it turned out, the built-in Wi-Fi in the Orange Pi was only good for connecting to an access point and sending data to the server. More precisely, the problem was not the adapter itself but the kernel's support for its chipset. For most IoT projects this would probably be enough. However, we were prepared for this blow, because a preliminary look at the aircrack-ng website gave a very clear and unambiguous message: “it won't work everywhere; if it doesn't, it's not our fault; here is a list of tested chipsets.” Almost all Atheros (acquired by Qualcomm) and Ralink (acquired by MediaTek) devices were on the list, which looked promising for a later move from power-hungry Chinese ARM chips to the more ascetic MIPS chipsets used in routers.

But while all of this is held together with duct tape, i.e. prototyped, the problem has to be solved here and now. So we resorted to something rather exotic for our times (when wireless is built into every cigarette lighter): a USB Wi-Fi adapter. Comparing the compatibility list with the range in the nearest store produced a winner: the D-Link DWA-160, hardware revision C1 (this matters, because other hardware revisions use a different chip and are a headache to get working). It is dual-band and needs no fiddling with drivers, since support has long been in the kernel. This dongle later came in handy in other projects too, so I bought what was probably every unit (five pieces) available in our provincial town.



After making sure the adapter worked, I connected it to the single-board computer and disabled the built-in Wi-Fi adapter, expecting Internet access to come through the Ethernet interface.



The second nasty surprise came from aircrack-ng itself. This suite was created for hacking, oops, I mean Wi-Fi penetration testing, i.e. it was written by hackers for hackers. I don't know what logic led them to implement the wireless sniffer not in the traditional Unix way, emitting structured text for further processing, but as a full-fledged terminal UI that displays the discovered networks and devices in near real time (taking the terminal settings into account), but that is exactly what they did. Yes, I did find a Python API of unknown maturity for all this, but the prototyping spider living in my head strictly forbade dragging in another language, switching to a different one (remember, the server part was already partially written, and not in Python), or, God forbid, reimplementing airodump-ng on top of tcpdump. So I had to look for workarounds.

Fortunately, the wireless hackers came to suspect that constantly staring at the interface is a questionable pastime, so they implemented periodic dumping of everything found and aggregated to CSV files, at a configurable interval. You can live with that. Of course, the naive option, running the utility and re-reading the file on a timer, was the first thing I tried. It worked on a laptop, but after moving to the single-board computer it started to fail while reading the file, for obvious reasons: sometimes the utility had not finished writing everything out, and some of the data was irretrievably lost.
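As far as I know, an airodump-ng CSV dump consists of two tables separated by a blank line: access points (header starting with “BSSID”) and clients (header starting with “Station MAC”). Below is a minimal Python sketch of pulling the client rows out of such a dump; the author's parser was in Erlang, and the column order here is an assumption worth checking against the airodump-ng version in use. The name parse_stations is hypothetical.

```python
import csv
import io


def parse_stations(text):
    """Extract client (station) rows from an airodump-ng CSV dump.

    Assumes the usual two-section layout: an access-point table, a blank
    line, then a table whose header starts with "Station MAC".
    """
    stations = []
    in_stations = False
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or not any(cell.strip() for cell in row):
            continue  # blank separator lines between the two tables
        first = row[0].strip()
        if first == "Station MAC":
            in_stations = True  # header of the client section
            continue
        if first == "BSSID":
            in_stations = False  # header of the access-point section
            continue
        if in_stations and len(row) >= 6:
            stations.append({
                "mac": first.upper(),
                "first_seen": row[1].strip(),
                "last_seen": row[2].strip(),
                "power": int(row[3]),
                "packets": int(row[4]),
                "bssid": row[5].strip(),
                # Probed ESSIDs are comma-separated, so they spill into extra columns
                "probed": [c.strip() for c in row[6:] if c.strip()],
            })
    return stations
```

Only the client table matters for the ad lists, so access-point rows are skipped here rather than parsed.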

The solution was the kernel's inotify mechanism, which reports file operations: as soon as my code saw changes to the data file, it started reading it after a short delay (which had a purely psychological value, reassuring its author). Experiments showed that with this approach there were no read failures or data loss. Fine: we parse the CSV, load it into internal structures, and send it to the server. On the server we store it in PostgreSQL (jsonb is a blessing here), after which we can run queries, build exports, and so on. We add the simplest authorization based on a symmetric key, so that nobody can stuff junk into the database and so that we can bind the data to the location where the device is installed, and everything seems ready for battle.
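The article does not say how the symmetric-key authorization was built. One common way, sketched here in Python purely as an assumption, is an HMAC-SHA256 tag over each upload, computed with a per-device secret: the server looks up the key by device ID, which also ties the data to the installation point. DEVICE_KEYS, sign and verify are hypothetical names.

```python
import hashlib
import hmac
import json

# Hypothetical server-side key table: each capture box gets its own secret.
DEVICE_KEYS = {"box-01": b"shared-secret-for-box-01"}


def sign(device_id, key, payload):
    """Device side: serialize the batch and attach an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"device": device_id, "body": body, "tag": tag}


def verify(message):
    """Server side: return (device_id, payload) if the tag checks out, else None."""
    key = DEVICE_KEYS.get(message["device"])
    if key is None:
        return None  # unknown device, reject
    expected = hmac.new(key, message["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):  # constant-time compare
        return None
    return message["device"], json.loads(message["body"])
```

This alone does not stop someone from replaying an old, validly signed message; including a timestamp or counter in the signed payload would.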

Not so fast. A test run of this pipeline (and writing and debugging the code really did take a couple of evenings) revealed a funny fact: the number of addresses captured per day in our office, which is fairly far from busy public places, fluctuated around a couple of thousand. Yes, there was a small hotel nearby (this was before the quarantine lockdowns, don't be surprised), but it still seemed like far too many.

After refreshing my knowledge of MAC address structure and recalling that mobile devices often generate locally administered addresses to hide their real MAC addresses, I added a simple filter to the server part that drops all broadcast/multicast and locally administered addresses on input. The list shrank by an order of magnitude and started to look believable. Everything was ready for field trials.
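In the first octet of a MAC address, the least significant bit (I/G) marks a group address (multicast, broadcast included), and the next bit (U/L) marks a locally administered one, which is what phones use for randomized addresses. A minimal Python sketch of such a filter; the function names are illustrative, not the author's code.

```python
def is_globally_unique_unicast(mac):
    """True if the MAC is neither a group address nor locally administered."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    is_group = first_octet & 0x01  # I/G bit: multicast or broadcast
    is_local = first_octet & 0x02  # U/L bit: locally administered (e.g. randomized)
    return not is_group and not is_local


def filter_macs(macs):
    """Keep only addresses that look like vendor-assigned hardware MACs."""
    return [m for m in macs if is_globally_unique_unicast(m)]
```

The trade-off is that devices which randomize their MAC are simply not counted, so the filtered list undercounts rather than overcounts.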

As is well known, when moving from a warm, comfortable office to harsh combat conditions, prototypes tend to stop working properly, so deployment should come with a pocket engineer who fixes whatever issues arise. On the other hand, it is also well known that a device which needs no extra fiddling at the start of operation is likely to break soon, and irreparably. This is surely one of Murphy's laws, but, alas, the author is too lazy to check which one, so let's just call it the “law of meanness”.

The first installation immediately revealed a bunch of flaws.

First, most Chinese prototyping boards use microSD cards for long-term storage rather than NAND/NOR flash chips; the only exceptions are powerful SoCs that are clearly overkill for this task. Alas, microSD is a constant headache for the operator: oxidized contacts, failing SD cards, and contact reliability that depends on the temperature inside the case (which is considerable: Chinese chips are not very energy-efficient, and the boards are often designed around peak power consumption, so there is no way around an extra heatsink). So it turned out that when the power was pulled, the system ended up in an unusable state: files with the ERTS bytecode were corrupted, and after a reboot the application refused to run.

The second unpleasant surprise: at the installation site, the Internet was provided by an LTE router and was, to put it mildly, of mediocre quality compared with the wired office connection. The link kept dropping, and the application reconnected constantly or even crashed because of messages piling up in its queues.

Of course, both problems can be overcome: data loss, for example, could be eliminated by finding the right combination of a good microSD card and file system settings, and connection instability could be compensated by aggregating data before sending, short upload sessions, timeouts, and so on. But the problems revealed were a reason to reflect on whether the right path had been chosen. Moreover, the need for a permanent connection to the server ruled out collecting data at events, where the device is hooked up to a power bank and thrown into the backpack of someone attending a mass event, where, of course, you cannot expect a stable connection.
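The remedies listed above were not implemented, since the server part was abandoned. Still, the idea is easy to sketch in Python under my own assumptions (OutboundBuffer and the send callback are hypothetical): a bounded queue that drops the oldest records instead of growing until the process dies, plus small batched uploads that stop at the first failure and leave the unsent batch for the next session.

```python
import itertools
from collections import deque


class OutboundBuffer:
    """Bounded queue of records awaiting upload over a flaky uplink."""

    def __init__(self, max_records=1000, batch_size=100):
        # A deque with maxlen silently discards from the left (oldest) when full.
        self.queue = deque(maxlen=max_records)
        self.batch_size = batch_size

    def push(self, record):
        self.queue.append(record)

    def flush(self, send):
        """Upload batches while `send(batch)` returns True; return records sent.

        A failed batch stays at the head of the queue for the next session.
        """
        sent = 0
        while self.queue:
            batch = list(itertools.islice(self.queue, self.batch_size))
            if not send(batch):
                break  # link is down, try again later
            for _ in batch:
                self.queue.popleft()
            sent += len(batch)
        return sent
```

Dropping the oldest data is a deliberate choice here: for a MAC collector, losing some old sightings is better than losing the whole process.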

Accordingly, the next step was to abandon the server part and keep the data store directly on the device. In addition, to avoid long and very tedious experiments with SD cards, we decided to use development boards with on-board flash chips in the next iteration.

At that point, I remembered that my collection included a wonderful Carambola 2 board from our Lithuanian comrades at 8Devices. And on their site you can find an even more compact device based on the same chip, called Centipede. Earlier experiments with this class of devices had shown that Erlang fits completely into the available 16 MB of flash (with a little left over for the application). The only downside (which is arguably even an upside) is the low-power MIPS CPU and the need for cross-compilation, which makes building an Erlang application somewhat less trivial. But this was already a well-trodden path, so I ordered a couple of Centipedes and, in the meantime, ported the existing server-connected version to the Carambola.



When the parts arrived, a new phase began. The AR9331 chip was supported by aircrack-ng out of the box, network access could be provided through the Ethernet interface, and the latest versions of OpenWRT and ERTS were built and successfully tested. The application was rewritten: part of the code moved to the device, and data was accumulated in a separate process and periodically dumped to a file as a serialized Erlang term. On top of this, we added a very simple web interface that receives data over a WebSocket. The ports for inotify and erlexec compiled under OpenWRT without trouble.
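In the Erlang version, the accumulated data lived in a separate process that owned the state. The Python sketch below shows only a plausible merge rule, with a plain dict standing in for that process state; it assumes each observation is a dict with mac, first_seen, last_seen and probed keys, and merge_observation is a hypothetical name.

```python
def merge_observation(state, obs):
    """Fold one station observation into the accumulated per-MAC summary.

    Timestamps are assumed to be "YYYY-MM-DD HH:MM:SS" strings, which sort
    correctly as plain text, so min/max give the earliest/latest sighting.
    """
    summary = state.get(obs["mac"])
    if summary is None:
        state[obs["mac"]] = {
            "first_seen": obs["first_seen"],
            "last_seen": obs["last_seen"],
            "probed": set(obs.get("probed", [])),
        }
    else:
        summary["first_seen"] = min(summary["first_seen"], obs["first_seen"])
        summary["last_seen"] = max(summary["last_seen"], obs["last_seen"])
        summary["probed"].update(obs.get("probed", []))  # union of probed ESSIDs
    return state
```

Because airodump-ng re-dumps its cumulative view on every write, merging by key like this keeps repeated rows from inflating the data set.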

Only one thing was bothersome: just 300 kilobytes remained for data. That is not that little if you store only client MAC addresses, but airodump-ng provides much more interesting information, including access point addresses, their ESSIDs, and so on, which would also be nice to keep. Just in case. Okay, we will deal with it as it comes.
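A back-of-the-envelope check of how far 300 KB goes. The record layouts below are hypothetical, chosen only to show the spread between raw binary and text storage.

```python
FREE_BYTES = 300 * 1024  # roughly 300 KB left on flash

# Hypothetical record layouts and their size in bytes
layouts = {
    "raw 6-byte MAC": 6,
    "text MAC + newline": 17 + 1,                # "AA:BB:CC:DD:EE:FF\n"
    "binary MAC + two 4-byte timestamps": 6 + 4 + 4,
}

capacity = {name: FREE_BYTES // size for name, size in layouts.items()}
for name, count in capacity.items():
    print(f"{name}: ~{count} records")
```

That comes to about 51 thousand raw MACs, 22 thousand binary MAC-plus-timestamp records, or 17 thousand text lines, before storing any access-point data.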

We build it and test it. A problem shows up right away.

OpenWRT, as we all know, is a minimalist Linux distribution designed specifically for devices with limited memory. As a result, everything that could be dropped painlessly was dropped, and everything that could be simplified was simplified, including multi-user support. I.e., it is common practice there for code to run as root with maximum privileges, which of course sidesteps questions of groups, users, and controlling what they do. Yes, yes, the S in IoT stands for security. The trouble is that erlexec, which I used to launch and manage airodump-ng, refuses to operate as root: it needs an additional user on whose behalf it spawns the processes assigned to it. And when you create an additional user with a different privilege level... right, airodump can no longer access the network device. Removing this restriction from the library looked like a long job, so erlexec was replaced with ports, Erlang's built-in mechanism for running external processes. A trifle, but an unpleasant one.

So, the devices arrived, were reflashed, and even work under greenhouse conditions. We grab a power bank, throw the box into a backpack, and head to the mall. The next day we look at the result: a fiasco, a zero-length data file. Either there was not enough space, or a power glitch came at a very bad moment. We change the code so that saving happens in two stages: first a temporary file is written, then it replaces the current one.
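The two-stage save is the classic write-to-temp-then-rename pattern. A minimal Python sketch, assuming a POSIX file system; pickle stands in for Erlang's term serialization, and save_atomically is a hypothetical name.

```python
import os
import pickle
import tempfile


def save_atomically(path, data):
    """Write `data` to `path` so a power cut leaves either the old or the new file.

    The snapshot goes to a temporary file in the same directory (so the
    rename stays on one file system), is flushed to storage, and only then
    replaces the target via os.replace, which is atomic on POSIX.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(data, tmp)  # stand-in for Erlang's term_to_binary
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to flash before renaming
        os.replace(tmp_path, path)  # atomic swap: readers see old or new, never half
    except BaseException:
        os.unlink(tmp_path)  # do not leave stray temp files behind
        raise
```

On Linux, fsyncing the containing directory after the rename additionally makes the rename itself durable across a power cut.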

However, I never got around to checking whether this version worked, because the next toy came into view: the Onion Omega2+ based on the MediaTek MT7688. Like its sibling, the LinkIt Smart 7688 development board, it has plenty of features, but most importantly it has twice as much flash memory, which means no more worrying about running out of space for data. Okay.

We order and wait. A month. Two. Our patience runs out, and we write to the Americans along the lines of “Zin, where's the stuff?” Silence. We open a PayPal dispute. The Americans wake up: “Oh, our order processing system failed, we'll ship everything right away.” They ship it, and we wait another three weeks. Phew, the device is in hand and even works.



A small digression is needed here: although I had several LinkIt Smart boards within arm's reach, I did not consider them as a platform, because at the very beginning of this saga an attempt to use them as capture devices had failed. Back then, the drivers for the chip were supplied as prebuilt modules for specific kernel versions, and apparently that was why they did not work. The latest OpenWRT versions have both native support for the 7688 and an open driver, so this is a reason to reconsider these devices.

However, we decided to use the Wi-Fi built into the chip for its intended purpose: after all, the device needs at least some kind of control interface, including in the field, if only to see whether it is working. Being able to look at the collected data would be useful too.

Accordingly, we combine the previous approaches: the only USB port exposed on the Mini Dock takes a Wi-Fi dongle that scans the air, and the built-in Wi-Fi serves as a low-power access point for controlling the device. We build it, test it, and everything works.

But appetite comes with eating. To begin with, a data file in Erlang serialization format is for true maniacs only, and a somewhat wider circle of specially trained professionals needs something simpler. Also, in addition to the data from airodump, I would like the exact measurement time and, preferably, at least some reference to the device's location.

We insert a USB hub between the Wi-Fi dongle and the device. The settings (which in OpenWRT depend on the device's position on the bus) break, but these are minor trifles; we fix them. We dig a USB GPS receiver out of the junk pile; fortunately, it is time-tested and already has NMEA-0183 parsing code written for it (the code, of course, still had to be fixed). We check: predictably, the system does not detect the device, so drivers are clearly missing. We build the USB serial drivers and load them onto the device: still silence. Then we recall that on full-size systems this GPS dongle showed up not as ttyUSBx but as ttyACMx, i.e. as a USB CDC ACM device, the class used by modems. Fine, a second round of adding drivers, and success.
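NMEA-0183 sentences are comma-separated ASCII lines of the form `$...*hh`, where `hh` is the XOR of every character between `$` and `*`, written as two hex digits. A minimal Python sketch of parsing the RMC sentence (time, fix status, latitude, longitude, date); it is not the author's code, the function names are hypothetical, and any extra trailing fields a receiver emits are ignored.

```python
from functools import reduce


def nmea_checksum(body):
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return format(reduce(lambda acc, ch: acc ^ ord(ch), body, 0), "02X")


def _to_degrees(value, hemisphere):
    """Convert NMEA ddmm.mmmm / dddmm.mmmm into signed decimal degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    result = degrees + minutes / 60
    return -result if hemisphere in ("S", "W") else result


def parse_rmc(sentence):
    """Parse an RMC sentence into (lat, lon, hhmmss, ddmmyy), or None.

    None means a malformed line, a bad checksum, another sentence type, or
    no fix yet (status 'V'), which is normal right after power-up.
    """
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, checksum = sentence[1:].partition("*")
    if nmea_checksum(body) != checksum.upper():
        return None
    fields = body.split(",")
    # The talker prefix varies (GPRMC, GNRMC, ...), so match on the suffix.
    if not fields[0].endswith("RMC") or len(fields) < 10 or fields[2] != "A":
        return None
    lat = _to_degrees(fields[3], fields[4])
    lon = _to_degrees(fields[5], fields[6])
    return lat, lon, fields[1], fields[9]
```

Returning None for “no fix yet” lets the caller store scan data without coordinates instead of blocking until the receiver locks onto satellites.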

We take that code and integrate it into the application. We add sqlite3 to the application as storage. Now there is no need to check whether a record already exists in the process state, and working with the data generally comes down to a few lines of code. Putting it all together, we make the application take a GPS reading whenever data is added, and fix the JS front end to display incomplete records (which happen when the GPS has not yet acquired satellites but scan data is already coming in). We check: it seems to be alive. We can declare an interim victory.
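A sketch of that storage layer using Python's built-in sqlite3 module, under my own assumptions about the schema: a primary key on the MAC replaces the manual “is it already there?” check, and nullable coordinates cover the period before the GPS gets a fix. The upsert syntax needs SQLite 3.24 or newer; record_sighting and the table layout are hypothetical.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS stations (
    mac        TEXT PRIMARY KEY,
    first_seen INTEGER NOT NULL,  -- Unix time of the first sighting
    last_seen  INTEGER NOT NULL,
    lat        REAL,              -- NULL until the GPS has a fix
    lon        REAL
)
"""


def record_sighting(db, mac, fix=None, now=None):
    """Insert or refresh a station; `fix` is (lat, lon), or None without a fix.

    An existing position is kept when the new reading has none.
    """
    now = int(time.time()) if now is None else now
    lat, lon = fix if fix else (None, None)
    db.execute(
        """
        INSERT INTO stations (mac, first_seen, last_seen, lat, lon)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(mac) DO UPDATE SET
            last_seen = excluded.last_seen,
            lat = COALESCE(excluded.lat, stations.lat),
            lon = COALESCE(excluded.lon, stations.lon)
        """,
        (mac, now, now, lat, lon),
    )
    db.commit()
```

Usage: `db = sqlite3.connect("scan.db"); db.execute(SCHEMA)`, then call `record_sighting(db, mac, fix)` for each parsed station.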



After a couple of weeks of uninterrupted operation, we have lots of data on both the stations on the air and their clients. Now I am struggling with the temptation to offer this device to infosec people for monitoring wireless activity on the premises they are responsible for, and to the government for tracking the movement of citizens' phones. Just kidding, of course; they already know everything anyway.

So, all the ordeals described above happened in a pet project of very low complexity (it was clear almost immediately what to do and how), with no hardware development (hi, physics), that got as far as a more or less finished product. Of course, it cannot be ruled out that the author is a hopeless amateur and that real gurus walk this path in one evening between evening tea and a glass of cognac, but so far experience has shown only one thing: IT is complicated, optimism is punished financially, reputationally, and motivationally, and those who say “everything is simple” are either geniuses or crooks, and the latter is more likely.
