Bypassing the VKontakte messages API restriction with Python

Hello, Habr. In my previous article I showed that the methods of the messages section can be called through the documentation pages, and that logging in to the VK site is all this requires. Many readers replied that this poses no threat to users' personal data, and that being unable to download your own messages is a shortcoming of the platform. In the comments I was also given a link to a node.js library that logs in with a login and password and gains access to the messages API by pretending to be an official application.

Disclaimer:


The article and all the written code were created only for educational and research purposes and were never used for illegal activities. The author does not urge you to repeat any of the actions described here and does not bear any responsibility for them.


But not everyone is familiar with JavaScript and node.js, so I decided to write my own library in Python, which many people use nowadays. It provides the full functionality of the messages API through the "test requests" of the documentation. Please don't be mad at me where I repeat parts of my previous "speech": I want this article to work as standalone documentation.

How to use it?


The library itself lives in a GitHub repository (its examples folder contains a script with the usage examples from this article). To install it, run this command in the terminal:
pip install vk-messages

Now we can import the main class from the package and create an instance of it, passing the login, the password, the type of authorization the account uses, and the directory where the authorization cookies should be saved. Saving the cookies means that users with two-factor authorization do not have to enter the code from the message every time the script runs.

from vk_messages import MessagesAPI

login, password = 'login', 'password'                                
messages = MessagesAPI(login=login, password=password,
                                two_factor=True, cookies_save_path='sessions/')

And that's really all. Now we just open the documentation and call the methods we are interested in. Note that this approach lets us use almost any method from the documentation, not only those from the messages section:

history = messages.method('messages.getHistory', user_id='1234567', count=5)

We can also combine this library with others. For example, vk_api can upload photos from a computer (the code for this is in its examples section), and vk_messages can then attach the uploaded photos to a message:

from vk_messages.utils import get_random

peer_id = '1234567'  # placeholder: ID of the recipient
messages.method('messages.send', user_id=peer_id, message='Hello',
            attachment='photo123456_7891011', random_id=get_random())

Out of curiosity, I implemented the classic function that creates, in a given folder, a subfolder for each person the user has talked to and tries to download the latest messages and the absolute URLs of photos. Fortunately, everything worked like clockwork, with no unexpected errors:

from vk_messages.utils import fast_parser

fast_parser(messages, path='parsing/',                    
       count_conv=10, messages_deep=400, photos_deep=100) 

Now for one of the most interesting parts of this library: with authorization cookies in hand, we can perform any action that a logged-in user can. Here is a personal example. For the posts of a group I am a member of, I needed a table of post IDs and their authors. The catch: the official API returns only the person who published the post. With a traffic sniffer I saw that the author data is loaded from the server when you hover over the post's publication date. So I wrote a wrapper that can send as many such requests as you like and needs only the post link and the authorization cookies to get the authors. In the example below, all that remains is to strip the unnecessary tags:

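import json

import requests


def get_creators(post, cookies):
    # post looks like '-12345_67890'; the part before '_' is the negated group ID
    group = -int(post.split('_')[0])
    response = requests.post('https://vk.com/al_page.php', cookies=cookies,
               data=f"_ads_group_id={group}&act=post_author_data_tt&al=1&raw={post}")

    # Skip the 4-character prefix that precedes the JSON body
    response_json = json.loads(response.text[4:])['payload'][1]
    return response_json[0]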

authors = get_creators(post='-12345_67890',
                    cookies=messages.get_cookies())
print(authors)   
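The remaining step, getting rid of the tags, could be done with the standard library alone. The sketch below is only an illustration: `strip_tags` is a hypothetical helper that is not part of vk_messages, and the sample markup is invented, since the real response format is not shown in this article.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup):
    """Return the text of an HTML fragment with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return ' '.join(' '.join(parser.chunks).split())


# Invented sample; the markup returned by get_creators may look different
sample = '<div class="author"><a href="/id1">Ivan Ivanov</a></div>'
print(strip_tags(sample))
```

A regular expression such as `re.sub(r'<[^>]+>', '', markup)` would be shorter, but a real parser copes better with attributes that contain angle brackets.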

But what does the code above prove? That even if VK closes the test requests in its documentation, we can always simulate user actions and get the information we need. As an experiment, I wrote a small function that obtains links to photos through page "scrolling" requests, without using the official API.

The code was too big, so I decided to hide it
import json
import re

import requests


def get_attachments(attachment_type, peer_id, count, offset, cookies_final):
    if attachment_type == 'photo':
        session = requests.Session()
        parsed = []
        
        response = session.post(f'https://vk.com/wkview.php',
                            data=f'act=show&al=1&dmcah=&loc=im&ref=&w=history{peer_id}_photo', cookies=cookies_final)
        
        response_json = json.loads(response.text[4:])
        
        try:
            last_offset = response_json['payload'][1][2]['offset']
            count_all = response_json['payload'][1][2]['count']
        except (IndexError, KeyError, TypeError):
            last_offset = response_json['payload'][1][0]['offset']
            count_all = response_json['payload'][1][0]['count']

        while (len(parsed) < count + offset)  and (last_offset != count_all):
            response_json = json.loads(response.text[4:])
        
            try:
                last_offset = response_json['payload'][1][2]['offset']
            except (IndexError, KeyError, TypeError):
                last_offset = response_json['payload'][1][0]['offset']

            photos_vk =  re.findall(r'<a href="/photo(\S*)?all=1"', response_json['payload'][1][1])
            mails =  re.findall(r"'(\S*)', {img: this ,", response_json['payload'][1][1])

            for photo, mail in zip(photos_vk, mails):
                if len(parsed) < offset:
                    parsed.append(photo)
                    continue
                
                response = session.post(f'https://vk.com/al_photos.php', cookies=cookies_final,
                            data=f'act=show&al=1&al_ad=0&dmcah=&gid=0&list={mail}&module=im&photo={photo}')
                
                response_json = json.loads(response.text[4:])
                photo_size = list(response_json['payload'][1][3][0].items())
                photo_size.reverse()
                
                for i in range(len(photo_size)):
                    if 'attached_tags' in photo_size[i][0]:
                        photo_size = photo_size[:i]
                        break
            
                parsed.append(photo_size) 

            response = session.post(f'https://vk.com/wkview.php', cookies=cookies_final,
                    data=f'act=show&al=1&offset={last_offset}&part=1&w=history{peer_id}_photo')
        
        return parsed[offset + 3 : offset + 3 + count]


Does it look bulky? Yes. Is it much slower than the official API? Yes. But if VK takes away the last official way to access messages, we can always find a workaround.
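Part of the bulk comes from repeating the same "decode the response, then look for the paging data in slot 2 or slot 0" pattern. A minimal sketch of how that could be factored out is shown below. Both helper names are hypothetical, and the payload layout is assumed only from the indices used in the function above.

```python
import json


def parse_al_response(text):
    """Drop the 4-character prefix, as the article's code does, and decode the JSON."""
    return json.loads(text[4:])


def read_paging(response_json):
    """Return (offset, count) from whichever payload slot holds the paging dict."""
    blocks = response_json['payload'][1]
    for index in (2, 0):
        try:
            block = blocks[index]
            return block['offset'], block['count']
        except (IndexError, KeyError, TypeError):
            continue
    raise ValueError('paging data not found in payload')


# Invented response body: four placeholder characters followed by JSON
fake_body = 'xxxx' + json.dumps({'payload': [0, [{'offset': 30, 'count': 100}, '<html>']]})
print(read_paging(parse_al_response(fake_body)))
```

Catching only `IndexError`, `KeyError` and `TypeError` keeps the fallback, while network errors and typos still surface instead of being swallowed.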

I also tried to raise exceptions with explanations everywhere in the library where errors are possible. If you run into a failure that comes without an explanation, please let me know.

How does it work?


For those interested in what happens under the hood, I will briefly go through the main points. During authorization, ordinary requests calls simulate the user's login; they differ only slightly depending on the authorization type, and after a successful login the cookies are saved to a pickle file. When an API method is called through the documentation, the prefix "param_" is added to every user-supplied parameter, so offset becomes param_offset. The request also carries a hash, taken from the data-hash attribute of the "Run" button tag. As far as I can tell, this value is constant for each method.
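The cookie persistence step could look roughly like the sketch below. It is not the library's actual code: the function names and the file name are hypothetical, and the cookies are assumed to be a plain name-to-value mapping.

```python
import pickle
from pathlib import Path


def save_cookies(cookies, path):
    """Store a name -> value cookie mapping in a pickle file, creating folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open('wb') as handle:
        pickle.dump(dict(cookies), handle)


def load_cookies(path):
    """Return the saved cookie mapping, or None when nothing has been saved yet."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open('rb') as handle:
        return pickle.load(handle)
```

On the next run, a script would first call `load_cookies('sessions/example.pickle')` and fall back to a full login only when it returns `None`. Since unpickling executes whatever the file describes, only files created by your own script should be loaded this way.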

One more important point: the password is sent in ANSI encoding, with the bytes of Russian letters written as hexadecimal values prefixed by "%", and the line below is enough to produce that encoding. This can be a problem on Linux, because in Python the 'ANSI' codec is an alias of the Windows-only mbcs codec and is not available on other operating systems.

self.password = str(password.encode('ANSI')).replace('\\x', '%')[2:-1]
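A portable variant might name the code page explicitly instead of relying on the Windows-only alias. The sketch below assumes that the ANSI code page in question is cp1251, which is the usual one on Russian-locale Windows; that is an assumption, not something this article confirms. `encode_password` is a hypothetical helper, not part of the library.

```python
def encode_password(password, codepage='cp1251'):
    """Keep ASCII characters as they are and write every other byte as %xx."""
    raw = password.encode(codepage)
    return ''.join(
        chr(byte) if byte < 0x80 else '%{:02x}'.format(byte)
        for byte in raw
    )


print(encode_password('пароль'))  # %ef%e0%f0%ee%eb%fc
```

Unlike the `str(...)`-based line above, this version does not depend on how Python prints a bytes object, so quotes and backslashes in a password pass through unchanged.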

Another problem was the strange behavior of some methods. For example, when I changed the order of the parameters, the script could return a response ten times smaller than requested, or nothing at all. To solve this, I parse the parameters from the documentation page and send them in exactly the order in which they are listed there. Perhaps it is a coincidence, but I have had no problems of this kind since:

response = session.get(f'https://vk.com/dev/{name}', cookies=self.cookies_final)
hash_data =  re.findall(r'data-hash="(\S*)"', response.text)

soup = BeautifulSoup(response.text, features="html.parser")
params = soup.findAll("div", {"class": "dev_const_param_name"})
params = [cleanhtml(str(i)) for i in params]

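payload, checker = '', 0
for param in params:  # params keeps the order used on the documentation page
    if param in kwargs:
        checker += 1
        value = kwargs[param]
        # Booleans are sent as 0/1, everything else as its string form
        value = str(int(value)) if isinstance(value, bool) else str(value)
        payload += '&param_{}={}'.format(param, quote(value))

# Every keyword argument must match a documented parameter
if checker != len(kwargs):
    raise Exception_MessagesAPI('Some of the parameters are invalid', 'InvalidParameters')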
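To try the ordering logic without logging in, the same idea can be written as a standalone function. This is a sketch under assumptions: `build_payload` is a hypothetical name, `ValueError` stands in for the library's own exception, and the parameter list is an invented example rather than one scraped from the documentation.

```python
from urllib.parse import quote


def build_payload(documented_params, **kwargs):
    """Build the 'param_'-prefixed query string in the documented parameter order."""
    unknown = [name for name in kwargs if name not in documented_params]
    if unknown:
        raise ValueError('Unknown parameters: ' + ', '.join(unknown))

    parts = []
    for name in documented_params:
        if name not in kwargs:
            continue
        value = kwargs[name]
        # Booleans are sent as 0/1, everything else as its string form
        text = str(int(value)) if isinstance(value, bool) else str(value)
        parts.append('&param_{}={}'.format(name, quote(text)))
    return ''.join(parts)


# Invented parameter list for illustration
documented = ['user_id', 'offset', 'count', 'rev']
print(build_payload(documented, count=5, user_id='1234567', rev=True))
# &param_user_id=1234567&param_count=5&param_rev=1
```

The keyword arguments are passed in a different order than the documented one, yet the resulting string follows the documented order.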

Conclusion


This library was my first experience with writing "open" projects, so please don't judge it too harshly. I just wanted to help people who face the same problem I did: the messages API restriction. I also want to thank my friends who helped me write this article and test the code.
