Hello! I'm Ilya. Two years ago, I joined the team behind a mobile IMAP client. Earlier versions of the app took a long time to download the message list and used a lot of traffic to update the mailbox. That raised two questions: how to optimize our use of the protocol, and what the protocol can do in general. I knew nothing about IMAP, so I dove into the documentation. It turned out that all this time the client had been using the protocol naively, without taking its implementation details into account. Taking them into account helped speed up mail downloads by a factor of 2 to 3. In this article I explain what IMAP is and which tricks help optimize working with it.

I will not dive too deep into the protocol. This is the kind of article I wish I had read two years ago; IMAP gurus are unlikely to find anything new here. The article relies on the protocol description in RFC 3501.

Server connection

IMAP is a stateful protocol. This was a discovery for me: I had never seen or worked with such protocols before. Let's look at the sequence of steps for working with a server.

Let's go step by step and, most importantly, with examples. First we need to open a connection to the server. For this we will use the openssl command-line tool and its s_client command:

openssl s_client -connect imap.server.com:993 -crlf

Great, the connection is established. The server greets us with an OK response that includes a CAPABILITY response code listing what the server supports:

OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE  SPECIAL-USE AUTH=PLAIN AUTH=LOGIN]

There is a handy cheat sheet that lists all possible CAPABILITY values with links to the corresponding RFCs. For example, IMAP4rev1 tells the client that the server implements the IMAP4rev1 standard, and IDLE signals that the client can subscribe to changes in the mailbox.
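A client usually checks this list before relying on an extension. Below is a minimal Python sketch, assuming the capabilities arrive inside the greeting's [CAPABILITY ...] response code as in the example above; parse_capabilities is a hypothetical helper, not part of any library.

```python
import re

CAPABILITY_RE = re.compile(r"\[CAPABILITY ([^\]]*)\]")


def parse_capabilities(greeting: str) -> set:
    """Return the capability atoms from a greeting's [CAPABILITY ...] response code."""
    match = CAPABILITY_RE.search(greeting)
    if match is None:
        # Not every server lists capabilities in its greeting; a client would
        # then send an explicit CAPABILITY command instead.
        return set()
    # Capability names are case-insensitive, so normalize them to upper case.
    return {atom.upper() for atom in match.group(1).split()}


greeting = ("OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID "
            "ENABLE IDLE  SPECIAL-USE AUTH=PLAIN AUTH=LOGIN]")
caps = parse_capabilities(greeting)
print("IDLE" in caps, "CONDSTORE" in caps)  # True False
```

Splitting on any whitespace also copes with the double space between IDLE and SPECIAL-USE in the real greeting.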

Server authorization

After connecting to the server, we need to log in to our mailbox. This is done with the LOGIN command.

a1 LOGIN email pass

"Wait, LOGIN I understand, but what is a1?" you may ask. It is the command tag. The client should use a different tag for each command: the server's completion response carries the same tag as the request, so the client can match responses to commands when parsing them. The server can also send responses that start with an asterisk, such as * OK; these are called untagged responses. They are typically used for commands that return several entries, for example LIST.
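Matching responses to commands by tag is easy to sketch. The Python below is a minimal illustration, assuming tags of the form a1, a2, ...; TagGenerator and classify are hypothetical names, and the "some untagged data" and "Logged in" lines are made-up server replies. The "+" case is IMAP's command continuation request.

```python
import itertools
from typing import Dict, Tuple


class TagGenerator:
    """Hands out unique command tags: a1, a2, a3, ..."""

    def __init__(self, prefix: str = "a") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)

    def new_tag(self) -> str:
        return f"{self._prefix}{next(self._counter)}"


def classify(line: str) -> Tuple[str, str]:
    """Return (kind, first token) for a line received from the server.

    "*" starts an untagged response, "+" a continuation request, and any
    other first token is the tag of the command that has just completed.
    """
    first = line.split(" ", 1)[0]
    if first == "*":
        return "untagged", first
    if first == "+":
        return "continuation", first
    return "tagged", first


tags = TagGenerator()
login_tag = tags.new_tag()
pending: Dict[str, str] = {login_tag: f"{login_tag} LOGIN email pass"}

for line in ["* OK some untagged data", f"{login_tag} OK Logged in"]:
    kind, tag = classify(line)
    if kind == "tagged":
        # The tag tells us which outstanding command this completion belongs to.
        print("completed:", pending.pop(tag))  # completed: a1 LOGIN email pass
```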

Folder List Request

Before requesting messages from a folder, we need to know which folders exist. The LIST command returns the list of folders on the server.

A2 LIST "" *
* LIST (\HasNoChildren \Trash) "/" Trash
* LIST (\HasNoChildren \Sent) "/" Sent
* LIST (\HasNoChildren \Drafts) "/" Drafts
* LIST (\HasNoChildren \Junk) "/" Junk
* LIST (\HasNoChildren) "/" INBOX
A2 OK List completed (0.001 + 0.000 + 0.001 secs).

The first argument of the command is the reference name, which plays the role of a namespace prefix. If the server supports namespaces, their values can be requested with the NAMESPACE command. The default namespace is an empty string. The second argument is the mailbox name, which may contain wildcards; with them we tell the server which folders to return. For example, we can get one branch of the folder tree, only the top-level folders, or everything, as in the example above with "*". It's better not to request everything, because there is no telling how many folders the user has. The protocol authors recommend using "%": it does not match the hierarchy delimiter, so you get only the top-level folders of the mailbox.

The response consists of untagged lines, one per folder. Each line starts with flags that carry meta-information about the folder: in this example, all folders have no children (\HasNoChildren), and some of them are special-purpose folders (\Trash, \Junk, and so on). Next comes the hierarchy delimiter, which separates the levels of nested folders: a subfolder of Trash would be named "Trash/New Folder". The folder name comes last. After all the folders, the server sends OK with the tag we assigned to the command; this particular server also reports how long the command took.
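The LIST lines can be parsed with a regular expression. A minimal Python sketch, assuming the delimiter is a quoted string or NIL and the folder name is an atom or a simple quoted string (IMAP literals and escaped quotes are not handled); parse_list_line and Folder are hypothetical names. The special-purpose flags such as \Trash come from the SPECIAL-USE extension (RFC 6154) advertised in the capabilities above.

```python
import re
from typing import NamedTuple, Optional, Tuple

# * LIST (<flags>) <"delimiter" | NIL> <name>
LIST_RE = re.compile(r'^\* LIST \(([^)]*)\) (?:"([^"]*)"|NIL) (?:"([^"]*)"|(\S+))$')
SPECIAL_USE = {"\\All", "\\Archive", "\\Drafts", "\\Flagged", "\\Junk", "\\Sent", "\\Trash"}


class Folder(NamedTuple):
    flags: Tuple[str, ...]
    delimiter: Optional[str]  # None when the server reports NIL (flat hierarchy)
    name: str


def parse_list_line(line: str) -> Folder:
    m = LIST_RE.match(line)
    if m is None:
        raise ValueError(f"unsupported LIST response: {line!r}")
    flags, delimiter, quoted_name, atom_name = m.groups()
    name = quoted_name if quoted_name is not None else atom_name
    return Folder(tuple(flags.split()), delimiter, name)


response = [
    '* LIST (\\HasNoChildren \\Trash) "/" Trash',
    '* LIST (\\HasNoChildren \\Sent) "/" Sent',
    '* LIST (\\HasNoChildren \\Drafts) "/" Drafts',
    '* LIST (\\HasNoChildren \\Junk) "/" Junk',
    '* LIST (\\HasNoChildren) "/" INBOX',
]
folders = [parse_list_line(line) for line in response]
# Map each special-use role to the folder that plays it.
roles = {flag: f.name for f in folders for flag in f.flags if flag in SPECIAL_USE}
print(roles["\\Sent"])  # Sent
```

Keeping the delimiter per folder lets the client split "Trash/New Folder" into tree levels without hard-coding "/".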

Folder selection

Next, we need to select the folder we will fetch messages from. This is done with the SELECT command.

4 SELECT INBOX
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent \*)] Flags permitted.
* 16337 EXISTS
* 2 RECENT
* OK [UNSEEN 6037] First unseen.
* OK [UIDVALIDITY 1532079879] UIDs valid
* OK [UIDNEXT 17412] Predicted next UID
* OK [HIGHESTMODSEQ 21503] Highest
4 OK [READ-WRITE] Select completed (0.015 + 0.000 + 0.014 secs).

Selecting a folder returns all the information about it. Let's go in order:

FLAGS: the flags that messages in this folder can have.
PERMANENTFLAGS: the flags the client can change permanently (\* means the client may also create new keywords).
EXISTS: the number of messages in the folder.
RECENT: the number of recent messages, that is, those that arrived since the folder was last selected.
UNSEEN: the sequence number of the first unread message (despite the name, not the number of unread messages).

Let's stop here for now; we don't need the rest of the information yet.
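A client typically keeps these values per folder, and some of the ones skipped above (UIDVALIDITY, HIGHESTMODSEQ) become important later for fast synchronization. A minimal Python sketch of collecting them from the SELECT response, assuming one response per line as in the transcript; parse_select is a hypothetical helper.

```python
import re
from typing import Dict, List

# "* 16337 EXISTS" / "* 2 RECENT": the number comes before the keyword
COUNT_RE = re.compile(r"^\* (\d+) (EXISTS|RECENT)\b")
# "* OK [UIDNEXT 17412] ...": numeric response codes inside square brackets
CODE_RE = re.compile(r"^\* OK \[(UNSEEN|UIDVALIDITY|UIDNEXT|HIGHESTMODSEQ) (\d+)\]")


def parse_select(lines: List[str]) -> Dict[str, int]:
    state: Dict[str, int] = {}
    for line in lines:
        count = COUNT_RE.match(line)
        if count:
            state[count.group(2)] = int(count.group(1))
            continue
        code = CODE_RE.match(line)
        if code:
            state[code.group(1)] = int(code.group(2))
        # FLAGS, PERMANENTFLAGS and the tagged OK are ignored in this sketch.
    return state


select_response = [
    r"* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent)",
    r"* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent \*)] Flags permitted.",
    "* 16337 EXISTS",
    "* 2 RECENT",
    "* OK [UNSEEN 6037] First unseen.",
    "* OK [UIDVALIDITY 1532079879] UIDs valid",
    "* OK [UIDNEXT 17412] Predicted next UID",
    "* OK [HIGHESTMODSEQ 21503] Highest",
    "4 OK [READ-WRITE] Select completed (0.015 + 0.000 + 0.014 secs).",
]
state = parse_select(select_response)
print(state["EXISTS"], state["UNSEEN"])  # 16337 6037
```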

Request letters

Now for the most interesting part: requesting messages. You have to be extremely careful here, especially on mobile clients. You hardly want the app to pull thousands of messages from the server into its database every time it starts. Moreover, there is no point in downloading entire messages just to display, say, the message list. To show the user their messages quickly, we will request only the "envelope". From the envelope we want the sender, the recipient, the subject, and the date. We will load the 10 most recent messages.

5 FETCH 16337:16328 (ENVELOPE)

The colon denotes a range of message sequence numbers we want to receive (the order of the two bounds doesn't matter), and the parentheses list the data items we want for these messages, in this case the envelope.
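Since sequence numbers run from 1 to the EXISTS value reported by SELECT, the N highest sequence numbers (the most recently arrived messages) form the range from EXISTS - N + 1 to EXISTS. A minimal Python sketch; latest_range and fetch_envelopes are hypothetical helpers.

```python
def latest_range(exists: int, count: int) -> str:
    """Sequence set for the `count` highest sequence numbers in a folder of `exists` messages."""
    if exists <= 0 or count <= 0:
        raise ValueError("nothing to fetch")
    # Never go below sequence number 1 when the folder is small.
    first = max(1, exists - count + 1)
    return f"{first}:{exists}"


def fetch_envelopes(tag: str, exists: int, count: int) -> str:
    return f"{tag} FETCH {latest_range(exists, count)} (ENVELOPE)"


print(fetch_envelopes("5", 16337, 10))  # 5 FETCH 16328:16337 (ENVELOPE)
```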

Here is the response to this FETCH, abbreviated to a single message:

* 16334 FETCH (ENVELOPE ("Sat, 07 Sep 2019 23:07:48 +0000" "Hello from Fabric.io" (("Fabric" NIL "notifier" "fabric.io")) (("Fabric" NIL "notifier" "fabric.io")) (("Fabric" NIL "notifier" "fabric.io")) ((NIL NIL "me" "me@mail")) NIL NIL NIL "<5d7438441b07c_2d872ad30967b9646405c6@answers-notifier2012.mail>"))

At first glance, nothing is clear. The envelope fields come from the message headers defined in RFC 2822, and their order inside ENVELOPE is fixed by RFC 3501; I won't go into the format in this article. The envelope contains all the information we need: the date, the subject, the sender, the recipients, and even the Message-ID, which clients use to group messages into conversations.
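The envelope is just nested parenthesized data, so a small tokenizer is enough to read it. A minimal Python sketch, assuming the field order listed for ENVELOPE in RFC 3501 and no IMAP literals ({n}) in the response; parse_parenthesized, addresses and parse_envelope are hypothetical helpers.

```python
import re
from typing import Any, Dict, List, Optional

# Parentheses, quoted strings (with \" and \\ escapes) and bare atoms such as NIL.
TOKEN_RE = re.compile(r'\(|\)|"((?:[^"\\]|\\.)*)"|[^\s()"]+')

# Field order of the ENVELOPE structure as listed in RFC 3501, section 7.4.2.
ENVELOPE_FIELDS = ("date", "subject", "from", "sender", "reply_to",
                   "to", "cc", "bcc", "in_reply_to", "message_id")
ADDRESS_FIELDS = ("from", "sender", "reply_to", "to", "cc", "bcc")


def parse_parenthesized(text: str) -> List[Any]:
    """Turn IMAP parenthesized data into nested lists (literals are not supported)."""
    stack: List[List[Any]] = [[]]
    for m in TOKEN_RE.finditer(text):
        token = m.group(0)
        if token == "(":
            stack.append([])
        elif token == ")":
            closed = stack.pop()
            stack[-1].append(closed)
        elif token.startswith('"'):
            # Drop the backslash from \" and \\ escapes inside quoted strings.
            stack[-1].append(re.sub(r"\\(.)", r"\1", m.group(1)))
        else:
            stack[-1].append(None if token.upper() == "NIL" else token)
    return stack[0]


def addresses(raw: Optional[List[Any]]) -> List[str]:
    """Convert ((name adl mailbox host) ...) into "mailbox@host" strings."""
    result = []
    for _name, _adl, mailbox, host in raw or []:
        if mailbox is not None and host is not None:  # skip group markers
            result.append(f"{mailbox}@{host}")
    return result


def parse_envelope(fetch_line: str) -> Dict[str, Any]:
    # ['*', '16334', 'FETCH', ['ENVELOPE', [ten envelope fields]]]
    items = parse_parenthesized(fetch_line)[-1]
    envelope = items[items.index("ENVELOPE") + 1]
    fields = dict(zip(ENVELOPE_FIELDS, envelope))
    for key in ADDRESS_FIELDS:
        fields[key] = addresses(fields[key])
    return fields


line = ('* 16334 FETCH (ENVELOPE ("Sat, 07 Sep 2019 23:07:48 +0000" "Hello from Fabric.io" '
        '(("Fabric" NIL "notifier" "fabric.io")) (("Fabric" NIL "notifier" "fabric.io")) '
        '(("Fabric" NIL "notifier" "fabric.io")) ((NIL NIL "me" "me@mail")) NIL NIL NIL '
        '"<5d7438441b07c_2d872ad30967b9646405c6@answers-notifier2012.mail>"))')
envelope = parse_envelope(line)
print(envelope["subject"], envelope["from"])  # Hello from Fabric.io ['notifier@fabric.io']
```

The three NIL values in the sample are cc, bcc and in-reply-to, which is why they map to empty address lists and None.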

So, we can show the user basic information about a message, but what about the body?
We could download the whole message at once, regardless of its size. This is quick to do, but costly in network traffic and memory. By the way, it is done with the same FETCH command:

6 FETCH 16337:16328 (BODY[])

Try this command on your own inbox and you'll see what I mean by "costly": even for 10 messages, the response is quite large and contains absolutely all the information about each message. Speaking of which, let's look at what that information is.

Have you ever downloaded the source of a message in a mail client to see what it looks like in its original form? If not, let's take a test message. I put one picture directly in the body and attached another one as a file. Save the message in .eml format and open it in any text editor. The source will differ depending on the client, but the overall structure will be similar.

Let's start with the email header:

Return-Path: <myemail>
Delivered-To: myemail
Received: from localhost (localhost [127.0.0.1])
	by imap.server.com (imap.server.com) with ESMTP id 6C2BE2A0363
	for <myemail>; Sun,  8 Sep 2019 23:41:29 +0300 (MSK)
X-Virus-Scanned: amavisd-new at imap.server.com
Received: from imap.server.com ([127.0.0.1])
	by localhost ( imap.server.com [127.0.0.1]) (amavisd-new, port 10026)
	with ESMTP id abx8HQQT_k5A for <myemail>;
	Sun,  8 Sep 2019 23:41:29 +0300 (MSK)
Mime-Version: 1.0
Date: Sun, 08 Sep 2019 20:41:28 +0000
Content-Type: multipart/mixed;
 boundary="--=_Part_722_554093397.1567975288"
Message-ID: <9e4e3872e603eac2c20f26bb1d65548d>
From: "Me" <myemail>
Subject: Hey, Habr!
To: myemail
X-Priority: 3 (Normal)

The header contains all the meta-information: sender, recipient, date, content type, subject, and priority. The boundary parameter defines the delimiter that separates the parts of the message.

Let's see what this means.

----=_Part_722_554093397.1567975288
Content-Type: multipart/related;
 boundary="--=_Part_583_946112260.1567975288"
----=_Part_583_946112260.1567975288
Content-Type: multipart/alternative;
 boundary="--=_Part_881_599167713.1567975288"
----=_Part_881_599167713.1567975288
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
----=_Part_881_599167713.1567975288
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE html><html><head><meta http-equiv=3D"Content-Type" content=3D"t=
ext/html; charset=3Dutf-8" /></head><body><div data-crea=3D"font-wrapper"=
 style=3D"font-family: XO Tahion; font-size: 16px; direction: ltr"> <img =
src=3D"cid:jua-uid-q1nz1guinitrcfd3-1567975257318"><br><br><div></div> <b=
r> </div></body></html>
----=_Part_881_599167713.1567975288--
----=_Part_583_946112260.1567975288
Content-Type: image/jpeg; name="2018-09-04 22.46.36.jpg"
Content-Disposition: inline; filename="2018-09-04 22.46.36.jpg"
Content-ID: <jua-uid-q1nz1guinitrcfd3-1567975257318>
Content-Transfer-Encoding: base64

Each delimiter line marks the beginning of a part: it consists of two hyphens (--) followed by the boundary value. The closing delimiter, which ends a multipart body, has two more hyphens at the end. The format is described in more detail in RFC 1341 (since superseded by RFC 2046).

This is the core of the message: it describes the parts the message consists of and their MIME types.
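Python's standard email package can show this tree directly. A minimal sketch, assuming a trimmed-down sample message with the same nesting as the one above (shorter boundaries and placeholder base64 data stand in for the real ones); describe is a hypothetical helper.

```python
import email
from email.message import Message
from typing import List

# A trimmed-down message with the same structure as the one above.
RAW = """\
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/related; boundary="related"

--related
Content-Type: multipart/alternative; boundary="alt"

--alt
Content-Type: text/plain; charset="utf-8"

Hello
--alt
Content-Type: text/html; charset="utf-8"

<img src="cid:pic1">
--alt--
--related
Content-Type: image/jpeg; name="photo.jpg"
Content-Disposition: inline; filename="photo.jpg"
Content-ID: <pic1>
Content-Transfer-Encoding: base64

/9j/
--related--
--outer
Content-Type: image/png; name="shot.png"
Content-Disposition: attachment; filename="shot.png"
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--outer--
"""


def describe(part: Message, depth: int = 0) -> List[str]:
    """Return an indented outline of the MIME tree: content type plus disposition."""
    line = "  " * depth + part.get_content_type()
    disposition = part.get_content_disposition()  # "inline", "attachment" or None
    if disposition:
        line += f" ({disposition})"
    lines = [line]
    if part.is_multipart():
        for child in part.get_payload():
            lines.extend(describe(child, depth + 1))
    return lines


print("\n".join(describe(email.message_from_string(RAW))))
```

The outline nests multipart/alternative (plain and HTML) inside multipart/related together with the inline JPEG, and puts the PNG attachment directly under multipart/mixed.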

About MIME Types

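MIME types describe the type of content. MIME (Multipurpose Internet Mail Extensions) is the standard that extends the email format so that a message can carry non-ASCII text, attachments, and several parts of different types.

multipart/mixed: a set of independent parts, for example the message body and attached files.

multipart/related: parts that make up a single compound object, for example an HTML body and the images it displays inline.

multipart/alternative: the same content in several formats, for example text/plain and text/html; the client shows the best format it can display.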

There is no meaningful plain text here, so it makes more sense to use the HTML representation. This HTML contains just a picture, and the part with that picture has Content-Disposition: inline, which means it is displayed right in the message body rather than among the attachments.

The reference to this picture is a little unusual. The HTML uses src="cid:jua-uid-q1nz1guinitrcfd3-1567975257318", which points to the part whose Content-ID header is jua-uid-q1nz1guinitrcfd3-1567975257318: the next part of the message, a picture encoded in base64. To spare my nerves, I did not include the whole base64 data.
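Resolving such references means indexing parts by Content-ID and decoding the HTML first, because in the raw source the quoted-printable encoding turns "=" into "=3D". A minimal Python sketch using the standard email package, assuming a reduced sample message built from the parts above; resolve_cids is a hypothetical helper.

```python
import email
import re
from email.message import Message
from typing import Dict

CID_RE = re.compile(r'src="cid:([^"]+)"')

# HTML part encoded with quoted-printable, as in the message above (=3D is "=").
RAW = """\
Content-Type: multipart/related; boundary="related"

--related
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<img src=3D"cid:jua-uid-q1nz1guinitrcfd3-1567975257318"><br>
--related
Content-Type: image/jpeg
Content-Disposition: inline
Content-ID: <jua-uid-q1nz1guinitrcfd3-1567975257318>
Content-Transfer-Encoding: base64

/9j/
--related--
"""


def resolve_cids(msg: Message) -> Dict[str, str]:
    """Map every cid: reference in text/html parts to the content type of the part it names."""
    by_content_id: Dict[str, Message] = {}
    html_bodies = []
    for part in msg.walk():
        content_id = part.get("Content-ID")
        if content_id:
            # The header is written as <id>, while the HTML refers to cid:id.
            by_content_id[content_id.strip().strip("<>")] = part
        if part.get_content_type() == "text/html":
            # decode=True undoes the quoted-printable transfer encoding.
            payload = part.get_payload(decode=True)
            html_bodies.append(payload.decode(part.get_content_charset() or "utf-8"))
    resolved = {}
    for html in html_bodies:
        for ref in CID_RE.findall(html):
            target = by_content_id.get(ref)
            resolved[ref] = target.get_content_type() if target is not None else "missing"
    return resolved


print(resolve_cids(email.message_from_string(RAW)))
```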

The last part of the message looks like this:

----=_Part_722_554093397.1567975288
Content-Type: image/png; name="2018-07-02 11.08.23 pm.png"
Content-Disposition: attachment; filename="2018-07-02 11.08.23 pm.png"
Content-Transfer-Encoding: base64

Here Content-Disposition is not inline, as for the image above, but attachment. This image belongs in the attachments panel; it is also encoded in base64 and is quite large. This makes it clear that you shouldn't download the whole message body when all you want is to show basic information.
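IMAP can fetch a single part by its section number, so a client only has to download, say, the HTML part. In a real client the structure would come from FETCH BODYSTRUCTURE rather than from a downloaded copy; here a local sample message with the same nesting stands in for it, purely to show how sections are numbered. A minimal Python sketch; leaf_sections and fetch_part are hypothetical helpers, and nested message/rfc822 parts (which have extra numbering rules) are not handled.

```python
import email
from email.message import Message
from typing import Iterator, Tuple

# Same shape as the message above: mixed(related(alternative(plain, html), jpeg), png).
RAW = """\
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/related; boundary="related"

--related
Content-Type: multipart/alternative; boundary="alt"

--alt
Content-Type: text/plain

Hello
--alt
Content-Type: text/html

<p>Hello</p>
--alt--
--related
Content-Type: image/jpeg
Content-Disposition: inline

/9j/
--related--
--outer
Content-Type: image/png
Content-Disposition: attachment; filename="shot.png"

iVBORw0KGgo=
--outer--
"""


def leaf_sections(part: Message, prefix: str = "") -> Iterator[Tuple[str, Message]]:
    """Yield (section number, leaf part) pairs, numbered like IMAP BODY[<section>]."""
    if part.is_multipart():
        for index, child in enumerate(part.get_payload(), start=1):
            yield from leaf_sections(child, f"{prefix}.{index}" if prefix else str(index))
    else:
        # A single-part message has its body at section 1.
        yield prefix or "1", part


def fetch_part(tag: str, seq: int, section: str) -> str:
    # BODY.PEEK[...] returns the same data as BODY[...] but does not set \Seen.
    return f"{tag} FETCH {seq} (BODY.PEEK[{section}])"


msg = email.message_from_string(RAW)
html_section = next(n for n, p in leaf_sections(msg) if p.get_content_type() == "text/html")
attachments = [(n, p.get_filename()) for n, p in leaf_sections(msg)
               if p.get_content_disposition() == "attachment"]
print(fetch_part("9", 16337, html_section))  # 9 FETCH 16337 (BODY.PEEK[1.1.2])
print(attachments)  # [('2', 'shot.png')]
```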

Back to the protocol

When we're done with the messages, we need to close the selected folder and say goodbye to the server. To close the folder, use the CLOSE command. Yes, it's that simple:


7 CLOSE
7 OK Close completed (0.001 + 0.000 secs).

By the way, if you have been following along in the console while reading, something unpleasant may have happened: the server may have closed your connection due to inactivity. This is completely normal; each server has its own timeout (ours is 30 minutes).
That is why it is recommended to send the NOOP command periodically in the background:

1 NOOP
1 OK NOOP completed (0.001 + 0.000 secs).
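Deciding when to send that background NOOP only needs a record of the last activity. A minimal Python sketch, assuming the 30-minute server timeout mentioned above and an arbitrary 60-second safety margin; KeepAlive is a hypothetical class, and the clock is injectable so the logic can be checked without waiting.

```python
import time
from typing import Callable


class KeepAlive:
    """Tracks connection activity and says when a NOOP should be sent."""

    def __init__(self, timeout_s: float, margin_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Send NOOP a little before the server's inactivity timeout would fire.
        self.interval = max(timeout_s - margin_s, 1.0)
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Call after every command the client sends, NOOP included."""
        self._last_activity = self._clock()

    def noop_due(self) -> bool:
        return self._clock() - self._last_activity >= self.interval


now = [0.0]  # fake clock, in seconds
keepalive = KeepAlive(timeout_s=30 * 60, clock=lambda: now[0])
now[0] = 20 * 60
print(keepalive.noop_due())  # False
now[0] = 29 * 60
print(keepalive.noop_due())  # True
```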

NOOP literally does nothing, but it keeps the connection from timing out for as long as we need. If a folder is currently selected, NOOP also works as a periodic poll for changes in that folder:

1 NOOP
* 16472 EXPUNGE
* 16471 EXPUNGE
* 16472 EXISTS
* 1 RECENT
1 OK NOOP completed (0.004 + 0.000 + 0.003 secs).

In this response the server reports two deleted messages and one recent message, and tells us that the folder now contains 16,472 messages.
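Each EXPUNGE must be applied in the order received, because removing a message shifts the sequence numbers of all messages after it. A minimal Python sketch of keeping a local sequence-number-to-UID list in step, with made-up UIDs; apply_untagged is a hypothetical helper.

```python
from typing import List, Optional


def apply_untagged(uids: List[Optional[int]], lines: List[str]) -> List[Optional[int]]:
    """Update a local list where uids[i] is the UID of sequence number i + 1.

    None marks a message the client knows exists but has not fetched yet.
    """
    uids = list(uids)  # leave the caller's list untouched
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != "*" or not parts[1].isdigit():
            continue
        number, kind = int(parts[1]), parts[2].upper()
        if kind == "EXPUNGE":
            # Every later message moves down by one sequence number.
            del uids[number - 1]
        elif kind == "EXISTS" and number > len(uids):
            # New messages are appended at the end; their UIDs are still unknown.
            uids.extend([None] * (number - len(uids)))
    return uids


local = [10, 11, 12, 13, 14]
print(apply_untagged(local, ["* 3 EXPUNGE", "* 3 EXPUNGE", "* 4 EXISTS", "* 1 RECENT"]))
# [10, 11, 14, None]
```

Two EXPUNGE lines with the same number remove two different messages: first UID 12, then UID 13, which had moved into position 3.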

Also note that within one connection only one folder can be selected at a time; to work with several folders in parallel, you need several connections.

Finally, we close the session and say goodbye to the server.

8 LOGOUT
* BYE Logging out
8 OK Logout completed (0.001 + 0.000 secs).

We get the sad untagged BYE response, which means it's time to wrap up.

Quick sync with CONDSTORE and QRESYNC

The NOOP command lets us track changes in the selected folder. But what if we want to find out what changed in a folder while we were working with another one? The most obvious option is to go through all the messages in local storage, be it a cache or a database, and compare them with what the server returns. On the one hand, this does work, and on some servers it is the only option. On the other hand, we want to show messages as fast as the protocol allows. Fortunately, our server supports the CONDSTORE and QRESYNC extensions, defined in RFC 7162. The first one assigns each message a 63-bit number called the mod-sequence, which increases every time the message's metadata (for example, its flags) changes. The folder reports the highest mod-sequence among all its messages (HIGHESTMODSEQ). As a result, every time we open a folder on a server that supports CONDSTORE, we can tell whether anything has changed simply by comparing the local and server mod-sequence values for the folder.
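The comparison itself is a small decision. A minimal Python sketch, assuming the client saved UIDVALIDITY and HIGHESTMODSEQ from its last SELECT; FolderState, SyncAction and plan_sync are hypothetical names. UIDVALIDITY is checked first because cached data is only meaningful while it stays the same.

```python
from enum import Enum
from typing import NamedTuple


class FolderState(NamedTuple):
    uidvalidity: int
    highest_modseq: int


class SyncAction(Enum):
    FULL_RESYNC = "UIDs changed: drop the cache and download again"
    UP_TO_DATE = "nothing changed"
    FETCH_CHANGES = "ask only for messages changed since the saved mod-sequence"


def plan_sync(local: FolderState, server: FolderState) -> SyncAction:
    if local.uidvalidity != server.uidvalidity:
        # Cached UIDs no longer identify the same messages.
        return SyncAction.FULL_RESYNC
    if local.highest_modseq == server.highest_modseq:
        return SyncAction.UP_TO_DATE
    return SyncAction.FETCH_CHANGES


saved = FolderState(uidvalidity=1532079879, highest_modseq=22746)
print(plan_sync(saved, FolderState(1532079879, 22754)).name)  # FETCH_CHANGES
```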

In addition, CONDSTORE adds modifiers to the FETCH and STORE commands: CHANGEDSINCE for FETCH returns only messages whose mod-sequence is greater than the given value, and UNCHANGEDSINCE for STORE applies the change only to messages whose mod-sequence is less than or equal to the given value. Let's look at an example.

1 FETCH 17221:17241 (UID) (CHANGEDSINCE 0)
* OK [HIGHESTMODSEQ 22746] Highest
* 17222 FETCH (UID 18319 MODSEQ (22580))
* 17223 FETCH (UID 18320 MODSEQ (22601))
* 17224 FETCH (UID 18324 MODSEQ (22607))
* 17225 FETCH (UID 18325 MODSEQ (22604))
* 17226 FETCH (UID 18326 MODSEQ (22608))
* 17227 FETCH (UID 18327 MODSEQ (22614))
* 17228 FETCH (UID 18328 MODSEQ (22613))
* 17229 FETCH (UID 18336 MODSEQ (22628))
* 17230 FETCH (UID 18338 MODSEQ (22628))
* 17231 FETCH (UID 18340 MODSEQ (22628))
* 17232 FETCH (UID 18341 MODSEQ (22628))
* 17221 FETCH (UID 18318 MODSEQ (22583))

I simulated a situation in which we open the mailbox knowing nothing about it, so our local mod-sequence is 0. As you can see, the server returns essentially all the messages in the requested range, since we haven't received anything from this mailbox before. Along with the UIDs requested with CHANGEDSINCE, the server sends an untagged OK with HIGHESTMODSEQ, which we now save, and the MODSEQ of each message.

Now let's make some changes in the mailbox: add new messages and change some flags. Then we repeat the request, this time with the saved mod-sequence:

1 fetch 17221:* (UID FLAGS) (CHANGEDSINCE 22746)
* 17267 FETCH (UID 18378 FLAGS () MODSEQ (22753))
* 17270 FETCH (UID 18381 FLAGS (\Seen) MODSEQ (22754))
* 17271 FETCH (UID 18382 FLAGS () MODSEQ (22751))
* 17273 FETCH (UID 18384 FLAGS () MODSEQ (22750))

And now we see the difference: instead of listing the 20 or so old messages plus the ones that just arrived (the asterisk in 17221:* means "from message 17221 up to the last one"), we receive only the messages whose MODSEQ is greater than the saved value. This works quite well for syncing a folder we haven't opened for a while: we get a snapshot of the changed messages instead of going through all of them.
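Applying such a response to a local cache is mostly parsing. A minimal Python sketch, assuming the item order UID, FLAGS, MODSEQ seen in the transcript (a server is free to return items in another order, which this regex would not accept); apply_flag_changes is a hypothetical helper. It deliberately does not compute the next saved mod-sequence, which a client would take from the server's HIGHESTMODSEQ.

```python
import re
from typing import Dict, List, Set

# Matches lines such as: * 17270 FETCH (UID 18381 FLAGS (\Seen) MODSEQ (22754))
FETCH_RE = re.compile(r"^\* \d+ FETCH \(UID (\d+) FLAGS \(([^)]*)\) MODSEQ \((\d+)\)\)$")


def apply_flag_changes(cache: Dict[int, Set[str]], lines: List[str]) -> List[int]:
    """Update cached flags (UID -> set of flags) from a CHANGEDSINCE response; return changed UIDs."""
    changed = []
    for line in lines:
        m = FETCH_RE.match(line)
        if m is None:
            continue
        uid, flags, _modseq = m.groups()
        cache[int(uid)] = set(flags.split())
        changed.append(int(uid))
    return sorted(changed)


cache: Dict[int, Set[str]] = {18381: set()}
response = [
    r"* 17267 FETCH (UID 18378 FLAGS () MODSEQ (22753))",
    r"* 17270 FETCH (UID 18381 FLAGS (\Seen) MODSEQ (22754))",
    r"* 17271 FETCH (UID 18382 FLAGS () MODSEQ (22751))",
    r"* 17273 FETCH (UID 18384 FLAGS () MODSEQ (22750))",
]
print(apply_flag_changes(cache, response))  # [18378, 18381, 18382, 18384]
```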

Much better, it would seem. But QRESYNC makes synchronization even faster: it lets you pass the known mod-sequence and the UIDs of the messages you already have right in the SELECT command. Let's look at an example. First, QRESYNC has to be enabled with the ENABLE command.

1 ENABLE QRESYNC
* ENABLED QRESYNC
1 OK Enabled (0.001 + 0.000 secs).
1 SELECT INBOX (QRESYNC (0 0))
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent \*)] Flags permitted.
* 17271 EXISTS
* 0 RECENT
* OK [UNSEEN 17241] First unseen.
* OK [UIDVALIDITY 1532079879] UIDs valid
* OK [UIDNEXT 18385] Predicted next UID
* OK [HIGHESTMODSEQ 22754] Highest
1 OK [READ-WRITE] Select completed (0.001 + 0.000 secs).

Since we knew nothing about the folder before, the server returns only information about the folder, without any changes. Suppose we requested the first twenty messages and remembered their UIDs as well as HIGHESTMODSEQ. We leave the folder, send ourselves a message, delete a message, change some flags, and come back with the saved information about the folder:

1 CLOSE
1 OK Close completed (0.001 + 0.000 secs).
1 SELECT INBOX (QRESYNC (1532079879 22754 18300:18385))
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft $Forwarded $MDNSent \*)] Flags permitted.
* 17271 EXISTS
* 0 RECENT
* OK [UNSEEN 17241] First unseen.
* OK [UIDVALIDITY 1532079879] UIDs valid
* OK [UIDNEXT 18386] Predicted next UID
* OK [HIGHESTMODSEQ 22757] Highest
* VANISHED (EARLIER) 18380
* 17269 FETCH (UID 18383 FLAGS () MODSEQ (22757))
* 17271 FETCH (UID 18385 FLAGS () MODSEQ (22755))
1 OK [READ-WRITE] Select completed (0.001 + 0.000 secs).

And now, when we select the changed folder, we immediately get the changes: a VANISHED (EARLIER) response for messages that were deleted, and FETCH responses for messages that were added or changed. This makes it even easier to sync a folder the user hasn't opened for a long time. It is very handy when you have lots of messages cached locally and don't want to compare them with the messages on the server one by one.
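A client can apply these responses to its cache directly. A minimal Python sketch, assuming the cache maps UID to a set of flags and that FETCH lines have the UID, FLAGS, MODSEQ order from the transcript; expand_uid_set and apply_qresync are hypothetical helpers.

```python
import re
from typing import Dict, List, Set

FETCH_RE = re.compile(r"^\* \d+ FETCH \(UID (\d+) FLAGS \(([^)]*)\) MODSEQ \(\d+\)\)$")


def expand_uid_set(text: str) -> List[int]:
    """Expand an IMAP UID set such as '18380,18390:18392' into individual UIDs."""
    uids: List[int] = []
    for chunk in text.split(","):
        if ":" in chunk:
            a, b = (int(x) for x in chunk.split(":"))
            uids.extend(range(min(a, b), max(a, b) + 1))
        else:
            uids.append(int(chunk))
    return uids


def apply_qresync(cache: Dict[int, Set[str]], lines: List[str]) -> None:
    """Apply VANISHED and FETCH responses from SELECT ... (QRESYNC ...) to the cache."""
    for line in lines:
        parts = line.split()
        if parts[:2] == ["*", "VANISHED"]:
            # "(EARLIER)" is optional; the UID set is always the last token.
            for uid in expand_uid_set(parts[-1]):
                cache.pop(uid, None)
            continue
        m = FETCH_RE.match(line)
        if m is not None:
            # New or changed message: store its current flags.
            cache[int(m.group(1))] = set(m.group(2).split())


cache = {18380: set(), 18383: {"\\Seen"}, 18384: set()}
apply_qresync(cache, [
    "* OK [HIGHESTMODSEQ 22757] Highest",
    "* VANISHED (EARLIER) 18380",
    "* 17269 FETCH (UID 18383 FLAGS () MODSEQ (22757))",
    "* 17271 FETCH (UID 18385 FLAGS () MODSEQ (22755))",
])
print(sorted(cache))  # [18383, 18384, 18385]
```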

The first parameter of the QRESYNC request is UIDVALIDITY, which is used to verify that the UIDs you received earlier are still valid in this folder. They can become invalid if the server cannot keep UIDs persistent between sessions and reassigns them, or if the folder was deleted and a new folder with the same name was created in its place.

The second parameter is the HIGHESTMODSEQ known to us, and the last one is the set of known UIDs: a continuous range can be written with a colon, and separate values or ranges are separated by commas.
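Building that parameter list means compressing the cached UIDs into this set syntax. A minimal Python sketch, assuming QRESYNC has already been enabled with ENABLE and the mailbox name needs no quoting; to_uid_set and qresync_select are hypothetical helpers.

```python
from typing import Iterable


def to_uid_set(uids: Iterable[int]) -> str:
    """Compress UIDs into IMAP set syntax: runs become a:b, gaps become commas."""
    ordered = sorted(set(uids))
    if not ordered:
        raise ValueError("the known-UID set must not be empty")
    runs = []
    start = prev = ordered[0]
    for uid in ordered[1:]:
        if uid != prev + 1:
            # A gap ends the current run.
            runs.append((start, prev))
            start = uid
        prev = uid
    runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}:{b}" for a, b in runs)


def qresync_select(tag: str, mailbox: str, uidvalidity: int, modseq: int,
                   known_uids: Iterable[int]) -> str:
    # Assumes a mailbox name without spaces or other characters that need quoting.
    return f"{tag} SELECT {mailbox} (QRESYNC ({uidvalidity} {modseq} {to_uid_set(known_uids)}))"


print(qresync_select("1", "INBOX", 1532079879, 22754, range(18300, 18386)))
# 1 SELECT INBOX (QRESYNC (1532079879 22754 18300:18385))
```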

Conclusion

In my case, I ran into a situation where not knowing the subject area led to incorrect and suboptimal behavior of the application. This article does not cover every way of using the protocol, but I hope the information above will be useful to the next developer of an IMAP client.

IMAP has a ton of interesting features. Commands for quick synchronization are just the beginning: depending on the server's capabilities, you can optimize other IMAP commands as well and make working with mail faster, more economical in network and memory usage, and generally more pleasant. But I'll talk about that another time.

Gotta go fast. Fast IMAP Email Sync