Migrating from reCAPTCHA to hCaptcha in Cloudflare



Cloudflare announced that it recently switched from Google's reCAPTCHA service to hCaptcha, which is developed by Intuition Machines. Cloudflare is pleased with the transition: it addresses the privacy concerns that arose while the company relied on a Google service, and it allows more flexible configuration of the CAPTCHA challenges shown to site visitors. The change affects, in principle, all Cloudflare users. The company therefore decided to share details about the move to hCaptcha and prepared an article, a translation of which we publish today.

Using CAPTCHA Technology in Cloudflare



One of the services Cloudflare provides is letting customers block malicious automated traffic (bot traffic). We use many mechanisms to do this. If we are absolutely sure that traffic is harmful, we block it completely. If we know for sure that traffic results from normal human activity, we let it through. The same applies to legitimate traffic generated by bots, such as search engine crawlers. But sometimes, when we are not fully confident about the nature of the traffic, we subject it to a “test”.
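The three-way triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cloudflare's implementation: the names `Verdict`, `Action` and `triage` are hypothetical, and real classification relies on many signals rather than a single label.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical classification of a request's origin."""
    KNOWN_BAD = "known_bad"      # traffic we are sure is harmful
    KNOWN_HUMAN = "known_human"  # traffic we are sure comes from a person
    GOOD_BOT = "good_bot"        # legitimate bot, e.g. a search engine crawler
    UNCERTAIN = "uncertain"      # not enough confidence either way


class Action(Enum):
    BLOCK = "block"
    ALLOW = "allow"
    CHALLENGE = "challenge"      # subject the traffic to a test such as a CAPTCHA


def triage(verdict: Verdict) -> Action:
    """Map a verdict to an action, following the rules in the text."""
    if verdict is Verdict.KNOWN_BAD:
        return Action.BLOCK
    if verdict in (Verdict.KNOWN_HUMAN, Verdict.GOOD_BOT):
        return Action.ALLOW
    # Anything we are not sure about gets tested.
    return Action.CHALLENGE
```

Only the uncertain case leads to a test; the other outcomes never show a CAPTCHA.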

We have different tests. Some are fully automatic, but one requires human intervention. Such tests are known as CAPTCHAs. The abbreviation stands for Completely Automated Public Turing Test to Tell Computers and Humans Apart. As you can see, a few letters T are omitted from the word CAPTCHA; otherwise it would look like CAPTTTCHA. A CAPTCHA test usually asks the user to read distorted text and type it into a field, or to select, from a set of pictures, those that contain traffic lights or pedestrian crossings. The point of a captcha task is to be easy for a person to solve but hard for a computer.


Cloudflare has used Google's reCAPTCHA service since the company's earliest days. The service appeared in 2007 as a research project at Carnegie Mellon University, and Google bought the project in 2009. Cloudflare appeared around the same time. Google offered reCAPTCHA for free in exchange for using the data from the service to train its visual identification systems. When we were looking for a CAPTCHA solution for Cloudflare, we chose reCAPTCHA because the service was efficient, scalable and free. The last point mattered to us because so many Cloudflare customers use our free services.

About privacy and blocking


From the early days of using reCAPTCHA at Cloudflare, some of our customers have expressed concerns that we rely on a Google service. Google’s business is focused on targeted advertising. Cloudflare does not do this. We have a strict privacy policy. We were comfortable with the privacy policy associated with reCAPTCHA, but we understand why some of our customers worry that they have to transfer more data to Google than they would like.

In addition, we have had problems in some regions, such as China, where Google services are blocked from time to time. China alone accounts for 25% of Internet users. As a result, we were always worried that some of these users could not access sites protected by Cloudflare if they were asked to solve a captcha.

Over the years, enough privacy and blocking concerns had accumulated to make us think about replacing reCAPTCHA with something else. But for us, as for most IT companies, it is difficult to prioritize abandoning a widely used technology and switching to something new.

Google’s changing business model


Earlier this year, Google informed us that they were going to start charging for the reCAPTCHA service. That is entirely their right. Serving Cloudflare’s captcha needs, given our size, no doubt costs a lot of money, an amount noticeable even on Google's scale.

Again, charging for reCAPTCHA is a perfectly reasonable move on Google’s part. If the benefit the company gets from training image classification systems is less than the cost of maintaining the service, it is understandable that Google wants to charge for it. In our case, this would have meant spending millions of dollars a year just to let our free users keep using reCAPTCHA. This, along with the other reasons, was finally enough for us to start looking for an alternative.

The best captcha


We evaluated many CAPTCHA vendors and considered building our own service of this kind. In the end, the most suitable alternative to reCAPTCHA turned out to be hCaptcha. There was a lot we liked about this service:

  1. They do not sell personal data. They collect only the necessary minimum of such data. The company clearly describes what information it collects and how it uses and discloses it, and it adheres to these rules when providing the hCaptcha service to Cloudflare.
  2. hCaptcha performs well, both in speed and in captcha solve rates. In our A/B testing, it met or exceeded our expectations.
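  3. hCaptcha offers a solid solution for visually impaired users and others with accessibility needs.
  4. The service supports Privacy Pass, which reduces how often captchas are shown.
  5. The service works in regions where Google is blocked.
  6. The hCaptcha team is flexible and responds quickly.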

hCaptcha's standard business model is similar to the one reCAPTCHA started with. Namely, the company planned to charge customers who need image classification data and to pay a reward to those who install hCaptcha on their sites. This sounded attractive to us, but unfortunately, although the approach may work well for most ordinary hCaptcha customers, it did not suit our scale.

We are working with hCaptcha in two ways. First, we are in the process of using our Workers platform to take on most of the load generated when our customers use hCaptcha, which will reduce Intuition Machines' costs. Second, we proposed that we pay the company, rather than the company paying us. This will give the company the resources it needs to scale its service to meet our needs. Although this means additional costs for us, they are only a small fraction of what reCAPTCHA would have cost. In return, we get a CAPTCHA platform that is much more flexible than the one we used before. We can also work directly with a development team that responds very quickly to our requests.

When do our customers show captcha to their users?


When we first started working on this project, we assumed that the main consumers of CAPTCHA would be our Cloudflare Bot Management and Cloudflare Firewall Rules solutions. This assumption, to some extent, has been confirmed. Although Firewall / Bot solutions turned out to be the main consumers of CAPTCHA, their share in the total consumption of this service was only slightly more than 50%.

Here is a breakdown of the Cloudflare solutions through which captchas are requested.
Cloudflare solution                  Share of CAPTCHA use
Firewall Rules and Bot Management    54.8%
IP Firewall                          18.6%
Security level                       16.8%
DDoS                                 6.3%
Rate limiting                        1.7%
WAF Rules                            1.5%
Other                                0.3%

Firewall / Bot solutions top this list and account for the bulk of captchas. These solutions enforce rules written by our customers: when the conditions of a rule are met, a captcha is displayed. One example is showing a captcha when the score that Cloudflare Bot Management assigns to a request turns out to be ambiguous. On the one hand, the score is below a predetermined threshold, which may indicate automated traffic. On the other hand, it is above a lower threshold that marks the boundary of uncertainty. Another common Firewall / Bot scenario is to show captchas for all requests to a certain site or to a certain endpoint of a site. Customers may do this to limit the number of connections to their servers, or to slow down automated systems that cycle through credentials on a login page or create fake accounts. As a result, some sites protected by Cloudflare request hundreds of millions of captchas per day.
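The two scenarios in this paragraph, a score that falls between two thresholds and a rule that challenges every request to an endpoint, can be sketched as follows. This is a minimal illustration and not Cloudflare's rule engine: the function `decide`, the threshold values 5 and 30, the assumption that lower scores mean "more likely automated", and the choice to block scores below the lower threshold are all hypothetical.

```python
from typing import Optional


def decide(path: str,
           bot_score: int,
           challenge_prefix: Optional[str] = None,
           lower: int = 5,
           upper: int = 30) -> str:
    """Return "block", "challenge" or "allow" for a single request.

    Assumptions (not from the article): a lower score means the request
    is more likely automated, and scores under `lower` are blocked.
    """
    # Scenario 2: the customer challenges every request to an endpoint,
    # e.g. a login page targeted by credential-guessing bots.
    if challenge_prefix is not None and path.startswith(challenge_prefix):
        return "challenge"

    # Scenario 1: the score is below `upper` (possibly automated) but not
    # below `lower` (not certain enough to block), so show a captcha.
    if bot_score < lower:
        return "block"
    if bot_score < upper:
        return "challenge"
    return "allow"
```

For example, `decide("/", 10)` falls in the ambiguous band and yields a challenge, while `decide("/login", 99, challenge_prefix="/login")` is challenged regardless of its score.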

Second on this list is our IP Firewall solution. It is broadly similar to the Firewall / Bot solutions, but lets you target traffic more precisely, working at the IP address, ASN or country level. Most captchas displayed through IP Firewall come from ASN-level and country-level rules. Our customers probably use them to protect against traffic from a particular ASN (for example, how likely is it that traffic from a cloud provider is generated by ordinary users?) or against attacks originating in certain countries.

Next comes the Security Levels service. This service is used in two different ways:

  1. It can serve as a tool to gauge the reputation of an IP address.
  2. It can work in I'm Under Attack mode.

Although we recommend that customers use the I'm Under Attack mode only when they are under an active DDoS attack, some of our customers keep the system in this mode all the time, using it as a primitive mechanism to rate-limit requests to the site and to filter traffic.

The last major use of captcha comes from our automated systems. For example, our DDoS protection engineers recently taught Gatebot to use captchas to deal with minor problems in some specific situations. Gatebot can now write temporary rules that cause captchas to be shown to attackers.

Finally, some of our customers configure captcha display through Rate Limiting rules and Managed WAF rule sets.

We were also interested in which types of customers use captcha. Over a week, customers on our free plan requested about 40-60% of all captchas displayed by Cloudflare; this figure includes the effect of attacks on sites. The remaining volume is split roughly equally between our two groups of paying customers: enterprise and pay-as-you-go. Overall, we found that Cloudflare shows several million captchas per second when one or more of our customers are under attack.

About the problems of transition to a new technology


When we change a part of the Cloudflare system, it makes life easier for some customers while others run into problems. We and the hCaptcha development team are ready to resolve any difficulties that arise. If you or your users have trouble with hCaptcha, please write about it on the forum or open a support ticket, describing what happened in as much detail as possible.

If possible, please include the Ray ID in your message. This identifier usually appears at the bottom of the CAPTCHA page and will help us figure out what went wrong.



Summary


Experience tells us that visual (and audio) captchas are far from an ideal solution to many complex problems. Cloudflare continues to work on minimizing the number of captchas shown to users and, eventually, on abandoning this technology altogether. We plan to write more about this. And by the way, do you know the name of the internal chat used by the team implementing hCaptcha? You might think it is called New CAPTCHA. But it is not. It is called (No) CAPTCHA.

Dear readers! Have you already encountered hCaptcha?

