SEMrush vs SimilarWeb: which is more accurate?

When you analyze the effectiveness of your marketing and of the site as a whole, it is important not only to collect your own statistics but also to compare them with market trends. For example, suppose the marketing team fell 5% short of its plan. If the market dipped by 15% over the same period, that result is good; if the market grew instead, there is something to think about.

There are many tools for researching competitor sites. They collect traffic information in different ways and process it with their own algorithms. Naturally, such services report data with some margin of error. The question is how large that error is and how far you can trust the results.

We decided to run a small study to find out how accurate two popular services for analyzing competitors' web traffic are: SEMrush Traffic Analytics and SimilarWeb. As the baseline, we used Google Analytics data for 787 sites available to us in OWOX BI.

Before moving on to the study itself, let's look at where each service gets its data.

Where does the data come from


The Google Analytics tracking code collects user behavior data directly from the site. This information is not available to third parties.

SimilarWeb uses the following sources:

  • Data obtained directly from some site owners.
  • Data from partners, Internet providers with millions of subscribers.
  • Public data sources: patented technologies and indexing mechanisms that constantly scan publicly available data.
  • Panel data from browser extensions. This is anonymous information that does not identify the user but shows which sites they visited.

SEMrush Traffic Analytics reports are based on clickstream data, as SimilarWeb's are. The data comes from SEMrush's own and third-party sources and is processed with SEMrush AI algorithms. It is aggregated and extrapolated from the anonymous behavior of millions of real Internet users.

In addition, over more than 10 years of work, SEMrush has released many tools that show not only competitors' search positions but also what they do in contextual advertising, PR, content marketing and social networks, along with detailed data on their websites' traffic. For each area (SEO, content, PPC, SMM), the service uses dedicated, most relevant data sources.

How we compared the services and calculated the error


Both SEMrush and SimilarWeb can be used to evaluate competitors, leads, or partners that drive inbound traffic. Since OWOX BI has access to anonymized Google Analytics data for our users, we assume that the traffic figures we have for these projects are close to the truth. Taking the GA data as the ground truth, we compared how accurate SEMrush and SimilarWeb are against it. Our goal is to show in which segments, and by how much, each service deviates.

What did we compare?

The total number of sessions in January 2020 as reported by Google Analytics, SEMrush and SimilarWeb. The sample included 787 sites from Australia, Canada, the USA, Great Britain and Germany.

Sites were grouped by niche:

  1. Computers Electronics and Technology
  2. Entertainment
  3. Finance
  4. Health & beauty
  5. Jobs and education
  6. News and Media
  7. Professional services
  8. Retail
  9. Telecom
  10. Travel

How did we compare them?

To calculate the error with which the services estimate competitors' traffic, we combined the following in one table:

  • Anonymized data from 787 sites with traffic of more than 100 thousand sessions per month, to which OWOX BI has access.
  • Session data for the same sites from SEMrush and SimilarWeb.

We also excluded sites for which the GA values were abnormally low. If the number of sessions according to Google Analytics is an order of magnitude smaller than the services report, it is likely that incomplete data entered the system because of filters in the view.
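The exclusion rule can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the site names and session counts are invented, and both the factor of 10 and the choice to compare GA against the lower of the two service estimates are assumptions.

```python
# Hypothetical monthly session counts; all figures are invented for illustration.
sites = [
    {"site": "a.example", "ga": 250_000, "semrush": 310_000, "similarweb": 190_000},
    {"site": "b.example", "ga": 120_000, "semrush": 1_900_000, "similarweb": 1_500_000},
    {"site": "c.example", "ga": 900_000, "semrush": 700_000, "similarweb": 1_200_000},
]

# Assumed threshold: GA is "abnormally low" when it is at least 10x smaller
# than even the lower of the two third-party estimates.
FACTOR = 10


def ga_looks_incomplete(row, factor=FACTOR):
    """Return True when GA sessions are an order of magnitude below both estimates."""
    lowest_estimate = min(row["semrush"], row["similarweb"])
    return row["ga"] * factor <= lowest_estimate


kept = [row for row in sites if not ga_looks_incomplete(row)]
excluded = [row["site"] for row in sites if ga_looks_incomplete(row)]

print("kept:", [row["site"] for row in kept])
print("excluded:", excluded)
```

Comparing against the lower estimate keeps the filter conservative: a site is dropped only when both services disagree with GA by the assumed factor.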

Then we calculated the absolute percentage deviation for the SEMrush and SimilarWeb data. Why did we work with this quantity?

A deviation can be positive or negative: a service can show more sessions than there actually were, or fewer. When you average the deviations, the positive and negative values can cancel out and give a result close to zero. To prevent this, we took the deviation by absolute value. In other words, what mattered to us was how far the service deviated overall, not in which direction.
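A small worked example shows why the sign has to be dropped before averaging. It is only a sketch: the session counts are invented, and expressing the deviation as a percentage of the GA value is an assumption about how the percentage was normalized.

```python
def signed_pct_deviation(estimate, actual):
    """Signed deviation of a service estimate from GA, in percent of the GA value."""
    return (estimate - actual) / actual * 100


# Hypothetical (GA sessions, service estimate) pairs; invented for illustration.
pairs = [(200_000, 300_000), (400_000, 200_000)]

signed = [signed_pct_deviation(est, ga) for ga, est in pairs]
absolute = [abs(d) for d in signed]

mean_signed = sum(signed) / len(signed)
mean_absolute = sum(absolute) / len(absolute)

print(signed)         # [50.0, -50.0]
print(mean_signed)    # 0.0
print(mean_absolute)  # 50.0
```

The signed average suggests a perfect service, while the absolute average shows that each estimate was off by half.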

Then we identified 10 main business niches and grouped all sites by average traffic into three groups:

  • From 100 thousand to 500 thousand sessions per month.
  • From 500 thousand to 1 million sessions per month.
  • 1 million sessions per month and more.
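The grouping step might look like this in code. It is a sketch under stated assumptions: the rows are invented, the tier labels are hypothetical names, and whether each boundary is inclusive is not specified in the text, so the lower bound of each tier is treated as inclusive here.

```python
from collections import defaultdict


def traffic_tier(ga_sessions):
    """Map monthly GA sessions to one of the three traffic groups."""
    if ga_sessions < 100_000:
        return None  # below the minimum traffic considered in the study
    if ga_sessions < 500_000:
        return "100K-500K"
    if ga_sessions < 1_000_000:
        return "500K-1M"
    return "1M+"


# Hypothetical rows: (GA sessions, absolute deviation in percent).
rows = [
    (150_000, 80.0),
    (450_000, 60.0),
    (700_000, 50.0),
    (2_000_000, 40.0),
    (5_000_000, 30.0),
]

by_tier = defaultdict(list)
for ga_sessions, abs_dev in rows:
    tier = traffic_tier(ga_sessions)
    if tier is not None:
        by_tier[tier].append(abs_dev)

# Average absolute deviation per traffic group.
mean_dev_by_tier = {tier: sum(v) / len(v) for tier, v in by_tier.items()}
print(mean_dev_by_tier)
```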

Results


The higher the standard deviation, the more the service's figure differs from the GA data, and vice versa. SimilarWeb's standard deviation ranges from 57% to 61% and does not depend much on site traffic. With SEMrush, by contrast, the larger the site, the more accurate the data: for sites with 1 million sessions and more, the standard deviation from GA data drops to 45%.



For sites with traffic of 500 thousand sessions and more, SEMrush shows more accurate results (by 9-12%). SimilarWeb performed slightly better for low-traffic projects, although both services showed a high error in this group.

Why does this happen? It comes down to the algorithms each service uses to collect and analyze events, and to the nature of the clickstream data that both services rely on. Clickstream data covers only a sample of a site's visitors. The companies then use their AI/ML algorithms to extrapolate this sample to the site's entire audience. Accordingly, the smaller the site, the lower the accuracy of conclusions drawn from clickstream data.
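A simplified statistical model illustrates why sample-based estimates get noisier for small sites. This is not either vendor's algorithm: it assumes a simple random panel in which every panelist visits the site independently with the same probability (a binomial model), and the panel size and audience shares are hypothetical.

```python
import math


def relative_standard_error(panel_size, audience_share):
    """Relative standard error of a proportion estimated from a simple random panel.

    Assumes each panelist visits the site independently with probability
    `audience_share`, which real clickstream panels only approximate.
    """
    p = audience_share
    return math.sqrt((1 - p) / (panel_size * p))


PANEL_SIZE = 1_000_000  # hypothetical panel size

for share in (0.0001, 0.001, 0.01):
    rse = relative_standard_error(PANEL_SIZE, share)
    print(f"audience share {share:.2%}: relative error ~ {rse:.1%}")
```

Under this model, a site reached by a hundred times fewer panelists has roughly ten times the relative error, which is consistent with the pattern described above.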

What if your site and your direct competitors' sites are small, so the data on them is not very accurate? In this case, benchmark against the larger competitors in your market. If you compare several major players, you will see not only their performance but also the development trends of the market as a whole. By comparing that performance and those trends with your own results, you can judge the effectiveness of your own marketing.

The second graph shows the share of sites for which each service was more accurate. For example, in the segment of 1 million sessions and more, SEMrush showed data closer to the Google Analytics values for 57% of the sites under consideration:



For 52% of sites with traffic from 500 thousand to 1 million SimilarWeb was more accurate.

If we compare the group of 100 thousand to 500 thousand sessions on this chart and on the first one, we see something interesting: SEMrush has a higher standard deviation, that is, it misses by a larger percentage of sessions, yet it is still the more accurate service in 53% of cases. In other words, it errs less often, but when it does, the error is large.
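The two measures can disagree, as a toy example shows. The deviations below are invented and the service names are placeholders; the sketch only demonstrates that a service can be closer to GA on most sites and still have the higher average deviation because of a few large misses.

```python
# Hypothetical absolute deviations from GA (percent) for five sites.
service_a = [10, 12, 15, 20, 300]  # usually close, one very large miss
service_b = [30, 35, 40, 45, 15]   # consistently moderate error

mean_a = sum(service_a) / len(service_a)
mean_b = sum(service_b) / len(service_b)

# Share of sites where each service is strictly closer to GA (ties count for neither).
wins_a = sum(a < b for a, b in zip(service_a, service_b)) / len(service_a)
wins_b = sum(b < a for a, b in zip(service_a, service_b)) / len(service_a)

print(mean_a, mean_b)  # 71.4 33.0
print(wins_a, wins_b)  # 0.8 0.2
```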

Why is this so? Data accuracy is highly dependent on several factors:

  • How Google Analytics is configured: which pages of the site carry the GA tracking code and what it measures.
  • How “live” the site is in Google search. If it is a redirect site (an advertising network's domain) or a promotional site that receives mostly advertising traffic, SEMrush will show underestimated numbers.
  • For sites with a large share of organic traffic, the SEMrush algorithm works better and more accurately than for sites with little organic traffic.

The following two graphs show the standard deviation and the share of more accurate results for both services by business niche.

As you can see, the deviations of both SEMrush and SimilarWeb depend heavily on the niche:



The percentage of sites for which each service gave the more accurate figure also depends on the business niche:



This graph shows the percentage of sites for which each service gave values closer to GA. For example, in the Computers niche, SimilarWeb was more accurate for 58% of sites and SEMrush for 42% (the first columns in the graph).

The scatter chart below shows the negative and positive deviations for SEMrush and SimilarWeb:



Visually, you can see that there are more red dots at the bottom of the graph, which means that SimilarWeb underestimates data compared to Google Analytics more often than SEMrush does.
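The same comparison can be made numerically by counting how often each estimate falls below the GA value. The figures here are invented purely to illustrate the calculation and are not taken from the study.

```python
def share_underestimated(ga_sessions, estimates):
    """Fraction of sites where the service estimate is below the GA value."""
    below = sum(est < ga for ga, est in zip(ga_sessions, estimates))
    return below / len(ga_sessions)


# Hypothetical session counts; invented for illustration.
ga = [200_000, 600_000, 1_500_000, 3_000_000]
semrush = [260_000, 540_000, 1_700_000, 3_300_000]
similarweb = [150_000, 480_000, 1_600_000, 2_400_000]

print("SEMrush below GA:", share_underestimated(ga, semrush))        # 0.25
print("SimilarWeb below GA:", share_underestimated(ga, similarweb))  # 0.75
```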

Brief conclusions


Summary of the study:

  • The accuracy level of both services is approximately the same.
  • SEMrush shows the best results on small sites: it errs less often, and where it does not err, it is more precise than its competitor.
  • In the segment of 1 million sessions and more, SEMrush shows data close to the Google Analytics values more often than SimilarWeb does.
  • SimilarWeb underestimates data compared to Google Analytics more often than SEMrush does.

Neither SimilarWeb nor SEMrush provides 100% accurate data, nor do they need to: for analyzing your own site and traffic, you have Google Analytics.

These services are well suited for independent comparison of sites among themselves and tracking trends. But they must be used, like any analytical tool, with an understanding of the nature of the data collected and the measurement error.
