How to cut costs in AWS

The world is changing dramatically. Many companies are looking for any way to cut costs just to survive. At the same time, the load on online services is growing, especially those related to remote work, video conferencing, and online learning.

In these conditions it is extremely important, on the one hand, to ensure the reliability and scalability of your infrastructure, and on the other, not to go broke buying servers and disks and paying for traffic.

We at Bitrix24 use Amazon Web Services very actively, and in this article I will cover several AWS features that can help you reduce your costs.

Bitrix24 is a global service; we work with customers all over the world. Since 2015, when Law 242-FZ on the localization of personal data came into force, the data of Russian users has been stored on servers in Russia, but the entire infrastructure serving the rest of our customers worldwide is deployed in Amazon Web Services.

The economic crisis is already a fait accompli. The dollar has risen and is unlikely to return to its previous level in the near future, so paying for hosting in foreign currency is becoming quite burdensome.

In these conditions, if you continue to host your resources in AWS, you will most likely be interested in techniques that help you save on infrastructure costs.

I will cover the main techniques that we ourselves have been using for years. If you use other methods, please share them in the comments.

So let's go.

RI - Reserved Instances


The easiest and most obvious way to save money on AWS is to use Reserved EC2 Instances. If you know that you will definitely use certain instances for at least a year, you can commit to a reservation (with or without prepayment) and get a discount of 30% to 75%.

Example of calculations for c5.xlarge instances:

[image: on-demand vs. Reserved Instance pricing for c5.xlarge]

All prices and calculations are here.

Sometimes it looks complicated and confusing, but in general, the logic is this:

  • The longer the reservation period - 1 or 3 years - the greater the discount.
  • The larger the upfront payment (you can choose no upfront, partial upfront, or all upfront), the greater the discount.
  • Reserving a specific instance type (a Standard rather than a Convertible reservation) gives a greater discount.

For ourselves, we make reservations for 1 year, because planning 3 years ahead is quite difficult. Even this lets us save a good deal on EC2.
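
To get a feel for the arithmetic, here is a tiny Python sketch comparing a year of on-demand usage with a 1-year reservation. The hourly rates are illustrative placeholders, not current AWS prices - check the pricing page above for real numbers.

    # Rough cost comparison: on-demand vs. a 1-year Reserved Instance.
    # The hourly rates below are illustrative placeholders, NOT current AWS prices.
    HOURS_PER_YEAR = 365 * 24

    on_demand_hourly = 0.194     # hypothetical on-demand price, $/hour
    ri_effective_hourly = 0.122  # hypothetical effective rate for a 1-year, no-upfront RI

    on_demand_yearly = on_demand_hourly * HOURS_PER_YEAR
    ri_yearly = ri_effective_hourly * HOURS_PER_YEAR
    savings = on_demand_yearly - ri_yearly

    print(f"On-demand for a year: ${on_demand_yearly:,.2f}")
    print(f"Reserved for a year:  ${ri_yearly:,.2f}")
    print(f"Savings:              ${savings:,.2f} ({savings / on_demand_yearly:.0%})")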

Spot instances


Spot Instances are essentially a marketplace for spare capacity. When Amazon has a lot of idle resources, you can get them as Spot Instances, setting the maximum price you are willing to pay.

If current demand in a given region and AZ (Availability Zone) is low, these resources will be provided to you, often at a price 3-8 times lower than on-demand.

What's the catch?

If there is no free capacity, the requested resources simply will not be given to you. And if demand rises sharply and the spot price exceeds your maximum price, your spot instance will be terminated.

Naturally, such instances are not suitable for, say, a production database. But for tasks like batch computation, rendering, model calculations, or tests, they are a great way to save.

Let's look in practice at what kind of money is at stake, and how often spots can actually be taken away because of price.

Here's an example of Spot Instance Pricing History directly from the AWS console for the eu-central-1 region (Frankfurt) for c5.4xlarge instances:

[image: Spot Instance pricing history for c5.4xlarge in eu-central-1]

What do we see?

  • The price is about 3 times lower than on-demand.
  • There are available spot instances in all three AZs in this region.
  • Over three months the price never spiked. This means that a spot instance launched three months ago with a maximum price of, say, $0.30 would still be running.
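
The same price history can be pulled programmatically instead of from the console. A minimal boto3 sketch (the instance type and region match the example above; adjust them to your case):

    # Fetch a week of spot price history for c5.4xlarge in eu-central-1 (Frankfurt).
    # Requires boto3 and credentials allowed to call ec2:DescribeSpotPriceHistory.
    from datetime import datetime, timedelta, timezone

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    response = ec2.describe_spot_price_history(
        InstanceTypes=["c5.4xlarge"],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    )

    for price in response["SpotPriceHistory"]:
        print(price["AvailabilityZone"], price["Timestamp"], price["SpotPrice"])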

How we use spots in practice:


- We use spots for application servers.

- For them, we actively use CloudWatch together with Auto Scaling to scale automatically throughout the day: when the load increases, new instances are launched; when it drops, they are shut down.

- For safety, we have two Auto Scaling groups behind the load balancers, in case the spots run out: one group with regular (on-demand) instances and a second with spots. Through CloudWatch Events, Amazon warns you 2 minutes before a spot instance is terminated. We process these events and have time to scale the main group out to the required number of instances if this happens.

Details here: Spot Instance Interruption Notices.
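
As an illustration, here is a hedged boto3 sketch of a CloudWatch Events rule that catches this two-minute warning and forwards it to an SNS topic; the topic ARN is a placeholder, and in practice the handler behind it is what scales out the on-demand group:

    # Create a CloudWatch Events (EventBridge) rule that fires on the
    # two-minute spot interruption warning and forwards it to an SNS topic.
    # The SNS topic ARN below is a placeholder for illustration.
    import json

    import boto3

    events = boto3.client("events", region_name="eu-central-1")

    events.put_rule(
        Name="spot-interruption-warning",
        EventPattern=json.dumps({
            "source": ["aws.ec2"],
            "detail-type": ["EC2 Spot Instance Interruption Warning"],
        }),
        State="ENABLED",
    )

    events.put_targets(
        Rule="spot-interruption-warning",
        Targets=[{
            "Id": "notify-ops",
            # Placeholder ARN: point this at your own SNS topic or Lambda.
            "Arn": "arn:aws:sns:eu-central-1:123456789012:spot-interruptions",
        }],
    )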

- To work more efficiently with Auto Scaling and use the machines in the spot group to the maximum, we take the following approach (see the sketch after this list):

  1. The spot group has a lower upper threshold, so we start scaling up earlier.
  2. It also has a lower lower threshold, so we start scaling down later.
  3. For the regular group, it is the other way around.
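
Here is a rough boto3 sketch of how such asymmetric thresholds could be wired up with simple scaling policies and CloudWatch alarms. The group names and CPU thresholds are illustrative placeholders, not our production settings:

    # Attach scale-out alarms with different CPU thresholds to the spot
    # and on-demand Auto Scaling groups: the spot group scales out earlier.
    # Group names and threshold values are illustrative placeholders.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="eu-central-1")
    cloudwatch = boto3.client("cloudwatch", region_name="eu-central-1")

    groups = {"app-spot": 50, "app-ondemand": 70}  # group name -> CPU % threshold

    for group_name, cpu_threshold in groups.items():
        policy = autoscaling.put_scaling_policy(
            AutoScalingGroupName=group_name,
            PolicyName=f"{group_name}-scale-out",
            AdjustmentType="ChangeInCapacity",
            ScalingAdjustment=1,
            Cooldown=300,
        )
        cloudwatch.put_metric_alarm(
            AlarmName=f"{group_name}-cpu-high",
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "AutoScalingGroupName", "Value": group_name}],
            Statistic="Average",
            Period=300,
            EvaluationPeriods=2,
            Threshold=cpu_threshold,
            ComparisonOperator="GreaterThanThreshold",
            AlarmActions=[policy["PolicyARN"]],
        )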

EC2 Instance Store


Everyone who works with AWS knows that the main drives used with EC2 instances are EBS (Elastic Block Store) volumes. They can be attached to instances, detached, attached to other instances, and snapshotted.

All EBS volumes are billed in one way or another depending on their type, and they cost real money.

At the same time, many instance types allow you to attach local disks at launch: the EC2 Instance Store.

The main caveat of these disks is that if you stop an instance with such a disk, the data on it is gone when the instance starts again.

But they are effectively free: you pay only for the instance itself.

Such disks are a perfect fit for any temporary data that does not require persistent storage: swap, caches, and other scratch data. The performance of Instance Store drives is quite high: apart from very old instance types, they are now backed by SSDs or NVMe SSDs.
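
For older instance types the ephemeral disks have to be mapped explicitly at launch; on the newer NVMe-backed types (c5d, m5d, i3 and the like) they show up automatically. A hedged boto3 sketch of the explicit mapping - the AMI ID is a placeholder:

    # Launch an instance with an instance store volume mapped explicitly.
    # The AMI ID is a placeholder; on NVMe-backed types the mapping is
    # unnecessary because the local disks are exposed automatically.
    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder AMI
        InstanceType="m3.medium",         # an older type that comes with instance store
        MinCount=1,
        MaxCount=1,
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
        ],
    )
    # Inside the instance the disk still has to be formatted and mounted,
    # for example as swap or as a cache directory.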

The result: we attach fewer EBS volumes, and we pay less.

S3 Incomplete Multipart Uploads


The discussion above was mainly about EC2. Next, a few tricks that save money when using S3 (Simple Storage Service).

If you work actively with S3, you probably know that both S3 and most clients for it support Multipart Upload: a large object is uploaded in parts, which are then assembled into a single object.

This works great, but there is one trap.

If the upload is not completed for some reason (for example, the connection was interrupted and the upload was never resumed), the uploaded parts are not deleted automatically. They still take up space, and you pay for them.

And here is an unpleasant surprise: this incomplete data is not visible at all through the standard S3 tools, neither via “ls” in the CLI nor in client programs. You can find it, for example, in the AWS CLI with the list-multipart-uploads command.
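
A minimal boto3 sketch of what that manual cleanup might look like, aborting any uploads started more than a week ago (the bucket name is a placeholder):

    # List incomplete multipart uploads in a bucket and abort the stale ones.
    # The bucket name is a placeholder; adjust the age limit to taste.
    from datetime import datetime, timedelta, timezone

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-example-bucket"  # placeholder
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)

    paginator = s3.get_paginator("list_multipart_uploads")
    for page in paginator.paginate(Bucket=bucket):
        for upload in page.get("Uploads", []):
            if upload["Initiated"] < cutoff:
                print("Aborting", upload["Key"], upload["UploadId"])
                s3.abort_multipart_upload(
                    Bucket=bucket,
                    Key=upload["Key"],
                    UploadId=upload["UploadId"],
                )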

But doing this by hand quickly gets tedious...

The most logical thing would be a simple option in each bucket's settings for how to handle Incomplete Multipart Uploads, but for some reason Amazon did not add one.

Nevertheless, there is a way to make your life easier and remove Incomplete Multipart Uploads automatically. In the bucket settings, on the Management tab, there is a Lifecycle section. This is a convenient tool that lets you configure automatic rules for objects: transition them to other storage classes, expire (delete) them after some time, and, among other things, control the behavior of Incomplete Multipart Uploads.
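
The same rule can also be created from code rather than in the console. A sketch using boto3; the bucket name is a placeholder, 7 days is just an example, and note that this call replaces the bucket's existing Lifecycle configuration:

    # Add a Lifecycle rule that aborts incomplete multipart uploads
    # 7 days after they were initiated. The bucket name is a placeholder.
    # Note: put_bucket_lifecycle_configuration overwrites existing rules.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "abort-incomplete-multipart-uploads",
                    "Filter": {"Prefix": ""},  # apply to the whole bucket
                    "Status": "Enabled",
                    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
                }
            ]
        },
    )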

[image: Lifecycle rule settings in the S3 console]

There is a detailed article about this on the AWS blog; the examples use the old interface, but everything is still quite clear.

Importantly, a Lifecycle rule configured to delete incomplete uploads applies not only to new objects but also to existing ones.

The real amount of space used in S3 can be monitored through CloudWatch. When we set up removal of Incomplete Multipart Uploads, we were surprised to find that we had freed more than a dozen terabytes...
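
For reference, a small boto3 sketch of reading the daily S3 storage metric from CloudWatch (the bucket name is a placeholder, and StandardStorage covers only the Standard class):

    # Read the daily BucketSizeBytes metric that S3 publishes to CloudWatch.
    # The bucket name is a placeholder.
    from datetime import datetime, timedelta, timezone

    import boto3

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)

    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/S3",
        MetricName="BucketSizeBytes",
        Dimensions=[
            {"Name": "BucketName", "Value": "my-example-bucket"},
            {"Name": "StorageType", "Value": "StandardStorage"},
        ],
        StartTime=now - timedelta(days=2),
        EndTime=now,
        Period=86400,
        Statistics=["Average"],
    )

    for point in response["Datapoints"]:
        print(point["Timestamp"], round(point["Average"] / 1024**4, 2), "TiB")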

S3 Intelligent Tiering


S3 has several different storage classes:

  1. Standard.
  2. Standard-IA (Infrequent Access) - for objects that are rarely accessed.
  3. One Zone-IA - for relatively non-critical data; objects in this class are stored in a single Availability Zone instead of being replicated across several.
  4. Glacier - very cheap storage, but you cannot retrieve an object from it instantly; you have to make a retrieval request and wait a while.

They all have different usage conditions and prices, and they can and should be combined depending on your tasks.

But relatively recently, another very interesting storage class appeared: Intelligent-Tiering.

It works like this: for a small additional fee, S3 monitors and analyzes access to your data, and if an object has not been accessed for 30 days, it is automatically moved to the infrequent access tier, which costs significantly less than the standard one. If the object is accessed again later (with no performance penalty), it is moved back to the frequent access tier.

The biggest convenience: you do not have to do anything yourself.

Amazon does everything “intelligently” on its own and figures out which objects should go where.

By enabling Intelligent Tiering, we saved up to 10-15% on some buckets.

It all sounds almost too good and magical. Surely there are some pitfalls? There are, and of course they must be taken into account.

  • There is an additional fee for monitoring objects. In our case, it is fully covered by the savings.
  • You can use Intelligent-Tiering for any objects, but objects smaller than 128 KB will never be moved to the infrequent access tier and are always billed at the regular rate.
  • It is not suitable for objects stored for less than 30 days: such objects are still billed for a minimum of 30 days.

How to enable Intelligent Tiering?

You can explicitly specify the storage class INTELLIGENT_TIERING in the S3 API or CLI.

Or you can configure a Lifecycle rule by which, for example, all objects are automatically moved to Intelligent-Tiering after a certain storage time.
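
A sketch of both options in boto3; the bucket name, object key, and the 30-day delay are illustrative placeholders, and the Lifecycle call replaces the bucket's existing configuration:

    # Two ways to get objects into Intelligent-Tiering.
    # Bucket name, key and the 30-day delay are illustrative placeholders.
    import boto3

    s3 = boto3.client("s3")

    # 1. Upload an object directly into the Intelligent-Tiering storage class.
    s3.put_object(
        Bucket="my-example-bucket",
        Key="reports/2020-03.json",
        Body=b"{}",
        StorageClass="INTELLIGENT_TIERING",
    )

    # 2. Or move existing objects there with a Lifecycle rule after 30 days.
    #    Note: this call overwrites the bucket's existing Lifecycle rules.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "to-intelligent-tiering",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    ],
                }
            ]
        },
    )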

[image: Lifecycle rule transitioning objects to Intelligent-Tiering]

Read more on the same AWS blog.

Glacier


Since we are talking about the different S3 storage classes, Glacier is of course worth mentioning as well.

If you have data that needs to be stored for months or years but is accessed extremely rarely - logs and backups, for example - then definitely consider Glacier. It costs many times less than standard S3.

For convenient work with Glacier, you can use the same Lifecycle rules.

For example, you can set a rule by which an object stays in regular storage for some time, say 30-60 days (for logs and backups, you usually only need quick access to the most recent ones), is then moved to Glacier, and after one, two, or three years is deleted entirely.
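
Such a rule might look roughly like this in boto3; the bucket name, prefix, and day counts are placeholders, and the call replaces the bucket's existing Lifecycle configuration:

    # Keep objects under backups/ in regular storage for 60 days,
    # then move them to Glacier, and delete them after roughly three years.
    # Bucket name, prefix and the day counts are illustrative placeholders.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-example-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "backups-to-glacier",
                    "Filter": {"Prefix": "backups/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 60, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 1095},  # about three years
                }
            ]
        },
    )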

This will be much cheaper than simply keeping everything in standard S3.

* * *

I have covered a few tricks that we ourselves actively use. AWS is a huge infrastructure platform, and there are surely services you use that we did not touch on. If there are other ways of saving on AWS that work for you, please share them in the comments.
