We test it on ourselves: how to deploy and administer 1C: Document Management within 1C

We at 1C make extensive use of our own products to run the company, in particular “1C: Document Management 8”. Beyond document management (as the name implies), it is a modern ECM system (Enterprise Content Management) with a wide range of features: mail, employee work calendars, shared access to resources (for example, booking conference rooms), working time tracking, a corporate forum and much more.

At 1C, more than a thousand employees use Document Management. The database has grown large (11 billion records), so it requires more careful maintenance and more powerful hardware.

In this article we describe how our system is set up, what difficulties we face in maintaining the database, and how we solve them (our DBMS is MS SQL Server).

For readers new to 1C products:
1C: Document Management is an application solution (configuration) built on 1C: Enterprise, a platform for developing business applications.

“1C: Document Management 8” (DO for short) automates work with documents in an enterprise. One of the main tools for employee interaction is email. Besides mail, DO also handles other tasks:

Time tracking
Absence tracking
Courier and transport requests
Employee calendars
Registration of correspondence
Employee contacts (address book)
Corporate forum
Room reservation
Event planning
CRM
Collaborative work with files (with file versioning)
etc.

We access Document Management through the thin client (a native application) on Windows, Linux and macOS, through the web client (in a browser) and through the mobile client, depending on the situation.

Thanks to another of our products connected to Document Management, the Interaction System, we get messenger functionality directly inside Document Management: chats, audio and video calls (including group calls, which have become especially important lately, also from the mobile client), quick file sharing, and the ability to write chat bots that simplify work with the system. Another advantage of the Interaction System over other messengers is contextual discussions tied to specific Document Management objects such as documents and events. In other words, the Interaction System is deeply integrated into the target application rather than added as a “separate button”.

The number of letters in our DO has already exceeded 100 million, and the DBMS as a whole holds more than 11 billion records. In total, the system uses almost 30 TB of storage: the database takes 7.5 TB, and the files for collaborative work are stored separately and take another 21 TB.

Here are the current counts of letters and files:

Outgoing letters - 14.7 million
Incoming letters - 85.4 million
File versions - 70.8 million
Internal documents - 30.6 thousand

DO holds more than mail and files. Here are the counts of other objects:

Conference room reservations - 52,126
Weekly reports - 153,940
Daily reports - 628,153
Approval visas - 11,821
Incoming documents - 79,677
Outgoing documents - 28,357
Events in users' work calendars - 168,228
Courier requests - 21,883
Counterparties - 81,029
Records of work with counterparties - 45,632
– 41,795
– 10,243
– 6,320
– 245,980
– 26,282
– 891,095
– 109,056


These figures reflect a substantial workload, so we had to allocate fairly powerful hardware for our internal DO. Its current specifications are 38 cores, 240 GB of RAM and 26 TB of disk storage.

Below is the table of our servers. In the future we plan to increase hardware capacity.

What about server load?

Network activity has never been a problem, either for us or for our customers. The weak points are usually the CPU and the disks, since everyone already knows how to deal with a lack of memory. Here are screenshots of our servers in Resource Monitor; they show that the load is far from critical, in fact quite modest.

For example, the screenshot below shows the SQL server, whose CPU is 23% loaded. This is a very good figure: by comparison, if the load approaches 70%, employees will most likely notice significant slowdowns.

The second screenshot shows the application server running the 1C: Enterprise platform; it serves only user sessions. Here the CPU load is slightly higher, 38%, and it is smooth and steady. There is some disk activity, but it is acceptable.

The third screenshot shows another 1C: Enterprise server (the second of the two in our cluster). While the first one serves users, this one runs the robots: they receive mail, route documents, exchange data, recalculate access rights, and so on. About 90-100 background jobs carry out all this work. This server is very busy, at 88%, but that does not affect people, and it is the server that performs all the automation Document Management is supposed to provide.

Which metrics do we use to assess performance?

We have built into DO a substantial subsystem for taking performance measurements and computing various metrics. We need it to understand what is happening in the system, both right now and over time: what is getting worse and what is getting better. The monitoring tools (metrics and time measurements) ship with the standard “1C: Document Management 8” package. Metrics need to be tuned during implementation, but the mechanism itself is standard.

Metrics are measurements of various business indicators at specific points in time (for example, the average mail delivery time at a given moment is 10 minutes).
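As an illustration only, the Python sketch below computes one such metric, the average mail delivery time at a given moment, from delivery events. The DeliveryEvent type, the one-hour window and the function name are hypothetical; the actual DO subsystem is not shown in this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class DeliveryEvent:
    """One delivered letter: when it was sent and when it reached the recipient."""
    sent_at: datetime
    delivered_at: datetime


def average_delivery_minutes(events, at, window=timedelta(hours=1)):
    """Average delivery time, in minutes, of letters delivered in the window ending at `at`.

    Returns None if nothing was delivered in that window.
    """
    durations = [
        (e.delivered_at - e.sent_at).total_seconds() / 60
        for e in events
        if at - window < e.delivered_at <= at
    ]
    return mean(durations) if durations else None
```

Sampling this function at regular moments produces the time series that a metric graph plots.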

One of the metrics is the number of active users in the database. On average there are 1,000-1,400 of them per day. The graph shows that at the time of the screenshot there were 2,144 active users in the database.

There are more than 30 such metrics.

The week before last, average user activity grew one and a half times (shown in red on the graph) as most employees switched to remote work for well-known reasons. The number of active users also tripled (shown in blue), because employees began to use the mobile client actively, and each mobile client opens its own connection to the server. Now there are on average two connections to the server per employee.

For us as administrators, this is a signal to pay closer attention to performance and check whether it has degraded. We look at it through other metrics as well, for example how the delivery time of internally routed mail changes (shown in blue in the screenshot below). It fluctuated until this year and is now stable, which tells us that the system is in good shape.

Another applied metric for us is the average waiting time for downloading letters from the mail server (shown in red in the screenshot): roughly, how long a letter travels over the Internet before it reaches our employee. The screenshot shows that this time has not changed noticeably recently either. There are occasional spikes, but they are caused not by delays on our side but by time lost on the mail servers.

Another example is the metric for refreshing the list of letters in a folder (shown in blue in the screenshot). Opening a mail folder is a very frequent operation and must be fast, so we measure how long it takes. The indicator is measured for each client, so we can see both the overall picture for the company and the dynamics for an individual employee. The screenshot shows that until this year the metric was uneven; then we made a number of improvements, and since then it has not degraded: the graph is almost flat.

Metrics are primarily an administrator's tool for monitoring the system and reacting quickly to any change in its behavior. The screenshot shows a year of metrics for our internal DO. The jump in the graphs is due to the fact that we were tasked with developing our internal DO.


Our system measures more than 150 indicators around the clock, but not all of them need to be watched in real time. Some may prove useful later, in a historical perspective, while day-to-day attention can focus on the ones most important to the business.

In one implementation, for example, only 5 indicators were selected. The customer aimed for a minimal set of indicators that would still cover the main work scenarios. Including all 150 indicators in the acceptance certificate would have been unjustified, because even within one enterprise it is hard to agree on which values count as acceptable. The customer knew these 5 indicators and set them as requirements for the system before the implementation project started, including in the tender documentation: card opening time no more than 3 seconds, execution time of a task with a file no more than 5 seconds, and so on. In that deployment, the metrics precisely reflected the customer's original requirements.
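A check of measurements against such acceptance thresholds might look like the following Python sketch. The indicator keys and the use of the average (rather than, say, a percentile or the maximum) are our assumptions for illustration; only the 3- and 5-second limits come from the example above.

```python
# Hypothetical acceptance thresholds in seconds, modelled on the two examples
# from the tender documentation (the real project had five indicators).
ACCEPTANCE_LIMITS = {
    "open_card": 3.0,
    "task_with_file": 5.0,
}


def check_acceptance(measurements, limits=ACCEPTANCE_LIMITS):
    """Return {indicator: (average_seconds, passed)} for every limited indicator.

    `measurements` maps an indicator name to a list of durations in seconds.
    An indicator with no measurements is reported as failed.
    """
    report = {}
    for name, limit in limits.items():
        samples = measurements.get(name, [])
        if not samples:
            report[name] = (None, False)
            continue
        avg = sum(samples) / len(samples)
        report[name] = (avg, avg <= limit)
    return report
```

A stricter acceptance rule would compare a high percentile or the maximum instead of the average; which one fits depends on how the contract words "no more than".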

We also analyze performance measurements by profile. A performance measurement records the duration of each operation performed (writing a letter to the database, sending a letter to the mail server, etc.); these measurements are used only by technical specialists. Our application has many of them: we currently measure about 1,500 key operations, grouped into profiles.

One of our most important profiles is “Key mail indicators from the consumer's point of view”. It includes, for example, the following indicators:

Command Execution: Filter by Tag
Opening a form: List form
Command Execution: Select by Folder
Display letters in the reading area
Saving a letter to your favorite folder
Search letters by details
Create a letter
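The idea of timing every key operation and grouping the results by profile can be sketched in Python as follows. The measure context manager, the in-memory storage and the three-operation profile are hypothetical simplifications, not the DO measurement mechanism.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical profile: a named subset of key operations (names from the list above).
MAIL_CONSUMER_PROFILE = {
    "Command Execution: Filter by Tag",
    "Opening a form: List form",
    "Create a letter",
}

# Every recorded duration, in milliseconds, keyed by key-operation name.
measurements = defaultdict(list)


@contextmanager
def measure(operation):
    """Record how long the wrapped block takes under the given key-operation name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        measurements[operation].append((time.perf_counter() - start) * 1000)


def profile_report(profile):
    """Average duration (ms) for each operation of the profile that has been measured."""
    return {
        op: sum(measurements[op]) / len(measurements[op])
        for op in profile
        if measurements.get(op)
    }
```

Operations are timed once, wherever they run, and a profile is just a view that selects which of them a particular audience cares about.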

If the metric for some business indicator becomes too large (for example, letters from a particular user start taking a very long time to arrive), we start investigating and turn to the timings of technical operations. Say we have the technical operation “Archiving letters on the mail server”, and we see that its time has gone up over the last period. This operation in turn breaks down into other operations, for example establishing a connection to the mail server. We see that for some reason this one has suddenly become very slow (we keep all measurements for a month, so we can compare: 10 milliseconds last week, 1,000 milliseconds now). So we know that something is broken here and needs fixing.
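The drill-down described above boils down to comparing each operation's recent timings with a baseline. Here is a minimal Python sketch; the operation names, the use of medians and the 10x threshold are assumptions chosen to match the 10 ms vs 1,000 ms example.

```python
from statistics import median


def find_regressions(baseline, current, factor=10.0):
    """Compare median durations per operation and flag large slowdowns.

    `baseline` and `current` map operation names to lists of durations (ms).
    An operation is flagged when its current median is at least `factor`
    times its baseline median. Returns (operation, old_ms, new_ms) tuples,
    worst slowdown first.
    """
    flagged = []
    for op, old_samples in baseline.items():
        new_samples = current.get(op)
        if not old_samples or not new_samples:
            continue  # nothing to compare for this operation
        old, new = median(old_samples), median(new_samples)
        if old > 0 and new / old >= factor:
            flagged.append((op, old, new))
    flagged.sort(key=lambda t: t[2] / t[1], reverse=True)
    return flagged
```

Medians keep a single slow outlier from triggering an alert; running the same comparison on the child operations of a flagged operation narrows the search down.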

How do we maintain such a large database?

Our internal DO is a real, working example of a high-load project. Let's look at the technical features of its database.

How long does it take to rebuild large database tables?

SQL Server requires periodic maintenance of its tables. Ideally this should be done at least once a day, and even more often for heavily used tables. But when the database is large (we already have more than 11 billion records), maintaining it is not easy.

We ran full table rebuilds until six years ago, but they began to take so long that they no longer fit into the nightly window. And since these operations put a heavy load on the SQL server, it cannot serve other users properly while they run.

So now we have to resort to various tricks. For example, we cannot run these procedures on the full data set. Instead we use UPDATE STATISTICS with SAMPLE 500000 ROWS, which takes 14 minutes. It does not compute statistics over all the data in the table; it samples half a million rows and derives from them the statistics used for the entire table. This is an approximation, but we have to accept it, because for some tables collecting statistics over the full billion records would take unacceptably long.
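For reference, the statement in question is SQL Server's UPDATE STATISTICS with the SAMPLE ... ROWS option. The small Python helper below only builds the T-SQL text; the helper name and the table name used with it are hypothetical (1C generates its own table names), and the resulting statement would be run with whatever SQL Server client you normally use.

```python
def sampled_stats_statement(schema, table, sample_rows=500_000):
    """Build a T-SQL statement that refreshes statistics from a row sample.

    SQL Server reads roughly `sample_rows` rows and extrapolates the
    statistics to the whole table, instead of reading every row (FULLSCAN).
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")

    def quote(name):
        # Bracket-quote an identifier; a ']' inside the name is escaped as ']]'.
        return "[" + name.replace("]", "]]") + "]"

    return (f"UPDATE STATISTICS {quote(schema)}.{quote(table)} "
            f"WITH SAMPLE {sample_rows} ROWS;")
```

For example, sampled_stats_statement("dbo", "_Document123") returns "UPDATE STATISTICS [dbo].[_Document123] WITH SAMPLE 500000 ROWS;".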

We also optimized other maintenance operations by making them partial.
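The article does not say how the operations were made partial. One common approach, sketched here in Python purely as an assumption, is to split the maintenance work into pieces with estimated durations and spread them over several nights so that each night stays within the maintenance window. Task names and durations are hypothetical.

```python
def plan_nights(tasks, window_minutes):
    """Split maintenance tasks into nightly batches that fit the window.

    `tasks` is a list of (name, estimated_minutes). Tasks are taken in the
    given order (e.g. most stale first) and packed greedily into successive
    nights; a task longer than the whole window gets a night to itself.
    """
    nights, current, used = [], [], 0
    for name, minutes in tasks:
        if current and used + minutes > window_minutes:
            nights.append(current)  # this night is full, start the next one
            current, used = [], 0
        current.append(name)
        used += minutes
    if current:
        nights.append(current)
    return nights
```

Greedy packing in priority order is simple and predictable; it does not minimize the number of nights, but it keeps the most urgent work first.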

Maintaining a DBMS is generally a difficult task. When employees interact actively, the database grows fast, and it becomes increasingly difficult for administrators to maintain it: updating statistics, defragmenting, indexing. This calls for different strategies; we know them well, have the experience, and are happy to share it.

How is backup implemented with such volumes?

A full DBMS backup is performed once a day at night, and an incremental one every hour. In addition, a new file directory is created every day as part of the incremental backup of the file storage.
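With a nightly full backup and hourly incremental backups, a restore needs the latest full backup plus every incremental backup taken after it. The Python sketch below picks that chain. It assumes each incremental backup holds only the changes since the previous backup, which is how the schedule above reads but is not stated explicitly, and it restores to the last backup taken at or before the target time.

```python
from datetime import datetime


def restore_chain(full_backups, incrementals, target):
    """Pick the backups needed to restore the database as of `target`.

    Assumes each incremental backup contains only the changes since the
    previous backup, so every incremental after the chosen full backup
    (up to `target`) is required, in chronological order.
    """
    base = max((t for t in full_backups if t <= target), default=None)
    if base is None:
        raise ValueError("no full backup taken before the target time")
    steps = sorted(t for t in incrementals if base < t <= target)
    return [("full", base)] + [("incremental", t) for t in steps]
```

The longer the gap since the last full backup, the longer this chain becomes, which is one reason to keep full backups daily.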

How long does a full backup take?

On a hard disk, a full backup takes three hours and a partial one takes an hour. Writing to tape takes longer (a tape drive writes to a cartridge that is stored outside the office; this offline copy survives even if, for example, the server burns down). The backup runs on the same SQL server described above, the one with a CPU load of about 20%. During a backup the system is, of course, noticeably slower, but it remains usable.
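For a sense of scale, assuming the full backup writes the entire 7.5 TB database uncompressed (compression would reduce the amount actually written), three hours corresponds to roughly this sustained throughput:

```latex
\frac{7.5\ \text{TB}}{3\ \text{h}} = 2.5\ \text{TB/h}
  = \frac{7.5 \times 10^{12}\ \text{B}}{10\,800\ \text{s}}
  \approx 6.9 \times 10^{8}\ \text{B/s} \approx 694\ \text{MB/s}
```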

Is there deduplication?

There is file deduplication: we are running it internally, and it will soon be included in a new version of Document Management. We also run the counterparty deduplication mechanism. There is no record-level deduplication in the DBMS because it is not needed: the 1C: Enterprise platform stores objects in the DBMS, and only the platform can be responsible for their consistency.
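The article does not describe how file deduplication works internally. A common technique, shown here as a hypothetical Python sketch rather than the DO implementation, is to group files by size and then by a content hash such as SHA-256, treating files with equal hashes as duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` whose contents have the same SHA-256 hash.

    Files are first bucketed by size, so only same-size files are hashed.
    Returns a list of groups (lists of paths), each with at least two files.
    """
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

In a real file storage, each group would then be collapsed to a single stored copy referenced by all the file versions that share it.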

Are there read-only nodes?

There are no read-only nodes (dedicated system nodes that serve clients who only need to read data). DO is not an accounting system whose data would need to go to a separate BI node. There is, however, a separate node for the development department, which exchanges data with the main database via messages in JSON format; typical replication times range from a few seconds to tens of seconds. The node is still small, about 800 million records, but it is growing fast.
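To illustrate message-based exchange with such a node, here is a minimal Python sketch. The JSON message layout (seq, type, id, fields) and both function names are hypothetical; the article only says that messages are in JSON format. Sequence numbers let the receiving node apply changes in order and ignore messages it has already applied.

```python
import json


def make_message(seq, obj_type, obj_id, fields):
    """Serialize one object change as a JSON exchange message (hypothetical format)."""
    return json.dumps(
        {"seq": seq, "type": obj_type, "id": obj_id, "fields": fields},
        ensure_ascii=False,
    )


def apply_messages(store, last_seq, messages):
    """Apply messages to `store` in sequence order, skipping already-applied ones.

    `store` maps (type, id) to the object's latest fields. Returns the new
    last applied sequence number, so a repeated delivery is harmless.
    """
    for msg in sorted((json.loads(m) for m in messages), key=lambda m: m["seq"]):
        if msg["seq"] <= last_seq:
            continue  # already applied earlier
        store[(msg["type"], msg["id"])] = msg["fields"]
        last_seq = msg["seq"]
    return last_seq
```

This sketch does not detect gaps in the sequence; a production exchange would also need to request missing messages.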

And messages marked for deletion are not deleted at all?

Not yet. Shrinking the database is not one of our goals. There have been several quite serious cases when we had to go back to letters marked for deletion, including ones from 2009, so for now we have decided to keep everything. When the cost of storing it becomes unjustified, we will consider deletion. If you need to remove an individual letter from the database completely, leaving no trace, this can be done on special request.

Why keep it all? Are there statistics on access to old documents?

There are no such statistics. More precisely, the data exists in the user activity log, but it is not kept for long: log records older than a year are deleted.
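The one-year retention rule amounts to a simple filter on record timestamps. A minimal Python sketch, assuming log records are (timestamp, event) pairs and approximating a year as 365 days:

```python
from datetime import datetime, timedelta


def prune_log(records, now, retention=timedelta(days=365)):
    """Keep only log records that are newer than the retention period.

    `records` is a list of (timestamp, event) tuples; anything older than
    `now - retention` is dropped.
    """
    cutoff = now - retention
    return [r for r in records if r[0] >= cutoff]
```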

There have been situations when we needed to dig up correspondence from five or even ten years ago. This was never done out of idle curiosity but to make complex business decisions. In one case, without the correspondence history, the wrong business decision would have been made.

How do you handle the appraisal and destruction of documents once their retention period expires?

For paper documents we do it the usual, traditional way, like everyone else. For electronic documents we don't: let them stay. We have the space, keeping them is useful, and it causes no problems.

What are the development prospects?

Today our DO handles about 30 internal tasks, some of which we listed at the beginning of the article. We also use DO to prepare the conferences we hold twice a year for our partners: the entire program, all talks, all parallel sessions and halls are put together in DO, then exported, and the printed program is made from the export.

DO has several more tasks ahead of it beyond the ones it already handles. Some tasks are company-wide, while others are rare or unique and needed by only one department. We need to support those departments too, and that means widening the use of the system inside 1C: expanding its scope and solving the tasks of every department. This would be the best test of performance and reliability. We would like to see the system run on trillions of records and petabytes of data.
