Data Engineer or die: one developer's story

In early December I made a pivotal decision in my life as a developer (some would call it a fatal mistake) and transferred to the Data Engineering (DE) team inside the company. In this article I will share some observations from my first two months on the DE team.




Why Data Engineering?


My journey to DE began in the summer of 2019, when Xneg and I went to the Distributed Computing School, and there I reached enlightenment. I became interested in the topic, started studying algorithms and even writing about them. Then I thought about where all this could be applied and quickly found out that in our company the practical application is distributed databases.

What does our team actually do? Like all the fashionable boys and girls, we want to become a Data Driven Company. To make that possible, we need at the very least a reliable data warehouse on which any report the company needs can be built. Most importantly, the data in this warehouse must be trustworthy. On top of that, it must be possible to restore the state of the system at any time t from this data. All of this is complicated by the fact that we live in a brave new world of microservices. That ideology implies that each service implements its own small piece of functionality and that its database is its own business: the service can drop it every day if it likes, yet we still have to be able to receive and process the state of that service.
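To make "restore the state of the system at time t" a bit more concrete, here is a minimal sketch. It assumes that every change arrives as an event with a timestamp and the fields that changed; the `Event` class, its field names and the `state_at` function are hypothetical and only illustrate the idea, they are not our actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass(frozen=True)
class Event:
    """A hypothetical change event emitted by a service."""
    entity_id: str
    occurred_at: datetime
    changes: Dict[str, Any]  # fields that changed, with their new values


def state_at(events: List[Event], t: datetime) -> Dict[str, Dict[str, Any]]:
    """Rebuild the state of every entity as it was at time t."""
    state: Dict[str, Dict[str, Any]] = {}
    # Replay events in chronological order and stop at the first one after t.
    for event in sorted(events, key=lambda e: e.occurred_at):
        if event.occurred_at > t:
            break
        state.setdefault(event.entity_id, {}).update(event.changes)
    return state


events = [
    Event("order-1", datetime(2020, 1, 1, 10, 0), {"status": "created", "total": 500}),
    Event("order-1", datetime(2020, 1, 1, 10, 30), {"status": "paid"}),
    Event("order-1", datetime(2020, 1, 1, 11, 0), {"status": "delivered"}),
]

snapshot = state_at(events, datetime(2020, 1, 1, 10, 45))
print(snapshot)  # {'order-1': {'status': 'paid', 'total': 500}}
```

The point of the sketch is that nothing is overwritten: as long as the events are kept, the state for any t can be recomputed, even if the service itself has long since dropped its database.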

Want to be Data Driven? First become Event Driven


It is not that simple. Events come in different kinds, and a developer and a data engineer look at them differently. Events deserve a separate article, so I will not go into them here. Besides, a certain Martin Fowler has already written such an article. I won't take his laurels away; let him become famous too.

All in all, there is plenty to think about, and the field is attractive. It just so happens that in our company a Data Engineer has a much wider area of responsibility than someone who only writes ETL/ELT pipelines (if you don't know what these abbreviations mean, come to the meetup; consider this contextual advertising).

We deal with the warehouse architecture, data modeling, data security and, of course, the pipelines themselves. On the one hand, we need to make sure that our presence does not burden product developers and that our requirements distract them as little as possible while they build new features. On the other hand, we need to provide conveniently stored data for the analysts and BI teams. That's how we live.

Difficulties in moving from development


On the very first day of my work, I encountered a number of difficulties that I want to share with you.

1. The first thing I noticed was the lack of tooling and of some established practices. Take test coverage, for example. In product development we have hundreds of testing frameworks. With data, everything is more complicated. Yes, we can test ETL pipelines on test data, but we have to do it all by hand and look for a solution for each specific case. As a result, test coverage is much worse. Fortunately, there is another feedback layer in the form of monitoring and logs, but that forces us to be reactive rather than proactive, which infuriates us.

2. The world as seen from DE is not at all what it seems to an ordinary product developer (the reader, of course, is not like that and already knows everything, but I didn't, and now I am paying for it). As a developer, I looked after my microservice, put data into [database of your choice], saved my state there, fetched things by ID, and all was well. The service is running, orders are coming in, and that's it. If another service asks me to share my state, I just throw an event into some RabbitMQ and that's that. And here we are back at the question of events mentioned above.

What a service needs for its operational work does not suit us for historical data, so we end up reworking service contracts and collaborating closely with the development teams. You can't even imagine how many hours it took us to agree on what Event Driven actually means in our company.

3. You need to use your head. No, I don't mean that developers don't think (although who am I to speak for everyone). It's just that in product development you usually already have some architecture in place and you work through backlog items. Of course, this requires planning and thought, but it is steady-flow work where the main challenge is simply to do it well.

Here it is not so simple, because moving system components from a warm, comfortable monolith into the wild microservice jungle is hard. When a service starts emitting events, you have to revise the logic that fills the warehouse, because the data now looks different. You have to think long and thoroughly, not as a developer but as a data engineer. It is perfectly normal to spend days with a notebook and pen, or with a marker at the whiteboard. It is very difficult: I don't like to think, I like to hack something together and push it straight to production.

4. Perhaps the most important thing is information. What do we do when we lack knowledge? Who said Stack Overflow? Take this person out of the room. We read docs and books on the topic, and then there is the community, which organizes forums, meetups and conferences. Documentation is great, but unfortunately it is often incomplete. We use Cosmos DB in a number of projects. Good luck reading the documentation for that product. Books are the only salvation. Fortunately, they exist and can be found, they hold a lot of fundamental knowledge, and you have to read a lot of them, constantly. The community, however, is in trouble.

Right now it is hard to find even one decent conference or meetup in our field. Of course, there are plenty of meetups with the word Data in the title, but strange abbreviations like ML or AI usually appear next to it. That is not for us: we talk about how to build warehouses, not about how to play with neural networks. Those hipsters have taken over everything. As a result, we have no community. By the way, if you are a Data Engineer and know of good communities, please write in the comments.
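Going back to the first point for a moment: testing a pipeline step on hand-made data does not need much ceremony, it just has to be written by hand for each case. A minimal sketch, assuming a hypothetical transform `latest_per_order` that keeps the newest version of each order; the function, the field names and the sample rows are made up for illustration.

```python
from typing import Dict, List


def latest_per_order(rows: List[dict]) -> List[dict]:
    """Transform step: keep only the most recent version of each order."""
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["order_id"])
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    return sorted(latest.values(), key=lambda r: r["order_id"])


def test_latest_per_order_keeps_newest_version() -> None:
    rows = [
        {"order_id": "a", "updated_at": "2020-01-01T10:00:00", "status": "created"},
        {"order_id": "b", "updated_at": "2020-01-01T10:05:00", "status": "created"},
        {"order_id": "a", "updated_at": "2020-01-01T10:30:00", "status": "paid"},
    ]
    result = latest_per_order(rows)
    assert [r["order_id"] for r in result] == ["a", "b"]
    assert result[0]["status"] == "paid"


# Run the check directly; a test runner would discover it by its name as well.
test_latest_per_order_keeps_newest_version()
```

The hard part is not the assertion but the test data: every case (duplicates, late updates, missing fields) has to be invented by hand, which is exactly why coverage lags behind product code.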

Conclusions and meetup announcement


So what do we have in the end? My first experience tells me that every developer would benefit from spending some time in a data engineer's shoes. It lets you look at things differently and stop being surprised that our eyes bleed at the sight of how developers treat their data. So if your company has a DE team, just talk to these people and you will learn a lot (about yourself).

And finally, the announcement. Since meetups on our topic are almost impossible to find, we decided to organize our own. Why not? Are we any worse than the others? Fortunately, we have the amazing Schvepsss and our friends from New Professions Lab, who, like us, think that data engineers are unfairly deprived of attention.

I take this opportunity to invite everyone who cares to our first community meeting with the promising name "DE or DIE", which will be held on 02.27.2020 in the Dodo Pizza office. Details are on TimePad.

If anything, I'll be there, so you can tell me in person how wrong I am about developers.
