Photo by Yancy Min on Unsplash

Kubernetes is a big project, both in terms of demand and in terms of source code. At the time of writing, the repository on GitHub had more than 86,000 commits, more than 2,000 contributors, more than 2,000 open tickets, more than 1,000 open pull requests and 62,800 stars. The scc utility counted more than 4.3 million lines of Go code (more than 5.2 million lines in total), of which more than 3 million are actual code and more than 700 thousand are comments, spread across more than 16,000 files, including the vendor/ directory.

We recently developed a tool that processes TODO comments in a codebase to help maintain large projects like this one. We decided to point our little parser at the Kubernetes sources and see what happens. Here are some results.

tickgit processed the source code at commit 9bf52c2. The output in CSV format was then imported into SQLite for querying. Note that the tool finds TODOs only in the checked-out tree. It does not take into account comments that were added and later deleted. The numbers therefore reflect only TODOs that are still "living" in the code at this commit.

- 2,380 TODOs in 1,230 files from 363 authors
- 460 TODOs name an assignee, in this form:
// TODO (patrickdevivo) Fix the ...
- 489 TODOs were added in 2019
- Average age of a TODO: about 860 days (about 2.3 years)
- Oldest TODO: added in 2014
- Newest TODO: added in 2019
- Most TODOs in a single file: 33
- TODOs attributed to deads2k by git blame, the most of any author: 147
- Most TODOs added in a single commit: 64
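As an illustration of how TODOs with an assignee might be picked out, here is a minimal Python sketch. It assumes an assignee is a name in parentheses directly after the word TODO, as in the example in the list above. The pattern and the function name `todo_assignee` are hypothetical and are not taken from tickgit itself.

```python
import re
from typing import Optional

# Assumption: the assignee is a single token in parentheses right after
# "TODO", e.g. "// TODO (patrickdevivo) Fix the ...".
ASSIGNEE_RE = re.compile(r"\bTODO\s*\(([^)\s]+)\)")


def todo_assignee(comment: str) -> Optional[str]:
    """Return the assignee named in a TODO comment, or None if there is none."""
    match = ASSIGNEE_RE.search(comment)
    return match.group(1) if match else None


print(todo_assignee("// TODO (patrickdevivo) Fix the ..."))  # patrickdevivo
print(todo_assignee("// TODO: clean this up"))               # None
```

A real parser would also need to decide how to treat parentheses that hold something other than a user name, such as an issue reference.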
Files with the most TODOs
33 cluster/gce/util.sh
25 pkg/apis/core/types.go
23 staging/src/k8s.io/api/core/v1/types.go
21 staging/src/k8s.io/legacy-cloud-providers/aws/aws.go
20 staging/src/k8s.io/code-generator/cmd/conversion-gen/generators/conversion.go
20 pkg/apis/core/validation/validation.go
16 test/e2e/network/service.go
16 pkg/kubelet/kubelet.go
14 test/e2e/framework/util.go
14 pkg/kubelet/kubelet_pods.go
Authors with the most TODOs (by git blame)
deads2k 147
Clayton Coleman 105
Chao Xu 99
Dr. Stefan Schimanski 93
Jordan Liggitt 81
David Eads 60
Random-Liu 54
Wojciech Tyczynski 50
Yu-Ju Hong 43
Prashanth Balasubramanian 38
Commits that added the most TODOs (number of TODOs, commit hash)
64 6a4d5cd7cc58e28c20ca133dab7b0e9e56192fe3
19 e01ff1641c7321ac81fe5775f6ccb21aa6775c04
19 4fb28dafad121e163fa86dc90067ce3d14415811
18 adb75e1fd17b11e6a0256a4984ef9b18957d94ce
14 963c85e1c807efcdbb82dd44439dc3c55f6a0bfd
14 8b17db7e0c4431cd5fd9a5d9a3ab11b04e2f0a7e
13 f0f78299348afcf770d4e8d89dcea82f80811b28
11 d0b94538b9744d0c06df6ddec2604be168568f9d
10 f1248b9c829e225138ab6d6234221c63092f7592
10 cd663d7ad00937cffa8a09e4761acb95d34c89a3
TODOs by year added
34 2014
249 2015
523 2016
650 2017
435 2018
489 2019
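Counts like the per-year and per-file figures above could come from simple GROUP BY queries once the CSV is in SQLite. The sketch below shows the idea with Python's standard library. The column names (`file_path`, `author`, `added_at`), the date format and the sample rows are all assumptions for illustration; the real export may be laid out differently.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the CSV export; real column names, date
# format and contents may differ.
SAMPLE_CSV = """file_path,author,added_at
pkg/a.go,alice,2014-06-10
pkg/b.go,bob,2019-03-02
pkg/b.go,alice,2019-07-21
"""


def load_todos(conn, csv_text):
    """Create a todos table and fill it from CSV text."""
    conn.execute("CREATE TABLE todos (file_path TEXT, author TEXT, added_at TEXT)")
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["file_path"], r["author"], r["added_at"]) for r in reader]
    conn.executemany("INSERT INTO todos VALUES (?, ?, ?)", rows)


def todos_per_year(conn):
    """Count TODOs grouped by the year they were added."""
    query = """
        SELECT strftime('%Y', added_at) AS year, COUNT(*) AS n
        FROM todos
        GROUP BY year
        ORDER BY year
    """
    return conn.execute(query).fetchall()


def todos_per_file(conn):
    """Count TODOs per file, largest count first."""
    query = """
        SELECT file_path, COUNT(*) AS n
        FROM todos
        GROUP BY file_path
        ORDER BY n DESC, file_path
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_todos(conn, SAMPLE_CSV)
print(todos_per_year(conn))  # [('2014', 1), ('2019', 2)]
print(todos_per_file(conn))  # [('pkg/b.go', 2), ('pkg/a.go', 1)]
```

Grouping by author or by commit hash follows the same pattern with a different column.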
To get this TODO data yourself, run tickgit todos --csv-output and import the output into SQLite.

This is a fairly cursory look at the TODO comments in the Kubernetes source code. We can see who writes the most TODOs, and that list more or less coincides with the project's leading contributors.

We also see that the attitude to TODO comments is nothing out of the ordinary. There are simply a lot of them because the codebase is so large.

An important observation is that there are more TODO comments than GitHub tickets (issues). This is interesting because it points to a significant number of "hidden" tasks that are not immediately visible on GitHub but are written down in the source code.

The main contributors probably know their areas of the codebase well and have a clear picture of their own TODOs and "hidden work". But this is not always visible to outside observers, who find it more familiar and easier to look at tickets on GitHub (or in other public trackers).

Most developers understand that software projects "live and breathe". There are frequent changes, continuous improvement, bug fixes and a lot of discussion. It is very important to organize the workflow well, because good code requires constant thought. In part, we see this in action through the TODO comments in the Kubernetes sources. Although we have nothing to compare it with, an average TODO age of 2.3 years seems rather high. Developers close to the project can judge this figure more objectively. It would be interesting to compare it with other large open source projects.

A deeper analysis would include all of the TODOs in the project's history, not just those that remain today. It could consider questions such as:

- How fast do TODOs close?
- What is the average lifespan of a TODO comment?
- What do popular codebases look like in comparison?
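To make the lifespan question concrete, here is a small Python sketch of the arithmetic involved. It assumes that the date each TODO was added and, where applicable, removed have already been recovered from the repository history. The dates below are invented for illustration and are not Kubernetes data, and `average_lifespan_days` is a hypothetical helper.

```python
from datetime import date

# Hypothetical (added, removed) dates; removed is None for a TODO that
# is still present in the code.
HISTORY = [
    (date(2017, 1, 1), date(2017, 1, 31)),
    (date(2018, 3, 1), date(2018, 3, 11)),
    (date(2019, 5, 1), None),
]


def average_lifespan_days(history):
    """Mean days between adding and removing a TODO, over removed TODOs only."""
    spans = [
        (removed - added).days
        for added, removed in history
        if removed is not None
    ]
    return sum(spans) / len(spans) if spans else None


print(average_lifespan_days(HISTORY))  # 20.0
```

TODOs that are still open are left out here. Counting their age so far instead would answer a slightly different question, closer to the average age reported above.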
How important is it?
TODO comments usually cover work that is too small for a ticket but important enough to be noted and described in a comment (although many TODOs do reference tickets or issues). Because comments are part of the code, they are often "closer" to the work that needs to be done. They are easy to add, but apparently just as easy to forget: the Kubernetes sources still contain over 1,800 TODOs added before 2019.

We hope that our tool for analyzing metadata in code will help developers maintain projects of any size. Bringing TODO comments to the surface is only part of what needs to be done.