Estimating Tasks in Story Points

Almost everyone who has worked in software development knows what estimating tasks in Story Points (SP) is. Still, from time to time I have to explain to colleagues from other departments, or to newcomers who have never encountered the approach, why we use SP and why it is convenient for the team and effective for the company.

The purpose of this text is to describe what SP are, how to use them to estimate tasks, and why this technique has become so widespread.

Problem

Estimating the time required to complete a task looks very simple, yet it is one of the riskiest things a development team does.

An incorrect estimate is one of the main causes of missed deadlines, or even of a project's failure.
The problem is that the business sees estimates as commitments, while developers see them as guesses.

To illustrate, here is a fictional dialogue from Robert Martin's book The Clean Coder.

Mike (Manager): How likely is it that you will finish in three days?

Peter (Developer): I think I'll manage.

Mike: Can you name a number?

Peter: Fifty or sixty percent.

Mike: So there is a pretty high probability that you will need four days?

Peter: Yes, even five or six may be needed, although I doubt it.

Mike: To what extent do you doubt it?

Peter: Oh, I don’t know ... I’m ninety-five percent sure that the work will be done in less than six days.

Mike: So maybe seven?

Peter: Well, if everything goes awry ... Damn, if EVERYTHING goes awry, maybe ten or even eleven days. But the probability of such a coincidence is very small, right?

I think the dialogue above sounds pretty familiar to any developer or project manager.

Unfortunately, the problems with estimates do not end there. Other pitfalls should also be considered:

The Link Between the Estimate and the Estimator

An estimate is valid only if its author is the one who implements the task. After all, a senior developer and an intern will obviously spend different amounts of time on the same task.

An ideal estimate in an imperfect world

Urgent meetings, work email, messengers and a task tracker that has gone down further complicate an already complex development process. As a result, the ideal hours we imagine when estimating are of little use to a project manager trying to assemble a rapidly aging Gantt chart.

Next, we will look at estimating tasks in SP and how it addresses the difficulties described above.

Alternative solutions

Naturally, SP is not the first attempt to solve these problems, although it is probably the most popular.

In this block I will talk about another technique that includes a scheme for estimating tasks. It is called PERT, and you do not need to be familiar with it to follow the rest of this text, so you can safely skip to the next block.

Program Evaluation and Review Technique

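PERT (Program Evaluation and Review Technique) was developed in the 1950s.

An estimate in PERT consists of three numbers:

O: the optimistic estimate.

N: the nominal (most likely) estimate.

P: the pessimistic estimate.

From these three numbers the expected duration is calculated:

$\mu = \cfrac{O + 4N + P}{6}$

together with the standard deviation, which measures the uncertainty of the estimate:

$\sigma = \cfrac{P - O}{6}$

For example, with O = 1, N = 3 and P = 12 the estimate is:

$\cfrac{1 + 12 + 12}{6} \pm \cfrac{12 - 1}{6}$

which is approximately 4.2 ± 1.8.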
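As a minimal sketch, the two PERT formulas can be written in Python. The function name `pert_estimate` and its return shape are illustrative choices, not part of PERT itself; the inputs 1, 3 and 12 are the values behind the worked example above.

```python
def pert_estimate(optimistic, nominal, pessimistic):
    """Return (expected duration, standard deviation) for a three-point estimate."""
    mu = (optimistic + 4 * nominal + pessimistic) / 6
    sigma = (pessimistic - optimistic) / 6
    return mu, sigma


# Hypothetical usage with O = 1, N = 3, P = 12
mu, sigma = pert_estimate(1, 3, 12)
print(f"{mu:.1f} ± {sigma:.1f}")  # 4.2 ± 1.8
```

The wider the gap between the optimistic and pessimistic values, the larger the second number, which is how PERT makes the uncertainty of an estimate visible.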

Story points

What are Story Points, and how do they help estimate tasks? Mike Cohn, Agile evangelist and CEO of Mountain Goat Software, explains this technique briefly and clearly.

What if, instead of estimating the time it takes to complete a task, we estimate the effort required to solve it? To do this, we take a rating scale and place the tasks that need estimating on it.

At the same time, the effort estimate should take into account every factor that may affect it:

The amount of work required;
The technical complexity of the task;
Possible risks and uncertainty in the requirements.

It doesn't sound easy, but remember that we do not need to give each task a precise value; we only need to find its place on the rating scale among the other tasks being estimated.

I want to emphasize two important aspects of the Story Points method that allow it to solve the problems discussed earlier:

Relative Assessment

Tasks are estimated relative to each other, which produces a universal rating scale that does not depend on the experience of the estimator. Even if a task is reassigned to someone else, its estimate stays the same, and new tasks are simply estimated against this scale.

Replacing hours with abstract points

This frees the estimator from having to estimate the task in hours. Instead, they estimate it in points, which removes the mismatch between how the developer and the manager perceive the estimate. Moreover, distractions and force majeure no longer affect the estimate in any way, because they do not change the effort required to solve the problem!

Fibonacci numbers, T-shirts and dogs

Yes, T-shirts and dogs. You can use any scale to estimate tasks. The most common is the Fibonacci sequence: its values are easy to understand, and it comes with a nice bonus, because the gaps between its elements reflect the way uncertainty grows with the complexity of the task being estimated.
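To make the point about growing uncertainty concrete, the sketch below prints the gaps between neighboring values of a Fibonacci-like scale. The scale `[1, 2, 3, 5, 8, 13, 21]` and the helper name `scale_gaps` are assumptions chosen for illustration; teams cut the scale off at different points.

```python
def scale_gaps(scale):
    """Return the differences between adjacent values of an estimation scale."""
    return [b - a for a, b in zip(scale, scale[1:])]


FIBONACCI_SCALE = [1, 2, 3, 5, 8, 13, 21]  # assumed scale, for illustration only

print(scale_gaps(FIBONACCI_SCALE))  # [1, 1, 2, 3, 5, 8]
```

The larger the task, the coarser the available choices, so the scale does not ask the team for precision it does not have.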

However, some teams use an alternative rating scale. The most common are T-shirt sizes (S, M, L, XL) and dog breeds (Chihuahua, Pug, Great Dane), which stand for the complexity of the task. This moves the team even further away from a numerical estimate, and in some cases rules out converting it into time altogether.

Team estimation

What is the difference between a team estimate and an individual estimate?
Why is it important to involve the whole team in estimating?

One of the biggest mistakes you can make when estimating tasks is to do it alone, without asking the team members for their opinions. Maybe they have something to say? Do you want to add support for a new browser? What does QA think about that?

People are the most important resource in estimation. They can see what you do not.

But how do you run a team estimation? Simply calling out estimates is not very effective. Besides, having heard your estimate, another team member may change their mind and not voice their own.

Planning poker

In 2002, James Grenning described a method that later became so popular that you can now buy real decks of cards for planning poker, or use one of the online services for the session.

The essence of the method is as follows: every team member is dealt cards with the values from the rating scale. Then a task is selected and its requirements are discussed. After the discussion, the moderator asks all team members to choose a card and place it face down. Then the moderator gives the signal to reveal the cards.

If the participants' estimates agree, the estimate is recorded. Otherwise the cards go back into the hands and the team continues to discuss the task. It is a good idea to ask those whose estimates stand out: “What difficulties do you see in this task?” or “Why do you think the implementation will go without problems?”

It is worth noting that the agreement does not have to be absolute. You can agree in advance that a set of neighboring values also counts as consensus.
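The agreement rule can be sketched as a small function. This is only one possible reading of "neighboring values count as agreement", not a rule prescribed by the method; the names `SCALE`, `is_consensus` and the `max_spread` parameter are hypothetical.

```python
SCALE = [1, 2, 3, 5, 8, 13, 21]  # assumed rating scale


def is_consensus(votes, scale=SCALE, max_spread=1):
    """True if all votes lie within `max_spread` adjacent positions on the scale.

    max_spread=0 demands identical cards; max_spread=1 also accepts
    neighboring values such as 3 and 5. Every vote must be a value
    from the scale, otherwise list.index raises ValueError.
    """
    positions = [scale.index(vote) for vote in votes]
    return max(positions) - min(positions) <= max_spread


print(is_consensus([3, 5, 5, 3]))  # True: 3 and 5 are neighbors on the scale
print(is_consensus([2, 5, 8]))     # False: the team keeps discussing
```

Comparing positions on the scale rather than the raw numbers matters: 8 and 13 differ by five points but are still neighbors.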

Alternatives

Like the estimation method itself, planning poker has alternatives. I will briefly describe one of them.

You can skip this block and go directly to the next section.

Affinity estimation

In affinity estimation the team arranges the tasks along the rating scale by comparing them with one another.

Project planning

How many hours are there in a Story Point, and how do I build a Gantt chart?

So, we have estimated our backlog, but you cannot build a project plan out of Story Points. The project manager often asks: “How do I convert SP into hours?”

The short answer to this question is: “You don't.”

Of course, you can follow the developers around with a stopwatch, record the time it took them to complete tasks estimated at 1 SP, and then plot this information on a graph. You will get the classic “bell”, as in the example in the block below. As the first figure shows, some tasks take a little more time and some a little less, but on the whole the values follow something close to a normal distribution.

The same is true for 2 SP tasks, as shown in the second figure. Have you noticed that the “tails” of the graphs overlap? Yes, some tasks estimated at 1 SP may require more effort than the simplest tasks estimated at 2 SP. After all, no team has yet learned to estimate perfectly. In addition, by converting SP into hours we fall back into the old trap: how much time a specific task takes depends heavily on the developer.

But what can we do? We cannot abandon planning altogether. Fortunately, we do not need to convert each Story Point into hours for this. What really matters is how many SP the development team can “close” in a sprint (iteration, release).

By collecting data on the team's velocity, you can get data accurate enough for long-term project planning. Also, do not forget the law of large numbers: estimation errors tend to cancel each other out, and this applies to both tasks and iterations. Admittedly, this is a little optimistic, because inaccuracies usually come from underestimation rather than overestimation. But nothing is perfect.
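A rough way to see why errors even out, under a simplifying assumption made here only for illustration: suppose the error of each estimate is independent, unbiased and has the same standard deviation $\sigma_{task}$. Then for $n$ tasks with an average size of $\bar{s}$ points, the error of the total grows only as the square root of $n$:

$\sigma_{total} = \sigma_{task} \sqrt{n}$

so its share of the total shrinks:

$\cfrac{\sigma_{total}}{n \bar{s}} = \cfrac{\sigma_{task}}{\bar{s} \sqrt{n}}$

With 25 tasks, for instance, the relative error of the total is five times smaller than that of a single task. A systematic tendency to underestimate breaks the "unbiased" assumption and does not cancel out this way.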

Velocity (the team's speed) is a powerful planning tool and the main metric of a development team. The team should work on continuous improvement in order to increase its velocity. Do not forget that velocity is derived from SP and is therefore also relative. You cannot compare two teams with each other; a team competes only with itself.
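Here is a minimal sketch of how this metric feeds a long-term plan: average the points closed in past sprints and divide the remaining backlog by that average. The sprint history and backlog size are made-up numbers, and `average_velocity` and `sprints_needed` are hypothetical helper names.

```python
import math


def average_velocity(closed_points_per_sprint):
    """Mean number of story points the team closed per sprint."""
    return sum(closed_points_per_sprint) / len(closed_points_per_sprint)


def sprints_needed(backlog_points, velocity):
    """Whole sprints required to close the backlog at the given velocity."""
    return math.ceil(backlog_points / velocity)


history = [21, 18, 24, 21]  # made-up data: SP closed in the last four sprints
velocity = average_velocity(history)
print(velocity)                       # 21.0
print(sprints_needed(100, velocity))  # 5
```

Because the points are relative, such a forecast is only meaningful for the team whose history was used.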

Practice

What nuances do you need to know?
What mistakes can be avoided?

In conclusion, I want to collect some tips for those who have decided to try the described techniques in their work for the first time.

Where to start

This is your first planning poker session, and the team does not know what to estimate new tasks against. Pick a few tasks that are already completed, ideally familiar or typical ones, and estimate their complexity relative to each other. Use these tasks as reference points for estimating new ones.

Do you have a new project and no completed tasks? Try the affinity estimation described above to place tasks on the rating scale.

Do not average estimates

Sometimes, when two team members have estimated a task differently, it is tempting to assign the task the average and move on. Do not give in to this temptation: discussion is an important part of estimation, during which the team can uncover previously unknown details of the task's implementation.

But, as mentioned above, you can always agree that estimates close to each other are not a reason for further discussion.

Do not change estimates

Even if you realize during implementation that you made a mistake during planning, leave the estimate unchanged. You will make mistakes again in the future, and in both directions. Let these errors cancel each other out; do not interfere with the process.

Estimating bugs

I have come across different approaches to estimating bugs. Some teams estimate all bugs except those that arose while implementing new tasks in the current iteration. Some do not estimate bugs at all, arguing that velocity should show the new value added to the product, and that fixing bugs should not inflate this indicator.

Whichever approach you choose, stay consistent. The team's historical velocity data should not be distorted by a mix of different estimation approaches.

Zero estimates

Another question without a clear answer. Some believe that there are no tasks that require no effort. Others reply that assigning points to trivial tasks unjustifiably inflates the team's velocity chart.

You can introduce an estimate of 1/2 point for such tasks and check in retrospectives whether the share of such tasks exceeds reasonable limits. But the main advice is the same: stay consistent in your decisions.

Re-estimating unfinished tasks between iterations

It is not always possible to complete a task in one iteration, even if that was the plan. Nevertheless, when planning the next iteration you should not change its estimate based on the amount of work remaining. Take the remaining work into account when planning, but leave the story's estimate unchanged.

Reviewing estimates in retrospectives

If you are not holding retrospectives yet, it is time to start! They are a great tool for increasing the team's velocity and cohesion. However, that is a separate topic.

During your retrospectives, go over the estimates made at iteration planning and discuss whether there were any large gaps between expectations and reality.

You can also pull several tasks with the same estimate from the history and discuss whether all of these stories really required the same amount of effort.

Record everything

If your task management system does not support estimates and does not calculate the team's velocity automatically, you will have to do it manually. As you have probably guessed, historical data is an important tool for improving your estimates.
