Story Points


Story Points are an abstract, arbitrary number associated with User Stories that indicates the relative effort (not complexity) required to complete the story development.

Often used in the form of a Fibonacci series (1, 2, 3, 5, 8, 13, 21, 34, …) or a simple integer value, Story Points are intended to indicate the relative effort, not complexity, of low-level requirements. Story Points are an estimation method based on picking a well-understood “normal” story, setting its value and then estimating other stories relative to that “normal” story.
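
As a rough illustration of this relative estimation, the sketch below takes a team's judgement of how a story compares with the “normal” story and snaps the result to the nearest value on the Fibonacci scale. The function name relative_points, the reference value of 3 and the idea of expressing the comparison as a ratio are all assumptions made for the example; in practice teams usually agree these numbers through discussion rather than calculation.

```python
# Fibonacci-style scale commonly used for Story Points.
SCALE = [1, 2, 3, 5, 8, 13, 21, 34]


def relative_points(effort_ratio, reference_points=3):
    """Snap (effort_ratio x reference_points) to the nearest scale value.

    effort_ratio is the team's judgement of how much effort a story needs
    compared with the "normal" reference story (1.0 means about the same).
    reference_points is the value the team gave the reference story
    (3 here is a hypothetical choice).
    """
    if effort_ratio <= 0:
        raise ValueError("effort_ratio must be positive")
    raw = effort_ratio * reference_points
    # min() keeps the first closest value, so ties resolve to the smaller number.
    return min(SCALE, key=lambda value: abs(value - raw))
```

Note how coarse the result is: a story judged to be twice the effort of a 3-point reference is recorded as 5, not 6, which is one reason the numbers should not be read as precise measurements.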

Points are an abstract, team-based indicator and are not comparable across teams. They do not equate to “person-days” or to complexity, and so are fundamentally unsuitable for use in contractual arrangements.

Points cannot be aggregated meaningfully across teams or up to programme level because their value is arbitrary and team-based. For this reason, we strongly recommend against using points at both the Programme Backlog and Product Backlog levels.

Story Point estimating may be useful within a team to help size Stories for inclusion, or not, in sprints/iterations. This is an in-team, private metric that does not make sense outside the team. Over time, story point sizes tend towards lower values as large stories are broken up and more is understood about the work (risks reduce in line with the Cone of Uncertainty). As a result, points are not numerically consistent even within a single team over time. We’ve seen many planning and reporting dysfunctions based on a poor understanding of Story Points and the false accuracy they imply, such as Project Managers setting a target velocity!

Metric: Velocity

How quickly are we working? When will we be done?

Velocity is how many points are completed over time and is often used, especially in Scrum teams, as a measure of progress towards the total number of points.

Over time, teams will gain an understanding of roughly how many points they can deliver in a period of time (or an iteration/sprint). This is called their “Velocity” and can be used to extrapolate the remaining effort to completion (ETC). This is the original intention behind story points, and it is why they do not equate to complexity: some very complex things don’t take long to implement, while some very simple (but large in volume) requirements can take a long time. Velocity will normally vary a little over the lifecycle and will typically be a little unstable during the first couple and last few time periods/iterations/sprints.

This means that teams need to throw away the first few iterations of velocity, then establish at least 3 (preferably 5) iterations to extrapolate something that’s even close to statistically meaningful. That means velocity can only be meaningfully calculated at somewhere approaching the 6th or 7th iteration/sprint.
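
A minimal sketch of that extrapolation, assuming velocity is taken as the plain average of the sprints left after discarding the early ones. The function name, the default of 3 discarded sprints and the use of a simple mean are assumptions for illustration, not a prescribed method.

```python
from math import ceil
from statistics import mean


def estimate_sprints_remaining(points_per_sprint, remaining_points,
                               warm_up=3, min_samples=3):
    """Extrapolate the number of sprints to completion from observed velocity.

    points_per_sprint: points completed in each finished sprint, oldest first.
    warm_up: early sprints to throw away (3 is an assumed default).
    min_samples: stable sprints required before extrapolating (at least 3,
                 preferably 5).
    Returns None when there is not yet enough data to extrapolate.
    """
    stable = points_per_sprint[warm_up:]  # discard the unstable early sprints
    if len(stable) < min_samples:
        return None
    velocity = mean(stable)
    if velocity <= 0:
        return None  # no measurable progress, so nothing to extrapolate from
    return ceil(remaining_points / velocity)
```

With these defaults, a team that has finished only five sprints gets None back. The first usable estimate appears after the sixth sprint, in line with the reasoning above.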

Based on mining the work item data of 500 projects of diverse types, we have found that team velocities are generally stable after the first 3 sprints and until the last 2 or 3. The biggest factor affecting velocity is a change in team members. Despite this long-term stability, 90% of the projects we mined had over-planned sprints. Often, when work wasn’t completed in one iteration, it was simply added to the next, cumulatively overloading the team. Team velocity never increases because more work is planned into a timebox.

Numerically, there seems to be no significant difference in our dataset between simple extrapolation of the number of done items on a backlog (per Release) and story point extrapolation, indicating that Story Points are pointless. Simply tracking the % of items done per Release is simpler and easier to communicate.
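
The item-count alternative can be sketched in a few lines. The function names and the use of a simple average completion rate are assumptions for illustration; the only inputs are how many backlog items were finished in each period and how many items the Release contains.

```python
from math import ceil


def percent_done(done_items, total_items):
    """Percentage of a Release's backlog items that are done."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return 100.0 * done_items / total_items


def periods_remaining(done_per_period, total_items):
    """Extrapolate periods to completion from the average items done per period.

    done_per_period: number of items completed in each finished period.
    Returns 0 if everything is done, or None if nothing has been completed
    yet (there is no rate to extrapolate from).
    """
    done = sum(done_per_period)
    remaining = total_items - done
    if remaining <= 0:
        return 0
    if done == 0:
        return None
    rate = done / len(done_per_period)  # average items completed per period
    return ceil(remaining / rate)
```

For example, a Release of 40 items with 4, 6, 5 and 5 items completed in the first four periods is 50% done and, at that rate, has about four periods left.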

As with any estimate, we recommend presenting it with an Uncertainty Indicator. Extrapolating progress from actual activity so far helps teams to answer “how long until it’s done?”. For more on tracking Story Points over time, see Workflow Metrics.

Importantly, in our examination of project data (including projects that did and didn’t use Story Points), we found no statistically significant difference between Velocity and simply the % Complete over time. We recommend against using Story Points and Velocity as a metric because:
  • Abstract numbers are difficult for people and leaders to understand
  • Their meaning changes over time as the Cone of Uncertainty is reduced in teams, undermining extrapolation
  • They imply false accuracy by being small, discrete numbers
  • They offer no additional benefit over simply counting items complete
  • They cannot be aggregated because they are team-defined and not normalized against any standard

We strongly recommend resisting schemes to normalize story points in organizations, and avoiding the same mistakes with other purely abstract measures (such as Business Value Points!).