Should Story Points (as a size measure) be comparable across teams?

Over the years, Agile teams have used story points as a relative size measure for user stories. A relative measure has no unit, so a story point does not equate to any unit such as a person-day of effort. It is intended purely for individual teams to compare and size stories, to measure their velocity, and to see how they improve over time. For this reason, organizations have been advised not to compare velocities across teams, as a story point can mean different things to different teams.

Most managers and teams find this concept difficult to implement. They have always estimated work by the effort it takes, in days or hours, and not by “what” a feature or work unit represents in size. Their argument is “if I can estimate effort, why do you need size?” It is a fair argument, but there are two issues with it. First, estimating stories or features in effort with any degree of accuracy is difficult at the beginning of a release, when most details are not known. Second, if I don’t estimate size, how do I know whether I am delivering more or less work over a period of time? You could count the number of features delivered in each period, but not all features are the same in value or in what they take to deliver.

One way teams have got around this is by explicitly tying story points to effort. For example, some organizations have set one story point (SP) equal to one day of effort, so they estimate story sizes from effort during planning. If a story takes 3 days of effort overall, then it is 3 story points, and so on. Once stories are sized this way, the team has a planned velocity for an iteration. During execution, actual effort and actual velocity are tracked. From here on, some mature teams use actual versus planned figures to drive improvements in estimation, and many teams subsequently use velocity-driven planning for making commitments (based on the concept of “yesterday’s weather”).
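The planned-versus-actual tracking and the “yesterday’s weather” idea described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample numbers, and the choice of averaging the last three iterations are assumptions, not something prescribed here. Setting `window=1` gives the strict form, where the forecast is simply the last iteration's actual velocity.

```python
from statistics import mean


def yesterdays_weather(actual_velocities, window=3):
    """Forecast the next iteration's velocity from recent actuals.

    Assumption: the forecast is the mean of the last `window` actual
    velocities; window=1 means "whatever we delivered last time".
    """
    if not actual_velocities:
        raise ValueError("need at least one completed iteration")
    return mean(actual_velocities[-window:])


def estimation_gap(planned, actual):
    """Actual minus planned velocity, per iteration (negative = overcommitted)."""
    return [a - p for p, a in zip(planned, actual)]


# Hypothetical data for four iterations, in story points.
planned = [20, 20, 22, 24]
actual = [15, 18, 21, 24]

gaps = estimation_gap(planned, actual)   # how far off each plan was
forecast = yesterdays_weather(actual)    # basis for the next commitment
```

The gap list is what a team would review in a retrospective; the forecast is what it would commit to next, regardless of how many person-days are nominally available.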

There are some possible issues with this approach. The first is a fundamental one: should size estimates drive effort estimates, or should effort estimates drive size? While effort driving size defies traditional software engineering thinking, I would actually not be worried about that. My response to this issue is another question: do you really need any effort estimates when you make a higher-level plan (such as at a release level) and make higher-level commitments? I don’t believe so, and I will share one of my own experiences a little later to support this viewpoint.

The second issue is a little more serious: whose estimate of effort do you use for sizing? That of a highly experienced developer, an inexperienced one, or someone of average experience? How do you define an experienced or an averagely experienced developer? How do you reconcile the differences in estimates given by different people during the planning exercise? How do you decide what commitments to make for an iteration: based on available capacity or on past velocity? If commitments are based on available capacity, how do you know whether you are improving over time (assuming capacity does not change), since the SPs committed will always match available capacity? On top of that, with people changing over time and each estimating size from their own perception of effort, the same story could be estimated completely differently at different times by different people.
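The capacity problem raised above can be made concrete with a small Python sketch. Everything in it is hypothetical: the team size, the iteration length, and the story sizes are invented purely to show the arithmetic. It assumes the one SP equals one day convention, under which points committed can only ever mirror the days available.

```python
def capacity_based_commitment(people, days_per_iteration):
    """With 1 SP = 1 day of effort, committed points equal available person-days."""
    return people * days_per_iteration


def relative_velocity(completed_story_sizes):
    """With relative sizing, velocity is the sum of the sizes of finished stories."""
    return sum(completed_story_sizes)


# Hypothetical team of 5 people on 10-day iterations, unchanged over time.
effort_view = [capacity_based_commitment(5, 10) for _ in range(3)]

# The same team under relative sizing: story sizes stay fixed,
# but more stories get finished as the team improves.
relative_view = [
    relative_velocity([3, 5, 8, 5]),
    relative_velocity([3, 5, 8, 5, 3]),
    relative_velocity([3, 5, 8, 5, 3, 5]),
]
```

In this made-up example the effort-based series is flat at 50 for every iteration, so it cannot show improvement, while the relative series rises from 21 to 29.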

The big question is: why are we doing this? The only reason organizations give is to arrive at a standard for sizing and estimation, and to compare the productivity of teams. With so many variables involved (and we have not really talked about all of them here), would you get any meaningful comparison of productivity across teams? Such comparisons also lead to undesirable changes in behavior. Those I have seen include teams estimating defensively and building in buffers, conflicts between product owners (POs) and teams over estimates, conservative commitments, and inaccurate reporting of data by the team, to name a few.

To me the focus of leadership and management should be to address more fundamental issues – those include providing the right environment, infrastructure and support for teams to succeed with Agile. If this is done right, outcomes will take care of themselves.

Let me end by sharing my personal experience of coaching a product group several years ago. I was coaching two teams in India that belonged to this product group, and there were two other teams based in the US working on the same product. I introduced the teams to relative size estimation and planning poker, and we used this approach during their first release planning exercise. We used a reference story (sized at 3 points) to size all stories for the release relative to it, using the Fibonacci scale. After prioritizing the stories and identifying their dependencies, the teams decided on a velocity they believed they could achieve and made a release commitment based on that. A similar process was followed for subsequent releases, with the learning from each release feeding into velocity-driven planning for the next. The teams were able to commit to a higher velocity with every subsequent release, based on their own experience and comfort level.
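For readers who want to see the mechanics, here is a minimal Python sketch of a velocity-driven release commitment of the kind described above. It is not the teams' actual process: the story names, sizes, velocity, and number of iterations are invented, dependencies are ignored, and the rule of stopping at the first story that does not fit is an assumption made to respect priority order.

```python
def plan_release(prioritized_stories, velocity, iterations):
    """Commit stories in priority order until the release budget is used up.

    prioritized_stories: list of (name, points), highest priority first.
    velocity: points the team believes it can deliver per iteration.
    iterations: number of iterations in the release.
    """
    budget = velocity * iterations
    committed, total = [], 0
    for name, points in prioritized_stories:
        if total + points > budget:
            break  # assumption: never skip ahead of a higher-priority story
        committed.append(name)
        total += points
    return committed, total


# Hypothetical backlog sized against a 3-point reference story.
backlog = [("login", 3), ("search", 8), ("export", 5), ("reports", 13), ("audit", 2)]

committed, total = plan_release(backlog, velocity=10, iterations=2)
```

With a budget of 20 points, the sketch commits the first three stories (16 points) and stops at the 13-point story. No effort estimate appears anywhere in the calculation.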

Over the entire 2 years, during which 6 to 7 releases of the product were made, the teams used only velocity-driven planning, with no effort estimates involved whatsoever in sizing stories or making release commitments. Effort estimation was confined to iteration planning at the task level, and there was never any intent to match story sizes to effort estimates.

At the end of the 2-year period, both teams in India saw a 60-80% improvement in their velocities from when they started. And all of this was done with only the teams driving their improvements. Management did not ask for metrics on productivity, did not interfere with the teams’ working style, and did not impose any process (such as effort-driven planning and commitments). On the contrary, they provided the teams with all the support they needed to succeed, and the results took care of themselves.

What do you think?
