PEARL XIX : Effective Steps to reduce technical debt: An agile approach

PEARL XIX : Effective Steps to reduce technical debt: An agile approach

In every codebase, there are the dark corners and alleys developers fear. Code that’s impossibly brittle; code that bites back with regression bugs; code that when you attempt to follow, will drive you beyond chaos.

Ward Cunningham created a beautiful metaphor for the hard-to-change, error-prone parts of code when he likened it to financial debt. Technical debt prevents you from moving forward, from profiting, from staying “in the black.” As in the real world, there’s cheap debt, debt with an interest lower than you can make in a low-risk financial instrument. Then there’s the expensive stuff, the high-interest credit card fees that pile on even more debt.

The impact of accumulated technical debt can be decreased efficiency, increased cost, and extended delays in the maintenance of existing systems. This can directly jeopardize operations, undermining the stability and reliability of the business over time. It also can stymie the ability to innovate and grow

DB Systel, a subsidiary of Deutsche Bahn, is one of Germany’s leading information technology and communications providers, running approximately 500 high-availability business systems for its customers. In order to keep this complex environment—a mix of packaged and in-house–developed systems that range from mainframe to mobile—running efficiently while continuing to address the needs of its customers, DB Systel decided to embed processes and tools within its development and maintenance activities to actively address its technical debt.

DB Systel’s software developers have employed new tools during development so they can detect and correct errors more efficiently. Using a software analysis and measurement platform from CAST, DB Systel has been able to uncover architectural hot spots and transactions in its core systems that carry significant structural risk. DB Systel is now better able to track the nonfunctional quality characteristics of its systems and precisely measure changes in architecture- and code-level technical debt within these applications to prioritize the areas with highest impact.

By implementing this strategy at the architecture level, DB Systel has seen a reduction in time spent on error detection and an increased focus on leading-practice development techniques. The company also noticed a rise in employees’ intrinsic motivation as a result of using CAST. With an effective technical debt management process in place, DB Systel is mitigating the possibility of software deterioration while also enriching application quality.

Technical debt is a drag. It can kill productivity, making maintenance annoying, difficult, or, in some cases, impossible. Beyond the obvious economic downside, there’s a real psychological cost to technical debt. No developer enjoys sitting down to his computer in the morning knowing he’s about to face impossibly brittle, complicated source code. The frustration and helplessness thus engendered is often a root cause of more systemic problems, such as developer turnover— just one of the real economic costs of technical debt.

However, the consequences of failing to identify and measure technical debt can be significant. An application with a lot of technical debt may not be able to fulfill its business purpose and may never reach production. Or technical debt may require weeks or months of remedial refactoring before the application emerges into production. At best, it could reach production, but be limited in its ability to meet users’ needs.

Every codebase contains some measure of technical debt. One class of debt is fairly harmless: byzantine dependencies among bizarrely named types in stable, rarely modified recesses of  system. Another is sloppy code that is easily fixed on the spot, but often ignored in the rush to address higher-priority problems. There are many more examples.

This section outlines a general workflow and several tactics for dealing with the high-interest debt

In order to fix technical debt, team need to cultivate buy-in from stakeholders and teammates alike. To do this,they need to start thinking systemically. Systems thinking is long-range thinking. It is investment thinking. It’s the idea that effort you put in today will let you progress at a predictable and sustained pace in the future.

Technical debt (also known as design debt or code debt) is a neologism metaphor referring to the eventual consequences of poor software architecture and software development within a code-base. The debt can be thought of as work that needs to be done before a particular job can be considered complete. If the debt is not repaid, then it will keep on accumulating interest, making it hard to implement changes later on. Unaddressed technical debt increases software entropy.

As a change is started on a codebase, there is often the need to make other coordinated changes at the same time in other parts of the codebase or documentation. The other required, but uncompleted changes, are considered debt that must be paid at some point in the future. Just like financial debt, these uncompleted changes incur interest on top of interest, making it cumbersome to build a project. Although the term is used in software development primarily, it can also be applied to other professions.

Common causes of technical debt include (a combination of):

  • Business pressures, where the business considers getting something released sooner before all of the necessary changes are complete, builds up technical debt comprising those uncompleted changes
  • Lack of process or understanding, where businesses are blind to the concept of technical debt, and make decisions without considering the implications
  • Lack of building loosely coupled components, where functions are hard-coded, when business needs change, the software is inflexible.
  • Lack of test suite, which encourages quick and risky band-aids to fix bugs.
  • Lack of documentation, where code is created without necessary supporting documentation. That work to create the supporting documentation represents a debt that must be paid.
  • Lack of collaboration, where knowledge isn’t shared around the organization and business efficiency suffers, or junior developers are not properly mentored
  • Parallel development at the same time on two or more branches can cause the build up of technical debt because of the work that will eventually be required to merge the changes into a single source base. The more changes that are done in isolation, the more debt that is piled up.
  • Delayed refactoring – As the requirements for a project evolve, it may become clear that parts of the code have become unwieldy and must be refactored in order to support future requirements. The longer that refactoring is delayed, and the more code is written to use the current form, the more debt that piles up that must be paid at the time the refactoring is finally done.
  • Lack of knowledge, when the developer simply doesn’t know how to write elegant code.

“Interest payments” are both in the necessary local maintenance and the absence of maintenance by other users of the project. Ongoing development in the upstream project can increase the cost of “paying off the debt” in the future. One pays off the debt by simply completing the uncompleted work.

The build up of technical debt is a major cause for projects to miss deadlines. It is difficult to estimate exactly how much work is necessary to pay off the debt. For each change that is initiated, an uncertain amount of uncompleted work is committed to the project. The deadline is missed when the project realizes that there is more uncompleted work (debt) than there is time to complete it in. To have predictable release schedules, a development team should limit the amount of work in progress in order to keep the amount of uncompleted work (or debt) small at all times.

“As an evolving program is continually changed, its complexity, reflecting deteriorating structure, increases unless work is done to maintain or reduce it.”
— Meir Manny Lehman, 1980
While Manny Lehman’s Law already indicated that evolving programs continually add to their complexity and deteriorating structure unless work is done to maintain it, Ward Cunningham first drew the comparison between technical complexity and debt in a 1992 experience report:

“Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.”
— Ward Cunningham, 1992
In his 2004 text, Refactoring to Patterns, Joshua Kerievsky presents a comparable argument concerning the costs associated with architectural negligence, which he describes as “design debt”.

“…doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical  debt incurs interest payments, which come in  the form of the extra effort that we have to do in  future development because of the quick and dirty  design choice. We can choose to continue paying  the interest, or we can pay down the principal by  refactoring the quick and dirty design into the  better design. Although it costs to pay down the
principal, we gain by reduced interest payments  in the future.”

–Martin Fowler

Technical Debt” refers to delayed technical  work that is incurred when technical short  cuts are taken, usually in pursuit of calendar driven software schedules. Just like financial  debt, some technical debts can serve valuable  business purposes. Other technical debts are  simply counterproductive. The ability to take on
debt safely, track their debt, manage their debt, and pay down their debt varies among different  organizations. Explicit decision making before  taking on debt and more explicit tracking of debt  are advised.

–Steve McConnell

Activities that might be postponed include documentation, writing tests, attending to TODO comments and tackling compiler and static code analysis warnings. Other instances of technical debt include knowledge that isn’t shared around the organization and code that is too confusing to be modified easily.

In open source software, postponing sending local changes to the upstream project is a technical debt

 The basic workflow for tackling technical debt—indeed any kind of improvement—is repeatable. Essentially, there are four:

  1. Identify where are the debt. How much is each debt item affecting  company’s bottom line and team’s productivity?
  2. Build a business case and forge a consensus on priority with those affected by the debt, both team and stakeholders.
  3. Fix the debt  on the  chosen item, head on with proven tactics.
  4. Repeat. Go back to step 1 to identify additional debt and hold the line on the improvements  made.

Agile Approach to Technical Debt

Involve the Product Owner and “promote” him to be the sponsor of technical debt reduction.

Sometimes it’s hard to find debt, especially if a team is new to a codebase. In cases where there’s no collective memory or oral tradition to draw on, team can use a static analysis tool such as NDepend (ndepend.com) to probe the code for the more troublesome spots.

Determining test coverage  can be another valuable tool for discovering hidden debt.

Use the log feature of  version control system to generate a report of changes over the last month or two. Find the parts of the system that receive the most activity, changes or additions, and scrutinize them for technical debt. This will help to find the bottlenecks that are challenging  today; there’s very little value in fixing debt in those parts of your system that change rarely.

Inventory and structure known technical debt

Having convinced the product owner it is time to collect and inventory known technical problems and map them on a structure that visualizes the system and project landscape.

It is not about completely understanding all topics. It is about finding a proper structure, identifying the most important issues and mapping those onto the structure. It’s about extracting knowledge about the systems from the heads to develop a common picture of existing technical problems.

Therefore write the names / identifiers of all applications and modules individual own on cards. These cards shall be pinned on a whiteboard. In the next step extract to-do’s (to solve existing problems) from all documentation media used (wiki, jira, confluence, code documentation, paper), write them on post-its and stuck them next to the application name it belongs to.This board shall be accessible to all team members over a period of some days. Every team member was responsible to complete, restructure, and correct the board during this period so that they could go on with a round portfolio of the existing debt in the systems.

Having collected and understood the work to reduce the technical debt within the systems the team now need a baseline for defining a good strategy – a repayment plan. Therefore  Costs and benefits shall be estimated. 

Obtaining consensus is key. We want the majority of team members to support the current improvement initiative e selected.Luke Hohmann’s “Buy a Feature” approach from his book Innovation Games (innovationgames.com) will help to get consensus.

  1. Generate a short list (5-9 items) of things you want to improve. Ideally these items are in your short-term path.
  2. Qualify the items in terms of difficulty. we can use the abstract notion of a T-shirt size: small, medium, large or extra-large
  3. Give your features a price based on their size. For example, small items may cost $50, medium items $100, and so on.
  4. Give everyone a certain amount of money. The key here is to introduce scarcity into the game. You want people to have to pool their money to buy the features they’re interested in. You want to price, say, medium features at a cost where no one individual can buy them. It’s valuable to find where more than a single individual sees the priority since you’re trying to build consensus.
  5. Run a short game, perhaps 20 or 30 minutes in length, where people can discuss, collude, and pitch their case. This can be quite chaotic and also quite fun, and you’ll see where the seats of influence are in your team.
  6. Review the items that were bought and by what margins they were bought. You can choose to rank your list by the purchased features or, better yet, use the results of the Buy a Feature game in combination with other techniques, such as an awareness of the next release plan.

Taking on some judicious technical debt can be an appropriate decision to meet schedules or to prototype a new feature set, as long as the decision was made with a clear understanding of the costs involved later in the project, such as code refactoring.

As Martin Fowler says, “The useful distinction  isn’t between debt or non-debt, but between prudent and reckless  debt.”

techdebt-state.png

Technical debt actually begets more tech debt over time, and its state diagram is depicted.

Load Testing as a Practice to identify Technical Debt

Load testing exposes weaknesses in an application that cannot be found through traditional functional testing. Those weaknesses are generally reflected in the application’s inability to scale appropriately. Testers are also typically  already planning to perform load testing at some point prior to  the production release of the application.

Load testing involves enabling virtual users to execute predetermined actions simultaneously against the application. The  scripts exercise features either singly or in sequences expected  to be common among production users.

Load testing looks at the characteristics of an application under a simulated load, similar to the way it might operate in a production environment. At the highest level, it determines if an application will support the number of simultaneous users specified in the project requirements.

However, it does more than that. By looking at system characteristics as you increase the number of simultaneous users, you  can make some useful statements regarding what resources  are being stressed, and where in the application they are being stressed. With this information, the team can identify weaknesses in the application that are generally the result of incurring  technical debt, therefore providing the basis for identifying the  debt.

Some automation and measurement tools are required to successfully identify and assess technical debt with load testing.

Coding / Testing Practices

Management has to make the time  through proactive investment, but so does the team. Each team member needs to invest in their own knowledge and  education on how to write clean code, their business domain  and how to do their jobs optimally. While teams learn during  the project through retrospectives, design reviews and pair  programming, teams should learn agile engineering practices  for design, development and testing. Whether through courses,  conferences, user groups, podcasts, web sites or books – there  are many options for learning better coding practices to reduce technical debt

Design Principles and Techniques

Additionally, architects need to learn about evolutionary design principles and refactoring techniques for fixing poor designs today and building better designs tomorrow. Lastly, a governance group should meet periodically to review performance and plan future system
changes to further reduce technical debt.

Definition of Done

Establish a common “definition of done” for each requirement, user story or use case and ensure its validated with the business before development begins. A simple format  such as “this story is done when: <list of criteria>” works well. The Product Owner presents “done” to the Developers, User
Interface Designers, Testers and Analysts and together they collaboratively work out the finer implementation details. Set expectations with developers that only stories meeting “done” (as validated by the testers) will be accepted and contribute towards velocity. Similarly, set expectations with management and analysts that only stories that are “ready” are scheduled for development to ensure poor requirements don’t cause further technical debt.

In all popular languages and platforms today, open source and commercial tools are available to automate builds, the continuous integration of code changes, unit testing, acceptance testing, deployments, database setup, performance testing and many other common manual activities. In addition to reducing manual effort, automation reduces the risk of mistakes and over-reliance on one individual for performing critical activities. First setup automated builds ( Ant, nAnt or rake), followed by continuous integration ( Hudson). Next setup automated unit testing (JUnit, NUnit or RSpec) and acceptance testing ( FitNesse and Selenium). Finally setup automated deployments (r Capistrano or custom
shells scripts,). It’s amazing what a few focused team members can accomplish in a relatively short period of time if given time to focus on automating common activities to reduce technical debt.

Consider rating and rewarding developers on the quality of their code. In some cases, fewer skilled developers may be better than volumes of mediocre resources whose work may require downstream reversal of debt. Regularly run code complexity reviews and technical debt assessments, sharing the results across the team. Not only can specific examples help the team improve, but trends can signal that a project is headed in the wrong direction or encountering unexpected complexity.

Advertisements

PEARL VII : Agile metrics measurement for Software Quality

PEARL VII :

The metrics for measuring software quality in an agile environment are drastically different from that of those used in traditional IT landscapes.  How to effectively measure Agile S/w development  Quality in Agile Enterprise using Agile Metrics  is dealt here.

 

Agile software development grew in part from the intersection of  erstwhile alternative practices that predated it. And even more  derivatives had sprung up throughout the years. This had led to  the evolution of Agile in its current state. Wherein, all the defined  practices becomes a catalog from which development teams could choose from, and adapt what is appropriate for their case. This evolution then leads to the fact that development teams will  not easily have uniform processes. Forcing uniformity will most  likely negate the benefits of Agile. That said, most metrics are  intimately tied to actual practices, but measurements should be  able to cope with this perceived variability

The traditional metrics are also in conflict with agile’s and  lean’s principles. For example, a focus on adherence to  estimates is incompatible with agile’s principle of embracing  change. It will lead to chasing obstacles, instead of seizing  opportunities.

 In Agile Management for Software Engineering: Applying the Theory of Constraints for Business Results, David Anderson combines TOC with Agile software development, with the objective of creating a process that “scales in scope and discipline to be acceptable in the boardrooms of the Fortune 1000″ . Anderson compares traditional “waterfall”, FDD, XP and RAD approaches, and proposes a rigorous metrics approach. 

Anderson provides a convincing argument for the traditional metrics inability to measure agile software  development . By demonstrating how they violate Reinertsen’s criteria for a good metric. Traditional metrics do not meet the criterion of being relevant, because of the high cost focus. The cost should not be the  main concern.  Moreover, they  elude the requirements of being simple and easy to collect. For  example, the once popular traditional metric to count the lines  of code has no simple correlation with the actual effort. The  software complexity results in a nonlinear function between the effort and the lines of code. It also motivate to squeeze in

It is impossible to create a metric set that would suit all agile projects. Every project has different goals and needs, and, as  the incremental and emergent nature of agile methods
[Mnkandla and28 Dwolatzky, 2007] implies that metrics – as part of the framework of development technologies – should also be allowed to emerge. There are, however, methods for choosing suitable metrics.

When measuring the production side of the development it is important to select metrics that support and reflect the financial counterparts.  The most commonly recommended agile production metrics, which are described in the following sections.

Total project duration: Agile projects get done quicker than traditional projects. By starting development sooner and cutting out bloatware — unnecessary requirements — agile project teams can deliver products quicker. Measure total project duration to help demonstrate efficiency

Time to market: Time to market is the amount of time an agile project takes to provide value, either through internal use or by generating income, by releasing working products and features to users.

Time to market is especially important for companies with revenue-generating products, because it aids in budgeting throughout the year. It’s also very important if you have a self-funding project— a project being paid for by the income from the product.

Total project cost: Cost on agile projects is directly related to duration. Because agile projects are faster than traditional projects, they can also cost less. Organizations can use project cost metrics to plan budgets, determine return on investment, and know when to exercise capital redeployment.

Return on investment: Return on investment (ROI) is income generated by the product, less project costs: money in versus money out. On agile projects, ROI is fundamentally different than it is on traditional projects. Agile projects have the potential to generate income with the very first release and can increase revenue with each new release.

New requests within ROI budgets: Agile projects’ ability to quickly generate high ROI provides organizations with a unique way to fund additional product development. New product features may translate to higher product income. If a project is already generating income, it can make sense for an organization to roll that income back into new development and see higher revenue.

Capital redeployment: On an agile project, when the cost of future development is higher than the value of that future development, it’s time for the project to end. The organization may then use the remaining budget from the old project to start a new, more valuable project.

Team member turnover: Agile projects tend to have higher morale. One way of quantifying morale is by measuring turnover through a couple metrics:

  • Scrum team turnover: Low scrum team turnover can be one sign of a healthy team environment. High scrum team turnover can indicate problems with the project, the organization, the work, individual scrum team members, burnout, ineffective product owners forcing development team commitments, personality incompatibility, a scrum master who fails to remove impediments, or overall team dynamics.

  • Company turnover: High company turnover, even if it doesn’t include the scrum team, can affect morale and effectiveness. High company turnover can be a sign of problems within the organization. As a company adopts agile practices, it may see turnover decrease.

Requirements and design quantification

Tom Gilb strongly believes that quantification of requirements is an essential concept missing from the agile paradigm, or even from software engineering in general. He claims that this lack is a risk for project failure, as software engineers and project managers cannot properly manage project results, control risks and costs, or prioritize tasks [Gilb and Cockburn, 2008]. Especially important are quality characteristics, because “functions and use cases are far less interesting” [Gilb and Cockburn, 2008] and “that is where most people have problems with quantification” [Gilb and Brodie, 2007]. Gilb assures that only numerically expressed quality goals are clear enough and therefore quantification is a step needed on the way from high level requirements to design ideas.
Design ideas also should be quantified [Gilb and Brodie, 2007]. It is done by the estimation of their value (to the customer – business value, not technical) and cost (effort). Then it is possible to identify the best designs – the ones with the highest value-to-cost ratio – and reprioritize the following development steps. A similar idea is described by Kile and Inampudi [2007]. The value and cost for implementing a requirement is estimated (without considering different design ideas) and the requirements are prioritized according to the result. This was a solution to a problem with the requirements prioritization ability of the customer and the development team. With some requirements having high value and high cost, and others having moderate value but very low cost, it was not easy to compare their desirability without quantification.

Metrics can also aid the application of agile practices. For example, in refactoring they can give information on the appropriate time for and the significance of a refactoring step [Kunz et al., 2008].

Heuristics for wise agile measurement [Hartmann and Dymond, 2006]

A good metric or diagnostic:

  1. Affirms and reinforces Lean and Agile principles.
  2. Measures outcome, not output.
  3. Follows trends, not numbers.
  4. Belongs to a small set of metrics and diagnostics.
  5. Is easy to collect.
  6. Reveals, rather than conceals, its context and significant variables.
  7. Provides fuel for meaningful conversation.
  8. Provides feedback on a frequent and regular basis.
  9. May measure Value (Product) or Process.
  10. Encourages “good-enough” quality.

Leffingwell argues that measurement should happen during iteration retrospectives and release retrospectives.

He proposes two categories of metrics: quantitative and qualitative. Quantitative metrics for an iteration consist of process metrics,  such  as  the  number  of  user   stories   accepted / rejected / rescheduled / added, and product metrics related to quality (defect count) and testing (number of test cases, percentage of automated test cases, unit test coverage).

Quantitative metrics for a release measure the release progress with value delivery (number of features delivered to the customer and their total value expressed in feature points, feature debt – existing customer commitments), conformance to release date, and technical debt (number of refactoring targets and number of refactorings completed, also called architectural debt or design debt). The qualitative assessment for both iteration and release requires listing what went well and what did not, revealing what should be continued and what needs to be improved.

Kunz et al. [2008] propose to combine software measurement with refactoring in order to indicate when a refactoring step is necessary, how important it is, how it affects quality, and what side effects it has. Metrics would be used here as triggers for needed refactoring steps.

Categories of Metrics

Quality

  • Defect Count
  • Technical Debt
  • Faults-Slip-Through
  • Sprint Goal success rate

Predictability

  • Velocity
  • Running Automated Tests

Value

  • Customer Satisfaction Survey
  • Business Value Delivered

Lean

  • Lead Time
  • Work In Progress
  • Queues

Cost

  • Average Cost Per Functions

Commonly recommended agile production metrics.

Lean Metrics

The selection of production metrics must carefully consider what has been advised in the previous sections. Inventory based metrics possess all these characteristics and give the advantage of addressing the importance of flow,  The most significant inventory based metrics are summarized below

Lead time – Relates to the financial metric Throughput. The lead time should be as short and stable as possible. It reduces the risk that the requirements become outdated and provides predictability. The metric is supported by Poppendieck, who states that the most important to measure is the ―concept-to-cash -time together with financial metrics .

 Queues – In software development queue time is a large part of the lead time. In contrast to the lead time, queue metrics are leading indicators. Large queues indicate that the future lead time will be long, which enables preventive actions. By calculating the cost of delay of the items in the queues, precedence can be given to the most urgent ones.

 Work in Progress – Constraining the WIP in different phases is one of the best ways to prevent large queues. If used in combination with queue metrics, WIP constraints prevent dysfunctional behavior such as simply renaming the objects in queues to work in progress. The metric is also an indicator of how well the team collaborates . A low WIP shows that the team works together on the same tasks. In addition, the Kanban method, which is built around the idea of constraining the WIP promises that it will result in an overall better software development,

These metrics can be visualized in a cumulative flow diagram, By tracking the investment’s way along the value chain towards becoming throughput, the inventory based metrics correlates well with the financial metric Investment.

Cost Metrics

Anderson argues the only cost metric needed is Average Cost Per Function (ACPF) and should only be used to estimate future operation expenses

Business Value Metrics

Agile software development puts the focus on the delivery of business value. Methods such as Scrum prioritize the work by value, making it sensible to measure the business value. It has also been observed that the trend in the industry is to measure value .

Hartmann notes that agile methods encourage the development to be responsible for delivering value rapidly and that the core metric should oversee this accountability. The quick delivery of value means that the investment is converted into value producing software as soon as possible. Leading metrics of business value includes estimations and is not an exact science.

Mike Cohn offers a possible solution to measure the business value , which involves dividing the business case’s value between the tasks. The delivery of value can be displayed in a Business Value Burnup Chart, One way to verify the delivery of business value, is to ask the customer if the features are actually used . It has proved useful to survey the customer over the time of a release and is much in line with the agile principle of customer cooperation.

 Quality Metrics

Lean metrics can indicate the products’ quality and provide predictability. For example, large queues in the implementation phase indicate poor quality and a stable lead time contributes to predictability. However, it might be necessary to supplement and balance them with more specific metrics.

A quality metric recommended by the agile community is Technical Debt .

Technical debt is a metaphor referring to the consequences of taking shortcuts in the software development. For example, code written in haste that is in need of refactoring. The debt can be represented in financial figures, which makes the metric suitable to communicate to upper management .

Technical metric measures the team’s overall technical indebtedness; known problems and issues being delivered at the end of the sprint. This is usually counted using bugs but could also be deliverables such as training material, user documentation, delivery media, and other

The counting of defects can be used as a quality metric. The defect count may occur in various stages of the development.

Counting defects in individual iterations can have a fairly large variation and may paint a misleading picture . Another aspect of defects is where they have been introduced. The fault-slips-through metric measures the test efficiency by where the defects should have been found and where it actually was . It monitors how well the test process works and addresses the cost savings of finding defects early. In case studies on implementation of lean metrics, the faults-slip-through has been recommended as the quality metric of choice

The primary purpose of measuring Faults Slip Through is to make sure that  the test process finds the right faults in the right phase, i.e. commonly earlier. Fault Slip Through represents the number of faults not  detected in a certain activity. These faults have instead been  detected in a later activity

Sprint goal success rates: A successful sprint should have a working product feature that fulfills the sprint goals and meets the scrum team’s definition of done: developed, tested, integrated, and documented.

Throughout the project, the scrum team can track how frequently it succeeds in reaching the sprint goals and use success rates to see whether the team is maturing or needs to correct its course.

EVM Metrics

AgileEVMTerms

AgileEVMTerms

AgileEVMMetrics

AgileEVMMetrics

 Predictability Metrics

What many organizations hope to gain from the measurement is predictability . In several of the agile methods the velocity of delivered requirements is used to achieve predictability and estimate the delivery capacity. The average velocity can serve as a good predictability metric, but can easily be gamed if used for other purposes. For example, Velocity used to measure productivity can degrade the quality.

Velocity is a capacity planning tool sometimes used in Agile software development. Velocity tracking is the act of measuring said velocity. The velocity is calculated by counting the number of units of work completed in a certain interval, the length of which is determined at the start of the project

The main idea behind velocity is to help teams estimate how much work they can complete in a given time period based on how quickly similar work was previously completed.

The following terminology is used in velocity tracking.

Unit of work
The unit chosen by the team to measure velocity. This can either be a real unit like hours or days or an abstract unit like story points or ideal days. Each task in the software development process should then be valued in terms of the chosen unit.
Interval
The interval is the duration of each iteration in the software development process for which the velocity is measured. The length of an interval is determined by the team. Most often, the interval is a week, but it can be as long as a month.

To calculate velocity, a team first has to determine how many units of work each task is worth and the length of each interval. During development, the team has to keep track of completed tasks and, at the end of the interval, count the number of units of work completed during the interval. The team then writes down the calculated velocity in a chart or on a graph.

The first week provides little value, but is essential to provide a basis for comparison.Each week after that, the velocity tracking will provide better information as the team provides better estimates and becomes more used to the methodology.

Velocity chart

Velocity chart

ss_iteration_burnup_2

The Velocity Chart shows the amount of value delivered in each sprint, enabling you to predict the amount of work the team can get done in future sprints. It is useful during your sprint planning meetings, to help you decide how much work you can feasibly commit to.

You can estimate your team’s velocity based on the total Estimate (for all completed stories) for each recent sprint. This isn’t an exact science — looking at several sprints will help you to get a feel for the trend. For each sprint, the Velocity Chart shows the sum of the Estimates for complete and incomplete stories. Estimates can be based on story points, business value, hours, issue count, or any numeric field of your choice . Please note that the values for each issue are recorded at the time the sprint is started.

 Running Automated Tests measures the productivity by the size of the product . It counts test points defined as each step in every running automated test. The belief is that the number of tests written is better in proportion to the requirement’s size, than the traditional lines-of-code metric. The metric addresses the risk of neglected testing, which is usually associated with productivity metrics. It motivates to write tests and to design smaller, more adaptive tests. Moreover, it has proven to be a good indicator of the complexity and to some extent on the quality . For measuring release predictability, Dean Leffingwell proposes to measure the projected value of each feature relative to the actual . However, the goal should not be to achieve total adherence. Instead, the objective should be to stay within a range of compliance to plan, which allows for both predictability and capturing of opportunities.

Actual Stories Completed vs. Committed Stories

The measure is taken by comparing the number of stories committed to in sprint planning and the number of stories identified in the sprint review as completed.

Visualization

To get the full value of agile measurement, the metrics need to be acted upon. The visualization of the metrics helps to ensure that actions are taken and achieves transparency in the organization . The company’s strategies become communicated and the coordination increases.

Spider Chart

Spider Chart

In Kanban the visualization of the workflow is an important activity and facilitates self-organizational behavior . For example, when a bottleneck is shown the employees tend to work together to elevate the bottleneck. Both Kanban and Scrum use card walls to visualize the work flow where each card represents a task and its current location in the value chain. The inventory based metrics can then be collected using the card walls. A very effective way to visualize the inventory based metrics is cumulative flow diagrams .

The cumulative flow diagram is an area graph, which shows the workflow on a daily basis.

Cumulative Flow Diagram

Are you a manager or business stakeholder working on an Agile points are the unit of measure used by this team for estimation Scrum Project and facing the following issues?

You have a strong feeling that there are bottlenecks in the process but are facing a lot of difficulty in mapping it to the process.

You are not satisfied with the burn-up and burn-down charts produced by the team and are interested in getting more insight into the process.
There is a panacea to solve all of the above issues. The name of the magic potion is “cumulative flow diagram”. Cumulative flow diagrams (CFDs) applied on top of the basic principles of KAN (visual) + Ban (card) can give you an insight into the project and keep everyone updated. CFD can be an extremely powerful tool when applied to a Scrum model.

A Cumulative Flow Diagram (CFD) is an area chart that shows the various statuses of work items for a product, version, or sprint. The horizontal x-axis in a CFD indicates time, and the vertical y-axis indicates cards (issues). Each coloured area of the chart equates to a workflow status (i.e. a column on your board).

A CFD can be useful for identifying bottlenecks. If your chart contains an area that is widening vertically over time, the column that equates to the widening area will generally be a bottleneck.

Multi-color CFDS look complicated but pretty. The pain of understanding it is worth the gain you get from it. These diagrams can help you in making critical business decisions. They will help give you better visibility regarding the time to market dates for features. Applying it on top of a Scrum project will help you see an accurate picture of the progress on your project.
Giving CFD to this team will help them in following the “Inspect and Adapt” Scrum principle. They can further zoom into the work in progress to see various flow states. CFD will help you in analyzing the actual progress and bottlenecks in any project. CFD can be drawn using area charts in MS Excel.

Workflow for CFD

Workflow for CFD

Cumulative Flow Diagram

Cumulative Flow Diagram

There are many other usages of CFDs besides finding bottlenecks. CFDs are multi-utility graphs that continuously report the true status of an Agile project. CFDs can help in determining lead time, cycle time, size of backlog, WIP, and bottlenecks at any point in time
Lead time is the time from when the feature entered  backlog to its completion status. This is of utmost interest to business stakeholders. It can help business people decide about the time  to market for features. They can plan marketing campaigns based  on lead times.
Cycle time is the time taken by team from when they started work  on it to completion status. This helps the project leads make important decisions in selecting items for working.
WIP is the work currently lying in different stages of the software lifecycle. Cycle time is directly proportional to the work in progress items. Keeping a limit on WIP is a key to success in any Agile project. In Scrum we try to limit the WIP within a sprint, but limiting
it across the various flow states will help further in gaining better control and visibility in a sprint.
Controlling WIP is the mantra for victory in any project. With CFDs, WIP is no longer a black box and anyone can see the work distribution at any point in time. Thus CFDs provide better insight and the power for better governance in any Agile methodology.A single diagram can contain information about lead time, WIP, queues and bottlenecks.

In Scrum, the Burndown Chart is a standard artifact. It allows the teams to monitor its progress and trends. The Burndown Chart tracks completed stories and the estimated remaining work. There are also variations of the Burndown Chart . For example, the Burnup Chart contains information about scope changes. For even better predictability, story points may be used. The stories are assigned points by the estimated effort to implement them.

A Burndown Chart shows the actual and estimated amount of work to be done in a sprint. The horizontal x-axis in a Burndown Chart indicates time, and the vertical y-axis indicates cards (issues).

Iteration Burn Down chart

Iteration Burn Down chart

Use a Burndown Chart to track the total work remaining and to project the likelihood of achieving the sprint goal. By tracking the remaining work throughout the iteration, a team can manage its progress and respond accordingly.

To communicate the KPIs, many organizations use Balanced Scorecards or Dashboards Dashboards are used to effectively monitor, analyze and manage the organization’s performance . The level of detail of the dashboards varies, ranging from graphical high-level KPIs to low-level data for root cause analysis. In order to communicate and facilitate that metrics are acted upon the measurement practice should: Visualize the metrics to achieve transparency. Be careful to not create dysfunctional behavior with the visualization.

Continuous Improvement

Kaizen is the Japanese word for continuous improvement and is a part of the lean software development . It is also found in agile software development. For example, Scrum has retrospectives after each sprint where improvements are identified. The retrospectives have similarities to Deming’s Plan-Do- Check-Act (PDCA) .

The PDCA is a cycle of four phases, which should drive the continuous improvement. What is notable is that the PDCA prescribe measurement to verify that improvements are achieved. Petri Heiramo observes that the retrospectives lack measurements and argues that it can lead to undesirable results . Without any metrics, it will be difficult to determine whether any targets have been met. This in turn can be demoralizing for the commitment to the improvement efforts.

Heiramo, suggest that these three questions should be added to the retrospective: What benefit or outcome do we expect out of this improvement/change? How do we measure it? Who is responsible for measuring it? Diagnostics can be used to obtain these measurements, In order for the diagnostics to achieve process improvement, the measurement practice should Be an integrated part of a process improvement framework.

Agile Metrics at  MAMDAS – a software development unit in the Israeli Air Force 

MAMDAS – a software development unit in the Israeli Air Force develops large scale , enterprise critical applications for Israeli Airforce. The project is developed by a team of 60 skilled developers and testers, organized in a hierarchical structure of small groups.
The project develops large-scale, enterprise-critical software, intended to be used by a large and varied user population. During December 2004, the first XP team was established for MAMDAS to implement XP . It was encouraging to observe that after the first two weeks iteration “managers were very surprised to see something running” and everyone agreed that “the pressure to deliver every two weeks leads to amazing results”  Still, accurate metrics are required in order to take professional decisions, to analyze long-term effects, and to increase confidence of all management levels with respect to the process that XP inspires.

They described four metrics and the kinds of data that are gathered to calculate them. These four metrics present information about the amount and quality of work that is performed, about the pace of the work progresses, and about the status of the remaining
work versus remaining human resources.

Product size, initially just called ‘Product’, is the first metrics. It aims at presenting the amount of completed work. The data that was selected to reflect the amount of work is the number of test points. One test point is defined as one test step in an automatic acceptance testing scenario  or as one line of unit tests. The number of test points is calculated for all kinds of written test and is gathered per iteration per component. Additional information is gathered with respect to the number of test points for tests that pass, the number of points for tests that fail, and the number of points for tests that do not run at all.

Pulse is the second metrics, which aims to measure how continuous the integration is. The data is automatically gathered from the development environment by counting how many check-in operations occur per day. The data is gathered for code check-ins, automatic-test check-ins, and detailed specifications check-ins. When referring to code it means code plus its unit test.

Burn-down is the third metrics. It presents the project remaining work versus the remaining human resources. This metrics is supported by the main planning table that is updated for each task according to kinds of activities (code, tests, or detailed specifications), dates of opening and closing, estimate and real time of development and, the component that it belongs to. In addition, this metrics is supported with the human resources table that is updated when new information regarding teammates’ absence arrives. This table also contains the product’s component assigned to each of the teammates and with the percentages of her/his position in the project. By using the data of these tables, this metrics can present the remaining work in days versus the remaining human resources in days. This information can be presented per week or for any group of weeks till a complete release, both for the entire team or for any specific component.

The burn-down graph answers a very basic managerial question: are we going to meet the goals of this release, and if not, what can we do about it? Release goals were set before each release – each goal is a high-level feature. Goals are defined by the user, and are verified by matching a rough estimate of the effort required to complete each goal (given by the development team) to the total available resources.
Once goals are defined and estimated, both remaining work and remaining resources are based on this initial estimation, which is refined as the release progresses.

Faults is the fourth metrics, which counts faults per iteration. During the release on which  all faults that were discovered in a specific iteration were fixed at the beginning of the next iteration. The faults metrics is required to continuously metrics the product’s quality. Note that the product size metrics doesn’t do it, since although it metrics test points, it does not correlate between the number of failed or un-run test steps to the number of actual bugs.The feedback on the Size Metrics was that it motivates writing tests and that it can be referred also as complexity metrics.

The use of the presented metrics mechanism increases confidence of the team members as well of the unit’s management with respect to using agile methods.Further, these metrics enable an accurate and professional decision making process for both short- and long-term purposes.

Technical Debt in Petrobras , Brazil

As an oil and gas company, Petrobras (http://www.petrobras.com.br) develops software in areas which demands increasingly innovative solutions in short time intervals. The company started officially with Scrum in March of 2009, using its lightweight framework to create collaborative self-organizing teams that could effectively deliver products. After the first team had adopted Scrum with a relative success, the manager noticed that the framework could be used in other teams, and thus he invested in training and coaching so that the teams could also have the opportunity to try the methodology. At that time, only the software development department for E&P, whose software helps to Exploit and Produce oil and gas, had management endorsement in adopting scrum and agile practices that would let teams deliver better products faster. About one year and half later, all teams in the department were using Scrum as its software development process. The developers and the stakeholders in general noticed expressive gains with the adoption of Scrum.
The results had varied from the skill of the team leadership in agile methodologies, customer participation, level of collaboration between team members, technical expertise among other factors.
The architecture team of the software development department for E&P was composed of four employees, whose responsibility was to help teams and offer support for resolution of problems related to agile methods and architecture. At that time, it had to work with 25 teams which had autonomy regarding its technical decisions. In fact, autonomy was one of the main managerial concerns when adopting agile methods.
After Scrum adoption, there was active debate, training and architectural meetings about whether Agile engineering practices should also be adopted in parallel with managerial practices; in hindsight, it would have accelerated the benefits had they been adopted. But the constraints of time and budget, decisions made by non- technical staff, and the bureaucracy in areas such as infrastructure and database, led to the postponement of those efforts initially. Moreover, the infrastructure area had only build and continuous integration (CI) tools available. And, unfortunately, these tools were not taken seriously by the teams. The automated deployment was relatively new and was postponed because of fear of implementing it in immature phase. Other tools and monitoring mechanisms were not used by the teams even so the architecture team was aware of its possible benefits.
Despite all initiatives in training and supporting in agile practices such as configuration management, automated tests and code analysis, teams, represented by 25 focal points in architectural meetings, did not show much interest in adopting many agile practices – particularly technical practices. Delivering the product on the date agreed with the customer and maintaining the legacy code were the most urgent issues. Analyzing retrospectively it seems that the main cause for this situation was that debt was getting accrued unconsciously. Serving the client was a much more visible and imperative goal. This can be one explanation for the ineffectiveness of prior attempts in introducing technical practices. Be it by the means of specialized training or by the support of the architecture team.
With these not so effective attempts to promote continuous improvement with teams, the architecture team sought a way to motivate them to experiment agile practices without a top-down “forced adoption”. The technical debt metaphor,  was the basis for the approach.

Given the context aforementioned, especially regarding the role of the architecture team serving various teams in parallel and the fact that the teams have autonomy in its technical decisions, there was a need for the technical debt estimation and visualization in a “macro-level”, i.e., not only associated with source code aspects but
with technical practices involving the product in general. This would give the opportunity to see the actual state of the department and indicate the “roadmap” for future interaction with the development teams.
The actions involved in introducing this kind of visualization and the management activities based on that visualization. The architecture team modeled a board, where the lines corresponds to teams and the columns are the categories and subcategories of technical debt, based on the work of Chris Sterling .In each cell, formed by the pair team x technical debt category, the maturity of the team was evaluated according to predefined criteria. They used the colors red, yellow and green to show the compliance level of each criterion.

After the design of the technical debt board, each team was invited to a rapid meeting in front of the board, where all team members talked about the status of each criterion, translating it to the respective color. During these meeting the teams could also conclude that some categories were not relevant or applicable for their systems.
This meeting should happen every month, so that the progress of each category could
be updated. At the end of the meeting, the team members agreed which of the categories would be the aim for the next meeting or, to put it another way, where they would invest their efforts in reducing the technical debt.

To measure the technical debt at source code level, the architecture team has made use of the tool Sonar (http://www.sonarsource.org/) . Sonar has a plugin that allows estimating how much effort would be required to fix each debt of the project. Sonar considers as debts: cohesion and complexity metrics, duplications, lack of comments, coding rules violation, potential bugs and no unit tests or useless ones.  The important aspect is that an estimative is calculated, and Sonar shows the results financially and the effort in man days necessary to take the debt to zero (the daily rate of the developer in the context of the project must be informed).
It is important to mention that Sonar, in fact, use many other tools internally to analyze the source code – each one for different aspects of the analysis. It works as an aggregator to display results of other tools such as PMD, Findbugs, Cobertura and Checkstyle among others.

The teams could make the debt rise during a whole month without even knowing about it.
To address this situation, the architecture team created a virtual tiled board, where each tile had information about the build state of each team in the department. The major information was the actual state of the build and the project name. If everything was ok (compilation and automated tests), the tile is green , if the compilation was broken, the tile turns red  and if there were failed tests, the tile turns yellow . Besides the build information, there is other information: total number of tests, number of failed tests, test coverage, number of lines and technical debt (calculated in Sonar).

The virtual tiled board was placed in a big screen in a place where everybody in the room could see it from their workplaces. The main objective was that when the team members saw their failed build and that instant feedback would lead them to make corrective actions so the build could go green again.
As the mechanisms of feedback were implemented, the teams had instant information
about what should be done to lower the levels of technical debt. With this information, they could prioritize which categories they would try to improve in the next month. If the team had some difficulties addressing any of the categories, they could call upon the architecture team support.