PEARL VIII : Agile Test Automation : Application of agile development principles to Test Automation

Introduction

Today, more and more software development companies are shifting from traditional waterfall development to an agile approach to keep pace with current trends. To meet the demands of an agile environment, many companies have adopted test automation, which resolves some of the issues of manual testing and delivers faster results.

Even though test automation helps in many ways, it presents several challenges that can turn into a nightmare if it is adopted without appropriate brainstorming and analysis. What follows is a set of approaches for making test automation work successfully in an agile environment.

The development methodology an agile team follows when creating automated tests is very similar to the process it follows for creating the software being tested. It involves a fair bit of design, coding, and testing of its own to get the automation working correctly. So, just like the application itself, automated tests are best developed incrementally – adding new tests and features to the automation framework over several sprints. It is important not to aim for the perfect “it-can-do-everything” test framework right at the start, as it would never materialize. Balance cost against ROI and start with a bare minimum working solution.

As more companies move from a traditional waterfall software development approach to an agile methodology, suppliers are offering more test automation tools and services. Automated tests can provide faster feedback than manual tests, reducing rework and long feedback cycles.

Manual testing, particularly manual exploratory testing, is still important. However, agile teams typically find that the fast feedback afforded by automated regression tests is key to detecting problems quickly, thus reducing risk and rework.

Research firm Forrester expects testing-as-a-service (TaaS) to emerge as a managed test service from the increase in test automation, as large suppliers such as HP seek to support customers with test automation tools and outsourcing options, including security testing and services for SAP. Better test automation is intrinsic to agile development.

Working tests, just like working software, are useful; they build confidence and get everyone excited about the progress being made. Successes, even small ones, make it easier to bring everyone on board – especially when the test automation solution the team has created actually runs and proves to be of real value to the team.

An agile test team can’t set up automated testing tools without team members who can write code. Automating tests is software development: it must be done with the same care and thought that goes into writing production code. This poses a significant hurdle for testers who lack coding skills. That challenge is best met by mastering those skills, not by shying away from projects that demand them.

Better software is the result of running the right tests and continually re-evaluating which tests are the right ones.

Automated unit tests check the behavior of individual functions/methods and object interactions. They are run often and provide feedback in minutes. Automated acceptance tests usually check the behavior of the system end-to-end (although sometimes they bypass the GUI and check the underlying business logic instead). They are typically run against checked-in code on an ongoing basis, providing feedback within an hour or so. Agile projects favor automated tests because of the rapid feedback they provide.

Agile Testing Quadrant

[Figure: The Agile Testing Quadrants]

In the agile testing quadrants depicted above, the order in which the quadrants are numbered has no relationship to when the different types of testing are done. Most projects would start with Q2 tests, because those are where you get the examples that turn into specifications and tests that drive coding, along with prototypes and the like. The quadrants are a taxonomy to help teams plan their testing and make sure they have all the resources they need to accomplish it. The timing of the various types of tests depends on the risks of each project, the customers’ goals for the product, whether the team is working with legacy code or on a greenfield project, and when resources are available to do the testing.

The lower left quadrant represents test-driven development, which is a core agile development practice. The quadrants on the left include tests that support the team as it develops the product. The testing done in Quadrants 1 and 2 is more of a requirements specification and design aid than what we typically think of as testing.

Unit tests verify functionality of a small subset of the system, such as an object or method. Component tests verify the behavior of a larger part of the system, such as a group of classes that provide some service [Meszaros, 2007]. Both types of tests are usually automated with a member of the xUnit family of test automation tools. We refer to these tests as programmer tests, developer-facing tests, or technology-facing tests. They enable the programmers to measure what Kent Beck has called the internal quality of their code [Beck, 1999].
A major purpose of Quadrant 1 tests is test-driven development (TDD) or test-driven design. The process of writing tests first helps programmers design their code well. These tests let the programmers confidently write code to deliver a story’s features without worrying about making unintended changes to the system. They can verify that their design and architecture decisions are appropriate. Unit and component tests are automated and written in the same programming language as the application. A business expert probably couldn’t understand them by reading them directly, but these tests aren’t intended for customer use. In fact, internal quality isn’t negotiated with the customer; it’s defined by the programmers. Programmer tests are normally part of an automated process that runs with every code check-in, giving the team instant, continual feedback about their internal quality.
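
As a rough illustration (the ShoppingCart class here is hypothetical, not taken from any project mentioned in this article), a Quadrant 1 programmer test written with JUnit might look like this:

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class ShoppingCartTest {

    // Minimal production class under test, inlined to keep the sketch self-contained.
    static class ShoppingCart {
        private double total = 0.0;
        void add(double price) { total += price; }
        double total() { return total; }
    }

    // Technology-facing test: written in the application language, run on every check-in.
    @Test
    public void totalOfEmptyCartShouldBeZero() {
        assertEquals(0.0, new ShoppingCart().total(), 0.001);
    }

    @Test
    public void totalShouldSumItemPrices() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(10.0);
        cart.add(2.5);
        assertEquals(12.5, cart.total(), 0.001);
    }
}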

The tests in Quadrant 2 also support the work of the development team, but at a higher level. These business-facing tests, also called customer-facing tests and customer tests, define external quality and the features that the customers want.

They describe the details of each story. Business-facing tests run at a functional level, each one verifying a business satisfaction condition. They’re written in a way business experts can easily understand using the business domain language. In fact, the business experts use these tests to define the external quality of the product and usually help to write them.

One of the most important purposes of tests in these two quadrants (Q1 and Q2) is to provide information quickly and enable fast troubleshooting. They must be run frequently in order to give the team early feedback in case any behavior changes unexpectedly. All of these tests should be run as part of an automated continuous integration, build, and test process.

Technology-facing tests in Quadrant 4 are intended to critique product characteristics such as performance, robustness, and security. The types of tests that fall into the fourth quadrant are just as critical to agile development as to any type of software development. These tests are technology-facing. Creating and running these tests might require the use of specialized tools and additional expertise.

Quadrant 4 Automation tools

  • Native database tools
    • SQL, data import tools
  • Shell scripting
  • Monitoring tools, for example:
    • JConsole – application bottlenecks, memory leaks
    • JProfiler – database and bean usage
  • Commercial load test tools
    • LoadRunner
    • Silk Performer
  • Open source test tools
    • JMeter
    • The Grinder
    • JUnitPerf
  • Performance test providers
  • Multiple sites

The short iterations of agile development give your team a chance to learn and experiment with the different testing quadrants.

Test Automation Backlog

Maintain a test automation backlog for the project that contains all needed automation tasks and identified improvements. Then target a few items from the backlog every sprint, and in no time the team will start to see the new regression test suite taking shape. Occasionally, stories from the test automation backlog may require dedicated developer time to implement, and consequently some buy-in from the product owner in order to proceed. However, it should not be difficult to convince the product owner of the value of such stories if everyone on the team is committed to quality. A test automation backlog could contain a prioritized list of items such as:

  • Parameterize the test environment for test execution.
  • Integrate with Continuous Integration.
  • Enhance reporting mechanism.
  • Provide an option to attach error logs in notification emails.
  • Collect performance metrics for workflow scenarios.
  • Add tests to check for concurrent execution of critical test cases.

Agile test automation pyramid

[Figure 1: The Agile Test Automation Pyramid]

In the traditional test automation pyramid view, most if not all of the effort went into developing UI-centric functional tests that explored the application via the GUI. There might have been some lower-level tests and a few unit tests, but teams mostly stayed at the upper tier.

The agile test automation pyramid is a strategy that inverts this traditional emphasis, producing the agile test automation pyramid depicted in the figure above.

Agile testing relies more on automation. It requires a much greater contribution from developers. And it has a different basic philosophy – to prevent bugs.

The first change is taking a whole-team view. Instead of the testers being responsible for testing AND writing all of the test automation, it becomes a whole-team responsibility. The developers take most of the ownership for unit-level automation, but testers can operate here as well.

At the base of the test automation pyramid is unit testing. Unit testing should be the foundation of a solid test automation strategy and as such represents the largest part of the pyramid (50% to 60% of the tests are unit tests).

The upper tier focuses on limited GUI-based automation tests. Usually, these are longer running, core customer usage workflows that are best implemented at this level.

Automated user interface testing (0% to 10% of the tests) is placed at the top of the test automation pyramid because we want to do as little of it as possible.

The testers typically operate in this tier, but there are very few tests within it; and remember that the developers can operate here as well. The two outer layers meet in middle-tier automation. This is often the domain of the open source ATDD/BDD tools, such as FitNesse, Cucumber, JBehave, and Robot Framework. The middle tier is the acceptance test tier, where API-layer tests are performed.

One key thing to note is that traditional automation was often a one-tool operation, with Mercury/HP commercial tooling (QuickTest Professional, or QTP) leading that space. The agile approach is tool agnostic, but also aggregates tools that are appropriate for each layer; therefore no “one size fits all” thinking is allowed. For example, these are commonly used tools at each tier:

  1. UI tier: Selenium, Watir, or traditional QTP
  2. Middle tier: FitNesse, Robot Framework, and Cucumber
  3. Unit tier: xUnit family variants, for example JUnit for Java or NUnit for .NET

The other consideration is that there are extensions to many of these. For example, both Robot Framework and Cucumber have Selenium plug-ins so that they can ‘drive’ the UI as well as the middle tier. This implies that the middle tier tooling, and automated tests for that matter, can extend or blend into the lower and upper tiers.
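
To make the UI tier concrete, here is a minimal sketch of a Selenium WebDriver test in Java; the page URL and element IDs are assumptions for illustration, not taken from a real application:

import org.junit.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import static org.junit.Assert.assertTrue;

public class LoginWorkflowUiTest {

    @Test
    public void successfulLoginShowsWelcomeMessage() {
        WebDriver driver = new FirefoxDriver(); // drives a real browser
        try {
            driver.get("http://localhost:8080/login");                // hypothetical URL
            driver.findElement(By.id("username")).sendKeys("demo");   // hypothetical element IDs
            driver.findElement(By.id("password")).sendKeys("secret");
            driver.findElement(By.id("loginButton")).click();
            assertTrue(driver.getPageSource().contains("Welcome"));
        } finally {
            driver.quit(); // always close the browser, even on failure
        }
    }
}

Because tests like this are slower and more brittle than unit tests, the pyramid keeps them few in number and reserves them for core customer workflows.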

Rather than investing in extensive, heavyweight, step-by-step manual test scripts in Word or a test management tool, we capture expectations in a format supported by automated test frameworks like FIT/FitNesse. The test can be executed manually, but more importantly the same test artifact becomes an automated test when the programmers write a fixture to connect the test to the software under test.

Test-driven development (TDD)

One key benefit of agile software development is the ability to make changes to the code quickly and easily. As the team embraces changing requirements as a part of a living, breathing system, they need to know that their changes will not cause a domino effect of broken code. Ideally, the design of the system will be so simple and well isolated that any possible effects of a change will be readily apparent. In the real world, this is not always the case, especially when working with legacy systems. In this case, something must be implemented to ensure that if a change affects other areas of the system, there will be an immediate red-flag, rather than having to chase down the issues later.

After experimenting with different ways to enable this flexibility, agile teams hit upon the practice of Test Driven Development, which involves producing automated unit tests for production code before you write that production code. Instead of writing tests afterward (or, more typically, never writing those tests), you always begin with a unit test. For every small chunk of functionality in production code, you first build and run a small, focused test that specifies and validates what the code will do. This test might not even compile, at first, because all of the classes and methods it requires may not yet exist. Nevertheless, it functions as an executable specification. You then get it to compile with minimal production code, so that you can run it and watch it fail. You then produce exactly as much code as will enable that test to pass. Sometimes you expect it to fail, and it passes, which is also useful information.

This technique feels odd, at first, to quite a few programmers who try it. It’s a bit like rock climbers inching up a rock wall, placing anchors in the wall as they go. Why go to all this trouble? Surely it slows you down considerably. The answer is that it only makes sense if you end up relying heavily and repeatedly on those anchors (unit tests) later. Those who practice Test Driven Development regularly claim that the benefits of those unit tests more than pay back the effort required to write them.

For Test-First work, you will typically use one of the xUnit family of automated unit test frameworks (JUnit for Java, NUnit for C#, etc). These frameworks make it quite straightforward to create, run, organize, and manage large suites of unit tests. Test Driven Development has grown to be such a useful tool that it is well integrated into most of the major IDEs, including Visual Studio and Eclipse.

Test-driven development (TDD) is an advanced technique of using automated unit tests to drive the design of software and force decoupling of dependencies. The result of using this practice is a comprehensive suite of unit tests that can be run at any time to provide feedback that the software is still working. This technique is heavily emphasized by those using agile development methodologies.

The motto of test-driven development is “Red, Green, Refactor.”

  • Red: Create a test and make it fail.
  • Green: Make the test pass by any means necessary.
  • Refactor: Change the code to remove duplication in your project and to improve the design while ensuring that all tests still pass.

The Red/Green/Refactor cycle is repeated very quickly for each new unit of code.

Follow these steps (slight variations exist among TDD practitioners):

  1. Understand the requirements of the story, work item, or feature that you are working on.
  2. Red: Create a test and make it fail.
    1. Imagine how the new code should be called and write the test as if the code already existed. You will not get IntelliSense because the new method does not yet exist.
    2. Create the new production code stub. Write just enough code so that it compiles.
    3. Run the test. It should fail. This is a calibration measure to ensure that your test is calling the correct code and that the code is not working by accident. This is a meaningful failure, and you expect it to fail.
  3. Green: Make the test pass by any means necessary.
    1. Write the production code to make the test pass. Keep it simple.
    2. Some advocate the hard-coding of the expected return value first to verify that the test correctly detects success. This varies from practitioner to practitioner.
    3. If you’ve written the code so that the test passes as intended, you are finished. You do not have to write more code speculatively. The test is the objective definition of “done.” The phrase “You Ain’t Gonna Need It” (YAGNI) is often used to veto unnecessary work. If new functionality is still needed, then another test is needed. Make this one test pass and continue.
    4. When the test passes, you might want to run all tests up to this point to build confidence that everything else is still working.
  4. Refactor: Change the code to remove duplication in your project and to improve the design while ensuring that all tests still pass.
    1. Remove duplication caused by the addition of the new functionality.
    2. Make design changes to improve the overall solution.
    3. After each refactoring, rerun all the tests to ensure that they all still pass.
  5. Repeat the cycle. Each cycle should be very short, and a typical hour should contain many Red/Green/Refactor cycles.
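
As a minimal sketch of this cycle in Java with JUnit (the LeapYear example is hypothetical, chosen only to illustrate Red/Green/Refactor):

import org.junit.Test;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

public class LeapYearTest {

    // Production code, grown one failing test at a time.
    // The first Green step was simply "return year % 4 == 0;";
    // later failing tests forced the century rules, and Refactor kept it readable.
    static class LeapYear {
        static boolean isLeap(int year) {
            if (year % 400 == 0) return true;
            if (year % 100 == 0) return false;
            return year % 4 == 0;
        }
    }

    // Red: written first, and it failed before isLeap existed.
    @Test
    public void yearDivisibleByFourShouldBeLeap() {
        assertTrue(LeapYear.isLeap(2004));
    }

    // Red again: this test failed against the naive implementation,
    // driving the next Green step and the refactoring above.
    @Test
    public void centuryShouldNotBeLeapUnlessDivisibleBy400() {
        assertFalse(LeapYear.isLeap(1900));
        assertTrue(LeapYear.isLeap(2000));
    }
}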

Characteristics of a Good Unit Test

A good unit test has the following characteristics.

  • Runs fast, runs fast, runs fast. If the tests are slow, they will not be run often.
  • Separates or simulates environmental dependencies such as databases, file systems, networks, queues, and so on. Tests that exercise these will not run fast, and a failure does not give meaningful feedback about what the problem actually is.
  • Is very limited in scope. If the test fails, it’s obvious where to look for the problem. Use few Assert calls so that the offending code is obvious. It’s important to only test one thing in a single test.
  • Runs and passes in isolation. If the tests require special environmental setup or fail unexpectedly, then they are not good unit tests. Change them for simplicity and reliability. Tests should run and pass on any machine. The “works on my box” excuse doesn’t work.
  • Often uses stubs and mock objects. If the code being tested typically calls out to a database or file system, these dependencies must be simulated, or mocked. These dependencies will ordinarily be abstracted away by using interfaces.
  • Clearly reveals its intention. Another developer can look at the test and understand what is expected of the production code.
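
The characteristics above can be seen in a small example (hypothetical names, JUnit): the test below runs fast, touches no environment, contains a single assertion, and its name reveals its intention.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class DiscountCalculatorTest {

    // Tiny production class, inlined so the example stands alone.
    static class DiscountCalculator {
        double priceAfter(double price, double percentOff) {
            return price * (1.0 - percentOff / 100.0);
        }
    }

    // Limited in scope, isolated, and intention-revealing: one behavior, one assertion.
    @Test
    public void tenPercentDiscountShouldReducePriceByOneTenth() {
        assertEquals(90.0, new DiscountCalculator().priceAfter(100.0, 10.0), 0.001);
    }
}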

Rich feature sets provided by the automation framework can substantially reduce the effort development teams put into the TDD workflow. Here are some aspects of automation frameworks that can pay rich dividends:

  • Ease of tracking: Unit tests are stored in a central repository (part of the automation framework), with all development team members submitting their unit tests to it. Tests are stored in a hierarchical folder structure based on the product’s features and components, making it smoother to view and track unit tests within and across teams.
  • Traceability of unit tests: Automation frameworks can ensure that each product requirement has a unit test associated with it. This ensures that all requirements are developed as part of the TDD process, avoiding development slippages.
  • Improving the development and review process: Automation infrastructure can facilitate tracking of all requirements by associating them with a developer and reviewer(s), keeping the development and review processes organized.
  • Unit test execution: A good automation framework ensures quick running of automated unit tests. The tests can be executed selectively for a component, a set of features, or the product itself.
  • Reporting of test execution results: Results of the automated unit tests for a component or feature are sent to the respective developer; this ensures quick reporting and cuts the response time for refactoring unit tests.
  • Automation infrastructure components: Automation frameworks can also facilitate:

  • Cross-platform testing
  • Compatibility testing

xUnit frameworks

Developers may use computer-assisted testing frameworks, such as xUnit, to create and automatically run test cases. xUnit frameworks provide assertion-style test validation capabilities and result reporting. These capabilities are critical for automation because they move the burden of execution validation from an independent post-processing activity to one that is included in the test execution. The execution framework provided by these test frameworks allows for the automatic execution of all system test cases, or various subsets, along with other features.

Fakes, mocks and integration tests

Unit tests are so named because they each test one unit of code. A complex module may have a thousand unit tests and a simple module may have only ten. The tests used for TDD should never cross process boundaries in a program, let alone network connections. Doing so introduces delays that make tests run slowly and discourage developers from running the whole suite. Introducing dependencies on external modules or data also turns unit tests into integration tests. If one module misbehaves in a chain of interrelated modules, it is not so immediately clear where to look for the cause of the failure.

When code under development relies on a database, a web service, or any other external process or service, enforcing a unit-testable separation is also an opportunity and a driving force to design more modular, more testable and more reusable code. Two steps are necessary:

  1. Whenever external access is needed in the final design, an interface should be defined that describes the access available.
  2. The interface should be implemented in two ways, one of which really accesses the external process, and the other of which is a fake or mock. Fake objects need do little more than add a message such as “Person object saved” to a trace log, against which a test assertion can be run to verify correct behaviour. Mock objects differ in that they themselves contain test assertions that can make the test fail, for example, if the person’s name and other data are not as expected.

Fake and mock object methods that return data, ostensibly from a data store or user, can help the test process by always returning the same, realistic data that tests can rely upon. They can also be set into predefined fault modes so that error-handling routines can be developed and reliably tested. In a fault mode, a method may return an invalid, incomplete or null response, or may throw an exception. Fake services other than data stores may also be useful in TDD: A fake encryption service may not, in fact, encrypt the data passed; a fake random number service may always return 1. Fake or mock implementations are examples of dependency injection.
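
A minimal Java sketch of the two steps above (the interface name and log message are hypothetical):

import java.util.ArrayList;
import java.util.List;

// Step 1: an interface describing the external access the final design needs.
interface PersonStore {
    void save(String personName);
}

// Step 2a: the real implementation would talk to the database (omitted here).
// Step 2b: a fake that only writes to a trace log, against which a test can assert.
class FakePersonStore implements PersonStore {
    final List<String> traceLog = new ArrayList<String>();

    public void save(String personName) {
        traceLog.add("Person object saved: " + personName);
    }
}

A unit test then injects FakePersonStore into the code under test and asserts that the expected “Person object saved” entry appears in the trace log; a mock version would instead contain its own assertions and fail the test directly if save were called with unexpected data.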

A test double is a test-specific capability that substitutes for a system capability, typically a class or function, that the unit under test (UUT) depends on. There are two points at which test doubles can be introduced into a system: link time and execution time. Link-time substitution is when the test double is compiled into the load module that is executed to validate testing. This approach is typically used when running in an environment other than the target environment, which requires doubles for the hardware-level code for compilation. The alternative to linker substitution is run-time substitution, in which the real functionality is replaced during the execution of a test case. This substitution is typically done through the reassignment of known function pointers or object replacement.

Test doubles are of a number of different types and varying complexities:

  • Dummy – A dummy is the simplest form of a test double. It facilitates linker time substitution by providing a default return value where required.
  • Stub – A stub adds simplistic logic to a dummy, providing different outputs.
  • Spy – A spy captures and makes available parameter and state information, publishing accessors to test code for private information allowing for more advanced state validation.
  • Mock – A mock is specified by an individual test case to validate test-specific behavior, checking parameter values and call sequencing.
  • Simulator – A simulator is a comprehensive component providing a higher-fidelity approximation of the target capability (the thing being doubled). A simulator typically requires significant additional development effort.
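
A hedged sketch in Java of three of these doubles, for a hypothetical Thermometer dependency:

// Role interface that the unit under test (UUT) depends on.
interface Thermometer {
    double read();
}

// Dummy: exists only to satisfy the compiler/linker; returns a default value.
class DummyThermometer implements Thermometer {
    public double read() { return 0.0; }
}

// Stub: adds simplistic logic, providing a canned output the test needs.
class BoilingThermometer implements Thermometer {
    public double read() { return 100.0; }
}

// Spy: captures usage information and publishes it for more advanced state validation.
class SpyThermometer implements Thermometer {
    int readCount = 0;
    public double read() { readCount++; return 20.0; }
}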

A corollary of such dependency injection is that the actual database or other external-access code is never tested by the TDD process itself. To avoid errors that may arise from this, other tests are needed that instantiate the test-driven code with the “real” implementations of the interfaces discussed above. These are integration tests and are quite separate from the TDD unit tests. There are fewer of them, and they must be run less often than the unit tests. They can nonetheless be implemented using the same testing framework, such as xUnit.

Integration tests that alter any persistent store or database should always be designed carefully with consideration of the initial and final state of the files or database, even if any test fails. This is often achieved using some combination of the following techniques:

  • The TearDown method, which is integral to many test frameworks.
  • try…catch…finally exception handling structures where available.
  • Database transactions where a transaction atomically includes perhaps a write, a read and a matching delete operation.
  • Taking a “snapshot” of the database before running any tests and rolling back to the snapshot after each test run. This may be automated using a framework such as Ant or NAnt or a continuous integration system such as CruiseControl.
  • Initialising the database to a clean state before tests, rather than cleaning up after them. This may be preferable where cleaning up would make it difficult to diagnose test failures, because it deletes the final state of the database before a detailed diagnosis can be performed.
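
A minimal JUnit sketch combining two of these techniques, a TearDown-style method and a database transaction; the in-memory H2 URL and schema are assumptions for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class PersonStoreIntegrationTest {

    private Connection connection;

    @Before
    public void setUp() throws Exception {
        connection = DriverManager.getConnection("jdbc:h2:mem:testdb"); // hypothetical test database
        connection.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS person (name VARCHAR(50))");
        connection.setAutoCommit(false); // run each test inside a transaction
    }

    @After
    public void tearDown() throws Exception {
        connection.rollback(); // leave the database as we found it, even if the test failed
        connection.close();
    }

    @Test
    public void insertedRowShouldBeReadable() throws Exception {
        Statement stmt = connection.createStatement();
        stmt.execute("INSERT INTO person VALUES ('Ada')");
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM person");
        rs.next();
        assertEquals(1, rs.getInt(1));
    }
}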

Continuous Integration

CI is a software development practice where team members integrate their work frequently, usually with each check-in to the version control system. An automated server builds the system, runs all the unit tests, and reports “green” or “red” to indicate whether the build is stable or broken. Building and testing are done on a dedicated machine to eliminate “it works on my machine” dependencies.

Over the past three years, Ancestry.com, the world’s largest online family history resource, underwent a significant transformation in technology and infrastructure. Starting with the adoption of Agile development practices, the company evolved to a Continuous Delivery model that enables code release whenever the business requires it. Transitioning from large, weekly or bi-weekly software rollouts to smaller, incremental updates has allowed Ancestry.com to increase responsiveness and deliver new features to customers more quickly.

Continuous Integration features

  • Very high ROI; one of the first engineering practices an agile team should adopt.
  • Many open source and commercial tools available
  • Keep builds and tests fast (move slow running tests to a nightly build)
  • Adopt a “Stop the Line” mentality (all work stops until a broken build is fixed)
  • Test in a clone of production (Every environmental difference results in risk)

Continuous integration (CI) is the practice, in software engineering, of merging all developer working copies with a shared mainline several times a day. It was first named and proposed as part of extreme programming (XP). Its main aim is to prevent integration problems, referred to as “integration hell” in early descriptions of XP. CI can be seen as an intensification of practices of periodic integration advocated by earlier published methods of incremental and iterative software development, such as the Booch method. CI isn’t universally accepted as an improvement over frequent integration, so it is important to distinguish between the two as there is disagreement about the virtues of each.

CI was originally intended to be used in combination with automated unit tests written through the practices of test-driven development. Initially this was conceived of as running all unit tests and verifying they all passed before committing to the mainline. This helps avoid one developer’s work in progress breaking another developer’s copy. If necessary, partially complete features can be disabled before committing using feature toggles.

Later elaborations of the concept introduced build servers, which automatically run the unit tests periodically or even after every commit and report the results to the developers. The use of build servers (not necessarily running unit tests) had already been practised by some teams outside the XP community. Nowadays, many organisations have adopted CI without adopting all of XP.

In addition to automated unit tests, organisations using CI typically use a build server to implement continuous processes of applying quality control in general — small pieces of effort, applied frequently. In addition to running the unit and integration tests, such processes run additional static and dynamic tests, measure and profile performance, extract and format documentation from the source code and facilitate manual QA processes. This continuous application of quality control aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development. This is very similar to the original idea of integrating more frequently to make integration easier, only applied to QA processes.

In the same vein the practice of continuous delivery further extends CI by making sure the software checked in on the mainline is always in a state that can be deployed to users and makes the actual deployment process very rapid.

Twist at AutoTrader

Testing was growing unwieldy for AutoTrader.co.uk, the UK’s #1 automotive website, with multiple tools, a lack of automation, and increasing test maintenance. Enter Twist, the test automation platform that reduced testing effort by 95%, achieved 89% test coverage, and cut testing time by 75%. Furthermore, Twist removed the biggest pain point by drastically reducing the overhead of test maintenance. With Twist, the team could add real value and test new functionality at a rapid pace.

Sustaining quality for a dynamic and complex application can end up in a vicious cycle of modifying tests, testing new functionality, and optimizing existing tests. AutoTrader.co.uk is the UK’s most popular motoring website, and the fourth busiest search engine, getting over 10 million unique visitors every month. With such high traffic volumes, optimal test coverage is not just nice-to-have, but a critical necessity. However, testing such a complex and ever-changing application was a maintenance-heavy, lengthy, and expensive exercise that was further complicated by difficult testing tools. Auto Trader liked Twist’s support for writing test specifications in plain English, automating robust regression tests, and integrating with continuous integration (CI) and source control products, and decided to give it a try. Since 2008, Twist has dramatically reduced AutoTrader.co.uk’s testing effort and increased test coverage. It has also improved collaboration between business analysts (BAs) and quality analysts (QAs) on tests, by simplifying technically complex tests into readable English constructs. Twist’s highly refactored, modular test suite now exhaustively tests the site in 12 minutes, and can easily keep pace with application changes. Additionally, Twist’s usability makes it highly popular across the teams.

PEARL X : Behavior Driven Development

PEARL X : Behavior Driven Development provides stakeholder value through collaboration throughout the entire project

Behavior-driven development was developed by Dan North as a response to the issues encountered teaching test-driven development:

  • Where to start in the process
  • What to test and what not to test
  • How much to test in one go
  • What to call the tests
  • How to understand why a test fails
At the heart of BDD is a rethinking of the approach to unit testing and acceptance testing that North came up with while dealing with these issues. For example, he proposes that unit test names should be whole sentences starting with the word “should” and should be written in order of business value.

At its core, behavior-driven development is a specialized version of test-driven development which focuses on behavioral specification of software units.

Test-driven development is a software development methodology which essentially states that for each unit of software, a software developer must:

  • define a test set for the unit first;
  • then implement the unit;
  • finally verify that the implementation of the unit makes the tests succeed.

This definition is rather non-specific in that it allows tests in terms of high-level software requirements, low-level technical details or anything in between. The original developer of BDD (Dan North) came up with the notion of BDD because he was dissatisfied with the lack of any specification within TDD of what should be tested and how. One way of looking at BDD therefore, is that it is a continued development of TDD which makes more specific choices than TDD.

Behavior Driven Development

Behavior-driven development specifies that tests of any unit of software should be specified in terms of the desired behavior of the unit. Borrowing from agile software development the “desired behavior” in this case consists of the requirements set by the business — that is, the desired behavior that has business value for whatever entity commissioned the software unit under construction.  Within BDD practice, this is referred to as BDD being an “outside-in” activity.

BDD practices

The practices of BDD include:

  • Establishing the goals of different stakeholders required for a vision to be implemented
  • Drawing out features which will achieve those goals using feature injection
  • Involving stakeholders in the implementation process through outside–in software development
  • Using examples to describe the behavior of the application, or of units of code
  • Automating those examples to provide quick feedback and regression testing
  • Using ‘should’ when describing the behavior of software to help clarify responsibility and allow the software’s functionality to be questioned
  • Using ‘ensure’ when describing responsibilities of software to differentiate outcomes in the scope of the code in question from side-effects of other elements of code.
  • Using mocks to stand-in for collaborating modules of code which have not yet been written

Domain-Driven Design (DDD) is a collection of principles and patterns that help developers craft elegant object systems. Properly applied it can lead to software abstractions called domain models. These models encapsulate complex business logic, closing the gap between business reality and code.

Outside–in

BDD is driven by business value; that is, the benefit to the business which accrues once the application is in production. The only way in which this benefit can be realized is through the user interface(s) to the application, usually (but not always) a GUI.

In the same way, each piece of code, starting with the UI, can be considered a stakeholder of the other modules of code which it uses. Each element of code provides some aspect of behavior which, in collaboration with the other elements, provides the application behavior.

The first piece of production code that BDD developers implement is the UI. Developers can then benefit from quick feedback as to whether the UI looks and behaves appropriately. Through code, and using principles of good design and refactoring, developers discover collaborators of the UI, and of every unit of code thereafter. This helps them adhere to the principle of YAGNI, since each piece of production code is required either by the business, or by another piece of code already written.

YAGNI: “You Ain’t Gonna Need It”

Behavior-Driven Development (BDD) is an agile process designed to keep the focus on stakeholder value throughout the whole project. The premise of BDD is that the requirement has to be written in a way that everyone understands it – business representative, analyst, developer, tester, manager, etc. The key is to have a unique set of artifacts that are understood and used by everyone.

User stories are the central axis around which a software project rotates. Developers use user stories to capture requirements and to express customer expectations. User stories provide the unit of effort that project management uses to plan and to track progress. Estimations are made against user stories, and user stories are where software design begins. User stories help to shape a system’s usability and user experience.

User stories express requirements in terms of The Role, The Goal, and The Motivation.

A BDD story is written by the whole team and used as both requirements and executable test cases. It is a way to perform test-driven development (TDD) with a clarity that cannot be accomplished with unit testing. It is a way to describe and test functionality in (almost) natural language.

[Figure: Mind map of BDD]

BDD Story Format
Even though there are different variations of the BDD story template, they all have two common elements: narrative and scenario. Each narrative is followed by one or more scenarios.

The BDD story format looks like this:

Narrative:
In order to [benefit]
As a [role]
I want to [feature]
Scenario: [description]
Given [context or precondition]
When [event or action]
Then [outcome validation]
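
For example, combining this template with the narrative and scenario steps used later in this section, a complete (hypothetical) story for a login feature could read:

Narrative:
In order to access my personalized content
As a registered user
I want to log in to the site
Scenario: successful login shows welcome message
Given visitor is on the home screen
When user logs in
Then welcome message is displayed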

“User stories are a promise for a conversation” (Ron Jeffries)
A BDD story consists of a narrative and one or more scenarios. A narrative is a short, simple description of a feature told from the perspective of a person or role that requires the new functionality. The intention of the narrative is NOT to provide a complete description of what is to be developed but to provide a basis for communication between all interested parties (business, analysts, developers, testers, etc.). The narrative shifts the focus from writing features to discussing them.
Even though it is usually very short, it tries to answer three basic questions that are often overlooked in traditional requirements.
What is the benefit or value that should be produced (In order to)?
Who needs it (As a)? And what is a feature or goal (I want to)?
With those questions answered, the team can start defining the best solution in collaboration with the stakeholders.

The narrative is further defined through scenarios that provide a definition of done – acceptance criteria that confirm that the narrative, as developed, fulfills expectations.

It is important to remember that the written part of a BDD story is incomplete until discussions about that narrative occur and scenarios are written. Only the whole story (narrative and one or more scenarios) represents a full description of the functionality and definition of done.

If more information is needed, narratives can point to a diagram, workflow, spreadsheet, or any other external document.

Since narratives share some characteristics with traditional requirements, it is important to describe the distinctions. The two most important differences are precision and planning.

Precision
Narratives favor verbal communication. Written language is very imprecise, and team members and stakeholders might interpret the requirement in a different way.

Verbal communication wins over written.
As an example, consider the following requirement statement relating to a registration screen: “The system shall allow the user to register using a 16 character username and 8 character password”.

It was unclear whether the username MUST be 16 characters, whether it could be any length up to 16 characters, or whether it could be any length with a minimum of 16 characters. In this particular case, the business analyst removed any doubt as soon as clarification was requested.

However, there are many other cases where developers take requirements as a final product and simply implement them in the way they understand them. In those cases they might not understand the reasons behind the requirements but just “follow specifications”. They might have a better solution in mind that never gets discussed.

Planning
IEEE 830 style requirements (“The system shall…”) often consist of hundreds or even thousands of statements. Planning such a large number of statements is extremely difficult. There are too many of them to be prioritized and estimated, and it is hard to understand which functionalities should be developed. That is especially evident when those statements are separated into different sections that represent different parts of the system or products. Without adequate prioritization, estimation, and description of the functionality itself, it is very hard to accomplish an iterative and incremental development process. Even if there is some kind of iteration plan, it can take a long time for a completed functionality to be delivered, since the development of isolated parts of the system is done in a different order and at a different speed.

Narratives are not requirement statements
The Computer Society of the Institute of Electrical and Electronics Engineers (IEEE) has published a set of guidelines on how to write software requirements specifications. This document is known as IEEE Standard 830, and it was last updated in 1998. One of the characteristics of an IEEE 830 statement is the use of the phrase “The system shall…”. Examples would be:

The system shall allow the user to login using a username and password.

The system shall have a login confirmation screen.

The system shall allow 3 unsuccessful login attempts.

Writing requirements in this way has many disadvantages: it is error prone and time-consuming, to name but two. Two other important disadvantages are that it is boring and too long to read.
This might seem irrelevant until you realize the implications. If reviewers and, if there is such a process, those who need to sign off requirements do NOT read it thoroughly and skip sections out of boredom, or because it does NOT affect them, many things will be missed. Moreover, having a big document written at that level often prevents readers from understanding the big picture and the real goal of the project.

A Waterfall model combined with IEEE 830 requirements tends to plan everything in advance, define all details, and hope that the project execution will be flawless. In reality, there are almost no successful software projects that manage to accomplish these goals. Requirements change over time resulting in “change requests”. Changes are unavoidable and only through constant communication and short iterations can the team reduce the impact of these changes. IEEE 830 statements are a big document in the form of a checklist. Written, done, forgotten, the overall understanding is lost. The need for constant reevaluation is nonexistent.
Consider the following requirements:

  • The product shall have 4 wheels.
  • The product shall have a steering wheel.
  • The product shall be powered by electricity.
  • The product shall be produced in different colors.

Each of those statements can be developed and tested independently and assembled at the end of the process. The first image in someone’s head might be an electrically-powered car.
That image is incorrect. It is a car, it has four wheels, it is powered by electricity (rechargeable batteries), and it can be purchased in different colors; but it is a toy car.

That is probably not what individual would think from reading those statements. A better description would be:

Narrative:
In order to provide entertainment for children
As a parent
I want a small-sized car
By looking at this narrative, it is clear what the purpose is (entertainment for children), who needs it (parents), and what it is (a small-sized car). It does not provide all the details, since the main purpose is to establish the communication that will result in more information and understanding of someone’s needs.

That process might end with one narrative being split into many. Further on, scenarios produced from that narrative act as acceptance criteria, tests, and definition of done.

Who can write narratives?
Anyone can write narratives. Teams that are switching to Agile tend to have business analysts as writers and owners of narratives or even whole BDD stories (a narrative with one or more scenarios).
In more mature agile teams, the product owner has a responsibility to make sure that there is a product backlog with BDD stories. That does not mean that he writes them. Each member of the team can write BDD stories or parts of them (narrative or scenario).
Whether all the narratives are written by one person (customer, business analyst, or product owner) or by anyone on the team (developers, testers, etc.) usually depends on the type of organization and customers. Organizations that are used to “traditional” requirements, and to procedures that require those requirements to be “signed off” before the project starts, often struggle during their transition to Agile and iterative development. In cases like this, having one person (usually a business analyst) as the owner and writer of narratives might make for a smoother transition towards team ownership and lower the impact on the organization.

A good BDD narrative uses the “INVEST” model:

  •  Independent. Reduced dependencies = easier to plan.
  •  Negotiable. Details added via collaboration.
  •  Valuable. Provides value to the customer.
  •  Estimable. Too big or too vague = not estimable.
  •  Small. Can be done in less than a week by the team.
  •  Testable. Good acceptance criteria defined as scenarios.

While IEEE 830 requirements are focused on system operations, BDD narratives focus on customer value. They encourage looseness of information in order to foster a higher level of collaboration between stakeholders and the team. The actual work being done is accomplished through collaboration revolving around the narrative that becomes more detailed through scenarios as the development progresses. Narratives are at higher level than IEEE 830 requirements. Narratives are followed by collaboratively developed scenarios which define when the BDD story meets the expectations.

Scenarios
Even though narratives can be written by anyone, they are often the result of conversations between the product owner or business analyst and the business stakeholders.
Scenarios describe interactions between user roles and the system. They are written in plain language with minimal technical details so that all stakeholders (customer, developers, testers, designers, marketing managers, etc.) can have a common base for use in discussions, development, and testing.
Scenarios are the acceptance criteria of the narrative. They represent the definition of done. Once all scenarios have been implemented, the story is considered finished. Scenarios can be written by anyone, with testers leading the effort.

The whole process should be iterative within the sprint; as the development of the BDD story progresses, new scenarios can be written to cover cases not thought of before. The initial set of scenarios should cover the “happy path”. Alternative paths should be added progressively during the duration of the sprint.

Format
Scenarios consist of a description and given, when, and then steps.
The scenario description is a short explanation of what the scenario does. It should be possible to understand the scenario from its description. It should not contain details and should not be longer than ten words.
Steps are a sequence of preconditions, events, and outcomes of a scenario. Each step must start with the word Given, When, or Then.
The Given step describes the context or precondition that needs to be fulfilled.

Given visitor is on the home screen

The When step describes an action or some event.

When user logs in

The Then step describes an outcome.

Then welcome message is displayed

Any number of given, when and then steps can be combined, but at least one of each must be present. BDD steps increase the quality of conversations by forcing participants to think in terms of pre-conditions that allow users to perform actions that result in some outcomes. By using those three types of steps, the quality of the interactions between team members and stakeholders increases.

Process
The following process should be followed:
1. Write and discuss the narrative.
2. Write and discuss short descriptions of scenarios.
3. Write steps for each scenario.
4. Repeat steps 2 and 3 during the development of the narrative.

By starting only with scenario descriptions, we are creating a basis that will be further developed through steps. It allows us to discuss different aspects of the narrative without going into the details of all the steps required for each of the scenarios. Do not spend too much time writing descriptions of all possible scenarios. New ones will be written later.
Once each scenario has been fully written (description and steps) new possibilities and combinations will be discovered, resulting in more scenarios.

Each action or set of actions (when steps) is followed by one or more outcomes (then steps). Even though this scenario provides a solid base, several steps are still missing. This situation is fairly common because many steps are not obvious from the start.
Additional preconditions, actions, and outcomes become apparent only after the first version of the scenario has been written.

This scenario covers one of many different combinations. It describes the “happy path” where all actions have been performed successfully. To specify alternative paths, we can copy this scenario and modify it a bit.

This scenario was not written and fully perfected at the first attempt, but through several iterations. With each version of the scenario, new questions were asked and new possibilities were explored. The process of writing one scenario can take several days or even weeks. It can be done in parallel with code development. As soon as the first version of the scenario has been completed, development can start. As development progresses, unexpected situations will arise and will need to be reflected in the scenarios.

Behavior-driven development borrows the concept of the ubiquitous language from domain-driven design. A ubiquitous language is a (semi-)formal language that is shared by all members of a software development team – both software developers and non-technical personnel. The language in question is both used and developed by all team members as a common means of discussing the domain of the software in question. In this way BDD becomes a vehicle for communication between all the different roles in a software project.

BDD uses the specification of desired behavior as a ubiquitous language for the project team members. This is the reason that BDD insists on a semi-formal language for behavioral specification: some formality is a requirement for being a ubiquitous language. In addition, having such a ubiquitous language creates a domain model of specifications, so that specifications may be reasoned about formally. This model is also the basis for the different BDD-supporting software tools that are available.

Much like test-driven design practice, behavior-driven development assumes the use of specialized support tooling in a project. Inasmuch as BDD is, in many respects, a more specific version of TDD, the tooling for BDD is similar to that for TDD, but makes more demands on the developer than basic TDD tooling.

Tooling principles

In principle a BDD support tool is a testing framework for software, much like the tools that support TDD. However, where TDD tools tend to be quite free-format in what is allowed for specifying tests, BDD tools are linked to the definition of the ubiquitous language discussed earlier.

As discussed, the ubiquitous language allows business analysts to write down behavioral requirements in a way that will also be understood by developers. The principle of BDD support tooling is to make these same requirements documents directly executable as a collection of tests. The exact implementation of this varies per tool, but agile practice has come up with the following general process:

  • The tooling reads a specification document.
  • The tooling directly understands the completely formal parts of the ubiquitous language. Based on this, the tool breaks each scenario up into meaningful clauses.
  • Each individual clause in a scenario is transformed into some sort of parameter for a test for the user story. This part requires project-specific work by the software developers.
  • The framework then executes the test for each scenario, with the parameters from that scenario.

Dan North has developed a number of frameworks that support BDD (including JBehave and RBehave), whose operation is based on the template that he suggested for recording user stories. These tools use a textual description for use cases, and several other tools (such as CBehave) have followed suit. However, this format is not required, so there are other tools that use other formats as well. For example, FitNesse (which is built around decision tables) has also been used to roll out BDD.

Tooling examples

There are several different examples of BDD software tools in use in projects today, for different platforms and programming languages.

Possibly the most well-known is JBehave, which was developed by Dan North. The following is an example taken from that project:

Consider an implementation of the Game of Life. A domain expert (or business analyst) might want to specify what should happen when someone is setting up a starting configuration of the game grid. To do this, he might want to give an example of a number of steps taken by a person who is toggling cells. Skipping over the narrative part, he might do this by writing up the following scenario into a plain text document (which is the type of input document that JBehave reads):

Given a 5 by 5 game
When I toggle the cell at (3, 2)
Then the grid should look like
.....
.....
.....
..X..
.....
When I toggle the cell at (3, 1)
Then the grid should look like
.....
.....
.....
..X..
..X..
When I toggle the cell at (3, 2)
Then the grid should look like
.....
.....
.....
.....
..X..

The words in the example that JBehave recognizes as formal language are Given (a precondition which defines the start of a scenario), When (an event trigger), and Then (a postcondition which must be verified as the outcome of the action that follows the trigger). Based on this, JBehave is capable of reading the text file containing the scenario and parsing it into clauses (a set-up clause and then three event triggers with verifiable conditions). JBehave then takes these clauses and passes them on to code that is capable of setting up a test, responding to the event triggers and verifying the outcome. This code must be written by the developers in the project team (in Java, because that is the platform JBehave is based on). In this case, the code might look like this:

import org.jbehave.core.annotations.Given;
import org.jbehave.core.annotations.Then;
import org.jbehave.core.annotations.When;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

public class GameSteps {

    // Game and StringRenderer are classes of the application under test.
    private Game game;
    private StringRenderer renderer;

    // Matches "Given a 5 by 5 game"; $width and $height are bound to 5 and 5.
    @Given("a $width by $height game")
    public void theGameIsRunning(int width, int height) {
        game = new Game(width, height);
        renderer = new StringRenderer();
        game.setObserver(renderer);
    }

    // Matches "When I toggle the cell at (3, 2)".
    @When("I toggle the cell at ($column, $row)")
    public void iToggleTheCellAt(int column, int row) {
        game.toggleCellAt(column, row);
    }

    // Matches "Then the grid should look like ..." and verifies the outcome.
    @Then("the grid should look like $grid")
    public void theGridShouldLookLike(String grid) {
        assertThat(renderer.asString(), equalTo(grid));
    }
}

The code has a method for every type of clause in a scenario. JBehave will identify which method goes with which clause through the use of annotations and will call each method in order while running through the scenario. The text in each clause in the scenario is expected to match the template text given in the code for that clause (for example, a Given in a scenario is expected to be followed by a clause of the form “a X by Y game”). JBehave supports the matching of actual clauses to templates and has built-in support for picking terms out of the template and passing them to methods in the test code as parameters. The test code provides an implementation for each clause type in a scenario which interacts with the code that is being tested and performs an actual test based on the scenario. In this case:

  • The theGameIsRunning method reacts to a Given clause by setting up the initial game grid.
  • The iToggleTheCellAt method reacts to a When clause by firing off the toggle event described in the clause.
  • The theGridShouldLookLike method reacts to a Then clause by comparing the actual state of the game grid to the expected state from the scenario.

The primary function of this code is to be a bridge between a text file with a story and the actual code being tested. Note that the test code has access to the code being tested (in this case an instance of Game) and is very simple in nature (has to be, otherwise a developer would end up having to write tests for his tests).

Finally, in order to run the tests, JBehave requires some plumbing code that identifies the text files which contain scenarios and which inject dependencies (like instances of Game) into the test code. This plumbing code is not illustrated here, since it is a technical requirement of JBehave and does not relate directly to the principle of BDD-style testing.

Story versus specification

A separate subcategory of behavior-driven development is formed by tools that use specifications as an input language rather than user stories. An example of this style is the RSpec tool that was also developed by Dan North. Specification tools don’t use user stories as an input format for test scenarios but rather use functional specifications for units that are being tested. These specifications often have a more technical nature than user stories and are usually less convenient for communication with business personnel than are user stories. An example of a specification for a stack might look like this:

Specification: Stack

When a new stack is created
Then it is empty

When an element is added to the stack
Then that element is at the top of the stack

When a stack has N elements 
And element E is on top of the stack
Then a pop operation returns E
And the new size of the stack is N-1

Such a specification may exactly specify the behavior of the component being tested, but is less meaningful to a business user. As a result, specification-based testing is seen in BDD practice as a complement to story-based testing and operates at a lower level. Specification testing is often seen as a replacement for free-format unit testing.

Specification testing tools like RSpec and JDave are somewhat different in nature from tools like JBehave. Since they are seen as alternatives to basic unit testing tools like JUnit, these tools tend to favor forgoing the separation of story and testing code and prefer embedding the specification directly in the test code instead. For example, an RSpec test for a hashtable might look like this:

describe Hash do
  before(:each) do
    @hash = { :hello => 'world' }
  end
 
  it "should return a blank instance" do
    Hash.new.should eql({})
  end
 
  it "should hash the correct information in a key" do
    @hash[:hello].should eql('world')
  end
end

This example shows a specification in readable language embedded in executable code. In this case, the tool formalizes the specification language into the language of the test code by adding methods named it and should. There is also the concept of a specification precondition: the before section establishes the preconditions that the specification is based on.

Cucumber lets software development teams describe how software should behave in plain text. The text is written in a business-readable domain-specific language and serves as documentation, automated tests and development-aid – all rolled into one format.

Cucumber works with Ruby, Java, .NET, Flex or web applications written in any language. It has been translated to over 40 spoken languages.

Cucumber also supports more succinct tests in tables – similar to what FIT does. Users can view the examples and documentation to learn more about Cucumber tables.

Gherkin gives us a lightweight structure for documenting examples of the behavior our stakeholders want, in a way that it can be easily understood both by the stakeholders and by Cucumber. Although we can call Gherkin a programming language, its primary design goal is human readability, meaning you can write automated tests that read like documentation.

Using mocks

BDD proponents claim that the use of “should” and “ensureThat” in BDD examples encourages developers to question whether the responsibilities they’re assigning to their classes are appropriate, or whether they can be delegated or moved to another class entirely. Practitioners use an object which is simpler than the collaborating code, and provides the same interface but more predictable behavior. This is injected into the code which needs it, and examples of that code’s behavior are written using this object instead of the production version.

These objects can either be created by hand or generated using a mocking framework, such as Mockito for Java.

Questioning responsibilities in this way, and using mocks to fulfill the required roles of collaborating classes, encourages the use of Role-based Interfaces. It also helps to keep the classes small and loosely coupled.
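
As a hedged sketch of this practice using the Mockito framework for Java (the MailService role and Notifier class are hypothetical):

import org.junit.Test;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

public class NotifierTest {

    // Role-based interface for a collaborating module that may not be written yet.
    interface MailService {
        void send(String to, String body);
    }

    // The unit under design; the collaborator is injected, keeping the class small and loosely coupled.
    static class Notifier {
        private final MailService mail;
        Notifier(MailService mail) { this.mail = mail; }
        void welcome(String address) { mail.send(address, "Welcome!"); }
    }

    @Test
    public void welcomeShouldSendExactlyOneMail() {
        MailService mail = mock(MailService.class); // stand-in with predictable behavior
        new Notifier(mail).welcome("user@example.com");
        verify(mail).send("user@example.com", "Welcome!"); // the expected interaction
    }
}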