Fight!

Comparing Executable Specification tools

For when you want to go ATDD-, BDD- or SpecByExample-style

--

This article is also available in French.

I just love executable specifications! So here is a match between the main tools that can be used to write them.

Let’s pit against each other:

  • FitNesse
  • Cucumber (and its kin)
  • Robot Framework
  • Concordion
  • Gauge

A word of caution: business-people involvement

Beware: I will evaluate these tools with the mindset that they will be used by the business people. Business people will write the tests, not the developers.

This is the reason why I won’t say anything about the many other tools that stay close to the programming language, thus providing a much more lightweight solution. I’m counting in this space tools like Jasmine (JavaScript), Spock (Java) and Codeception (PHP). Being closer to the developer’s environment, these tools are consequently less understandable to non-developers.

If you’re unable to involve the business people in writing tests/executable specifications, then you might as well go with these more lightweight tools.

Don’t bother using a big, cumbersome tool if the only users of the tool are developers…

OK, that has to be said. Now let’s go!

TL;DR

Cucumber is king

All these tools are able to handle all kinds of scenarios and will allow you to express your specific business concepts and tests. However, the result will be more or less enjoyable depending on your needs, the resulting tests will be more or less readable, and you’ll need more or less code to make them work. Still, you’ll never face the impossible.

On the other hand, Cucumber will be the easiest to learn for non-technical people, and is available whatever your stack. These two decisive arguments make Cucumber an excellent default option.

Keep reading if you wish to make a more informed choice, tailored to your context!

FitNesse

Exercise your product!

FitNesse is the old-timer of the group. It is loosely based on Fit, which pioneered specification by example and driving development by acceptance tests.

The most noticeable trait of FitNesse is that it takes the form of a Wiki, a website whose content can be directly edited and formatted from the website’s own interface.

What’s great about the tool:

  • A well-known and widely used tool: go check the FitNesse mailing list and you’ll find answers! The GitHub accounts hosting the project are well maintained and still active in spite of the venerable age of the tool. FitNesse is used in many companies. You’ll find books about FitNesse!
  • Community-driven only: there is no company behind FitNesse; it is not a product created by some company and then shared as an open-source project. What happens next to the tool is not tied to the business model of any specific company.
  • Easy-as-pie to install: just type java -jar fitnesse.jar and you’re good to go…
  • Integrated Wiki, the resulting documentation is directly accessible: you don’t need any tool to edit tests, and there is no mapping to maintain between files and the rendered doc…
  • Test runners are available for many platforms: they plug into FitNesse using dedicated protocols (FIT or SLIM).
  • The prettiest decision tables: truly, the result is nice! It pretty much looks like what the business people would have written into an Excel sheet (a small sketch follows this list).
  • All the space you need to write some beautiful text: the business people can expand on the context and explain it in great detail, as much as they want! They can insert images, format text…
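
To give a feel for those decision tables, here is a minimal sketch (the Discount rule, the column names and the fixture are invented for illustration, they don’t come from a real project). On a test page of the Wiki, the first row of the table names the fixture, plain columns are inputs and columns ending with a question mark are expected outputs:

|Discount                     |
|order amount|member|discount?|
|100         |yes   |10       |
|100         |no    |0        |
|250         |yes   |25       |

On the development side, a small SLIM fixture (here in Java) backs the table; FitNesse calls the setters for the input columns and compares the result of discount() with the expected output, row by row:

// Hypothetical Java SLIM fixture backing the decision table above.
public class Discount {
    private int orderAmount;
    private boolean member;

    public void setOrderAmount(int orderAmount) {
        this.orderAmount = orderAmount;
    }

    public void setMember(String member) {
        this.member = "yes".equals(member);
    }

    // Called for each row; the returned value is checked against the "discount?" column.
    public int discount() {
        return member ? orderAmount / 10 : 0;
    }
}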

What’s less good about the tool:

  • An outdated tool: the tool is venerable, and it shows. The protocol to interface the Wiki with the code server (FIT or SLIM) is clearly questionable. It is rather complicated to use the same test base (i.e. the same Wiki instance) to test several products using different technology stacks. And when something goes wrong, it’s not always easy to understand what exactly went wrong and to debug it.
  • Irregular language: several types of tables are available for different types of tests, and each table actually has its own syntax and its own subtleties. These differences are rarely justified; they just happen to be that way and you have to live with it.
  • The workflow scenarios are barely readable: among the different types of possible tests, the workflow tests (chaining various steps, actions and checks) are notoriously hard to read and maintain.
  • Counter-intuitive concepts: you write tests in a Wiki editor, and the Wiki will make HTML out of it. Yet the test is not the text you’ve written but the generated HTML. Let’s say that you write a URL: by default the Wiki will make a hyperlink out of it, and when the test is run the character string passed to the test code won’t be the URL you typed but an <a> HTML tag wrapping your URL.
  • Using the Wiki is mandatory, test reports and test writing are not text-friendly: using FitNesse outside of the Wiki is neither natural nor practical. This is a downside for the developers, unless they use some FitNesse extension for their IDE, provided such an extension exists for their language and their IDE. Likewise, versioning the Wiki content does not come naturally, since what gets versioned are the underlying files, and the versioning options integrated into the Wiki itself are sub-optimal.
  • No advanced editor for the non-developers: the developers may have access to some FitNesse integration into their IDE if they are lucky. The non-developers, on the other hand, have no other option than using the Wiki, which remains a rather rudimentary editor; in particular, there is no completion available.

Conclusion:

One really great thing about FitNesse is how easy it is to install and to get started.

Don’t focus too much on this, though: what really counts is the total work needed to get a working, full-blown continuous integration.

Nevertheless, when faced with people requesting a POC before they can be convinced, FitNesse will allow you to finish the POC in 1 or 2 hours tops, including writing and running the first test.

Cucumber

Cucumber or gherkin? Size matters.

Everybody knows Cucumber! Maintained by the company Cucumber Ltd., it truly is the tool which made Behavior-Driven Development (BDD) popular, along with the Given-When-Then phrasing used to explain and illustrate it.

Cucumber is known under many names depending on the underlying platform. For instance on top of PHP it is called Behat and on C#/.NET it’s SpecFlow.

But even more important is the language used to express the tests, named Gherkin. It is widely used independently of Cucumber, Cucumber being something of a reference implementation for running Gherkin tests.

What’s great about the tool:

  • Cross-platform at the text level with Gherkin: if you happen to have several platforms to maintain with the same business rules, you may specify the behavior only once with a single test which will be run against each of the software stacks. You won’t be allowed to mix two software stacks into the same test but in practice this is hardly ever needed.
  • The test runner is available on virtually all the environments, and is easy to integrate: for instance integrating Behat to a PHP project will be easily done with a few composer commands. Extensions are available to add features. You’re likely to find on GitHub an integration for any context.
  • Easy to use, no tool is needed to write the tests: Cucumber, or Gherkin to be accurate, is nothing more than a handful of keywords. Writing the first test can be done with any text editor (a minimal example follows this list): the journey does not begin with learning yet another tool just to be able to write tests.
  • Test expression is heavily constrained, and that makes Cucumber a wonderful teaching tool: the language is very simple with just a handful of keywords, but on the other hand it puts a lot of constraints on how these keywords can be used. There is not much freedom in the way tests are written. While this can be seen as a downside for an experienced user, it is in fact the nicest thing about this tool for a novice user. Indeed, it will force the newbie to ask good questions and to follow the path to writing good tests. In the end, such rigidity makes it easier for business people to learn the tool.
  • Gherkin can be localized: that may not sound like a big deal to you, but in many companies the business people write their specifications in their native language. This practice is not necessarily a bad one: if your business domain or your market is tightly related to a local culture, it makes sense to describe your product in the corresponding language. In the end, localizing Gherkin is another way to make the tool easier to learn.
  • The reference — “Given-When-Then” is used by everybody, everywhere: there are so many books on Cucumber and its ecosystem, not to mention those using the Gherkin formalism without naming it. The community around Cucumber is very big and active. Whatever your question, you’ll find the answer.
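
As a minimal sketch (the business rule and the wording are invented for illustration), a Gherkin file is nothing more than plain text structured by those few keywords:

Feature: Order discount
  Returning members get a reward on large orders.

  Scenario: Members get 10% off orders of 100 euros or more
    Given a member customer
    When she places an order of 100 euros
    Then the order total is 90 euros

And since Gherkin can be localized, the same file could be written with French keywords (Fonctionnalité, Scénario, Étant donné, Quand, Alors) simply by adding a “# language: fr” line at the top.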

What’s less good about the tool:

  • No IDE for the business people: developers are provided with nice IDE integrations and that definitely helps them. On the other hand, the space of IDEs aimed at business people looks rather empty. Some commercial products are trying to fill that need, like HipTest which integrates Gherkin into JIRA and provides some test step completion, or Cucumber Pro which looks very promising but is still in closed beta. HipTest told me that their solution is free to use for startups.
  • Syntax highlighting only works in English: maybe you haven’t found any suitable IDE with completion, but chances are that your favourite editor will have an extension for Gherkin syntax highlighting: SublimeText, Notepad++, Atom… But this syntax highlighting will only work if you don’t localize the Gherkin language; you’ll get the highlights only with English Gherkin and its Given-When-Then.
  • More glue code is needed: compared to the other tools, Cucumber might be the one requiring the largest amount of glue code to make the specifications executable (a sketch of such glue code follows this list). Since the first goal is to ease the discussions between technical and business people, this should not be a major decision criterion.
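
To give an idea of the glue code in question, here is roughly what the step definitions for the scenario sketched above could look like with Cucumber-JVM. This is a hypothetical sketch: the discount rule is hard-coded just to keep it self-contained, whereas a real project would drive the production code instead.

import io.cucumber.java.en.Given;
import io.cucumber.java.en.Then;
import io.cucumber.java.en.When;
import static org.junit.Assert.assertEquals;

// Hypothetical step definitions for the "Order discount" scenario.
public class DiscountSteps {

    private boolean member;
    private int orderTotal;

    @Given("a member customer")
    public void aMemberCustomer() {
        member = true;
    }

    @When("she places an order of {int} euros")
    public void shePlacesAnOrderOfEuros(int amount) {
        // A real step would call the product under test here.
        orderTotal = member ? amount - amount / 10 : amount;
    }

    @Then("the order total is {int} euros")
    public void theOrderTotalIsEuros(int expected) {
        assertEquals(expected, orderTotal);
    }
}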

Conclusion:

You can’t go wrong with Cucumber. It may not be the best choice, but it won’t be a bad choice.

One of the biggest hurdles in this executable specifications business is to have them written by the business people themselves. So it turns out that being easy to learn is a major asset of Cucumber. Being able to localize the language makes it even easier.

New users will face many problems and won’t be able to write directly what they had in mind. In fact, these complications will lead them to ask the right questions.

Why is it better to write small tests? Why test only a single concept? Why is it better than writing big tests that test everything at once?

Last but not least, Cucumber is available on pretty much any tech stack! You can even realize the dream of having a single test case running against all your products in spite of different technology stacks.

Robot Framework

Don’t mind the clueless face of this robot, he’s very resourceful!

At first sight Robot Framework does not seem very engaging. With such simple looks, you’d be asking yourself whether this tool has any users at all.

Well, yes, people are using Robot Framework! It’s a complete tool, fully working, well thought-out, and the community is strong and active. Let’s thank Pekka Klärck for having created this tool; he’s still very active in all the gatherings and events around Robot Framework.

The tool comes mainly from the Nordic countries (in particular Finland), and unsurprisingly the companies actively using and supporting Robot Framework are mostly in the Nordic countries too, the best-known of them probably being Nokia!

What makes Robot Framework different? Its versatility and its ability to do many things, out-of-the-box.

What’s great about the tool:

  • The language is fully-featured, very powerful, and regular: the language used to describe Robot Framework test cases is very powerful, and most importantly it is regular. The basic element in Robot Framework is the keyword, and keywords can be themselves defined by using other keywords. At each level, you can add some timeout, setup code, clean-up code, documentation, tags… By default, the tool knows how to handle lists, dictionaries, conditional branching. Such options are to be used with moderation, but still it is very comfortable to have them available.
  • Reusing keywords, with no limit: as just mentioned, keywords can themselves be defined by using other keywords. Keywords can be grouped in files that the tests will import, and these files can use keywords coming from other files, libraries… (a small sketch follows this list).
  • Many libraries are available out-of-the-box, new ones can be added in Python or by using a server of dynamic keywords implemented in any language: Robot Framework provides out-of-the-box many libraries, for instance to connect to an SSH server. Many third-party libraries are also available. It is easy to write a new library in Python. It is even possible to write some Python code in-line in the tests to handle very specific cases. Finally, you can plug in any product by using a dynamic keywords server, whose implementation relies on XML-RPC, a standard and reliable protocol.
  • Cross-platform, you can even mix several tech stacks into the same test: Robot Framework is, to my knowledge, the only tool whose architecture makes it possible to mix several tech stacks into the same test. In practice, you can import the same library twice under two different names. That way you can simultaneously plug in several dynamic keywords servers, each server speaking to a different product, each product using its own tech stack.
  • Many companion tools, mature and fully-featured: Robot Framework is not only a language. There is the test runner of course, but also other tools to generate test reports, or another test runner running the tests in parallel. All these tools offer many options and are very well documented. It is clear that these tools have been well thought-out. You are bound to succeed when you set up continuous integration.
  • Detailed execution logs: the test reports and execution logs are as simple looking as the Robot Framework homepage, but most importantly they provide very detailed information. You’ll see the run for each test case, with the detail of each keyword that has been called, itself broken down into its underlying keywords. If logs are written while running a specific keyword, they will be associated with that keyword. When it comes to understanding and debugging what happened, this is the behavior we should expect from all these tools.
  • Several IDEs whose main audience is not developers: it is rare enough to be pointed out: several stand-alone IDEs are available to write Robot Framework tests, so using a developer’s IDE is not mandatory. What a pleasure to be able to use completion to find keywords already used in other tests! While RIDE is now an old tool, RED is the new kid in town and is very promising. For instance, RED lets you debug test cases by performing step-by-step evaluation at the level of the Robot Framework tests. Can you imagine that?
  • Complete freedom in the test writing: workflow, decision table, Given-When-Then… Whatever the kind of test, you can do it in Robot Framework with a rather readable result.
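
As a small sketch of that regularity (the discount rule and the keyword names are invented for illustration), here is a test case whose steps are user keywords defined lower in the same file. In a real suite those keywords could just as well live in a shared resource file, call a Python library, or be served by a Remote dynamic keywords server:

*** Test Cases ***
Members Get A Discount On Large Orders
    Given A Member Customer
    When She Places An Order Of    100
    Then The Order Total Should Be    90

*** Keywords ***
Given A Member Customer
    Set Test Variable    ${DISCOUNT RATE}    0.9

When She Places An Order Of
    [Arguments]    ${amount}
    ${total}=    Evaluate    ${amount} * ${DISCOUNT RATE}
    Set Test Variable    ${TOTAL}    ${total}

Then The Order Total Should Be
    [Arguments]    ${expected}
    Should Be Equal As Numbers    ${TOTAL}    ${expected}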

What’s less good about the tool:

  • Complex language, hard to learn for non-technical users: while the heart of the test cases will be rather readable and understandable, we cannot ignore what surrounds a test case. We are talking here about variables, imports, libraries, keywords, files. That’s a lot of stuff which may be alien to non-technical people. Likewise, the IDEs can be quite complex to learn since they can do so many things.
  • Bugs in the IDE, some Robot Framework features are not supported from the GUI editor: RIDE, Robot Framework’s historic IDE, is riddled with small bugs. Nothing too dramatic, but if you use the IDE extensively you’ll eventually stumble on one of them, and that can be annoying. Some features of the Robot Framework language are not available from RIDE: you then have to switch to text editing to perform the modification! On the other hand, RED is a much more recent and modern IDE; however, it was quite incomplete not so long ago.
  • Dynamic keywords servers are not available for every platform: dynamic keywords servers rely on XML-RPC, a standard and proven protocol, to talk with the Robot Framework test runner. However, such servers only exist for a handful of platforms. If your platform is not supported, you will probably just skip Robot Framework rather than implement the server, even if that is not very hard: I know what I’m talking about since I have completely reworked the one for PHP.
  • Execution logs are not meant to be used as a reference documentation: the test reports of an executable specification should hold the absolute truth: this is the behavior of the product, completely and unambiguously. Robot Framework test reports, while complete, tend to bury the original information, that is the behavior description, under the details of the run. From that point of view, Robot Framework misses the fundamental point of executable specifications by focusing too much on the mechanics of the execution.

Conclusion:

Most upsides and downsides of Robot Framework are directly related:

  • Great execution logs with a lot of details making it easy to investigate and debug, but at the cost of the readability of the specifications themselves.
  • A very well thought-out, powerful and complete language, but that makes it harder to learn.
  • Actual IDEs are available and are a boon to use, but unsurprisingly learning an IDE is much harder than using a simple text editor.

All this leads us to wonder who the real audience of Robot Framework is. Not really appropriate for non-technical business people, it will on the other hand be a joy to use for testers, who are used to navigating complex tools (when they are not coding!). While these very testers may feel frustrated by other tools like Cucumber because of the imposed constraints and limitations, they will probably love Robot Framework.

Concordion

For the specification lovers

Like FitNesse, Concordion is based on Fit. But the path chosen by Concordion is definitely different from FitNesse’s. Concordion’s focus is truly about writing a specification, that is a document that mimics traditional papers — with the addition of being executable, of course!

Unfortunately I never had the chance to try Concordion on a real, actual project, so I can only comment so much. Do not hesitate to correct or complete what I say here by commenting on this article!

What’s great about the tool:

  • Literally a specification that can be executed: when you express tests in Concordion, the result is much closer to an old-fashioned spec than what the other tools offer (a small sketch follows this list). That may sound like a bad thing (yeah sure, remember that Waterfall is bad…) but it is in fact what makes this tool worthy. You must keep in mind that in many domains, legal constraints and regulations are very real and you cannot get a product out the door without some official document showing that the product follows the rules. On one hand, the challenge is to have a document matching the behavior of the product, and that’s the whole point of executable specifications. On the other hand, the document must be readable by people completely outside of the project, who will put their names on the document stating that the product follows the rules. As a result, absolutely no compromise is allowed on the readability of the document.
  • Beautiful integration of images and other elements describing a spec: to go even further in this direction, images and other elements can be integrated. These integrations are very well done, always with a focus on the readability of the document.
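
To make this concrete, here is a minimal sketch in the spirit of Concordion’s classic tutorial example (not taken from a real project). The specification is an ordinary HTML page carrying a few instrumentation attributes:

<html xmlns:concordion="http://www.concordion.org/2007/concordion">
<body>
  <h1>Greeting the user</h1>
  <p>
    When <span concordion:set="#firstName">Alice</span> logs in,
    she is greeted with
    <span concordion:assertEquals="greetingFor(#firstName)">Hello Alice!</span>
  </p>
</body>
</html>

A small Java fixture then exposes the method referenced by the instrumentation:

import org.concordion.integration.junit4.ConcordionRunner;
import org.junit.runner.RunWith;

// Hypothetical fixture; Concordion pairs it with the HTML page by naming convention.
@RunWith(ConcordionRunner.class)
public class GreetingTest {

    public String greetingFor(String firstName) {
        return "Hello " + firstName + "!";
    }
}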

What’s less good about the tool:

  • All the test stuff is on the dev: since the focus is mostly on writing a regular specification, it is only logical that the code making it executable has to go further than with some other tools. Test concepts being mostly absent from what the business people write, it’s up to the technical people to implement them.
  • Supports only the major platforms (Java, C#, Python, Ruby): to my knowledge, Concordion can only be used with Java, C#, Python and Ruby.

Conclusion:

Like I said before, some domains have unavoidable legal constraints, and they must show that they respect them. It may even be that the CEO of the company himself will sign off on the document swearing that the product does what it is asked to do. In such a situation, Concordion is the tool to use, as its main focus is on writing an executable specification whose result mimics a well-known spec format.

Gauge

Thanks ThoughtWorks!

Gauge is the youngest tool of the list and is maintained by ThoughtWorks. The tool tries to borrow good ideas from all the existing ones while avoiding their mistakes, taking a more modern approach and acknowledging that today’s constraints and ways of working are not yesterday’s.

The tool is still young and is evolving at a steady pace. The community around the tool is not that big but is definitely there, supported by ThoughtWorks.

Unfortunately I never had the chance to try Gauge on a real, actual project, so I can only comment so much. Do not hesitate to correct or complete what I say here by commenting on this article!

What’s great about the tool:

  • Writing in Markdown, relying on the Markdown ecosystem for editing and formatting: there are so many Markdown editors and document generators out there! (A small sketch of a Gauge spec follows this list.)
  • By ThoughtWorks! That is by itself a guarantee of the quality of the product and of the pragmatism of the design choices.
  • Many good ideas, tries to borrow the best from its elders: that’s the whole point of creating a new solution, keeping the best things while improving the questionable stuff. For instance, running tests in parallel is directly part of Gauge’s core, while it is quite clear that in the other tools it has been added as an afterthought.
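
For a rough idea (the spec below is invented for illustration), a Gauge specification is just a Markdown file: the top-level heading names the specification, second-level headings are scenarios, and bullet points are the steps:

# Order discount

## Members get 10% off large orders

* Sign in as a member
* Place an order of "100" euros
* The order total must be "90" euros

Each step is then mapped to code; with gauge-java, for instance, this is done with annotated methods. Again a hypothetical sketch, with the rule hard-coded to stay self-contained:

import com.thoughtworks.gauge.Step;
import static org.junit.Assert.assertEquals;

// Hypothetical step implementations for the spec above.
public class DiscountSteps {

    private boolean member;
    private int total;

    @Step("Sign in as a member")
    public void signInAsMember() {
        member = true;
    }

    @Step("Place an order of <amount> euros")
    public void placeOrder(String amount) {
        int value = Integer.parseInt(amount);
        total = member ? value - value / 10 : value;
    }

    @Step("The order total must be <expected> euros")
    public void checkTotal(String expected) {
        assertEquals(Integer.parseInt(expected), total);
    }
}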

What’s less good about the tool:

  • Only available for Java / C# / Ruby / JavaScript / Python / Golang: more and more platforms are getting supported, but the list is still limited.
  • Still young: the tool is still young and, while it is usable, you can find missing features or bugs here and there. We can only hope that such issues will be fixed as the tool gains in maturity.

Conclusion:

The tool is very promising and clearly deserves to be given a try! Provided, of course, that you are using one of the supported platforms.

Who’s the winner?

Ding!

It’s Cucumber, of course!

And the winner is… CU-CUM-BEEEEEEEEER!!!
  • Easy to learn by the business people: it is the easiest to learn from scratch, because of the simplicity of the language but also because no specific editor is needed. Furthermore, the language imposes constraints that help guide the neophyte’s learning.
  • Available whatever your tech stack
  • Cross-platform at the Gherkin level: it is common to have several products with the same business rules. While these products rely on different technical stacks, it is possible to run the same test case on all the products. Worst case scenario, you can bypass Cucumber but keep the same test case in Gherkin; for instance by using Calabash.
  • Huge community: whatever your problem, somebody will have been there before you!
  • Many third party tools: the ecosystem around Cucumber is very active, full of good stuff making your life easier.
  • The reference! You won’t be lost when you read books and blogs…

