Automated Testing of HTML5 Canvas Applications with Selenium WebDriver

This is the first in a series of articles about using automated testing tools on a Canvas-based Web application. Each article covers a different testing tool or technique.

The specific application I’m testing is sized for a tablet, and uses a combination of a Canvas taking up roughly 2/3 of the space and a set of JQuery Mobile widgets taking up most of the remaining 1/3 of the space (there’s also a small space on top showing a read-only “status bar” with key statistics).  The basic concept is an editable diagram in the Canvas, with specific edit controls in the JQuery Mobile area.  You can hover, drag, and click parts of the diagram in the Canvas, which can change what’s displayed in the edit area, and changes you make in the edit area can add, remove, or change elements of the diagram in the canvas as well as updating the values shown in the status bar.

My goals for the automated testing are:

  • Make it easy to write tests
  • Run the tests in a real browser, to ensure I’m testing what a user would actually see
  • Make it easy to run a suite of tests and drill into specific test failures
  • Make the tests run fast enough that it’s not terribly onerous to run before every commit
  • Be able to verify the visible state of the HTML showing on the screen (e.g. in the edit area, in the status area)
  • Be able to verify the underlying diagram model that dictates what is drawn to the Canvas.  (I haven’t attempted to go as far as testing the actual Canvas state — as in the pixel color at some specific coordinate.  The bugs I get aren’t that the Canvas is drawing something that does not accurately represent its model state.)

Summary of Selenium WebDriver

  • Good: It has a model that I like — you write tests, and when you run them, it launches a browser and runs through the tests, clicking on various things on the screen, waiting for screens to appear, etc.  It supports multiple browsers (though with varying quality).  You can evaluate JavaScript within the context of the page (e.g. “return mymodel.somevalue;”) to get at the underlying state in addition to testing the visible component state.
  • Neutral: You write the tests in a programming language such as Java, C#, Ruby, or Python.  (I used Java.)
  • Bad: All the examples show a test that runs as a standalone application.  You have some decisions to make if you want to run from within a unit testing framework — do you restart the browser between tests (~10 seconds), reload the page (but it’s not like a user does that for every action), or try to recover from an unknown page state if a test fails?  The detailed event support is poor — you can cause a mouse move to a specific location, for instance, but not a mouse down, mouse up, or click at specific coordinates.  That’s fine if all you do is click buttons, but terrible for a Canvas (there is a workaround).  As far as I know there’s no way to record WebDriver tests from the Selenium IDE or a similar tool.
  • Bottom Line: If you’re willing to write a lot of setup code, you can record and execute tests easily.  They run in a browser, 99% like a user would experience.  I hope to find some higher-level tools that wrap WebDriver to record tests, run multiple tests and show results, and so on.  I also hope future releases resolve some of the differences in the browser-specific hookups that make tests behave differently in different browsers.

Detailed Review

The basic model is that you write a test in code, compile (if needed), and execute the test.  The test launches a real browser (Firefox, Safari, etc.) and interacts with it, simulating mouse movements, clicks, etc.  This causes the page to function just as if a user was interacting with it.  A test can also execute JavaScript within the context of the page, to retrieve information on the JavaScript state, to execute JavaScript functions defined for the page, or whatever.

Installing WebDriver:

Since I’m just writing Java code, all I had to do was put the dependency in my Maven POM:
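
Something like the following, assuming the standard selenium-java artifact (the group and artifact IDs are Selenium’s usual ones; the version shown is just illustrative — use whatever is current):

```xml
<dependency>
  <groupId>org.seleniumhq.selenium</groupId>
  <artifactId>selenium-java</artifactId>
  <version>2.25.0</version>
</dependency>
```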


For Chrome support, I also had to download the Chrome driver.

Learning Curve:

My first problem was with the event support.  A perfectly suitable API is there for the tests to use:

  new Actions(driver)
  .moveToElement(canvas, xWithinCanvas, yWithinCanvas)
  .click()
  .perform();

However, this API is not reliable.  In Firefox, every mouse down, mouse up, or mouse click happens at the center of the element.  So the code above produces a mouse move event to the provided x,y, then a mouse move event to the center of the Canvas, then a mouse down, mouse up, and click all at the center of the Canvas.  That may be fine for a button, but is unworkable for a Canvas, where you want to be able to hover, click, etc. at a specific location.  The situation is even worse in Safari, where it just produces an exception indicating that mouse move events aren’t supported.  Chrome, meanwhile, works fine.

The workarounds are ugly — manually dispatching synthesized mouse events using JavaScript:

// '#canvas' stands in for whatever selector locates the Canvas element
driver.executeScript("var evt = $.Event('click', { pageX: " + x +
   ", pageY: " + (y + 55) + " } );" +
   "$('#canvas').trigger(evt);");

Here I’m manually creating a mouse event, computing pageX and pageY from the provided mouse coordinates plus offsets I happen to know about the page layout.  Yuck!
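
That string concatenation gets error-prone quickly, so I’d at least pull it into a helper.  A minimal sketch — the 55-pixel offset and the '#canvas' selector are assumptions specific to my page layout, not anything WebDriver provides:

```java
// Builds the JavaScript snippet that synthesizes and dispatches a jQuery
// click event at the given canvas coordinates. The 55-pixel vertical
// offset and the '#canvas' selector are specific to this page's layout.
final class SyntheticEvents {
    private static final int HEADER_OFFSET = 55;

    static String clickScript(int x, int y) {
        return "var evt = $.Event('click', { pageX: " + x
             + ", pageY: " + (y + HEADER_OFFSET) + " } );"
             + "$('#canvas').trigger(evt);";
    }
}
```

A test then just calls `driver.executeScript(SyntheticEvents.clickScript(x, y));`, which keeps the quoting mess in one place.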

In Firefox, for my purposes, things were a little smoother.  My Canvas event handling records which subcomponent of the diagram the mouse is over on a mouse move, and acts on that selected subcomponent on a click, so I don’t actually use the mouse coordinates in the click event itself.  Since the move event works correctly in Firefox, something like this works for me:

new Actions(driver).moveToElement(
   canvas, xWithinCanvas, yWithinCanvas).perform();

I also ran into some other differences between browsers.  In Firefox I could click a widget that closed a JQuery Mobile dialog and immediately click on the diagram.  In Chrome, the same sequence occasionally failed unless I introduced a slight delay to allow the dialog to get out of the way.
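
Rather than sprinkling Thread.sleep() calls around, a small retry helper keeps that kind of timing difference out of the tests.  A sketch using only the standard library (the names and timeouts are mine, not WebDriver’s):

```java
import java.util.function.BooleanSupplier;

// Polls a condition until it holds or a timeout expires. Used to absorb
// timing differences between browsers (e.g. waiting for a jQuery Mobile
// dialog to finish closing before clicking the diagram underneath it).
final class Retry {
    static boolean until(BooleanSupplier condition,
                         long timeoutMillis,
                         long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

A test might then wait with `Retry.until(() -> !helper.dialogIsVisible(), 2000, 50)` before clicking — `dialogIsVisible()` being a hypothetical method on my helper class, not a WebDriver call.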

Writing Tests:

Once I got past that, I started writing tests.  Writing one little test was fine, and it was exciting to see it run in the browser.  But I quickly determined that it was going to be incredibly obnoxious to write any substantial quantity of tests by hand.  Too much code, and languages like Java and Ruby don’t describe a user’s interaction with HTML very well.

What I ended up doing was:

  1. Writing a helper class with methods like waitForFooPage(), clickToolbarButtonFoo(), replaceTextIn(id, text), clickElement(id), clickCheckbox(id), confirmModelState(x, y, z), and so on.  All the WebDriver API calls are hidden in the helper class.
  2. Writing an additional JavaScript “QA” file that adds a bunch of event listeners to emit (via console.log) calls to the helper class.  For instance, a click on an anchor emits a clickElement(id), a change on an input emits a clickCheckbox(id) or replaceTextIn(id, text), and a JQuery Mobile pageShow emits a waitForFooPage().  It also adds a listener on the very bottom left-hand corner of the Canvas (otherwise unused) to toggle the recording state.

So when that file is included in the page, I can basically record a script, though I still have to copy and paste it into a test file to use it.  Still much better than writing tests by hand.  My recorder has some basic page state validation built in (for instance, for screens where the links have different content depending on the current state when you load them — like a category browse type page).  But I still have to write any detailed validation by hand.

My tests end up looking like:

helper.clickOnDiagram(393, 412);
helper.replaceTextIn("SomeTextFieldID", "MyNewText");

Running Tests:

I wasn’t too interested in writing a dozen tests as command-line executables and running them one after the other.  I wanted to run one command and have it run all the tests, counting successes and failures, logging stack traces for any failures, and so on.  I really wanted a little GUI to show red and green bars and so on, but it wasn’t worth the time to recreate that.  I ended up writing a little wrapper to run and capture the results of all my individual tests, but it didn’t seem like something I should have to be doing.
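
The bookkeeping in that wrapper is small; here’s a sketch of its core, using only the standard library (the class and method names are mine, not from any framework):

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Runs a set of named tests, counting passes and failures and capturing
// a stack trace for each failure so they can all be reported at the end.
final class SuiteRunner {
    private int passed;
    private final List<String> failures = new ArrayList<>();

    void run(String name, Runnable test) {
        try {
            test.run();
            passed++;
        } catch (Throwable t) {
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace));
            failures.add(name + ":\n" + trace);
        }
    }

    int passedCount() { return passed; }
    int failedCount() { return failures.size(); }
    List<String> failureReports() { return failures; }
}
```

Each test gets registered as a Runnable around the helper calls; after the suite finishes, the counts and failureReports() get dumped to the console.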

The bottom line is that I can issue a command, watch a browser flash through interactions with my app, and get a list of failures and stack traces at the end.  When I want to write a new test I include my extra JavaScript file, click the obscure corner of the Canvas to turn on recording, and then interact with my app as usual.  At the end, I copy and paste the script from the Web Console into a new source file and add any additional validation I want.  Then I can comment out the QA JavaScript and run more tests.

It’s what I wanted, but less flashy, and I had to write a lot of setup code to get there.

Areas for Further Investigation:

The next thing I’d like to try is Geb as a wrapper for WebDriver.  The standard pattern for wrapping all the interesting application actions and state looks to be Page Objects.  My gut reaction is that it would have been way too much work to do that for all the “pages” in my application, but it’s a much more organized approach and better than scattering various IDs and coordinates through a bunch of individual tests.  I feel like the approach has merit; I just can’t put my finger on the line where the value provided overcomes the work required.  Probably it pays off if you’re writing all the tests by hand, but I’m not willing to go there anyway.