Randoop project ideas

Contents:

Do you love writing unit tests to find bugs in your programs? Or, would you prefer that the tests were written for you automatically?

The Randoop tool writes tests for you.

Writing tests is a difficult and time-consuming activity, and yet it is a crucial part of good software engineering. If you are interested in improving the state of the art in test generation and bug finding, read on.

Our system, Randoop, automatically generates unit tests for Java classes. For example, here is a test case generated by Randoop that reveals an error in the Sun JDK 1.5:

public static void test1() {
  LinkedList l1 = new LinkedList();
  Object o1 = new Object();
  l1.addFirst(o1);
  TreeSet t1 = new TreeSet(l1);
  Set s1 = Collections.unmodifiableSet(t1);
  // This assertion fails
  Assert.assertTrue(s1.equals(s1));
}

The ultimate goal is a testing system that at the push of a button creates a solid JUnit test suite providing good coverage of arbitrary classes, with no manual effort on your part. Randoop is a step in the direction of this vision and it already does a very good job, especially with library classes (such as java.util). Randoop is already used at companies like ABB and Microsoft, and on open-source projects.

Our project ideas are of several kinds:

Required skills: Most of these projects require facility with Java. The other requires knowledge of C. You should be very comfortable with the idea of unit testing. You should know how unit testing helps you and where it can hinder you, and you should be passionate about improving the tradeoff. You should be willing to dive into and understand a moderate-sized codebase.

If you want to discuss these projects, feel free to contact the Randoop developers at randoop-developers@googlegroups.com. We look forward to hearing from you!

Test generation projects

Better choice of methods

Randoop selects a method to test at random. Uniform random selection is the simplest, but not always the best, option. This project would augment Randoop with different, and potentially novel strategies for selecting the next method to test. Here are a few example strategies; the project would involve fleshing out the details.

Mimic real call sequences

Randoop's generated test sequences are random and may not be characteristic of real use. One problem is that there may be restrictions on usage of a class: close may only be called after open, or arguments must have particular relationships, such as the non-array argument to binary search must appear in the array argument. Another problem is that random invocations may not create interesting data structures, such as very large ones or ones with particular qualities such as duplicate elements.

Randoop does a very good job of generating tests for library classes (such as java.util.*) that have few restrictions on their calling sequences and parameters. Classes that make up programs often have a number of restrictions on both the sequences of calls and their parameters.

We want to extend the system to provide better coverage for such classes. One way to do this is to do analysis (either dynamic or static) that determines valid and invalid sequences of calls. For example, Randoop could observe what call sequences real code makes, perhaps using a tool like Palulu. Then, bias could be introduced into the generator to make call sequences that are more realistic, and therefore better test the most commonly-used and important parts of the code.

A possible downside is reduced diversity in tests. A research question is whether mimicking real executions is worthwhile overall.

Better choice of arguments

Suppose that Randoop wants to create a call to a method with a parameter of type T1. Currently, Randoop selects an object of type T1 at random. Randoop can do better. Here is one idea; other approaches are possible.

Randoop could determine what fields a method will read, and then chose an object for which those fields have been recently side-effected. As an example, when obtaining a point to pass to the getX() method, a point produced by the setX() or move() method is more interesting than one produced by the setY() method.

As another example, Randoop could choose objects that are most likely to achieve additional coverage in the called method, perhaps because the values will cause particular branches to be taken.

Creating independent tests

The goal of this project is to make Randoop output independent unit tests, rather than tests that must be run in a specific order.

Randoop makes test sequences out of methods that may have side effects. This can cause running one test to affect the outcome of a subsequent, putatively independent, test.

For example, suppose that method m1 modifies a global variable, and method m2 reads it. Further suppose that test t1 includes a call to m1, and test t2 includes a call to m2. The result of m2 (and, thus, whether test t2 succeeds) depends on whether test t1 has already been run in the same JVM. All of the tests are entangled, which makes it difficult to run a subset of them, to understand them, or to debug failures.

Ideally, all of the tests in a test suite should be independent, in that they can be run in any order.

Approach 1: Undo state changes

One approach is to undo state changes: for each global field that a Randoop test sets (directly or indirectly), the test should reset it at the end of the test.

This requires determining which fields the test changes and restoring them to their original values.

A research question is whether this degrades the fault-finding ability of the test suite. Setting global fields may change the behavior of subsequent tests — is that a positive or a negative? It would also be interesting to exclude the fields from being modified by Randoop, via the --omitmethods and --omit-filed-list command-line options. How much does that degrade Randoop's fault-finding ability?

Approach 2: convert a set of interdependent tests into independent ones

The goal of this project is to convert the Randoop output, which is currently a set of interdependent tests, into a set of independent tests.

Suppose Randoop outputs 100 small unit tests. An idea is to split each tests into the part that depends on other tests, and the independent part. Now move the dependent parts into a new test. The result is 101 tests: 100 of them are even smaller than before, but are completely independent of one another, and the last one is relatively large but captures all the dependencies.

Either a static or a dynamic analysis could perform this task.

We would like to convert this sequence of interdependent tests into a set of tests that are independent of one another and can be run in any order, or any subset of them can be run.

Background: A test consists of alternating actions and observations:

a1
o11
o12
...
o1m_1
a2
o21
o22
...
o2m_2
...

Each action is a method invocation, that Randoop chose because Randoop believed that the action method would affect the state of some object. Each observation is a method invocation, that Randoop chose because Randoop believed that the observer method would NOT affect the state of any object but would reveal information about it. The test oracle checks the result of each method call (actions and observers) against an expected value that was obtained by running the method at test generation time.

Input: n tests that might depend on one another Output: n independent tests that may be run in any order (each of these is a subset of one of the original tests), plus one big test that concatenates all n of the original tests. The big test must be run in a fresh JVM or sandbox, or at least as the last test of the suite. The big test may seem problematic -- too hard to understand and debug -- but it is no worse than the original suite, and at least it is honest in putting all the possible effects together. I suspect that slicing, delta debugging, or other test case minimization approaches will be extremely effective on the big test for those situations where it fails but none of the little tests does. One could imagine putting only a subset of the original tests into the big test suite, but including everything is more likely to catch serendipitous interactions among rarely-called methods. We could empirically evaluate whether that benefit is worthwhile.

An analysis can compute 2 distinct properties of a test case:

Only the latter property matters: An independent test that can be run in any order, must have no side effects on global (static) state. We could compute the property via a static analysis, or by executing the tests to see what happens.

Static analysis seems cleaner to me. Compute what action methods can write to static fields. For each test t, construct a new unit test that is just like t, but it omits all actions (& their associated observers) that side-effect static fields. Now, those unit tests can be run in any order. Also concatenate all omitted actions (or, more likely, all actions whatsoever) into one big test that must be run in a new JVM.

Dynamic analysis: In a new JVM, run all tests and compute oracles. For each test t, run (in a new JVM) t followed by all tests. If any test does not satisfy its oracle, then say that the test t has side effects on global state. (This isn't quite right: The effect of test t may be masked by the effect of the first test, for example.) Retain all side-effect-free tests as unit tests. (Verify that they don't affect one another by running them all in a new JVM, in random order; do this (say) 5 times.) Concatenate all other tests (or maybe all tests whatsoever) as one big test that must be run in a new JVM, or at least as the last test of the suite.

Questions: What % of tests are separable? This will affect whether the technique is able to create independent tests, or those end up empty and the algorithm's output is just the big hairball.test. That would be no worse than the current situation, and at least the user would know that interdependent tests are a problem.

Flakiness

Randoop can call APIs whose value changes from run to run of tests, and this can cause flaky tests (tests that spuriously fail). Examples of such APIs are ones that retrieve the current date, the username, etc. Two ways to mitigate such flakiness are:

EvoSuite follows approach #1 so far, and you can see its source code for the APIs it mocks. Approach #2 would avoid the need for a dedicated runtime library.

Robustness

Randoop executes methods under test with random inputs, and this can cause usability problems. Code under test may exhaust memory or other resources to the point that impedes Randoop from making progress, or even causes Randoop itself to crash. Making Randoop more robust in the face of these kinds of failures would greatly improve the tool's usability and attract more users.

One approach is to create a wrapper around Randoop: a process that that monitors test generation and can terminate/restart the generator if it fails to make progress or aborts unexpectedly. For more details on the wrapper idea, see Finding Errors in .NET with Feedback-Directed Random Testing (Section 3).

Parallelizing Randoop

This project would modify Randoop so that it can run on multiple machines. This would greatly increase the efficiency of the system, by distributing the generation effort across multiple processors. The test generation techniques that Randoop uses are amenable to parallelization: Randoop could spawn multiple test generation engine instances, each with a different random seed.

There are a number of challenges to overcome in parallelizing Randoop. Two examples:

A downside of this project is that generation speed is not the most important problem for Randoop; effort would probably be better invested elsewhere.

Additional required skills: Knowledge of Java APIs for spawning processes, creating streams, reading/writing files, would also be helpful.

Side effect analysis

Enhance Randoop to utilize a side effect analysis that indicates which methods are pure (perform no side effects). Here are two distinct benefits:

  1. Benefit 1: avoiding useless calls in tests.

    Ordinarily, Randoop assumes that a method call may change any object that is passed to it. For example, Randoop considers x to be different after
      x = foo();
    
    than after
      x = foo();
      y = bar(x);
    
    and Randoop will try to test the x values produced by both sequences. By ignoring the x value produced by the second sequence, Randoop could spend half as long to produce equally good tests. Also, Randoop's tests would be easier to understand because they would be shorter and would not contain mysterious, useless calls to pure methods.
  2. Benefit 2: more test assertions.

    When Randoop creates regression tests, the tests contain assertions about the objects that the test creates. For instance, Randoop might output
      assertTrue(someObject.itsField == 22);
    In addition to reading fields of an object, Randoop could invoke observer methods. For instance, it could output
      assertTrue(someObject.observerMethod() == 22);
    This is only possible for pure methods &mdash methods that have no side effect and return the same value every time they are called on the same arguments. If Randoop is told which methods are pure, it can use them as observers and can generate much richer and more sensitive assertions in its regression tests.

The goal of this project is to enable Randoop to utilize pure methods for both of the purposes outlined above.

There are some other enhancements that would make this project even more compelling.

User interface and user experience projects

Test cases with debugging clues

Debugging is difficult. When a test fails, it can be very time-consuming to understand the source of the failure in order to fix the code (or the test). For example, consider the following test generated by Randoop:

public static void test1() {

  ArrayList l = new ArrayList(7);
  Object o = new Object();
  l.add(o);
  TreeSet t = new TreeSet(l);
  Set s = Collections.synchronizedSet(t);
  // This assertion fails
  Assert.assertTrue(s.equals(s));
}

This test reveals an error in Sun's JDK (version 1.5). It shows a short sequence of calls leading up to the creation of an object that is not equal to itself! Unfortunately, it is difficult to tell simply by looking at the test what is the source of the failure.

This project would augment Randoop with the ability to provide debugging clues: comments within a failing test that provide potentially-useful facts about the failure and can help the user with debugging. For example, an improved version of the above test may look something like this:

public static void test1() {

  // Test fails for any capacity >= 0.
  // Test throws IllegalArgumentException for any capacity <0.
  ArrayList l = new ArrayList(7);

  Object o = new Object();

  // Test passes when o is a Comparable.
  l.add(o);

  TreeSet t = new TreeSet(l);
  Set s = Collections.synchronizedSet(t);

  // This assertion fails
  Assert.assertTrue(s.equals(s));
}

The new test provides several clues that can help the user avoid time debugging. The key observation is that the test passes if we use a Comparable object in the call l.add(o). Further investigation would reveal that the TreeSet constructor should not accept a list with a non-comparable object in it but does, and this leads to the error.

To output tests with debugging comments, Randoop would have to be modified in two ways:

In theory this sounds simple, but there are several issues that make the problem interesting: creating a set of variations of a test is non-trivial, and synthesizing test executions into useful comments, are both non-trivial tasks.

Integration with Eclipse

Today, Randoop is a command-line program. There are many ways in which the tool could be integrated into an IDE like Eclipse.

The most basic way would be to add menu items, or short-cut buttons, to Eclipse that invoke Randoop on a set of classes, perhaps walking the user through a wizard-like series of steps to specify the classes under test and any non-standard generation parameters. The resulting JUnit classes could automatically be added to the Java project.

There are more interesting, but also more open-ended and challenging ways of integrating Randoop into a development environment. For example, you could create an Eclipse plugin that continuously runs Randoop on the class that the user is currently editing, with the results of Randoop's exploration displayed in a non-intrusive way into the source code editing window itself. Imagine, for example, if as you type code into Eclipse, a thin bar on the side shows which lines of code have been tested by Randoop and which ones have not by displaying the line numbers in green or red (much in the same way some coverage tools do), or a small symbol appears next to a method definition to flag the fact that Randoop may have revealed an error in the method. There are many exciting possibilities in integrating a test generation tool like Randoop with an IDE.

A downside of this project is that UI is less important than features: programmers who are sophisticated enough to appreciate Randoop are comfortable enough with the command line.

Additional required skills: You should have experience with (or be able to quickly learn) Eclipse's plugin infrastructure.

Test generation for other languages, such as C

Many test generation techniques have been proposed and evaluated in the context of Java. Randoop works for Java and C#. C programs have rather different characteristics, starting with the fact that C is a procedural language whereas Java is an object-oriented one. These different abstraction mechanisms raise interesting research questions, such as whether previously-proposed techniques work well and what new techniques can work even better, by taking advantage of the unique characteristics of C. In some ways this makes the task easier and in some ways harder.

Another reason to focus on C is that there is a wealth of important software that would be greatly improved by better tests. This codebase has been largely untapped by previous work, so there should be some low-hanging fruit as well as deeper and harder problems.

The project is to adapt the Randoop testing approach to C code, and evaluate it. Other languages than C would also be interesting.

Additional required skills: In addition to familiarity with C, it is helpful to know, or be able to quickly learn, the C system interface. For example, the fork call is likely to come in very handy.

Other projects

These projects (some of which are small, and others of which are quite ambitious and would make research contributions) would be a good way to get started with Randoop development and make a contribution.