There are often multiple ways to write a piece of code, and sometimes the choice comes down to which implementation executes fastest. We might want to have a shootout between the different implementations to find out. The Java Microbenchmark Harness (JMH) can help us get an experimental answer to this type of question, and the sbt-jmh plugin makes it very easy to run JMH benchmarks on Scala or Java code in an sbt project.
toHexString
Suppose we need to implement a method to convert an Array[Byte] to its hexadecimal representation. There are multiple ways we could do this. A concise approach is to use Scala's formatted string interpolation.
scala> def toHexString(bytes: Array[Byte]) = bytes.map(b => f"$b%02x").mkString
toHexString: (bytes: Array[Byte])String

scala> toHexString("Scala".getBytes)
res1: String = 5363616c61
String interpolation looks convenient, but maybe we should try a version not using string interpolation to see if there's a cost to using it.
scala> def toHexString(bytes: Array[Byte]) = bytes.map(b => "%02x".format(b)).mkString
toHexString: (bytes: Array[Byte])String

scala> toHexString("Scala".getBytes)
res3: String = 5363616c61
As a third option, we might consult Stack Overflow to see how others solve this problem and maybe get better performance.
scala> def toHexString(bytes: Array[Byte]) = {
     |   val hexArray: Array[Byte] = Array(
     |     '0', '1', '2', '3', '4',
     |     '5', '6', '7', '8', '9',
     |     'A', 'B', 'C', 'D', 'E',
     |     'F')
     |   val hexChars = Array.fill(bytes.size * 2)(0.toByte)
     |   for {
     |     j <- 0 to bytes.length - 1
     |     v = bytes(j) & 0xFF
     |   } {
     |     hexChars(j * 2) = hexArray(v >>> 4)
     |     hexChars(j * 2 + 1) = hexArray(v & 0x0F)
     |   }
     |   new String(hexChars)
     | }
toHexString: (bytes: Array[Byte])String

scala> toHexString("Scala".getBytes)
res1: String = 5363616C61
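Before timing anything, it may be worth confirming that the three versions actually produce the same digits. The following is a minimal sketch of such a check, not part of the original project: the object name HexSanityCheck and the method names are made up for illustration, and the lookup-table version uses a Char array instead of the Byte array shown above (same logic, just no dependence on the platform charset). Note that the lookup-table version emits uppercase digits, so the comparison ignores case.

```scala
object HexSanityCheck {
  // The three conversions from the text, renamed (hypothetical names) so they can coexist.
  def viaInterpolation(bytes: Array[Byte]): String =
    bytes.map(b => f"$b%02x").mkString

  def viaFormat(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b)).mkString

  // Lookup-table variant; uses Char instead of Byte for the digit table (an assumption of this sketch).
  def viaLookupTable(bytes: Array[Byte]): String = {
    val hexArray: Array[Char] = "0123456789ABCDEF".toCharArray
    val hexChars = new Array[Char](bytes.length * 2)
    for (j <- bytes.indices) {
      val v = bytes(j) & 0xFF                  // treat the byte as unsigned
      hexChars(j * 2) = hexArray(v >>> 4)      // high nibble
      hexChars(j * 2 + 1) = hexArray(v & 0x0F) // low nibble
    }
    new String(hexChars)
  }

  // True when all three versions agree, ignoring the upper/lower case difference.
  def allAgree(bytes: Array[Byte]): Boolean = {
    val a = viaInterpolation(bytes)
    a == viaFormat(bytes) && a.equalsIgnoreCase(viaLookupTable(bytes))
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "Scala".getBytes("UTF-8"),
      Array[Byte](0, 15, 16, -1, -128, 127)    // includes negative bytes and edge values
    )
    samples.foreach(s => assert(allAgree(s), s"mismatch for ${s.mkString(",")}"))
    println("all three implementations agree on the samples")
  }
}
```

Including a few negative bytes in the samples matters because sign handling is the place where hex conversions most often diverge.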
I know which one I'd choose for readability, but let's see which one performs the best.
JMH
To use JMH via the sbt-jmh plugin, we need to create an sbt project with a project/plugins.sbt file containing the following line:
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.4")
and enable the plugin in the project's build.sbt:
enablePlugins(JmhPlugin)
Now we need to create a class that contains methods to benchmark. We tell JMH to benchmark a method by using the Benchmark annotation. We can then configure how the method will be tested using the BenchmarkMode and OutputTimeUnit annotations.
// Must not be in default package
package com.chariotsolutions.jmh.sample

import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.OutputTimeUnit
import java.util.concurrent.TimeUnit

/* Default settings for benchmarks in this class */
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class TestHexString {

  @Benchmark
  def interpolation: Unit = toHexStringInterp(randomArray)

  @Benchmark
  def format: Unit = toHexStringFormat(randomArray)

  @Benchmark
  def stringManip: Unit = toHexString(randomArray)

  def toHexStringInterp(bytes: Array[Byte]) = bytes.map(b => f"$b%02x").mkString

  def toHexString(bytes: Array[Byte]) = {
    val hexArray: Array[Byte] = Array(
      '0', '1', '2', '3', '4',
      '5', '6', '7', '8', '9',
      'A', 'B', 'C', 'D', 'E',
      'F')
    val hexChars = Array.fill(bytes.size * 2)(0.toByte)
    for {
      j <- 0 to bytes.length - 1
      v = bytes(j) & 0xFF
    } {
      hexChars(j * 2) = hexArray(v >>> 4)
      hexChars(j * 2 + 1) = hexArray(v & 0x0F)
    }
    new String(hexChars)
  }

  def toHexStringFormat(bytes: Array[Byte]) = bytes.map(b => "%02x".format(b)).mkString

  def randomArray: Array[Byte] = {
    val a = Array.fill(20)(0.toByte)
    scala.util.Random.nextBytes(a)
    a
  }
}
Note that this class must not be in the default package; otherwise JMH will not run correctly and the sbt session will die. Also note that I put the OutputTimeUnit and BenchmarkMode annotations at the class level to set defaults for all of my benchmark methods. Three methods are marked with the Benchmark annotation. Each one simply calls the corresponding toHexString method with a random array of 20 bytes.
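In the class above, each benchmark method builds its random array inside the measured call and returns Unit, so the reported score includes the cost of generating the input and the result is discarded. A common JMH pattern, sketched here as a possible variation rather than as the code that produced the results in this post, is to prepare the input in a State object and return the result so that JMH consumes it. The class name HexStringStateSketch and the choice of Level.Iteration are assumptions made for illustration.

```scala
// Must not be in the default package (see the note above).
package com.chariotsolutions.jmh.sample

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Hypothetical variation: input prepared outside the measured code, results returned.
@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class HexStringStateSketch {

  // Filled in by prepare(); each benchmark thread gets its own instance (Scope.Thread).
  var input: Array[Byte] = _

  // Runs before every iteration, so array generation is not part of the measured time.
  @Setup(Level.Iteration)
  def prepare(): Unit = {
    input = new Array[Byte](20)
    scala.util.Random.nextBytes(input)
  }

  // Returning the String hands the result to JMH instead of discarding it.
  @Benchmark
  def interpolation: String = input.map(b => f"$b%02x").mkString

  @Benchmark
  def format: String = input.map(b => "%02x".format(b)).mkString
}
```

Scores from a class like this would not be directly comparable with the numbers reported below, since the measured work is different.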
Running JMH
We can run our benchmark using jmh:run -i 20 -wi 10 -f1 -t1 in sbt. In this command, -i 20 says that we want to run each benchmark with 20 measurement iterations, -wi 10 says to run 10 warmup iterations, -f1 says to fork once for each benchmark, and -t1 says to run on one thread. Increasing the number of threads would let us see whether the throughput of our benchmark method scales up. Increasing the number of forks lets us verify performance across multiple JVM instances. If no values are provided, JMH (as of the version used here) defaults to 20 warmup iterations, 20 measurement iterations, 1 thread, and 10 forks. All of these values could be set via annotations in the test code as well. I'm using one fork here to minimize execution time, but more than one fork should usually be used for accurate results.
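As a rough illustration of the annotation route, the sketch below encodes the same settings as the command line above using JMH's Warmup, Measurement, Fork, and Threads annotations. The class name AnnotatedHexString is hypothetical, and only one benchmark method is shown to keep it short; options passed on the command line would still override these defaults.

```scala
// Must not be in the default package.
package com.chariotsolutions.jmh.sample

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Hypothetical class: the run settings live in annotations instead of CLI flags.
@Warmup(iterations = 10)      // same as -wi 10
@Measurement(iterations = 20) // same as -i 20
@Fork(1)                      // same as -f1
@Threads(1)                   // same as -t1
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class AnnotatedHexString {

  @Benchmark
  def format: String = randomArray.map(b => "%02x".format(b)).mkString

  // Same helper as in the benchmark class above: 20 random bytes per call.
  private def randomArray: Array[Byte] = {
    val a = new Array[Byte](20)
    scala.util.Random.nextBytes(a)
    a
  }
}
```

With the settings in the class, a plain jmh:run in sbt should pick them up without any extra flags.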
Running this should produce logging output as the test executes. Once it's completed we should see a summary of the results like this:
[info] # Run complete. Total time: 00:01:31
[info]
[info] Benchmark                     Mode  Cnt     Score    Error   Units
[info] TestHexString.format         thrpt   20    63.825 ±  0.863  ops/ms
[info] TestHexString.interpolation  thrpt   20    62.952 ±  1.090  ops/ms
[info] TestHexString.stringManip    thrpt   20  1355.426 ± 14.119  ops/ms
[success] Total time: 92 s, completed Sep 30, 2015 6:06:51 AM
We can see that there is no meaningful difference between string interpolation and using format() directly: both score about 63 ops/ms, and their error ranges overlap. The more complicated string manipulation approach, however, is roughly 21 times faster at about 1355 ops/ms. If speed is a high concern when creating the hex string, we clearly should be using that approach instead of the other two.
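The comparison above can be reproduced from the summary table with a little arithmetic. This sketch treats each Score ± Error as a simple interval, which is a simplification; the object name ScoreComparison and its helpers are made up for illustration, and the numbers are copied from the output above.

```scala
object ScoreComparison {
  // One row of the JMH summary: mean score and the reported "Error" half-width, in ops/ms.
  final case class Score(name: String, mean: Double, error: Double) {
    def low: Double = mean - error
    def high: Double = mean + error
    // Two rows are considered indistinguishable here if their intervals overlap.
    def overlaps(other: Score): Boolean = low <= other.high && other.low <= high
  }

  // Values taken from the run summary in this post.
  val formatScore = Score("format", 63.825, 0.863)
  val interpolationScore = Score("interpolation", 62.952, 1.090)
  val stringManipScore = Score("stringManip", 1355.426, 14.119)

  // Ratio of mean throughputs.
  def speedup(fast: Score, slow: Score): Double = fast.mean / slow.mean

  def main(args: Array[String]): Unit = {
    println(s"format vs interpolation overlap: ${formatScore.overlaps(interpolationScore)}")
    println(f"stringManip vs format speedup: ${speedup(stringManipScore, formatScore)}%.1fx")
  }
}
```

The interval check is only a quick heuristic; with a single fork, as used here, the error figures say little about variation between JVM instances.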
We've seen an example of using JMH to quickly create and execute microbenchmarks to check out performance characteristics with different implementations. For more in-depth information, check out the sbt-jmh samples. There is also a Jenkins plugin that would allow you to run JMH benchmarks as part of your CI workflow.