There are often several ways to write a piece of code, and sometimes the deciding factor is which implementation executes fastest. In that case we might want a shootout between the candidates to find out which one wins. The Java Microbenchmark Harness (JMH) can give us an experimental answer to this type of question, and the sbt-jmh plugin makes it easy to run JMH benchmarks on Scala or Java code in an sbt project.
toHexString
Suppose we need to implement a method that converts an Array[Byte] to its hexadecimal representation. There are multiple ways we could do this. A concise approach is to use Scala's formatted string interpolation.
scala> def toHexString(bytes: Array[Byte]) =
     |   bytes.map(b => f"$b%02x").mkString
toHexString: (bytes: Array[Byte])String
scala> toHexString("Scala".getBytes)
res1: String = 5363616c61
String interpolation looks convenient, but maybe we should try a version not using string interpolation to see if there's a cost to using it.
scala> def toHexString(bytes: Array[Byte]) =
     |   bytes.map(b => "%02x".format(b)).mkString
toHexString: (bytes: Array[Byte])String
scala> toHexString("Scala".getBytes)
res3: String = 5363616c61
As a third option, we might consult StackOverflow to see how others solve this problem and maybe get better performance.
scala> def toHexString(bytes: Array[Byte]) = {
     |   val hexArray: Array[Byte] = Array(
     |     '0', '1', '2', '3', '4',
     |     '5', '6', '7', '8', '9',
     |     'A', 'B', 'C', 'D', 'E',
     |     'F')
     |   val hexChars = Array.fill(bytes.size * 2)(0.toByte)
     |   for {
     |     j <- 0 to bytes.length - 1
     |     v = bytes(j) & 0xFF
     |   } {
     |     hexChars(j * 2) = hexArray(v >>> 4)
     |     hexChars(j * 2 + 1) = hexArray(v & 0x0F)
     |   }
     |   new String(hexChars)
     | }
toHexString: (bytes: Array[Byte])String
scala> toHexString("Scala".getBytes)
res1: String = 5363616C61
I know which one I'd choose for readability, but let's see which one performs the best.
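Before timing anything, it may be worth confirming that the three versions actually produce the same digits. The sketch below is only an illustration: the object name HexCheck and the method names hexInterp, hexFormat and hexLookup are made up for this check, and the lookup version is slightly adapted to use a Char table rather than the Byte table shown above. Note that the lookup version emits uppercase digits, so the comparison ignores case.

```scala
object HexCheck {
  // Variant 1: formatted string interpolation.
  def hexInterp(bytes: Array[Byte]): String =
    bytes.map(b => f"$b%02x").mkString

  // Variant 2: String.format via Scala's format method.
  def hexFormat(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b)).mkString

  // Variant 3: table lookup (adapted to a Char table for this sketch).
  def hexLookup(bytes: Array[Byte]): String = {
    val hexArray = "0123456789ABCDEF".toCharArray
    val hexChars = new Array[Char](bytes.length * 2)
    for (j <- bytes.indices) {
      val v = bytes(j) & 0xFF              // treat the byte as unsigned
      hexChars(j * 2) = hexArray(v >>> 4)  // high nibble
      hexChars(j * 2 + 1) = hexArray(v & 0x0F) // low nibble
    }
    new String(hexChars)
  }

  // True when all three variants agree, ignoring the case of the digits.
  def allAgree(bytes: Array[Byte]): Boolean = {
    val expected = hexInterp(bytes)
    hexFormat(bytes) == expected && hexLookup(bytes).equalsIgnoreCase(expected)
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq("Scala".getBytes("UTF-8"), Array[Byte](-1, 0, 127, -128))
    samples.foreach(s => println(s"${hexLookup(s)} agree=${allAgree(s)}"))
  }
}
```

Including negative bytes in the samples exercises the & 0xFF masking, which is the step most likely to go wrong in a hand-written lookup.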
JMH
To use JMH via the sbt-jmh plugin, we need to create an sbt project with a project/plugins.sbt file with the following line:
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.4")
and enable it in your project in build.sbt:
enablePlugins(JmhPlugin)
Now we need to create a class that contains methods to benchmark. We tell JMH to benchmark a method by using the Benchmark annotation. We can then configure how the method will be tested using the BenchmarkMode and OutputTimeUnit annotations.
// Must not be in default package
package com.chariotsolutions.jmh.sample
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.OutputTimeUnit
import java.util.concurrent.TimeUnit
/* Default settings for benchmarks in this class */
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class TestHexString {

  @Benchmark
  def interpolation: Unit = toHexStringInterp(randomArray)

  @Benchmark
  def format: Unit = toHexStringFormat(randomArray)

  @Benchmark
  def stringManip: Unit = toHexString(randomArray)

  def toHexStringInterp(bytes: Array[Byte]) =
    bytes.map(b => f"$b%02x").mkString

  def toHexString(bytes: Array[Byte]) = {
    val hexArray: Array[Byte] = Array(
      '0', '1', '2', '3', '4',
      '5', '6', '7', '8', '9',
      'A', 'B', 'C', 'D', 'E',
      'F')
    val hexChars = Array.fill(bytes.size * 2)(0.toByte)
    for {
      j <- 0 to bytes.length - 1
      v = bytes(j) & 0xFF
    } {
      hexChars(j * 2) = hexArray(v >>> 4)
      hexChars(j * 2 + 1) = hexArray(v & 0x0F)
    }
    new String(hexChars)
  }

  def toHexStringFormat(bytes: Array[Byte]) =
    bytes.map(b => "%02x".format(b)).mkString

  def randomArray: Array[Byte] = {
    val a = Array.fill(20)(0.toByte)
    scala.util.Random.nextBytes(a)
    a
  }
}
Note that this class must not be in the default package; otherwise JMH will not run correctly and the sbt session will die. Also note that I put the OutputTimeUnit and BenchmarkMode annotations at the class level to set defaults for all of my benchmark methods. Three methods are marked with the Benchmark annotation. Each one simply calls the corresponding toHexString variant with a random array of bytes.
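One possible variation, shown here only as a sketch and not as the setup used for the measurements in this post, is to move the input into JMH state and return the result from each benchmark method. With this layout the random array is created in a Setup method rather than inside the timed call, and JMH consumes the returned String, which guards against the JIT discarding the work. The class name TestHexStringWithState and the field name input are hypothetical, and numbers from this variant would not be directly comparable to the results reported here.

```scala
package com.chariotsolutions.jmh.sample

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Hypothetical variant: input lives in per-thread state, results are returned.
@State(Scope.Thread)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class TestHexStringWithState {

  // Refilled before every iteration by prepare(); not part of the timed code.
  var input: Array[Byte] = Array.empty[Byte]

  @Setup(Level.Iteration)
  def prepare(): Unit = {
    input = new Array[Byte](20)
    scala.util.Random.nextBytes(input)
  }

  // Returning the String lets JMH consume the value.
  @Benchmark
  def interpolation: String = input.map(b => f"$b%02x").mkString

  @Benchmark
  def format: String = input.map(b => "%02x".format(b)).mkString
}
```

Whether this matters for a given comparison is something to measure rather than assume; here the cost of generating 20 random bytes is the same for every variant, so it mainly affects the absolute scores.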
Running JMH
We can run our benchmark using jmh:run -i 20 -wi 10 -f1 -t1 in sbt. In this command, -i 20 says that we want to run each benchmark with 20 measurement iterations, -wi 10 says to run 10 warmup iterations, -f1 says to fork once for each benchmark, and -t1 says to run on one thread. Increasing the number of threads would let us see whether the throughput of our benchmark method scales up. Increasing the number of forks lets us verify performance across multiple JVM instances. If no values are provided, the JMH version used here defaults to 20 warmup iterations, 20 measurement iterations, 1 thread, and 10 forks. All of these values can be set via annotations in the test code as well. I'm using one fork here to minimize execution time, but more than one fork should usually be used for accurate results.
Running jmh:run produces logging output as the benchmarks execute. Once the run has completed, we should see a summary of the results like this:
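As a rough illustration of the annotation route, the sketch below expresses the same settings as the command-line flags using JMH's Warmup, Measurement, Fork and Threads annotations. The class name AnnotatedHexString and the fixed sample input are assumptions made for this example; values passed on the command line should still take precedence over the annotations.

```scala
package com.chariotsolutions.jmh.sample

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

// Hypothetical class showing annotation equivalents of the jmh:run flags.
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
@Warmup(iterations = 10)      // same intent as -wi 10
@Measurement(iterations = 20) // same intent as -i 20
@Fork(1)                      // same intent as -f1
@Threads(1)                   // same intent as -t1
class AnnotatedHexString {

  // Fixed sample input, used only to keep the example short.
  val sample: Array[Byte] = "Scala".getBytes("UTF-8")

  @Benchmark
  def format: String = sample.map(b => "%02x".format(b)).mkString
}
```

Keeping the settings in annotations makes a plain jmh:run reproducible across machines, while the flags remain handy for quick one-off runs.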
[info] # Run complete. Total time: 00:01:31
[info]
[info] Benchmark                     Mode  Cnt     Score    Error   Units
[info] TestHexString.format         thrpt   20    63.825 ±  0.863  ops/ms
[info] TestHexString.interpolation  thrpt   20    62.952 ±  1.090  ops/ms
[info] TestHexString.stringManip    thrpt   20  1355.426 ± 14.119  ops/ms
[success] Total time: 92 s, completed Sep 30, 2015 6:06:51 AM
We can see that there is no meaningful difference between string interpolation and using format() directly: the two scores fall within each other's error margins. The more complicated string manipulation approach, however, delivers roughly 21 times the throughput (about 1355 ops/ms versus about 64 ops/ms). If speed is a high priority when creating the hex string, we clearly should be using that approach instead of the other two.
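The comparison can be made explicit with a little arithmetic on the reported numbers. The sketch below copies the scores and error margins from the summary and checks two things: whether two score ranges overlap, and how large the throughput ratio is. The names SpeedupCheck, Result, overlaps and speedup are invented for this illustration, and treating overlapping ranges as "no meaningful difference" is a simplification rather than a formal statistical test.

```scala
object SpeedupCheck {
  // A benchmark score with its reported error margin, both in ops/ms.
  final case class Result(score: Double, error: Double) {
    def low: Double = score - error
    def high: Double = score + error
  }

  // Values copied from the JMH summary output.
  val format = Result(63.825, 0.863)
  val interpolation = Result(62.952, 1.090)
  val stringManip = Result(1355.426, 14.119)

  // True when the [low, high] ranges of the two results intersect.
  def overlaps(a: Result, b: Result): Boolean =
    a.low <= b.high && b.low <= a.high

  // Throughput ratio of the faster result over the slower one.
  def speedup(fast: Result, slow: Result): Double =
    fast.score / slow.score

  def main(args: Array[String]): Unit = {
    println(s"format vs interpolation overlap: ${overlaps(format, interpolation)}")
    println(f"stringManip / format: ${speedup(stringManip, format)}%.1f")
    println(f"stringManip / interpolation: ${speedup(stringManip, interpolation)}%.1f")
  }
}
```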
We've seen an example of using JMH to quickly create and execute microbenchmarks that compare the performance characteristics of different implementations. For more in-depth information, check out the sbt-jmh samples. There is also a Jenkins plugin that allows you to run JMH benchmarks as part of your CI workflow.