Learnings from Actor Development

I have spent a fair amount of time recently developing actor-based systems, specifically with the Scala Actor library. Whether you implement actors with the Scala library, Akka, Lift or Scalaz, some basic gotchas tend to present themselves until you get a feel for what you’re doing. Here are some that I learned the hard way.

Never Refer Directly to Other Actors

Actors are fragile and can die easily. You typically create a supervisor with a strategy for recreating a dead actor, but any other class that held a direct reference to that actor is left with an invalid reference. If you absolutely must have actors hold references to others with which they have no supervisory relationship, use a proxy reference instead: if the actor behind the proxy dies, you only have to replace it in the proxy, not in every actor holding that reference. Akka solves this problem nicely with ActorRef, where the actor behind the reference can be recreated without updating anyone who holds it.
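To make the proxy idea concrete, here is a minimal sketch in plain Scala. It does not use any actor library; Target, ProxyRef and Recording are names I made up for illustration, and a real actor would sit where Recording does.

```scala
import java.util.concurrent.atomic.AtomicReference

object ProxyRefSketch {
  // Hypothetical minimal "actor" surface: anything that accepts a message.
  trait Target { def send(msg: Any): Unit }

  // Holders keep the proxy; only the proxy knows the current target.
  final class ProxyRef(initial: Target) extends Target {
    private val current = new AtomicReference[Target](initial)

    // Forward to whichever target is installed right now.
    def send(msg: Any): Unit = current.get().send(msg)

    // Called by whoever recreates the actor, e.g. its supervisor.
    def replace(fresh: Target): Unit = current.set(fresh)
  }

  // Stand-in target that records what it receives.
  final class Recording extends Target {
    private var log = Vector.empty[Any]
    def send(msg: Any): Unit = synchronized { log = log :+ msg }
    def received: Vector[Any] = synchronized { log }
  }
}
```

Every holder of the ProxyRef keeps working after replace is called, which is the property that makes the single point of replacement worthwhile.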

If You Do Have an Actor’s Reference, Avoid Synchronous Method Calls Between Actors

Whether your actors are event-based (sharing a thread pool) or thread-based (each with its own dedicated thread), avoid having one actor make direct method calls on another. Doing so introduces concurrency into classes that are designed to avoid that very situation: the receiving actor can be handling a mailbox message on one thread at the same time as it is servicing your call on another. Use blocking or future-based message sends instead, which allow the receiving actor to handle the request through its mailbox on its own thread. Not to harp on the virtues of Akka too much, but the ActorRef type also prevents this kind of behavior.

Write Business Logic in External Idempotent Functions

Testing actors is difficult, particularly those with side effects. If you are in a supervisor hierarchy, the receipt of a message may lead to the creation of child actors with side effects of their own, which can be hard to account for in a test environment. The goal of unit tests is not to verify that actor interaction works, but to verify that the business logic the actor performs is sound. Externalize your business logic into functions and partial functions that can be tested outside of actors, and use integration tests only to prove that the actors executing that logic behave as expected as part of an end-to-end functional test.
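As a rough illustration of what externalized logic can look like, the sketch below keeps a state transition in a side-effect-free function and exposes it as a partial function that an actor's message handler could delegate to. The message types and the running-total example are hypothetical, chosen only to keep the sketch small.

```scala
object PricingLogic {
  // Hypothetical messages an order-tracking actor might receive.
  sealed trait OrderMessage
  final case class AddItem(priceCents: Long, quantity: Int) extends OrderMessage
  case object Clear extends OrderMessage

  // State transition with no mailbox, no child actors and no I/O.
  def nextTotal(totalCents: Long, msg: OrderMessage): Long = msg match {
    case AddItem(price, qty) => totalCents + price * qty
    case Clear               => 0L
  }

  // The same logic as a partial function; it is simply undefined
  // for anything that is not an OrderMessage.
  def handler(totalCents: Long): PartialFunction[Any, Long] = {
    case m: OrderMessage => nextTotal(totalCents, m)
  }
}
```

Both nextTotal and handler can be exercised in an ordinary unit test without starting a single actor.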

Beware the Thundering Herd

When you start creating structures of actors such as supervisor hierarchies, it can seem simplest to send generic messages that are passed through the tree. However, as actors react and send their own messages, this can lead to event “storms”. Two strategies help: 1) use granular messages that target specific events for specific actor instances, and 2) ignore messages of the same type with the same parameter data for a given time period. You can even implement a common trait for all of your actors that lets them skip a repeated message for an externally configurable period of time. Be judicious in how you use this, though, and tune the period for the loads of your system.
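Here is one way such a common trait might look. This is a sketch, not the trait I actually used: the names Suppressing, windowMillis and shouldHandle are invented, it assumes Scala 2.13 or later (for filterInPlace), it assumes messages are case classes or other values with structural equality, and it assumes it is only called from the single thread that is processing the actor's mailbox.

```scala
object DuplicateSuppression {
  // Mix-in sketch: remembers when each message value was last accepted.
  trait Suppressing {
    // How long an identical message is ignored, in milliseconds.
    // Left abstract so it can come from external configuration.
    def windowMillis: Long

    // Overridable clock so tests do not need to sleep.
    def nowMillis(): Long = System.currentTimeMillis()

    private val lastAccepted = scala.collection.mutable.Map.empty[Any, Long]

    // True if the message should be handled, false if an equal
    // message was accepted less than windowMillis ago.
    def shouldHandle(msg: Any): Boolean = {
      val now = nowMillis()
      // Forget entries that have aged out so the map stays small.
      lastAccepted.filterInPlace((_, seenAt) => now - seenAt < windowMillis)
      if (lastAccepted.contains(msg)) false
      else {
        lastAccepted(msg) = now
        true
      }
    }
  }
}
```

An actor mixing this in would check shouldHandle(msg) at the top of its message handler and drop the message when it returns false.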

Garbage Collection

Consider a supervisor hierarchy that is responsible for configuring servers in a cluster. You may want to add garbage-collecting actors that prune each server of configuration that is still present but no longer relevant. The actors in the supervisor hierarchy will handle that cleanup if one of them was created to represent the configuration item in question. If no actor ever existed to represent that state, though, only a garbage collector whose specific role is to clean up a dirty environment can clear the bad data from the target server.

Always Pass Copies in Immutable Messages

Copy any object instance that will be passed in a message, so as to avoid accidentally sharing any state. In almost all cases, you should ensure that your messages themselves are immutable. Dean Wampler and Alex Payne make this point specifically in their book, Programming Scala. This, combined with very granular messages, can seem expensive in terms of resources. But it is worth the cost in memory and performance to ensure that your actor behavior is what you expect at design time.
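A small sketch of the idea, using a hypothetical UpdateHosts message: the case class holds only immutable data, and any mutable state the sender owns is copied into an immutable collection before it goes into the message.

```scala
object ImmutableMessages {
  // Immutable message: case class fields are vals and the payload
  // is an immutable collection.
  final case class UpdateHosts(cluster: String, hosts: Vector[String])

  // Take a defensive snapshot of mutable state before sending it.
  def snapshot(cluster: String,
               live: scala.collection.mutable.Buffer[String]): UpdateHosts =
    UpdateHosts(cluster, live.toVector)
}
```

After snapshot returns, the sender can keep mutating its buffer without the receiver ever seeing the change.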

Semantic Logging

Debugging actors isn’t easy. Typically, you have multiple instances of the same class with asynchronous behavior, so it is difficult to discern flow. Create trace-level log output for each actor type that displays specific information about it in a clearly visible manner. Use line breaks and tabbed indentation to make it readable, but note that doing so can make your log files even larger than they already are. The extra volume has the unfortunate side effect of forcing you to be very granular in your log configuration about which logging level is used where, since package-level logging may produce too much information. It may help to put a timestamp into each message, so you can grep the log for a specific message as it flows through actors. Also log the times at which each actor received the message and finished handling it.
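The layout below is one possible shape for such a trace entry. The Envelope type and the field labels are my own invention for this sketch; the point is only that the creation timestamp travels with the message, so the same value can be grepped for in every actor's output.

```scala
object TraceFormat {
  // Hypothetical envelope: the payload plus the time the message was created.
  final case class Envelope(createdAtMillis: Long, payload: Any)

  // Multi-line, tab-indented entry for one actor handling one message.
  def entry(actorType: String,
            actorId: String,
            env: Envelope,
            receivedAtMillis: Long,
            handledAtMillis: Long): String =
    List(
      s"[$actorType/$actorId]",
      s"\tmessage:  ${env.payload}",
      s"\tcreated:  ${env.createdAtMillis}",
      s"\treceived: $receivedAtMillis",
      s"\thandled:  $handledAtMillis (${handledAtMillis - receivedAtMillis} ms in handler)"
    ).mkString("\n")
}
```

The resulting string would be passed to whatever logger you already use at trace level.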

Deteriorating Retry

If your actors have side effects that depend on a resource (network connection, database access) which may be unavailable or may fail, use deteriorating retry logic: have the actor send itself a message to try again, at increasingly long intervals. For a good example of this, go to Gmail, disconnect from all networks and watch as it tries to reconnect at longer and longer intervals.
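The interval calculation can be sketched as a capped exponential backoff. The doubling factor, the 100 ms base and the 30 second cap are assumptions I picked for the example, and Retry is a hypothetical self-addressed message; the scheduling of the send itself is left to whatever your actor library provides.

```scala
object DeterioratingRetry {
  // Delay before the given attempt (0-based): base * 2^attempt, capped.
  def delayMillis(attempt: Int,
                  baseMillis: Long = 100L,
                  maxMillis: Long = 30000L): Long = {
    // Keep the shift small so ordinary base values cannot overflow a Long.
    val shift = math.min(math.max(attempt, 0), 20)
    math.min(baseMillis << shift, maxMillis)
  }

  // Hypothetical message an actor sends to itself, carrying the attempt count.
  final case class Retry(attempt: Int)

  // Given the retry that just failed, produce the next retry message
  // and how long to wait before sending it.
  def next(failed: Retry): (Retry, Long) = {
    val following = Retry(failed.attempt + 1)
    (following, delayMillis(following.attempt))
  }
}
```

With these defaults the delays run 100, 200, 400, 800 ms and so on until they level off at 30 seconds.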

Instrument via JMX for Runtime Clarity

Register every actor instance with the JVM’s MBeanServer, and have each supervisor unregister an actor’s instrumentation when that actor dies. Yes, this comes at a performance cost, but you can make the registration asynchronous through a future while you perform other initialization and startup tasks. While you’ll still need to profile the threads involved to find threading issues, being able to view actor existence and state in JConsole or VisualVM is a wonderful help in knowing what is happening in your system in production.
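A minimal sketch of the registration and cleanup, using the standard MBean naming convention (a class X exposing an interface named XMBean). The "actors" domain, the type/name key scheme and the ActorStatus class are assumptions for illustration, and the sketch assumes actor type and instance names contain no characters that need quoting in an ObjectName.

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

object ActorJmx {
  // Management interface; JMX derives a read-only attribute "State"
  // from the getter name.
  trait ActorStatusMBean {
    def getState: String
  }

  // Per-actor status holder; the actor updates it as it changes state.
  final class ActorStatus(initial: String) extends ActorStatusMBean {
    @volatile private var state = initial
    def getState: String = state
    def transitionTo(next: String): Unit = state = next
  }

  private val server = ManagementFactory.getPlatformMBeanServer

  // Hypothetical naming scheme: one MBean per actor instance.
  def nameFor(actorType: String, instance: String): ObjectName =
    new ObjectName(s"actors:type=$actorType,name=$instance")

  // Called when the actor starts.
  def register(name: ObjectName, status: ActorStatus): Unit = {
    server.registerMBean(status, name)
    ()
  }

  // Called by the supervisor when the actor dies.
  def unregister(name: ObjectName): Unit =
    if (server.isRegistered(name)) server.unregisterMBean(name)
}
```

Once registered, the actor shows up under the "actors" domain in JConsole or VisualVM with its current State attribute.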

Prepare for Race Conditions

As with any asynchronous programming, the timing of actor interactions can be unpredictable. Have your actors recheck any external state they depend on, so that their own state stays consistent with it. If Actor A needs Actor B to have a specific value for its own state to be appropriate, it should not send a single message to Actor B and assume that the value returned that one time is correct. Keep checking the value (again, possibly with deteriorating retry) until you can be certain you have a correct representation of Actor B’s state.