Posted on Jul 01, 2011 by

Learnings from Actor Development

I spent a fair amount of time developing actor-based systems recently, specifically with the Scala Actor library. Regardless of whether you are implementing actors with the Scala library, Akka, Lift or Scalaz, some basic gotchas can present themselves until you get a feel for what you're doing. Here are some of them that I've learned the hard way.

Never Refer Directly to Other Actors

Actors are fragile and can die easily. While you typically create a supervisor with a strategy for how to recreate a dead actor, any other class holding a direct reference to the actor that died is left with an invalid reference. If you absolutely must have actors with references to others that do not have a supervisory relationship, use a proxy reference instead: if the actor behind the proxy dies, you only have to replace it in the proxy, not in every actor with that reference. Akka solves this problem nicely with ActorRef, where the actor behind the reference can be recreated without updating anyone holding it.
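
As a rough illustration of the proxy idea, here is a minimal sketch that uses only the standard library rather than any particular actor framework. The names Receiver, ReceiverProxy and replace are my own inventions for this example, not part of the Scala Actor library or Akka.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for an actor: anything that can accept a message.
trait Receiver {
  def send(msg: Any): Unit
}

// Callers hold the proxy, never the receiver itself. When the receiver
// dies, the supervisor swaps in a replacement and no caller has to change.
final class ReceiverProxy(initial: Receiver) extends Receiver {
  private val target = new AtomicReference[Receiver](initial)

  // Forward every message to whichever receiver is current.
  def send(msg: Any): Unit = target.get().send(msg)

  // Called by the supervisor once it has recreated the dead receiver.
  def replace(replacement: Receiver): Unit = target.set(replacement)
}
```

Anything that needs to talk to the actor is handed the ReceiverProxy, so a restart touches exactly one reference.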

If You Do Have an Actor's Reference, Avoid Synchronous Method Calls Between Actors

Regardless of whether your actors are event- (shared thread pool) or thread-based (each actor has its own dedicated thread), avoid having actors make direct method calls on another actor. It introduces concurrency into classes that are designed to avoid that very situation - the receiving actor can be operating on a thread handling a mailbox message at the same time it is dealing with your call. Use blocking or future-based message sends instead, which allows the receiving actor to handle the request through its mailbox on its own thread. Not to harp on the virtues of Akka too much, but the ActorRef type also prevents this kind of behavior.

Write Business Logic in External Idempotent Functions

Testing actors is difficult, particularly those with side effects. If you are in a supervisor hierarchy, the receipt of a message may lead to the creation of child actors with side effects of their own, which can be difficult to account for in a test environment. The goal of unit tests is not to test whether actor interaction works, but to verify that the business logic the actor performs is sound. Externalize your business logic into functions and partial functions that can be tested outside of actors, and use integration tests to prove only that the actors executing that logic behave as expected as part of an end-to-end functional test.
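
Here is a minimal sketch of what externalized logic might look like. The Reconcile message, the Plan result and the ConfigLogic object are hypothetical names made up for this example; the point is only that both the function and the partial function can be exercised without starting a single actor.

```scala
// Hypothetical message and result types, invented for illustration.
final case class Reconcile(desired: Set[String], actual: Set[String])
final case class Plan(toAdd: Set[String], toRemove: Set[String])

object ConfigLogic {
  // Pure function: the same inputs always produce the same plan,
  // and nothing outside the function is touched.
  def plan(desired: Set[String], actual: Set[String]): Plan =
    Plan(toAdd = desired.diff(actual), toRemove = actual.diff(desired))

  // Partial function an actor could use as part of its message handler.
  // It only decides what to do; performing the side effects stays in the actor.
  val handle: PartialFunction[Any, Plan] = {
    case Reconcile(desired, actual) => plan(desired, actual)
  }
}
```

A unit test can then call ConfigLogic.plan directly, and the actor's own handler shrinks to little more than applying handle and acting on the resulting Plan.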

Beware the Thundering Herd

When you start creating structures of actors such as supervisor hierarchies, it can seem simplest to send generic messages that are passed through the tree. However, as actors react and send their own messages, this can lead to event "storms". This can be addressed using two strategies - 1) use granular messages that target specific events for specific actor instances, and 2) ignore messages of the same type with the same parameter data for a given time period. You can even implement a common trait for all of your actors that gives them the ability to not handle the same message for an externally-configurable period of time. Be judicious in how you use this, though - tune it for the loads of your system.
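
The second strategy could be sketched roughly as follows. DuplicateFilter and shouldHandle are hypothetical names, the injectable clock exists only to make the behavior testable, and the sketch assumes messages have value equality (case classes, for instance). I have written it as a small class an actor would own rather than as a trait, but the same logic could be mixed in.

```scala
// Remembers when each message was last accepted and rejects repeats that
// arrive within `windowMillis`. Meant to be owned by a single actor, so it
// is deliberately not thread-safe.
final class DuplicateFilter(
    windowMillis: Long,
    now: () => Long = () => System.currentTimeMillis()
) {
  private var lastAccepted: Map[Any, Long] = Map.empty

  def shouldHandle(message: Any): Boolean = {
    val time = now()
    // Drop entries that have aged out so the map does not grow forever.
    lastAccepted = lastAccepted.filter { case (_, seen) => time - seen < windowMillis }
    if (lastAccepted.contains(message)) {
      false // an equal message was accepted inside the window
    } else {
      lastAccepted = lastAccepted.updated(message, time)
      true
    }
  }
}
```

An actor would check shouldHandle at the top of its handler and silently drop the message when it returns false. The window is exactly the value you would want to make externally configurable.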

Garbage Collection

In the case of a supervisor hierarchy that is responsible for configuring servers in a cluster, you may want to implement garbage-collecting actors that prune each server of configuration it still holds but that is no longer relevant. The actors in the supervisor hierarchy will take care of this when one of them was created to represent that particular configuration item. If no actor ever existed to represent that state, though, only a garbage collector whose specific role is to clean up a dirty environment can clear the stale data from the target server.

Always Pass Copies in Immutable Messages

Copy any object instance that will be passed in a message, so as to avoid accidentally sharing any state. In almost all cases, you should ensure that your messages themselves are immutable. Dean Wampler and Alex Payne make this point specifically in their book, Programming Scala. This, combined with very granular messages, can seem expensive in terms of resources. But it is worth the cost in memory and performance to ensure that your actor behavior is what you expect at design time.
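
A minimal sketch of the idea, using a hypothetical HostsChanged message: the sender keeps its own mutable buffer, and the message carries an immutable snapshot taken at send time.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical message: a case class whose only field is an immutable collection.
final case class HostsChanged(hosts: Vector[String])

object Messages {
  // Snapshot the sender's mutable buffer so that later changes to it
  // cannot leak into a message that is already in flight.
  def hostsChanged(current: ArrayBuffer[String]): HostsChanged =
    HostsChanged(current.toVector)
}
```

The copy costs an allocation per message, which is the price referred to above. In exchange, the receiver can never observe the sender's later mutations.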

Semantic Logging

Debugging actors isn't easy. Typically, you have multiple instances of the same class with asynchronous behavior, so it is difficult to discern flow. Create trace level log output for each actor type that displays specific information about it in a clearly-visible manner. Use line breaks and tabbed indentation to make it readable, but note that doing so can make your log files even larger than they already are. This has an unfortunate side effect of forcing you to be very granular in your log configuration as to what logging level is used - package-level logging may be too much information. It may help to put a timestamp into a message, so you can grep the log for specific messages as they flow through actors. Also, log the timestamp of when the actor received and handled it.

Deteriorating Retry

If your actors have side effects where a required resource (network connection, database access) may not be available or may fail, use deteriorating retry logic: have the actor send itself a message to try again at increasingly longer intervals. For a good example of this, go to Gmail, disconnect from all networks and watch as it tries to reconnect after longer and longer delays.
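
One common way to compute those intervals is capped exponential backoff. The sketch below assumes a doubling schedule, and the base and maximum values are arbitrary defaults chosen for the example; the actor would schedule its "try again" message to itself after the returned delay.

```scala
object Backoff {
  // Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max.
  def delayMillis(attempt: Int, baseMillis: Long = 500L, maxMillis: Long = 60000L): Long = {
    require(attempt >= 0, "attempt must not be negative")
    // Cap the exponent so the multiplication stays well inside a Long
    // for any sensible base value.
    val exponent = math.min(attempt, 20)
    math.min(maxMillis, baseMillis * (1L << exponent))
  }
}
```

With the defaults, the first few retries wait 500 ms, 1 s, 2 s and 4 s, and every retry from the seventh onward waits the full 60 s.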

Instrument via JMX for Runtime Clarity

Register every actor instance with the JVM's MBeanServer, and have their supervisors clean up the instrumentation when they die. Yes, this comes at a performance cost, but you can make the registration asynchronous through a future while you perform other tasks in initialization and startup. While you'll still need to profile the threads involved to find threading issues, having the ability to view actor existence and state in JConsole or VisualVM is a wonderful help in knowing what is happening in your system in production.
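
As a rough sketch, registration with the platform MBeanServer can look like the following. ActorStatus, ActorStatusMBean, ActorJmx and the "example.actors" domain are all hypothetical names for this example. The sketch assumes these types are compiled as top-level definitions, because it relies on the standard MBean naming convention (an interface named after the class plus "MBean").

```scala
import java.lang.management.ManagementFactory
import javax.management.ObjectName

// Standard MBean convention: the interface is named <ClassName>MBean.
// Each getter becomes a read-only attribute (State, MailboxSize).
trait ActorStatusMBean {
  def getState: String
  def getMailboxSize: Int
}

// Hypothetical snapshot of one actor's state, updated by the actor itself.
final class ActorStatus(@volatile var state: String, @volatile var mailboxSize: Int)
    extends ActorStatusMBean {
  def getState: String = state
  def getMailboxSize: Int = mailboxSize
}

object ActorJmx {
  private val server = ManagementFactory.getPlatformMBeanServer

  // Quote the id so characters such as ':' or ',' cannot break the name.
  def nameFor(actorId: String): ObjectName =
    new ObjectName("example.actors", "name", ObjectName.quote(actorId))

  // Called when the actor starts.
  def register(actorId: String, status: ActorStatus): ObjectName = {
    val name = nameFor(actorId)
    server.registerMBean(status, name)
    name
  }

  // Called by the supervisor when the actor dies.
  def unregister(actorId: String): Unit = {
    val name = nameFor(actorId)
    if (server.isRegistered(name)) server.unregisterMBean(name)
  }
}
```

Once registered, each actor shows up under the example.actors domain in JConsole or VisualVM with its State and MailboxSize attributes.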

Prepare for Race Conditions

As with any asynchronous programming, the timing of actor interactions can be unpredictable. Make your actor interactions recheck state they depend on so that they can reflect an appropriate state of their own. If Actor A needs Actor B to have a specific value for its own state to be appropriate, it should not send only one message to Actor B and assume that the value returned is correct that one time. Keep checking the value (again, possibly with deteriorating retry) until you can be certain you have a correct representation of Actor B's state.

  • Marc
    What makes actors fragile? Is there something specifically about Actor that makes them so or is it that they often represent things that naturally come and go?
  • Kai Wähner
    Hey Jamie,

    thank you for sharing your experiences with actors.

    After reading your learnings, I wonder what your conclusion is about using actors? Will you use them again or go back to the Thread programming model (Java improved the API with JDK 5/6/7, thus it is easier - but still complex - to use)?

    I think that actors have a better abstraction level. You do not have to worry about shared memory, locks, and so on. Debugging is as difficult as it is with Threads, so that's no obstacle.
    I would like to use actors (either Scala API or even the Java or Scala API of Akka) in my next project, if management permits it :-)

    Are there any reasons why you would not use actors instead of the Thread API?

    Best regards,
    Kai Wähner (Twitter: @KaiWaehner)
  • Jamie Allen
    Actors "die" when an uncaught exception occurs on their execution thread. This is by design, and is explained well in the "let it crash" paradigm espoused by Akka creator Jonas Bonér. The idea is that you cannot predict all possible scenarios where something can go wrong, so make your system fault tolerant by being able to recover appropriately no matter what happens. This is managed through how you "link" actors at runtime so that they notify others when they are dying. Akka also provides you with supervisory strategies for what to do when a linked actor dies: a) recreate only the one that died, or b) kill all other actors it is supervising and recreate all.
  • Jamie Allen
    I would definitely use Actors again, but only where the situation warrants it. Actors are composable abstractions, allowing the developer to dictate logic flow via asynchronous interaction and guaranteed single-threaded access to shared state. When you have many consumers all attempting to access state at the same time, encapsulating that state in actors is safer than using multiple locks in several classes that could block one another.

    I would not use an actor just to prevent using a lock or synchronized block in my code. Only once my design shows the interleaving of locks would I consider using an actor implementation. When I can reason about my locks/semaphores/mutexes/etc, then I am perfectly happy to stay with them.

    Note that actors can also reduce latency in a system where work stealing is used to delegate tasks among a pool of actors. A dispatcher can have many actors working for it to handle load, and when one is finished it will send a message with the next task rather than using a less efficient strategy such as round robin, which doesn't take into account latency in a specific actor's execution of a task. Akka has a work-stealing dispatcher you can use in v1.1, and a ForkJoinDispatcher appears to be on the way for v2 soon.
  • Eurekin
    Would you mind writing a book about it? :D
  • Jamie Allen
    Would that I had the time. :) But if there's anything specific you need to know, don't hesitate to ask. I can also be reached via twitter at @jamie_allen.
  • Jamie Allen
    Philipp Haller has a book about to reach print on this topic.