Contrasting ideas helps me see what’s special about each one and what they share, perhaps in different forms. Different tools and languages show off different ideas, and while some work best with specific language features, there’s usually something I can carry over to the languages I use in my day job.
So today I’ve been thinking about some models of concurrent programming. The Coursera course on Reactive Programming explored the Actor Model, and I’m contrasting that with two others: the Active Object pattern from POSA2, which I studied in detail in another Coursera course last year, and Tasks, which have quickly become one of the favoured models for concurrent programming in .Net.
All of these are ways to move away from the old model of threads, locks and semaphores, which made concurrent programming so difficult and was hard to scale on modern multi-core hardware.
Active Object Pattern
An Active Object presents a function-call interface, but packages each incoming call into a message on a queue; the messages are then processed by an internal thread. The Active Object can hold state that each message modifies.
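To make the mechanics concrete, here is a minimal Python sketch of the pattern as described above. The class and method names (`CounterActiveObject`, `increment`, `stop`) are hypothetical and chosen only for illustration; it uses one dedicated thread per object, the simplest variant.

```python
import queue
import threading


class CounterActiveObject:
    """Illustrative Active Object: public methods only enqueue messages,
    and a private worker thread applies them one at a time."""

    def __init__(self):
        self._queue = queue.Queue()
        self._count = 0  # private state, touched only by the worker thread
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def increment(self, amount):
        # Looks like an ordinary method call, but it only packages the call
        # as a message and returns without waiting for the work to happen.
        self._queue.put(("increment", amount))

    def stop(self):
        # Sentinel message: the worker exits once every earlier message has
        # been processed. After join() it is safe to read the state.
        self._queue.put(("stop", None))
        self._worker.join()
        return self._count

    def _run(self):
        # The internal thread: take messages in order and apply each one.
        while True:
            kind, payload = self._queue.get()
            if kind == "stop":
                break
            if kind == "increment":
                self._count += payload


counter = CounterActiveObject()
for i in range(1, 11):
    counter.increment(i)  # the caller never blocks on the work itself
final = counter.stop()
print(final)  # 55
```

Note that the only way this sketch hands anything back is by reading the state after shutdown; there is no per-call reply.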
- The function-call interface looks like any other object to the caller and to other programmers.
- Simple decoupling of the caller’s thread from the actual work. The caller doesn’t have to block while the work is done.
- No data is shared between the caller’s thread and the Active Object’s thread except the message itself, which makes race conditions simple to avoid.
- The Active Object’s state is securely hidden from other parts of the system, again avoiding any data shared between threads.
- Messages can be processed sequentially, giving a simple mechanism to ensure the order of actions and avoiding the sharing issues of doing multiple actions at once.
- Reasonably simple to implement yourself.
- Simple versions of the pattern use a thread for each Active Object (but Active Objects could share pooled threads, or one Active Object could have several threads to cover large volumes of messages).
- No built-in way to reply or send results back to the caller (so a separate return queue or future mechanism is required).
The Active Object is useful as a learning example but probably less useful now that we have mature libraries with greater capabilities. If you are working in an environment or language without a concurrency library, though, it still gives a simple way to encapsulate worker threads safely. Having no standard mechanism to return results is perhaps the biggest drawback of an Active Object.
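One way to close that gap is to put a future inside each message, so the worker thread can complete it with a result or an error. The sketch below assumes Python’s `concurrent.futures.Future`, constructed directly; the Python docs reserve `set_result`/`set_exception` for tests and executor implementations, which is effectively what an Active Object is here. `AccountActiveObject` and `deposit` are hypothetical names.

```python
import queue
import threading
from concurrent.futures import Future


class AccountActiveObject:
    """Illustrative Active Object whose methods return a Future per call."""

    def __init__(self, balance=0):
        self._queue = queue.Queue()
        self._balance = balance  # private state, only the worker touches it
        threading.Thread(target=self._run, daemon=True).start()

    def deposit(self, amount):
        # Package the call and a reply slot into one message.
        reply = Future()
        self._queue.put((amount, reply))
        return reply

    def _run(self):
        while True:
            amount, reply = self._queue.get()
            try:
                if amount <= 0:
                    raise ValueError("deposit must be positive")
                self._balance += amount
                reply.set_result(self._balance)
            except Exception as exc:
                # Errors have to be marshalled back to the caller by hand.
                reply.set_exception(exc)


account = AccountActiveObject(balance=100)
first = account.deposit(50)
second = account.deposit(25)
print(first.result(), second.result())  # 150 175
bad = account.deposit(-5)
print(type(bad.exception()).__name__)  # ValueError
```

Because messages are processed sequentially, the two deposits always complete in the order they were sent.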
Task Parallel Library
A Task is an item of work that will be run for you on another thread. It has mechanisms to run other Tasks on completion and to let the caller know the result, whether by polling, blocking until it completes or setting a callback. The Task Parallel Library in .Net (and similar libraries in other environments) manages a pool of threads on which to run Tasks and a set of queues to order and schedule them.
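Python’s `concurrent.futures` is not the .Net Task Parallel Library, but it offers a close analogue of the three ways of getting a result described above. In this sketch `slow_square` is a hypothetical stand-in for real work, and `ThreadPoolExecutor` plays the role of the managed thread pool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    time.sleep(0.1)  # stand-in for real work
    return x * x


completed = []

with ThreadPoolExecutor(max_workers=4) as pool:
    future = pool.submit(slow_square, 7)  # schedule the work on the pool

    # 1. Set a callback, run when the work completes.
    future.add_done_callback(lambda f: completed.append(f.result()))

    # 2. Poll without blocking (may well print False at this point).
    print("done yet?", future.done())

    # 3. Block until the result is ready.
    print(future.result())  # 49

# Leaving the with-block waits for the pool, so the callback has run by now.
print(completed)  # [49]
```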
So a Task is like a message to an Active Object, but the model pushes towards having no mutable state: only the results of work are passed on to the next step.
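A small pipeline shows that style: each step is a pure function that runs on a worker thread, and only its return value is handed to the next step. This is a sketch assuming Python 3.9+ for `asyncio.to_thread`; `parse`, `normalise` and `pipeline` are hypothetical names.

```python
import asyncio


def parse(text):
    # Pure function: builds a new list from its input.
    return [int(part) for part in text.split(",")]


def normalise(numbers):
    # Returns a new sorted list rather than mutating the one it was given.
    return sorted(numbers)


async def pipeline(text):
    # Each step runs on a worker thread; the next step starts only when the
    # previous result is available, and nothing else is shared between them.
    numbers = await asyncio.to_thread(parse, text)
    ordered = await asyncio.to_thread(normalise, numbers)
    return ordered


result = asyncio.run(pipeline("3,1,2"))
print(result)  # [1, 2, 3]
```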
- Simple way to create any item of work and schedule it to run.
- Simple way to run Tasks in sequence or start multiple Tasks from one Task.
- Simple way to wait for one or many Tasks to finish.
- Built-in mechanisms to pass results either back to the original thread or from one Task to the next.
- Scales well as the number of CPU cores increases.
- The common practice of using lambdas to describe the work can lead to unintentional sharing of data between the work running on a background thread and the original thread (it needs discipline to avoid capturing data in the lambda that is then shared).
- Makes Tasks visible to other programmers, which is less simple than a function-call interface (but a library function can return Task<T> so other programmers can treat it as just a "future" result without knowing the details).
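The lambda-capture pitfall is easy to reproduce. Python closures, like C# lambdas, capture variables rather than values, so in the Python sketch below every "buggy" lambda sees the last value of the loop variable by the time the pool runs it. The example also uses `concurrent.futures.wait` to wait for many tasks at once. The variable names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Pitfall: each lambda closes over the loop *variable* i, not its value at
# creation time, so by the time the work runs every lambda sees the last i.
buggy_work = [lambda: i * 10 for i in range(3)]

# Fix: bind the current value explicitly (here via a default argument) so
# the background work owns a copy rather than sharing the creator's variable.
fixed_work = [lambda i=i: i * 10 for i in range(3)]

with ThreadPoolExecutor(max_workers=3) as pool:
    buggy = [pool.submit(fn) for fn in buggy_work]
    fixed = [pool.submit(fn) for fn in fixed_work]

    # Wait for many tasks at once; wait() returns (done, not_done) sets.
    done, not_done = wait(buggy + fixed)

buggy_results = [f.result() for f in buggy]
fixed_results = [f.result() for f in fixed]
print(buggy_results)  # [20, 20, 20]
print(fixed_results)  # [0, 10, 20]
```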
Tasks have quickly become a very powerful way to break up work. A Task in .Net being both the work (wrapping any function or lambda) and the result of the work (the "future" result) can cause some confusion, but having this single class makes things easier for programmers new to concurrent programming than separate future, promise and task concepts would. The power of the library and the ability to chain, compose and wait on Tasks make this approach simple to explain and use at first, while scaling well to more complex problems.
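Python’s `asyncio` makes the same "work and result in one object" choice: an `asyncio.Task` is a subclass of `asyncio.Future`. This sketch only illustrates that design point and composition with `asyncio.gather`; unlike the .Net Task Parallel Library, these tasks run on a single event-loop thread. `double` is a hypothetical stand-in for real work.

```python
import asyncio


async def double(x):
    await asyncio.sleep(0.01)  # stand-in for real work
    return x * 2


async def main():
    # create_task both schedules the work and gives back its eventual result:
    # the Task object is itself a Future.
    task = asyncio.create_task(double(21))
    assert isinstance(task, asyncio.Future)

    # Compose: wait for several pieces of work and collect results in order.
    return await asyncio.gather(task, double(1), double(2))


results = asyncio.run(main())
print(results)  # [42, 2, 4]
```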
Actor Model
The Actor Model is another attempt at a complete approach to concurrent programming, getting away from the old model of threads, locks and semaphores. Libraries have been written for a number of languages, including Akka for Scala, which is the one I’ve seen recently.
Superficially it’s similar to Active Object: each Actor is an object with its own state that receives a sequential queue of messages. The Actors share a pool of threads and can respond by sending a message back to the caller.
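A toy version of that mechanism, as a Python sketch: each actor owns a private mailbox, all actors share one thread pool, and an actor is scheduled on the pool only while it has messages, so it never processes two messages at once. The method name `tell` echoes Akka’s, but `Actor`, `Counter` and `Probe` are hypothetical classes written for this example, not Akka’s API, and the sketch leaves out error handling entirely.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Actor:
    """Toy actor: a private mailbox drained on a shared pool, one message
    at a time."""

    def __init__(self, pool):
        self._pool = pool
        self._mailbox = deque()
        self._lock = threading.Lock()  # guards the mailbox and flag only
        self._scheduled = False        # True while a drain is queued/running

    def tell(self, message, sender=None):
        with self._lock:
            self._mailbox.append((message, sender))
            if self._scheduled:
                return  # the running drain will pick this message up
            self._scheduled = True
        self._pool.submit(self._drain)

    def _drain(self):
        while True:
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                message, sender = self._mailbox.popleft()
            self.receive(message, sender)  # sequential per actor

    def receive(self, message, sender):
        raise NotImplementedError


class Counter(Actor):
    def __init__(self, pool):
        super().__init__(pool)
        self._count = 0  # private state, only touched inside receive()

    def receive(self, message, sender):
        if message == "increment":
            self._count += 1
        elif message == "get" and sender is not None:
            sender.tell(("count", self._count), sender=self)  # reply by message


class Probe(Actor):
    """Collects replies so the example can check them."""

    def __init__(self, pool):
        super().__init__(pool)
        self.replies = []
        self.got_reply = threading.Event()

    def receive(self, message, sender):
        self.replies.append(message)
        self.got_reply.set()


with ThreadPoolExecutor(max_workers=2) as pool:
    counter = Counter(pool)
    probe = Probe(pool)
    for _ in range(5):
        counter.tell("increment")
    counter.tell("get", sender=probe)
    probe.got_reply.wait(timeout=5)

print(probe.replies)  # [('count', 5)]
```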
But the Actor Model has thought through how error-handling works, with plug-in strategies and escalation of errors. You do not need to handle errors and marshal them back yourself. This step up is similar to the step up in abstraction when we replaced error codes with exceptions. Instead of handling errors for every single call, you could let them propagate and handle them at an appropriate point. The code in between could ignore them (apart from small changes to avoid leaving state inconsistent if a lower-level function throws). Similarly, the Actor Model allows errors to be escalated up a hierarchy of Actors and handled in the best place. Other Actors in between need small changes to avoid breaking state but can otherwise concentrate on performing their own task.
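The supervision idea can be sketched without any threading at all. In this Python sketch a supervisor owns a child, and a plug-in strategy function maps each failure to a directive. The directive names mirror Akka’s Resume, Restart and Escalate (Akka also has Stop, omitted here), but `Directive`, `Accumulator`, `decider` and `Supervisor` are hypothetical and deliver messages synchronously to keep the error flow easy to follow.

```python
from enum import Enum


class Directive(Enum):
    RESUME = "resume"      # keep the child's state and carry on
    RESTART = "restart"    # replace the child with a fresh instance
    ESCALATE = "escalate"  # pass the failure up the hierarchy


class Accumulator:
    """Hypothetical child actor: contains no error handling of its own."""

    def __init__(self):
        self.total = 0

    def receive(self, message):
        if message == "corrupt":
            raise ValueError("bad input")
        if message == "crash":
            raise RuntimeError("internal fault")
        self.total += message


def decider(exc):
    """Plug-in strategy: map a failure to a directive."""
    if isinstance(exc, ValueError):
        return Directive.RESUME
    if isinstance(exc, RuntimeError):
        return Directive.RESTART
    return Directive.ESCALATE


class Supervisor:
    def __init__(self, child_factory, strategy, escalate):
        self._child_factory = child_factory
        self._strategy = strategy
        self._escalate = escalate  # where failures go when we give up
        self.child = child_factory()
        self.log = []

    def deliver(self, message):
        # Synchronous delivery for clarity; a real runtime uses a pool.
        try:
            self.child.receive(message)
        except Exception as exc:
            directive = self._strategy(exc)
            self.log.append((type(exc).__name__, directive))
            if directive is Directive.RESTART:
                self.child = self._child_factory()
            elif directive is Directive.ESCALATE:
                self._escalate(exc)
            # RESUME: nothing to do, the failed message is simply dropped.


escalated = []
supervisor = Supervisor(Accumulator, decider, escalate=escalated.append)
for msg in [5, "corrupt", 7, "crash", 3, None]:
    supervisor.deliver(msg)

print(supervisor.child.total)  # 3
print([name for name, _ in supervisor.log])  # ['ValueError', 'RuntimeError', 'TypeError']
```

The child stays focused on its own task; deciding what a failure means lives entirely in the strategy and the supervisor above it.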
- Simple encapsulation of mutable state avoiding issues with shared data.
- Simple way to avoid blocking calls.
- Messages processed sequentially avoiding issues with sharing data.
- The message model can extend beyond multiple CPU cores to multiple networked machines.
- Fault tolerance, restarting of Actors and hot-code reloading give very high up-time levels for established Actor platforms.
- Few robust Actor Model libraries in common use (there are libraries for Scala, Java, C++, .Net and Python, but these ideas are not widely used in the mainstream communities for those languages).
- Hard to bridge Actors to non-Actor code (you need to jump from Actor messages to some other call-and-callback interface, so Actor and non-Actor code tends to live in separate components rather than being freely mixed).
- Different guarantees of message order in different Actor libraries may reduce portability.
- Very different mental model needed for most developers to get the best results.
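The bridging problem is usually solved with an adapter that looks like an Actor on one side and like a future on the other, in the spirit of Akka’s "ask" pattern. The Python sketch below is a hypothetical illustration, not Akka’s API: `EchoUpperActor`, `_FutureReplyTo` and `ask` are names invented for this example, and `concurrent.futures.Future` is constructed directly to serve as the reply slot.

```python
import queue
import threading
from concurrent.futures import Future


class EchoUpperActor:
    """Hypothetical actor: it only understands tell(message, reply_to)."""

    def __init__(self):
        self._mailbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def tell(self, message, reply_to=None):
        self._mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._mailbox.get()
            if reply_to is not None:
                reply_to.tell(message.upper())  # reply is just another message


class _FutureReplyTo:
    """Looks like an actor to the actor side, but completes a Future that
    ordinary code can block on or attach callbacks to."""

    def __init__(self):
        self.future = Future()

    def tell(self, message, reply_to=None):
        self.future.set_result(message)


def ask(actor, message):
    """Bridge from call-and-return code into actor messaging."""
    adapter = _FutureReplyTo()
    actor.tell(message, reply_to=adapter)
    return adapter.future


actor = EchoUpperActor()
reply = ask(actor, "hello")
print(reply.result(timeout=5))  # HELLO
```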
So mechanically the Actor Model is close to Tasks, with items of work queued up and run on a managed pool of threads, but with stateful Actors the mental model is very different. It looks like this should scale well to multi-core, multi-machine architectures, and the fault tolerance could be a big win, but it will take a big shift in mind-set from current mainstream programming models.
Moving from procedural code to object-oriented code was a hard jump for many at the time. The current wave of ideas from functional programming used to build the Task Parallel Library (and similar libraries in Java and C++11) is slowly taking hold in the mainstream programming community and allowing us to start making safe use of the hardware on our desks. With active Actor Model communities working in several languages, it will be interesting to see how well the ideas spread and whether the Actor Model comes to be seen as a strong alternative to Tasks for concurrent programming in the near future.
- Active Object pattern: http://www.dre.vanderbilt.edu/~schmidt/PDF/Active-Objects.pdf
- Pattern-Oriented Software Architecture Volume 2 (POSA2): http://www.cs.wustl.edu/~schmidt/POSA/POSA2/
- .Net Task Parallel Library: http://msdn.microsoft.com/en-us/library/dd460717(v=vs.110).aspx
- Coursera: Principles of Reactive Programming: https://www.coursera.org/course/reactive
- Actor Model: http://en.wikipedia.org/wiki/Actor_model