Last night I finished reading Chapter 6 of Joe Armstrong's thesis on building fault-taulerant software. The chapter was primarlily about putting together alot of what we have learned into some concrete examples. The chapter showed code relating to the various behaviours and supervision models that we had learned about previously. I have to say that I am a big fan of examples. I think they help to solidify my understanding of the material.
Chapter 6 is the last chapter that we will be reading in CS 527. Having about 12 years of industry experience under my belt, I can't help but think about why I had never heard of Erlang before. Is it because I have worked primarily on projects dealing with standard line of business type apps in companies whose core businesses were not technology related? Hmm.
I also wonder about the adaptability of these concepts to other langauges. Could the concepts such as supervision trees that are first class citizens be adapted to a language like C# and maintain the same sort of effect?
Thursday, November 5, 2009
Wednesday, November 4, 2009
Pipeline, Geometric Decomposition, Data Parallelism
Three of the readings for tomorrow's class are on parallel algorithm strategy patterns. We discussed a few of these in yesterday's class as well. The three for discussion tomorrow are the pipeline, geometric decomposition and data parallelism patterns.
The pipeline pattern is probably the pattern that I had heard/read the most about previously. This pattern is as you would expect; stages of the pipeline are executed concurrently to exploit the parallelism present.
The geometric decomposition pattern breaks the input problem space/data structure into chunks and processes the chunks in parallel. Some of the concerns with this pattern are in regards to chunk granularity and data dependencies between chunks.
The data parallelism pattern writeup could us a bit more detail :). Data parallelism exploits situations where the same computation is performed on different input data. These computations can then run concurrently.
Having read numerous parallel patterns over the past few weeks, I think I have done myself a bit of a disservice by not having programmed on any large scale parallel systems. I think having some real world experience in this area would go a long way with understanding these patterns.
The pipeline pattern is probably the pattern that I had heard/read the most about previously. This pattern is as you would expect; stages of the pipeline are executed concurrently to exploit the parallelism present.
The geometric decomposition pattern breaks the input problem space/data structure into chunks and processes the chunks in parallel. Some of the concerns with this pattern are in regards to chunk granularity and data dependencies between chunks.
The data parallelism pattern writeup could us a bit more detail :). Data parallelism exploits situations where the same computation is performed on different input data. These computations can then run concurrently.
Having read numerous parallel patterns over the past few weeks, I think I have done myself a bit of a disservice by not having programmed on any large scale parallel systems. I think having some real world experience in this area would go a long way with understanding these patterns.
Tuesday, November 3, 2009
Armstrong Thesis (Chapter 5)
In Chapter 5 of Joe Armstrong's thesis, we are introduced to programming fault tolerant systems. Joe begins the chapter discussing the concept of falling back to simpler tasks if what ideally should be done cannot. The idea is that simpler tasks mean simpler code and in theory should be able to succeed even when the more complex tasks cannot. I think Erlang works well with this sort of methodology. Attempting to do this type of fallback in other languages such as C/C++/Java/C# is not always as straightforward as it could be.
Joe also discusses the concept of supervisors in more detail. We were introduced to them in earlier chapters, but in this chapter we began to dig in a little deeper and look at AND vs. OR supervisors. Child tasks monitored by AND supervisors are always restarted together. The premise being that the tasks work in conjunction so if one fails they should all fail. The tasks for OR supervisors are handled independently.
The chapter concludes with discussion on what exactly constitues an error. In my view, an error is essentially any type of problem that was unaccounted for in the code. I think Joe's definition is much more poetic, but in the end I think our philosophy is essentially the same. Again, another good read. On to chapter 6...
Joe also discusses the concept of supervisors in more detail. We were introduced to them in earlier chapters, but in this chapter we began to dig in a little deeper and look at AND vs. OR supervisors. Child tasks monitored by AND supervisors are always restarted together. The premise being that the tasks work in conjunction so if one fails they should all fail. The tasks for OR supervisors are handled independently.
The chapter concludes with discussion on what exactly constitues an error. In my view, an error is essentially any type of problem that was unaccounted for in the code. I think Joe's definition is much more poetic, but in the end I think our philosophy is essentially the same. Again, another good read. On to chapter 6...
Task Parallelism, Recursive Splitting, Discrete Event
Part of today's class discussions will be about the three patterns identified in the title of this post. These patterns fall into the parallel algorithm strategy class of patterns. Essentially, the patterns address issues on how we exploit parallelism to better solve the problem at hand.
Being somewhat new to the world of parallel patterns, I was able to get a basic understanding of the discussion points of each pattern and the rationale behind them. I will say though that as the number of patterns that we read increases, it is sometimes more unclear as to how certain patterns are differentiated from others. I am hoping the discussion in class will be lively and I will pick up a deeper understanding of each of these three patterns.
Being somewhat new to the world of parallel patterns, I was able to get a basic understanding of the discussion points of each pattern and the rationale behind them. I will say though that as the number of patterns that we read increases, it is sometimes more unclear as to how certain patterns are differentiated from others. I am hoping the discussion in class will be lively and I will pick up a deeper understanding of each of these three patterns.
Thursday, October 22, 2009
Armstrong Thesis (Chapter 4)
For the last class, we read chapter 2 of Joe Armstrong's thesis. Today we will be discussing chapter 4. (Note: Chapter 3 which was skipped was all about the Erlang language syntax ... I skimmed it over, but after reading Chapter 4 it is obvious that I really need to go back and read Chapter 3 in more detail)
In Chapter 4, Joe introduces us to a couple of concepts. Specifically, the chapter discusses some high level concepts for programming with Erlang such as keeping concurrent and sequential code as separate processes, the overall ideology behind error handling and the concept of letting a process die on error while another process monitors for errors (workers/supervisors).
The chapter wraps up with a brief discussion on intentional programming. I thought this discussion, albeit brief, was interesting. The general concept is that your code should be easily understood and one of the examples given was about a lookup method which could be used in three different contexts, depending on the developer's intent. Armstrong points out that a better approach would be to make this three separate methods. That way it is clear what was meant by the calling function. I think there is definitely some merit to this approach and I really don't think it is specific to Erlang.
As time permits, I would like to write a program using Erlang. I like the ideas we've seen so far and find it interesting. Any ideas for a good project to get started?
In Chapter 4, Joe introduces us to a couple of concepts. Specifically, the chapter discusses some high level concepts for programming with Erlang such as keeping concurrent and sequential code as separate processes, the overall ideology behind error handling and the concept of letting a process die on error while another process monitors for errors (workers/supervisors).
The chapter wraps up with a brief discussion on intentional programming. I thought this discussion, albeit brief, was interesting. The general concept is that your code should be easily understood and one of the examples given was about a lookup method which could be used in three different contexts, depending on the developer's intent. Armstrong points out that a better approach would be to make this three separate methods. That way it is clear what was meant by the calling function. I think there is definitely some merit to this approach and I really don't think it is specific to Erlang.
As time permits, I would like to write a program using Erlang. I like the ideas we've seen so far and find it interesting. Any ideas for a good project to get started?
Dense Linear Algebra, Graph Algorithms, Monte Carlo
No, contrary to how the title sounds, I am not moving to Vegas with some surefire "get-rich-quick" gambling scheme. For one of this week's CS527 lectures we have moved our discussions from structural patterns to computational patterns. The items in the title are all computational patterns for parallel programming.
Having done very little programming (with emphasis on the very little part) for large parallel systems these patterns were new to me. The patterns made sense and after reading them I think they were clearly documented. I particularly liked reading about the trade-offs and forces in play such as data movement costs vs. computation time for the dense linear algebra pattern, data structure choices for graph algorithms and threading/SIMD allocation in monte carlo simulations.
Hopefully in tonight's discussion there will be students who have work with these patterns. I look forward to hearing about their experiences.
Having done very little programming (with emphasis on the very little part) for large parallel systems these patterns were new to me. The patterns made sense and after reading them I think they were clearly documented. I particularly liked reading about the trade-offs and forces in play such as data movement costs vs. computation time for the dense linear algebra pattern, data structure choices for graph algorithms and threading/SIMD allocation in monte carlo simulations.
Hopefully in tonight's discussion there will be students who have work with these patterns. I look forward to hearing about their experiences.
Monday, October 19, 2009
Armstrong Thesis (Chapter 2)
As part of the reading list for CS527 this semester, our class has recently began reading the thesis of Joe Armstrong. The thesis is about building reliable software assuming that there will be hardware and/or software errors present.
For this week's reading we focused on Chapter 2 which introduces the architectural model being proposed. The chapter begins by introducing the original problem domain (telecom switching systems requiring essentially 24/7 operation with the ability to be updated while running). Oh wait ... did I mention that we are also going to assume that there will be hardware & software errors and that the system has to be able to work around such issues? Not an easy task!
Armstrong approaches the problem through the use of what he calls "concurrency oriented programming languages" (COPLs). COPLs typcially consist of and must support the concept of processes to serve as isolation boundaries. Communication between processes is achieved through message passing. Message delivery is guaranteed in-order between two endpoints, but is not guaranteed reliable. The other key feature is that processes must be able to detect failures of other processes and react accordingly. Armstrong introduces the Erlang language as one such implementation.
The remainder of the chapter discusses library support of Erlang for things like stable storage (storage that survives a crash) and various behaviours that assist with building these systems.
In subsequent weeks we will be reading additional chapters from Armstrong's thesis. If chapter 2 is any indication, this seems like it will be a good read.
For this week's reading we focused on Chapter 2 which introduces the architectural model being proposed. The chapter begins by introducing the original problem domain (telecom switching systems requiring essentially 24/7 operation with the ability to be updated while running). Oh wait ... did I mention that we are also going to assume that there will be hardware & software errors and that the system has to be able to work around such issues? Not an easy task!
Armstrong approaches the problem through the use of what he calls "concurrency oriented programming languages" (COPLs). COPLs typcially consist of and must support the concept of processes to serve as isolation boundaries. Communication between processes is achieved through message passing. Message delivery is guaranteed in-order between two endpoints, but is not guaranteed reliable. The other key feature is that processes must be able to detect failures of other processes and react accordingly. Armstrong introduces the Erlang language as one such implementation.
The remainder of the chapter discusses library support of Erlang for things like stable storage (storage that survives a crash) and various behaviours that assist with building these systems.
In subsequent weeks we will be reading additional chapters from Armstrong's thesis. If chapter 2 is any indication, this seems like it will be a good read.
Subscribe to:
Posts (Atom)