rubymine

Posted by anton
on Monday, July 27, 2009

scarred, but not defeated by vim, i decided to try a recently released beta of rubymine 1.5, and it’s been great so far.

massive disclaimer: i have not tried anything else for ruby/rails coding except for vim.

unlike a similar offering from netbeans, rubymine is a standalone install that seems to reuse a lot of the existing intellij idea codebase.

it was a great experience out of the box – i pointed it at the local svn working copy and it verified all the installed gems (even though i am fortunate enough to run on cygwin, it recognized them all).

it has great rake support, and all rake tasks run without any modifications. however, for script/server in a cygwin environment i had to replace -e"STDOUT.sync=true;STDERR.sync=true;load($0=ARGV.shift);" with -e"STDOUT.sync=true;STDERR.sync=true;RAILS_ROOT='/cygdrive/c/project/root/dir';load($0=ARGV.shift);" in the ruby arguments field. once the server runs, it also displays its log with proper color-coding.

i still run script/console from cygwin command line, since rubymine does not do readline support (i use ctrl+L, ctrl+R, tab completion, ctrl+e/ctrl+a and other goodness quite a lot in my irb).

things to love
  • color-coding, auto-indenting (including color-coding of matching brace/bracket/do-end block)
  • ctrl+shift+n/ctrl+n for finding files and getting around; with alt+f1 to show current file in different contexts/views
  • ctrl+f12 for current file structure
  • visible spaces (otherwise the ruby coding standards make the code look too squeezed to me)
  • ctrl+click when mousing over (jump to all kinds of things, including template names, css style names – all of it very nicely integrated)
  • ctrl+/ for toggling comments
  • rails project structure
  • parsing my stacktraces and linking them to the source code
  • autocompletion, although i do not find myself using it too much
  • all the usual things that idea has – svn and git support out of the box (and shelving works, just like in idea, in case you have to work with svn)
  • pretty sweet diff (ctrl+d) that rivals tortoisesvn visual diff
  • ctrl+shift+up/down to move the current line
  • ctrl+d to clone selection
  • shift+delete to delete the whole line
  • ctrl+shift+f12 to go full-screen
  • alt+number to toggle between tool windows
  • file structure tool window (alt+7)
  • alt+f7 to find usages
  • alt+shift+f10 to bring up run menu
  • shift+f10 to run current run
  • simple refactorings (introduce method, rename variable/method, etc).
  • it is pretty damn stable, and occasional errors do not kill the IDE.

things to improve
  • unlike netbeans, it is not a full-blown ide with ruby support, so some things that exist in idea are missing (notably, database support, some of the team communication and code sharing features, and other bells and whistles)
  • ctrl-q for docs is a bit wonky (frankly, i’d rather jump to matching place in the online api docs – it gives me context)
  • code folding fscks stuff up sometimes
  • still do not know how to jump to matching brace/do-end block
  • svn switch could not be found
  • ctrl+shift+f10 to run the current test (and any other ad-hoc run tasks) does not work on cygwin, unless you do the RAILS_ROOT trick above

i have not tried all the other stuff, like haml support, cucumber support, rspec, and rspec w/ drb.

overall feel is nice and polished – most things just work out of the box (unlike the frankenstein monster that eclipse can be sometimes – truly a Windows of IDEs).

i do believe in using “idiomatic” shortcuts with an IDE, thus i did not try any of the “compatibility” keyboard modes.

for now, i do not see myself coming back to vi for rails development – for a hundred bucks, rubymine is a great development tool.

ode to vi

Posted by anton
on Friday, July 24, 2009

i recently had to do a small rails project. so i did what i usually do in these cases – fired up the easiest IDE that runs anywhere – vim.

now let me reminisce a bit – i’ve been using vi on and off ever since i got my hands on linux in 1997, and i have not learned much beyond the basics over the years. it works the same way on half a dozen unixes i’ve used it on, as well as on cygwin and macs; even dreaded beasts like mks toolkit provide it. it is an indispensable cross-platform tool.

it does not require much horsepower, and it fits well with the back-to-command-line ideology of rails.

if all you are using it for is editing an occasional file or two, it does everything you need out of the box. throw in basic syntax highlighting, auto-indenting, split windows, buffers – and you have enough to survive.

oh, did i mention the macho factor? it takes some effort to tame the menu-less monster of an interface with a barrage of keystrokes that appear as magic incantation to others.

vi has a peculiar physical effect – i often surprise myself when i remember certain editing commands, but at the same time i am utterly unable to remember them when standing behind someone, advising them what to type – my fingers twitch, but my higher brain functions are not firing.

curiously, this reminds me too much of some of the mainframe folks i’ve seen, or even an occasional SAP jockey. consider it a compliment – there is a lot of power in short mnemonic commands compared to drill-down menus. yes, every powerful system must have a command line, but it must degrade gracefully. with vi the discoverability of interface is pretty much non-existent, and the learning curve is steep.

my current theory is that the muscle memory vi creates leads to a particular form of addiction which explains its appeal (and perhaps the religious zeal).

when i work in a context of a project, where i constantly need to bounce around different files, vim UI starts to break down: built-in buffers support is inadequate. the editor needs to have a concept of a project i am currently working on, and, ideally, the framework i use.

i know that the usual answer is customize, customize, customize – and in this respect it follows very much a linux tradition – if you are 15yrs old, and have tons of time, and only one machine, you can spend days crafting that perfect setup that is just right. having done that a number of times, i have learned that it is just not worth my time – I switch computers often, work on client sites, bounce between different teams, so i want stuff that is reasonably workable out of the box.

i can take it easy and install some basic plugins – fuzzy finder to give me files i want fast, nerd tree for filesystem navigation, rails for rails integration, tComment for toggling comments on blocks of text.

but now i need to manage them across several machines, and perhaps i do not want to spoil my vi muscle memory that can cripple my vi-fu on that hp-ux 11.11 when i come across it (yeah right).

so perhaps i will draw the line and use some other ide for project work, leaving vi for simpler stuff.

to add some substance to this post, a few significant lines from my .vimrc:


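" syntax highlighting, filetype detection and per-filetype plugins
syntax enable
filetype on
filetype plugin on

" default to 4-space indents, with spaces instead of tabs
set tabstop=4
set shiftwidth=4
set expandtab

" ruby uses 2 spaces; setlocal keeps the override inside ruby buffers
autocmd FileType ruby setlocal shiftwidth=2 tabstop=2 expandtab

set number
set ai  " autoindent
set si  " smartindent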

and some of the commands i use often (besides the usual navigation/editing/searching ones that are in my muscles, but refuse to be articulated):

  • :e! to reload the file i am currently editing
  • :retab i really hate those tabs
  • :ls to look at open buffers (the listing is a pain to read, trying to remember what those little symbols mean and matching numbers to file names)
  • :e filename to open a file in a new buffer
  • :bd to close the current buffer
  • :e# to bounce between two last buffers (how can i cycle between all the buffers ala alt+tab?)
  • ctrl+w followed by s or v to start splitting windows (horizontally or vertically), then bounce around them with double ctrl+w or ctrl+w and arrows
  • ctrl+w followed by <number>+ or <number>- to grow or shrink the split windows
  • ctrl+w followed by q to close current window
  • zz to center screen on current line, accompanied by zt (top) and zb (bottom)
  • % to jump to matching brace
  • o and O to open a new line (below or above the current one) and switch to insert mode
  • I to insert at the beginning of the line (that i always forget, unlike its companion A)
  • m + letter to place a named mark, ' + letter to jump to the beginning of the line, ` + letter to exact position of the mark
  • >> and << are much easier for indenting, as opposed to my muscle-memorized number + > + enter
  • shift+v or v to do visual selection
  • once you visually selected stuff, you fold it with zf

finally, i find the whole :tabnew business utterly useless and insulting.

what i do miss in addition to project structure navigation, is easily looking up/jumping between methods, code block folding that follows language semantics, and decent tabs.

jamis buck blog entries on the subject were really informative and inspiring (especially the comments).

GoRuCo 2009

Posted by anton
on Saturday, June 06, 2009

gotham ruby conference is nyc’s own ruby conference, organized by the nyc ruby users group. it was my second time attending it. this post is an attempt to organize my own notes, as well as an attempt at feedback that i think i owe to the speakers and the organizers.

i really like the smaller focused gatherings like this with around a hundred-plus attendees – there is definitely a community spirit there, since a lot of the folks are local and already know each other.

there is also more focused discussion, since the background of the people in the room is similar. a lot of the culture is shared, a lot of the values are implied, and the conversation zips along nicely. i have stopped attending local java groups precisely for the lack of this common background – the topics are too broad, the backgrounds are too different, and it takes a lot of effort to communicate ideas.

i am a bit uneasy about my own relationship with the ruby community – i am not really contributing, nor am i doing paid ruby work. i also am a bit wary of the monoculture (look at them macboys and macgirls!) that tends to re-invent the wheel way too often. but i am there for the excitement, for the bright-eyed kids that tinker and create things – this energy is infectious and i feed off it. it is inspirational and energizing.

so why am i going to the conferences like these? in addition to the energy boost i mentioned above, there is also the trivia of learning about tools, projects, approaches; getting the feel for the zeitgeist, where things are heading, what folks are thinking. deep down inside i am always looking for the “blow your mind” experience, something that can turn a familiar topic on its head, something that can make me discover things i have never suspected existed.

GoRuCo had a nice balance – things technical and detailed, and also approaches/techniques/principles. the reject conf at the end – a series of quick lightning talks – was an icing on the cake, stuffing you full of references, pointers, tips that you could take home and work through at your own pace.

Gregory Brown: Where is Ruby really heading?

more of a book report, talking about different versions of ruby out there. for anyone following the community, none of it was a surprise. no hard data either, just his personal anecdotal experience. this part of the talk was more suited to a short user group presentation.

there were a few nice tips and personal war stories related to moving between 1.8.6, 1.8.7 and 1.9.1, unicode, side-by-side installs and very basic crude techniques to code for different versions.

these are typical growing pains – everything from the compatibility issues between versions to the curse of the system-wide install that makes one go an extra mile in order to run different apps under different ruby versions on the same box (i always preferred the semi-structured self-contained java jdk installs and jars controlled by the classpath).

there were some props to jruby (it is a real distro, not a hack to reach out for when all else fails!), mentions of ffi in jruby that allowed gregory’s project to run on windows.

Eleanor McHugh: The Ruby Guide to *nix Plumbing

this could have been a great talk, but it seems like eleanor was really hungover, so instead it was a very disconnected series of ramblings on the general subject of unix and coding. very poor delivery, and at times plain embarrassing.

only towards the end she managed to find the message for the talk, which was “you can code against kernel internals using ruby, since it makes it much easier. do not automatically assume that you need to write in C for performance – try ruby first.”

she also highlighted ruby community’s respect for bare-metal – the ability to tinker, the taste for small simple tools that do the job well, and the affinity for unix. this is exactly what attracted me to the language in the first place.

there were a lot of references that at times seemed like name-dropping: nginx, beej’s guide to network programming, beej’s guide to unix interprocess communication, ruby/dl, duby, event machine, c10k problem, advanced unix programming book.

Dan Yoder: Resource-Oriented Architecture With Waves

dan briefly talked about his waves framework that tries to present a simple DSL around HTTP and resource representation in a REST fashion.

the meat of the talk that was interesting to me dealt with REST in general, its differences from MVC, resource-oriented architecture, self-describing data, returning links to other data inside of data to aid discoverability. he mentioned how a resource identifier (e.g. URL) should not specify representation (i.e. do not add .xml to specify that you need an XML document back), but rely on client’s Accept* headers to negotiate representation (caveat being that CDNs like Akamai do not currently care about these headers, so you will always get the same content).
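to make the Accept* point concrete, here is a minimal sketch in ruby. the negotiate helper is hypothetical (it is not part of waves or rack), and it assumes exact media types only, ignoring partial wildcards like application/*:

```ruby
# hypothetical helper (not from waves or rack): choose a representation from
# the client's Accept header, so the url itself never needs a .xml suffix
def negotiate(accept_header, available)
  ranges = accept_header.split(",").map do |part|
    type, *params = part.strip.split(";").map(&:strip)
    q = params.map { |param| param[/\Aq=(.+)\z/, 1] }.compact.first
    [type, q ? q.to_f : 1.0]   # a missing q means full preference
  end

  # highest q wins; the original position breaks ties
  ranked = ranges.each_with_index.sort_by { |(_type, q), index| [-q, index] }

  ranked.each do |(type, _q), _index|
    return available.first if type == "*/*"
    return type if available.include?(type)
  end
  nil   # nothing acceptable; a real server would answer 406 Not Acceptable
end

available = ["application/json", "application/xml"]
puts negotiate("application/xml;q=0.5, application/json", available)
```

the same url serves both clients; only the header differs. a cache or CDN that ignores the header would hand everybody whichever representation it saw first, which is the caveat mentioned above.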

there was also a mention of the fact that sometimes one can view HTTP protocol as something dealing with a distributed hash table (DHT) using get/put/delete operations (with post reserved for everything else).

RDF and freebase were also mentioned.

i have a strange relationship with REST: i get it on the technical level, i sort of get it on the architecture level, but it did not fully “click” yet, perhaps due to the fact that i do not have enough practical implementations under my belt. i should go through the restful webservice book again and play around.

good talk overall, but lacking concrete examples that could have brought things more into focus.

Jake Howerton: Into the Heart of Darkness: Rails Anti-Patterns

a bit of a disappointment, since the title promised so much. i usually really like the anti-pattern talks, since you learn nothing when things work as expected – the real learning comes when things break, and you are forced to dig in and figure out why. plus these talks also give you an idea of applicability of certain techniques, which is really a required counterpart for all patterns to begin with (beware of dartboard-driven design).

instead we were treated to a very few amusing short code snippets, but no larger patterns in the sense of fowler’s refactoring book or even rails-specific patterns.

he did mention cucumber, reek and metric_fu and some general well-known testing techniques. i liked his term “flight check” for the smoke tests that run before deployment to prod, and a notion of sandbox test environment where mocks are replaced with real classes that do destructive things without impacting the real world (like sending emails).

i also like his term “irb-driven design” for something that was copy-pasted from the exploratory irb session into the production code.

there was also some treatment of legacy code, but i think dhh’s talk on the subject was much better.

overall it was a fun, light talk that was well-received. i blame the deceiving title and the lack of focus for initial feeling of disappointment; it felt like jake was simply talking about things he does and prefers to do during development, without specific overarching theme in mind.

Sandi Metz: SOLID Object Oriented Design

sandi stole the show; she set the level that none of the speakers matched and showed what it really means to have a solid, gripping presentation that is lucid, focused, well-prepared, and superbly delivered.

she talked about SOLID design principles: Single Responsibility, Open Closed, Liskov Substitution, Interface Segregation, Dependency Inversion which all boil down to managing dependencies.

she went through the fowler’s value of design argument, and then iteratively went through a refactoring example, invoking the SOLID principles along the way.

i think the most impressive thing for me was how well the whole talk came together, how the arguments were presented to support the refactorings, and how well it got into my head (almost uncanny, similar to the effect the best books in head first series have).

some of the principles she kept bringing up – using the rate of change as the indicator for splitting the functionality; refactor in small steps to let the design emerge – not because you know, but because you want to find the design; red-green-refactor; mock at the seam; only mock classes i own; those that change often should depend on those that change less often.
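a small sketch of "those that change often should depend on those that change less often", in ruby. the classes are made up for illustration and are not taken from sandi's slides; the only assumption is that a formatter is anything that responds to call:

```ruby
# hypothetical classes for illustration; none of this is from the talk

# the stable side: a tiny role, "anything that responds to #call"
class PlainText
  def call(lines)
    lines.join("\n")
  end
end

class Csv
  def call(lines)
    lines.join(",")
  end
end

# the volatile side: Report changes often, so it depends on the stable
# formatter role and receives it from outside instead of naming a class
class Report
  def initialize(lines, formatter)
    @lines = lines
    @formatter = formatter
  end

  def render
    @formatter.call(@lines)
  end
end

puts Report.new(%w[alpha beta], Csv.new).render
```

the injected formatter is also the seam for testing: a lambda such as ->(lines) { lines.size.to_s } can stand in for it without Report noticing.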

this once again brings up the importance of developing a language to discuss design (e.g. patterns) – i think this is a crucial step for every practitioner, when something intuitive and personal (e.g. a vague code smell) becomes something that you can articulate and communicate to others.

she also mentioned uncle bob, micronaut, steve freeman and nat pryce mock objects site

Benjamin Stein: Building Cross Platform Mobile Apps with Ruby & PhoneGap

the main promise of PhoneGap is quite compelling – build cross-platform apps on the phones using javascript while taking advantage of the native features (vibrate, storage, accelerometer, sound, gps, etc).

it was a fine presentation, with a story to tell and with a perspective that put all the low-level details in context. makes me itchy to get my hands on some phone development; i was also quite impressed with their adoption of latest standards (e.g. HTML5).

Yehuda Katz: From Rails to Rack: Making Rails 3 a Better Ruby Citizen

rails3 perspective from the horse’s mouth – quite detailed look at how rails is trying to be less opinionated in its choice of frameworks to work with, and how it exposes its internals for others to integrate with (orm, rack, js frameworks). some very interesting examples of design decisions and the overall future path of the framework.

this was a bit tedious, but quite an informative talk.

Lightning Talks

i really like these, since most people do not have enough material for a long talk; in fact some of the main talks earlier should probably have been half their size.

the highlights include sunlight foundation and data.gov plug, a great pair of fast talks by aman gupta on google-perftools that he tweaked to work with ruby and used on some real-world code and joe damato on tweaking the thread performance in ruby 1.8 (in retrospect, these two talks were something i wish eleanor would have done).

concurrency: part 2 - actors

Posted by anton
on Friday, September 19, 2008

message-passing

if shared memory makes concurrent programming difficult, what else is there that an app developer can use?

one way of representing coarse-grained parallelism is through message-passing concurrency.

the idea is pretty simple – the only way to share state between isolated components is through message passing. what happens inside the components, is their own business.

there is no global state of the whole system, unlike in shared memory program that behaves like a giant state machine. sending and receiving of messages happens concurrently together with the computations by the components themselves. this approach is much easier for the developer to reason about and it maps easily to multiple CPUs and multiple machines.

a lot of the enterprise workflow-like processing falls into this model. it is basically a pipeline of worker pools, configured independently and possibly running on separate machines. a unit of work travels through the pipeline, as it is being handed off from one set of workers to another.

actors

one of the common implementations of message-passing concurrency is the actor model. i’ll take the liberty of interpreting this model to fit my own developer needs, even though i am butchering decades of academic research in the process.

the actor model is a good fit for multiple computers talking to each other using network packets, components exchanging messages using JMS, independent processes or threads talking to each other using messages – basically anything where isolation is possible and interactions are loosely coupled.

usually each actor has a mailbox associated with it (often represented with a queue), where messages are stored until an actor processes them. messages map well to physical artifacts in the real world – they are immutable, and only one actor can handle a given message at a time.

actors are connected with channels; individual actors are isolated from each other – a failure of an actor does not affect another actor. no other actor can change the inner state of a given actor – the only way to communicate is through message-passing.
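here is a minimal sketch of that description in ruby, assuming nothing beyond core Thread and Queue. the Actor class and its tell/stop methods are my own naming, not any library's api, and the :stop marker is just a convention of this sketch:

```ruby
require "thread"   # Queue; a no-op on rubies where it is already loaded

# a "good enough" actor: private state, a mailbox, one thread draining it
class Actor
  def initialize(&behavior)
    @mailbox = Queue.new
    @thread = Thread.new do
      # messages are handled one at a time, in arrival order
      while (message = @mailbox.pop) != :stop
        behavior.call(message)
      end
    end
  end

  # asynchronous send: enqueue and return immediately
  def tell(message)
    @mailbox << message.freeze   # shallow freeze as a nod to immutable messages
    self
  end

  def stop
    @mailbox << :stop
    @thread.join
  end
end

results = Queue.new
doubler = Actor.new { |n| results << n * 2 }
[1, 2, 3].each { |n| doubler.tell(n) }
doubler.stop   # returns once the mailbox has been drained
```

nobody outside can reach the block's state; the mailbox is the only way in, and the results queue is the only way out.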

messaging is usually asynchronous, but synchronous messaging could also be useful.

depending on implementation, beware of deadlocks if you are using synchronous messaging. another issue to keep in mind is order of messages – depending on implementation it might not be preserved.
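one way to get synchronous messaging on top of asynchronous, sketched in ruby with core Thread and Queue only. the ask helper and the [payload, reply_queue] message shape are assumptions of this sketch, not a standard protocol:

```ruby
require "thread"   # Queue; a no-op on rubies where it is already loaded

mailbox = Queue.new

# the actor: replies on whatever queue the sender enclosed in the message
squarer = Thread.new do
  while (message = mailbox.pop) != :stop
    number, reply_to = message
    reply_to << number * number
  end
end

# synchronous send: enclose a private reply queue and block on it.
# if two actors did this to each other from inside their own message
# handlers, both would wait forever, which is the deadlock to beware of
def ask(mailbox, payload)
  reply = Queue.new
  mailbox << [payload, reply]
  reply.pop
end

answer = ask(mailbox, 7)
mailbox << :stop
squarer.join
```

a single queue with a single consumer keeps messages in order here; once several consumers or several hops are involved, that guarantee is gone unless the implementation provides it.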

while some advocate “everything is an actor” approach, and I get dizzy imagining the possibilities, the pragmatic app developer in me is living in the real world among existing apps. in this case actors work best as a library for the existing language.

erlang

although i shied away from “actors everywhere” approach above, erlang is the most successful implementation that actually does just that. it is not just the language, but a whole platform that transparently runs actors within a single process as well as across multiple machines.

as this topic is heating up, one should at least read the book and play with the language. after all, a language that doesn’t affect the way you think about programming is not worth knowing, and erlang is enough of a paradigm shift to kickstart your concurrency thinking.

Tibco BusinessWorks

as i’ve described before, BusinessWorks (BW) is an example of an integration DSL that happens to use actors.

given an integration process (e.g. receive a message on JMS queue A, enrich it from a database, transform it, and send it to a JMS topic B), you describe it using BW language constructs. then it becomes an actor definition that you can deploy on an engine (really a managed JVM instance). there could be multiple engines running on multiple machines, and each engine can have many process instances (aka actors in our terminology) running inside of it. a process instance gets created from a process definition whenever a new message arrives on a queue (mailbox in actors’ terminology).

a scheduler inside the individual engine takes care of creating process instances (there could be thousands) and scheduling them on the worker threads.

all of this mapping happens at deploy time, as a developer you do not worry about it.

actors talk to each other using message-passing, thus your actor implementation does not even have to worry about threads or concurrency – you just express your integration logic. you could use shared memory, but it would not scale well, since you are limited to one JVM; nor would it be natural, since you have to use explicit language constructs for it. this language support for immutability is very convenient, as i have mentioned earlier.

if you use a JMS server to pass messages around, it becomes a sort of a mailbox, holding messages for you in the queue. each incoming message would eventually spawn an instance of the actor, feeding it the message as an argument. multiple instances of the same actor can read from the same queue, thus achieving load-balancing.

once you recall that jms supports selectors (a consumer only receives the messages whose headers and properties match its filter expression), you have an actors implementation that curiously matches something like erlang.

note that this is not fine-grained parallelism; your units of work are more coarse-grained and very loosely coupled, but fundamentally, the model is the same, and it scales like crazy achieving massive throughput.

even if you do not end up using BW, you can implement this model by hand relatively easily.
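a by-hand version of the pipeline stage described above, as a ruby sketch. the in-memory queues stand in for the jms queue and topic, the upcase call stands in for the enrich/transform step, and the :stop marker is a convention of this sketch; none of it is BW or jms api:

```ruby
require "thread"   # Queue; a no-op on rubies where it is already loaded

inbox = Queue.new    # stands in for the jms queue
outbox = Queue.new   # stands in for the destination downstream

# three instances of the same "process definition" competing for messages;
# whichever one is free picks up the next message, hence load-balancing
workers = 3.times.map do |id|
  Thread.new do
    while (message = inbox.pop) != :stop
      outbox << { worker: id, body: message.upcase }   # the "transform" step
    end
  end
end

%w[order-1 order-2 order-3 order-4 order-5 order-6].each { |m| inbox << m }
workers.size.times { inbox << :stop }   # one stop marker per worker
workers.each(&:join)

processed = []
processed << outbox.pop until outbox.empty?
```

which worker handles which message (and in what order results come out) varies from run to run, so anything downstream should not rely on it.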

so what if i wanted more fine-grained and more efficient support for actors in my language of choice (provided i am not using erlang)?

ruby

revactor networking library includes actors implementation (also see this great intro to actors by Tony Arcieri), but i have not seen a more generic approach yet.

note that ruby is really hampered by lack of proper threading support; this is why jruby guys are in a much better shape if they were to roll their own actors implementation.

scala

this is probably the most mature implementation i’ve seen (see this paper). they take advantage of scala language features to simplify the syntax and unify synchronous and asynchronous message-passing. individual actors are represented as threads or more light-weight primitives that get scheduled to run on threads in the thread pool. it is type-safe, but it relies on convention to make sure you do not mutate your messages.

although i could see where representing actors as threads could be too heavyweight for some tasks, in the case of java and scala, your mileage may vary (see this presentation from Paul Tyma).

groovy

given language features like closures and general simpler syntax, together with the fact that it sits on top of JDK that includes java.util.concurrent, one would imagine that groovy would be a perfect candidate for actors implementation. however, the only thing i found so far was groovy actors, and it seems to have been dormant for a while.

python

i do not know enough about python’s memory model and its implementation, but i suspect it suffers from the same “feature” as ruby – i.e. global interpreter lock, which means that it won’t be able to scale to multiple CPUs (and, similar to ruby, jython that builds on JVM comes to the rescue).

the only thing i’ve looked at so far is stackless python, which is a modified version of python that makes concurrency easier (see this tutorial by Grant Olson that also includes actors). it introduces tasklets aka fibers, channels, and a scheduler among other things.

java

this is where i am a bit surprised – i do not see a good drop-in-a-jar-and-go actors library blessed and used by all. there seems to be some research projects out there, but i want something that works for me now and supports in-memory zero-copy message passing, sync/async messaging, and type safety. i am OK with abiding by conventions instead of compiler checking things for me.

i suspect that the reason for this is the fact that some rudimentary form of actors can be implemented relatively easy using existing concurrency libraries, and this approach is intuitive without putting labels on it.

nevertheless, this is what i found:

  • jetlang is a port of a .NET library and looks at Scala actors for inspiration. it is still quite beta, but it looks promising
  • kilim (from one of the principal engineers of weblogic server) still seems to be a bit too much of a research project for my taste, but the theory behind it is sound

and there is a number of research projects out there:

bottom line

actors is a great abstraction, and “good enough” version of it is easy to implement – think about it, consider it, use it!

it helps if your language/platform supports concurrency primitives to build upon. this includes true threading support that scales to many CPUs, although we could also benefit from a standard fibers implementation, since they are more lightweight than typical threads and would allow creation of a large number of actors that later could be mapped onto threads for execution.
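to show what "lightweight" means here, a small ruby sketch using the core Fiber class (available since ruby 1.9). the round-robin loop is a toy scheduler of my own, and the :done return value is a convention of this sketch:

```ruby
# three fiber-backed "actors"; each one runs until it yields, so nothing is
# preempted and no os thread is created per actor
fibers = 3.times.map do |id|
  Fiber.new do
    2.times { |step| Fiber.yield "actor #{id} step #{step}" }
    :done   # the block's return value comes back from the final resume
  end
end

log = []

# toy round-robin scheduler: resume each live fiber once per pass and drop
# it as soon as it reports :done, so a finished fiber is never resumed again
until fibers.empty?
  fibers.reject! do |fiber|
    output = fiber.resume
    if output == :done
      true
    else
      log << output
      false
    end
  end
end

puts log
```

all of this runs on a single thread; mapping such fibers onto a pool of real threads is the part a proper actors library would have to add.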

each language could benefit from a well thought-out actors library, since it would push developers in the right direction.

it is not right for everything though – it might not be fine-grained enough, it might not map well to problems that rely on ordering of messages or presence of any other state across multiple actors or multiple messages.

to be continued

what is on the horizon that is worth noting? what are some of the interesting research topics? what have we forgotten over the years? what other heuristics/patterns and libraries could be immediately useful?

concurrency: part 1

Posted by anton
on Friday, September 12, 2008

true to the purpose of this blog, below is an attempt to organize my (admittedly very superficial) experience with concurrency.

my 10GHz CPU

you probably noticed that moore’s law does not really apply anymore when it comes to CPU speed. if it were holding up, we would have had 10GHz CPUs by now, but for half a decade we haven’t really moved past 3GHz.

that is to be expected for the current generation of hardware – the gains have to happen elsewhere. for a little while we’ll get performance boost due to increase in size and speed of the caches that would improve locality, but in the long run it seems that multiple CPUs is where the improvements are to be mined from (this also includes specialized CPUs like Cell and GPUs in general).

this means that more and more people will have to think about their applications in terms of parallel processing. this also means that optimizations will become more and more important for those workloads that cannot be parallelized and therefore will be stuck on a single CPU (for a good introduction see The Free Lunch Is Over at Dr. Dobb’s Journal).

the bottom line is that as an app developer you cannot ignore the problem any longer; to make matters worse, there is no automagical solution in the near future that would make your application take advantage of multiple processors.

my concurrency story

in past decade most of the stuff i’ve worked with had some sort of coarse-grained parallelism; the rest was taken care of by the underlying framework.

i started with a unix philosophy of small programs connected via pipes, each performing a simple task. a little later came in fork and signals. things were simple, and OS took care of everything.

then came the web – it was mostly stateless with the database doing all the heavy lifting when it came to shared state. we just added boxes if we needed to grow. in ETL multi-box, multi-cpu setup was also natural, and the tools were designed to conceal concurrency; same goes for integration, where concurrency was at the level of data flows, which made things rather simple.

it is only in the past year or so that i had to really dive deeper into relatively low-level concurrent development with java.

my dog-eared copy of Java Concurrency in Practice has proved to be quite an indispensable reference. the book is a bit uneven, and the editor should have spent more time on it, but you get used to it. it is a great practical resource, especially in the presence of so much confusing and incomplete information online.

jsr-166 introduced in java5 (and the primary subject of the book) is such a productivity boost; being a part of JDK, it is a big step forward towards letting mere mortals like me really embrace concurrent programming.

i find myself using Executors convenience methods all the time: it is so easy to create a pool, and then just feed it Callable instances, getting a Future instance as a handle in return. if more flexibility is needed, i use ThreadPoolExecutor. Queues are great as a communication channel for any sort of producer/consumer scenario, anything that requires message-passing or any sort of other work hand-off. Atomics are also great – i do not have to think twice when implementing counters or any other simple data structures.
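for readers who live on the ruby side, a rough analogue of the submit-a-task, get-a-handle pattern. this is not java.util.concurrent and there is no pooling here (one thread per task); the only thing it relies on is core Thread, whose value method blocks for the result much like Future#get:

```ruby
# each thread stands in for a Callable handed to an executor;
# the Thread object itself plays the role of the Future
futures = [1, 2, 3].map do |n|
  Thread.new(n) { |x| x * x }
end

# Thread#value joins the thread and returns the block's result
squares = futures.map(&:value)
puts squares.inspect
```

what java's Executors add on top of this shape is the part that matters in production: a bounded pool, a work queue, and a lifecycle you can shut down.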

most of the time i do not even have to work with threads or low-level synchronization primitives directly – they are buried deep within the libraries. i have fewer nightmares, since i do not have to touch volatile as often.

at some point i’ve read both editions of doug lea’s book, but i was always hesitant to recommend it; i’d rather rely on libraries that abstracted all of this away. now that java.util.concurrent has been out for 4 years, and Java Concurrency in Practice has become a bestseller, there are no more excuses.

one thing i’ve learned though – when you think you got this stuff, you discover a whole new class of problems that make you realize how complicated all of this really is, and how truly difficult it is to write larger concurrent programs.

you really, really have to think hard about how you share your objects, how you compose them and operate on them. you need to really understand how the language and the runtime work (i find myself checking JLS quite often). this is where good OO practices like encapsulation become even more important, since you are not just risking maintenance overhead, but you are risking the correctness of your program.

now, i always told myself that programming is not an exercise in manliness. i am just an app developer; i want to ship working code that solves the customer’s problems, not spend countless hours trying to reason through non-blocking algorithms just because i decided to do something non-trivial with ConcurrentHashMap. at the same time i do not want to waste my precious CPUs, so what am i to do? shouldn’t this stuff be easier? is there something i am missing?

threads considered harmful

actually, there is no problem with threads per se; the problem is with shared state.

in a normal sequential program you only worry about the logic as it is unfolding before you – one statement after another, in order. in a concurrent program that uses threads and shared state, in addition to all your usual problems, you also have the problem of non-deterministic state: at any point in time any thread can come in and mess with your data, even between the steps of operations you considered atomic before (like counter++, which is really a read, an add, and a write), so the number of states that your program can be in suffers a combinatorial explosion. this makes it really hard to reason about its correctness.
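a small ruby sketch of the counter case, assuming a made-up `SafeCounter` class: `@count += 1` is a read, an add, and a write, so the whole thing goes into one critical section. MRI's global interpreter lock tends to hide lost updates in the unprotected version, but that is not something to rely on, and on jruby the threads really do run in parallel.

```ruby
require 'thread'

# hypothetical counter whose increment is a single critical section
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    # read, add and write all happen under the lock, so no update is lost
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new
threads = Array.new(8) do
  Thread.new { 1_000.times { counter.increment } }
end
threads.each(&:join)
total = counter.value
```

with the lock in place the final value is always threads times increments; take the `synchronize` out and the result depends on the runtime and on luck.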

your code becomes brittle, sacrificing failure isolation – one misbehaving thread can potentially harm the whole runtime (a good analogy is BSOD caused by a device driver).

in addition, things don’t compose – a transfer operation performed by a thread-safe customer between two thread-safe accounts is not going to be automatically thread-safe.
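here is a sketch of the transfer example in ruby, with made-up `Account`, `transfer` and `total` names. every account method is thread-safe on its own, yet the naive transfer leaves a window between withdraw and deposit in which an observer sees money missing. one way around it (an assumption on my part, not the only fix) is to take both account locks, always in the same order, around the whole operation; Monitor is used because it is reentrant.

```ruby
require 'monitor'

# each method is safe on its own: it runs under the account's lock
class Account
  attr_reader :id, :lock

  def initialize(id, balance)
    @id = id
    @balance = balance
    @lock = Monitor.new # reentrant, so nested synchronize by one thread is fine
  end

  def balance
    @lock.synchronize { @balance }
  end

  def deposit(amount)
    @lock.synchronize { @balance += amount }
  end

  def withdraw(amount)
    @lock.synchronize { @balance -= amount }
  end
end

# always take the two locks in the same (id) order to avoid deadlock
def with_both(a, b)
  first, second = [a, b].sort_by(&:id)
  first.lock.synchronize { second.lock.synchronize { yield } }
end

# naive version: two safe calls, but a reader can slip in between them
def naive_transfer(from, to, amount)
  from.withdraw(amount)
  to.deposit(amount)
end

# composed version: both steps happen while both locks are held
def transfer(from, to, amount)
  with_both(from, to) do
    from.withdraw(amount)
    to.deposit(amount)
  end
end

# a consistent snapshot also needs both locks
def total(a, b)
  with_both(a, b) { a.balance + b.balance }
end

checking = Account.new(1, 100)
savings = Account.new(2, 100)

movers = [
  Thread.new { 500.times { transfer(checking, savings, 1) } },
  Thread.new { 500.times { transfer(savings, checking, 1) } }
]
auditor = Thread.new { Array.new(500) { total(checking, savings) } }
movers.each(&:join)
observed = auditor.value
```

note that the fix required reaching into the accounts' locks from the outside, which is exactly the kind of leak in encapsulation that makes lock-based code hard to compose.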

to make matters worse, some of the errors remain hidden when run on commodity 1-2 CPU IA32 hardware, but as the number of CPUs grows, or their architecture becomes less restrictive to help with concurrency, things start to break down.

for more thorough discussion see The Problem With Threads by Edward A. Lee and Cliff Click’s We Don’t Know How To Program…

now what?

a natural reaction is to forget about fine-grained parallelism and offload the hard stuff onto someone else. after all, i am an app programmer, i care about business problems, what’s all of this yak shaving about?!

in some cases we can get away with firing up individual processes to take advantage of multiple CPUs. most of the time though it means that the problem has been pushed further down the stack, which often turns out to be the database. this is the route that rails folks went, and it certainly was a pragmatic approach at the time (now that they are forced to deal with efficiency, threading is back in the picture; for a discussion of the issues see Q/A: What Thread-safe Rails Means).

if you can get away with using individual processes, go for it (see google chrome) – you get failure isolation, you get immutability with respect to other processes (it won’t be as easy for another process to mess with your data), and as an additional benefit, you get to use all the standard tools that the OS has when it comes to managing and troubleshooting processes (as opposed to using often incomplete and idiosyncratic tools for thread management that your runtime platform of choice offers – if any).

still, as we need more and more fine-grained concurrency and as the level of concurrency increases (it is not just a handful of CPUs now, but dozens, and even hundreds), one process per task becomes too expensive (context switching, high costs of creating a new process, memory overhead, etc). so we are back to some sort of lightweight thread-like primitives running within the same process, sharing some common resources.

most of the popular languages/platforms these days provide some sort of threading and shared memory support. but as outlined above, they suffer from some fundamental problems. there are some practical things at various levels of abstraction that can help: low-level constructs within the language/platform itself, tooling, and higher-level libraries/mini-languages.

language

  • make immutability easier – take note of functional languages, but also make it practical. in java’s case, for instance, it could mean extending immutability to some core data structures (see scala collections) or making it easier to tag an instance as immutable (see ruby’s freeze; it only fails at runtime and reeks of boilerplate though) – ideally in a way that lets errors be caught at compile time
  • consider sharing data only through explicit, ideally checked at compile-time, means. thus by default nothing is shared, and in order to make something shared you have to explicitly tag it as such. ideally, this would also come with some sort of namespace support, thus limiting mutations to a sandbox (see clojure for reference)
  • make language safer to use when it comes to exposing shareable state (this is when something like static becomes a problem – see Shared Data Considered Harmful for an example that applies to concurrency)
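to show what tagging an instance as immutable looks like with ruby's freeze, here is a minimal sketch. `deep_freeze` is a hypothetical helper written for this example, not something ruby ships with; it is there because freeze itself is shallow, which is part of why it feels like boilerplate.

```ruby
settings = { 'host' => 'localhost', 'ports' => [80, 443] }.freeze

mutation_error = nil
begin
  settings['host'] = 'example.org'
rescue StandardError => e
  mutation_error = e # surfaces at runtime, not at compile time
end

# freeze is shallow: the nested array can still be changed
settings['ports'] << 8080

# hypothetical helper that freezes hashes and arrays all the way down
def deep_freeze(obj)
  case obj
  when Hash
    obj.each do |key, value|
      deep_freeze(key)
      deep_freeze(value)
    end
  when Array
    obj.each { |value| deep_freeze(value) }
  end
  obj.freeze
end

shared = deep_freeze({ 'host' => 'localhost', 'ports' => [80, 443] })
```

a deeply frozen structure like `shared` can be handed to any number of threads without locks, since nobody can change it after the fact.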

tooling

  • static analysis tools might help, but we need to give them a bit more than just an infinite number of states. findbugs, for instance, supports concurrency annotations, and something like chord could also be promising. this stuff is complex though and there are limits to static analysis (and i do not even want to bring up formal proofs using process calculi)
  • i want more support from the platform to help me troubleshoot lock contention, deadlocks, cpu-intensive threads, and other concurrency-related infrastructure. sun’s hotspot has some rudimentary stuff in place, but i want more things out of the box (azul claims that they have always-on built-in tools in their product, but i have not played with them)
  • speaking of azul, i need to study them more. although perceived as a boutique solution, they are addressing issues that everyone will be facing in just a few years. seems like they ported sun’s hotspot to their hardware which allowed them to achieve scaling by automatically replacing synchronization with optimistic concurrency which scales much better. incidentally, this truism about optimistic concurrency has been obvious to database folks for decades

libraries/mini-languages

one of the approaches is to focus on your problem domain and come up with a library/language that solves your particular problem and abstracts away concurrency. web frameworks (J2EE, rails), or ETL tools, or even databases are all examples of such approaches.

this is where my interest lies as an app developer – how can i make concurrent programming easier for me, the layman?

the bottom line is that if we insist on using low-level synchronization primitives, it would be really hard to paper over the underlying complexities. right now there is no generic universal approach that will simplify concurrent programming. so at this point a pragmatic programmer is left with patterns, supporting libraries, and heuristics.

to be continued

there are some patterns (for the lack of a better word) that i found to be helpful in dealing with concurrency; there is also some stuff on the horizon that promises all sorts of benefits – is there really a silver bullet? but also there is plenty of stuff that has been with us for decades, and i would be the first one to bow my head in shame, acknowledging my ignorance.

some sort of pun on ruby, java, and gluing goes here. i got nothing.

Posted by anton
on Thursday, June 21, 2007

speaking of gluing things, below is a jruby script i cobbled together to get a backup of an archaic snipsnap instance.

as you might have guessed, it was just an excuse to play with ActiveRecord-JDBC, since all it really takes is just connecting to the database and pulling one table out.

still, it was fun and just a few lines of code, although you had to install the ActiveRecord gem as well as the ActiveRecord-JDBC gem (not to mention adding the mysql jdbc driver to the classpath). as an excuse, i did not want to deal with low-level jdbc machinery, nor did i want to install another gem to get ruby's mysql connectivity.

although it takes an ungodly amount of time to start up, it works just fine. here's the best of both worlds - java's jdbc type4 driver prowess and ruby's terse and readable way of expressing yourself (plus the quick feedback of edit-run-swear-edit cycle):

snipsnap does boast xml-rpc support, but it only provides a meager pingback ability.

it's for gluing things

Posted by anton
on Thursday, June 21, 2007

i have installed xml-rpc plugin for trac and played a bit with it. it is amazing how simple it is to use - just install the plugin, add a user to the basic auth passwd file (in my case Apache checks there first, then goes to Active Directory), give this user XML_RPC privilege in trac admin, and there you go:


#!python
import xmlrpclib
server = xmlrpclib.ServerProxy("http://username:password@host/trac/login/xmlrpc")
print server.wiki.getPage("WikiStart")

just imagine the possibilities that make trac an application platform - easily create pages/attachments or edit entries in response to events (we have scripts that do certain things for us, and then we also have to go into the wiki and document things manually), create pages in response to incident tickets as they are being worked on, or functional specification workflow process, etc, etc.

xmlrpc libraries are built into python and ruby (php and even javascript, not to mention java) - so there is nothing really that stops one from running this thing on a stock installation of a given language (non-privileged account on a unix box, for instance).

here's a simple script i put together to back up a snapshot of the trac wiki to a local hard drive; it is using ruby, since my python skills are nil (i do like the python xmlrpc api much more though - it seems to be a lot more convenient to use and succinct, compared to xmlrpc4r):

#!ruby
require 'xmlrpc/client'
require 'fileutils'

class Wiki
  def initialize
    @client = XMLRPC::Client.new2('https://user:password@server/trac/login/xmlrpc')
  end

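  # forward any unknown method to the wiki.* xml-rpc namespace,
  # so wiki.getPage('x') becomes the remote call wiki.getPage
  def method_missing(m, *args)
    @client.call("wiki.#{m}", *args)
  end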
end

wiki = Wiki.new
pages = wiki.getAllPages

index = '<html><body><ul>'

pages.sort.each do |p|
  puts 'getting ' << p
  FileUtils.mkpath p

  txt = wiki.getPage p
  html = wiki.getPageHTML p

  # File.open rather than Kernel#open: the names come from the server
  File.open(File.join(p, 'index.txt'), 'w') { |f| f.puts txt }
  File.open(File.join(p, 'index.html'), 'w') { |f| f.puts html }

  attachments = wiki.listAttachments p
  attachments.each do |a|
    puts "\t" << 'getting attachment ' << a
    content = wiki.getAttachment a
    # binary mode, so attachments survive intact on windows/cygwin
    File.open(a, 'wb') { |f| f << content }
  end

  index << '<li>' << p << '<ul>'
  index << '<li><a href="' << p << '/index.html">html</a></li>'
  index << '<li><a href="' << p << '/index.txt">txt</a></li>'

  if !attachments.empty?
    index << '<li>attachments</li>'
    index << '<ul>'
  end

  attachments.each do |a|
    file = File.basename a
    index << '<li><a href="' << p << '/' << file << '">' << file << '</a></li>'
  end

  index << '</ul>' if !attachments.empty?

  index << '</ul></li>'
end

index << '</ul></body></html>'

open('index.html', 'w') { |f| f.puts index }

i am not using multicall, since it only takes a few minutes to run against our trac instance.

more information on the wiki xml-rpc interface is here. seems like trac does not implement listLinks or listBackLinks, and some macros do not render properly when retrieving pages via getPageHTML.

also, since tags (which we use heavily) are an extension to trac, xml-rpc api does not support them. perhaps a weekend project to add that in?