When you have to solve a problem, building an efficient and effective solution depends on choosing the right tools. As programmers, the languages we choose shape a big part of the solution and influence the way we reason about it, so it’s really helpful to understand what kinds of problems a language aims to solve before starting to use it. Along those lines, this post tries to summarize the key features of Clojure that make it preferable to other languages under some conditions, based on what Clojure’s designers and its community say about the language.
Immutable data structures
The Clojure libraries provide several data structures along with a handful of functions to manipulate them. The core ones are list, vector, map and set, and they are all treated using the same abstraction: collection. Out of the box Clojure provides functions that operate on any collection, like = for testing equality, count to know the number of items in the collection, conj to add an element, empty to create a new empty collection and seq to obtain a sequence out of a collection, which is a sequential view of the latter. To work with sequences the core library offers the obvious rest but also the more powerful drop; those are just the fundamentals, and you can find a lot more in the clojuredocs.
It’s important to note that all of these data structures are immutable. For instance, conj adds an element to a collection without mutating it: the trick is to return a new collection, one that contains exactly the same elements as the original plus the new one. There’s an unavoidable cost in allocating new data for these operations, but it is heavily optimized, since the structures are persistent (a new version shares most of its structure with the old one instead of copying it) and Clojure is hosted on the JVM, which is known for a trustworthy garbage collector. Also, we’re trading that performance cost for correctness guarantees and for other performance optimizations based on parallel computing, as we’ll see next.
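As a small sketch of the collection functions mentioned above, the following REPL-style snippet shows that conj leaves its argument untouched. The names original and extended are made up for illustration and do not come from the post.

```clojure
;; Illustrative names only; any vector would do.
(def original [1 2 3])

;; conj returns a new vector; `original` is not modified.
(def extended (conj original 4))

original              ; still [1 2 3]
extended              ; [1 2 3 4]

;; The generic collection functions work on both values.
(count extended)      ; 4
(= original [1 2 3])  ; true, equality is by value
(empty extended)      ; [] (an empty collection of the same kind)

;; Sequence functions return sequential views, again without mutation.
(rest original)       ; (2 3)
(drop 2 extended)     ; (3 4)
```

Where conj puts the new element depends on the collection type: a vector grows at the end, while a list grows at the front.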
Clojure is mainly a functional programming language, meaning that it drives you to think in the mathematical sense of functions: a relation between a set of inputs and outputs. Given an input, a function always yields the same value, no matter how many times you call it, and most importantly, it doesn’t modify anything externally visible (AKA side-effects); let’s look at a trivial example.
This is a common way to transform a list in an imperative manner:
for elem in list transform(elem);
And this is how you would do the same in a functional (lisp) fashion:
(map transform list)
You could argue about the succinctness of the syntax and you could even say that the first is clearer to read, but it is a fact that, if you want certain correctness guarantees, at least two implications arise from the imperative choice.
First: you are bound to sequentially and synchronously iterate the list. Why? Because in imperative programming, a procedure (transform in our example) may cause alterations visible to other procedures (yeah, side-effects!). How do you know that the transformation of one element isn’t affected by the transformation of the previous one? How do you know that the overall result is always the same? To convince yourself, change
Second: you are forbidden from sharing the list between threads. As you are modifying elements in place, you could be doing so while another thread is trying to access the same element; for those coming from the Java world, this exception may sound familiar.
On the other hand, the limitations of programming with functions turn out to help in making safer constructions. Implementing transform with pure functional programming means that it won’t make any visible modification and will return a new value instead of modifying the previous one. That means you can asynchronously calculate the transformation of each element and safely share the list between threads. What is more, these optimizations can in principle be applied automatically by a compiler or runtime.
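To make this concrete, here is a minimal sketch that assumes transform is a pure squaring function (the post never defines it, so the body is hypothetical). Clojure’s map does not parallelize on its own; pmap from clojure.core is one explicit way to opt in, and it is only safe to swap in because transform has no side effects.

```clojure
;; Hypothetical stand-in for the post's `transform`; assumed to be pure.
(defn transform [x]
  (* x x))

(def items [1 2 3 4])

;; Sequential transformation.
(def sequential (doall (map transform items)))

;; Same transformation, with elements computed on several threads.
;; pmap preserves the order of the results.
(def parallel (doall (pmap transform items)))

(= sequential parallel)  ; true, both are (1 4 9 16)
items                    ; still [1 2 3 4], the input is never modified
```

For a function this cheap the coordination overhead of pmap outweighs any gain, so it only pays off when each call does substantial work. In a standalone script, calling (shutdown-agents) at the end lets the JVM exit promptly, since pmap runs on a background thread pool.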
Identity and state
Here is where actual modifications take place: in order to model real world problems, we sometimes need entities that mutate over time. Clojure’s approach to achieve this, while maintaining immutable data structures and functional constructions, is to have a clearly defined concept of identity separated from the state. Here’s the definition for each concept:
- Identity: a stable logical entity associated with a series of different values over time.
- State: the value of an identity at a point in time.
The relation between these two concepts is that an identity has exactly one state at any point in time. Quoting clojure.org: an identity can be in different states at different times, but the state itself doesn’t change. Notice how the immutable structures fit perfectly as states. To better visualize these definitions you can think of many examples: an identity can be a counter of page visits, while the states of the counter may be values such as 1000; another identity could be the list of current users in a chat group, with states being (immutable) lists of users. Just as the counter may at some point change from holding 42 to holding 43 while neither 42 nor 43 changes as a value, the list of users may change from the state [‘user_1’, ‘user_2’] to the new state [‘user_2’, ‘user_3’] without any list being modified: a new list is created and the content of the identity is updated to point to it.
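The chat group example can be sketched with an atom, one of the reference types described next. The name chat-users and the user strings are illustrative assumptions; the point is only that the old state survives unchanged after the identity moves on.

```clojure
;; Hypothetical identity: the users currently in a chat group.
(def chat-users (atom ["user_1" "user_2"]))

;; Capture the state (an immutable vector) at this point in time.
(def before @chat-users)

;; Point the identity at a new state; no vector is modified in place.
(reset! chat-users ["user_2" "user_3"])

before       ; still ["user_1" "user_2"]
@chat-users  ; now ["user_2" "user_3"]
```

Anyone who dereferenced the atom earlier keeps a consistent value, which is why a state can be handed to other threads without defensive copying.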
Clojure implements identities with three reference types: Atoms, Agents and Refs. The difference between them is basically their concurrency semantics, i.e. whether their value changes are synchronous or asynchronous and coordinated or uncoordinated.
Atoms are simple references whose changes are synchronous and atomic, but uncoordinated and independent of other references.
Agents are also simple references, but changes are made asynchronously: the semantics are that you send a change to the Agent, and the change will be applied at some later time by another thread.
Finally, Refs are the only reference type that implements coordinated modifications. They are backed by Clojure’s implementation of software transactional memory, which gives you the power to modify more than one reference within a transaction.
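The three reference types can be compared side by side in a short sketch. All names here (visits, event-log, checking, savings) and the amounts are hypothetical and chosen only to illustrate each set of semantics.

```clojure
;; Atom: synchronous, uncoordinated change.
(def visits (atom 42))
(swap! visits inc)               ; applies inc atomically and returns 43

;; Agent: asynchronous change, applied later on another thread.
(def event-log (agent []))
(send event-log conj :page-viewed)
(await event-log)                ; block until the sent action has run
@event-log                       ; [:page-viewed]

;; Refs: coordinated change of several references in one transaction.
(def checking (ref 100))
(def savings (ref 0))
(dosync
  (alter checking - 30)
  (alter savings + 30))
[@checking @savings]             ; [70 30]
```

Inside dosync either both alter calls take effect or neither does, so no other thread can observe the money missing from both accounts. In a standalone script, (shutdown-agents) at the end stops the agent thread pool so the JVM can exit.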
Clojure aims to be a language used in highly concurrent environments where a lot of work can be parallelized, scaling up by adding more processing units instead of increasing the speed of a single unit. The way to achieve this is by making you think in functional transformations and immutable values, with state changes being very controlled and limited.
Although learning a lisp-based language and migrating from other tools can be really hard or expensive, it’s a good exercise to learn this way of reasoning about problems and apply these concepts with more or less difficulty in almost any language.
References and further reading