In Clojure We Trust

A piece of intertube about the Clojure programming language, algorithms and Emacs.

When-let Maybe?

In this post we will see how to incrementally develop a macro similar to when-let but more flexible.


When-let is useful to both bind one variable and do a test on the bind value in one operation. One common usage is:

(when-let [something (get-something arg1 arg2)]
   (do-stuff something))


This is equivalent to this code:

(let [something (get-something arg1 arg2)]
   (when something
      (do-stuff something)))

but with a more concise form.

When-let only executes the code after the binding if the value assigned in the let form is logically true, that is only if the value is not nil or false. In our example, this means that if get-something returns false then do-stuff will not be executed. Sometimes this is not enough and we want to execute the code even for false values. For instance if we are getting our data from a database and false values are acceptable but nil values are not, or if our functions return nil on error.

We could define a when-nlet macro which does what when-let does but executes the body whenever the binded value is not nil. By spying the code from clojure.core we obtain:

(defmacro when-nlet [bindings & body]
  (when (not= (count bindings) 2)
    (throw (IllegalArgumentException.
            "when-nlet requires an even number of forms in binding vector")))
  (let [form (bindings 0)
        tst (bindings 1)]
    `(let [temp# ~tst]
        (when (not (nil? temp#))
          (let [~form temp#]
            ~@body)))))

This is fine. But what if we need multiple values to be bound and multiple checks on them?

We could write:

(when-nlet [val1 (get-something arg1 arg2)]
  (when-nlet [val2 (get-something2 val1)]
     (when-nlet [val3 (get-something3 val2]
        (do-stuff val1 val2 val3))))

This is not satisfying. What about writing a when-nlet* macro that does multiple binds?

We could call it like that:

(when-nlet* [val1 (get-something arg1 arg2)
val2 (get-something2 val1)
val3 (get-something3 val2)]
(do-stuff val1 val2 val3))

and it would produce multiple calls to when-nlet.

Here it is:

(defmacro when-nlet* [bindings & body]
  (when (not (even? (count bindings)))
    (throw (IllegalArgumentException.
            "when-nlet* requires an even number of forms in binding vector")))
  (let [whenlets (reduce (fn [sexpr bind]
                           (let [form (first bind)
                                 tst (second bind)]
                             (conj sexpr `(when-nlet [~form ~tst]))))
                         ()
                         (partition 2 bindings))
        body (cons 'do body)]
    `(->> ~body ~@whenlets)))

After the reduce the whenlets variable is assigned this list (we can see it by playing with macroexpand, macroexpand-1 and prn at the REPL): 

((when-nlet [val3 (get-something3 val2)]) (when-nlet [val2 (get-something2 val1)]) (when-nlet [val1 (get-something arg1 arg2)]))

We then thread the body inside the when-nlets forms with the powerful ->> macro. We obtain:

(when-nlet [val1 (get-something arg1 arg2)]
(when-nlet [val2 (get-something2 val1)]
(when-nlet [val3 (get-something3 val2)]
(do (do-stuff val1 val2 val3)))))

So basically what we have done is creating a macro that does multiple binds, stops after the first bind returning a nil value and executes its body if no nil value has been encountered. It is nice and it shows us how much powerful Clojure is but… if we read this tutorial on how to use monads in Clojure we quickly see that there is a simpler way to do the same thing with the maybe monad!

(domonad maybe-m [val1 (get-something "a" "b")
val2 (get-something2 val1)
val3 (get-somnil val2)]
(do-stuff val1 val2 val3))

This code is equivalent to our usage of when-nlet*. Thus, if we still feel the need for our when-nlet* macro, we could write it simply like that (after :using clojure.contrib.monads):

(defmacro when-nlet* [binding & body]
(let [body (cons 'do body)]
`(domonad maybe-m ~binding ~body)))

Is that not much better?

Conclusion: we shall not write macros before learning more about monads ;-) !

Comments

Anonymous
You're using the "fn*" convention precisely opposite to how most people do in Clojure.

Usually, the asterisk suffix is used on a helper _function_ that is paired with a macro (with no suffix) that is only sugar around the function.
Nicolas
Very nice. This is the Maybe monad in Haskell, the Maybe type being defined as such:

data Maybe a = Just a | Nothing
Just and Maybe are the constructors of the Maybe type. This is close to a C++ template on <a>, really.

The do macro can be used to chain operations:

maybeM = do
val1 <- getSomething "a" "b"
val2 <- getSomething2 val1
val3 <- getSomnil val2
doStuff val1 val2 val3

Each one of these function must return a Maybe type, constructed with "Nothing", or "Just 42" for instance. To simplify things, let's just say that the arrow is an extraction of the internal Maybe value. val1 will be an Int, or a String, for example. Not a Maybe Int or Maybe String.

There is another way to write this do program:

maybeM = getSomething "a" "b"
>>= \val1 -> getSomething2 val1
>>= \val2 -> getSomnil val2
>>= \val3 -> doStuff val1 val2 val3

>>= looks like a funnel, and behaves sort of like the unix pipe: take the result of the function getSomething "a" "b", and if you can extract it from a Maybe type, pipe it into the lambda function that takes 1 parameter (val1). If the return is Nothing, then we can't extract anything and stop the evaluation there.

This is why monads are needed to do I/O in Haskell: I/O functions return an action; executing it produces the new state of the world and has an I/O side-effect.

People start using Javascript on the server now using Node.JS with a lot of asynchronous functions; they write code like:

getSomething("a", "b", function(val1) {
getSomething2(va11, function(va2){
getSomnil(val2, function(val3) {
doStuff(val1, val2, val3); // wtf.
});
});
});

This must be horrible to anyone who has been introduced to monads.

How to Build a GUI With NetBeans and Clojure

In this post we show how to create a simple GUI with the powerful Swing GUI builder of NetBeans, use it from Clojure and deliver it as a self-executable JAR. The application is a text generator similar to the Henley program of the chapter 8 of Ansi Common Lisp by Paul Graham: it reads some input file to generate statistics and use them to generate a random text. Throughout the design of the application we emphasize on a clean separation between the presentation logic and the logic of the data model with a MVC pattern.

Here are the tools required to create and build the application:

* Java 1.6
* Apache Maven
* NetBeans
* Leiningen


Creating the GUI


We use Leiningen as our Clojure build tool and it plays nicely with Maven, so we create a Maven project in NetBeans for the GUI. The project consists of Swing components, in our example just one JFrame. We do not write any Java code and we just use the NetBeans GUI builder to create components without any logic. The logic will be in the Clojure code. This Maven project will be a dependency of our Clojure project.

First we make a new NetBeans project by clicking on File/New Project…/Maven/Maven Project and selects Maven Quickstart Archetype which is nearly an empty Maven project. We name our project henleyuicomponents. Clojars forces you to have lower-case JAR names, so we choose a lower-case name for our project, useful if we want to publish it later. We set the Group Id to henley, the version to 1.0.0-SNAPSHOT and the package to henley.uicomponents.


We create a new JFrame called MainFrame by right-clicking on the package name in the Projects tree, and selecting ”JFrame Form…”. We then design the application with the GUI builder and make it looks like this:

For each Swing components (JButton, JText etc.) that we need to access to from our Clojure code, we right-click on it in the GUI builder, select ”Customize code”, rename it with a meaningful name and set its Access to public. When we are finished designing the GUI we can build it by right-clicking on the project name and choosing ”build”. This will invoke Maven with the command ”mvn install”. Alternatively you can go in the project directory and invokes this command with a shell. Once build, the Maven artifact is installed in your local Maven repository (in ~/.m2 on Linux for instance).

We now need to connect the Swing components to our Clojure code.

Wiring the components


We create an “henley” project with Leiningen with ”lein new henley” with the following content for the project.clj:

(defproject henley "1.0.0-SNAPSHOT"
:description "A GUI for a text generator similar to the one
in the book Ansi Common Lisp, Paul Graham"
:dependencies [[org.clojure/clojure "1.2.0"]
[org.clojure/clojure-contrib "1.2.0"]
[henley/henleyuicomponents "1.0.0-SNAPSHOT"]]
:dev-dependencies [[swank-clojure "1.2.0"]
[lein-run "1.0.0-SNAPSHOT"]]
:main henley.core
)

The Maven component is listed as a dependency. The :main option specify where the main function is located. This is necessary to create a self-executable JAR.

We follow a MVC pattern where the data of the model are pushed from the controller to the view. The view consists of the Swing components in the Maven component; for our simple example it is only the JFrame. Let us see how our project is organized, here is the project tree:


  • The model directory contains the logic of the application, in our case one namespace and its functions to generate some text.
  • The view directory contains all the code necessary to use the Swing components. We define two protocols, View and SwingUI: they are used by the controller and allow to abstract the details of the particular graphical implementation. The View protocols is for all functions that are totally independent of Swing from the point of view of the controller. The SwingUI is for all functions that are relative to Swing, like attaching a listener. They are defined in the files view.clj and swingui.clj. The Swing components inside the Maven artifact are referenced in the uicomponents.clj file. Application.clj provides an implementation for the two protocols.
  • The controller directory contains the register.clj function which uses the SwingUI interface to register Swing listeners defined in swing_listeners.clj. These listeners can use the SwingUI interface to get some information from the Swing components, for instance the number of words defined by the user in the JText field. This is indeed their purpose: extract the information and calls the function defined in handlers.clj. The functions in handlers.clj are callbacks and do not have access to the SwingUI interface, all the relevant information has been extracted by the swing_listeners and they only access the interface through the View protocol. The function in the handlers uses the model and informs the View of any changes.

Developing the application logic


In our case the logic of the application is just the text generation, a functional translation in Clojure of the Henley program available in the book ANSI Common LISP.

Building and using the application


We build and install the Swing components with the command ”mvn install” launched from the henleyuicomponents directory. We then go in the henley directory and call ”lein deps” to resolve all dependencies. If we want to build a self-executable we call ”lein uberjar” and go outside for a walk; when we are back we should have a standalone JAR. If not, we may have more success by installing cake and do ”cake uberjar”. The JAR can be executed with ”java -jar jarname”.  
If you have a problem building the JAR you can comment the :main option in the project.clj file and type the command ”lein run henley.core -main” from the henley directory to launch the application.
How to use the application? We can for instance generate a french “poem” by using the baudelaire.txt file in the test directory as an input file:



Abstraction levels


We have a lot of files and two protocols just for a simple project. What kind of abstraction do they defined?

  • uicomponents.clj allows us to access the Swing components independently of the way they are defined. They could be defined with code manually, with an Eclipse project etc. We choose the NetBeans GUI builder because it is the best free Swing builder available (to the extend of our knowledge).

  • the SwingUI protocol allows the controller to access the Swing components in an implementation-independent way. If for instance the Number of words JText field becomes later a JSpinner this will not affect the controller: the details of the implementation are already abstracted by the protocol.

  • The callbacks defined in handlers.clj, which are the heart of the controller, are independent of the GUI. The GUI could be written in SWT: this will not affect them. At this level, the View protocol abstracts the GUI implementation.

  • By pushing data from the controller to the View, the GUI is independent of the model data. It does not matter in a simple example as ours, but it will on a bigger project.

Conclusion


We have a clean and very flexible design but with one constraint: a lot of functions are defined just to do a few operations. What do you think of this design? Do you see a way to simplify it without losing flexibility?

Links


The self-executable JAR can be download here
The code is available on Git: https://github.com/kototama/henley

Tail Recursion and Function Composition

The chapter four of the ANSI Common Lisp book has an interesting code to insert a node in a binary search tree (BST). The code, ported to Clojure, is:

(defstruct node :elt :l :r)

(defn bst-insert [bst x comp]
(if (nil? bst)
(struct node x)
(let [elt (:elt bst)]
(if (= x elt)
bst
(if (comp x elt)
(struct node
elt
(bst-insert (:l bst) x comp)
(:r bst))
(struct node
elt
(:l bst)
(bst-insert (:r bst) x comp)))))))


The function is non-destructive. Creating a BST can be done like this:

(def nums (reduce (fn [acc x] (bst-insert acc x <)) nil [5 8 4 2 1 9 6 7 3]))
Obviously the function does not use a tail recursion. The value of the recursive call to bst-insert is used by the struct function. How could we transform it to have a tail recursion? We need to store on a stack the information gained while doing the recursion: the branches we have visited and the branches taken. We store the information while we go from the root to the leaf. We can then unstack and reconstruct the tree from the leave to the root. This is done in the bst-insert-tc function:
(defn bst-insert-tc [bst x comp]
(loop [bst bst
acc '()]
(if (nil? bst)
(loop [tree (struct node x)
stack acc]
(let [[lr elt branch] (first stack)]
(if (empty? stack)
tree
(if (= lr :l)
(recur (struct node elt branch tree) (rest stack))
(recur (struct node elt tree branch) (rest stack))))))
(let [elt (:elt bst)]
(if (= x elt)
bst
(if (comp x elt)
(recur (:l bst) (cons [:r elt (:r bst)] acc))
(recur (:r bst) (cons [:l elt (:l bst)] acc))))))))

This approach works but is a bit naive and the second loop can be simplified by using reduce:
(defn bst-insert-tc2 [bst x comp]
(loop [bst bst
acc '()]
(if (nil? bst)
(reduce (fn [tree [lr elt branch]]
(if (= lr :l)
(struct node elt branch tree)
(struct node elt tree branch)))
(struct node x)
acc)
(let [elt (:elt bst)]
(if (= x elt)
bst
(if (comp x elt)
(recur (:l bst) (cons [:r elt (:r bst)] acc))
(recur (:r bst) (cons [:l elt (:l bst)] acc))))))))
Once I had the first tail recursive version, I asked my friend Yogi how he would transform the bst-insert code into a tail recursive version. He did find a really elegant solution, in Haskell. Here is the idea: instead of stacking information about the tree and then performing at the end calls to reconstruct the tree in reverse order, we can build a function constructing a new tree as we go from the root to the leaf. When we are on a node, what we do is something like that (if the element to be inserted is smaller than the node’s element):
(struct node elt l (:r bst))
The problem is, when we are doing this operation, we do not know the value of l. l is a tree to be construct, and we can know its value only by doing a new recursive call. Since we do not know the value of it at this time, we will created an anonymous function that will be able to take this value as an argument later, when the value will be known. We thus construct anonymous functions at each level of the recursion, composing them into a unique function with comp. Each anonymous function will return the appropriate tree for the left or right branches. At the end of the recursion, we know all the arguments. The last argument is the value of the element to be inserted. We can directly call the composed function which will create the whole tree. We do not even need to unstack anything:
(defn bst-insert-tc3 [bst x cmp]
(loop [bst bst
f identity]
(if (nil? bst)
(f (struct node x))
(let [elt (:elt bst)]
(if (= x elt)
bst
(if (cmp x elt)
(recur (:l bst) (comp f (fn [l] (struct node elt l (:r bst)))))
(recur (:r bst) (comp f (fn [r] (struct node elt (:l bst) r))))))))))
The code shows how functional programming can be both elegant and efficient.
The full code for the BST is available here

Dijkstra’s Algorithm in Clojure

The Dijkstra’s algorithm is an efficient algorithm to compute the shortest path between two nodes of a directed graph with positives costs (distances). In this post we rely loosely on the pseudo-code of Wikipedia and on the nice illustrated example of the french version to develop this algorithm in Clojure. The functions in the clojure.zip namespace will not be used.

We represent the graph with a map. Keys are the nodes of the graph, values are another map mapping each child to its distance to its parent.

Thus, the representation for this graph:
graph
is:
{:A {:B 85, :C 217 }, :B { :C 25 }, :C {} }

Symbols (:A, :B etc.) are used to represent the nodes, but any type would be fine.

We don’t want our algorithms to rely on a particular graph structure so access to the graph is made via helper functions. The distance function return the distance between two nodes and the children function return a sequence containing the children for a given node.

(defn distance [net nodesrc nodedst]
((net nodesrc) nodedst))

(defn children [net node]
(keys (net node)))

In the Dijkstra’s algorithm, nodes’ distances to the initial node are progressively updated and the predecessor of a node, for its known shortest distance, is updated. We take the same approach but we don’t store distances with a value of infinity, i.e. the distances to the initial node for nodes not yet explored are not kept. We call “root” the initial node in our code, and “rdistance” the distance to the root for a given node.

The rdistances are stored in a sorted map, rdists. Each rdistance is mapped to another map mapping each node to their predecessor for this rdistance. This allow to efficiently implement the
u := vertex in Q with smallest dist[]
operation describe in the pseudo-code: we just need to take the first element of the map. Once the smallest rdistance is found, we removed it from the map to save space. How do we keep the rdistance of a node if rdistances are removed from the rdists map? Predecessors and rdistances are kept together in the preds map. Rdists is only used for newly discovered rdistances. Note that with this scheme we don’t need a Q set like in the pseudo-code, rdists acting like a one. When rdists is empty, that mean we have explored all reachable nodes.

When the node with the smallest rdistance has been found, we need to update the rdistances of its children. This is done in the update-rdists function.

(defn- update-rdists [rdists preds net node dist children distance]
"return [rdists preds] updated"
(reduce (fn [acc x]
(let [curdist (+ dist (distance net node x))
prevdist (second (preds x))
nrdists (first acc)
npreds (second acc)]
(if (nil? prevdist)
[(add-rdist nrdists x node curdist) (assoc npreds x [node curdist])]
(if (< curdist prevdist)
[(add-rdist nrdists x node curdist prevdist)
(assoc npreds x [node curdist])]
[nrdists npreds]))))
[rdists preds]
(children net node)))
When true, the condition (nil? prevdist) means that the rdistance for the currently examined node is not known, i.e. infinite. The reduce function allows us to “iterate” on each child of the node with the smallest rdistance, and to update the values of rdists and preds. They are stored together in a vector, and pass via the accumulator between each iteration.

Uncommenting the printf function in the code allows us to follow the steps of the algorithm.

minnode = :A preds = {:E [:A 173], :C [:A 217], :B [:A 85], :A [:A 0]} rdists ={85 {:B :A}, 173 {:E :A}, 217 {:C :A}}
For instance, the previous line means that the node with the smallest rdistance was A, then its children’ rdistances were updated. The rdistance from :B is 85 and its predecessor is A, the one from :E is 173 etc. In this case rdists and preds convey nearly the same information, this is because its one of the first iteration of the algorithm. Things change after when the exploration progress.

Please feel free to criticize and improve the algorithm. One idea could be to use the memoize function to cach the result of the dijkstra function.

If you know a good test suite for graph algorithms, please post a comment.

Full code is available here

Comments

Kototama
Thanks for noticing that. The code has been updated.
Anonymous
Great write-up, thanks!

One side note: the prevnode binding is not used in update-rdists.