Bitumen Framework: October 2010

Thursday, October 28, 2010

Typed Abstractions in Clojure

(We will use the term 'abstraction' instead of 'data abstraction' in this post for brevity.)

Prelude

In software parlance an abstraction is a concept or idea not associated with any specific instance. Fundamentally an abstraction can be manifested as Representation + Identification, which is to say that a representation must be identifiable as a certain abstraction. For example, the representation of a 'Person' abstraction may be as follows:


{:name    "Martin Collins"
 :gender  :male
 :country "de"}

This, however, is only a persistent map, and the detail that this map is identified as a 'Person' is implicit. Isn't this 'PersonDetails' rather than 'Person'? And isn't each of these elements an abstraction by itself? Well yes, but most of it is left implicit for convenience's sake. We tend to make such trade-offs (implicit versus identifiable) in software systems depending upon how many of these abstractions we need identified in the given context.

Data Abstraction = Representation + Identification

Example representations of other abstractions:

(a) FundsTransfer (Bank name/branch is implicit):


{:from-account 123456789
 :to-account   987654321
 :which-date   Oct-23-2010 ; this is a var
 :txn-number   "F083-BN8892064"}

(b) ItemPrice (currency is implicit):


67.88

(c) Names (a list of person names):


["Christie Paul" "Ram Goyal" "Shaqeel Ahmed"]

Why/when do we need abstraction identification?

You may have noticed that the abstractions in the previous section have different representations (map, number, vector), and that nothing in those representations says which abstraction they stand for: the identification remains implicit. When we have too many such implicit abstractions, or too much nesting of abstractions (especially with the same representation types), debugging and refactoring a non-trivial code base may become a challenge. Identifying the abstractions becomes increasingly important in such scenarios.

Identification = Type + Notion

If we drill down on the term 'identification', we observe that it is made up of 'type' (of) and 'notion' (about) the abstraction. For example, the type of an employee abstraction may be 'Employee' or :employee, and the notion that it can draw salary is part of the domain or business logic of the application. Expressing the notion of an abstraction is a complex activity and is described in terms of context and behaviour. In this post we will focus on expressing the type, not notion.

Representation types vs Identification

Now let us discuss the ways and means we can use to identify (or rather assign 'type' to) abstractions in Clojure. Please note that we are not going to discuss which data type suits what use case here -- that is a different topic altogether.

1. Data types (Clojure 1.2)

Data types come with a pre-built mechanism for identification.


(defrecord Person [name gender country]) ; type Person

(def p (Person. "Sherlyn Casta" :female :ar))

(type p) ; returns the class Person (printed as user.Person at a fresh REPL)

(instance? Person p) ; returns true

Even though data types may behave as maps they are actually quite different -- you can add behaviour on those types (protocols). If you want to handle cases where one type may belong to multiple super-types or sub-types, consider using the technique for maps (#5).

2. Protocols (Clojure 1.2)

Protocols can be considered abstractions that are accessed through the behaviour they expose. Like data types, protocols are named, so they also come with built-in identification.
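
As a rough illustration of how a protocol gives a named, checkable abstraction, the sketch below defines a protocol and a record that implements it. The names Payable, monthly-pay and Contractor are made up for this example and are not part of any library.

```clojure
;; Hypothetical protocol: anything that can be paid.
(defprotocol Payable
  (monthly-pay [this] "Return the amount payable per month."))

;; Hypothetical record implementing the protocol inline.
(defrecord Contractor [name yearly-fee]
  Payable
  (monthly-pay [this] (/ yearly-fee 12)))

(def c (Contractor. "Joe Walker" 60000))

(satisfies? Payable c) ; => true, identification through behaviour
(monthly-pay c)        ; => 5000
(:name c)              ; => "Joe Walker", records still read like maps
```

A plain map such as {:name "Joe Walker"} does not satisfy Payable, so satisfies? can serve as the identification check at a function boundary.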

3. Multi-methods are for behaviour, not for data

Multi-methods are means to exploit the identification attributes already present in a representation.
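
A minimal sketch of that idea, assuming the type is kept under an :obj-type key in the meta data (the convention used later in this post). The multi-method name describe is hypothetical.

```clojure
;; Dispatch on the identification attribute that is already attached
;; to the representation (here: the :obj-type meta data entry).
(defmulti describe (fn [obj] (:obj-type (meta obj))))

(defmethod describe :names
  [names]
  (str (count names) " names"))

(defmethod describe :default
  [obj]
  "untyped value")

(describe (with-meta ["Tom" "Dick"] {:obj-type :names})) ; => "2 names"
(describe [1 2 3])                                       ; => "untyped value"
```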

4. Structured map

(defstruct Person :name :gender :country)

They are equivalent to maps -- see the map entry (#5 below) for details.

5. Collection (Map, Vector, Seq, List, Set)

Meta data is a nice way in Clojure to attach arbitrary additional information to an object. The collection data types are already prepared for type annotation, i.e. they implement the clojure.lang.IObj interface, which is what with-meta requires.


(defn obj?
  [obj]
  (instance? clojure.lang.IObj obj))

(defn typed
  [obj type-keyword]
  (let [old-meta (into {} (meta obj))]
    (with-meta obj
      (assoc old-meta :obj-type type-keyword))))

(defn typed?
  [obj type-keyword]
  (= (:obj-type (meta obj)) type-keyword))

Now putting it to use:


(defn names-list
  [names]
  (typed names :names))

(defn names-list?
  [names]
  (typed? names :names))

(defn print-labels
  [names]
  (assert (names-list? names))
  ..)

;; usage
(print-labels
  (names-list ["Tom" "Dick" "Harry"]))

The good thing about meta data is that you can access the representation in exactly the same way after annotating it. Moreover, you can assert the type (the attached meta data) of a representation as and when required.
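
The short sketch below shows what "the same way" means in practice, and one caveat to keep in mind: meta data does not take part in equality, and it does not survive sequence operations such as map. It uses only clojure.core; the var names are illustrative.

```clojure
(def plain  ["Tom" "Dick" "Harry"])
(def tagged (with-meta plain {:obj-type :names}))

(= plain tagged)               ; => true, meta data is ignored by equality
(first tagged)                 ; => "Tom", same access as before
(meta (conj tagged "Jane"))    ; => {:obj-type :names}, conj keeps the meta data
(meta (map identity tagged))   ; => nil, the lazy seq is a new, untyped value
```

So a value that has passed through map, filter and friends has to be typed again before it is handed to a function that asserts its type.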

6. Catch-all: Value types (number, string), atom, ref etc

Basic data types (such as number, string) and atom, ref etc. do not implement the clojure.lang.IObj interface. Hence, adding type information to such things requires us to wrap them in a form that supports meta data, and to provide a way to unwrap them as well. One of the easiest and most powerful constructs for this is a function:


(constantly 3788) ; or "Peter", or (java.util.Date.)

To unwrap the wrapped representation, you can simply call the function (see 'fetch-orders' function below):


(defn item-code
  [code]
  (typed (constantly code) ; wrap the item code
    :item-code))

(defn item-code?
  [code]
  (typed? code :item-code))

(defn fetch-orders
  [wrapped-code]
  (assert (item-code? wrapped-code))
  (let [code (wrapped-code)] ; unwrap the item code
    ...))

;; putting it to use
(fetch-orders (item-code 46))

Here constantly wraps a given object in a function. Once wrapped, you can attach meta data to it using typed and assert its type using typed?. You can pass wrapped objects around and assert them wherever required as a sanity check. However, you must remember the price that comes with this: wrap/assert/unwrap is slower than plain access!
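
For a quick check at the REPL, the wrap/unwrap round trip can be sketched with clojure.core alone. The var names are illustrative. The last form shows an alternative worth knowing for reference types: an atom can take meta data directly through its :meta option, even though with-meta does not work on it.

```clojure
;; Wrap a plain number in a function so that it can carry meta data.
(def wrapped-code
  (with-meta (constantly 46) {:obj-type :item-code}))

(:obj-type (meta wrapped-code)) ; => :item-code
(wrapped-code)                  ; => 46, calling the function unwraps the value

;; Reference types accept meta data at construction time.
(def counter (atom 0 :meta {:obj-type :counter}))
(:obj-type (meta counter))      ; => :counter
```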

To deal with the overhead of asserting the types, I would suggest that you
1. use wrap/unwrap only at contract points, i.e. module boundaries (public functions)
2. assert the types conditionally, based on a global *whether-to-assert* (or suitably named) boolean flag -- this can be set to true during development/testing and turned off in production
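
One possible shape for such a flag is sketched below. The names *assert-types* and check-type are assumptions made for this example. Clojure also has a built-in *assert* var that switches the assert macro off, but it is consulted when the code is compiled, so a flag of your own is easier to toggle at run time.

```clojure
;; Hypothetical global switch: true during development, false in production.
(def ^{:dynamic true} *assert-types* true)

(defn check-type
  "Assert that obj carries type-keyword in its meta data, but only when
  *assert-types* is true. Returns obj so calls can be chained."
  [obj type-keyword]
  (when *assert-types*
    (assert (= (:obj-type (meta obj)) type-keyword)))
  obj)

(def code (with-meta (constantly 46) {:obj-type :item-code}))

(check-type code :item-code)        ; passes, returns code

(binding [*assert-types* false]
  (check-type code :order-id))      ; check skipped, returns code
```

With the flag left at true, (check-type code :order-id) throws an AssertionError.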

Type hierarchies and Transitivity

More sophisticated forms of identification are Type Hierarchies and the rules of Transitivity.

Consider this type hierarchy (the ones placed higher are super-types):


                  Worker      Regular (entitled to perks)
                  /    \       /   |
                 /      \     /    |
             Payable   Volunteer   |
              /   \                |
             /     \               |
        Salaried    \              |
           /  _______\_____________|
          /  /        \
      Employee    Contractor

The type :employee implies types [:salaried :payable :worker :regular] and similarly, the type :volunteer implies [:worker :regular]. See the connection? There can also be more sophisticated forms such as contextual and conditional (logic-based) type hierarchies and relations but that is beyond the scope here. Now let's see how types can be identified in a hierarchy:


(def *super-types* {:employee   #{:salaried :payable :worker :regular}
                    :salaried   #{:payable  :worker}
                    :payable    #{:worker}
                    :contractor #{:payable  :worker}
                    :volunteer  #{:worker   :regular}})

;; new version of typed?
(defn typed?
  [obj type-keyword]
  (let [obj-type  (:obj-type (meta obj))
        ;; the object's own type plus all of its declared super-types
        all-types (into #{obj-type} (get *super-types* obj-type))]
    (contains? all-types type-keyword)))
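
As an alternative sketch, the hierarchy above can also be expressed with the ad hoc hierarchies built into clojure.core (make-hierarchy, derive, isa?). Only the direct parents are declared and transitivity comes for free. The names type-hierarchy and typed-isa? are made up for this example.

```clojure
;; Declare only the direct super-types; isa? works out the rest.
(def type-hierarchy
  (-> (make-hierarchy)
      (derive :payable    :worker)
      (derive :salaried   :payable)
      (derive :contractor :payable)
      (derive :volunteer  :worker)
      (derive :volunteer  :regular)
      (derive :employee   :salaried)
      (derive :employee   :regular)))

(defn typed-isa?
  "True if obj's :obj-type meta data is type-keyword or one of its descendants."
  [obj type-keyword]
  (isa? type-hierarchy (:obj-type (meta obj)) type-keyword))

(def emp (with-meta {:name "Martin Collins"} {:obj-type :employee}))
(def con (with-meta {:name "Ram Goyal"}      {:obj-type :contractor}))

(typed-isa? emp :employee) ; => true
(typed-isa? emp :worker)   ; => true, via :salaried and :payable
(typed-isa? con :regular)  ; => false
```

The same hierarchy value can be passed to defmulti through its :hierarchy option, which ties this back to multi-methods.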

That's all for a simple introduction to typed abstractions in Clojure. I am interested to know what you think about this. You may like to follow me on Twitter.

Sunday, October 24, 2010

Stack traces for Clojure app development

Edit (2011-Mar-06): This feature is available in Clj-MiscUtil as the bang operator.

The easiest way to print a stack trace in Clojure may be this:

user=> (Thread/dumpStack)
java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1249)
    at user$eval391.invoke(NO_SOURCE_FILE:193)
    at clojure.lang.Compiler.eval(Compiler.java:5424)
    at clojure.lang.Compiler.eval(Compiler.java:5391)
    at clojure.core$eval.invoke(core.clj:2382)
    at clojure.main$repl$read_eval_print__5624.invoke(main.clj:183)
    at clojure.main$repl$fn__5629.invoke(main.clj:204)
    at clojure.main$repl.doInvoke(main.clj:204)
    at clojure.lang.RestFn.invoke(RestFn.java:422)
    at clojure.main$repl_opt.invoke(main.clj:262)
    at clojure.main$main.doInvoke(main.clj:355)
    at clojure.lang.RestFn.invoke(RestFn.java:398)
    at clojure.lang.Var.invoke(Var.java:361)
    at clojure.lang.AFn.applyToHelper(AFn.java:159)
    at clojure.lang.Var.applyTo(Var.java:482)
    at clojure.main.main(main.java:37)
nil

However, many people find this kind of stack trace hard to read in Clojure because the application's frames are intermingled with Java and Clojure implementation classes. It may help to filter the stack trace so that only relevant details appear. In this post we try to come up with an ad hoc filtering stack trace printer:

(defn get-stack-trace
  ([stack-trace]
    (map #(let [class-name  (or (.getClassName  %) "")
                method-name (or (.getMethodName %) "")
                file-name   (or (.getFileName   %) "")
                line-number (.getLineNumber %)]
            [file-name line-number class-name method-name])
      (into [] stack-trace)))
  ([]
    (get-stack-trace (.getStackTrace (Thread/currentThread)))))


(defn get-clj-stack-trace
  ([classname-begin-tokens classname-not-begin-tokens]
    (let [clj-stacktrace? (fn [[file-name line-number class-name method-name]]
                            (and (.contains file-name ".clj")
                              (or (empty? classname-begin-tokens)
                                (some #(.startsWith class-name %)
                                  classname-begin-tokens))
                              (every? #(not (.startsWith class-name %))
                                classname-not-begin-tokens)))]
      (filter clj-stacktrace? (get-stack-trace))))
  ([]
    (get-clj-stack-trace [] ["clojure."])))


(defn print-table
  [width-vector title-vector many-value-vectors]
  (assert (= (type width-vector) (type title-vector) (type many-value-vectors)
            (type [])))
  (let [col-count (count width-vector)]
    (assert (every? #(= (count %) col-count) many-value-vectors)))
  (assert (= (count width-vector) (count title-vector)))
  (let [fix-width (fn [text width]
                    (apply str
                      (take width (apply str text (take width (repeat " "))))))
        sep-vector (into [] (map #(apply str (repeat % "-")) width-vector))]
    (doseq [each (into [title-vector sep-vector] many-value-vectors)]
      (doseq [i (take (count width-vector) (iterate inc 0))]
        (print (fix-width (each i) (width-vector i)))
        (print " | "))
      (println))))


(defn print-stack-trace
  ([stack-trace-vector]
    (print-table [20 5 45 10] ["File" "Line#" "Class" "Method"]
      (into [] stack-trace-vector)))
  ([]
    (print-stack-trace (get-clj-stack-trace))))

Having copy-pasted this code at the REPL, let us try to print the stack trace now:

user=> (print-stack-trace)
File                 | Line# | Class                                         | Method     |
-------------------- | ----- | --------------------------------------------- | ---------- |
nil

Well, that does not print anything because we have filtered out all non-Clojure stack frames; we have also filtered out all qualified class names beginning with "clojure." so that we see only the frames pertaining to application code.

So let us tweak the command to print the stack trace for all Clojure code at least:

user=> (print-stack-trace (get-clj-stack-trace [] []))
File                 | Line# | Class                                         | Method     |
-------------------- | ----- | --------------------------------------------- | ---------- |
core.clj             | 2382  | clojure.core$eval                             | invoke     |
main.clj             | 183   | clojure.main$repl$read_eval_print__5624       | invoke     |
main.clj             | 204   | clojure.main$repl$fn__5629                    | invoke     |
main.clj             | 204   | clojure.main$repl                             | doInvoke   |
main.clj             | 262   | clojure.main$repl_opt                         | invoke     |
main.clj             | 355   | clojure.main$main                             | doInvoke   |
nil

Now that stack trace is much easier to read! For a variation let us print the stack trace captured in an Exception:

user=> (print-stack-trace (get-stack-trace (.getStackTrace (Exception.))))
File                 | Line# | Class                                         | Method     |
-------------------- | ----- | --------------------------------------------- | ---------- |
NO_SOURCE_FILE       | 52    | user$eval52                                   | invoke     |
Compiler.java        | 5424  | clojure.lang.Compiler                         | eval       |
Compiler.java        | 5391  | clojure.lang.Compiler                         | eval       |
core.clj             | 2382  | clojure.core$eval                             | invoke     |
main.clj             | 183   | clojure.main$repl$read_eval_print__5624       | invoke     |
main.clj             | 204   | clojure.main$repl$fn__5629                    | invoke     |
main.clj             | 204   | clojure.main$repl                             | doInvoke   |
RestFn.java          | 422   | clojure.lang.RestFn                           | invoke     |
main.clj             | 262   | clojure.main$repl_opt                         | invoke     |
main.clj             | 355   | clojure.main$main                             | doInvoke   |
RestFn.java          | 398   | clojure.lang.RestFn                           | invoke     |
Var.java             | 361   | clojure.lang.Var                              | invoke     |
AFn.java             | 159   | clojure.lang.AFn                              | applyToHel |
Var.java             | 482   | clojure.lang.Var                              | applyTo    |
main.java            | 37    | clojure.main                                  | main       |
nil

You can try embedding the functions listed here in an application project and then print the stack trace using (print-stack-trace) - it will display only those lines available/relevant in your project.
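
If you only need the filtering part, the idea can be condensed into a few lines. This is a compact sketch, not the code from Clj-MiscUtil: the function name frames-for is made up, and "myapp." stands for whatever top-level namespace your project uses.

```clojure
(defn frames-for
  "Return [file line class method] vectors for those frames of throwable t
  whose class name starts with prefix."
  [^Throwable t ^String prefix]
  (for [^StackTraceElement e (.getStackTrace t)
        :when (.startsWith (.getClassName e) prefix)]
    [(.getFileName e) (.getLineNumber e) (.getClassName e) (.getMethodName e)]))

;; usage: keep only the frames that belong to the (hypothetical) myapp project
(doseq [frame (frames-for (Exception.) "myapp.")]
  (println frame))
```

At a bare REPL this prints nothing, since no frame belongs to myapp; passing "clojure." instead lists the Clojure runtime frames.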

Feedback/comments are welcome. You may like to follow me on Twitter.

Thursday, October 21, 2010

Easy getter/setter interop with Clojure

Edit:
1. There is also a bean function that turns a POJO into a map (with lazy map entries). There are subtle differences between setter-fn/getter-fn and bean - you can read about them in the comments to this post.
2. The setter-fn is used in a (map ..) below only to demonstrate the return values. Since map is lazy, the setters run only when the result is realized (as the REPL does when printing it), so ideally you would call setter-fn in a doseq when working on a bunch of setters:


(doseq [each (seq {:name         "Jerry Stone"
                   :address      "39 Square, Bloomville"
                   :email        "no@spam.com"
                   :birth-date   (java.util.Date.) ; bad date for convenience
                   :married      true
                   :country-code 346})
        stfn [(setter-fn p)]]
  (stfn each))
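
For comparison, here is a small sketch of the bean function mentioned in the first note. java.util.Date stands in for the Person POJO so that the snippet runs without compiling any Java. bean gives a read-only, map-like view of the getters; it has no counterpart for calling setters.

```clojure
;; java.util.Date has a getTime() getter, which bean exposes as :time
(def d (java.util.Date. 0))
(def m (bean d))

(:time m)           ; => 0, read through .getTime()
(contains? m :time) ; => true
```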

Java interoperability is one of the strong features of Clojure. This post shows how to use the Clj-ArgUtil library to further ease the calling of getter/setter methods on Java objects.

Let us say there is a Person class (Plain Old Java Object - POJO):


// filename: test/Person.java
package test;

import java.util.Date;

public class Person {
    private String  name        = null;
    private String  address     = null;
    private String  email       = null;
    private Date    birthDate   = null;
    private boolean married     = false;
    private int     countryCode = 0;
    
    // getters
    public String   getName()        { return name;        }
    public String   getAddress()     { return address;     }
    public String   getEmail()       { return email;       }
    public Date     getBirthDate()   { return birthDate;   }
    public boolean  isMarried()      { return married;     }
    public int      getCountryCode() { return countryCode; }
    
    // setters
    public void setName(String name)             { this.name        = name;        }
    public void setAddress(String address)       { this.address     = address;     }
    public void setEmail(String email)           { this.email       = email;       }
    public void setBirthDate(Date birthDate)     { this.birthDate   = birthDate;   }
    public void setMarried(boolean married)      { this.married     = married;     }
    public void setCountryCode(int countryCode)  { this.countryCode = countryCode; }   
}

We can construct and set/get on a Person object as follows:


;; assuming we execute this code snippet in the REPL

(import 'test.Person)
(use 'org.bituf.clj-argutil)

;; instantiate a Person object
(def p (Person.))

;; call setters - returns (nil nil nil nil nil nil)
(map (setter-fn p) (seq {:name         "Jerry Stone"
                         :address      "39 Square, Bloomville"
                         :email        "no@spam.com"
                         :birth-date   (java.util.Date.) ; bad date for convenience
                         :married      true
                         :country-code 346}))

;; call getters - returns
;; ("Jerry Stone" "39 Square, Bloomville" "no@spam.com" #<Date Fri Oct 22 01:03:42 IST 2010> true 346)
(map (getter-fn p)
  [:name :address :email :birth-date :is-married :country-code])

So what just happened? We used setter-fn and getter-fn functions from Clj-ArgUtil to call setters and getters on a Person object.

setter-fn and getter-fn wrap a POJO into respective functions so that setter and getter calls can be made on them easily.

When we call the setters, as you will notice


(map (setter-fn p) (seq {:name         "Jerry Stone"
                         :address      "39 Square, Bloomville"
                         :email        "no@spam.com"
                         :birth-date   (java.util.Date.) ; bad date for convenience
                         :married      true
                         :country-code 346}))

is equivalent to the following:


(map (setter-fn p) [[:name         "Jerry Stone"]  ; becomes .setName("Jerry Stone")
                    [:address      "39 Square, Bloomville"] ; and so on
                    [:email        "no@spam.com"]
                    [:birth-date   (java.util.Date.)] ; bad date for convenience
                    [:married      true]
                    [:country-code 346]])

Somewhat similar things happen when calling getters. The following code


(map (getter-fn p)
  [:name :address :email :birth-date :is-married :country-code])

gets internally converted into something like this:


(map (getter-fn p)
  [[:name]       ; .getName()
   [:address]    ; .getAddress()
   [:email]      ; .getEmail()
   [:birth-date] ; .getBirthDate()
   [:is-married] ; .isMarried()
   [:country-code]])

This conversion is due to the as-vector function that is applied to every argument. as-vector wraps a non-collection argument into a vector, or else (if the argument is a collection) pulls the items into a vector.
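
To make the mechanics concrete, here is a hedged sketch of how such a conversion could work. It is not the Clj-ArgUtil source: as-vector-sketch, setter-name and call-setter are hypothetical names, and the reflective call goes through clojure.lang.Reflector. java.util.Date stands in for a POJO so the snippet runs on its own.

```clojure
(require '[clojure.string :as str])

(defn as-vector-sketch
  "Hypothetical stand-in for as-vector: a collection is poured into a
  vector, anything else is wrapped in a one-element vector."
  [x]
  (if (coll? x) (vec x) [x]))

(defn setter-name
  "Hypothetical keyword to method name conversion,
  e.g. :birth-date becomes \"setBirthDate\"."
  [k]
  (apply str "set" (map str/capitalize (str/split (name k) #"-"))))

(defn call-setter
  "Call the setter named by the first item of arg, passing the remaining
  items as the method arguments."
  [obj arg]
  (let [[k & args] (as-vector-sketch arg)]
    (clojure.lang.Reflector/invokeInstanceMethod
      obj (setter-name k) (to-array args))))

(as-vector-sketch :name)           ; => [:name]
(as-vector-sketch [:name "Jerry"]) ; => [:name "Jerry"]
(setter-name :birth-date)          ; => "setBirthDate"

;; [:time 0] becomes .setTime(0) on the Date object
(def d (java.util.Date.))
(call-setter d [:time 0])
(.getTime d)                       ; => 0
```

A map entry such as [:name "Jerry Stone"] is itself a collection, which is why passing (seq {...}) and passing a vector of pairs behave the same.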

Hope you have fun with Clj-ArgUtil. You can find more variants of functions for calling setters and getters in the tutorial/documentation: http://bitbucket.org/kumarshantanu/clj-argutil/src

Kindly share your comments/feedback about this post and the library.

Sunday, October 10, 2010

CRUD in Clojure

Clj-DBCP and SQLRat are recently created Clojure libraries to deal with relational databases. This post describes how to use them to carry out database CRUD (Create, Retrieve, Update, Delete) operations in Clojure without installing a database.

For this example we will use the in-memory instance of the H2 embedded database. Let us create a project using Leiningen.


lein new crud

Edit the project.clj file as follows:


(defproject crud "1.0.0-SNAPSHOT"
  :description "CRUD example"
  :dependencies [[org.clojure/clojure "1.2.0"]
                 [org.clojure/clojure-contrib "1.2.0"]
                 [org.bituf/clj-dbcp "0.1"]
                 [org.bituf/sqlrat   "0.2"]
                 [com.h2database/h2 "1.2.141"]]
  :dev-dependencies [[swank-clojure "1.2.1"]]
  :main crud.core)

and get the dependencies:


lein deps

Now we will edit the core.clj file as follows:


(ns crud.core
  (:use org.bituf.clj-dbcp)
  (:use org.bituf.sqlrat.entity)
  (:gen-class))


(def db (db-spec (h2-memory-datasource)))


(defrecord Employee [])


(def emp-type
  (entity-meta :emp :empid (from-row Employee.)
    :cols [[:empid   :int "NOT NULL PRIMARY KEY"]
           [:empname "VARCHAR(50)" "NOT NULL"]
           [:dob     "DATE" "NOT NULL"]]))


(extend-entity Employee emp-type)


(defn new-emp [id name dob]
  (Employee. {} {:empid id :empname name :dob dob}))


;; create the table
(defn create-all-tables []
  (println "Creating employee table")
  (in-txn db
    (create-table emp-type)))


(def ymd-fmt (java.text.SimpleDateFormat. "yyyy-MM-dd"))


(defn ymd
  "Accept y, m and d. Return a new date based on input."
  ([y m d]
    (ymd (format "%d-%d-%d" y m d)))
  ([ymd]
    (.parse ymd-fmt ymd)))


(defn print-all-emp
  ([msg]
    (println msg)
    (print-all-emp))
  ([]
    (println "All employee records")
    (in-db db
      (let [all (find-by-criteria emp-type)]
        (println "All employees")
        (print-entities all)))))


(defn crud []
  (let [e1 (new-emp 1 "Joe Walker" (ymd "1977-10-10"))
        e2 (new-emp 2 "Mary Rayle" (ymd "1983-06-15"))]
    ;; insert
    (println "Inserting employee records")
    (in-txn db
      (save e1)
      (save e2))
    (print-all-emp "After insert")
    ;; retrieve by ID
    (in-txn db
      (let [r (find-by-id emp-type 1)]
        ;; update
        (save (assoc r :empname "Derek Smith"))))
    ;; print after update
    (print-all-emp "After update")
    ;; delete
    (in-txn db
      (delete emp-type 1))
    ;; print after delete
    (print-all-emp "After delete")))


(defn -main [& args]
  (create-all-tables)
  (crud))

Breakdown of this file:
1. An in-memory data source instance is created using the H2 database and bound to the var db.
2. We define an entity Employee (with meta data emp-type). To create instances of the Employee data type we use the factory function new-emp; the function ymd helps create date instances.
3. We carry out the CRUD operations in the function crud.
4. Helper functions create-all-tables and print-all-emp contain commonly used functionality.
5. The -main function is the entry point when executed from an executable JAR.
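
The ymd helper from the listing can be tried on its own at the REPL. The snippet below repeats its definition so that it runs without the database libraries and formats the parsed dates back to text to show both arities at work.

```clojure
(def ymd-fmt (java.text.SimpleDateFormat. "yyyy-MM-dd"))

(defn ymd
  "Accept y, m and d (or a yyyy-MM-dd string). Return a java.util.Date."
  ([y m d]
    (ymd (format "%d-%d-%d" y m d)))
  ([s]
    (.parse ymd-fmt s)))

(.format ymd-fmt (ymd "1977-10-10")) ; => "1977-10-10"
(.format ymd-fmt (ymd 1983 6 15))    ; => "1983-06-15"
```

One thing to keep in mind if this moves beyond a single-threaded example: SimpleDateFormat instances are not thread-safe, so a shared ymd-fmt should not be used from several threads at once.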

Now we build the project:


lein uberjar

and run it:


java -jar crud-1.0.0-SNAPSHOT-standalone.jar

The output should look like the following:


Creating employee table
Inserting employee records
After insert
All employee records
Executing SQL...
["SELECT * FROM emp"]
All employees
empid | empname    | dob
----- | ---------- | ----------
1     | Joe Walker | 1977-10-10
2     | Mary Rayle | 1983-06-15
Executing SQL...
["SELECT * FROM emp WHERE empid=?" 1]
After update
All employee records
Executing SQL...
["SELECT * FROM emp"]
All employees
empid | empname     | dob
----- | ----------- | ----------
1     | Derek Smith | 1977-10-10
2     | Mary Rayle  | 1983-06-15
After delete
All employee records
Executing SQL...
["SELECT * FROM emp"]
All employees
empid | empname    | dob
----- | ---------- | ----------
2     | Mary Rayle | 1983-06-15
