Thursday, October 28, 2010

Typed Abstractions in Clojure

(We will use the term 'abstraction' instead of 'data abstraction' in this post for brevity.)

Prelude


In software parlance an abstraction is a concept or idea not associated with any specific instance. Fundamentally an abstraction can be manifested as Representation + Identification, which is to say that a representation must be identifiable as a certain abstraction. For example, the representation of a 'Person' abstraction may be as follows:


{:name "Martin Collins"
 :gender :male
 :country "de"}


This is, however, only a persistent map, and the detail that this map is identified as a 'Person' is implicit. Isn't this 'PersonDetails' rather than 'Person'? And isn't each of these elements an abstraction by itself? Well yes, but most of it is left implicit for convenience's sake. We tend to make such trade-offs (implicit versus identifiable) in software systems depending on how many such abstractions we need identified in the given context.

Data Abstraction = Representation + Identification


Example representations of other abstractions:

(a) FundsTransfer (Bank name/branch is implicit):

{:from-account 123456789
 :to-account 987654321
 :which-date Oct-23-2010 ; this is a var
 :txn-number "F083-BN8892064"}

(b) ItemPrice (currency is implicit):

67.88

(c) NamesList (the fact that the names belong to humans, is implicit):

["Christie Paul" "Ram Goyal" "Shaqeel Ahmed"]


Why/when do we need abstraction identification?


You will notice that the abstractions in the previous section have different representations (map, number, vector), and that the identification of those abstractions remains implicit. When we have too many such implicit abstractions, or too much nesting of abstractions (especially with the same representation types), debugging and refactoring a non-trivial code base may become a challenge. Identifying the abstractions becomes increasingly important in such scenarios.

Identification = Type + Notion

If we drill down on the term 'identification', we observe that it is made up of 'type' (of) and 'notion' (about) the abstraction. For example, the type of an employee abstraction may be 'Employee' or :employee, and the notion that it can draw salary is part of the domain or business logic of the application. Expressing the notion of an abstraction is a complex activity and is described in terms of context and behaviour. In this post we will focus on expressing the type, not notion.

Representation types vs Identification


Now let us discuss the ways and means we can use to identify (or rather assign 'type' to) abstractions in Clojure. Please note that we are not going to discuss which data type suits what use case here -- that is a different topic altogether.

1. Data types (Clojure 1.2)

Data types come with a pre-built mechanism for identification.


(defrecord Person [name gender country]) ; type Person

(def p (Person. "Sherlyn Casta" :female :ar))

(type p) ; returns the class of p, i.e. Person (printed with its namespace, e.g. user.Person)

(instance? Person p) ; returns true



Even though data types may behave like maps, they are actually quite different: you can add behaviour to those types (via protocols). If you want to handle cases where one type may belong to multiple super-types or sub-types, consider using the technique for maps (#5).

2. Protocols (Clojure 1.2)

Protocols can be considered abstractions that are accessed through the behaviour they expose. Like data types, protocols are named things.
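As a minimal sketch of what that looks like, here is a protocol attached to a record. The protocol name Describable and its describe function are hypothetical names made up for this illustration; only defprotocol, defrecord and satisfies? come from Clojure itself.

```clojure
;; Hypothetical protocol: anything that can describe itself as a string.
(defprotocol Describable
  (describe [this] "Returns a short human-readable description."))

;; The record implements the protocol inline.
(defrecord Person [name gender country]
  Describable
  (describe [this] (str name " (" country ")")))

(def p (Person. "Sherlyn Casta" :female :ar))

(describe p)                 ; => "Sherlyn Casta (:ar)"
(satisfies? Describable p)   ; => true
(satisfies? Describable {})  ; => false, a plain map does not take part
```

Here satisfies? plays the same identifying role for the protocol that instance? plays for the data type.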


3. Multi-methods are for behaviour, not for data

Multi-methods are means to exploit the identification attributes already present in a representation.
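For instance, a multi-method can dispatch on a key that the map representation already carries. The sketch below is only an illustration: the name salutation and the titles it returns are assumptions of mine, not part of any library.

```clojure
;; Dispatch on the :gender value already present in the representation.
(defmulti salutation :gender)

(defmethod salutation :male
  [person]
  (str "Mr. " (:name person)))

(defmethod salutation :female
  [person]
  (str "Ms. " (:name person)))

;; Fallback when :gender is missing or not handled above.
(defmethod salutation :default
  [person]
  (:name person))

(salutation {:name "Martin Collins" :gender :male :country "de"})
;; => "Mr. Martin Collins"
(salutation {:name "Sherlyn Casta" :gender :female :country "ar"})
;; => "Ms. Sherlyn Casta"
(salutation {:name "Ram Goyal"})
;; => "Ram Goyal"
```

Note that nothing here says the map is a 'Person'. The multi-method only selects behaviour from data that happens to be there.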


4. Structured map

(defstruct Person :name :gender :country)

They are equivalent to maps -- see the map entry (#5 below) for details.
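A small sketch of that equivalence follows. The struct is named person-struct here (a name chosen for this example) so that it does not clash with the Person record from #1.

```clojure
(defstruct person-struct :name :gender :country)

(def p (struct person-struct "Martin Collins" :male "de"))

(map? p)    ; => true, a struct map is a map
(:name p)   ; => "Martin Collins"

;; Like other maps, a struct map accepts metadata,
;; so the annotation technique from #5 applies to it too.
(def tagged (with-meta p {:obj-type :person}))

(:obj-type (meta tagged))  ; => :person
(= p tagged)               ; => true
```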


5. Collection (Map, Vector, Seq, List, Set)

Metadata is a nice way in Clojure to attach arbitrary additional information to an object. The collection data types are ready for type annotation, i.e. they implement the clojure.lang.IObj interface, so metadata can be attached to them using with-meta.


(defn obj?
  [obj]
  (instance? clojure.lang.IObj obj))

(defn typed
  [obj type-keyword]
  (let [old-meta (into {} (meta obj))]
    (with-meta obj
      (assoc old-meta :obj-type type-keyword))))

(defn typed?
  [obj type-keyword]
  (= (:obj-type (meta obj)) type-keyword))


Now putting it to use:

(defn names-list
  [names]
  (typed names :names))

(defn names-list?
  [names]
  (typed? names :names))

(defn print-labels
  [names]
  (assert (names-list? names))
  ..)

;; usage
(print-labels
  (names-list ["Tom" "Dick" "Harry"]))


The good thing about metadata is that you can access the representation in the same way after annotating it. Moreover, you can assert the type (the attached metadata) of a representation as and when required.
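To make that concrete, here is a small self-contained sketch. It repeats a compact version of typed so the block runs on its own; the var names plain and tagged are just for illustration.

```clojure
(defn typed
  [obj type-keyword]
  (with-meta obj (assoc (into {} (meta obj)) :obj-type type-keyword)))

(def plain  ["Tom" "Dick" "Harry"])
(def tagged (typed plain :names))

(= plain tagged)  ; => true, metadata does not take part in equality
(first tagged)    ; => "Tom"
(count tagged)    ; => 3
(meta tagged)     ; => {:obj-type :names}

;; conj on a vector keeps its metadata ...
(meta (conj tagged "Jane"))    ; => {:obj-type :names}
;; ... but a sequence derived from it starts without any.
(meta (map identity tagged))   ; => nil
```

The last line is worth remembering: the annotation belongs to one particular value, so results of sequence functions such as map have to be annotated again if they should carry the type.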

6. Catch-all: Value types (number, string), atom, ref etc

Basic data types (such as numbers and strings) and reference types such as atom and ref do not implement the clojure.lang.IObj interface. Hence, adding type information to such things requires us to wrap them in a form that enables metadata, and to provide a way to unwrap them as well. One of the easiest and most powerful constructs for this is a function:


(constantly 3788) ; or "Peter", or (java.util.Date.)


To unwrap the wrapped representation, you can simply call the function (see 'fetch-orders' function below):


(defn item-code
  [code]
  (typed (constantly code) ; wrap the item code
         :item-code))

(defn item-code?
  [code]
  (typed? code :item-code))

(defn fetch-orders
  [wrapped-code]
  (assert (item-code? wrapped-code))
  (let [code (wrapped-code)] ; unwrap the item code
    ...))

;; putting it to use
(fetch-orders (item-code 46))


Here constantly plays the role of the wrap function: it wraps a given object in a function. Once an object is wrapped, you can attach metadata to it using typed and assert its type using typed?. You can pass wrapped objects around and assert their types wherever a sanity check is required. However, you must remember the price that comes with this whole thing: wrap/assert/unwrap is slower than plain access!
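If you prefer explicit names for the two steps, they could be sketched as below. The names wrap, unwrap and wrapped-as? are hypothetical helpers introduced for this example only.

```clojure
(defn wrap
  "Wraps value in a no-arg-callable function and tags it with type-keyword."
  [value type-keyword]
  (with-meta (constantly value) {:obj-type type-keyword}))

(defn unwrap
  "Returns the value held by a wrapped object."
  [wrapped]
  (wrapped))

(defn wrapped-as?
  [wrapped type-keyword]
  (= (:obj-type (meta wrapped)) type-keyword))

(def code (wrap 46 :item-code))

(wrapped-as? code :item-code)  ; => true
(wrapped-as? code :price)      ; => false
(unwrap code)                  ; => 46
```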

To deal with the overhead of asserting the types, I would suggest that you:
1. use wrap/unwrap only at contract points, i.e. module boundaries (public functions)
2. assert the types conditionally, based on a global *whether-to-assert* (or similarly named) boolean flag, which can be set to true during development/testing and turned off in production
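The second suggestion could look roughly like the sketch below. It assumes Clojure 1.3 or later, where a rebindable var has to be marked ^:dynamic and assert accepts a message. The helper name check-type is hypothetical.

```clojure
;; Global switch: true during development/testing, false in production.
(def ^:dynamic *whether-to-assert* true)

(defn check-type
  "Returns obj unchanged. When the flag is on, first asserts its :obj-type."
  [obj type-keyword]
  (when *whether-to-assert*
    (assert (= (:obj-type (meta obj)) type-keyword)
            (str "expected type " type-keyword)))
  obj)

(def names (with-meta ["Tom" "Dick"] {:obj-type :names}))

(check-type names :names)            ; passes and returns names

(binding [*whether-to-assert* false]
  (check-type ["untagged"] :names))  ; no check is made, returns ["untagged"]
```

Clojure's own *assert* var is a related built-in switch, but as far as I know it takes effect when the code is compiled rather than when it runs.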

Type hierarchies and Transitivity


More sophisticated forms of identification are Type Hierarchies and the rules of Transitivity.

Consider this type hierarchy (the ones placed higher are super-types):

       Worker          Regular (entitled to perks)
       /    \          /  |
      /      \        /   |
  Payable    Volunteer    |
    /  \                  |
   /    \                 |
Salaried \                |
  /   ____\_______________|
 /   /     \
Employee   Contractor


The type :employee implies types [:salaried :payable :worker :regular] and similarly, the type :volunteer implies [:worker :regular]. See the connection? There can also be more sophisticated forms such as contextual and conditional (logic-based) type hierarchies and relations but that is beyond the scope here. Now let's see how types can be identified in a hierarchy:


(def *super-types* {:employee   #{:salaried :payable :worker :regular}
                    :salaried   #{:payable :worker}
                    :payable    #{:worker}
                    :contractor #{:payable :worker}
                    :volunteer  #{:worker :regular}})

;; new version of typed?
(defn typed?
  [obj type-keyword]
  (let [obj-type  (:obj-type (meta obj))
        ;; the object's own type plus every super-type it implies
        all-types (into #{obj-type} (get *super-types* obj-type))]
    (contains? all-types type-keyword)))
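Clojure also ships with ad hoc hierarchies (make-hierarchy, derive, isa?), which work out the transitive relations for you, so only the direct parents need to be listed. This is a different technique from the hand-written map above; the sketch below shows how the same hierarchy might be expressed with it. The names type-hierarchy and typed-isa? are made up for this example.

```clojure
;; Only direct parent relations are declared; ancestors are derived.
(def type-hierarchy
  (-> (make-hierarchy)
      (derive :payable    :worker)
      (derive :volunteer  :worker)
      (derive :volunteer  :regular)
      (derive :salaried   :payable)
      (derive :contractor :payable)
      (derive :employee   :salaried)
      (derive :employee   :regular)))

(defn typed-isa?
  "True when the :obj-type of obj is type-keyword or one of its sub-types."
  [obj type-keyword]
  (isa? type-hierarchy (:obj-type (meta obj)) type-keyword))

(def employee   (with-meta {:name "Martin Collins"} {:obj-type :employee}))
(def contractor (with-meta {:name "Ram Goyal"} {:obj-type :contractor}))

(ancestors type-hierarchy :employee)
;; => #{:salaried :payable :worker :regular}

(typed-isa? employee :payable)    ; => true
(typed-isa? employee :regular)    ; => true
(typed-isa? contractor :worker)   ; => true
(typed-isa? contractor :regular)  ; => false
```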

That's all for a simple introduction to typed abstractions in Clojure. I am interested to know what you think about this. You may like to follow me on Twitter.
