Do It Yourself DSLs in Scala

Andreas Kaltenbach

The charm of Scala can hardly escape a Java developer: the language concepts are beautiful and elegant, and they mesh perfectly with one another. Several 2009 issues of Java Magazin [1] have already introduced us to some representatives of Scala DSLs from various fields. Now it’s time to roll up our sleeves like good old Tim “The Tool Man” Taylor and create our own DSL with Scala.

If you are active in the Scala universe, it does not take too long until you stumble across a domain-specific language (DSL). Even the Scala distribution [2] already includes its own DSLs, such as the Scala Actor Framework [3]. In addition, there are a lot of freely available frameworks, which also supply their own DSLs. Scala Modules [4] and the Apache Camel Scala DSL [5] are mentioned as examples here.

They all are classified as internal DSLs, that is, languages which are embedded in the host language Scala and only use its language concepts.
In this article we want to extend our know-how and build our own DSL based on Scala. Step by step we will build the DSL and at the same time enrich our knowledge of Scala concepts.

The Domain

 

Rule 1: No DSL without an associated domain! In our case the description of object relational mappings should be made easier by using a DSL, which can be seen as an alternative to an XML configuration file. Meanwhile we all know that thanks to annotations we can really do without verbose XML. While designing the DSL we will focus on Hibernate XML mappings to see where a Scala-based DSL has its strengths. Don’t worry if you are a Hibernate novice as the article is essentially about the design of the DSL and the judicious application of language concepts. The sample domain is kept deliberately small to keep the example as simple as possible.

Fig. 1: DSL Meta model


The domain, or the metamodel, in Fig. 1 is fairly straightforward: a Hibernate mapping consists of a number of classes for which mapping information is provided. The identity of a class is configured using its ID. Classes have properties that need to be persisted, and their respective mapping configurations should also be expressed in the DSL. References to other mapped classes are described by Set or ManyToOne relationships.

First Attempt in Java

 

Initially we’ll stay in a familiar Java environment and see what a Java-based DSL could look like. Suppose we wanted to describe the persistence model illustrated in Figure 2.

Fig. 2: Our Caveat Emptor Model


The very simplistic Caveat Emptor model [6] is limited to the two entities User and Item. A User can sell several Items; conversely, each Item knows its seller. A User can be assigned a name and a birth date, and you can store whether or not the user is an administrator. Listing 1 shows what the mapping looks like in Java for the Caveat Emptor model we just described.

Listing 1
public static HibernateMapping mapping
  = new HibernateMapping() {{
  classes = new Classes(
    new Class() {{
      type = User.class;
      tableName = "USERS";
      properties = new Properties(
        new Property() {{
          name = "birthDate";
          type = Date.class;
        }},
        ...
      );
      ...
    }},
    ...
  );
}};
 

We quickly realize that the Java language features that are available to us do not invite us to develop our own DSL. Only thanks to the interaction of anonymous inner classes and Initializers (which together give us the ominous {{...}}) is it possible to express the mapping without detours via method calls. In addition, the DSL still contains a lot of noise - unnecessary elements such as brackets and semicolons, the new keyword, or any workarounds, such as the full third line. These elements do not belong to the payload of the model, but rather pollute it.

In Java, we could possibly exhaust the existing language features to arrive at a Fluent API - an internal DSL that provides chainable methods. Well-designed Fluent APIs, for example, liquidform [7], are characterized by their high readability. However, we quickly come to the limit of what is possible with the Java language, so we decide to port the DSL we created in Java to Scala in the next step.

Arrival in Scala

When porting, we can incorporate here and there some of the convenience of Scala. The metamodel classes (for example, Class or Property) are pure data containers. In Java the fields need getters and setters. Scala spares us this, as getters and setters are generated additionally by the compiler. The metamodel classes are confined to the bare essentials.
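As a minimal sketch of this point, the class below declares plain var fields and relies on the accessors the compiler generates. The names PropertySketch and AccessorDemo and the column name are assumptions made for illustration; they are not taken from the article's sources.

```scala
// Hypothetical, reduced version of a metamodel class (name is an assumption).
class PropertySketch {
  // A plain var: the compiler generates the accessor "name" and the
  // mutator "name_=", so no hand-written getter or setter is needed.
  var name: String = null
  var columnName: String = null
}

object AccessorDemo {
  // Builds one configured instance using only the generated accessors.
  def configured: PropertySketch = {
    val p = new PropertySketch
    p.name = "birthDate"         // invokes the generated mutator
    p.columnName = "BIRTH_DATE"  // hypothetical column name
    p
  }
}
```

Reading AccessorDemo.configured.name goes through the generated accessor as well, so the class body stays limited to the field declarations.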

Scala's native XML support is a particular treat. XML elements can be directly used as literals in the source code. The compiler detects this and creates scala.xml.Elem instances. Listing 2 shows, for example, how the metamodel class Property can be serialized in XML via the toXml method.

Listing 2
class Property {
  var name:String = null
  var Type:java.lang.Class[_] = null
  var columnName:String = null
  ...
  def toXml = {
    <property name={name} type={typeAsString}
      column={columnName}
      ...
    </property>
  }
}
 
 

As already noted, the third line of the Java DSL does not contain any useful information for our model; it just adds unnecessary noise.

classes = new Classes(...)

The Classes wrapper exists only out of necessity, because Java unfortunately provides no way to instantiate and at the same time fill a java.util.Collection with entries. For this purpose Scala provides factory methods. Listing 3 contains a conventional factory method named apply.

Listing 3
// Declared as an object (not a class) so that the call Class(...) resolves to apply
object Class {
  def apply(classes:metamodel.Class*):List[metamodel.Class] =
    classes.toList  // copy the variable argument list into an immutable List
}
 

The factory method can then be called as follows:

 Class(...)

The constructor call via new is no longer applicable and the wrapper class isn’t necessary anymore.
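The following sketch shows the apply convention in isolation. MappedClass, MappedClasses and FactoryDemo are hypothetical stand-ins for the article's metamodel types, chosen so that the example compiles on its own.

```scala
// Hypothetical stand-in for a metamodel class (name is an assumption).
class MappedClass(val tableName: String)

// The apply method of an object lets callers write MappedClasses(...)
// instead of constructing a wrapper with "new".
object MappedClasses {
  def apply(classes: MappedClass*): List[MappedClass] = classes.toList
}

object FactoryDemo {
  // MappedClasses(a, b) is shorthand for MappedClasses.apply(a, b).
  def tableNames: List[String] =
    MappedClasses(new MappedClass("USERS"), new MappedClass("ITEMS")).map(_.tableName)
}
```

The variable argument list (MappedClass*) is what allows any number of entries to be passed without a separate collection wrapper.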

Let's take a first look at our Scala DSL in Listing 4.

Listing 4
var hibernateMapping
  = new HibernateMapping() {
  classes=Class(
    new Class() {
      Type=classOf[User]
      tableName="USERS"
      properties = Property(
        new Property() {
          name="birthDate"
          Type=classOf[Date]
        },
        ...
      )
      ...
    },
    ...
  )
}
 
 

Unlike the Java equivalent, this comes out of the box with less noise, as semicolons are optional in Scala and we have already eliminated the wrapper. In addition, the example shows that Java and Scala are interoperable. For example, in our Scala DSL we reference the persistence class User, which we implemented in Java, as well as native Java types such as java.util.Date. Unfortunately, the direct instantiation of metamodel classes via new still muddies the overall picture (for example, lines 4 and 8).

Singleton Objects

The factory methods have proven useful, so let's see whether they can help us out of the next jam. We note that the factory methods return exactly what they are given: the elements of the returned list have the same type as the parameters that were passed in:

apply(properties:Property*):List[Property]

The task of the factory method is limited just to instantiating the given metamodel elements at the moment, which is admittedly unimpressive. Basically, we hope that a factory method receives configuration and is hooked into our model. For example, a property can exist only in the context of a class. So such a factory method should expect the Property configuration, bind it to a class configuration and return the modified class configuration. To fulfill the requirement we change the signature of the factory methods, so that the parent metamodel element is always returned:

apply(properties:Property*):Class

The factory method returns the parent metamodel types (in our example the Class), but never gets a parent type as a parameter. So how does the factory method get the required information?

Fortunately, Scala provides singleton objects for such problems. Scala takes the guiding principle "Everything is an object!" very much to heart, so it has no static members; instead, a stand-alone object can be defined without first declaring a class and instantiating it. Analogous to the class keyword, the object keyword defines exactly one instance, the singleton object. Its fields and methods play the role that static fields and methods play in Java.

So we'll put the factory method in such a Singleton object. In addition, we give the singleton a field with the corresponding metamodel type. Listing 5 shows the code for such a singleton object.

Listing 5
object Property {
  private[config] var property = new metamodel.Property
  def apply(p:metamodel.Property*):metamodel.Class = {
    Class.clazz.properties += property
    property = new metamodel.Property
    Class.clazz
  }
}
 
 

The factory method works both on its own singleton field (line 5) and on the singleton field of the parent (lines 4 and 6). Through the modifier private[config] the singleton fields are only visible inside the package config. By the way, Scala, unlike Java, allows a much more granular definition of what is visible. What all the factory methods have in common is that they each return the parent singleton field. Thus, the factory methods Property(...) and Id(...) always return a Class instance. Because the return types are the same, the factory methods can be invoked in any order. A user of our DSL is therefore free to decide whether first the ID and then the properties are defined, or vice versa.
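A reduced sketch of this arrangement follows. It is an assumption-laden simplification: the names ClassModel, PropertyModel, ClassConfig, IdConfig, PropertyConfig and OrderDemo are hypothetical, and the factories take a plain name string, which differs from the listing above.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical, reduced metamodel; all names here are assumptions.
class PropertyModel(val name: String)
class ClassModel {
  var idName: String = null
  val properties: ListBuffer[PropertyModel] = ListBuffer.empty
}

// Singleton holding the class that is currently being configured.
object ClassConfig {
  var clazz: ClassModel = new ClassModel
}

// Each factory modifies the shared parent and returns it.
object IdConfig {
  def apply(name: String): ClassModel = {
    ClassConfig.clazz.idName = name
    ClassConfig.clazz
  }
}

object PropertyConfig {
  def apply(name: String): ClassModel = {
    ClassConfig.clazz.properties += new PropertyModel(name)
    ClassConfig.clazz
  }
}

object OrderDemo {
  // Both orders compile, because every factory returns ClassModel.
  def idFirst: List[ClassModel] = List(IdConfig("id"), PropertyConfig("birthDate"))
  def propertyFirst: List[ClassModel] = List(PropertyConfig("name"), IdConfig("id"))
}
```

Because both factories hand back the same shared ClassModel, a caller can list them in either order and still end up configuring one and the same parent element.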

The shared singleton fields raise questions regarding concurrency. The current version offers no synchronized access and is therefore not thread safe. We refrain at this time from using sophisticated synchronization policies in order not to inflate the sample still further. Under the assumption that a Hibernate configuration is loaded once and only one exists within an application, we can ignore synchronization issues for the time being.

If a Scala class and a singleton object have the same name and are defined in the same source file, Scala speaks of a companion class and a companion object. Companions can access each other's private members. Our singleton objects are not bona fide companion objects, since the package of the DSL configuration objects differs from that of the metamodel classes. This distinction is desired, because a user of our DSL only wants to work with configuration objects and should therefore never come into direct contact with the internal metamodel. This application of the Anti-Corruption Layer [8] decouples the external configuration from the internal metamodel. Thanks to this decoupling, the two representations can be developed independently of one another.

In this step we have separated our DSL from the internal metamodel. DSL users no longer have to create their own instance of our metamodel. In addition, the order of the configuration is unimportant, which provides additional comfort. But if we look closer, we seem to have sanded away a little too much. In Listing 5 we notice that the parameter value is never evaluated. Configuration information, like the table name of a class, unfortunately never finds its way into the internal metamodel at the moment. Our metamodel instances are currently little more than empty shells.

Operators and Methods

Since DSL users work exclusively on the configuration model, they cannot initialize metamodel instances directly. As a result, our DSL needs to be expanded to include additional language elements. For example, we want users to be able to define the table name of a persistent class.

For this purpose, we create special singleton objects. Instead of factory methods the singletons have simple methods that expect a configuration value and return a modified metamodel instance. The singleton object used to set a table name, for example, is shown in Listing 6.

Listing 6
object TableName {
  def -> (tableName:String) : metamodel.Class = {
    Class.clazz.tableName = tableName
    Class.clazz
  }
}
 
 

The metamodel element as the return type guarantees that the expressions can still be specified in any order by the user. In addition, the return type ensures, for example, that the table name can only be set in the context of a class. The use of the TableName object in the context of a property would result in a compiler error because the return types are incompatible.

What is striking is the name of the singleton method. In Scala, symbols like "+", "-" or ">" can be used in method names without a second thought. In other languages, including Java, such symbols are reserved for operators. Once again Scala behaves in an object-oriented manner: it has no operators in the usual sense. Expressions that look like operator applications, for example "1 + 1", turn out to be simple method calls. The expression "1 + 1" can alternatively be written as "1.+(1)", although the first variant is more readable. A method call can be written in dot notation or, as long as the method takes no more than one argument, in operator notation. Thanks to the operator notation we can express the table name of a persistent class with the new singleton object as follows:

TableName -> "USERS"

On our singleton object TableName we call the method "->". This call returns a Class instance with the table name set. So for setting table names, we are off the hook. Unfortunately, the same approach does not help us with the singleton object Name. The metamodel types Property, Id, ManyToOne and Set all have a name field that needs to be set. For the "->" method of the Name singleton we inevitably run into a conflict, because we would have to commit to one of the four types as the return type.
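The equivalence of operator notation and dot notation can be tried out with a small self-contained sketch. TableModel, TableNameSketch and OperatorDemo are hypothetical names used only here; the sketch keeps a single shared instance for brevity.

```scala
// Hypothetical stand-in for the metamodel class (name is an assumption).
class TableModel { var tableName: String = "" }

object TableNameSketch {
  val current = new TableModel

  // "->" is an ordinary method name; nothing about it is built in.
  def ->(name: String): TableModel = {
    current.tableName = name
    current
  }
}

object OperatorDemo {
  // Operator notation: receiver, method name, single argument.
  def infix: String = (TableNameSketch -> "USERS").tableName

  // Dot notation: exactly the same method call.
  def dotted: String = TableNameSketch.->("ITEMS").tableName

  // The same holds for Int; the receiver is parenthesized here only to
  // rule out any confusion with a floating-point literal.
  def sameSum: Boolean = (1 + 1) == (1).+(1)
}
```

Both infix and dotted end up in the same method body, which is why the DSL can offer the arrow syntax without any special language support.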

Currying

A naive approach to get to grips with this ambiguity would be to append additional parameters and overload the singleton methods. Instead of a single one, we define a "->" method for each metamodel type in the Name Singleton:

def -> (name:String, p:Property) : Property

def -> (name:String, id:Id) : Id
...

Although this solution appeases our compiler, the additional parameters make our DSL very awkward to use. To set a name we would always have to pass along the corresponding singleton field:

Name->("name", Property.property)

The additional and unnecessary parameters are also unfortunately again instances of our metamodel and pollute our DSL. To make matters worse, we must set the parameter list in parentheses, as parentheses are optional only for methods that have zero or one parameters.

To rescue our good work, we have to dig deep into our tool box again. Currying and implicit parameters come to the fore to help us escape from this mess. First, we focus on the use of currying, a strange-sounding technique from functional programming, named after Haskell Brooks Curry. Thanks to currying, in Scala it is possible to break up parameter lists into separate parameter lists.

def -> (name:String, p:Property) : Property

def -> (name:String) (p:Property) : Property

Both method signatures are valid alternatives in Scala. The second one has two separate parameter lists. Conceptually, currying turns such a method into a chain of two successive function applications: applying it to the first parameter list yields another function, which is then applied to the second parameter list. Within the method body we have the usual access to the parameters of all parameter lists. Thanks to currying, each parameter list now has exactly one parameter, so we have regained the operator notation without parentheses for the first list. The additional second parameter list continues to exist, however.
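A small sketch may make the difference between the two forms concrete. PropModel and CurrySketch, along with the method names setNameFlat, setName and nameIt, are assumptions introduced for this example only.

```scala
// Hypothetical stand-in for the metamodel class Property.
class PropModel { var name: String = "" }

object CurrySketch {
  // One parameter list with two parameters.
  def setNameFlat(name: String, p: PropModel): PropModel = { p.name = name; p }

  // Curried form: two parameter lists with one parameter each.
  def setName(name: String)(p: PropModel): PropModel = { p.name = name; p }

  // Supplying only the first list yields a function that still
  // expects the second argument.
  val nameIt: PropModel => PropModel = setName("birthDate")
}
```

Calling CurrySketch.setName("x")(new PropModel) supplies both lists at once, while nameIt shows the intermediate function that exists between the two applications.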

 

Implicit Parameters

The aim of the final step is to trim the singleton methods so that we can do without the unnecessary second parameter list. Scala (before version 2.8, which introduced default arguments) has no optional parameters, so the parameters of all parameter lists are mandatory for each method call. By using implicit parameters, we can relax this restriction a bit. With the keyword implicit, the last parameter list can be marked as implicit. An implicit parameter is characterized by the fact that it need not be supplied by the method caller. For a caller, implicit parameters act like optional parameters. We can therefore mark our excess second parameter list with the implicit keyword:

def -> (name:String) (implicit p:Property) : Property

As an alternative to calling a method with two parameter lists a method caller can now dispense entirely with the second parameter list. Thus, our DSL finally has the desired syntax to be able to do convenient persistence mapping, for example, setting a property name: 

Name -> "name"

In the above expression, the caller gives the first parameter explicitly and omits the second. The method nevertheless expects both parameters. So where does the additional implicit parameter come from? Like individual parameters, variables can be marked as implicit. When the caller leaves out an implicit parameter list, the compiler looks for a variable of the matching type that is marked as implicit and is in scope at the call site. For our omitted parameter of type Property there must therefore be an implicit variable of type Property in scope. And what variable is better suited than the existing Property field of the singleton object? With the keyword implicit we can mark this field as implicit as well:

implicit var property = new metamodel.Property

Both the Java compiler and the Scala compiler ensure that methods are always invoked with the expected parameters. If the parameters given by the caller do not match the method’s signature, the Java compiler aborts with a compiler error. The Scala compiler goes a step further and checks the implicit parameters. If a parameter marked as implicit is not given by the caller, the compiler fills it in with a suitable variable that is marked as implicit. If no implicit variable can be found for an implicit parameter, then the Scala compiler too admits defeat.
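The lookup described above can be sketched as follows. PropertyModel2, PropertyHolder, NameSketch and ImplicitDemo are hypothetical names, and the explicit import is an assumption about how the implicit field is brought into scope in this reduced setting.

```scala
// Hypothetical stand-in for the metamodel class Property.
class PropertyModel2 { var name: String = "" }

object PropertyHolder {
  // Marked implicit: the compiler may pass this value on its own.
  implicit var current: PropertyModel2 = new PropertyModel2
}

object NameSketch {
  // The second parameter list is implicit and may be omitted by callers.
  def ->(name: String)(implicit p: PropertyModel2): PropertyModel2 = {
    p.name = name
    p
  }
}

object ImplicitDemo {
  import PropertyHolder.current // brings the implicit value into scope

  // The compiler supplies PropertyHolder.current for the missing list.
  def implicitCall: PropertyModel2 = NameSketch -> "birthDate"

  // The implicit list can still be given explicitly.
  def explicitCall(p: PropertyModel2): PropertyModel2 = NameSketch.->("name")(p)
}
```

Removing the import makes implicitCall fail to compile, which corresponds to the case where the compiler finds no implicit variable for the parameter.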

Summary

After all the changes our DSL can be used - as shown in Listing 7.

Listing 7
HibernateMapping(
  Class(
    Type->classOf[User],
    TableName->"USERS",
    Property(
      Name->"birthDate",
      Type->classOf[Date],
      ...
    ),
    ...
  ),
  ...
)
 

As opposed to an XML or a Java DSL approach we achieve our aims with much less configuration code and less noise. Compared to XML, language based internal DSLs do not require additional marshalling or unmarshalling steps. In addition IDEs support us through code completion and direct compiling. Additional model validation, such as checking of mandatory fields, can be directly implemented in Scala. Validation rules can be implemented as Scala compiler plugins [9] so that they can be evaluated during compile time.
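A check for mandatory fields could look roughly like the sketch below. It runs at runtime rather than as a compiler plugin, and the names ClassSketch, MappingValidator and ValidationDemo, as well as the choice of mandatory fields, are assumptions for illustration.

```scala
// Hypothetical, reduced metamodel class; field names are assumptions.
class ClassSketch {
  var tableName: String = null
  var idName: String = null
}

object MappingValidator {
  // Returns the names of all mandatory fields that are still unset.
  def missingFields(c: ClassSketch): List[String] =
    List("tableName" -> c.tableName, "idName" -> c.idName)
      .collect { case (field, value) if value == null || value.isEmpty => field }
}

object ValidationDemo {
  // A class mapping where only the table name has been configured.
  def users: ClassSketch = {
    val c = new ClassSketch
    c.tableName = "USERS"
    c
  }
}
```

An empty result from missingFields means the mapping passes this particular check; any other result names the fields a DSL user still has to provide.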

Scala offers other great concepts, such as traits or functions (see [10]), that we could exploit in our DSL in order to gradually abstract from the underlying metamodel and offer the user an even simpler API. This article has hopefully demonstrated that Scala-based DSLs can wonderfully enrich complex software systems. There are, for example, efforts to use Scala-based DSLs for safety-critical configurations within the eHealth Framework [11]. Anyone who would like to learn more about the DSL created in this article can take a look at the source files here [12].

Links & Literature

[1] This article originally appeared in the German language publication Java Magazin published by Software & Support Verlag, Frankfurt am Main, Germany. Republished here with their permission.
[2] www.scala-lang.org/downloads
[3] Arno Haase: „Im Dienste der Allgemeinheit“, Java Magazin 07/2009
[4] Roman Roelofson, Heiko Seeberger: „OSGI on Scala – Scala Modules“, Java Magazin 08/2009
[5] camel.apache.org/scala-dsl.html
[6] Christian Bauer, Gavin King: „Java Persistence with Hibernate“, Chapter 3.1.2
[7] code.google.com/p/liquidform/
[8] Eric Evans: „Domain-Driven Design“, Chapter 14
[9] www.scala-lang.org/node/140
[10] Martin Odersky, Lex Spoon, Bill Venners: „Programming in Scala“
[11] idn.icw-global.com/solutions/ehealth-framework.html
[12] idn.icw-global.com/blogs/ehf-team-blog/blog-post/2009/11/03/do-it-yourself-dsls-in-scala.html

Download the Eclipse projects for the article.