Lately I've been working on the client-side developing a mult-international Grails application. As this application is being used from several different parts of the world, an important requirement has been to support MySQL's collations for the most important SQL queries.
MySQL Collation Support
In MySQL there is a distinction between the character set and its collations. According to the documentation,
a character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set
Therefore collations can be used to implement language specific sorting, which has been exactly the requirement in my case. The MySQL documentation gives good examples for the impacts of different collations .
For example, the following data set
SELECT name FROM <data_set> ORDER BY name COLLATE <collation_name>;
gives the following results for the german1 and german2 collations for the latin1 character set:
Collations are not only useful for German, but also East-European languages where sorting rules are sometimes totally different than those found in germanic languages.
Collation Support for GORM
Before adding collation support to HQL we have to be aware of a huge drawback: collations are a MySQL specific feature. By introducing HQL support for collations, we introduce a large binding to MySQL, including collation-aware integration tests. AFAIK there is no built-in query-based support for collations in H2, so we have to stick to MySQL-backed integration tests.
Let's start our journey to adding collations to HQL. The first step is to find the right place to add this kind of functionality. My first thought, (I'll spoilt it, not proven to be a good one) was to implement a custom org.hibernate.Interceptor that modifies the SQL query right before its execution in the onPrepareStatement method. I very quickly reached the point where I was missing too much meta-data about the SQL query that needed collation support. In a nutshell, it turned out to be too complicated to add automatic collation support in a custom Hibernate interceptor, so I decided to find a way to somehow introduce a "collation function" to HQL.
Whenever adding custom functions to HQL the org.hibernate.dialect.Dialect class comes to rescue. There are various dialect implementations MySQL5InnoDBDialect, H2Dialect, DB2Dialect are only some among them. When having a look at the MySQLDialect implementation we can see the Dialect class is being used for registering MySQL specific built-in database functions and types. In addition, language specific features such as paging support, registering var char types for various lengths etc. are made by the dialect implementation.
In order to support the manual specification of a collation in HQL I added a custom MySQL5InnoDBDialect descendant class: CollationAwareMySQL5InnoDBDialect. The collation aware dialect does only a handful of things, but the most important one is: it registers a collate function object of type org.hibernate.dialect.function.SQLFunction:
The most important method is the render method. It is called whenever the function is being detected and a MySQL raw SQL query replacement String is needed. As you can see above, the collate function needs to be called with a single argument: the column/property that needs the COLLATE statement in the resulting SQL query.
Extending the dialect with a custom function turned out to be very flexible. It not only supports a manual way to specify when and where to use a collation (it can even be used in every place where collate is valid in terms of the MySQL query syntax; SELECT, ORDER BY WHERE) buts fits naturally into the HQL syntax and supports several "enscapsulated" function calls:
from Person where collate(lower(name)) like :name
There is still a missing piece: how to register the mapping between a language and the chosen collation via the Grails configuration?
The place to go seemed to be the GrailsAnnotationConfiguration. This is the code where setting up the Hibernate session factory (with annotation support, hence the name) happens on startup. A GrailsAnnotatioNConfiguration descendant class can be configured via the data-source configuration:
In the custom configuration class the collate function is registered:
The most important part in the code above is the buildSessionFactory method. After Grails has configured the Hibernate session factory, the registerCollationFunctions method from the custom Dialect is called.
There is a little detail that wasn't talked about explicitly until now. How can the mapping from the configuration between a language and it's collation be specified for a particular HQL at runtime? In fact the point is how to tell the SQLFunction the collation string that should be used for a specific query?
This is where the registerCollationFunctions comes into play. As its name implies, it register more than one collate function. In fact, it register a collation function for every collation that has been specified in the languageIdCollationMapping. The trick is, to provide the most important finder methods via a custom meta-class enhancer, the CollationSupportEnhancer. This class adds various dynamic methods to our domain classes, all with a language id parameter used to determine the collation to be used. Another little trick is applied in the prepareQuery method. For the given language id, the dynamic method gets the collation to be used and replaces the collate HQL function call with a function name for the specific collation:
The dynamic methods added by this class are the ones that can effectively use the collate function in the given HQL queries by specifying the language id besides the HQL query and parameters:
As can be seen in the CollationSupportEnhancer code above, the enhancer modifies the HQL query a bit: it replaces the collate function name with the collation name. The collate function replacement does also solve a not directly visible issue: Hibernate's SQL caching with the HQL query plan.
The query plan acts as internal cache for HQL queries. If we would have built the solution on using the collate function and selecting the right collation string in the SQLFunction we would have been running into troubles with the query plan as the same HQL statement results in potentially different SQL statements. As a matter of fact, this would be leading to having the first SQL query that was generated from our HQL query to be "bound" to that specific HQL query, which isn't what we want, as there could be multiple SQL statements which are bound to our HQL query.
Now it might be clear why the dialect method has been called registerCollationFunctions. It registers a function for every collation that is configured in the data-source configuration:
The code above shows how multiple SQL functions are registered. This way the correct SQLFunction object is being used to render the desired collation string.
There are still some missing pieces like adding a faked collation support to the H2 dialect, to let collate HQL queries run in H2-based integration tests. But the article should have shown the most important parts for adding collation support to HQL queries.
There are various places in Hibernate and Grails that can be leveraged to add a collate HQL function and dynamic GORM finder methods that do support it.
Adding MySQL collation support to HQL is not a trivial thing and breaks compatibility to other databases. In my case, the project has been committed to MySQL and there are more parts of the application that directly use MySQL-only features, so this wasn't a big deal.