Authors: William D Back, Nicholas Goodman, Julian Hyde
Aimed at: Analysts and developers
Reviewed by: Kay Ewbank
Mondrian is an open-source data analysis tool. You can use it to create reports, or to set up interactive systems that let business users carry out their own analyses.
More precisely, Mondrian is an open-source OLAP engine that works with databases you can reach through JDBC (Java Database Connectivity). It also needs a server to host it, and the book largely assumes you’ll be using Pentaho, an open-source business analytics suite. You’ll get the most from the book if you’re using this combination, though quite a lot of the material is still relevant if you’re not using Pentaho.
One point to note is that the book was written with Mondrian 4.0 (pre-release) and Pentaho 4.8. As of now (April 2014), Mondrian 4.0 is still pre-release, so you’ll need to check out the Lagunitas branch from GitHub and build it (https://github.com/pentaho/mondrian?source=cc). If you haven’t got Mondrian 4, some of the features described, such as Measure Groups, aren’t available.
The authors start the book with an introduction to the overall topic of business analytics, and why a tool like Mondrian is useful. They then go on to give a high-level overview of what Mondrian is and how it works. The third chapter is another general introductory chapter, this time to star schemas and data marts. It also covers how to organize data so Mondrian can make best use of it. All three chapters are well written and seem to hit the right level of detail and explanation.
Having set the scene, the next four chapters look in more depth at Mondrian and how to use it. There’s a chapter on the Mondrian schema that introduces the database used in the rest of the book – a custom version of the well-known AdventureWorks database, familiar to anyone who’s encountered Microsoft’s BI products. A follow-on chapter looks at advanced schema features such as parent-child hierarchies and hanger dimensions, and how these can be used to model complex data. These two chapters go to the heart of multidimensional modelling, explaining XML, cubes and schemas in a very understandable way. Roles and security get a chapter of their own, and getting the best performance out of Mondrian is also introduced at this point in the book, with coverage of aggregate tables and in-memory caching.
By Chapter 9 the authors have moved on to using Mondrian with Pentaho, and how to use Mondrian as the data source for analytics and dashboards. The authors also show how to use Mondrian with the Community Dashboard Framework.
There’s just one chapter on developing with Mondrian, but it’s a good one with some detailed explanations and sample code. The olap4j and xmla4js libraries are both covered, and you’re shown how to configure Mondrian as an XMLA web service, how to call XMLA services with Ajax, and how to call Mondrian from a Java application.
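To give a flavour of what calling an XMLA service involves, here is a minimal sketch in Java. It is not taken from the book. It only builds the SOAP body of an XMLA Execute request that wraps an MDX query, and it doesn't send anything. The class name, the MDX query, and the DataSourceInfo and Catalog values are all hypothetical placeholders; the values a real Mondrian server expects depend on how it has been configured.

```java
// A sketch only: builds the SOAP envelope for an XMLA "Execute" call.
// The endpoint, data source and catalog names used in main() are assumptions.
final class XmlaExecuteSketch {

    // Value of the SOAPAction HTTP header defined by XMLA for Execute calls.
    static final String SOAP_ACTION =
            "\"urn:schemas-microsoft-com:xml-analysis:Execute\"";

    // Escape the characters that would break XML element content.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Wrap an MDX statement in an XMLA Execute request.
    static String buildExecuteRequest(String mdx, String dataSourceInfo, String catalog) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">\n"
             + "  <soap:Body>\n"
             + "    <Execute xmlns=\"urn:schemas-microsoft-com:xml-analysis\">\n"
             + "      <Command>\n"
             + "        <Statement>" + escapeXml(mdx) + "</Statement>\n"
             + "      </Command>\n"
             + "      <Properties>\n"
             + "        <PropertyList>\n"
             + "          <DataSourceInfo>" + escapeXml(dataSourceInfo) + "</DataSourceInfo>\n"
             + "          <Catalog>" + escapeXml(catalog) + "</Catalog>\n"
             + "          <Format>Multidimensional</Format>\n"
             + "        </PropertyList>\n"
             + "      </Properties>\n"
             + "    </Execute>\n"
             + "  </soap:Body>\n"
             + "</soap:Envelope>\n";
    }

    public static void main(String[] args) {
        // Hypothetical query and names, for illustration only.
        String mdx = "SELECT {[Measures].[Sales]} ON COLUMNS FROM [Sales]";
        String request = buildExecuteRequest(mdx, "ExampleDataSource", "ExampleCatalog");
        System.out.println("SOAPAction: " + SOAP_ACTION);
        System.out.println(request);
    }
}
```

In practice you would POST this body, with the SOAPAction header, to wherever the XMLA servlet is deployed, or more likely let a library such as olap4j or xmla4js build and send the request for you.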
The final chapter, titled Advanced analytics, gallops quickly through MDX, what-if analysis, R, and ‘big data’ – Hadoop, Hive, and NoSQL. I got the impression this section was added because someone who doesn’t know much about Mondrian said the book ought to cover big data!
The book closes with a couple of useful appendices on installing and running Mondrian, online resources, and schema shortcuts.
I enjoyed this book and think it gives a good introduction to Mondrian, why it’s useful and how to use it. I liked the way the authors tied everything together as a story about a company that keeps asking for changes and more reports and analysis; it kept my interest going and is (in my experience) quite realistic.