|How To Create Pragmatic, Lightweight Languages|
6: Mapping: from the parse-tree to the Abstract Syntax Tree
Transforming the parse tree is a bit more complicated and requires working with the classes generated by ANTLR to represent the nodes of the parse tree. The rest of the chapter does exactly that, mapping each parse tree node to a Kotlin class. In other words, we treat the AST as a collection of Kotlin classes and interfaces.
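To make the idea concrete, here is a minimal sketch of what such an AST might look like for a tiny expression language; the node names are my own illustration, not the book's code:

```kotlin
// Illustrative AST node hierarchy, modelled as Kotlin data classes.
sealed class Expression
data class IntLit(val value: Int) : Expression()
data class Sum(val left: Expression, val right: Expression) : Expression()
data class VarReference(val name: String) : Expression()

// The mapping stage would walk the ANTLR parse tree and build these
// nodes; here we just construct one by hand to show the result's shape.
fun main() {
    val ast = Sum(IntLit(1), VarReference("x"))
    println(ast) // data classes give a readable toString for free
}
```

Because the nodes are data classes, equality, copying and printing come for free, which is exactly the kind of boilerplate reduction the review keeps pointing at.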
In addition, we also have to resolve symbols. By resolving symbols we create new links between references and their declarations. In a sense we transform our tree into a graph, because the links are no longer strictly hierarchical. This is expanded further in 7: Symbol resolution.
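A minimal sketch of what resolution adds, under names of my own invention rather than the book's: after resolution, each reference node carries a link back to its declaration, which is what turns the tree into a graph.

```kotlin
// Hypothetical sketch of symbol resolution: a reference node gains a
// mutable link to the declaration it refers to.
data class VarDeclaration(val name: String)

class VarReference(val name: String) {
    var referred: VarDeclaration? = null // filled in during resolution
}

// Resolve each reference against the declarations in scope; a null
// link afterwards signals an "unresolved symbol" error.
fun resolve(refs: List<VarReference>, decls: List<VarDeclaration>) {
    val byName = decls.associateBy { it.name }
    for (ref in refs) {
        ref.referred = byName[ref.name]
    }
}
```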
The following chapter, 8: Typesystem, is a very interesting one in that it takes a more theoretical approach in tackling the type system of a language, briefly cataloguing the most common types found in a language such as Java:
We also define our type system's rules in order to check for semantic errors such as:
In case you're not in sync with the background theory, just jump into the code examples which leverage these concepts. Everything will become clear after going through the code, which the author dissects, annotates and explains. This principle runs throughout the book, and Kotlin, with its clean syntax, surely plays its part in rendering things more accessible.
Of course the chapter wouldn't end without defining a type system for our toy languages.
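As an illustration of the kind of rule such a type system enforces, here is a toy slice of one in Kotlin; the names and the single rule shown are my own sketch, not the book's implementation:

```kotlin
// A toy type system: two primitive types and the rule that addition
// requires both operands to share the same type.
enum class Type { INT, STRING }

data class TypeError(val message: String)

// Returns null when the sum type-checks, or a semantic error otherwise.
fun checkSum(left: Type, right: Type): TypeError? =
    if (left == right) null
    else TypeError("cannot add $left and $right")
```

Real rules in the book are richer than this, but each one follows the same pattern: inspect the types of the operands and either accept or report a semantic error.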
9: Validation, which has sections for both MiniCalc and StaMac, brings Part I to an end.
Part II: Compiling
At this point you should be wondering, "OK, I've done both syntactic and semantic validation, what's next?" Doing something with the information obtained of course! This could be interpreting the code to execute something in response; compiling it to native code or bytecode; or even generating something else like a graph or some code for another language.
But first, in 10: Build an interpreter we look at the components required which, surprisingly, do not include data structures or garbage collectors; we can just reuse the ones from the host language:
Do you need a map in your language and you are writing the interpreter in Java? Just use a Java map in your interpreter. Same applies to lists, hash tables and so on. This significantly reduces our work.
That is the modern way of building compilers: reusing existing, well-crafted components to lessen the workload.
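The "reuse the host language" point can be sketched in a few lines; the class and method names here are my own, not the book's interpreter:

```kotlin
// The interpreter's variable store for our toy language is just a
// Kotlin MutableMap -- no need to implement a hash table ourselves.
class Interpreter {
    private val memory: MutableMap<String, Int> = HashMap()

    fun assign(name: String, value: Int) {
        memory[name] = value
    }

    fun lookup(name: String): Int =
        memory[name] ?: error("undefined variable: $name")
}
```

The same shortcut applies to lists, strings and garbage collection: the JVM already provides them, so the interpreter only has to implement what is genuinely specific to the new language.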
The interpreter code, written in Kotlin, follows, but don't imagine thousands of lines of code; Kotlin makes it a breeze since it reduces boilerplate. As for the interpreter vs compiler debate, the book has the following to say, which also puts a cap on the chapter:
In many cases starting by writing an interpreter is just easier compared to writing a compiler. I would suggest going through this path at least while you are designing the language and it is not yet stable. Have fun writing interpreters and when you are ready let’s move to the next chapter and explore an alternative: generating bytecode.
So bytecode it is, in 11: Generate JVM bytecode. But first what are the advantages of compiling to JVM bytecode?
we can leverage all the JVM libraries, we can run the same code on all platforms supported by the JVM, but the user must have the JVM installed.
In order to do that we have to build a compiler, and even before that we have to familiarize ourselves with the inner workings of the Java Virtual Machine. Therefore we first examine how Java code ends up as bytecode, looking at how classes, mathematical operations, type conversions, operations on objects, method invocations and so on are represented.
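For a taste of what that representation looks like, consider a two-argument addition; for the equivalent static Java method, javap typically reports the stack-machine instructions shown in the comment (this example is mine, not taken from the book):

```kotlin
// For the equivalent static Java method, javap shows roughly:
//   iload_0   // push first argument onto the operand stack
//   iload_1   // push second argument
//   iadd      // pop both, push their sum
//   ireturn   // return the top of the stack
fun add(a: Int, b: Int): Int = a + b
```

Everything happens on the operand stack: the JVM is a stack machine, which is why even a one-line expression expands into a small sequence of push/pop instructions.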
As for building the compiler, another component is called into play: ASM, a library which can produce class files and bytecode. Again, advanced constructs such as generics, arrays, inner classes, invokedynamic and control-flow statements have been left out. Bear in mind that, as happens in almost every chapter, the book doesn't cover advanced language elements, not because it can't but because it wants to keep things simple, acting rather as a gateway to further research. As such, this chapter presents the most common, useful concepts you can leverage to write compilers for the JVM.
Then comes its how-to-compile-to-native-code counterpart, 12: Generate an executable using LLVM. Considering the advantages of compiling to native code:
we do not need any software installed to run on the machine of the user, we can reuse all native libraries but we need to generate different executables for different platforms.
Discussing all the features of LLVM is beyond the scope of the chapter, but for our purposes we need to know that LLVM is able to work with an intermediate representation (IR) of low-level code. You can think of the IR as a sort of portable assembler and in fact the aim is to leverage it through KLLVM, a library to generate LLVM IR in Kotlin.
So after looking into bytecode, we now must examine IR, and a good way to do that is to write simple examples in C and use the clang compiler to generate the corresponding IR code, which we compare against the C listings to figure out what's happening.
This chapter thus gets very technical, and if, like me, you have no previous experience of how LLVM and IR work, you'll find it difficult to follow, even at a basic level. There's a lot of voodoo going on, probably justified since compiling to native code was always considered non-trivial, as other languages like Perl have discovered through unsuccessful experiments such as the perlcc tool - although RPerl looks like making the jump to C, but only for a subset of the Perl language. It makes you feel grateful for the existence of virtual machines!
The final chapters are 13: Syntax highlighting and 14: Autocomplete.
In the end, what this book has managed to do is untangle a skein which had the notions of lexing, parsing, transformation, interpretation and compilation tied tightly together, breaking them apart into clean, well-defined parts which, in keeping with the Unix philosophy, do one thing and do it well, working together to achieve something bigger and more complex.
Further, serving those parts in bite-size chunks renders them easier to absorb as the basics of post-modern compiler design are laid out. Lightweight but precise, the book won't turn you into a language engineer as such, but at the very least it will pave the way to finding that path yourself by removing the barrier and clearing the mist surrounding the art of writing compilers.
For those of you who have decided to build your own language or tool, and thus have the necessary impetus, I can recommend this as THE entry-level guide to lexing, parsing and compiling, with no prior experience required.
|Last Updated ( Tuesday, 08 August 2017 )|