Peachpie Open Source PHP to .NET Compiler

Written by Nikos Vaggalis

Monday, 24 October 2016


Peachpie is a new open source compiler from PHP to .NET that aims for full PHP 7 compatibility. Looking at it gave us the opportunity to revisit the state of dynamic language interoperability on the .NET platform, and to consider the practical advantages that arise from this atypical symbiosis of dynamic and static languages under the same roof.

We have always followed advances in programming language interoperability with keen interest, especially from the .NET/CLR perspective. As such, we were there when the big bang happened: the period when the introduction of the DLR (Dynamic Language Runtime) launched the very first attempts to bridge the dynamic languages and the statically oriented CLR.

An early and notable attempt to demystify the inner workings of the DLR was the book Pro DLR in .NET 4.0, which we reviewed back in 2010.

The following excerpt from that review captures the essence of the DLR:

A runtime that sits atop the CLR and hosts dynamic languages.
It makes a new language, be it a dynamic, application or domain-specific one, much easier to build, since you can use ready-made parts and leverage existing functionality; for example, instead of implementing a GC you plug into the CLR's GC.

Furthermore,

Chapter 2 examines DLR expressions in depth. These are the common denominator that ties together all the languages on the DLR platform. To sum it up in a few lines: each language has its own parser, which parses the source code and produces an AST (an abstract syntax tree rather than an optree, since an AST is more flexible). For example, IronPython produces an AST and IronRuby produces its own AST as well. Both ASTs are then transformed into a DLR AST, which essentially is made of Expressions, so that everybody speaks the same language. The tree itself is composed of nodes, which are objects that represent code. The idea is that representing code as a graph instead of bytecode/IL lets you walk it, manipulate and extend it (using the Visitor pattern), apply semantic transformations to it (as the example with the custom Linq query provider in chapter 8 demonstrates) and apply optimizations. These expression trees can be interpreted directly by the DLR runtime or compiled into bytecode/IL for execution by the CLR.

Then you can mix and match objects from different languages. For example, you can call an IronPython library from C# and vice versa, or even call the .NET graphical library from within IronRuby. I think that is the groundbreaking aspect here: connecting not only dynamic languages with dynamic languages, but also dynamic languages with their static counterparts.

For example, in Chapter 4 we were introduced to ways of using dynamic Ruby objects from within Python code, as well as to the ways that enable a static language like C# to interoperate seamlessly with a dynamic language like Ruby.
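The excerpt describes a pipeline that runs from parsing, to a tree, to transformation, to compilation, and that pipeline is not unique to the DLR. The following sketch is a rough analogue only. It uses Python's standard `ast` module, not the DLR's Expression API. It parses source into an AST and rewrites it with a Visitor-style transformer that folds constant arithmetic. It then compiles the transformed tree to bytecode and runs it. `ConstantFolder` and `scale` are hypothetical names chosen for illustration, and `ast.unparse` needs Python 3.9 or later.

```python
import ast
import operator


class ConstantFolder(ast.NodeTransformer):
    """Hypothetical visitor: folds binary operations whose operands are both constants."""

    FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

    def visit_BinOp(self, node):
        # Transform the children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        fold = self.FOLDABLE.get(type(node.op))
        if fold and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            folded = ast.Constant(value=fold(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


source = "def scale(x):\n    return 2 * 3 * x\n"

tree = ast.parse(source)                                        # 1. source -> AST
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))  # 2. AST -> transformed AST
code = compile(tree, filename="<folded>", mode="exec")          # 3. AST -> bytecode
namespace = {}
exec(code, namespace)                                           # 4. run the module body
scale = namespace["scale"]

print(ast.unparse(tree))  # the folded source: 2 * 3 * x has become 6 * x
print(scale(7))           # 42
```

CPython's own compiler already performs this particular folding. The sketch is meant to show the shape of the pipeline, not the optimization itself.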

Soon enough, the first fully fledged DLR incarnations of the popular dynamic languages, Python with IronPython and Ruby with IronRuby, began to appear on the .NET platform. These were not the only ones, however; another porting effort was that of JavaScript, in its IronJS alter ego. The interview with its creator, Fredrik Holmström, was particularly insightful. It went through the language's core concepts and the DLR's infrastructure, and answered questions that remain pertinent to this day, such as why on earth anyone would go to the trouble of implementing a dynamic language on the .NET platform:

FH: I think the main advantage is just the DLR. I was investigating an implementation on top of the JVM at first; this was back in early 2010, if I remember correctly. The DLR gives you a lot of really great stuff for free.

NV: Like language interoperability? What else?

FH: That is one thing for sure, but there is also just how solid the DLR code is, and how big a piece of the technology it solves for you (emitting the IL).

NV: The DLR has a reputation for making it easy to create a new language implementation. Is that meant in the context of the language's implementer borrowing core facilities out of the box, such as the CLR's GC or JIT?

FH: The DLR lets you turn your AST into what they call an Expression Tree, which is a high-level, object-oriented version of IL. In reality, expression trees are syntax trees, but they are compiled to IL by the DLR. And while emitting IL isn't a complicated task, it's time-consuming to write code that does it. So getting that for free is a big thing.

NV: After emitting IL, is it compiled or interpreted by the DLR at runtime?

FH: Well, that's pretty much one process: you hand the DLR your expression tree and it gives you back a delegate which you can invoke. The delegate itself can either be interpreted by the DLR or compiled by the JIT; it's something you specify when you compile it.
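Holmström makes the point that a single expression tree can be either interpreted or turned into an invokable delegate. A toy evaluator can illustrate the idea. The sketch below is a hypothetical Python stand-in, not the DLR, and `Const`, `Param`, `Add`, `Mul`, `interpret` and `compile_tree` are invented names. `interpret` walks the tree on every call. `compile_tree` walks it only once and returns a callable built from nested closures, which plays the role of the delegate. The real DLR emits IL for the JIT; closures are used here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Const:
    value: float


@dataclass
class Param:
    name: str


@dataclass
class Add:
    left: "Node"
    right: "Node"


@dataclass
class Mul:
    left: "Node"
    right: "Node"


Node = Union[Const, Param, Add, Mul]


def interpret(node: Node, env: dict) -> float:
    """Interpreted mode: re-walk the whole tree on every evaluation."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Param):
        return env[node.name]
    if isinstance(node, Add):
        return interpret(node.left, env) + interpret(node.right, env)
    if isinstance(node, Mul):
        return interpret(node.left, env) * interpret(node.right, env)
    raise TypeError(f"unknown node {node!r}")


def compile_tree(node: Node) -> Callable[[dict], float]:
    """Compiled mode: walk the tree once and return a reusable callable."""
    if isinstance(node, Const):
        value = node.value
        return lambda env: value
    if isinstance(node, Param):
        name = node.name
        return lambda env: env[name]
    if isinstance(node, (Add, Mul)):
        left, right = compile_tree(node.left), compile_tree(node.right)
        if isinstance(node, Add):
            return lambda env: left(env) + right(env)
        return lambda env: left(env) * right(env)
    raise TypeError(f"unknown node {node!r}")


# x * x + 1, built by hand as a tree
tree = Add(Mul(Param("x"), Param("x")), Const(1))
delegate = compile_tree(tree)
print(interpret(tree, {"x": 3}), delegate({"x": 3}))  # 10 10
```

Both modes give the same answer. The difference lies in when the tree walk happens, and that is exactly the choice Holmström says you make at compile time.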

Another case, not DLR-based but closely related to porting a dynamic language to an incompatible virtual machine, was the three-part interview with Jonathan Worthington on Perl 6 for the JVM. It emphatically highlighted the new possibilities arising from such a move:

NV: What about the dynamic language features injected into C#? Do they really make a difference?

JW: Are you thinking of things like the ‘dynamic’ keyword in C#? If so, I can only say that my experience “in the wild” is that most developers don’t use that. I’m not sure we should be too quick to say, “Oh, it’s because C# programmers refuse to consider dynamic typing”, however. Many people I teach are barely using Linq or lambda expressions, which showed up in C# a release before “dynamic”.

The “Ooh, that’s actually useful” moment people tend to have with dynamic is when I show them how, with the appropriate library, you can use dynamic typing in C# to dig into JSON documents, just like you would do in JavaScript. “Oh, we don’t have to build a load of types to deserialize this stuff into!” So, mostly it’s showing people how things are actually useful.
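In C# this depends on a suitable JSON library. The underlying idea is to read a JSON document through member access without first declaring classes for it. That idea can be sketched in Python with only the standard library. The `object_hook` argument of `json.loads` turns each JSON object into a `types.SimpleNamespace`, so that its fields become attributes. The payload is made up for the example.

```python
import json
from types import SimpleNamespace

payload = '{"user": {"name": "Ada", "langs": ["C#", "PHP"]}, "active": true}'

# Every JSON object becomes an attribute bag; no classes are declared up front.
doc = json.loads(payload, object_hook=lambda fields: SimpleNamespace(**fields))

print(doc.user.name)       # Ada
print(doc.user.langs[-1])  # PHP
print(doc.active)          # True
```

As with C#'s `dynamic`, a misspelled or missing field is only caught at the point of access, where it raises `AttributeError`, rather than at deserialization time.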

NV: And what can a dynamic language like Perl offer a C# programmer and vice versa?

JW: “The limits of my language mean the limits of my world.” – Wittgenstein.
Now, I’m pretty sure he was talking about natural languages, but I think it goes for programming languages too. And, just like a natural language has a community and a value system around it, so do programming languages.
Perl and C# may both be multi-paradigm languages, but they do feel rather different to write. I found myself approaching problems in different ways in each of them. But I’m quite sure working in both has influenced the way I write each of them.

I think in the Perl to C# direction, you bring an understanding that not everything needs to be expressed as methods on a class, fitting into some type hierarchy. It’s OK for things to be “just a subroutine”, or “just a function”, or “just a quick thing you do with a regex” – because that’s exactly what some problems need.

Going the other way, I’d say you perhaps come with a little more appreciation of where adding type annotations into programs can help. I’m excited about what we’re doing in Perl 6 in this regard with gradual typing. And, if you get into C#’s Linq stuff really well, you start to see more operations as list processing, which Perl is rather good at. Perl 6 especially so.

And that’s a win, because chains of list operations are easy to read – the next thing’s input is the last thing’s output – as well as nice and functionalish.
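Worthington describes a style in which "the next thing's input is the last thing's output". A small Python sketch can show it. This is neither Perl 6 nor Linq. `pipe` is a hypothetical helper that feeds a value through a sequence of stages, so the chain reads top to bottom in the order the operations happen.

```python
from functools import reduce


def pipe(value, *stages):
    """Hypothetical helper: feed value through each stage; each output is the next input."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


langs = ["Perl", "C#", "PHP", "Python", "Ruby", "JavaScript"]

result = pipe(
    langs,
    lambda xs: [x for x in xs if x.startswith("P")],  # keep names starting with "P"
    lambda xs: [x.lower() for x in xs],               # normalise case
    sorted,                                           # order alphabetically
)
print(result)  # ['perl', 'php', 'python']
```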

It also touched on the limitations and hindrances:

NV: The JVM has been primarily designed with statically typed languages in mind. The same goes for the CLR, and that is why the DLR (built on top of the CLR) came into existence. Have you at some point considered the DLR (maybe combined with Mono instead of the CLR) or JVM’s Dynalink, both of which admittedly have a good Meta Object protocol infrastructure, as a potential backend to Rakudo?

JW: One interesting thing to note about Perl 6 is that it’s a gradually typed language, which means the considerations are a little different from if it was dynamically typed. In that sense, VMs which explicitly seek to do both static and dynamic typing well are especially interesting for Perl 6.

I can’t speak too well to the DLR, but I do know that Niecza, the Perl 6 on CLR implementation, went the way of not using it for a range of reasons. By contrast, the JVM’s invokedynamic instruction has been rather interesting from a Perl 6 implementation point of view.

Dynalink is certainly of interest too, in so far as it seems to provide an interesting path to enabling calling Perl 6 code from Java. I’d rather reuse an existing solution than reinvent that wheel. I need to dig into it more deeply to be really sure.

To discover why the DLR was not a good fit for Niecza, you will have to read our interview with the Niecza man himself, Stefan O'Rear, in Niecza - Perl 6 Implemented in .NET.

Things have changed and evolved rapidly since then, the catalyst being the introduction of Roslyn, the tool behind Peachpie's birth. As to what Roslyn is, what better way to get an authoritative answer than from someone who worked on the Roslyn team, the renowned C# guru Eric Lippert? The opportunity came in the form of an interview he gave us back in 2014:

NV: Roslyn's official definition states that it is a "project to fully rewrite the Visual Basic and C# compilers and language services in their own respective managed code language; Visual Basic is being rewritten in Visual Basic and C# is being rewritten in C#."
How is C# being rewritten in C#?

EL: When I was at Microsoft I saw so many people write their own little C# parsers or IDEs or little mini compilers or whatever, for their own purposes. That's very difficult, it’s time-consuming, it's expensive, and it's almost impossible to do right. Roslyn changes all that by giving everyone a library of analysis tools for C# and VB which is correct, very fast, and designed specifically to make tool builders' lives better. I am very excited that it is almost done! I worked on it for many years and can't wait to get my hands on the release version.

C# and VB are both good languages in which to write a compiler.
It seems somewhat magical; how can you write a compiler for a language in that language? Don't you have to start somewhere?
And of course the answer is yes: you do have to start somewhere.

C# 1.0 through 5.0 compilers were written in C++. For quite a long time -- over a year -- we wrote the Roslyn C# compiler in C# and compiled it with C# 4.0. (C# 5.0 was being developed in parallel by a sister team.) The day that we could compile the Roslyn compiler, and then turn right around and compile it again using the compiler we'd just built, that was a happy day.

Microsoft strongly believes in a technique called "eat your own dog food". That is, the Outlook team uses yesterday's version of Outlook to read today's mail, and so on. The Roslyn compiler and IDE teams have been dog-fooding for a long time now. Today's build of Roslyn is developed using yesterday's version of the IDE and compiled with yesterday's compiler. You find bugs really fast that way!

So let me start by taking a step back and reiterating what Roslyn is, and is not. Roslyn is a class library usable from C#, VB or other managed languages. Its purpose is to enable analysis of C# and VB code. The plan is for future versions of the C# and VB compilers and IDEs in Visual Studio to themselves use Roslyn.

So typical tasks you could perform with Roslyn would be things like:
"Find all usages of a particular method in this source code"
"Take this source code and give me the lexical and grammatical analysis"
"Tell me all the places this variable is written to inside this block"

Now, let me quickly say what it is not. It is not a mechanism for customers to themselves extend the C# or VB languages; it is a mechanism for analyzing the existing languages. Roslyn will make it easier for Microsoft to extend the C# and VB languages, because its architecture has been designed with that in mind. But it was not designed as an extensibility service for the language itself.
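Roslyn answers questions like the ones Lippert lists for C# and VB through its own APIs. The sketch below is a loose illustration that does not use Roslyn. It applies Python's standard `ast` module to the first and third of those queries. It finds every call to a function named `price` and every line where a variable named `total` is assigned. The sample program, and the names `price`, `log`, `tax`, `items` and `total`, are made up for the example.

```python
import ast

# Made-up sample program to analyze (it is parsed, never executed).
source = """
total = 0
for item in items:
    total = total + price(item)
    log(total)
total += tax(total)
"""

tree = ast.parse(source)

# "Find all usages of a particular method": every direct call to price().
calls = sorted(
    node.lineno for node in ast.walk(tree)
    if isinstance(node, ast.Call)
    and isinstance(node.func, ast.Name)
    and node.func.id == "price"
)

# "Tell me all the places this variable is written to": Name nodes in a Store context.
writes = sorted(
    node.lineno for node in ast.walk(tree)
    if isinstance(node, ast.Name) and node.id == "total" and isinstance(node.ctx, ast.Store)
)

print("price() called on lines", calls)  # [4]
print("total written on lines", writes)  # [2, 4, 6]
```

Unlike a compiler's semantic model, this analysis is purely syntactic. It matches names textually and knows nothing about scopes, so an unrelated `total` in another function would also be reported.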

You mentioned a REPL. That is a Read-Eval-Print Loop, which is the classic way you interface with languages like Scheme. Since the Roslyn team was going to be re-architecting the compiler anyway they put in some features that would make it easier to develop REPL-like functionality in Visual Studio. Having left the team, I don't know what the status is of that particular feature, so I probably ought not to comment on it further.

One of the principal scenarios that Roslyn was designed for is to make it much easier for third parties to develop refactorings. You've probably seen in Visual Studio that there is a refactoring menu and you can do things like "extract this code to a method" and so on. Any of those refactorings, and a lot more, could be built using Roslyn.

As for whether there will be an eval-like facility for spitting out fresh code at runtime, as there is in JavaScript, the answer is sort of.
I worked on JavaScript in the late 1990s, including the JScript.NET language that never really went anywhere, so I have no small experience in building implementations of JS "eval". It is very hard. JavaScript is a very dynamic language; you can do things like introduce new local variables in "eval" code.

There is to my knowledge no plan for that sort of very dynamic feature in C#. However, there are things you can do to solve the simpler problem of generating fresh code at runtime. The CLR of course already has Reflection Emit. At a higher level, C# 3.0 added expression trees. Expression trees allow you to build a tree representing a C# or VB expression at runtime, and then compile that expression into a little method. The IL is generated for you automatically.
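C# expression trees live in `System.Linq.Expressions` and are assembled from factory methods such as `Expression.Add` and `Expression.Lambda`. As a language-neutral analogue, the Python sketch below builds an expression tree from data at runtime and compiles it into a small function, without writing or parsing any source text. `make_polynomial` is a hypothetical helper. It encodes the polynomial in Horner form and expects a non-empty coefficient list, where `coeffs[i]` is the coefficient of x to the power i.

```python
import ast


def make_polynomial(coeffs):
    """Hypothetical helper: build `lambda x: c0 + c1*x + ...` as an AST and compile it."""

    def x():
        # A fresh "load x" node for each use in the tree.
        return ast.Name(id="x", ctx=ast.Load())

    # Horner's rule: ((cn * x + c(n-1)) * x + ...) * x + c0
    body = ast.Constant(value=coeffs[-1])
    for c in reversed(coeffs[:-1]):
        product = ast.BinOp(left=body, op=ast.Mult(), right=x())
        body = ast.BinOp(left=product, op=ast.Add(), right=ast.Constant(value=c))

    params = ast.arguments(posonlyargs=[], args=[ast.arg(arg="x")], vararg=None,
                           kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[])
    tree = ast.Expression(body=ast.Lambda(args=params, body=body))
    code = compile(ast.fix_missing_locations(tree), "<polynomial>", "eval")
    return eval(code)  # evaluating the lambda expression yields a function object


square_plus_one = make_polynomial([1, 0, 1])  # 1 + 0*x + 1*x**2
print(square_plus_one(3))  # 10
```

As with C# expression trees, the bytecode is generated for you. The caller only decides which nodes to assemble.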

If you are analysing source code with Roslyn, then there is, I believe, a facility for asking Roslyn "suppose I inserted this source code at this point in this program -- how would you analyze the new code?" And if at runtime you started up Roslyn and said "here's a bunch of source code, can you give me a compiled assembly?" then of course Roslyn could do that. If someone wanted to build a little expression evaluator that used Roslyn as a lightweight code generator, I think that would be possible, but I've never tried it. It seems like a good experiment. Maybe I'll try to do that.
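Lippert suggests an experiment: an expression evaluator that uses a compiler as a lightweight code generator. It can be approximated in Python, whose built-in `compile` stands in here for the role Roslyn would play for C#. In this hypothetical sketch, `compile_expression` parses an arithmetic expression once, rejects any syntax outside a small whitelist and compiles it. It then returns a function that can be evaluated repeatedly with different variable values. The whitelist is a simple guard for the example, not a hardened sandbox.

```python
import ast

# Node types an arithmetic expression may contain (an assumption for this sketch).
ALLOWED = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
           ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub)


def compile_expression(text):
    """Hypothetical helper: parse and compile once, return a reusable evaluator."""
    tree = ast.parse(text, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    code = compile(tree, "<expr>", "eval")

    def evaluate(**variables):
        # Empty __builtins__ so only the supplied variables are visible by name.
        return eval(code, {"__builtins__": {}}, variables)

    return evaluate


f = compile_expression("(a + b) * 2 - c ** 2")
print(f(a=1, b=2, c=3))  # -3
print(f(a=5, b=5, c=0))  # 20
```

The parsing and code generation happen once, in `compile_expression`. Each call to the returned function only runs the already-compiled code.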
