Eric Lippert's name is synonymous with C#. Having been Principal Developer at Microsoft on the C# compiler team and a member of the C# language design team, he now works on C# analysis at Coverity.
If you know C#, then the name Eric Lippert will be synonymous with clear explanations of difficult ideas and insights into the way languages work and are put together. However, this didn't stop our interviewer Nikos Vaggalis (NV) from ranging over topics as diverse as the future of C#, asynchronous vs. parallel, Visual Basic and more.
Read on because you are sure to find something to interest you about C#, languages in general or just where things are headed.
Eric Lippert (Source: Mark Aiken)
NV: So Eric, after so many years at Microsoft you began a new career at Coverity. Was the 'context switch' easy?
EL: Yes and no. Some parts of it were very easy and some took some getting used to.
For example, re-learning how to use Unix-based development tools, which I had not touched since the early 1990s, took me a while. Git is very different from Team Foundation Server. And so on. But some things were quite straightforward.
Coverity's attitude towards static analysis is very similar to the attitude that the C# compiler team has about compiler warnings, for instance, though of course the conditions that Coverity is checking for are by their nature much more complicated than the heuristics that the C# compiler uses for warnings.
Switching to taking a bus downtown every day instead of taking a bus to Redmond every day was probably the easiest part!
NV: I guess that from now on you'll be working in the field of static analysis. What exactly does static analysis do?
EL: Static analysis is a very broad field in both industry and academia. So let me first start very wide, and then narrow that down to what we do at Coverity.
Static analysis is analysis of programs based solely on their source code or, if the source code is not available, their compiled binary form. That is in contrast with dynamic analysis, which analyses program behavior by watching the program run. So a profiler would be an example of dynamic analysis; it looks at the program as it is running and discovers facts about its performance, say.
Any analysis you perform just by looking at the source code is static analysis. So for example, compiler errors are static analysis; the error was determined by looking at the source code.
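As a rough illustration of "analysis just by looking at the source code", here is a toy sketch in Python rather than C#. It is not how the C# compiler or Coverity works; the function name and the sample source are made up for the example. It parses source text with Python's standard `ast` module and reports divisions by a literal zero without ever running the program.

```python
import ast

# Hypothetical program under analysis; it is parsed, never executed.
SOURCE = '''
def divide(a, b):
    return a / 0
'''

def find_division_by_literal_zero(source: str) -> list[int]:
    """Return the line numbers where the code divides by the literal 0."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Div, ast.FloorDiv, ast.Mod))
                and isinstance(node.right, ast.Constant)
                and node.right.value == 0):
            hits.append(node.lineno)
    return sorted(hits)

print(find_division_by_literal_zero(SOURCE))  # [3]
```

A dynamic tool would only see this defect if `divide` actually got called during a run; the static check finds it from the text alone.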
So now let's get a bit more focused. There are lots of reasons to perform static analysis, but the one we are focused on is the discovery of program defects. That is still very broad. Consider a defect such as "this public method violates the Microsoft naming guidelines". That's certainly a defect. You might not consider that a particularly interesting or important defect, but it’s a defect.
Coverity is interested in discovering a very particular class of defect.
That is, defects that would result in a bug that could realistically affect a user of the software. We're looking for genuine “I'm-glad-we-found-that-before-we-shipped-and-a-customer-lost-all-their-work” sort of bugs. Something like a badly named method is a defect that the customer is never going to notice.
NV: How far can it go? Can it diagnose issues like memory leaks or predict potential concurrency bottlenecks?
EL: Absolutely, those are two really good realistic examples of the use of static analysis. Memory leaks are of course a serious, customer-impacting defect. Coverity refers to its defect detection algorithms as "checkers"; each checker looks for a particular sort of defect. We have a checker called RESOURCE_LEAK that looks specifically for leaks -- in C/C++ it looks for memory leaks, in C# it looks for things like files that were opened but never closed.
I don't believe we have a checker specifically for the problem of finding performance bottlenecks in concurrent code; that is more the domain of dynamic analysis. But we do have many checkers that look for problems in concurrent code. For example, we have a checker that notices if nine times you access a field inside a particular lock, and the tenth time you do not, then probably the tenth time someone forgot to use a lock. That's a relatively simple concurrency checker; they get much more complicated than that, as you might imagine.
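The "nine times out of ten" idea can be sketched as a small statistical heuristic. This is a toy in Python, not Coverity's checker: the `Access` record, the `find_unguarded_accesses` function and the 0.8 threshold are all assumptions made for illustration. It takes a list of field accesses, each tagged with the lock held at that point (or `None`), and flags the accesses that break an otherwise consistent locking pattern.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional

class Access(NamedTuple):
    field: str
    line: int
    lock: Optional[str]  # name of the lock held at this access, or None

def find_unguarded_accesses(accesses, threshold=0.8):
    """Flag accesses that skip the lock a field is usually guarded by.

    A field counts as "usually guarded" when at least `threshold` of its
    accesses, but not all of them, happen under the same lock.
    """
    by_field = defaultdict(list)
    for access in accesses:
        by_field[access.field].append(access)

    suspects = []
    for group in by_field.values():
        lock_counts = Counter(a.lock for a in group if a.lock is not None)
        if not lock_counts:
            continue  # field is never locked, so there is no pattern to break
        usual_lock, guarded = lock_counts.most_common(1)[0]
        if guarded < len(group) and guarded / len(group) >= threshold:
            suspects.extend(a for a in group if a.lock != usual_lock)
    return suspects

# Nine guarded accesses to a hypothetical field, then one without the lock.
accesses = [Access("balance", line, "balance_lock") for line in range(10, 19)]
accesses.append(Access("balance", 99, None))

print([a.line for a in find_unguarded_accesses(accesses)])  # [99]
```

The threshold keeps the heuristic quiet when a field is locked only about half the time, since there the evidence that a lock was intended is much weaker.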
NV: Do code contracts play a role, and will the introduction of Roslyn affect the field of static analysis?
EL: Let me split that up into two questions. First, code contracts.
So as you surely know, code contracts are annotations that you can optionally put into your C# source code that allow you to express the preconditions, postconditions and invariants of your code. So then the question is, how do these contracts affect the static analysis that Coverity does? We have some support for understanding code contracts, but we could do better and one of my goals is to do some research on this for future versions.
One of the hard things about static analysis is that the number of possible program states and the number of possible code paths through the program are extremely large, which can make analysis very time consuming. So one of the things we do is try to eliminate false paths -- that is, code paths that we believe are logically impossible, and therefore do not have to be checked for defects. We can use code contracts to help us prune false paths.
A simple example would be if a contract says that a precondition of the method is that the first argument is a non-null string, and that argument is passed to another method, and the second method checks the argument to see if it is null. We can know that on that path - that is, via the first method - the path where the null check says "yes it is null" is a false path. We can then prune that false path and not consider it further. This has two main effects. The first is, as I said before, we get a significant performance gain by pruning away as many false paths as possible. The second is fewer false positives: a false positive is when the tool reports a defect but does so incorrectly, and eliminating false paths greatly decreases the number of them. So we do some fairly basic consumption of information from code contracts, but we could likely do even more.
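The pruning step in that example can be shown with a deliberately tiny model. This is a sketch in Python under simplifying assumptions, not a description of Coverity's analysis: real analyzers do not enumerate paths exhaustively, and the condition names and helper functions here are hypothetical. Each path is a set of assumed branch outcomes, and any path that contradicts a fact guaranteed by the contract is discarded.

```python
from itertools import product

# Fact guaranteed by the (hypothetical) precondition of the first method:
# its string argument is never null.
CONTRACT_FACTS = {"name_is_null": False}

# Conditions the second method branches on; each can go either way on a path.
BRANCH_CONDITIONS = ["name_is_null", "name_is_empty"]

def enumerate_paths(conditions):
    """Yield every combination of branch outcomes as a dict."""
    for outcome in product([True, False], repeat=len(conditions)):
        yield dict(zip(conditions, outcome))

def is_false_path(path, facts):
    """A path is false if it assumes an outcome that contradicts a known fact."""
    return any(cond in facts and facts[cond] != value
               for cond, value in path.items())

all_paths = list(enumerate_paths(BRANCH_CONDITIONS))
feasible = [p for p in all_paths if not is_false_path(p, CONTRACT_FACTS)]

print(len(all_paths), len(feasible))  # 4 2
```

Half of the paths disappear from one contract fact. Any defect that would only have been reported on a pruned path is exactly the kind of false positive the contract helps avoid.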
Now to address your second question, about Roslyn. Let me first answer the question very broadly. Throughout the industry, will Roslyn affect static analysis of C#? Absolutely yes, that is its reason for existing.
When I was at Microsoft I saw so many people write their own little C# parsers or IDEs or little mini compilers or whatever, for their own purposes. That's very difficult, it’s time-consuming, it's expensive, and it's almost impossible to do right. Roslyn changes all that, by giving everyone a library of analysis tools for C# and VB which is correct, very fast, and designed specifically to make tool builders' lives better.
I am very excited that it is almost done! I worked on it for many years and can't wait to get my hands on the release version.
More specifically, will Roslyn affect static analysis at Coverity? We very much hope so. We work closely with my former colleagues on the Roslyn team. The exact implementation details of the Coverity C# static analyzer are of course not super-interesting to customers, so long as it works. And the exact date Roslyn will be available is not announced.
[Roslyn was announced a day or two after this interview (see "C# and VB are open sourced"), and Eric was pleased that it was open sourced, if a little worried about the world seeing his code. Ed]
So any speculation as to when there will be a Coverity static analyzer that uses Roslyn as its front end is just that -- speculative. Suffice to say that we're actively looking into the possibility.
NV: Roslyn's official definition states it is a "project to fully rewrite the Visual Basic and C# compilers and language services in their own respective managed code language; Visual Basic is being rewritten in Visual Basic and C# is being rewritten in C#."
How is C# being rewritten in C#?
EL: C# and VB are both good languages in which to write a compiler.
It seems somewhat magical; how can you write a compiler for a language in that language? Don't you have to start somewhere?
And of course the answer is yes: you do have to start somewhere.
The C# 1.0 through 5.0 compilers were written in C++. For quite a long time -- over a year -- we wrote the Roslyn C# compiler in C# and compiled it with C# 4.0. (C# 5.0 was being developed in parallel by a sister team.) The day that we could compile the Roslyn compiler, and then turn right around and compile it again using the compiler we'd just built, that was a happy day.
Microsoft strongly believes in a technique called "eat your own dog food". That is, the Outlook team uses yesterday's version of Outlook to read today's mail, and so on. The Roslyn compiler and IDE teams have been dog-fooding for a long time now. Today's build of Roslyn is developed using yesterday's version of the IDE and compiled with yesterday's compiler. You find bugs really fast that way!
NV: In other words, that's the process of bootstrapping?