====== LOLCODE on the DLR: how it came to life ====== > The following article was generously contributed by Martin Maly, Microsoft and IronPython language wizard, describing his experiences with implementing LOLCODE in Microsoft's Dynamic Language Runtime. The article makes reference to the implementation currently being hosted at [[http://www.iunknown.com/2007/11/lolcode-on-dlr.html|John Lam's blog]]. > I've done some minor editing and clean-up, so any errors you see are quite probably mine. --Adam Dynamic Language Runtime (DLR) is a brain child of Jim Hugunin ([[http://blogs.msdn.com/hugunin/]]). Several years ago Jim decided to explore dynamic language implementations on Microsoft’s .NET platform, an endeavor which ultimately brought IronPython to life. Jim would later join Microsoft’s Common Language Runtime team and follow his vision to make .NET the best platform for running dynamic languages. In September 2006, IronPython 1.0 was released and seven months later, in April 2007 Microsoft released IronPython 1.1, currently the most recent production release of IronPython. The development of Dynamic Language Runtime, a software platform for implementing dynamic languages, started in the spring of 2007 as a natural next step in following Jim’s original vision. With Dynamic Language Runtime, language implementers can leverage the benefits offered by .NET platform – garbage collector, just-in-time compiler, tools, and obviously the wealth of software libraries available on .NET platform. The language implemented on top of DLR can, if it so chooses, gain access to the .NET libraries while still maintaining the purity of the original language. IronPython is definitely a testament to that; we took lot of care to make .NET functionality and libraries visible only in modules in which the programmer explicitly asks for it. I was going to Barcelona to give a talk about building dynamic languages on top of DLR, and it would be nice to have a simple language which to use for the demonstration purposes. The key is that it needs to be a language, and it needs to be simple. And we had such language on our team … ToyScript. Our team wrote it to have a little trivial language handy to be able to quickly play with ideas relevant to DLR development. So for the TechEd conference, I figured, it would be just perfect. It is a simple language which people can explore to see how it takes advantage of the DLR, how it is integrated within the hosting APIs etc. On the other hand, it is still a true DLR based compiler. It has own scanner and parser, both quite simple, builds own Abstract Syntax Tree, and then instead of generating IL [//Intemediate Language// code] directly, it simply generates a DLR Tree to have DLR do the heavy lifting of the code generation. And it takes advantage of the DLR support for dynamic operations too. Before the last day in October I had not heard of LOL Code. It is a shame, and if I can give an excuse, it would be that the DLR team, myself included, is working hard to make the Dynamic Language Runtime a great platform. However, our team had heard of LOL Code. It was on October 31 that I talked with John Lam (http://www.iunknown.com) and that changed everything… Today, 12 days later, I don’t remember what exactly John said then, but it was something along the lines that … “ToyScript is great for the demo, but the problem is that people haven’t heard of it. Wouldn’t it be cool to find an existing language and implement that? Oh, have you heard of LOL Code?” ---- I spent the afternoon researching LOL Code, laughing out loud at the variety of cat photos that naturally come with it and it didn’t take long to find that Nick Hodge presented the .NET implementation of LOL Code at TechEd Australia. That really pushed me over the edge of inspiration … however, there was only about a week left before the Barcelona talk and I still needed to finish the presentation, pack, spend hours en route and get some sleep in between. Since my flight was on Friday evening (so come to worst, I could pack on Friday) I gave myself all day Thursday to write the compiler. If I failed, I'd fall back on ToyScript presentation and things would be ok. If I made it, all the better… Initially, I needed to understand how tricky the language actually was to analyze and understand enough of the semantics to determine whether LOL Code and DLR are actually a good match. After that, the development process would be that already proven in IronPython, IronRuby, and yes, also ToyScript: I’ll parse LOL Code and then transform its LOL AST into the DLR Trees for code generation, implement small libraries (for example for GIMMEH) and figure out the .NET interoperability at the end once I understand enough of the language syntax and semantics. LOL Code has several specifications, there is a vibrant discussion going on about new language features and additions to syntax, so it took me a while to get familiar with the syntax, which differs with each released language specification. [//LOL, you're being very diplomatic there!// -ed] I hoped to target the 1.2 specification, but there were two main things that I liked about earlier versions of the language. Infix binary operators (the 1.2 spec switches to prefix binary operators) and implicit use of IT for the conditional statement. I did, however, liked that functions and function calls were added, even though later they would turn out differently in the .NET implementation. The grammar from the [[implementations:lolcode-ctools|ctools implementation]] was a great starting point and even though I’d end up rebuilding the grammar from scratch, it was very helpful resource. If you have an existing language with a scanner and parser already working, you can certainly use your existing parser to target DLR, in fact, it is one of DLR’s great properties that you can start using it with your existing parser. For parser and lexer generator, I used the Gardens Point Parser Generator (GPPG) and Gardens Point Scanner Generator (GPLEX) created by Queensland University of Technology in Brisbane, Australia. They are both very good tools, but since I was quite new to using them, it took couple hours to get the parser and tokenizer wired together and into the DLR hosting framework. On the DLR side, the source code arrives in the ''LanguageContext.ParseSourceCode'' (look for it in ''LolLanguageContext.cs''). The source code (either file passed on the command line, or perhaps a code snippet entered in the interactive console) is enclosed in the ''CompilerContext''’s ''SourceUnit''. The GPLEX likes to read from a stream (''System.IO.Stream'', or a string), but the SourceUnit works with TextReader abstract class so matching the two resulted in a bit of an unnecessary reading of the whole input ahead of time into the string and having GPLEX work from that. Since GPPG generates partial class, it was easy to add the ''_tree'' member to it and have the final grammar rule store the finished AST there upon the last parser reduction. The last step of the LOL Code frontend was to generate the DLR Tree. This is also triggered from ''LolLanguageContext'' right after parsing, by calling ''LolGen.Generate''. There we create a DLR ''CodeBlock'' (DLR’s way to express functions) and then calling the LOL AST to transform itself into the body of the block. The ''LolGen'' is quite simple, only keeps track of variables alredy declared, implementing thus a simple name binding for the tree generation. Each AST node generally has one static method “Maek” which is called from the parser upon reductions to create the AST nodes, and a virtual method “Gen” which performs the transformation into the DLR Tree. The transformation then happens as single-pass recursive walk of the AST. For example, the “''Gimmeh''” node will convert itself into a call to the library method “''LolOps.Gimmeh''” and assign the result to the variable expressed in the GIMMEH statement. ''ImInYr'' becomes a DLR loop, and ''Iz'' will turn into DLR ''If'' Statement with one condition (the value of the condition is passed to the ''LolOps.Bool'' method to determine the true-value of the condition, for example for string to determine whether its length is 0). All of the above for the ‘core’ of the LOL Code language, took about 8 hours I think. At the end of Thursday I had a language with the interactive console and I could implement the Fibonacci series algorithm. Missing was the support for debugging in VS and some way to instantiate .NET objects and call methods on them. That’s what I would have to implement on the way to Barcelona. Scandinavian Airlines doesn’t give you power in the seat in economy class and my notebook’s battery life is about 20 minutes so there was little I could do on the plane, however, there was plenty of waiting at the airports, especially when my flight to Copenhagen was 2 hours late and I continued to Barcelona not directly, but via Frankfurt and with 4 hour layover … //I may actually finish the compiler before I get to my hotel room.// :) ---- I don’t remember anymore in which order I implemented the .NET interoperability and debugging support, but they both came to being en route, partially in Seattle’s South terminal and then at the gate in Frankfurt where I was plugged into the only power outlet I could find. To implement debugging support really meant to set the correct spans on DLR Tree nodes. I went back to the scanner to generate the right spans for the keywords, rewrote the parser to propagate them up during reductions and the LOL AST node ''Maek'' methods to remember them correctly. Then at DLR Tree generation time, the DLR Tree node receives correct span so that DLR can set positions in the IL stream correctly. For example, the ''IHasA.Maek'' calculates the span of the statement (if there’s an initializer “ITZ …”, it will include its span in the statements’ span). Then in Gen method, it will either generate empty statement with the right Span, or generate the DLR assignment expression. In both cases, it means that as you step in the VS debugger, the ''I HAS A VAR'' will behave correctly. .NET interoperability took less time to implement because DLR has a powerful infrastructure in place already that LOL Code leverages. The problem was with syntax. LOL Code doesn’t have notion of member access (''a.b'') which is essential to navigate .NET namespaces and to also invoke methods on objects. Also, with the exception of 1.2 language spec I couldn’t find much about function call or object instantiation (new). This is where I went back to the grammar to add few things. I make no claim that they are the right or definitive answer, but here they are anyway… For member access (''a.b'') I came up with syntax “''b ON a''”. So for example ''System.Environment.CurrentDirectory'' would read “''CurrentDirectory ON Environment ON System''”. For method calls it was a keyword ''COL'' I added, and ''NJU'' for object instantiation. Python uses a call syntax to create instances of objects by calling their type: ''Hashtable()'', but I stayed closer to ToyScript here and provided an explicit operator ''NJU''. The argument list is the same for both: ''WIT [ AN … ]'' For example, calling String.Concat(“a”, “b”) would read: COL Concat ON String WIT “a” AN “b” Again, this syntax may not be the best, and the LOL Code experts may be not happy with my choices. I tried to look for hints of what the right thing would be online, but found an initial discussion about support for objects in LOL Code, but not much beyond that. The last piece of the puzzle is … how can I actually get my hands on the .NET types themselves so I can NJU the instances up? Ideally, I’d need to get access to the System namespace which I can then traverse (using ''ON'' operator) and then ''NJU'' objects and ''COL'' their methods. I overloaded the ''CAN HAS'' statement. Syntactically, it has 2 versions. One accepts string literal: ''CAN HAS ''?, I guess a good use of which is to refer to disk files, the other accepts an identifier: ''CAN HAS STDIO''?. The DLR Tree generated for the latter version, written in a bit of a C# inspired pseudo-code, would look something like this: STDIO = LolOps.CanHas(“STDIO”); The ''LopOps.CanHas'' helper is quite trivial, it only asks DLR to look up a root namespace with the given name. So if you do ''CAN HAS System''?, the variable ''System'' in your program will be initialized with the root of the .NET libraries and you are set to write programs such as: HAI CAN HAS System? VISIBLE CurrentDirectory ON Environment ON System NJU Hashtable ON Collections ON System I HAS A HT ITZ IT I HAS A DT LOL DT R DateTime ON System VISIBLE Now ON DT COL Add ON HT WIT "LolCode" AN "Rulezz!!" VISIBLE COL get_Item ON HT WIT "LolCode" COL Concat ON String ON System WIT "LolCode " AN "Rulezz!!" VISIBLE IT KTHXBYE In the end, it took about 6 hours to implement the debugging and .NET interop (2 hours waiting in Seattle, 4 more in Frankfurt) and an odd minute here and there to change the original WITH keyword (for argument list) with ''WIT'' that John Lam proposed as a cooler version (and I agreed :) ). ---- The main point is that if you have an existing dynamic language, getting it basically up and running with DLR is a matter of very short time. For me at least, this instant gratification is of a huge value because I can see things working very quickly. And if I was to pick one thing that made the LOL Code (or ToyScript) relatively easy to implement, it would be the code generation. Implementing code generation is, in my opinion, one of the harder parts of the compiler, and having DLR do that for you really takes a big burden off. And we haven’t really talked about the support for dynamic operations… You can check out Bill’s MSDN article on those: [[http://msdn.microsoft.com/msdnmag/issues/07/10/CLRInsideOut/default.aspx|http://msdn.microsoft.com/msdnmag…]]