For a very long time, Lua 5.1 was the language of choice for Roblox. As we grew, so did the demand for better tooling support and a more performant VM. To answer this, we started an initiative to rebuild our Lua stack, named “Luau” (pronounced /lu-wow/), with the goal of offering the features programmers expect from a modern language, including a type checker, a new linter framework, and a faster interpreter, to name a few.
To make all of that possible, we had to rewrite most of our stack from scratch. The problem is that the Lua 5.1 parser is tightly coupled to bytecode generation: it emits bytecode as it parses and never builds a syntax tree, which is insufficient for our needs. We want to be able to traverse the AST for deeper analysis, so we need a parser that produces that syntax tree. From there, we’re free to perform any operations we wish on that AST.
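Here is a minimal sketch of what producing a syntax tree buys us. It is written in Python for illustration only and is not Luau’s implementation. The node classes Name, Number, Binary, and Local, and the walk and referenced_names helpers, are hypothetical. Once the parser hands back a tree, an analysis pass is just a traversal, and it can run as many times as needed without involving bytecode generation.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Iterator, List, Union


# Hypothetical AST node types covering a tiny subset of Lua.
@dataclass
class Name:
    ident: str


@dataclass
class Number:
    value: float


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Name, Number, Binary]


@dataclass
class Local:
    names: List[str]  # names being declared
    values: List[Expr]  # initializer expressions


def walk(node) -> Iterator[object]:
    """Yield node and every AST node below it, in pre-order."""
    yield node
    for f in fields(node):
        child = getattr(node, f.name)
        for c in (child if isinstance(child, list) else [child]):
            if is_dataclass(c):
                yield from walk(c)


def referenced_names(node) -> List[str]:
    """Example analysis pass: every variable name read inside a subtree."""
    return [n.ident for n in walk(node) if isinstance(n, Name)]


# AST for: local total = x + y * 2
tree = Local(["total"], [Binary("+", Name("x"), Binary("*", Name("y"), Number(2)))])
print(referenced_names(tree))  # ['x', 'y']
```

Each new analysis, such as a lint rule, becomes another function over walk. A single-pass compiler that emits bytecode directly has no tree to hand to such functions.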
As luck would have it, there was an existing Lua 5.1 parser lying about in Studio, used only for a basic linting pass. That made it very easy for us to adopt that parser and extend it to recognize Luau-specific syntax, which minimized the risk of changing the resulting parse in some subtle way. This is a critical detail, because one of our sacred values at Roblox is backward compatibility. We have millions of lines of Lua code already written, and we are committed to ensuring that they continue to work forever.
So with these factors in mind, the requirements are clear. We need to:
- stay clear of grammar quirks that require backtracking
- have an efficient parser
- maintain forward-compatible syntax
- remain backward compatible with Lua 5.1
Sounds simple, right?
How the type inference engine influenced syntax choices
To start, we need some context on how we arrived at this situation. We chose the annotation syntaxes below because they are already familiar to the majority of programmers and are, in fact, industry standard. You don’t have to learn anything new.
There are several places where Luau permits you to write type annotations:
- local foo: string
- function add(x: number, y: number): number … end
- type Foo = (number, number) -> number
- local foo = bar as string
Adding syntax to annotate your bindings is very important for the type inference engine to better understand the intended types. Lua is a very powerful language that lets you overload virtually every operator through metatables. Without some way to annotate what things are, we cannot even confidently say that the expression x + y is going to produce a number!
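Python supports the same kind of operator overloading, so a short Python analogue illustrates the problem. The Vector2 class below is hypothetical and stands in for a Lua table whose metatable defines __add. The add function has no annotations, so nothing in it guarantees that x + y is a number.

```python
class Vector2:
    """Stand-in for a Lua table whose metatable defines __add."""

    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    def __add__(self, other: "Vector2") -> "Vector2":
        # Runs for `a + b`, like a Lua __add metamethod, and may return anything.
        return Vector2(self.x + other.x, self.y + other.y)


def add(x, y):
    # Unannotated: the result type depends entirely on what x and y are.
    return x + y


print(type(add(1, 2)).__name__)                          # int
print(type(add(Vector2(1, 2), Vector2(3, 4))).__name__)  # Vector2
```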
Type cast expression
Something we really like from TypeScript is what they call a type assertion. It’s basically a way to add extra type information to a program for the checker to verify. In TypeScript, the syntax is:
bar as string
Unfortunately, when we tried this out, we were in for a bad surprise: this breaks existing code! One of our users’ games had a function named as. Their scripts therefore included snippets like:
local x = y
as(w, z) -- Expected ‘->’ when parsing function type, got
We likely could have made this work, were it not for one additional complication: we wanted our parser to work with only a single token of lookahead. Performance is important to us, and a key part of writing a highly performant parser is minimizing the amount of backtracking it has to do. It would not be efficient for our parser to scan forward and backward arbitrarily far to determine what an expression really means.
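The failure mode can be reproduced with a toy recursive-descent parser. This is a minimal Python sketch, not Luau’s actual parser. The Parser class, its tuple-shaped AST, and the cast_enabled switch are all hypothetical. With the switch off, the parser behaves like Lua 5.1, where as is an ordinary name. With the switch on, the single token of lookahead after y is as, so the parser commits to a cast and fails while reading (w, z) as a function type.

```python
import re
from typing import List


class ParseError(Exception):
    pass


TOKEN_RE = re.compile(r"\s*(->|[A-Za-z_]\w*|[(),=])")


def tokenize(src: str) -> List[str]:
    """Split source into tokens. As in Lua, newlines are plain whitespace."""
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens + ["<eof>"]


class Parser:
    """Toy recursive-descent parser that decides using only the current token."""

    def __init__(self, src: str, cast_enabled: bool):
        self.tokens, self.pos = tokenize(src), 0
        self.cast_enabled = cast_enabled  # hypothetical switch for `expr as Type`

    def peek(self) -> str:
        return self.tokens[self.pos]

    def advance(self) -> str:
        tok = self.tokens[self.pos]
        if tok != "<eof>":
            self.pos += 1
        return tok

    def expect(self, tok: str, context: str) -> None:
        if self.peek() != tok:
            raise ParseError(f"Expected '{tok}' when parsing {context}, got {self.peek()}")
        self.advance()

    def parse_block(self) -> list:
        stmts = []
        while self.peek() != "<eof>":
            stmts.append(self.parse_statement())
        return stmts

    def parse_statement(self):
        if self.peek() == "local":
            self.advance()
            name = self.advance()
            self.expect("=", "local statement")
            return ("local", name, self.parse_expr())
        return ("callstat", self.parse_expr())  # simplification: assume a call

    def parse_expr(self):
        expr = ("name", self.advance())
        while self.peek() == "(":  # call suffix: f(a, b)
            expr = ("call", expr, self.parse_list(self.parse_expr))
        # One token of lookahead: seeing `as` commits us to a cast, no backtracking.
        if self.cast_enabled and self.peek() == "as":
            self.advance()
            expr = ("cast", expr, self.parse_type())
        return expr

    def parse_type(self):
        if self.peek() == "(":  # function type: (T1, T2) -> R
            params = self.parse_list(self.parse_type)
            self.expect("->", "function type")
            return ("fntype", params, self.parse_type())
        return ("typename", self.advance())

    def parse_list(self, parse_item) -> list:
        self.expect("(", "list")
        items = [] if self.peek() == ")" else [parse_item()]
        while self.peek() == ",":
            self.advance()
            items.append(parse_item())
        self.expect(")", "list")
        return items


src = "local x = y\nas(w, z)"
print(Parser(src, cast_enabled=False).parse_block())  # local x = y, then a call to as
try:
    Parser(src, cast_enabled=True).parse_block()
except ParseError as err:
    print(err)  # Expected '->' when parsing function type, got <eof>
```

Making the cast-enabled parser accept this input would mean remembering the position before as and retrying as a call statement on failure. That is exactly the backtracking the one-token rule is meant to rule out.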
It also turns out that TypeScript handles the multi-line case for free, thanks to JavaScript’s automatic semicolon insertion. When you write the snippet below in TypeScript or JavaScript, a semicolon is inserted at the line break before as, so it is parsed as two separate statements. If the same code were on a single line, it would be a syntax error at the as token in JavaScript, but a valid type assertion expression in TypeScript. Lua has neither automatic semicolon insertion nor mandatory semicolons, so its parser must consume the longest possible statement, even when that statement spans multiple lines.
let x = y
as(w, z)
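The difference can be modelled as a single rule about line breaks. The Python sketch below is a simplified, hypothetical model, not the actual TypeScript or JavaScript algorithm. It tags each token with whether a line break precedes it, then decides whether the first as token continues the previous expression as a cast. With newline_sensitive=True it mimics the TypeScript behaviour described above. With newline_sensitive=False it mimics Lua, where a line break is ordinary whitespace.

```python
import re
from typing import List, Tuple


def tokens_with_breaks(src: str) -> List[Tuple[str, bool]]:
    """Return (token, preceded_by_line_break) pairs for a toy token set."""
    return [(m.group(2), "\n" in m.group(1))
            for m in re.finditer(r"(\s*)(\w+|[^\s\w])", src)]


def as_starts_cast(src: str, newline_sensitive: bool) -> bool:
    """Does the first `as` token continue the previous expression as a cast?

    newline_sensitive=True: an `as` that begins a new line ends the expression
    instead (the TypeScript-like case). False: line breaks are ordinary
    whitespace, as in Lua, so the parser keeps extending the expression.
    """
    for tok, after_break in tokens_with_breaks(src):
        if tok == "as":
            return not (newline_sensitive and after_break)
    return False


snippet = "let x = y\nas(w, z)"
print(as_starts_cast(snippet, newline_sensitive=True))   # False: two statements
print(as_starts_cast(snippet, newline_sensitive=False))  # True: one long statement
print(as_starts_cast("let x = y as string", newline_sensitive=True))  # True
```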
Luau’s original type cast expression had the performance we wanted, but it was not backward compatible. Regrettably, that broke our promise of Luau being a superset of Lua 5.1, so we can’t adopt it without extra constraints, such as requiring parentheses in certain contexts!
Type arguments in function calls
Another unfortunate detail in Lua’s grammar prevents us from adding type arguments to function calls without introducing another ambiguity:
return someFunction<A, B>(c)
It could mean two different things:
- evaluate the two comparisons someFunction < A and B > (c), and return both results
- call someFunction with two type arguments, A and B, and one argument, c, and return its results
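To see why one token of lookahead is not enough, the Python sketch below parses the same token sequence under both readings. Both parsers are hypothetical and handle only the tiny grammar fragment this example needs. Each one consumes every token successfully. A parser sitting at the < token after someFunction therefore has no local evidence for choosing one reading over the other.

```python
from typing import List, Tuple

TOKENS = "someFunction < A , B > ( c )".split()


def expect(toks: List[str], i: int, tok: str) -> int:
    """Check that toks[i] is tok and return the index after it."""
    if i >= len(toks) or toks[i] != tok:
        raise SyntaxError(f"expected {tok!r} at token {i}")
    return i + 1


def parse_simple(toks: List[str], i: int) -> Tuple[tuple, int]:
    """simpleexp := Name | '(' Name ')'  (just enough Lua for this example)."""
    if toks[i] == "(":
        return ("paren", toks[i + 1]), expect(toks, i + 2, ")")
    return ("name", toks[i]), i + 1


def parse_lua51_explist(toks: List[str]) -> list:
    """Lua 5.1 reading: explist := exp {',' exp}; exp := simple [('<'|'>') simple]."""
    exprs, i = [], 0
    while True:
        expr, i = parse_simple(toks, i)
        if i < len(toks) and toks[i] in ("<", ">"):
            op = toks[i]
            right, i = parse_simple(toks, i + 1)
            expr = (op, expr, right)
        exprs.append(expr)
        if i < len(toks) and toks[i] == ",":
            i += 1
        else:
            break
    if i != len(toks):
        raise SyntaxError("trailing tokens")
    return exprs


def parse_generic_call(toks: List[str]) -> tuple:
    """Hypothetical Luau reading: Name '<' Name {',' Name} '>' '(' Name ')'."""
    callee = toks[0]
    i = expect(toks, 1, "<")
    type_args = [toks[i]]
    i += 1
    while i < len(toks) and toks[i] == ",":
        type_args.append(toks[i + 1])
        i += 2
    i = expect(toks, i, ">")
    i = expect(toks, i, "(")
    arg = toks[i]
    i = expect(toks, i + 1, ")")
    if i != len(toks):
        raise SyntaxError("trailing tokens")
    return ("call", callee, type_args, [arg])


print(parse_lua51_explist(TOKENS))
# [('<', ('name', 'someFunction'), ('name', 'A')), ('>', ('name', 'B'), ('paren', 'c'))]
print(parse_generic_call(TOKENS))
# ('call', 'someFunction', ['A', 'B'], ['c'])
```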
This ambiguity only occurs in the context of an expression list. It’s not a big problem in TypeScript and C#, because both have the advantage of compiling ahead of time, so they can afford to spend extra cycles disambiguating this expression down to one of the two options.
While it appears that we could do the same thing, for example by applying heuristics during parsing or type checking, we actually can’t. Lua 5.1 can dynamically inject globals into any environment, so names like someFunction, A, and B may only be defined at runtime, which can break such a heuristic. We also simply don’t have the luxury of extra compile time, because we have to generate bytecode as quickly as possible so that every client can start interpreting.
Type alias statement
Parsing this type alias statement is not a breaking change because it’s already invalid Lua syntax:
type Foo = number
What we do is simple. We parse a primary expression, which stops right after the name type, and then decide what to do based on the result:
- If it’s a function call, stop parsing this expression-as-statement.
- Otherwise, if the next token is a comma or an equals sign, parse an assignment statement.
What’s missing above is obvious: there is no branch in which an identifier is followed by another identifier, because no valid Lua 5.1 statement starts that way. All we have to do, then, is pattern match on the expression:
- Is it an identifier?
- Is the name of that identifier equal to “type”?
- Is the next token any arbitrary identifier?
Voilà, you get backward-compatible syntax with a context-sensitive keyword.
type Foo = number -- type alias
type(x) -- function call
type = {x = 1} -- assignment
type.x = 2 -- assignment
As a bonus, this snippet still parses exactly as it does in Lua 5.1, because here type appears inside the expression of a local statement rather than at the start of a statement:
local foo = type
bar = 1
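The dispatch described in this section can be sketched as a small recursive-descent parser. This Python version is a hypothetical illustration, not Luau’s parser. It understands only a tiny subset of Lua (names, field access, calls with at most one argument, numbers, and name = value table constructors) and returns tuples instead of real AST nodes. The only Luau-specific logic is the third branch in parse_statement. That branch fires when the parsed expression is exactly the name type and the next token is another name.

```python
import re
from typing import List

KEYWORDS = {"local"}  # the only keyword this toy subset needs; "type" is NOT one


def tokenize(src: str) -> List[str]:
    toks = re.findall(r"\d+|[A-Za-z_]\w*|[{}().,=]", src)
    if "".join(toks) != "".join(src.split()):  # something was skipped
        raise SyntaxError("unsupported character in source")
    return toks + ["<eof>"]


def is_name(tok: str) -> bool:
    return re.fullmatch(r"[A-Za-z_]\w*", tok) is not None and tok not in KEYWORDS


class Parser:
    def __init__(self, src: str):
        self.toks, self.pos = tokenize(src), 0

    def peek(self) -> str:
        return self.toks[self.pos]

    def advance(self) -> str:
        tok = self.toks[self.pos]
        if tok != "<eof>":
            self.pos += 1
        return tok

    def expect(self, tok: str) -> None:
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def name(self) -> str:
        tok = self.advance()
        if not is_name(tok):
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def parse_block(self) -> list:
        stmts = []
        while self.peek() != "<eof>":
            stmts.append(self.parse_statement())
        return stmts

    def parse_statement(self):
        if self.peek() == "local":  # `type` here is only parsed as an expression
            self.advance()
            name = self.name()
            self.expect("=")
            return ("local", name, self.parse_expr())
        expr = self.parse_suffixed()  # for `type Foo = ...`, stops after `type`
        if expr[0] == "call":  # 1. function call statement
            return ("callstat", expr)
        if self.peek() == "=":  # 2. assignment (comma-separated targets omitted)
            self.advance()
            return ("assign", expr, self.parse_expr())
        if expr == ("name", "type") and is_name(self.peek()):  # 3. type alias
            alias = self.name()
            self.expect("=")
            return ("typealias", alias, self.name())
        raise SyntaxError(f"unexpected {self.peek()!r}")

    def parse_suffixed(self):
        expr = ("name", self.name())
        while self.peek() in (".", "("):
            if self.advance() == ".":
                expr = ("index", expr, self.name())
            else:  # call with zero or one argument, for brevity
                args = [] if self.peek() == ")" else [self.parse_expr()]
                self.expect(")")
                expr = ("call", expr, args)
        return expr

    def parse_expr(self):
        tok = self.peek()
        if tok.isdigit():
            self.advance()
            return ("number", int(tok))
        if tok == "{":  # table constructor with `name = value` fields only
            self.advance()
            fields = []
            while self.peek() != "}":
                key = self.name()
                self.expect("=")
                fields.append((key, self.parse_expr()))
                if self.peek() != "}":
                    self.expect(",")
            self.expect("}")
            return ("table", fields)
        return self.parse_suffixed()


for src in ["type Foo = number", "type(x)", "type = {x = 1}", "type.x = 2",
            "local foo = type\nbar = 1"]:
    print(Parser(src).parse_block())
```

Because the type alias check runs only after the existing call and assignment branches have been ruled out, every input that Lua 5.1 accepts still takes the same path as before.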
Lessons learned
The takeaway, it seems, is that we have to design Luau’s syntax to be forward compatible, with as few context-sensitive parse paths as possible. That eliminates the second-guessing that forces a parser to backtrack and try something else from the point of failure. Not only does this give us a fast parser that can just chug along to the end of the source code, but it also lets the parser hand back the AST without any additional disambiguation stages.
It also means that we will need to be careful when adding new syntax in general, which is not necessarily a bad place to be. A well-thought-out language demands that its designers take the long view.
This blog post was originally published on the Roblox Tech Blog.