author David Anderson <dvander@alliedmods.net>
Sat Nov 17 18:41:01 2012 -0800 (2012-11-17)
changeset 195 c1ba166f3fc4
parent 60 581a9c793062
child 263 ba85a47ee414
permissions -rw-r--r--
Massive SemA/BC refactoring to support better type systems. Read on for more.

The previous compiler had a simple pipeline, best outlined as:
(1) Parsing and name binding -> AST
(2) Storage allocation -> AST
(3) Semantic Analysis (SemA) -> AST
(4) Bytecode Compilation (BC) -> Code

This pipeline, unfortunately, had two problems. First, name binding cannot be performed in one pass if we wish to have multi-file support in the form of modules. Name binding must be two-pass.

Second, because SemA was only capable of annotating AST nodes with a single type, most coercion work had to be duplicated in the BC phase, as coercion was not explicit in the SemA output. This made introducing new type rules extremely difficult, and all but precluded concepts like operator overloading (or overloading at all).

This patch rewrites most of Keima's backend. Of interest are the following changes:
(1) CompileContext has been refactored around future multi-file support.
(2) Parsing no longer performs any name binding.
(3) The grammar for type-and-name has been changed from:
(Label | Identifier)? Identifier
Type? Identifier
Type ::= OldType | NewType
OldType ::= Label
NewType ::= Identifier (. NewType)*

Although we do not implement the full production for Type yet, this
distinction is important. A type identifier is now affixed to the AST
as an Expression (right now, always a NameProxy), so it can fully
participate in name binding.

(4) Immediately after parsing, the AST goes through a NamePopulation phase.
This phase creates Scope objects for every scope which declares a name,
and if those names are declaring types or functions, a Symbol is created
and entered into the scope. Symbols entirely replace the old BoundName

(5) After NamePopulation comes NameBinding. This phase performs three steps:
(a) Links Scope objects together, to form a tree.
(b) Binds any free name to an existing name in the scope chain.
(c) Creates and registers any Symbols for names which have not yet been
added to the scope (for example, local variables).

In accordance, all "allocation" and "binding" concepts have been removed
from Scopes, which are now lightweight container classes.

(6) SemA now generates bytecode for each statement in the AST. Expression
compilation is performed via a separate mechanism called HIR (High-level
Intermediate Representation). To analyze an expression, SemA will walk
the AST, and produce a HIR object for each node. HIR is typed, and SemA
may insert coercion nodes, or even expand an AST node into multiple HIR
nodes. The result of evaluating an expression in SemA is therefore the
the root of a HIR tree, which SemA then sends to the HIRTranslator, which
performs bytecode generation.

As before, SemA still performs full semantic analysis. However, not it
also produces each function's bytecode.

This split allows us to decompose potentially complex semantics into
a more fine-grained, AST-like structure, which can have very simple
code-generation logic. For example, (1.0 + 5) might look like:

HAdd(HFloat(1.0), HConvert(HInteger(5), <Float>))

As part of this decomposition, many opcodes have been removed. Rather than
using typed jumps, we now expand jumps into longer tests. For example:
jge.f <label>
jt <label>

This simplifies the pipeline, and JITs should be able to melt the added
work away.

HIR is not used at a statement level, and HIR does not have any concept
of control-flow.

(7) The old BytecodeCompiler has been removed, as its work is now split
between SemA and HIRTranslator.

In addition, some minor refactorings have taken place:

(1) Opcodes.tbl no longer hardcodes numbers (aaah).
(2) Type has been split into smaller, typed structs.
(3) BytecodeEmitter no longer relies on Pools or RootScopes.
(4) SemA is now responsible for variable/storage allocation.
(5) Native table are now global, rather than per-module. As such, native
declarations now result in a Native object (which still requires a
CALLNATIVE opcode), which is an index into the global table. Natives
must be bound globally. This is in preparation for module support.
(6) Publics are no longer tracked, but rather registered via a global
callback upon module load. This callback can be set by embedders. This
is in preparation for multi-file support.
(7) The BytecodeEmitter now performs Symbol allocations itself, and it
also generates Code objects itself, which removes a good deal of

Finally, a test harness has been added. This harness is a python script which finds *.test files, recursively, in the tests folder. For each file it runs the corresponding .sp. The contents of the .test file must match stdout+stderr.
[email protected]
[email protected]
Microsoft Visual Studio Solution File, Format Version 11.00
[email protected]
# Visual C++ Express 2010
[email protected]
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "jitcraft", "src\jitcraft.vcxproj", "{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}"
[email protected]
[email protected]
[email protected]
	GlobalSection(SolutionConfigurationPlatforms) = preSolution
[email protected]
		Debug - Library|Win32 = Debug - Library|Win32
[email protected]
		Debug|Win32 = Debug|Win32
[email protected]
		Release|Win32 = Release|Win32
[email protected]
[email protected]
	GlobalSection(ProjectConfigurationPlatforms) = postSolution
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Debug - Library|Win32.ActiveCfg = Debug - Library|Win32
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Debug - Library|Win32.Build.0 = Debug - Library|Win32
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Debug|Win32.ActiveCfg = Debug|Win32
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Debug|Win32.Build.0 = Debug|Win32
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Release|Win32.ActiveCfg = Release|Win32
[email protected]
		{E12034DC-E8CB-4B49-AC0A-0D27CDED9A50}.Release|Win32.Build.0 = Release|Win32
[email protected]
[email protected]
	GlobalSection(SolutionProperties) = preSolution
[email protected]
		HideSolutionNode = FALSE
[email protected]
[email protected]