author David Anderson <>
Sat Nov 17 18:41:01 2012 -0800 (2012-11-17)
changeset 195 c1ba166f3fc4
parent 106 8534dbfb8c1a
permissions -rw-r--r--
Massive SemA/BC refactoring to support better type systems. Read on for more.

The previous compiler had a simple pipeline, best outlined as:
(1) Parsing and name binding -> AST
(2) Storage allocation -> AST
(3) Semantic Analysis (SemA) -> AST
(4) Bytecode Compilation (BC) -> Code

This pipeline, unfortunately, had two problems. First, name binding cannot be performed in one pass if we wish to have multi-file support in the form of modules. Name binding must be two-pass.

Second, because SemA was only capable of annotating AST nodes with a single type, most coercion work had to be duplicated in the BC phase, as coercion was not explicit in the SemA output. This made introducing new type rules extremely difficult, and all but precluded concepts like operator overloading (or overloading at all).

This patch rewrites most of Keima's backend. Of interest are the following changes:
(1) CompileContext has been refactored around future multi-file support.
(2) Parsing no longer performs any name binding.
(3) The grammar for type-and-name has been changed from:
(Label | Identifier)? Identifier
Type? Identifier
Type ::= OldType | NewType
OldType ::= Label
NewType ::= Identifier (. NewType)*

Although we do not implement the full production for Type yet, this
distinction is important. A type identifier is now affixed to the AST
as an Expression (right now, always a NameProxy), so it can fully
participate in name binding.

(4) Immediately after parsing, the AST goes through a NamePopulation phase.
This phase creates Scope objects for every scope which declares a name,
and if those names are declaring types or functions, a Symbol is created
and entered into the scope. Symbols entirely replace the old BoundName

(5) After NamePopulation comes NameBinding. This phase performs three steps:
(a) Links Scope objects together, to form a tree.
(b) Binds any free name to an existing name in the scope chain.
(c) Creates and registers any Symbols for names which have not yet been
added to the scope (for example, local variables).

In accordance, all "allocation" and "binding" concepts have been removed
from Scopes, which are now lightweight container classes.

(6) SemA now generates bytecode for each statement in the AST. Expression
compilation is performed via a separate mechanism called HIR (High-level
Intermediate Representation). To analyze an expression, SemA will walk
the AST, and produce a HIR object for each node. HIR is typed, and SemA
may insert coercion nodes, or even expand an AST node into multiple HIR
nodes. The result of evaluating an expression in SemA is therefore the
the root of a HIR tree, which SemA then sends to the HIRTranslator, which
performs bytecode generation.

As before, SemA still performs full semantic analysis. However, not it
also produces each function's bytecode.

This split allows us to decompose potentially complex semantics into
a more fine-grained, AST-like structure, which can have very simple
code-generation logic. For example, (1.0 + 5) might look like:

HAdd(HFloat(1.0), HConvert(HInteger(5), <Float>))

As part of this decomposition, many opcodes have been removed. Rather than
using typed jumps, we now expand jumps into longer tests. For example:
jge.f <label>
jt <label>

This simplifies the pipeline, and JITs should be able to melt the added
work away.

HIR is not used at a statement level, and HIR does not have any concept
of control-flow.

(7) The old BytecodeCompiler has been removed, as its work is now split
between SemA and HIRTranslator.

In addition, some minor refactorings have taken place:

(1) Opcodes.tbl no longer hardcodes numbers (aaah).
(2) Type has been split into smaller, typed structs.
(3) BytecodeEmitter no longer relies on Pools or RootScopes.
(4) SemA is now responsible for variable/storage allocation.
(5) Native table are now global, rather than per-module. As such, native
declarations now result in a Native object (which still requires a
CALLNATIVE opcode), which is an index into the global table. Natives
must be bound globally. This is in preparation for module support.
(6) Publics are no longer tracked, but rather registered via a global
callback upon module load. This callback can be set by embedders. This
is in preparation for multi-file support.
(7) The BytecodeEmitter now performs Symbol allocations itself, and it
also generates Code objects itself, which removes a good deal of

Finally, a test harness has been added. This harness is a python script which finds *.test files, recursively, in the tests folder. For each file it runs the corresponding .sp. The contents of the .test file must match stdout+stderr.
     1 # vim: set ts=2 sw=2 tw=99 noet ft=python: 
     2 # 
     3 # Copyright (C) 2004-2012 David Anderson
     4 # 
     5 # This file is part of SourcePawn.
     6 # 
     7 # SourcePawn is free software: you can redistribute it and/or modify it under
     8 # the terms of the GNU General Public License as published by the Free
     9 # Software Foundation, either version 3 of the License, or (at your option)
    10 # any later version.
    11 # 
    12 # SourcePawn is distributed in the hope that it will be useful, but WITHOUT ANY
    13 # WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
    14 # FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
    15 # 
    16 # You should have received a copy of the GNU General Public License along with
    17 # SourcePawn. If not, see
    18 #
    19 import os
    20 import os.path
    21 import shutil
    22 import ambuild.osutil as osutil
    23 from ambuild.command import Command
    25 job = AMBuild.AddJob('dist')
    27 # Single file copy
    28 class CopyFile(Command):
    29 	def __init__(self, filename, fromPath, toPath):
    30 		Command.__init__(self)
    31 		self.filename = filename
    32 		self.fromPath = os.path.join(fromPath, filename)
    33 		self.toPath = toPath
    35 	def shouldCopy(self):
    36 		tofile = os.path.join(self.toPath, self.filename)
    37 		if not osutil.FileExists(tofile):
    38 			return True
    39 		return osutil.IsFileNewer(self.fromPath, tofile)
    41 	def run(self, runner, job):
    42 		if not self.shouldCopy():
    43 			return
    44 		runner.PrintOut('copy {0} to {1}'.format(self.fromPath, self.toPath))
    45 		shutil.copy(self.fromPath, self.toPath)
    47 # Create a folder
    48 class CreateFolder(Command):
    49 	def __init__(self, folder):
    50 		Command.__init__(self)
    51 		self.folder = folder
    53 	def run(self, runner, job):
    54 		runner.PrintOut('mkdir {0}'.format(self.folder))
    55 		os.makedirs(self.folder)
    57 def AddCopy(job, filename, fromPath, toPath):
    58 	job.AddCommand(CopyFile(filename, fromPath, toPath))
    60 def AddFolder(job, folder):
    61 	if not os.path.exists(os.path.join(AMBuild.outputFolder, 'dist', folder)):
    62 		job.AddCommand(CreateFolder(folder))
    64 libkeima = osutil.StaticLibPrefix() + 'keima' + osutil.StaticLibSuffix()
    65 spshell = 'spshell' + osutil.ExecutableSuffix()
    67 AddFolder(job, 'include')
    68 AddFolder(job, 'lib')
    69 AddFolder(job, 'bin')
    71 AddCopy(job,
    72         'keima.h',
    73         os.path.join(AMBuild.sourceFolder, 'include'),
    74         'include')
    75 AddCopy(job,
    76         'keima_util.h',
    77         os.path.join(AMBuild.sourceFolder, 'include'),
    78         'include')
    79 AddCopy(job,
    80         libkeima,
    81         os.path.join(AMBuild.outputFolder, 'keima'),
    82         'lib')
    83 AddCopy(job,
    84         spshell,
    85         os.path.join(AMBuild.outputFolder, 'spshell'),
    86         'bin')