src/compiler/Symbols.h
author David Anderson <dvander@alliedmods.net>
Sat Nov 17 18:41:01 2012 -0800 (2012-11-17)
changeset 195 c1ba166f3fc4
child 196 9904633b1dfc
permissions -rw-r--r--
Massive SemA/BC refactoring to support better type systems. Read on for more.

The previous compiler had a simple pipeline, best outlined as:
(1) Parsing and name binding -> AST
(2) Storage allocation -> AST
(3) Semantic Analysis (SemA) -> AST
(4) Bytecode Compilation (BC) -> Code

This pipeline, unfortunately, had two problems. First, name binding cannot be performed in one pass if we wish to have multi-file support in the form of modules. Name binding must be two-pass.

Second, because SemA was only capable of annotating AST nodes with a single type, most coercion work had to be duplicated in the BC phase, as coercion was not explicit in the SemA output. This made introducing new type rules extremely difficult, and all but precluded concepts like operator overloading (or overloading at all).

This patch rewrites most of Keima's backend. Of interest are the following changes:
(1) CompileContext has been refactored around future multi-file support.
(2) Parsing no longer performs any name binding.
(3) The grammar for type-and-name has been changed from:
(Label | Identifier)? Identifier
to:
Type? Identifier
where:
Type ::= OldType | NewType
OldType ::= Label
NewType ::= Identifier ("." Identifier)*


Although we do not implement the full production for Type yet, this
distinction is important. A type identifier is now affixed to the AST
as an Expression (right now, always a NameProxy), so it can fully
participate in name binding.
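A minimal C++ sketch of the new production. It assumes a hypothetical token stream (`Tok`, `Token`) and result type (`TypeAndName`), and it is not Keima's actual parser. It shows the key point: a NewType is kept as an unresolved dotted name so that name binding can resolve it later, and a lone identifier is simply a name with no type.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical token kinds. A Label models an old-style tag such as "Float:"
// (stored here without the trailing ':').
enum class Tok { Label, Identifier, Dot };

struct Token {
    Tok kind;
    std::string text;
};

// The type is kept as an unresolved dotted path (standing in for a NameProxy
// expression) so that a later name-binding pass can resolve it.
struct TypeAndName {
    bool isOldType = false;             // true if the type was a Label
    std::vector<std::string> typePath;  // empty if no type was written
    std::string name;
};

// Type? Identifier
//   Type    ::= OldType | NewType
//   OldType ::= Label
//   NewType ::= Identifier ("." Identifier)*
// Returns std::nullopt on a syntax error.
std::optional<TypeAndName> parseTypeAndName(const std::vector<Token> &toks) {
    TypeAndName out;
    std::size_t i = 0;
    const std::size_t n = toks.size();

    if (i < n && toks[i].kind == Tok::Label) {
        // OldType: a label must be followed by the declared name.
        out.isOldType = true;
        out.typePath.push_back(toks[i++].text);
        if (i >= n || toks[i].kind != Tok::Identifier)
            return std::nullopt;
        out.name = toks[i++].text;
    } else {
        // Read Identifier ("." Identifier)*.
        if (i >= n || toks[i].kind != Tok::Identifier)
            return std::nullopt;
        std::vector<std::string> path{toks[i++].text};
        while (i + 1 < n && toks[i].kind == Tok::Dot && toks[i + 1].kind == Tok::Identifier) {
            path.push_back(toks[i + 1].text);
            i += 2;
        }
        if (i < n && toks[i].kind == Tok::Identifier) {
            out.typePath = path;            // NewType followed by the name
            out.name = toks[i++].text;
        } else if (path.size() == 1) {
            out.name = path[0];             // no type: just an Identifier
        } else {
            return std::nullopt;            // dotted path with no name after it
        }
    }
    if (i != n)
        return std::nullopt;                // trailing tokens
    return out;
}
```

For example, the tokens of `a.b.Point p` yield the type path `a`, `b`, `Point` and the name `p`, while `x` alone yields a name with an empty type path.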

(4) Immediately after parsing, the AST goes through a NamePopulation phase.
This phase creates Scope objects for every scope which declares a name,
and if those names are declaring types or functions, a Symbol is created
and entered into the scope. Symbols entirely replace the old BoundName
classes.
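Splitting population from binding is what makes uses-before-declarations (and, later, cross-file references) bindable. A minimal sketch, assuming a toy program made of hypothetical `Item` records instead of Keima's real AST: pass one enters every function and type name, so pass two can bind a call to a function declared further down. A single-pass binder is included only for contrast.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for AST nodes: a flat list of declarations and uses.
enum class DeclKind { Function, Type, Variable };

struct Item {
    bool isDecl;        // true: declares `name`; false: uses `name`
    DeclKind kind;      // only meaningful for declarations
    std::string name;
};

struct Symbol {
    DeclKind kind;
    std::string name;
};

// A scope reduced to a name -> Symbol container.
using Scope = std::map<std::string, Symbol>;

// Pass 1 (population): enter functions and types. Variables are skipped,
// since their Symbols are created later, during binding.
void populate(const std::vector<Item> &items, Scope &globals) {
    for (const Item &it : items) {
        if (it.isDecl && it.kind != DeclKind::Variable)
            globals.emplace(it.name, Symbol{it.kind, it.name});
    }
}

// Pass 2 (binding): every use is looked up in the fully populated scope.
// Returns the names that could not be bound.
std::vector<std::string> bindUses(const std::vector<Item> &items, const Scope &globals) {
    std::vector<std::string> unbound;
    for (const Item &it : items) {
        if (!it.isDecl && globals.count(it.name) == 0)
            unbound.push_back(it.name);
    }
    return unbound;
}

// For contrast: a single pass that can only see names declared so far.
std::vector<std::string> bindOnePass(const std::vector<Item> &items) {
    Scope seen;
    std::vector<std::string> unbound;
    for (const Item &it : items) {
        if (it.isDecl)
            seen.emplace(it.name, Symbol{it.kind, it.name});
        else if (seen.count(it.name) == 0)
            unbound.push_back(it.name);
    }
    return unbound;
}
```

With a program that uses `helper` before declaring it, the two-pass version reports only genuinely missing names, while the one-pass version also rejects `helper`.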

(5) After NamePopulation comes NameBinding. This phase performs three steps:
(a) Links Scope objects together, to form a tree.
(b) Binds any free name to an existing name in the scope chain.
(c) Creates and registers any Symbols for names which have not yet been
added to the scope (for example, local variables).

Accordingly, all "allocation" and "binding" concepts have been removed
from Scopes, which are now lightweight container classes.
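Steps (a) through (c) can be illustrated with a toy scope tree. This is a sketch under assumptions: `ToyScope`, `linkScopes`, `lookup`, and `declareLocal` are hypothetical names, and real Scopes hold `Symbol` objects rather than strings. Binding a free name walks outward through the parent links until some scope declares it, so an inner declaration shadows an outer one.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lightweight scope: a parent link plus the names it declares.
struct ToyScope {
    ToyScope *parent = nullptr;
    std::map<std::string, std::string> names;  // name -> kind ("function", "local", ...)

    // (c) Register a name that population did not create, e.g. a local.
    void declareLocal(const std::string &name) { names[name] = "local"; }

    // (b) Bind a free name by walking the scope chain outward.
    // Returns the declaring scope, or nullptr if the name is unbound.
    const ToyScope *lookup(const std::string &name) const {
        for (const ToyScope *s = this; s; s = s->parent) {
            if (s->names.count(name))
                return s;
        }
        return nullptr;
    }
};

// (a) Link a child scope into the tree.
void linkScopes(ToyScope &child, ToyScope &parent) {
    child.parent = &parent;
}
```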

(6) SemA now generates bytecode for each statement in the AST. Expression
compilation is performed via a separate mechanism called HIR (High-level
Intermediate Representation). To analyze an expression, SemA will walk
the AST, and produce a HIR object for each node. HIR is typed, and SemA
may insert coercion nodes, or even expand an AST node into multiple HIR
nodes. The result of evaluating an expression in SemA is therefore the
root of a HIR tree, which SemA then sends to the HIRTranslator, which
performs bytecode generation.

As before, SemA still performs full semantic analysis. However, it now
also produces each function's bytecode.

This split allows us to decompose potentially complex semantics into
a more fine-grained, AST-like structure, which can have very simple
code-generation logic. For example, (1.0 + 5) might look like:

HAdd(HFloat(1.0), HConvert(HInteger(5), <Float>))
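The sketch below shows how an analysis step could produce that tree: when the operand types of an addition disagree, the integer side is wrapped in an explicit conversion node, so the code generator never has to rediscover the coercion. The node names mirror the example (`HAdd`, `HFloat`, `HInteger`, `HConvert`), but their layout, the `coerce`/`analyzeAdd` helpers, and the int-to-float promotion rule are assumptions for illustration, not Keima's actual HIR.

```cpp
#include <cassert>
#include <iomanip>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

enum class Ty { Int, Float };

static const char *tyName(Ty t) { return t == Ty::Int ? "Int" : "Float"; }

// Every HIR node carries its type; print() renders the tree.
struct HIR {
    Ty type;
    explicit HIR(Ty t) : type(t) {}
    virtual ~HIR() = default;
    virtual std::string print() const = 0;
};
using HPtr = std::unique_ptr<HIR>;

struct HInteger : HIR {
    int value;
    explicit HInteger(int v) : HIR(Ty::Int), value(v) {}
    std::string print() const override { return "HInteger(" + std::to_string(value) + ")"; }
};

struct HFloat : HIR {
    double value;
    explicit HFloat(double v) : HIR(Ty::Float), value(v) {}
    std::string print() const override {
        std::ostringstream os;
        os << "HFloat(" << std::fixed << std::setprecision(1) << value << ")";
        return os.str();
    }
};

struct HConvert : HIR {
    HPtr operand;
    HConvert(HPtr op, Ty to) : HIR(to), operand(std::move(op)) {}
    std::string print() const override {
        return "HConvert(" + operand->print() + ", <" + tyName(type) + ">)";
    }
};

struct HAdd : HIR {
    HPtr left, right;
    HAdd(HPtr l, HPtr r, Ty t) : HIR(t), left(std::move(l)), right(std::move(r)) {}
    std::string print() const override {
        return "HAdd(" + left->print() + ", " + right->print() + ")";
    }
};

// Wrap a node in an explicit conversion only when its type differs.
HPtr coerce(HPtr node, Ty to) {
    if (node->type == to)
        return node;
    return std::make_unique<HConvert>(std::move(node), to);
}

// Assumed rule: if either side is Float, the addition is done in Float.
HPtr analyzeAdd(HPtr left, HPtr right) {
    Ty result = (left->type == Ty::Float || right->type == Ty::Float) ? Ty::Float : Ty::Int;
    HPtr l = coerce(std::move(left), result);
    HPtr r = coerce(std::move(right), result);
    return std::make_unique<HAdd>(std::move(l), std::move(r), result);
}
```

Calling `analyzeAdd` on `HFloat(1.0)` and `HInteger(5)` prints exactly the tree shown above; adding two integers inserts no conversion at all.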

As part of this decomposition, many opcodes have been removed. Rather than
using typed jumps, we now expand jumps into longer tests. For example:
jge.f <label>
becomes:
ge.f
cvt.f2b
jt <label>

This simplifies the pipeline, and JITs should be able to melt the added
work away.

HIR is not used at the statement level, and HIR has no concept
of control flow.
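The jump expansion in item (6) can be sketched as a small rewrite over textual opcodes. Only the `jge.f` case comes from the text; generalizing it to any `j<cond>.<t>` and spelling the conversion `cvt.<t>2b` for other type suffixes is an assumption, as is the string-based instruction representation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Expands a typed conditional jump such as "jge.f" into a typed comparison,
// a conversion of its result to a boolean, and an untyped "jt".
// Anything that is not of the form j<cond>.<t> is returned unchanged.
std::vector<std::string> expandTypedJump(const std::string &op, const std::string &label) {
    std::string::size_type dot = op.find('.');
    bool typedJump = op.size() > 1 && op[0] == 'j' && dot != std::string::npos &&
                     dot > 1 && dot + 1 < op.size();
    if (!typedJump)
        return {op + " " + label};
    std::string cond = op.substr(1, dot - 1);   // e.g. "ge"
    std::string type = op.substr(dot + 1);      // e.g. "f"
    return {
        cond + "." + type,      // ge.f       : typed comparison
        "cvt." + type + "2b",   // cvt.f2b    : convert to a boolean
        "jt " + label,          // jt <label> : jump if true
    };
}
```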

(7) The old BytecodeCompiler has been removed, as its work is now split
between SemA and HIRTranslator.


In addition, some minor refactorings have taken place:

(1) Opcodes.tbl no longer hardcodes numbers (aaah).
(2) Type has been split into smaller, typed structs.
(3) BytecodeEmitter no longer relies on Pools or RootScopes.
(4) SemA is now responsible for variable/storage allocation.
(5) Native tables are now global, rather than per-module. As such, native
declarations now result in a Native object (which still requires a
CALLNATIVE opcode), which is an index into the global table. Natives
must be bound globally. This is in preparation for module support.
(6) Publics are no longer tracked, but rather registered via a global
callback upon module load. This callback can be set by embedders. This
is in preparation for multi-file support.
(7) The BytecodeEmitter now performs Symbol allocations itself, and it
also generates Code objects itself, which removes a good deal of
complexity.
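Items (5) and (6) both replace per-module tables with process-wide mechanisms. A sketch of that shape, assuming hypothetical names (`Native`, `NativeRegistry`, `PublicCallback`, `gOnPublic`, `loadModule`) and a `std::function` callback; the real embedder interface may look different. A native declaration resolves to an index into one global table (the operand a CALLNATIVE would use), and publics are reported to an embedder-supplied callback at load time rather than being stored.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A native declaration resolves to an index into one global table.
struct Native {
    unsigned index;
};

class NativeRegistry {
  public:
    // Binding the same name twice yields the same global index.
    Native bind(const std::string &name) {
        auto it = byName_.find(name);
        if (it != byName_.end())
            return Native{it->second};
        unsigned index = static_cast<unsigned>(names_.size());
        names_.push_back(name);
        byName_[name] = index;
        return Native{index};
    }
    const std::string &nameAt(unsigned index) const { return names_.at(index); }

  private:
    std::vector<std::string> names_;
    std::map<std::string, unsigned> byName_;
};

// Publics are not tracked; the embedder sees each one when a module loads.
using PublicCallback = std::function<void(const std::string &moduleName,
                                          const std::string &publicName)>;

PublicCallback gOnPublic;  // set by the embedder; may be left empty

void loadModule(const std::string &moduleName, const std::vector<std::string> &publics) {
    for (const std::string &p : publics) {
        if (gOnPublic)
            gOnPublic(moduleName, p);
    }
}
```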

Finally, a test harness has been added. This harness is a Python script which recursively finds *.test files in the tests folder. For each file, it runs the corresponding .sp file. The contents of the .test file must match the combined stdout+stderr.
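The harness itself is a Python script; to keep to one language on this page, here is the same discover-run-compare logic sketched in C++17. The `Runner` hook that returns combined stdout+stderr is a hypothetical parameter (a real harness would launch the compiler or VM on the .sp file), and pairing each *.test with a same-named .sp in the same directory is an assumption.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

static std::string readFile(const fs::path &p) {
    std::ifstream in(p, std::ios::binary);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

// Hypothetical hook: runs the given .sp file and returns stdout+stderr.
using Runner = std::function<std::string(const fs::path &spFile)>;

// Recursively finds *.test files under `root`, runs the matching .sp file,
// and returns the .test files whose expected output did not match.
std::vector<fs::path> runTests(const fs::path &root, const Runner &runScript) {
    std::vector<fs::path> failures;
    for (const auto &entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file() || entry.path().extension().string() != ".test")
            continue;
        fs::path sp = entry.path();
        sp.replace_extension(".sp");
        if (runScript(sp) != readFile(entry.path()))
            failures.push_back(entry.path());
    }
    return failures;
}
```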
/* vim: set ts=4 sw=4 tw=99 et:
 *
 * Copyright (C) 2012 David Anderson
 *
 * This file is part of SourcePawn.
 *
 * SourcePawn is free software: you can redistribute it and/or modify it under
 * the terms of the GNU General Public License as published by the Free
 * Software Foundation, either version 3 of the License, or (at your option)
 * any later version.
 *
 * SourcePawn is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
 * FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along with
 * SourcePawn. If not, see http://www.gnu.org/licenses/.
 */
#ifndef _include_sp2_symbol_h_
#define _include_sp2_symbol_h_

#include <assert.h>
#include "../PoolAllocator.h"
#include "../Handles.h"
#include "../Opcodes.h"
#include "../Types.h"

namespace ke {

class String;

#define SYMBOL_KINDS(_) \
    /* Any kind of variable or argument produces a VariableSymbol. */ \
    _(Variable)         \
    /* A function declaration produces a FunctionSymbol. */ \
    _(Function)         \
    /* A named constant produces a ConstantSymbol. */ \
    _(Constant)         \
    /* A named type (class, struct, typedef, etc.) produces a TypeSymbol. */ \
    _(Type)             \
    /* A module import produces a ModuleSymbol. */   \
    _(Module)

#define _(name)     class name##Symbol;
SYMBOL_KINDS(_)
#undef _

class Scope;

// A symbol represents the declaration of a named entity.
class Symbol : public PoolObject
{
  public:
    enum Kind {
#       define _(name) k##name,
        SYMBOL_KINDS(_)
#       undef _
        kTotalSymbolKinds
    };

  public:
    Symbol(Scope *scope, Handle<String> name, const SourcePosition &pos)
      : scope_(scope),
        name_(name),
        pos_(pos)
    {
    }

    virtual Kind kind() const = 0;

    Handle<String> name() const {
        return name_;
    }
    const SourcePosition &pos() const {
        return pos_;
    }
    Handle<Type> type() const {
        return type_;
    }
    Scope *scope() const {
        return scope_;
    }

  public:
#define _(name)                                             \
    bool is##name() const {                                 \
        return (kind() == k##name);                         \
    }                                                       \
    name##Symbol *as##name() {                              \
        if (is##name())                                     \
            return to##name();                              \
        return NULL;                                        \
    }                                                       \
    name##Symbol *to##name() {                              \
        assert(is##name());                                 \
        return reinterpret_cast<name##Symbol *>(this);      \
    }
    SYMBOL_KINDS(_)
#undef _

  private:
    Scope *scope_;
    ScopedRoot<String> name_;
    SourcePosition pos_;

  protected:
    ScopedRoot<Type> type_;
};

class VariableSymbol : public Symbol
{
  public:
    enum Storage {
        Unknown,
        Local,
        Heap
    };

  public:
    VariableSymbol(Scope *scope, Handle<String> name, const SourcePosition &pos)
      : Symbol(scope, name, pos),
        storage_(Unknown)
    {
    }

    VariableSymbol(Scope *scope, Handle<String> name, const SourcePosition &pos, Handle<Type> type)
      : Symbol(scope, name, pos),
        storage_(Unknown)
    {
        type_ = type;
    }

    Kind kind() const {
        return kVariable;
    }
    void setType(Type *type) {
        type_ = type;
    }
    void allocate(Storage storage, unsigned slot) {
        storage_ = storage;
        slot_ = slot;
    }
    Storage storage() const {
        return storage_;
    }
    unsigned slot() const {
        assert(storage() != Unknown);
        return slot_;
    }

  private:
    Storage storage_;
    unsigned slot_;
};

class TypeSymbol : public Symbol
{
  public:
    TypeSymbol(Scope *scope, Handle<String> name, Handle<Type> type)
      : Symbol(scope, name, SourcePosition())
    {
        type_ = type;
    }

    Kind kind() const {
        return kType;
    }
};

class FunctionSymbol : public Symbol
{
  public:
    FunctionSymbol(Scope *scope, Handle<String> name, const SourcePosition &pos, Handle<FunctionType> type)
      : Symbol(scope, name, pos)
    {
        type_ = type;
    }

    Kind kind() const {
        return kFunction;
    }
    FunctionType *type() const {
        return FunctionType::cast(type_);
    }
};

}

#endif // _include_sp2_symbol_h_
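The header derives the `kVariable`..`kModule` enumerators and the `isX`/`asX`/`toX` helpers from a single `SYMBOL_KINDS` list. The standalone sketch below reproduces that X-macro pattern outside the SourcePawn tree so it compiles on its own; `MINI_KINDS`, `MiniSymbol`, `MiniVariable`, and `MiniFunction` are hypothetical. One deliberate difference: the header defines the helpers while the subclasses are only forward-declared, which rules out `static_cast` there; this sketch defines `asX`/`toX` after the subclasses are complete, so it can use `static_cast`.

```cpp
#include <cassert>
#include <string>
#include <utility>

// One list drives the forward declarations, the Kind enum, and the helpers.
#define MINI_KINDS(_) \
    _(Variable)       \
    _(Function)

#define _(name) class Mini##name;
MINI_KINDS(_)
#undef _

class MiniSymbol {
  public:
    enum Kind {
#define _(name) k##name,
        MINI_KINDS(_)
#undef _
        kTotalKinds
    };

    explicit MiniSymbol(std::string name) : name_(std::move(name)) {}
    virtual ~MiniSymbol() = default;
    virtual Kind kind() const = 0;
    const std::string &name() const { return name_; }

    // isX() is defined inline; asX()/toX() are only declared here, because
    // MiniVariable and MiniFunction are still incomplete types at this point.
#define _(name)                                           \
    bool is##name() const { return kind() == k##name; }   \
    Mini##name *as##name();                               \
    Mini##name *to##name();
    MINI_KINDS(_)
#undef _

  private:
    std::string name_;
};

class MiniVariable : public MiniSymbol {
  public:
    using MiniSymbol::MiniSymbol;
    Kind kind() const override { return kVariable; }
};

class MiniFunction : public MiniSymbol {
  public:
    using MiniSymbol::MiniSymbol;
    Kind kind() const override { return kFunction; }
};

// Now that the subclasses are complete, the downcasts can use static_cast.
#define _(name)                                       \
    inline Mini##name *MiniSymbol::as##name() {       \
        return is##name() ? to##name() : nullptr;     \
    }                                                 \
    inline Mini##name *MiniSymbol::to##name() {       \
        assert(is##name());                           \
        return static_cast<Mini##name *>(this);       \
    }
MINI_KINDS(_)
#undef _
```

Typical use mirrors the header: test with `isVariable()`, use `asFunction()` when a null result is acceptable, and reserve `toFunction()` for cases where the kind is already known.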