Representation of data types
 

Almost every part of the compiler deals with data types in one way or another. Symbols depend on them the most, since most compile-time type checks are based on the data types of symbols. AST nodes also carry data types, which covers expressions (including casts and conversions).

A data type is represented as a combination of:

    • dtype integer
      • 5 bits: raw type:
        • void (unknown type, e.g.: any ptr, type t as t)
        • byte, ubyte
        • char (zstring pointers and their deref expressions)
        • short, ushort
        • wchar (wstring pointers and their deref expressions)
        • integer, uinteger
        • enum (integer)
        • long, ulong
        • longint, ulongint
        • single, double
        • string (variable length)
        • fixstr (fixed length strings, string * N, N is the type's length)
        • struct (UDT, -> subtype is used)
        • namespace (used during name mangling?)
        • function (used for function pointers, -> subtype contains full function declaration)
        • forward reference (will be changed to actual raw type when known, -> subtype is used)
        • pointer (this value is only used temporarily as a result of the typeGet() macro)
        • xmmword (used by SSE emitter)
      • 4 bits: PTR count
        The number of PTRs on the type, at most 8. If it is greater than 0, the data type is a pointer.
      • 9 bits: CONST mask (8 PTR's + 1 "base")

        Example                         CONST mask
        const integer                   000000001    (first CONST bit set)
        integer const ptr               000000001    (ditto)
        const integer ptr               000000010    (pointer to const)
        const integer ptr const ptr     000000101    (const pointer to pointer to const)

    • subtype, which for some types points to a symbol:
      • For UDT types (structs/classes, enums) this points to the corresponding UDT symbol
      • For forward-referencing typedefs this points to a special forward reference symbol which will eventually be replaced by the actual subtype symbol, when it's known.
      • For procedure pointers, this points to an anonymous symbol that defines the calling convention etc. and, most importantly, the result and parameter types.
    • length integer
      This is used in places that have to calculate sizes (e.g. structure size calculations, pointer arithmetic, stack offsets).
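
The dtype integer described above can be sketched as a packed bit field. The following C fragment is only an illustration: the field widths (5, 4 and 9 bits) come from the list above, but the bit positions (raw type in the lowest bits, then the PTR count, then the CONST mask), the raw type numbers, and every name (dtype_t, dt_make(), dt_add_ptr(), and so on) are assumptions made for this example and are not taken from the compiler sources.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths are taken from the text. The bit positions and all names
   below are assumptions made for this sketch. */
#define DT_TYPE_BITS   5
#define DT_PTR_BITS    4
#define DT_CONST_BITS  9
#define DT_PTR_SHIFT   DT_TYPE_BITS
#define DT_CONST_SHIFT (DT_TYPE_BITS + DT_PTR_BITS)
#define DT_TYPE_MASK   ((1u << DT_TYPE_BITS) - 1u)
#define DT_PTR_MASK    ((1u << DT_PTR_BITS) - 1u)
#define DT_CONST_MASK  ((1u << DT_CONST_BITS) - 1u)
#define DT_MAX_PTRS    8u

/* Hypothetical raw type numbers; only the first few types are listed. */
enum { RT_VOID, RT_BYTE, RT_UBYTE, RT_CHAR, RT_SHORT, RT_USHORT, RT_WCHAR, RT_INTEGER };

typedef uint32_t dtype_t;

/* Pack raw type, PTR count and CONST mask into one integer. */
dtype_t dt_make(unsigned raw, unsigned ptrs, unsigned cmask)
{
    assert(raw <= DT_TYPE_MASK && ptrs <= DT_MAX_PTRS && cmask <= DT_CONST_MASK);
    return raw | (ptrs << DT_PTR_SHIFT) | (cmask << DT_CONST_SHIFT);
}

unsigned dt_raw(dtype_t dt)        { return dt & DT_TYPE_MASK; }
unsigned dt_ptr_count(dtype_t dt)  { return (dt >> DT_PTR_SHIFT) & DT_PTR_MASK; }
unsigned dt_const_mask(dtype_t dt) { return (dt >> DT_CONST_SHIFT) & DT_CONST_MASK; }

/* A PTR count greater than 0 means the data type is a pointer. */
int dt_is_pointer(dtype_t dt)      { return dt_ptr_count(dt) > 0; }

/* Mark the outermost level as CONST (bit 0 of the CONST mask). */
dtype_t dt_set_const(dtype_t dt)
{
    return dt | (1u << DT_CONST_SHIFT);
}

/* Add one PTR level: the existing CONST bits move up by one position and
   the new outermost level (bit 0) starts out non-const. */
dtype_t dt_add_ptr(dtype_t dt)
{
    assert(dt_ptr_count(dt) < DT_MAX_PTRS);
    return dt_make(dt_raw(dt), dt_ptr_count(dt) + 1,
                   (dt_const_mask(dt) << 1) & DT_CONST_MASK);
}

/* Remove one PTR level: the CONST bit of the pointer itself is dropped. */
dtype_t dt_deref(dtype_t dt)
{
    assert(dt_is_pointer(dt));
    return dt_make(dt_raw(dt), dt_ptr_count(dt) - 1, dt_const_mask(dt) >> 1);
}
```

Under these assumptions, "const integer ptr const ptr" is built as dt_set_const(dt_add_ptr(dt_add_ptr(dt_set_const(dt_make(RT_INTEGER, 0, 0))))), which gives a PTR count of 2 and the CONST mask 000000101 from the table. Shifting the mask whenever a PTR level is added or removed keeps bit 0 tied to the outermost level, which is why "const integer" and "integer const ptr" share the same mask.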