Understanding Go compiler tools (2)

Today I read memory management in go compiler.

There are go.y , y.tab.[ch] , lex.c and other files in src/cmd/gc/ directory. It seems that the go compiler has a lexical analyzer written by hand and a syntax parser generated by yacc ( bison ). lex.c also contains the main function for the compiler.

Here are the head lines of main function:

int
main(int argc, char *argv[])
{
        int i, c;
        NodeList *l;
        char *p;

        signal(SIGBUS, fault);
        signal(SIGSEGV, fault);

        localpkg = mkpkg(strlit(""));
        localpkg->prefix = "\"\"";

        builtinpkg = mkpkg(strlit("go.builtin"));

        gostringpkg = mkpkg(strlit("go.string"));
        gostringpkg->name = "go.string";
        gostringpkg->prefix = "go.string";        // not go%2estring

        runtimepkg = mkpkg(strlit("runtime"));
        runtimepkg->name = "runtime";

....

It is easy to guess what they do. They initializes signal handlers and bootstrap some standard packages.

Also we can see the declaration of NodeList *l . The parser must be going to store the result into the list.

memory management

I guessed that strlit function converts a C string into a String representation for the compiler. It was correct. The definition of strlit is in subr.c and it returns a pointer to Strlit . Strlit in go.h is commented as:

/*
 * note this is the representation
 * of the compilers string literals,
 * it is not the runtime representation
 */

Here I wondered how the go compiler manages heap memory for Strlit and others.

In general, memory management is an important part, although not a core topic. This is because there are many object to allocate during parsing sources and compiling. A parser must try to parse any input. It allocates some memory whenever it reads a term, and little by little constructs a syntax tree.

I heard that gcc uses garbage collection to mange memory on parsing and compiling. Ruby interpreter (CRuby) uses Ruby's garbage collector in order to manage heap memory for parser.

In the case of Go compiler, memory management seems naive.

Strlit*
strlit(char *s)
{
        Strlit *t;

        t = mal(sizeof *t + strlen(s));
        strcpy(t->s, s);
        t->len = strlen(s);
        return t;
}

strlit just calls mal and copies data into the allocated memory. mal must be a wrapper of malloc .

Here is the source of mal :

void*
mal(int32 n)
{
        void *p;

        if(n >= NHUNK) {
                p = malloc(n);
                if(p == nil) {
                        flusherrors();
                        yyerror("out of memory");
                        errorexit();
                }
                memset(p, 0, n);
                return p;
        }

        while((uintptr)hunk & MAXALIGN) {
                hunk++;
                nhunk--;
        }
        if(nhunk < n)
                gethunk();

        p = hunk;
        nhunk -= n;
        hunk += n;
        memset(p, 0, n);
        return p;
}

For large n it just calls malloc . For small n it adjusts alignment and returns a part of memory pointed by hunk .

This is the implementation of gethunk . There is nothing special.

static void
gethunk(void)
{
        char *h;
        int32 nh;

        nh = NHUNK;
        if(thunk >= 10L*NHUNK)
                nh = 10L*NHUNK;
        h = (char*)malloc(nh);
        if(h == nil) {
                flusherrors();
                yyerror("out of memory");
                errorexit();
        }
        hunk = h;
        nhunk = nh;
        thunk += nh;
}

Here is the implementation of remal , you know, a realloc equivalent.

void*
remal(void *p, int32 on, int32 n)
{
        void *q;

        q = (uchar*)p + on;
        if(q != hunk || nhunk < n) {
                if(on+n >= NHUNK) {
                        q = mal(on+n);
                        memmove(q, p, on);
                        return q;
                }
                if(nhunk < on+n)
                        gethunk();
                memmove(hunk, p, on);
                p = hunk;
                hunk += on;
                nhunk -= on;
        }
        hunk += n;
        nhunk -= n;
        return p;
}

It cares the case of extending the last part of hunk , however, that's all. It leaves heap memory leaked. The heap memory is collected by the end of process.

The go compiler seems to manage heap memory really naively and generously.

Here is another example. struct Node , which is the representation of a node in syntax tree, is just a concatenation of members which are necessary for each type of node. It is even not an union.

struct Node
{

....

        // if-body
        NodeList*        nelse;

        // cases
        Node*        ncase;

        // func
        Node*        nname;
        Node*        shortname;
....
}

Conslusion

The go team does not seem to care about this kind part of the implementation. They prerfer keeping the source simple to strictly implementing it.

I think they have a plan to make the go compiler self-hosted. Go have a garbage collector, a parser generator and even a parser for Go. So they can replace the go compiler with one written in Go itself.