2011-01-18

Understanding Go compiler tools (2)

Golang

Today I read the memory management code in the Go compiler.

The src/cmd/gc/ directory contains go.y, y.tab.[ch], lex.c and other files. It seems that the Go compiler has a hand-written lexical analyzer and a syntax parser generated by yacc (bison). lex.c also contains the compiler's main function.

Here are the first lines of the main function:

int
main(int argc, char *argv[])
{
        int i, c;
        NodeList *l;
        char *p;

        signal(SIGBUS, fault);
        signal(SIGSEGV, fault);

        localpkg = mkpkg(strlit(""));
        localpkg->prefix = "\"\"";

        builtinpkg = mkpkg(strlit("go.builtin"));

        gostringpkg = mkpkg(strlit("go.string"));
        gostringpkg->name = "go.string";
        gostringpkg->prefix = "go.string";        // not go%2estring

        runtimepkg = mkpkg(strlit("runtime"));
        runtimepkg->name = "runtime";

....

It is easy to guess what they do: they install signal handlers and bootstrap some standard packages.

We can also see the declaration of NodeList *l. Presumably the parser stores its result in this list.

memory management

I guessed that the strlit function converts a C string into the compiler's own string representation, and that guess was correct. strlit is defined in subr.c and returns a pointer to a Strlit. The definition of Strlit in go.h carries this comment:

/*
 * note this is the representation
 * of the compilers string literals,
 * it is not the runtime representation
 */

This made me wonder how the Go compiler manages heap memory for Strlit and other objects.

In general, memory management is an important part of a compiler, even though it is not a core topic, because many objects are allocated while parsing and compiling source code. A parser must try to parse any input. It allocates some memory whenever it reads a term, and constructs a syntax tree little by little.

I have heard that gcc uses garbage collection to manage memory during parsing and compiling. The Ruby interpreter (CRuby) uses Ruby's own garbage collector to manage heap memory for its parser.

In the case of the Go compiler, memory management seems naive.

Strlit*
strlit(char *s)
{
        Strlit *t;

        t = mal(sizeof *t + strlen(s));
        strcpy(t->s, s);
        t->len = strlen(s);
        return t;
}

strlit just calls mal and copies the data into the allocated memory. mal must be a wrapper around malloc.
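To make the "header plus bytes in one allocation" idea concrete, here is a minimal sketch of my own. MyStrlit and my_strlit are hypothetical names, and I use a C99 flexible array member with plain calloc; I have not checked how go.h actually declares the s member, so this is not the compiler's definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the compiler's Strlit (the real one is in go.h). */
typedef struct MyStrlit {
    size_t len;  /* number of bytes in s, not counting the terminating NUL */
    char   s[];  /* flexible array member: the bytes follow the header */
} MyStrlit;

/* Allocate the header and the string bytes as a single block. */
MyStrlit *my_strlit(const char *src)
{
    size_t n = strlen(src);
    MyStrlit *t = calloc(1, sizeof *t + n + 1);  /* +1 for the NUL */
    assert(t != NULL);
    memcpy(t->s, src, n + 1);
    t->len = n;
    return t;
}
```

A caller would write MyStrlit *t = my_strlit("go.string"); and read t->s and t->len. Unlike the compiler, this sketch expects the caller to free(t).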

Here is the source of mal:

void*
mal(int32 n)
{
        void *p;

        if(n >= NHUNK) {
                p = malloc(n);
                if(p == nil) {
                        flusherrors();
                        yyerror("out of memory");
                        errorexit();
                }
                memset(p, 0, n);
                return p;
        }

        while((uintptr)hunk & MAXALIGN) {
                hunk++;
                nhunk--;
        }
        if(nhunk < n)
                gethunk();

        p = hunk;
        nhunk -= n;
        hunk += n;
        memset(p, 0, n);
        return p;
}

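For large n (NHUNK or more) it just calls malloc. For small n it aligns hunk and hands out the next n bytes of the memory that hunk points to, fetching a new hunk first if the current one is too small.

This is the implementation of gethunk. There is nothing special: it allocates a new hunk of NHUNK bytes, or 10*NHUNK bytes once the total allocated so far (thunk) has reached 10*NHUNK.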
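The technique here is usually called a bump (or arena) allocator. Below is a small sketch of the same idea, written by me for illustration. CHUNK, ALIGN, cur, left and bump_alloc are hypothetical names, the sizes are not the compiler's NHUNK or MAXALIGN, and I assume malloc returns memory aligned to at least ALIGN bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK = 4096, ALIGN = 8 };  /* illustrative values only */

static char  *cur;   /* next free byte in the current chunk */
static size_t left;  /* bytes remaining in the current chunk */

/* Return n zeroed bytes. Nothing is ever freed individually. */
void *bump_alloc(size_t n)
{
    if (n >= CHUNK) {                    /* large request: go straight to the heap */
        void *big = calloc(1, n);
        assert(big != NULL);
        return big;
    }
    /* Padding needed to round cur up to a multiple of ALIGN. */
    size_t pad = (size_t)(-(uintptr_t)cur & (ALIGN - 1));
    if (cur == NULL || left < pad + n) {
        cur = malloc(CHUNK);             /* the rest of the old chunk is abandoned */
        assert(cur != NULL);
        left = CHUNK;
        pad = 0;                         /* assumed: malloc result is aligned */
    }
    cur += pad;
    left -= pad;
    void *p = cur;
    cur += n;
    left -= n;
    memset(p, 0, n);
    return p;
}
```

Two consecutive small requests come out of the same chunk, each starting on an ALIGN boundary; allocation is just a pointer increment, which is why this style is fast and why nothing can be given back.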

static void
gethunk(void)
{
        char *h;
        int32 nh;

        nh = NHUNK;
        if(thunk >= 10L*NHUNK)
                nh = 10L*NHUNK;
        h = (char*)malloc(nh);
        if(h == nil) {
                flusherrors();
                yyerror("out of memory");
                errorexit();
        }
        hunk = h;
        nhunk = nh;
        thunk += nh;
}

Here is the implementation of remal, the equivalent of realloc.

void*
remal(void *p, int32 on, int32 n)
{
        void *q;

        q = (uchar*)p + on;
        if(q != hunk || nhunk < n) {
                if(on+n >= NHUNK) {
                        q = mal(on+n);
                        memmove(q, p, on);
                        return q;
                }
                if(nhunk < on+n)
                        gethunk();
                memmove(hunk, p, on);
                p = hunk;
                hunk += on;
                nhunk -= on;
        }
        hunk += n;
        nhunk -= n;
        return p;
}

It handles the case of extending the block at the very end of the current hunk in place, but that is all. In every other case the old block is simply abandoned, so heap memory leaks until the operating system reclaims it when the process exits.

The Go compiler seems to manage heap memory really naively and generously.

Here is another example. struct Node, which represents a node in the syntax tree, is just a concatenation of the members needed by each type of node. It is not even a union.
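The "extend in place only if it is the last block" rule can be shown with a much smaller sketch. This is my own simplification, not the compiler's code: pool, used, pool_alloc and pool_grow are hypothetical names, there is a single fixed pool instead of a chain of hunks, and alignment is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char   pool[1024];  /* one fixed chunk keeps the sketch short */
static size_t used;        /* bytes handed out so far */

void *pool_alloc(size_t n)
{
    assert(used + n <= sizeof pool);
    void *p = pool + used;
    used += n;
    return p;
}

/* Grow a block of old_n bytes by extra bytes. */
void *pool_grow(void *p, size_t old_n, size_t extra)
{
    if ((char *)p + old_n == pool + used) {
        /* p is the most recent block: just move the end marker. */
        assert(used + extra <= sizeof pool);
        used += extra;
        return p;
    }
    /* Otherwise copy to a fresh block; the old bytes are never reclaimed. */
    void *q = pool_alloc(old_n + extra);
    memcpy(q, p, old_n);
    return q;
}
```

Growing a block right after allocating it returns the same pointer. Growing it after something else has been allocated returns a new pointer and leaves the old bytes behind as dead space.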

struct Node
{

....

        // if-body
        NodeList*        nelse;

        // cases
        Node*        ncase;

        // func
        Node*        nname;
        Node*        shortname;
....
}
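For comparison, here is a sketch of the flat layout next to the tagged-union layout a more memory-conscious implementation might use. Only the member names nelse, ncase and nname come from the excerpt above; FlatNode, UnionNode, the op field and the void* types are my own simplifications.

```c
#include <assert.h>
#include <stddef.h>

/* Flat layout: every node carries every member, whatever its kind. */
struct FlatNode {
    int   op;      /* node kind (hypothetical field) */
    void *nelse;   /* meaningful only for if */
    void *ncase;   /* meaningful only for cases */
    void *nname;   /* meaningful only for func */
};

/* Union layout: kind-specific members share the same storage. */
struct UnionNode {
    int op;
    union {
        struct { void *nelse; } ifs;
        struct { void *ncase; } cases;
        struct { void *nname; } func;
    } u;
};

/* C11 compile-time check that the union version is the smaller one. */
static_assert(sizeof(struct UnionNode) < sizeof(struct FlatNode),
              "union layout should be smaller");

/* Bytes saved per node by the union layout on this platform. */
size_t bytes_saved_per_node(void)
{
    return sizeof(struct FlatNode) - sizeof(struct UnionNode);
}
```

The flat version costs memory on every node, but any code can read any member without first checking the node kind, which fits the "keep the source simple" attitude.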

Conclusion

The Go team does not seem to care much about this kind of implementation detail. They prefer keeping the source simple to implementing it rigorously.

I think they plan to make the Go compiler self-hosting. Go has a garbage collector, a parser generator and even a parser for Go, so they can replace the Go compiler with one written in Go itself.

2011-01-17

Understanding Go compiler tools (1)

Golang

Recently I have been reading the source code of Go, and I have begun to understand its structure.

How to build

You can build the Go compiler and tools as described in the official documentation.

It is quite easy. But it confused me that I had to run ./all.bash instead of the usual make(1). I wonder why they don't simply use make. Anyway, the bash script calls make internally as usual.

After building, I got the following executables in the bin directory.

6a: assembler
6c: C compiler
6g: Go compiler
6l: linker
6nm: same as nm(1)?
cgo
ebnflint
godefs
godoc
gofmt
goinstall
gomake
gopack
gopprof
gotest
gotry
govet
goyacc
hgpatch
quietgcc: gcc wrapper which is a little quieter than plain gcc

The oddish *1 prefix "6" is there because I built the tools on the amd64 architecture. The prefix varies according to the architecture.

Only three architectures are implemented in the source:

6: amd64
8: i386
5: ARM
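The naming scheme is mechanical: one character for the architecture, then one letter for the tool (a, c, g or l). A tiny sketch of that mapping follows; tool_name and the archs table are hypothetical helpers of mine, not part of the Go tree, and the table holds only the three architectures listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Architecture name to Plan 9 style prefix, as listed in this post. */
static const struct { const char *arch; char prefix; } archs[] = {
    { "amd64", '6' },
    { "i386",  '8' },
    { "ARM",   '5' },
};

/* Writes a tool name such as "6g" into out.
 * role is 'a' (assembler), 'c' (C compiler), 'g' (Go compiler) or 'l' (linker).
 * Returns 0 on success, -1 if the architecture is not in the table. */
int tool_name(const char *arch, char role, char out[3])
{
    assert(arch != NULL && out != NULL);
    for (size_t i = 0; i < sizeof archs / sizeof archs[0]; i++) {
        if (strcmp(archs[i].arch, arch) == 0) {
            out[0] = archs[i].prefix;
            out[1] = role;
            out[2] = '\0';
            return 0;
        }
    }
    return -1;
}
```

For example, tool_name("amd64", 'g', buf) fills buf with the name of the Go compiler I built, and an architecture missing from the table yields -1.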

This convention comes from Plan 9. I found the definitions for other architectures in include/mach.h:

0: MIPS2 Little Endian
2: 68020
4: MIPS2 Big Endian
7: Alpha
k: Sparc
q: Power
u: Sparc 64
v: MIPS

2c(1) in Plan 9 describes one more architecture:

1: Motorola 68000

Directory structure

bin: where executables are generated
doc: documentation
include: C header files
lib: where some libraries are generated
misc: miscellaneous files
pkg: where the Go standard packages (libraries) are compiled
src: source code
test: test cases

Let's dive into src:

cmd: source code for the Go tool chain
lib9: Plan 9 C library?
libbio: buffered I/O library
libcgo: ?
libmach: library containing architecture-dependent code
pkg: source code of the Go standard packages

commands

Let's dive into the cmd directory.

5a/
5c/
5g/
5l/
6a/
6c/
6g/
6l/
8a/
8c/
8g/
8l/
cc/
cgo/
cov/
ebnflint/
gc/
godefs/
godoc/
gofmt/
goinstall/
gomake/
gopack/
gotest/
govet/
goyacc/
hgpatch/
ld/
nm/
prof/

The [568][acgl] directories contain the architecture-dependent parts of the sources. The architecture-independent parts of the Go compiler are in the gc directory.

I will read gc in detail in the next post.

*1: oddish, at least to me, having grown up in a GNU/Linux environment.

2011-01-09

Localizing irb messages

ruby

Did you know that Rubyists in Japan use irb in Japanese?

% irb --help
Usage:  irb.rb [options] [programfile] [arguments]
  -f            ~/.irbrc を読み込まない.
  -m            bcモード(分数, 行列の計算ができる)
  -d                $DEBUG をtrueにする(ruby -d と同じ)
  -r load-module    ruby -r と同じ.

Irb has long been able to localize its help message and some error messages, and for historical reasons it ships with a Japanese localization. I also improved this localization mechanism so that messages are converted correctly with String#encode and so that Irb can load a localization from a gem. But this is not well known. I could not find any other localization with a Google search.

Recent events

Recently Abinoam Jr. wrote a Portuguese localization for IRB. He sent it to me, and I suggested that he distribute it as a gem.

At that point I found that the mechanism that loads a localization from a gem was broken on Ruby 1.9.2. I fixed the problem in r30448, so the next patchlevel release of Ruby 1.9.2 will be able to load a localization from a gem again.

How to write a localization for Irb

It is quite easy. Suppose that we are writing a localization for the zh_JP.UTF-8@ancient locale, and let /path/to/somewhere be the working directory.

First, copy error.rb and help-message from $(rubylibdir)/irb/lc into /path/to/somewhere/irb/lc/zh_JP.UTF-8@ancient/. *1

Second, translate the copies of error.rb and help-message into zh_JP.UTF-8 like this:

# -*- coding: UTF-8 -*-
律:  irb [選項] [簒譜] [参數]
-f          不讀~/.irbrc
-m          bc態(得簒分數行列)
....

Put the correct magic comment in both files.

Then move the irb directory containing the translated files to somewhere under $LOAD_PATH. I think $(sitelibdir)/ is the preferred place.

Finally, you can distribute irb/lc/<your-locale>/* as a gem instead of installing it as a legacy-style library. Ruby 1.9.1 and the next release of Ruby 1.9.2 can load a localization from a gem.

*1: The Japanese localization has encoding_aliases.rb in addition to these two files, but encoding_aliases.rb is only there for backward compatibility with Ruby 1.8. You don't have to provide it in your locale.

2010-12-31

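A trip to Comiket 79

Otaku hobbies, Diary

I went to Comic Market 79. It feels like a long time since I last attended. Partly because I had been up until dawn the day before working on Passenger configuration and the like, I took it easy and showed up around noon.

ActiveWorks was back with a new book, but it was about Symphony (PHP), so I passed. Instead, the Solaris book I had missed last time was in stock this time, so I bought that.

Along the way I went to the microcontroller island, where a circle was making Squid Girl (Ika Musume) hats, so I bought a hat along with their new book. I had actually wanted to do this kind of low-effort Squid Girl cosplay, but gave up because I could not finish making the costume in time, so this was perfect. I wore the hat while going around the other circles.

Loot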

2010-12-30

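Migrating the system to Ruby 1.9

ruby

Until now this blog system ran on Ruby 1.8. Migrating it to Ruby 1.9 had long been a dream of mine, but I could not make it happen for quite a while.

In the early days after 1.9.1 came out, library support was hopeless. So as a first step I ported the postgresql driver and sent a patch upstream. I called for 1.9 support here and there, and by the time the libraries were finally ready I no longer had time for development myself. At the end of this year I finally found the time, so I migrated the system to 1.9.2.

I ran into a Rails bug, and I had to patch one old library a little because it did not support 1.9. Here are the things that tripped me up (I may write about them in more detail later).

The encoding of string data inside application/x-www-form-urlencoded is lost in transit, so the only option is to call force_encoding with a hard-coded encoding. Or perhaps the framework should implicitly add a parameter whose only purpose is to carry the encoding.
The filter mechanism of RDtool loses the encoding of the data at the point where it writes to a temporary file. What I expect is UTF-8, so I had no choice but to change the LANG environment variable so that default_external becomes UTF-8.
Some libraries (background_fu in this case) hard-code the ruby command as "ruby". On the server, however, the Ruby 1.9.2 command has a suffix so that it can coexist with 1.8. I had no choice but to rewrite the library to use rbconfig.
I carelessly installed passenger with aptitude, and it crashed at run time because that build was linked against Ruby 1.8. I reinstalled it from the gem under Ruby 1.9.
I do not really understand why concatenating a US-ASCII string and a UTF-8 string raises an incompatible-encoding exception. Is it a bug? I worked around it by setting default_external to UTF-8 and making sure that ASCII strings coming from outside become UTF-8.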

2010-12-07

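Thinking about role models for women in the world of technology

Gender identity disorder, Software development

Recently, several times in a row, I have been told something like "women who stand out in the world of technology (or open source) are rare." There was "10 reasons why Ruby should come to an end for once" (Rubyがそろそろ一回終わってみるべき10の理由), and a few other occasions. That coincidence made me think about a number of things. I would like to write those thoughts down here, in no particular order.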

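Premises and society

Whatever else is true, if you want to be active in the open source world, you have to love it. Sure, these days quite a few companies actively contribute to open source and try to make use of it. But development communities are full of people who love development so much that they spare neither time nor effort on it. If you want to accomplish something among them, it will be hard unless you do it "because I love it" rather than "because it is my job".

So, how many women are there who love software development that much? Software development is a very human activity, a social activity in which you deal with all kinds of people and think about what all kinds of people have in mind. But its entrance looks like very rugged technology. Technology, computers. The so-called "science track" world. And few women go into fields regarded as "science track" in the first place.

Many factors could explain why there are few women in the sciences. Much has already been said about them, and at the same time none of it seems to be a complete answer. Just off the top of my head: first, there may possibly be aptitudes based on biological factors. And such things may be considered unladylike, so there may be unconscious suppression at many points along a girl's path.

The overall tendency produced this way then feeds back positively. Suppose there is a girl with the makings of a kernel hacker. Even if she enjoys hacking on the kernel, her female friends are unlikely to get it; if she has no topics in common with them she gets ostracized, and rumors, true or not, get spread about her. Even if she is lucky enough to find "boy" friends she can talk to, that is good in the sense that she has found friends, but it can become a source of bad reputation within the girls' network. So the time she spends on the kernel, if any, shrinks. So she carries a handicap in growing into a kernel hacker. She may never get that far. She may come to find something else more interesting and go into, say, fashion.

Also, ambitions and goals are created by circumstances. For most people, anyway. Only a genius who opens up a new era can desire something nobody has ever thought of. Ordinary people can at best see someone doing something similar and want to do the same. So a girl who grows up seeing few women active in technology does not even have in her head the possibility that she herself could be active in technology. And she does not desire that possibility. She does not think of it as an interesting, wonderful future.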

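Role models

How can this situation be broken? The more visible the women active in technology are, the more chances future generations have to learn about one possibility for themselves. That is what a role model is. If many role models are provided, the vague illusion of "unladylike" will eventually collapse. Such unconsciously shared illusions will cease to exist.

So it is a good thing for someone to become "a woman who stands out in the world of technology".

In fact, it was with this in mind that I recently accepted the Rakuten Technology Award. If someone has to stand out, I intend to stand out as one of them. So I gratefully accepted the award, and I try not to turn down interviews and talks as far as I can. Well, to be honest, my primary aim is to provide a role model for transsexual people, more than for women.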

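Standpoint

Now, this is where I start to wonder. I try to stand out, and one of my motives is that I would like to be of some help to women active in the open source world. But I do not necessarily share the difficulties faced by the women I want to help.

I was decisively different from both cisgender girls and cisgender boys, so I always had a certain degree of freedom from the "male/female" classification and its constraints. You could also say that I was alienated from the system premised on that classification, but when you are alienated from a system you lose the enormous benefits it provides and at the same time escape the constraints it imposes. Nominally I was supposed to be inside that system, so I was not completely free, but at least the system did not fit my shape, so, like a fastener that does not fit a part, there was room for the part to move freely (or to move whether it wanted to or not). In other words, I was largely spared the conscious and unconscious suppression, from others or from herself, that a girl would face in the process of acquiring technology.

There was also this. Gender dysphoria, and trying to convince yourself that you are something other than what your gender identity indicates, turn all of life gray. Fashion, dealing with family, eating, talking, singing, running: all the everyday things that would normally be joys were nothing but pain to me, so I escaped into technology and the sciences *1. That is why I could immerse myself in technology without much hesitation.

So I wondered whether I, who do not share those difficulties, can be a model at all. I did not reach an answer. Being "a role model for women" is not my primary goal, it does not have to be me in particular, and if I cannot be one, that is fine too.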

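Aside

Because the timing coincided, I was also curious about Dan's mention of me in "好き嫌い以前の問題として - 書評 - 女ぎらい" (roughly, "A matter that comes before likes and dislikes - Book review - Onna-girai"), but after thinking it over, it did not have much to do with what I have discussed here.

What Dan is saying is, first, that gender-related matters do not come up in every aspect of life. Of course. Also, what appears in Dan's argument is at most gender in the sense of sexual self-perception and social relationships, and within that context, that "projection function", it is safe to state flatly that I am a woman. On the other hand, without an appropriate definition of the projection function, one cannot briefly state "the gender is X" even for Dan himself. Most people can flatly say their gender is one thing only because, within the range of projection functions used in everyday life, there exist genders "male/female" for which an equivalence relation roughly holds, and each person is mapped to one or the other under those functions. If we widen the choice of functions to the general case, then not just I but everyone is nothing other than that particular person. No one can be denoted by something concise that was shared in advance.

*1: Not only those. Writing novels, serving on the student council, leading a club: in short, anything other than things closely tied to daily life. The further something was from the everyday things everybody does as a matter of course, the better I was at it.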

2010-10-12

Pulling Strings with Puppet

System administration ruby Book review

Pulling Strings with Puppet: Configuration Management Made Easy (FirstPress)

Author: James Turnbull
Publisher: Apress
Release date: 2008/07/31
Format: Paperback

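I read "Pulling Strings with Puppet". It is an introductory book on Puppet.

What is Puppet

Puppet is a tool for automating the maintenance of system configuration, in the same family as cfengine and chef. It is basically client/server: a central server (puppetmasterd) manages the configuration information of the systems in one place. For example, "the web server machine has apache2 installed" or "apache2 has the virtual host foo.example set up". The client (the environment being configured, puppetd) receives this information. It then compares its own environment with the desired configuration and, if they differ, brings itself to the desired state.

The mechanism that brings the state into line is called a "provider", and an appropriate implementation is selected. For example, whether the packaging system is apt or ports, you can say "install apache2", so the configuration written on the server side does not depend on the client (configured) environment. The information about which machine (node) is of which kind (web server, application server, DB server, ...), that is, its class, can be managed in files on the server side or in LDAP.

There are various other features as well. If you wish, you can apply a local configuration to the local machine standalone without setting up a server, or use the server as a file server for serving sets of small files.

The book

This book is a thin, handy introduction to Puppet for system administrators. It carefully explains Puppet's concepts, syntax and so on. I had read magazine articles and the upstream wiki pages about puppet before, but some parts never quite clicked. This book cleared up those shaky parts of my understanding.

Puppet configuration is written in its own declarative external DSL, so no knowledge of Ruby is needed just to use puppet. Extending puppet, however, requires Ruby. The end of the book has a little information on extending puppet, but if you seriously need to write your own types or providers, that information alone will not be enough. Still, the book gives pointers to more detailed material, so it was quite useful.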