G-expressions: Makings of a Make-killer?

Published by Arun Isaac on

Tags: software, guix, lisp, scheme

Can Guix's G-expressions make for a superior Make-like build tool? Here's a proof-of-concept implementation imagining such a future.

The venerable Make is the foundation of many a build system. But its flaws and limitations are the stuff of much hair-pulling, and there have been many attempts to reinvent Make but do it right. Here is one more such attempt, this time with Guix, the powerful and precise functional package manager.

Thanks to Dan for prompting me to think about this at a recent Guix London meetup.

A quick primer on G-expressions

Under the hood, Guix is powered by G-expressions, an expressive and elegant domain-specific language (DSL) embedded in Scheme. G-expressions are a staging language used to express computation that should be performed by the Guix daemon. G-expressions are compiled down to low-level objects known as derivations and passed to the Guix daemon via RPC.

In this blog post, we will use local-file and computed-file to run computation on the Guix daemon. So, let me quickly introduce them to readers who may be unfamiliar with them.

Building store items

First, computed-file. This allows us to run arbitrary Scheme code and produce an output store item. For example, here's a computed-file that produces a store item, foo, with the text Hello world in it.

;; foo.scm
(use-modules (guix gexp))

(define foo-gexp
  #~(begin
      (call-with-output-file #$output
        (lambda (port)
          (display "Hello world" port)
          (newline port)))))

(computed-file "foo" foo-gexp)

The #~ and #$ forms are called gexp and ungexp. They are analogous to the quasiquote (backquote) and unquote operators ` and , of ordinary Scheme. When the G-expression is compiled to a derivation, the #$output in the G-expression is replaced with the absolute store path to the output.
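To make the analogy concrete, here is a small sketch in plain Scheme that runs in any Guile REPL without Guix. The names greeting and flags are made up for illustration. It also shows unquote-splicing (,@), whose G-expression counterpart #$@ appears later in the make.scm for sent.

```scheme
;; Plain Scheme quasiquotation; no Guix modules are needed.
;; greeting and flags are hypothetical names used only for illustration.
(define greeting "Hello world")
(define flags '("-O2" "-Wall"))

;; , (unquote) inserts one evaluated value, as #$ does in a G-expression.
(define display-form
  `(display ,greeting port))
;; display-form is now the list (display "Hello world" port)

;; ,@ (unquote-splicing) splices a list in place, as #$@ does.
(define gcc-form
  `(invoke "gcc" "-c" "hello.c" ,@flags))
;; gcc-form is now (invoke "gcc" "-c" "hello.c" "-O2" "-Wall")
```

The difference is that a quasiquoted form is just a list, whereas a G-expression also remembers which store items were spliced in, so that they become dependencies of the resulting derivation.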

We may build foo.scm like so:

$ guix build -f foo.scm
The following derivation will be built:
  /gnu/store/x1szpqlnl53sggl6zvfqjab426672v0q-foo.drv
building /gnu/store/x1szpqlnl53sggl6zvfqjab426672v0q-foo.drv...
successfully built /gnu/store/x1szpqlnl53sggl6zvfqjab426672v0q-foo.drv
/gnu/store/dnrq880msz18bs0knddvvf1lb0hb5z27-foo

Looking into /gnu/store/dnrq880msz18bs0knddvvf1lb0hb5z27-foo, we see Hello world as expected.

$ cat /gnu/store/dnrq880msz18bs0knddvvf1lb0hb5z27-foo
Hello world

Interning files into the store

Now, suppose we want a computed-file to perform some operation on a file outside the store. Since the Guix daemon can only see and operate on store items, it is not immediately apparent how to do this. So, we first need to get our file into the store. This is called interning, and this is where local-file comes in.

Here's an example that interns a file foo into the store, computes its size in bytes, and writes that size to a store item named length.

;; length.scm
(use-modules (guix gexp))

(define length-gexp
  #~(begin
      (use-modules (rnrs bytevectors)
                   (rnrs io ports))

      (call-with-output-file #$output
        (lambda (port)
          (display (bytevector-length
                    (call-with-input-file #$(local-file "foo")
                      get-bytevector-all))
                   port)
          (newline port)))))

(computed-file "length" length-gexp)

Just as before, the expression in #$ is replaced with the absolute store path to the interned file. Building and inspecting the output, we see:

$ cat foo
The quick brown fox jumps over the lazy dog.
$ guix build -f length.scm
The following derivation will be built:
  /gnu/store/msm070kdk94vn5lwwnh6h2h0a4f0pfng-length.drv
building /gnu/store/msm070kdk94vn5lwwnh6h2h0a4f0pfng-length.drv...
successfully built /gnu/store/msm070kdk94vn5lwwnh6h2h0a4f0pfng-length.drv
/gnu/store/m5v48lgiq0wkkybm77kxcvayf1c5mhqk-length
$ cat /gnu/store/m5v48lgiq0wkkybm77kxcvayf1c5mhqk-length
45
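The same #$ mechanism is not limited to interned files. It also accepts other file-like objects, including another computed-file, and this is what lets build steps be chained. Here is a minimal sketch of such a chain; the file name chain.scm and the store item names upcased and upcased-length are made up for illustration, and the same foo file as above is assumed to sit next to the script.

```scheme
;; chain.scm -- a sketch; assumes a file named foo next to this script.
(use-modules (guix gexp))

;; First stage: write an upcased copy of the interned file.
(define upcased
  (computed-file "upcased"
                 #~(begin
                     (use-modules (ice-9 textual-ports))

                     (call-with-output-file #$output
                       (lambda (port)
                         (put-string port
                                     (string-upcase
                                      (call-with-input-file #$(local-file "foo")
                                        get-string-all))))))))

;; Second stage: count the characters in the first stage's output.
;; Referring to upcased with #$ makes it a dependency of this derivation.
(computed-file "upcased-length"
               #~(begin
                   (use-modules (ice-9 textual-ports))

                   (call-with-output-file #$output
                     (lambda (port)
                       (display (string-length
                                 (call-with-input-file #$upcased
                                   get-string-all))
                                port)
                       (newline port)))))
```

Running guix build -f chain.scm should build upcased first and then upcased-length. Nothing tells Guix about that order explicitly; it follows from the #$upcased reference.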

Our G-expression build system

Hello world

Now that we are done with the preliminaries, let's try to use G-expressions to compile a simple C hello world program. Here's hello.c, the program.

#include <stdio.h>

int main ()
{
  printf("Hello world!\n");
  return 0;
}

Let's express the build rules in a make.scm in the form of G-expressions.

(use-modules (gnu packages commencement)
             (guix gexp))

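(define hello-gexp
  ;; Make the (guix build utils) module available on the build side.
  (with-imported-modules '((guix build utils))
    #~(begin
        (use-modules (guix build utils))

        ;; The build runs in an isolated environment, so point gcc's
        ;; search paths at the toolchain explicitly.
        (set-path-environment-variable "PATH"
                                       '("bin")
                                       (list #$gcc-toolchain))
        (set-path-environment-variable "C_INCLUDE_PATH"
                                       '("include")
                                       (list #$gcc-toolchain))
        (set-path-environment-variable "LIBRARY_PATH"
                                       '("lib")
                                       (list #$gcc-toolchain))
        ;; Compile the interned hello.c straight into the output store item.
        (invoke "gcc" #$(local-file "hello.c")
                "-o" #$output))))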

(computed-file "hello" hello-gexp)

Building this and running, we see:

$ guix build -f make.scm
The following derivation will be built:
  /gnu/store/0rdw34dj0wpk6xfd5pbavf0xbjp84nw0-hello.drv
building /gnu/store/0rdw34dj0wpk6xfd5pbavf0xbjp84nw0-hello.drv...
environment variable `PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/bin'
environment variable `C_INCLUDE_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/include'
environment variable `LIBRARY_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/lib'
successfully built /gnu/store/0rdw34dj0wpk6xfd5pbavf0xbjp84nw0-hello.drv
/gnu/store/y72klmvlmf4v7kbhp6krpj8rbv4xracq-hello
$ /gnu/store/y72klmvlmf4v7kbhp6krpj8rbv4xracq-hello
Hello world!

Just as with Make, hello is rebuilt only if hello.c changes. But, unlike with Make, hello is also rebuilt if anything in hello-gexp changes. In effect, this is like having a Make that can detect changes to its Makefile and rebuild accordingly. In fact, we go even beyond that and can detect changes to gcc-toolchain or indeed any of the dependencies that were used to build gcc-toolchain. Thus, every time you run guix build -f make.scm, you know you are getting the most up-to-date and correct build of hello. You never have to force a make clean and rebuild everything from scratch. This is the same wonderful experience we enjoy when we run guix build to build a Guix package, but now for your everyday C compiler runs as well.

A more complex C program—building sent

A hello world build system may seem too trivial. How well does this system hold up for more complex projects? Let's try to write a make.scm build system for sent, the suckless plaintext presentation tool. sent has several separate source files and headers. So hopefully, this will be a reasonable test case.

(use-modules (gnu packages commencement)
             (gnu packages fontutils)
             (gnu packages xorg)
             (guix gexp))

(define set-up-gcc-gexp
  (with-imported-modules '((guix build utils))
    #~(begin
        (use-modules (guix build utils))

        (set-path-environment-variable "PATH"
                                       '("bin")
                                       (list #$gcc-toolchain))
        (set-path-environment-variable "C_INCLUDE_PATH"
                                       '("include")
                                       (list #$gcc-toolchain))
        (set-path-environment-variable "LIBRARY_PATH"
                                       '("lib")
                                       (list #$gcc-toolchain)))))

(define* (compile source-filename #:key (flags '()))
  (computed-file (string-append (basename source-filename ".c")
                                ".o")
                 #~(begin
                     #$set-up-gcc-gexp
                     (invoke "gcc" "-c" #$(local-file source-filename)
                             "-o" #$output
                             #$@flags))))

(define* (link output-filename object-files #:key (flags '()))
  (computed-file output-filename
                 #~(begin
                     #$set-up-gcc-gexp
                     (invoke "gcc" "-o" #$output
                             #$@object-files
                             #$@flags))))

(let ((include-dependencies
       (directory-union "include"
                        (list (file-append fontconfig "/include")
                              (file-append freetype "/include/freetype2")
                              (file-append libx11 "/include")
                              (file-append libxft "/include")
                              (file-append libxrender "/include")
                              (file-append xorgproto "/include")))))
  (link "sent"
        (list (let ((headers `(("arg.h" ,(local-file "sent/arg.h"))
                               ("config.h" ,(local-file "sent/config.def.h"))
                               ("drw.h" ,(local-file "sent/drw.h"))
                               ("util.h" ,(local-file "sent/util.h")))))
                (compile "sent/sent.c"
                         #:flags (list "-DVERSION=\"1\""
                                       "-I" include-dependencies
                                       "-I" (file-union "include" headers))))
              (let ((headers `(("drw.h" ,(local-file "sent/drw.h"))
                               ("util.h" ,(local-file "sent/util.h")))))
                (compile "sent/drw.c"
                         #:flags (list "-I" include-dependencies
                                       "-I" (file-union "include" headers))))
              (let ((headers `(("util.h" ,(local-file "sent/util.h")))))
                (compile "sent/util.c"
                         #:flags (list "-I" (file-union "include" headers)))))
        #:flags (list "-L" (file-append fontconfig "/lib")
                      "-L" (file-append libx11 "/lib")
                      "-L" (file-append libxft "/lib")
                      "-lfontconfig" "-lm" "-lX11" "-lXft")))

There you go! A bit long, but still very comprehensible. To build this, we must first download the sent release tarball and extract it.

$ wget https://dl.suckless.org/tools/sent-1.tar.gz
$ mkdir sent
$ tar -C sent -xvf sent-1.tar.gz

Then, we build as usual.

$ guix build -f make.scm
The following derivations will be built:
  /gnu/store/1xh1s4hs7m3mh9qfak4afzrjmhz84rlq-drw.o.drv
  /gnu/store/3z5f5z5dpz9sp13zrbfscjw9dmkknflv-util.o.drv
  /gnu/store/m7rvki1kyyh2gpwmx8y71dabj1zlihn2-sent.o.drv
  /gnu/store/2649ylh902lfd695avpy5p52h6ixawvr-sent.drv
building /gnu/store/1xh1s4hs7m3mh9qfak4afzrjmhz84rlq-drw.o.drv...
environment variable `PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/bin'
environment variable `C_INCLUDE_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/include'
environment variable `LIBRARY_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/lib'
successfully built /gnu/store/1xh1s4hs7m3mh9qfak4afzrjmhz84rlq-drw.o.drv
building /gnu/store/m7rvki1kyyh2gpwmx8y71dabj1zlihn2-sent.o.drv...
environment variable `PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/bin'
environment variable `C_INCLUDE_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/include'
environment variable `LIBRARY_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/lib'
successfully built /gnu/store/m7rvki1kyyh2gpwmx8y71dabj1zlihn2-sent.o.drv
building /gnu/store/3z5f5z5dpz9sp13zrbfscjw9dmkknflv-util.o.drv...
environment variable `PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/bin'
environment variable `C_INCLUDE_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/include'
environment variable `LIBRARY_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/lib'
successfully built /gnu/store/3z5f5z5dpz9sp13zrbfscjw9dmkknflv-util.o.drv
building /gnu/store/2649ylh902lfd695avpy5p52h6ixawvr-sent.drv...
environment variable `PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/bin'
environment variable `C_INCLUDE_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/include'
environment variable `LIBRARY_PATH' set to `/gnu/store/bymwhjw3jhh1izylgwsrlv0w0y881chq-gcc-toolchain-11.4.0/lib'
successfully built /gnu/store/2649ylh902lfd695avpy5p52h6ixawvr-sent.drv
/gnu/store/q2liz3kcivy0czys2nw8q3r5nsy2r6bi-sent

I have shown here a static mapping of dependencies between source files and headers. We don't have dynamic detection of headers in the manner of gcc -MM. However, we have all the power of Scheme in our make.scm files, and it should be straightforward to implement such dynamic dependencies.
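As a rough sketch of what the first step of such detection might look like, the following plain Guile code lists the headers that a C source includes with double quotes. The names local-include-rx and local-includes are made up for illustration. This is much cruder than gcc -MM: it does not follow includes transitively, it knows nothing about the preprocessor, and it would not know that config.h has to be taken from config.def.h.

```scheme
;; A sketch only: find the #include "..." lines of a C source file.
;; local-include-rx and local-includes are hypothetical names.
(use-modules (ice-9 rdelim)
             (ice-9 regex))

;; Matches lines such as: #include "drw.h"
;; Angle-bracket includes such as <stdio.h> are deliberately ignored.
(define local-include-rx
  (make-regexp "^[ \t]*#[ \t]*include[ \t]*\"([^\"]+)\""))

(define (local-includes port)
  "Return the names of the headers that the C source read from PORT
includes with double quotes, in order of appearance."
  (let loop ((line (read-line port))
             (headers '()))
    (if (eof-object? line)
        (reverse headers)
        (let ((m (regexp-exec local-include-rx line)))
          (loop (read-line port)
                (if m
                    (cons (match:substring m 1) headers)
                    headers))))))

;; Example with an in-memory source:
(call-with-input-string
    "#include <stdio.h>\n#include \"drw.h\"\n#include \"util.h\"\n"
  local-includes)
;; => ("drw.h" "util.h")
```

For a file on disk, (call-with-input-file "sent/drw.c" local-includes) would return the corresponding list, which could then be turned into the header lists passed to file-union.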

More examples

I'll stop here in the interest of keeping this blog post brief. But I have created a companion repo with more examples. Of particular interest may be a build system for a statically linked sent [1] and a build system for a pandoc-powered blog.

You can try out the pandoc blog build system like so.

$ git clone https://git.systemreboot.net/gexp-make/
$ cd gexp-make/pandoc-blog
$ guix build -f make.scm

Discussion

Unique selling points

No make clean or make -B necessary

Our build system is aware of changes to source files, the build rules themselves, and even to the full transitive closure of dependencies. Whenever any of these change, we know to rebuild all affected build products. Therefore, you never have to run anything like make clean or make -B just to be sure. You have complete peace of mind that what you get is what you ordered.

No timestamps

Our build system is entirely based on hashes: of the contents of source files, and of the derivations built from them. We never use timestamps. Timestamps are surprisingly error-prone, and full of dangerous pitfalls. See mtime comparison considered harmful.

Directories are not special

Make, due to its reliance on timestamps, does not handle directories very well. This is a mess, and workarounds abound. To us, files and directories are all the same—they are just store items.
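As an illustration, here is a minimal sketch that interns a whole directory and uses it as a single input. It assumes a directory named sent next to the script; the file name count.scm and the store item name file-count are made up for illustration.

```scheme
;; count.scm -- a sketch; assumes a directory named sent next to this script.
(use-modules (guix gexp))

(define source-directory
  ;; #:recursive? #t interns the directory along with everything under it.
  (local-file "sent" #:recursive? #t))

;; Write the number of files found under the interned directory.
(computed-file "file-count"
               (with-imported-modules '((guix build utils))
                 #~(begin
                     (use-modules (guix build utils))

                     (call-with-output-file #$output
                       (lambda (port)
                         (display (length (find-files #$source-directory))
                                  port)
                         (newline port))))))
```

The directory is one store item like any other. A change to any file inside it yields a different store item, and hence a rebuild of file-count.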

autotools, cmake, etc. rendered obsolete

Just like Guix itself, our build system is precise par excellence. We specify all dependencies very precisely down to exact versions, commits, compiler toolchain that built them, etc. This means that all the old cruft like autotools, cmake, etc. is rendered obsolete and replaced by clean Scheme code.

Pristine source tree

All build outputs are in the store. You don't have to make clean even to clean up your source tree. And, you don't even need a .gitignore!

Everything is Scheme

Make and the many build systems contending for its place each invent their own language with its own syntax. Our build system, on the other hand, uses good old Scheme. S-expressions free us from syntax, and enable easy composition of our DSLs. Extending our build system, say for supporting your favourite programming language better, is a simple matter of writing a Scheme library.

The Build Systems à la Carte taxonomy

In the language of Build Systems à la Carte (an excellent paper I highly recommend reading!), our proposed build system is a minimal cloud build system without early cutoff. It keeps a cloud cache (the Guix store) as persistent build information and uses a topological scheduler with static dependencies.

Minimality

We never rebuild anything unless the inputs or the build rules have changed. Therefore, our build system is minimal.

Persistent build information

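We cache all past builds in the Guix store. The store path of each build output is derived from a hash of its derivation, that is, of its inputs and build rules. This is how we know not to repeat builds: if the output path already exists, there is nothing to do.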
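One way to see this is to ask Guix for the store paths of a file-like object without building it. The following is a sketch using lower-object and run-with-store; the procedure name show-store-paths is made up for illustration, and calling it needs a running Guix daemon.

```scheme
;; A sketch; show-store-paths is a hypothetical helper name.
(use-modules (guix derivations)
             (guix gexp)
             (guix store))

(define foo
  (computed-file "foo"
                 #~(call-with-output-file #$output
                     (lambda (port)
                       (display "Hello world" port)
                       (newline port)))))

(define (show-store-paths object)
  "Print the derivation and output store paths of OBJECT without building it."
  (with-store store
    ;; Lowering turns the file-like object into a derivation.
    (let ((drv (run-with-store store (lower-object object))))
      (format #t "derivation: ~a~%" (derivation-file-name drv))
      (format #t "output:     ~a~%" (derivation->output-path drv)))))

;; Needs a running Guix daemon, for example from within guix repl:
;; (show-store-paths foo)
```

Calling show-store-paths twice on an unchanged object should print the same paths both times. Editing the G-expression, or anything it depends on, should change them.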

Scheduler

Tasks are executed in topological order.

Dependencies

Dependencies are static. But, we could cheat a little, like Make does, and make it pseudo-dynamic by constructing parts of the G-expressions programmatically using Scheme code. For example, in the sent make.scm, we could write some Scheme code to automatically find the headers that each source file depends on.
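Another small instance of this kind of cheating is to discover the source files themselves instead of listing them by hand. Here is a sketch in plain Guile; the procedure name c-sources is made up for illustration.

```scheme
;; A sketch; c-sources is a hypothetical helper name.
(use-modules (ice-9 ftw))

(define (c-sources directory)
  "Return the sorted list of .c files directly inside DIRECTORY, or the
empty list if DIRECTORY cannot be read."
  (map (lambda (file)
         (string-append directory "/" file))
       (or (scandir directory
                    (lambda (file)
                      (string-suffix? ".c" file)))
           '())))

;; With the compile procedure from the sent make.scm in scope, and ignoring
;; the per-file flags, the object files could then be described as:
;;   (map compile (c-sources "sent"))
```

The directory is scanned when make.scm is evaluated, before any derivation is built. From the daemon's point of view the dependencies are therefore still static.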

Early cutoff

Guix does not implement early cutoff optimization at the moment. But, it could do so in the future. Early cutoff optimization may even help alleviate some of the world-rebuilding that burdens the project's build farms.

Cloud caching

The Guix store is shared by all users on the machine. So, in effect, this constitutes a "cloud build system", where users can benefit from build products created by other users. With guix offload, this can even happen over the network allowing users on different machines to benefit from each other's builds. This particularly makes sense in projects where build tools are computationally expensive and can benefit from running on a powerful build farm. FPGA toolchains come to mind.

Obstacles to adoption and possible mitigation

Needless to say, our G-expression build system isn't exactly going to take the world by storm. It depends on Guix, a rather heavy-weight dependency. And, it requires a running Guix daemon. Users of other distros, who are not (yet?) sold on Guix, will be hesitant to pick this up. If you use this build system in your project, you'll probably just frustrate packagers and make your project unpopular. However, some mitigation may be possible.

Rootless Guix daemon

A rootless Guix daemon has recently come out. Guix might, in the future, be able to spawn off a daemon on demand thus hiding the daemon from the user's point of view. Thus the user may not have to explicitly install and manage a Guix daemon.

As one of two build systems

A project may choose to have two separate build systems—one G-expression powered and the other a more conventional Make-like build system. The G-expression powered build system may be more powerful and recommended for developers. But downstream and casual users may use the alternative conventional build system. And, depending on the complexity of the project, casual users and packagers who don't hack on the code may only need a build script, not a full-fledged Make-like build system.

DSL that can compile to a Makefile [1]

Another way could be to have a higher-level DSL that can be compiled both to a Makefile and to our G-expression build system. That way, all parties are happy. But, that's starting to sound a lot like automake. So, beware!
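To give a feel for the Makefile half of that idea, here is a sketch in plain Guile. The rule format (a target, a list of dependencies and a command) and the names rule->makefile and rules->makefile are invented for illustration; the G-expression half is not shown.

```scheme
;; A sketch; the rule format and procedure names are hypothetical.
(use-modules (ice-9 match))

(define (rule->makefile rule)
  "Return RULE, a list of target, dependencies and command, as Makefile text."
  (match rule
    ((target (dependencies ...) command)
     (string-append target ": " (string-join dependencies " ")
                    "\n\t" command "\n"))))

(define (rules->makefile rules)
  "Return the Makefile text for the list RULES."
  (string-join (map rule->makefile rules) "\n"))

(define rules
  '(("util.o" ("sent/util.c" "sent/util.h")
     "gcc -c sent/util.c -o util.o")
    ("drw.o" ("sent/drw.c" "sent/drw.h" "sent/util.h")
     "gcc -c sent/drw.c -o drw.o")))

;; Print the generated Makefile rules.
(display (rules->makefile rules))
```

A second backend could map the same rule descriptions to computed-file objects. Keeping the two backends in agreement is where the real work would lie.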

For private repos

A likely niche use case is for private repos such as blogs which need only be built by the one person running the blog. Thus, there is no need to consider others' opinions, and one may simply do as one sees fit. This may apply equally well to small internal projects maintained by and for small communities or teams with complete agreement on the value of this build system.

Footnotes:

[1] Thanks to Pjotr for suggesting this.