Scoping and Thunking in Let Declarations

I haven’t stopped fiddling with language development. Actually, I still do it almost full-time in my free time. My latest prototype, Scotch, which uses Haskell-like syntax and semantics, is coming close to a point where I can start developing the main libraries. Close, but there are a few kinks. Here I’ll be covering an issue I’m having with let declarations.

Specifically, what I’m struggling to get my head around is how to model scoping of values declared within let declarations. (Read about let declarations in Learn You A Haskell.)

In this code, the child declaration y uses child declaration z, which is used by the body of the let. Consider for a moment: What if z had side effects?

Scotch Let Declaration
1
2
3
4
f :: s -> (a, s)
f x = let y = something with z
          z = something else
          in do things with y and then z again

How Are Side Effects A Problem In Child Scopes?

Scotch is compiled into JVM bytecode. Value and function declarations are encoded as static methods returning Java 8 lambdas. The declarations internally use thunks to suspend evaluation but also to retain the evaluated result.

Scotch “add” Function Encoded As Static Java Method
1
2
3
4
5
6
7
8
9
10
11
12
13
// What x + y would look like compiled:
//
// This uses the runtime support functions applicable() and callable() to
// automatically generate the relevant thunk types.
public static Applicable<Integer, Applicable<Integer, Integer>> add() {
  return applicable(
    augend -> applicable(
      addend -> callable(
        () -> augend.call() + addend.call()
      )
    )
  );
}
What A Thunk Looks Like
1
2
3
4
5
6
7
8
9
10
11
12
public abstract class Thunk<A> implements Callable<A> {
  private A value;

  public A call() {
    if (value == null) {
      value = evaluate();
    }
    return value;
  }

  protected abstract A evaluate();
}

The reason for using thunks is to support lazy evaluation. Arguments passed into functions are not evaluated until they are referenced. This also comes with the benefit that arguments are only ever evaluated once. Because child declarations form closures over variables in their parent scope, this also ensures local variables are also only evaluated once within these child declarations.

Top-level declarations, like values and functions, are encoded as static Java methods. Each time they are referenced, a new thunk is returned. In let declarations, this poses a problem because every time a declaration is referenced, it returns a new thunk. Say for example we have a child declaration called “username” which fetches a username from a database. Every time the associated thunk is referenced, the database is hit:

What It Looks Like in Pseudo-Java
1
2
3
4
5
6
7
8
static String username() {
return new Thunk() { /* database query */ }
}

static String greet(String msg) {
  log("greeting " + username())
  return msg + " " + username() + "!"
}

This becomes a major problem in Scotch because the values with side-effects look like variables!

What it looks like in Scotch
1
2
3
4
5
6
7
greet msg =
    let username = do
          something to get
          this from a database
    in do
      log "greeting " ++ username
      msg ++ " " ++ username ++ "!"

Modeling Let As A Function

I’ve been mulling around a few crazy ways to model let declarations. My favorite so far is wrapping the let body up in a function, passing in all declarations as arguments:

Psuedo-Java `let` Modeled With Values As Arguments
1
2
3
4
5
6
7
8
9
10
11
12
static String username() {
  return new Thunk() { /* database query */ }
}

static String greet(String msg) {
  greet_(msg, username())
}

static String greet_(String msg, Thunk<String> username) {
  log("greeting" + username.get())        // here it evaluates for the first time
  return msg + " " + username.get() + "!" // here it gets the value from the initial evaluation above
}

This leverages the existing way of modeling evaluation, but it’s kinda ugly. For the near term, this solution is decent, however it adds the overhead of extra arguments being passed around. I’m not entirely sure how this impacts runtime performance in compiled code, though I would like another way of modeling let declarations to compare side-by-side.

Storing Nested Values As Variables

I really like this method of modeling let because it behaves more like how a let looks in source code. Instead of wrapping a let within another function, the declarations are assigned to local variables:

Values As Variables In Pseudo-Java
1
2
3
4
5
6
7
8
9
static String username() {
  return new Thunk() { /* database query */ }
}

static String greet(String msg) {
  var username = username()
  log("greeting " + username)
  return msg + " " + username + "!"
}

I’m not entirely sure what performance impact of variables in let would be compared to using arguments. Either way, I would be calling the methods associated with each declaration and storing them somewhere. The only thing saved is an additional function call, so any gains in performance would be minuscule.

Why The Indecision?

I’m not sure I’m following the best ways to model let. And I also don’t have any immediate way to test either solution short of branching the code and spiking out the two solutions. The tradeoffs aren’t necessarily obvious, and I won’t see which one is better in the long term compared to the one not chosen.

Compilation is hard. Scotch uses a multi-pass compiler with 12 steps currently, so adding a new language feature or changing an existing one can be very arduous tasks. Ideally, I only want to implement let once.

I will be posting on progress and which solution I end up choosing, stay tuned.

@TODO

  • Write-up on lexical scoping in Scotch.
  • Details on how Scotch compiles values and functions in Java 8 lambdas.
  • How simple, non-recursive closures are implemented.
  • Implementation of co-recursive closures.

Links

Credits

Thanks to my dear friend Alexander Zagniotov who took time out of his busy day to review my multiple drafts.

Sterling Benchmarks

Since mid January, I’ve been developing a functional scripting language I call Sterling. In the past few weeks, Sterling has become nearly usable, but it doesn’t seem to be very fast. So this weekend, I’ve taking the time to create a simple (read: naïve) benchmark.

The benchmark uses a recursive algorithm to calculate the Nth member of the Fibonacci sequence. I’ve implemented both Sterling and Java versions of the algorithm and I will be benchmarking each for comparison.

Read on →

Promoting Changes With App-Config-App

The App-Config-App now lets you promote changes between environments!

How does it work?

Perforce lets you create mappings to define the relationship between two diverging code branches. This allows for easy integration of changes between the two branches by referencing the name of the mapping.

See Perforce’s documentation for more details on the how and why of branch mappings.

The App-Config-App reads these branch mappings in order to create paths for promotion between environments.

Read on →