Parametric Polymorphism and The D Programming Language

What Do We Want from Programming Languages?

  • Conciseness
  • Readability
  • Performance
  • Correctness

Why Is This Hard to Achieve?

Conciseness and readability are more achievable when a language requires fewer type annotations in order to work.

Performance and correctness are more achievable with the presence of type information.

Achieving all of our goals requires a concise way of expressing operations of types of objects generically.

How Do We Attempt to Achieve This?

Two approaches:

  • Just-In-Time Compilation (PyPy, V8, HHVM, etc.)
  • Type Parameterisation (C++, Java/Scala (kind of), Swift, D, Rust, etc.)

Solving Generic Problems in Python

(And in other dynamically-typed languages)

  • Throw away types
  • Rely on duck typing and tests
# Create a one-to-many mapping from a sequence of pairs.
def to_many(sequence_of_pairs):
    mapping = {}

    for left, right in sequence_of_pairs:
        mapping.setdefault(left, []).append(right)

    return mapping
to_many([(1, 2), (1, 3)]) == {1: [2, 3]}                    # lists
to_many(((1, 2), (1, 3))) == {1: [2, 3]}                    # tuples
to_many(OrderedDict(((1, 2), (1, 3))).items()) == {1: [3]}  # OrderedDict?
to_many("garbage")                                          # Uh oh...
to_many(42)                                                 # Help!
Pros Cons
  • Concise
  • Readable
  • Wonderfully Ignorant
  • Hard To Optimise

Why Type Parameters?

You need to apply algorithms over a range of type to save yourself from going mad.

Look at a language without type parameters (C99):

// Sum a sequence of numbers.
int sum(int* pointer, size_t length) {
int total = 0;

for(size_t index = 0; index < length; index++) {
    total += pointer[index];
}

return total;
}

What about summing floats?

You have to write it again!

// Sum a sequence of floating point numbers.
float sum_float(float* pointer, size_t length) {
float total = 0;

for(size_t index = 0; index < length; index++) {
    total += pointer[index];
}

return total;
}

What about summing double-precision floats?

Write it again! ... and again.

// Sum a sequence of double-precision floating point numbers.
double sum_double(double* pointer, size_t length) {
double total = 0;

for(size_t index = 0; index < length; index++) {
    total += pointer[index];
}

return total;
}

Are you bored yet?

What is Parametric Polymorphism?

Parametric polymorphism is obtained when a function works uniformly on a range of types; these types normally exhibit some common structure.

Christopher Strachey, 1967

Add Type Parameters

Let's take the code from before, but let's make the number type itself a parameter of the function.

Let's move to C++:

// Sum a sequence of numbers.
template<typename Number> Number sum(Number* pointer, size_t length) {
    Number total = 0;

    for(size_t index = 0; index < length; index++) {
        total += pointer[index];
    }

    return total;
}

Now it works for all numbers!

int int_array[3] = {1, 2, 3};
float float_array[3] = {1, 2, 3};
double double_array[3] = {1, 2, 3};
long long_array[3] = {1, 2, 3};

sum<int>(int_array, 3) == 6
sum<float>(float_array, 3) == 6.0f
// C++ can infer the type parameters, so we don't need to type them.
sum(double_array, 3) == 6.0
sum(long_array, 3) == 6L

Anything C++ Can Do, D Can Do Better

D can do it too, and better.

// Sum a sequence of numbers.
Number sum(Number)(Number[] numberSlice) {
    Number total = 0;

    foreach(number; numberSlice) {
        total += number;
    }

    return total;
}

The standard library ain't half-bad, too.

import std.algorithm: sum;

But Aren't "Generics" Enough?

The short answer is "no," the long answer involves shaming Java.

Let's try and write to_many in Java...

Let's define a Pair first.

public class Example {
    public static class Pair<T, U> {
        final public T first;
        final public U second;

        public Pair(T first, U second) {
            this.first = first;
            this.second = second;
        }

        public static <T, U> Pair<T, U> of(T first, U second) {
            return new Pair<T, U>(first, second);
        }
    }
/* ... */

Let's bang out the function.

    static <T, U> HashMap<T, ArrayList<U>>
    toMany(Collection<Pair<T, U>> sequenceOfPairs) {
       HashMap<T, ArrayList<U>> mapping = new HashMap<>();

       for (Pair<T, U> pair: sequenceOfPairs) {
            if (!mapping.containsKey(pair.first)) {
                mapping.put(pair.first, new ArrayList<U>());
            }

            mapping.get(pair.first).add(pair.second);
       }

       return mapping;
    }
    public static void main (String[] argv) {
        HashMap<Integer, List<Integer>> expectedMapping = new HashMap<>();
        expectedMapping.put(1, Arrays.asList(2, 3));

        HashMap<Integer, List<Integer>> actualMapping = toMany(
            Arrays.asList(Pair.of(1, 2), Pair.of(1, 3))
        );

        assert actualMapping.equals(expectedMapping);
    }
}

What about other types? This only works for Collection

"Generics" Are Too Limiting

  1. Generics like in Java require some base type, like Collection.
  2. You cannot write algorithms which will apply to yet unknown types.
  3. To truly write generic code, you must be able to write code which accepts new types.
  4. Parametric polymorphism is about defining algorithms for categories of types.

Let's Write It In D!

import std.typecons: Tuple, tuple;

U[][T] toMany(T, U)(Tuple!(T, U)[] sequenceOfPairs) {
    U[][T] mapping;

    foreach(pair; sequenceOfPairs) {
        mapping[pair[0]] ~= pair[1];
    }

    return mapping;
}

// One cool feature: marry code with unit tests
unittest {
    assert(toMany([tuple(1, 2), tuple(1, 3)]) == [1: [2, 3]]);
}

Let's Make it More Generic

Let's define a pair, even though we don't have concrete type.

// Write a function template which acts on a pair.
void pairTest(T)(T anything) {
auto first = anything.first;
auto second = anything.second;
}

// Write a value template which returns true
// if such a function will compile for a given type.
enum isPair(T) = __traits(compiles, pairTest!T);

Make the function accept any type which acts like a pair.

auto toMany(Pair)(Pair[] sequenceOfPairs) if(isPair!Pair) {
alias T = typeof(sequenceOfPairs[0].first);
alias U = typeof(sequenceOfPairs[0].second);

U[][T] mapping;

foreach(pair; sequenceOfPairs) {
    mapping[pair.first] ~= pair.second;
}

return mapping;
}

Now the algorithm can be applied to types not yet known.

struct MyNewType(T, U) {
T first; U second;

this(T first, U second) {
    this.first = first;
    this.second = second;
}

MyNewType!(T, U) of(T, U)(T first, U second) {
    return MyNewType!(T, U)(first, second);
}
}

unittest {
assert(toMany([MyNewType.of(1, 2), MyNewType.of(1, 3)]) == [1: [2, 3]]);
}

It doesn't work for Tuple any more though...

...until we define properties for existing types.

auto first(T)(T pair) { return pair[0]; }
auto second(T)(T pair) { return pair[1]; }

unittest {
    auto someTuple = tuple(1, 2);
    // Any f(x) can be rewritten x.f() in D.
    auto first someTuple.first();
    // Or even x.f
    auto second = someTuple.second;
    // It works again!
    assert(toMany([tuple(1, 2), tuple(1, 3)]) == [1: [2, 3]]);
}

Let's Go Even Further, Operate on Any Sequence

import std.range: isInputRange, ElementType;

// Write the algorithm in terms of InputRange!
auto toMany(R)(R sequenceOfPairs)
if(isInputRange!R && IsPair!(ElementType!R)) {
    alias T = typeof(sequenceOfPairs.front[0]);
    alias U = typeof(sequenceOfPairs.front[1]);

    U[][T] mapping;

    foreach(pair; sequenceOfPairs) {
        mapping[pair[0]] ~= pair[1];
    }

    return mapping;
}

What are Ranges?

Ranges are categories of types which represent "ranges" of elements.

An InputRange is anything which acts like this:

T someRange;

// Check if the range is empty.
while(!someRange.empty) {
    // Get the current value.
    auto someElement = someRange.front;

    // Advance to the next value.
    someRange.popFront();
}

std.range Predicates

Predicate Properties
isInputRange .empty, .front, .popFront()
isForwardRange ..., .save,
isBidirectionalRange ..., .back, .popBack()
isRandomAccessRange ..., [index]
isOutputRange .put(someValue)

Worth mentioning: hasLength, hasSlicing, isInfinite.

Example Applications of Ranges

Implementing sort -u.

#!/usr/bin/env rdmd
import std.stdio;
import std.array;
import std.algorithm;

void main() {
    stdin.byLineCopy(KeepTerminator.yes) // InputRange from stdin
    .array                               // Collect all lines in an array
    .sort                                // New InputRange, sorted items
    .uniq                                // New InputRange, skipping duplicates
    .copy(stdout.lockingTextWriter());   // Write to a stdout OutputRange.
}

FizzBuzz!

import std.range: iota;
import std.algorithm: map, each;
import std.conv: to;
import std.stdio: writeln;

void main() {
    iota(1, 101)
    .map!(
        number =>
            number % 3 == 0 && number % 5 == 0 ? "FizzBuzz" :
            number % 3 == 0 ? "Fizz" :
            number % 5 == 0 ? "Buzz" :
            number.to!string
    )
    .each!writeln;
}

Print the first 50 Fibonacci numbers.

#!/usr/bin/env rdmd
import std.range: recurrence, take;
import std.algorithm: each;
import std.stdio: writeln;

void main() {
    recurrence!((a, n) => a[0] + a[1])(1L, 1L)
    .take(50)
    .each!writeln;
}

Printing all ancestor directories.

import std.range;
import std.algorithm;
import std.path;
import std.file;
import std.stdio;

auto iterate(alias func, T)(T initialValue) {
    return recurrence!((values, index) => func(values[0]))(initialValue);
}

void main() {
    getcwd()                                 // get the working directory.
    .iterate!dirName                         // An infinite range of ancestors.
    .until(rootName(getcwd()), OpenRight.no) // Stop when we hit /, C:\, etc.
    .each!writeln;                           // Print everything to stdout.
}

Writing Ranges from Scratch

Let's implement our iterate algorithm from scratch.

struct IterateRange(alias func, T) {
    private T value;
    // isInfiniteRange!this == true
    enum empty = false;
    T front() { return value; }
    void popFront() { value = func(value); }
    this(T initialValue) {
        value = initialValue;
    }
}

// This will work basically as before.
auto iterate(alias func, T)(T initialValue) {
    return IterateRange!(func, T)(initialValue);
}

What Have We Learned?

  • Generic algorithms are the ultimate form of code reuse.
  • Parametric Polymorphism is needed for true generic programming.
  • Primitives like ranges are fundamental building blocks of complex programs.
  • D is very good at letting you do all of this.

What I Didn't Discuss

D has:

  • A faster compilation time than just about anything.
  • Built-in coverage analysis.
  • Compile-time function execution.
  • Thread local variables and atomics.
  • Java-style classes and interfaces.
  • Inline assembly and SIMD instructions.
  • A built-in garbage collector and other memory allocation schemes.
  • Much more!

Other Notable Languages with Parametric Polymorphism

  • C++14 and C++17
  • Haskell
  • Rust
  • Swift

Thank You!

Further reading: