The Rust Programming Language

Rust is a multi-paradigm programming language often used in systems programming. I’ve been interested in learning Rust for a long time, but I’ve tried numerous times and given up each time. As we’ll soon see, Rust has a unique ownership system which can take some getting used to - especially if you come from a programming language like Java where references are fairly abstract. In an effort to finally understand Rust, I’m writing this post to break down the language one piece at a time and explain it in a meaningful way to others.

I learned Rust primarily through The Rust Programming Language - a free ebook available on rust-lang.org.

Preliminaries

Cargo

As is common with most modern programming languages (e.g. Go, Python, Haskell), Rust comes bundled with a build system and package manager, which is known as Cargo.

I’ll be calling our first project hello. Navigate to a good place to store our project and run:

$ cargo new hello

Inside of this directory you’ll find the following:

$ ls -a
./ ../ Cargo.toml .git/ src/

There is also a simple hello world program generated inside of src/. To build our project we run cargo build, which generates an executable at target/debug/hello. Running this executable gives:

$ ./target/debug/hello
Hello, world!

Note that cargo run is a convenient alias for building and then executing the target binary.

This is all we need to know about Cargo for now - we’ll revisit this later.

Variables, Constants, and Mutability

Creating a variable in Rust is very simple using the let keyword

fn main() {
    let x = 17;
    println!("The value of x is : {}", x);
}

One of the core philosophies of Rust is safety, so by default all variables are immutable. For example, if you try to run:

fn main() {
    let x = 17;
    println!("The value of x is : {}", x);
    x = 5;
    println!("The value of x is now : {}", x);
}

The program will not compile, with the message “cannot assign twice to immutable variable”. In order to make a variable mutable, we use the mut keyword as such:

fn main() {
    let mut x = 17;
    println!("The value of x is : {}", x);
    x = 5;
    println!("The value of x is now : {}", x);
}

Constants are declared using the const keyword, and have some special properties:

Constants must be annotated with a type
Constants are always immutable - you cannot use the mut keyword on them
Constants can be declared in any scope (including the global scope)
Constants may only be initialized to a constant expression (i.e. something the compiler can evaluate at compile time, not a value computed at runtime)
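As a minimal sketch of those rules (the names MAX_POINTS, SECONDS_PER_HOUR, and seconds_in are made up for illustration and are not from any library), a constant can live at the top level, needs a type annotation, and can be built from other constant expressions:

```rust
// Constants need an explicit type and can be declared in the global scope.
const MAX_POINTS: u32 = 100_000;
// The initializer can be any expression the compiler can evaluate ahead of time.
const SECONDS_PER_HOUR: u32 = 60 * 60;

// Hypothetical helper that uses a constant inside a function.
fn seconds_in(hours: u32) -> u32 {
    hours * SECONDS_PER_HOUR
}

fn main() {
    println!("MAX_POINTS = {}", MAX_POINTS);
    println!("3 hours = {} seconds", seconds_in(3));
}
```

Trying to write const mut, or leaving off the : u32 annotation, should be rejected by the compiler.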

Rust also has a fairly unique feature known as shadowing. Even with immutable variables, the following is possible:

fn main() {
    let x = 17;
    println!("The value of x is : {}", x);
    let x = 5;
    println!("The value of x is now : {}", x);
}

Since the variable is recreated with the let keyword, this is allowed. Note that there is no restriction on the type of object that is created.
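To make the point about types concrete, here is a small sketch (trimmed_length is a name I made up for this example) where the same name is shadowed twice and ends up with a different type:

```rust
// Hypothetical helper: each `let` creates a brand new binding named `value`.
fn trimmed_length(input: &str) -> usize {
    let value = input;        // `value` is a &str
    let value = value.trim(); // still a &str, but a new binding
    let value = value.len();  // now `value` is a usize
    value
}

fn main() {
    println!("trimmed length: {}", trimmed_length("  hello  "));
}
```

Doing the same thing with let mut and plain assignment would not compile, since a mutable variable keeps the type it was declared with.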

Data Types

Rust has support for the following integer types:

Length   Signed  Unsigned
8-bit    i8      u8
16-bit   i16     u16
32-bit   i32     u32
64-bit   i64     u64
128-bit  i128    u128
arch     isize   usize

Integer literals are defined as such:

Type     Example
Decimal  98_222
Hex      0xff
Octal    0o77
Binary   0b1111_0000
Byte     b'A'
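As a quick check of what those literals actually evaluate to, here is a small sketch (literal_values is just a helper name for this example):

```rust
// Returns the literals from the table above as (decimal, hex, octal, binary, byte).
fn literal_values() -> (i32, i32, i32, i32, u8) {
    (98_222, 0xff, 0o77, 0b1111_0000, b'A')
}

fn main() {
    let (decimal, hex, octal, binary, byte) = literal_values();
    // Underscores are only visual separators and do not change the value.
    println!("{} {} {} {} {}", decimal, hex, octal, binary, byte);
}
```

Note that the byte literal is the only one here with a fixed type (u8); the others default to i32 unless something else constrains them.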

Rust has support for two floating point types: f32 and f64.

Rust also has support for arrays. As in most languages, arrays have a fixed size and can be declared with a literal:

let a = [1, 2, 3, 4, 5];

You can also explicitly declare the type of the values:

let a: [i32; 1] = [7];

Or, you can initialize the array by repeating a single value (this will create an array of 10 signed 32-bit integers with an initial value of 0):

let a: [i32; 10] = [0; 10]; 

Personally I think this syntax is a bit gross, as it is unintuitive to parse without explicitly knowing what it means.

Functions

Functions in Rust are fairly straightforward and intuitive, so I’ll just give an example of their syntax here

fn adds_one(x: i32) -> i32 {
    x + 1
}

Note that while the return keyword can be used in Rust, it is not necessary: the last line of a function is implicitly returned so long as it is an expression and not a statement (an expression at the end of a block has no trailing semicolon, while statements end in one).
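Here is a short sketch of both styles side by side. adds_one is the function from above; clamp_to_zero is a made-up example to show where return is still handy:

```rust
// Implicit return: the final expression has no semicolon.
fn adds_one(x: i32) -> i32 {
    x + 1
}

// Hypothetical example that uses `return` for an early exit.
fn clamp_to_zero(x: i32) -> i32 {
    if x < 0 {
        return 0; // explicit early return
    }
    x // implicit return of the last expression
}

fn main() {
    println!("{} {} {}", adds_one(1), clamp_to_zero(-5), clamp_to_zero(7));
}
```

If you add a semicolon after x + 1, the function body ends in a statement and the compiler complains about mismatched types.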

Control Flow

As should be no surprise, Rust includes a number of structures for control flow. if expressions are used as such

let x = 10;
if x < 6 {
    println!("x is less than 6");
} else if x < 10 {
    println!("x is less than 10!");
} else {
    println!("x is some other value");
}

Rust also very nicely supports using if and else with let statements

let condition = true;
let x = if condition { 7 } else { 5 };

Wow, that’s so readable! It reminds me of programming in Ruby :)

The loop keyword is a built-in infinite loop that only stops when you tell it to

loop {
    println!("you can't stop me!");
}

Using the break statement will end a loop.
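One detail worth knowing is that break can also hand a value back out of a loop. A minimal sketch, assuming the limit is small enough that doubling never overflows (first_power_of_two_above is a name I made up):

```rust
// Hypothetical helper: doubles `candidate` until it exceeds `limit`.
fn first_power_of_two_above(limit: u32) -> u32 {
    let mut candidate = 1;
    loop {
        if candidate > limit {
            break candidate; // leave the loop and hand back a value
        }
        candidate *= 2;
    }
}

fn main() {
    println!("{}", first_power_of_two_above(10));
}
```

Because loop is an expression, the value given to break becomes the value of the whole loop, which here is also the function’s return value.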

There is also a while loop

let mut x = 5;
while x < 10 {
    println!("x has value {}", x);
    x += 1;
}

For structures that supply iterators, they can be looped over using for

let a = [3, 5, 7, 11, 13];
for prime in a.iter() {
    println!("This prime has value {}", prime);
}

Ownership, References, and Borrowing

One of the biggest design decisions for any programming language is how to address memory management. Languages like C and C++ have manual memory management - users must explicitly allocate and free space on the heap. Most modern languages like Java, Python, or Go use garbage collection to periodically look for and free memory which is no longer being used. Generally speaking, manual memory management prioritizes runtime efficiency over developer efficiency - there’s almost no overhead associated with managing memory, yet it is prone to errors and bugs. Garbage collection, while still very fast with modern techniques, reduces the inevitable errors introduced by humans during development at the cost of some runtime overhead to look for and free memory.

Rust takes a different approach with the idea of ownership. While ownership takes some getting used to (especially for me!), it introduces minimal overhead during development time and guarantees memory safety while also being very fast at runtime (since the language doesn’t have to constantly look for and free unused memory).

The Stack and the Heap

In my opinion, one of the easiest ways to wrap your head around ownership is to develop a strong mental model for the stack and the heap. Broadly speaking, there are two ways your program can store data in memory. Things that are of a fixed, known size are stored on the stack. For those familiar, this is just a last-in, first-out (LIFO) structure. Adding and removing things from the stack is cheap. However, the stack has the disadvantage of only being able to store data of fixed sizes that are known at compile time. Other data must instead be stored on the heap. Using the heap is less efficient, since the allocator has to go and look for a place to put our data, and we then have to follow a pointer to get to it.

If you’ve ever programmed in C or C++ you likely already have a good mental model of the stack versus the heap. Things like local arrays and primitive types are stored on the stack. This is why you don’t have to manually free a local array in C - once it goes out of scope we just remove it from the stack. However, calls to something like malloc will use memory on the heap. That is why you must associate each call to malloc with exactly one call to free.

I’m sure some C programmers will point out that the stack and heap are implementation details and not specific to the language itself, but they’re so commonplace that I’m going to reference them anyway.

In Rust, references are stored on the stack. However, Rust is very particular about how many references may point to memory on the heap, meaning it can much more efficiently evaluate when memory is unused and free it (that’s what we’ll talk about next).

Ownership

To get started understanding ownership, we need to know the rules that Rust uses:

Each value in Rust has a variable called its owner.
Each value can only be owned by one variable at a time.
When the owner goes out of scope, the value is dropped.

Let’s start by using the example of Strings (note the uppercase S). The String type in Rust is a mutable, growable piece of text. To create a String we would do the following

{
    let s = String::from("hello world");
}

Inside of the specified scope we’ve created a variable s which lives on the stack. However, the actual contents of the string, hello world, are allocated on the heap. s is the owner of the value “hello world”, and as soon as s goes out of scope those values will be freed.

So far this seems pretty simple.

Let’s say that we take another variable, say s2, and initialize it based on s1 like so

{
    let s1 = String::from("hello world");
    let s2 = s1;
}

For any fellow Java programmers who have never read Rust code, what I’m about to say is going to blow your mind. If, after you had created s2, you tried to reference s1, you would get a COMPILER ERROR. So, for example, the following code will not compile.

{
    let s1 = String::from("hello world");
    let s2 = s1;
    println!("s1 has value {}", s1);
}

The first time I saw this I thought it was absolutely insane. The idea here is that since there can only be one owner of the data hello world at a time, Rust can simply free that data whenever its owner goes out of scope. After let s2 = s1, ownership has moved to s2, so s1 is no longer valid to use.
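If you really do want two usable variables, one option is to ask for a copy explicitly. The sketch below is my own illustration rather than something from the examples above: it uses clone for a String, and shows that simple types like i32 are copied automatically, so they don’t run into the move error at all.

```rust
// Cloning makes a second, independent String, so both variables stay valid.
fn clone_then_use_both() -> (String, String) {
    let s1 = String::from("hello world");
    let s2 = s1.clone(); // copies the heap data; s1 still owns its own buffer
    (s1, s2)
}

// i32 implements Copy, so assignment copies the value instead of moving it.
fn copy_integers() -> (i32, i32) {
    let a = 17;
    let b = a; // `a` is still usable after this line
    (a, b)
}

fn main() {
    let (s1, s2) = clone_then_use_both();
    println!("s1 = {}, s2 = {}", s1, s2);
    let (a, b) = copy_integers();
    println!("a = {}, b = {}", a, b);
}
```

Cloning is not free, since it allocates and copies the string contents, so it is usually something to reach for deliberately rather than by default.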

Now, using only the tools we have right now, programming in Rust would be very cumbersome. Suppose that you needed to call a function and pass s1 as an argument. Such an operation would take ownership of s1, hence leaving it out of scope once the function finishes executing. This would require us to do something like

fn main() {
    let s1 = String::from("hello world");
    let s1 = take_ownership_and_give_it_back(s1);
    println!("{}", s1);
}

fn take_ownership_and_give_it_back(s: String) -> String {
    s
}

So anytime we would want to call a function we would need to return the parameters back to us in order to keep them from dropping out of scope. Just imagine how cumbersome this would be with multiple arguments and returning tuples….
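To show just how awkward this gets, here is a sketch of a function that computes something about a String and has to hand the String back in a tuple alongside the result (calculate_length is an illustrative name):

```rust
// Takes ownership of `s`, then returns it together with its length.
fn calculate_length(s: String) -> (String, usize) {
    let length = s.len();
    (s, length) // hand ownership back along with the result
}

fn main() {
    let s1 = String::from("hello world");
    // We have to rebind s1, otherwise it would be gone after the call.
    let (s1, length) = calculate_length(s1);
    println!("'{}' has length {}", s1, length);
}
```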

Luckily for us, the creators of Rust foresaw this problem and came up with a solution: references (no, the Rust creators did not invent references)

References and Borrowing

In order to make working with values more manageable, Rust introduces the notion of a reference. A reference is a variable which points to a value (data) but does not own it. When a reference goes out of scope it does not free the associated data. We use references like such:

fn main() {
    let s1 = String::from("hello world");
    print_value(&s1);
    println!("Value in main: {}", s1);
}

fn print_value(s: &String) {
    println!("Value in function: {}", s); 
}

Let’s take note of a few things here. First, we must be very explicit about references: we specify that we’re passing a reference when we call a function (i.e. print_value(&s1)) and we’re explicit about the fact that the parameter itself is a reference (i.e. fn print_value(s: &String) {).

Note that references are read only by default, meaning that any attempt to change the data on the heap referenced by s in the print_value function above would result in a compiler error. If we want our function to be able to modify our data we must pass it a mutable reference, and also make our original variable s1 mutable.

fn main() {
    let mut s1 = String::from("hello world");
    add_exclamation_point(&mut s1);
    println!("{}", s1);
}

fn add_exclamation_point(s: &mut String) {
    s.push_str("!");
}

Note that there are some restrictions with mutable references. Namely, at any given time a piece of data may have at most one live mutable reference, and no immutable references may be in use alongside it.
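Here is a rough sketch of what that restriction looks like in practice (build_greeting is a made-up name). Two mutable references are fine as long as they are never live at the same time:

```rust
fn build_greeting() -> String {
    let mut s = String::from("hello");
    {
        let r1 = &mut s;
        r1.push_str(" world");
    } // r1 goes out of scope here
    let r2 = &mut s; // fine: no other reference to `s` is live
    r2.push('!');

    // Uncommenting the next three lines fails with error E0499
    // (cannot borrow `s` as mutable more than once at a time):
    // let r3 = &mut s;
    // let r4 = &mut s;
    // println!("{} {}", r3, r4);

    s
}

fn main() {
    println!("{}", build_greeting());
}
```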

Lifetimes

The final concept associated with Rust’s ownership system is the idea of a lifetime. To introduce the idea of lifetimes, let’s consider the following example

{
    let x;
    {
        let y = 17;
        x = &y;
    } // Point A
    println!("x has value: {}", x);
}

Before even running this code let’s try to imagine what would happen. In an outer scope we declare a variable x. In an inner scope we create a variable y which is the owner of the value 17. We set x to point to this value, 17. Recall from our discussion of ownership that once an owner goes out of scope, it and all its associated data are freed. So, at the line which I’ve labeled as Point A, y will no longer exist, nor will its associated data. This means that x is now pointing to some bit of memory which has been freed!

Luckily for us, Rust does not let this code compile. Rust has a borrow checker which compares the lifetimes of references and determines whether or not references are valid.
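For contrast, here is a sketch of a version the borrow checker should accept. The only change is that y is declared in the outer scope, so it outlives every use of x (read_through_reference is just a wrapper name I picked so the snippet is easy to run):

```rust
fn read_through_reference() -> i32 {
    let y = 17; // `y` now lives in the outer scope
    let x;
    {
        x = &y; // borrowing `y` inside an inner scope is fine
    }
    *x // `y` is still alive here, so the reference is valid
}

fn main() {
    println!("x has value: {}", read_through_reference());
}
```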

So far this might seem fairly intuitive. However, things get a little more complicated when we consider functions.

First, let’s consider a fact which might be non-obvious. Suppose I write a function which takes a str reference as a parameter and also returns a str reference. Then the returned reference is either 1) a reference to the parameters or 2) a reference to some static data (like a constant)

So, for example, the following code does not compile:

const S: &str = "hi";

fn main() {
    let s1 = String::from("hello world");
    let s2 = func(&s1);
}

fn func(s: &str) -> &str {
    let s1 = String::from("howdy earth");
    &s1
}

Why? Well, this is a consequence of our ownership rules. The owner of the data howdy earth is s1, which lives in our func function. Once that goes out of scope it is removed. Returning a reference to that data, &s1, is meaningless since references don’t determine ownership. Thus, the only legal ways to write func above are either to return s or S inside our function. It’s because of this fact that Rust doesn’t require us to write this function with explicitly declared lifetimes. However, for the sake of understanding, let’s annotate it ourselves.

const S: &str = "hi";

fn main() {
    let s1 = String::from("hello world");
    let s2 = func(&s1);
}

fn func<'a>(s: &'a str) -> &'a str {
    let s1 = String::from("howdy earth");
    &s1 // still rejected: s1 is dropped when func returns (returning s or S would compile)
}

Side note, I think Rust’s syntax for lifetimes is gross. But let’s not get bogged down arguing over syntax and instead focus on the concepts. The first part of the function signature, <'a>, defines which generic lifetimes are going to be available in our function. s: &'a str binds the lifetime a to s, and the -> &'a str tells the compiler the returned reference has the same lifetime, a.
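To tie this together, here is a sketch of the two legal shapes described earlier: returning something borrowed from the parameter, or returning static data. The function names (first_word, first_word_elided, always_hi) are my own for illustration. The annotated and un-annotated versions of first_word should mean exactly the same thing to the compiler.

```rust
const S: &str = "hi";

// Returns a slice of the parameter, so the output lifetime is tied to the input.
fn first_word<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// Same function, letting the compiler fill in the lifetimes for us.
fn first_word_elided(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// Returns static data, which is valid for the whole program.
fn always_hi(_s: &str) -> &'static str {
    S
}

fn main() {
    let s1 = String::from("hello world");
    println!("{} {} {}", first_word(&s1), first_word_elided(&s1), always_hi(&s1));
}
```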

To me this makes perfect sense. However, it’s not clear why this is at all necessary. Let’s consider an example that is ambiguous.

Suppose we’re writing a function which takes two str references as parameters and returns the reference which is of longer length, like so.

fn main() {
    let s1 = String::from("hello world");
    let s2 = String::from("josh");

    let r = longest(&s1, &s2);
}

fn longest(x: &String, y: &String) -> &String {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

This program will not compile, and Rust will give us an error (“missing lifetime specifier”) suggesting that we introduce lifetime annotations. We can fix our program by changing the signature of longest to look like this:

fn longest<'a>(x: &'a String, y: &'a String) -> &'a String {

This tells the compiler that the returned reference is valid for as long as both input references are valid. The important distinction to make here is that we’re not modifying lifetimes, nor are we asking the compiler to trust our judgement on how long references will be valid. Quite the opposite is true. In fact, we’re making a stronger function signature by stating that the output reference is only valid given that both input references are also valid.
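Putting the pieces together, a complete version might look like the sketch below. One change I’ve made on my own: the parameters are &str instead of &String, which is the more common way to accept string data and still works when you pass &s1 for a String.

```rust
// The returned reference is only usable while both `x` and `y` are valid.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let s1 = String::from("hello world");
    let s2 = String::from("josh");

    let r = longest(&s1, &s2);
    println!("The longest string is {}", r);
}
```

Note that when the lengths are equal this version returns y, simply because of how the condition is written.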

Smart Pointers

To introduce the concept of smart pointers, let’s consider writing a very simple linked list in Rust. I’ll note to my readers that the code I’m about to write is not idiomatic Rust, but rather how I would write it coming from Java. We’ll discuss idiomatic Rust practices later in this article.

If I were writing a linked list class in Java I would do it like such:

public class LinkedList {
    private ListNode front;
    ...

    private class ListNode {
        private int data;
        private ListNode next;
    }
}

Clients of our class will interface with the LinkedList which contains an inner class for a single node in our list. ListNodes are recursively defined, and this is totally ok with Java.

Let’s try to write something similar(ish) in Rust. One of the biggest differences, that is pretty much unavoidable, is that Rust has no support for a null type, so we implement it using Option.

struct Node {
    data: u64,
    next: Option<Node>
}

struct List {
    front: Option<Node>
}

Any attempt to compile this program will give an error with the message “recursive type `Node` has infinite size”. Rust is trying to understand, at compile time, how much space it needs to allocate to support our data structure. However, this recursive definition causes problems: a Node would store an Option<Node> inline, which in turn stores another Node, and so on, so there is no finite size to reserve.

So why does this work in Java? The “equivalent” Java code isn’t allocating the ListNode itself on the stack but rather the “pointer” (Java just calls everything a reference in a very zen fashion). We need to replicate this in Rust in order to know with certainty the size of our List object at compile time.

To do this we use the Box<T> type. This is a simple construct that allocates T on the heap. Since the Box<T> itself has a known size (it really just needs to point to that data on the heap) we can create a struct with a recursive definition.
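If you want to convince yourself that a Box really is just pointer-sized, here is a small sketch using std::mem::size_of. The helper names inline_size and boxed_size are mine, and the array type is only there to have something obviously large to put in the Box.

```rust
use std::mem::size_of;

// Size in bytes of a 128-element array of u64 stored inline.
fn inline_size() -> usize {
    size_of::<[u64; 128]>()
}

// Size in bytes of a Box that points at such an array on the heap.
fn boxed_size() -> usize {
    size_of::<Box<[u64; 128]>>()
}

fn main() {
    println!("inline array: {} bytes", inline_size());
    println!("boxed array:  {} bytes", boxed_size());
}
```

The boxed size should match the size of a usize on your machine, no matter how big the thing inside the Box is.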

Our new code looks like this:

struct Node {
    data: u64,
    next: Option<Box<Node>>
}

struct List {
    front: Option<Box<Node>>
}

This compiles… wohoo!! However, there are some common patterns we need to learn before we can effectively use our list. Let’s start by adding some functionality to our List - constructing an empty List and adding elements to it.

impl List {
    fn default() -> List {
        List {
            front: None
        }
    }

    fn add(&mut self, i: u64) {
        ...
    }
}

What should go in our add function? If we assumed that Rust works like Java we would try to write something like:

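fn add(&mut self, i: u64) {
    let new_front = Node {
        data: i,
        next: self.front // error: cannot move out of self.front, which is behind a mutable reference
    };
    self.front = new_front; // error: expected Option<Box<Node>>, found Node
}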

This would insert a new node at the front of the list that points to the current front, and then point our current value of front to the new node we created. However, this isn’t how smart pointers work. The types don’t even line up (front is an Option<Box<Node>>, not a Node), but Rust also has a unique pattern here. The problem with our code is that we’re trying to take ownership of self.front in our new_front. Rust doesn’t like this, but it proposes a deal. We can take ownership of self.front so long as we give it another value to replace it with.

From the Rust documentation we can use the mem::replace function:

pub fn replace<T>(dest: &mut T, src: T) -> T

Moves src into the referenced dest, returning the previous dest value.
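Before using it on our list, here is a tiny sketch of mem::replace on a plain String so the behavior is easy to see (swap_in_new_name is a made-up helper):

```rust
use std::mem;

// Puts "new" into `name` and returns whatever was stored there before.
fn swap_in_new_name(name: &mut String) -> String {
    mem::replace(name, String::from("new"))
}

fn main() {
    let mut name = String::from("old");
    let previous = swap_in_new_name(&mut name);
    println!("previous = {}, current = {}", previous, name);
}
```

The key point is that name is never left empty or invalid: the old value moves out at the same moment the new value moves in.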

Ok awesome this is exactly what we’re looking for! Let’s add this in:

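// Needs `use std::mem;` at the top of the file
fn add(&mut self, i: u64) {
    let new_front = Node {
        data: i,
        // Swap None into self.front and take ownership of the old front
        next: mem::replace(&mut self.front, None),
    };
    self.front = Some(Box::new(new_front));
}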

The mem::replace line is not very easy to read. This is a common enough pattern that Rust provides a more readable and convenient function for us: take. To make this previous code more idiomatic, we could rewrite it as such:

fn add(&mut self, i: u64) {
    let new_front = Node {
        data: i,
        // take() moves the old front out and leaves None behind
        next: self.front.take(),
    };
    self.front = Some(Box::new(new_front));
}

This behaves much like our mem::replace call: take moves the value out of the Option and leaves None in its place.
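For reference, here is how the pieces so far might fit together into something you can actually compile and run. The to_vec method is not part of the list as described above; it is a hypothetical helper I’ve added only so there is a way to look at the contents.

```rust
struct Node {
    data: u64,
    next: Option<Box<Node>>,
}

struct List {
    front: Option<Box<Node>>,
}

impl List {
    fn default() -> List {
        List { front: None }
    }

    // Inserts a new node at the front of the list.
    fn add(&mut self, i: u64) {
        let new_front = Node {
            data: i,
            next: self.front.take(),
        };
        self.front = Some(Box::new(new_front));
    }

    // Hypothetical helper: collects the values from front to back.
    fn to_vec(&self) -> Vec<u64> {
        let mut values = Vec::new();
        let mut current = &self.front;
        while let Some(node) = current {
            values.push(node.data);
            current = &node.next;
        }
        values
    }
}

fn main() {
    let mut list = List::default();
    list.add(1);
    list.add(2);
    list.add(3);
    println!("{:?}", list.to_vec());
}
```

Since add always inserts at the front, the values come back in the reverse of the order they were added.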

Idiomatic Rust

One of the challenges of learning any new programming language is how to write good code in that language. A good example of this is pythonic code, which can take a while to adapt to for newcomers to Python. Writing idiomatic Rust code is difficult, in part because there still aren’t a lot of good resources on the topic. Looking through the idiomatic-rust page and reading the attached resources, almost all of the links are full of in-progress work. This is great, and I’m really happy the community is developing these resources. However, it means that for now we’ll have to do some extra work in writing good Rust code. I would argue that this task is also hard since Rust presents a lot of ways to do the same task. Go, for example, is an incredibly simple language (from a syntactic standpoint) and it means that most Go programs look very similar to each other. For example, Go has no ternary operator - you must write a full if statement instead.

I can’t possibly enumerate all common Rust idioms here (I don’t even know 99% of them yet!), but I can point out some basic ones. Let’s improve our LinkedList written above by adding in some common Rust idioms.

The first thing that comes to mind is our Option<Box<Node>> code is quite verbose and difficult to parse. Rust allows for the use of the type keyword to shorten this. Let’s create a new Link type that represents this pointer.

type Link = Option<Box<Node>>;

pub struct Node {
    data: u64,
    next: Link
}

pub struct List {
    front: Link
}

...

Our List should also be generic - why limit it to just types of u64?? Converting our code is actually very easy, we can just specify a generic type T and use the <T> syntax much like we would in Java.

type Link<T> = Option<Box<Node<T>>>;

struct Node<T> {
    data: T,
    next: Link<T>
}

struct List<T> {
    front: Link<T>
}

impl<T> List<T> {
    ...
}

Now we could easily use our List struct for either u64 or String, that’s awesome!!
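Here is a sketch of what the filled-in generic version could look like. The add method is the same as before with u64 swapped for T. The pop method is my own addition (it was not part of the list above) to show take being used in the other direction.

```rust
type Link<T> = Option<Box<Node<T>>>;

struct Node<T> {
    data: T,
    next: Link<T>,
}

struct List<T> {
    front: Link<T>,
}

impl<T> List<T> {
    fn default() -> List<T> {
        List { front: None }
    }

    // Inserts a new node at the front of the list.
    fn add(&mut self, value: T) {
        let new_front = Node {
            data: value,
            next: self.front.take(),
        };
        self.front = Some(Box::new(new_front));
    }

    // Hypothetical helper: removes the front node and returns its value, if any.
    fn pop(&mut self) -> Option<T> {
        self.front.take().map(|node| {
            self.front = node.next;
            node.data
        })
    }
}

fn main() {
    let mut numbers = List::default();
    numbers.add(1u64);
    numbers.add(2u64);
    println!("{:?}", numbers.pop());

    let mut words = List::default();
    words.add(String::from("hello"));
    println!("{:?}", words.pop());
}
```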

Closing Thoughts

I had a lot of fun learning Rust. I was a bit intimidated at first since I’ve tried to pick up Rust previously but have given up numerous times after struggling with the language’s ownership model. I usually find that after learning a few languages, such as Java and C, learning something new like Go or Python is relatively easy. It can take a while to learn how to write code in the style of that language, but very rarely do I find it outright confusing. Rust is different, however, and requires you to really dedicate some time to sit down and think about how it works. To anyone who wants to learn Rust, I highly recommend that you spend a lot of time writing Rust code and getting it wrong. Try to write a simple LinkedList without consulting Learning Rust With Entirely Too Many Linked Lists. You’ll struggle for a bit, but you’ll come away much more confident with your knowledge of ownership.

I’m excited to see where Rust goes in the coming years. The language has a lot of potential - memory safety and blazing fast execution speeds. However, I’m curious to see how its learning curve translates to adoption in industry. If a new developer joins your team and they’ve never written Go before but they’ve written Java you can probably get them comfortable and up to speed within a few weeks. With Rust it’s harder to say, and it might take some people longer than others to adapt to the language. And in a world where developer velocity is so critical it seems like fringe adoption of Rust, say in microservice architectures, might be slower than with a language like Go. There is also lots of room for the Rust community to grow into. The available online resources for learning Rust are really good - The Book is excellent, and the built-in documentation for the standard library is fantastic. Not to mention the compiler messages are exceedingly helpful in writing correct Rust code. However, I’ve found few resources on writing good Rust code. The language is new enough that there doesn’t seem to be an equivalent of Effective Java for Rust. Certain libraries are also small and I have doubts about how long they’ll be maintained. If I were to write an X11 window manager in Rust today I would be wary of using the rust-xcb library as its maintainer stopped working on the project. More adoption will help with this, but there’s still lots of room for growth.