Exploring the Rust programming language

In order to celebrate the recent launch of the glorious Firefox 57 (HIGHLY recommended!), I decided to read about the Rust programming language. These events were actually not related, but I like to think they were. I read the whole book on Rust from Mozilla's website, and I pretentiously wrote here everything that I found interesting about the language (both positive and negative things).

* Installing the compiler on Arch Linux is extremely easy, and it also comes with a package manager (called Cargo), which is really sweet
* The package manager uses a really unusual file type: TOML. I don't really get this decision, and I don't think I agree with it either. What is wrong with JSON? If it is too verbose, can't we use YAML? Do I really need to learn yet ANOTHER markup language? That was the first thing I did not like
* It is easy to install packages with Cargo and also create blank projects (it creates a main file and the Cargo.***toml*** [ugh] file as well to manage packages). It also generates a lock file to pin package versions
* Building and running the project with Cargo seems very easy - it will even create the binaries in a separate directory for you
* Rust is statically and strongly typed (thank you, Mozilla!), but it also has type inference. I really like this as well
* Using uninitialized variables will give you a compiler error. I think this is a great detail for security
* Even though types can be inferred, function definitions need to have typed parameters - I think this is a great decision that improves readability
* Some other good features are closures and tuples. They have their peculiarities, but overall, they are pretty much what you would expect from closures and tuples.
* Modules are relatively simple and easy to understand. They are somewhat similar to what we see in Node.js, but with one important difference: everything in a module is private by default, so external libraries only see what you explicitly mark as public - this is a good safety feature.
* Variables are immutable (constant) by default. Really!
I wonder why it took people so long to realize that immutability is a great thing! Thank you again, Mozilla
* Rust is serious about security. I mean it. They are VERY serious about security. It even has a special keyword to mark code that the compiler cannot verify as safe: the **unsafe** keyword.

##### Rust reminds me a lot of C and C++

```rust
// yes. this is the whole file for a Hello World program
fn main() {
    println!("Hello world");
}
```

If you are a bit familiar with C, you know what a `fn main` means. You may be asking yourself what the `!` after `println` does, and this is another thing that caught my attention:

##### Macros

I like to think that the `!` in the `println!("Hello world");` means that I am screaming at the compiler to print my string (or maybe just being overly excited about printing it) - that would be an excellent feature, but the exclamation mark actually indicates that what I am calling is not a function, but a **macro**. If you know a little bit of C, you know what a macro is, and the idea in Rust is similar - `println!` is actually a macro that gets expanded to some more verbose/complicated code at compile time. These macros seem to be much more powerful than what we see in C, though: they are capable of pattern matching, recursion, and repetition.

The syntax, overall, is very similar to C/C++. That being said, I would like to thank Rust for two things:

Thank you, Rust, for being like C.

Thank you, Rust, for not being like Python.

##### Raw Pointers

They seem to be a bit uncommon, but sometimes they are necessary. There are some drawbacks: there is no guarantee that the referenced memory is valid, the pointer may be null, the memory management is manual, and there is no defined lifetime.
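
To make those drawbacks a bit more concrete, here is a minimal sketch (not from the book; `describe` is a made-up helper name). It assumes nothing beyond the standard library and only shows that creating a raw pointer is ordinary safe code, and that nothing stops such a pointer from being null:

```rust
use std::ptr;

// Hypothetical helper: reports whether a raw pointer is null.
// It only inspects the address and never dereferences it, so no `unsafe` is needed.
fn describe(p: *const i32) -> &'static str {
    if p.is_null() { "null" } else { "non-null" }
}

fn main() {
    let x = 5;
    // Turning a reference into a raw pointer is safe.
    let valid: *const i32 = &x;
    // Unlike a reference, a raw pointer is allowed to be null.
    let nothing: *const i32 = ptr::null();

    println!("valid is {}", describe(valid));
    println!("nothing is {}", describe(nothing));
}
```

The risky part is reading through the pointer, not holding it.
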
To work with them, you will probably need to wrap the dereference in an `unsafe` block:

```rust
let x = 5;
let raw = &x as *const i32;

let points_at = unsafe { *raw };

println!("raw points at {}", points_at);
```

The unsafe block does not switch off the compiler's regular checks (the borrow checker still runs inside it). It only unlocks a few extra operations that the compiler cannot verify, such as dereferencing raw pointers (as in this case), calling functions that are marked as `unsafe`, and accessing mutable static variables.

##### Expressions vs. Statements

Rust is really serious about things that can return a value and things that can not. For example, this code is valid:

```rust
// i32 = integer, 32 bits
fn add_one(x: i32) -> i32 {
    x + 1
}
```

The last expression of a function is what is returned. Notice the missing semicolon: without it, the compiler treats the line as an expression, which evaluates to a value (a number, in this case). This works because that value matches the return type of the function. Statements have a semicolon at the end and do not return a value:

```rust
// This is NOT valid!
fn add_one(x: i32) -> i32 {
    x + 1;
}
```

This would not work because, as I said, that line does not return a value, so it does not match our return type. We could, however, do an "early return" with the `return` keyword:

```rust
fn add_one(x: i32) -> i32 {
    return x + 1;
}
```

I really don't know how I feel about this. Ruby has a similar feature, which, as far as I know, is frowned upon. On the other hand, the compiler will scream at you if the types don't match, and you also need to omit the semicolon, so I think it is not that bad.

##### Array slices

Rust allows us to reference part of an array (or the whole array) through a slice, which is a pointer to the memory plus a length, for example:

```rust
let a = [0, 1, 2, 3, 4];
let complete = &a[..]; // A slice containing all of the elements in `a`.
let middle = &a[1..4]; // A slice of `a`: only the elements `1`, `2`, and `3`.
```

I am a big fan of direct memory access, and I am happy with this feature.

##### Functional-style "if" statement

I just love this.
Take a look:

```rust
let x = 5;

let y = if x == 5 {
    10
} else {
    15
}; // y: i32

// Also could be written as:
let y = if x == 5 { 10 } else { 15 }; // y: i32
```

It reminds me of the *if* expressions in Haskell. That snippet basically assigns 10 or 15 to *y* depending on the value of *x*. What else can I say here? It's great.

##### Python-style "for", and "loop" for infinite loops

For infinite loops, instead of doing this:

```rust
while true {
    //...
}
```

We can do:

```rust
loop {
    //...
}
```

Unnecessary? Maybe. But I like the clarity that it provides. One thing I am not sure about is the *for* loop. It works like in Python:

```rust
for x in 0..10 {
    println!("{}", x); // x: i32
}
```

I have some mixed feelings about it. On one hand, it is a great chance to make sure the compiler will always optimize loops to be blazing fast (after all, it will be very easy to compute the intervals); on the other hand, I like the freedom that regular *for* loops provide. I understand that they make programs more error prone, but in the case where I need to specify my own intervals, will I have to use *while* loops and scatter the assignments all over the place? Isn't that even worse? I like the *loop* loop (heh), but I am not convinced that this *for* loop is an improvement. I like the concept of using ranges, but not the idea of completely removing the old format. But overall, not a big deal.

##### Labels and lack of go-to

**Good idea** - being able to label loops:

```rust
'outer: for x in 0..10 {
    'inner: for y in 0..10 {
        if x % 2 == 0 { continue 'outer; } // Continues the loop over `x`.
        if y % 2 == 0 { continue 'inner; } // Continues the loop over `y`.
        println!("x: {}, y: {}", x, y);
    }
}
```

**Bad idea** - removing the bogeyman. I mean, the go-to.
Yes, I heard about it before: the go-to is mean and eats little children; but realistically speaking, it is a very important tool for low-level code: extra function calls are slow, branches will take extra operations, and if something happens that requires some cleanup before ending a function, a go-to can offer a very quick and easy way to jump to another section and be done with it, instead of having tons of repeated code. I understand the importance of writing safe code, and go-to is really something that should be used VERY sparingly. Still, I think removing it is a big mistake: it is something used a lot in kernel development, and it will be hard to convert this kind of code to Rust or obtain a similar effect. In conclusion: I am bitter.

##### Ownership

Rust has a big emphasis on security, and of course, memory ownership should then be a huge focus. I like this quote from the book:

> Many new users to Rust experience something we like to call ‘fighting with the borrow checker’, where the Rust compiler refuses to compile a program that the author thinks is valid. ... There is good news, however: more experienced Rust developers report that once they work with the rules of the ownership system for a period of time, they fight the borrow checker less and less.


So we do have a bit of a learning curve, but as you learn, you'll make fewer mistakes - well ain't that something. One example I found really interesting:

```rust
let v = vec![1, 2, 3];

let v2 = v;

println!("v[0] is: {}", v[0]);
```

This will give us an error. Why? Because the ownership of the vector was moved to *v2*, meaning that you can't access it from *v* anymore. Why is this good? Because suppose we have a program in C++ with two references to the same vector (v and v2), and then one part of the application deallocated v2; *v* would still be pointing to the illegal memory, and if we tried to use it, we could get a segmentation fault (or worse, silent memory corruption). This is a very common pitfall in C and C++. It also happens when we send those references as parameters:

```rust
fn take(v: Vec<i32>) {
    // What happens here isn’t important.
}

let v = vec![1, 2, 3];

take(v);

println!("v[0] is: {}", v[0]);
// Same error. "v" was transferred to the function "take" and then went out of scope
```

There is a very simple way to deal with ownership: borrowing, as they call it. This looks very similar to passing references around in C/C++ (the `mut` is needed here because `foo` modifies the vector):

```rust
fn foo(v: &mut Vec<i32>) {
    v.push(5);
}

let mut v = vec![];

foo(&mut v);
```

With a few differences: if you are borrowing an object, its owner needs to live longer than the borrow - makes sense, right? If you are borrowing, you need to give it back. A reference cannot live longer than the object it is pointing to. Another one is: you can only have one mutable reference at a time (and no shared references while it exists); this means that only one piece of code can have write permission to an object at a time. This is meant to avoid data races.

Rust is also very picky about lifetimes. Take a look at this example:

```rust
fn skip_prefix(line: &str, prefix: &str) -> &str {
    // ...
}

let line = "lang:en=Hello World!";
let lang = "en";

let v;
{
    let p = format!("lang:{}=", lang);  // -+ `p` comes into scope.
    v = skip_prefix(line, p.as_str());  //  |
}                                       // -+ `p` goes out of scope.
println!("{}", v);
```

The problem here is that the arguments for *line* and *prefix* have different scopes (and therefore, different lifetimes), and the signature does not say which of them the returned reference is tied to. The compiler would give you a hard time because of this. But interestingly, you can specify the lifetimes in the function definition:

```rust
fn skip_prefix<'a, 'b>(line: &'a str, prefix: &'b str) -> &'a str {
    // ...
}
```

This snippet is saying that the function *skip_prefix* deals with two lifetimes: *a* and *b*, where the parameter *line* has lifetime *a*, *prefix* has lifetime *b*, and the return value will have the same lifetime as the parameter *line* (lifetime *a*). Knowing this, it would be safe to say that the return value can be used in the same scope as the argument for *line*. It sounds a bit tricky, but I think this is a very nice feature. I bet it will save us from many segmentation faults.

##### No ternary operator

Sounds like a big deal, but the *if* expression can perform similarly:

```rust
return if i > 5 { a } else { b };
```

That is not too bad then.

##### Structs

If you are familiar with structs in C, you know what I am talking about. What I found interesting is how smart they are: they remind me a lot of objects in JavaScript.

```rust
struct Point3d {
    x: i32,
    y: i32,
}

let x = 5;
let y = 6;

// Creating a Point3d and assigning x and y (getting the values from the variables)
let mut point = Point3d { x, y };

// gives "point" a new "y", but "x" is the same
point = Point3d { y: 1, ..point };
```

We also have "tuple structs":

```rust
struct Color(i32, i32, i32);

let black = Color(0, 0, 0);
```

##### Enums can have associated data

I don't remember ever seeing this. Pretty neat stuff:

```rust
enum Message {
    Quit,
    ChangeColor(i32, i32, i32),
    Move { x: i32, y: i32 },
    Write(String),
}

let x: Message = Message::Move { x: 3, y: 4 };
```

##### Match and Patterns

Rust doesn't have switches.
Instead, it has an expression called "match", which evaluates patterns:

```rust
let x = 5;

match x {
    1 => println!("one"),
    2 => println!("two"),
    3 => println!("three"),
    4 => println!("four"),
    5 => println!("five"),
    _ => println!("something else"),
}

let number = match x {
    1 => "one",
    2 => "two",
    3 => "three",
    4 => "four",
    5 => "five",
    _ => "something else",
};
```

Since *match* is used with patterns, we can do some really interesting stuff with it:

```rust
let x = 4;
let y = false;

match x {
    // `..=` is the inclusive range pattern (older compilers also accepted `...`)
    e @ 1..=3 => println!("from 1 to 3: {}", e),
    4 | 5 if y => println!("4 or 5 and y == true"),
    _ => println!("anything else"),
}
```

We can also destructure objects:

```rust
struct Point {
    x: i32,
    y: i32,
}

let point = Point { x: 2, y: 3 };

match point {
    Point { y, .. } => println!("y is {}", y),
}
```

There is really a lot to talk about for matches and patterns, which is not the purpose of this post, but I hope I could get the point across.

##### Methods and Traits

Rust does not have classes. Instead, we can assign methods to structs using the *impl* keyword, and simulate interfaces with *traits*:

```rust
// A struct for a circle
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

// This is an interface that calls for two methods: area and is_larger
trait HasArea {
    fn area(&self) -> f64;
    fn is_larger(&self, other: &Self) -> bool;
}

// Here we are implementing some methods for Circle - those methods also satisfy
// the interface HasArea
impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }

    fn is_larger(&self, other: &Self) -> bool {
        self.area() > other.area()
    }
}
```

Seeing how this is done reminds me a lot of how structs work in **Go**, as well as the prototypes in JavaScript.
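
As a rough sketch of how a trait like this might be used, here is a small generic function that accepts anything implementing `HasArea`. The struct, trait, and impl are repeated so the snippet compiles on its own. The helper `describe_area` and the sample values are made up for illustration:

```rust
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

trait HasArea {
    fn area(&self) -> f64;
    fn is_larger(&self, other: &Self) -> bool;
}

impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }

    fn is_larger(&self, other: &Self) -> bool {
        self.area() > other.area()
    }
}

// Hypothetical helper: works with any type that implements HasArea,
// without knowing anything else about that type.
fn describe_area<T: HasArea>(shape: &T) -> String {
    format!("This shape has an area of {:.2}", shape.area())
}

fn main() {
    let small = Circle { x: 0.0, y: 0.0, radius: 1.0 };
    let big = Circle { x: 1.0, y: 1.0, radius: 2.0 };

    println!("{}", describe_area(&small));
    println!("big is centered at ({}, {})", big.x, big.y);
    println!("big is larger than small: {}", big.is_larger(&small));
}
```

The `T: HasArea` bound is what plays the role of "accepts this interface" here.
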
I am not exactly sure why they made this design decision, but I like the freedom of being able to expand types as I want:

```rust
// Interface that calls for implementing the method approx_equal
trait ApproxEqual {
    fn approx_equal(&self, other: &Self) -> bool;
}

// Implementing the interface for the type f32 (float 32b)
impl ApproxEqual for f32 {
    fn approx_equal(&self, other: &Self) -> bool {
        // Appropriate for `self` and `other` being close to 1.0.
        (self - other).abs() <= ::std::f32::EPSILON
    }
}

// The `_f32` suffix tells the compiler which float type the literal is
println!("{}", 1.0_f32.approx_equal(&1.00000001));
```

Rust has neither method overloading nor named parameters, so the way they recommend you build objects is using what they call the Builder Pattern. When you call the associated function (static method) "new", it gives you an "incomplete" object, whose values you can then set. When you are done, call the *finalize* method on this object to get the finished object:

```rust
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

impl Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }
}

struct CircleBuilder {
    x: f64,
    y: f64,
    radius: f64,
}

impl CircleBuilder {
    fn new() -> CircleBuilder {
        CircleBuilder { x: 0.0, y: 0.0, radius: 1.0, }
    }

    fn x(&mut self, coordinate: f64) -> &mut CircleBuilder {
        self.x = coordinate;
        self
    }

    fn y(&mut self, coordinate: f64) -> &mut CircleBuilder {
        self.y = coordinate;
        self
    }

    fn radius(&mut self, radius: f64) -> &mut CircleBuilder {
        self.radius = radius;
        self
    }

    fn finalize(&self) -> Circle {
        Circle { x: self.x, y: self.y, radius: self.radius }
    }
}

fn main() {
    let c = CircleBuilder::new()
        .x(1.0)
        .y(2.0)
        .radius(2.0)
        .finalize();

    println!("area: {}", c.area());
    println!("x: {}", c.x);
    println!("y: {}", c.y);
}
```

To implement destructors, we implement the *Drop* trait:

```rust
struct Firework {
    strength: i32,
}

impl Drop for Firework {
    fn drop(&mut self) {
        println!("BOOM times {}!!!", self.strength);
    }
}

// Prints:
// BOOM times 100!!!
// BOOM times 1!!!
fn main() {
    let firecracker = Firework { strength: 1 };
    let tnt = Firework { strength: 100 };
}
```

I like this design. It feels new but still familiar, and a lot more flexible than regular classes.

##### Operator overload

```rust
use std::ops::Add;

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl Add for Point {
    type Output = Point;

    fn add(self, other: Point) -> Point {
        Point { x: self.x + other.x, y: self.y + other.y }
    }
}

fn main() {
    let p1 = Point { x: 1, y: 0 };
    let p2 = Point { x: 2, y: 3 };

    let p3 = p1 + p2;

    println!("{:?}", p3);
}
```

I always liked this feature in C++. If not abused, it can be extremely useful.

---

Overall, I really liked what I saw. It seems to me that Rust has a lot of potential: it feels modern, but it's still low-level and powerful enough to build a real operating system with it! I hope I can experiment more with it in the future, because it definitely deserves more attention.