valve's

JSON serialization in Rust, part 2

2014-08-26T09:09:00+04:00

This post is intended for people coming from high level languages, such as Ruby or JavaScript and who may be surprised with the complexity of the JSON serialization infrastructure in Rust.

This is the second part of the 2 part post that deals with decoding JSON strings into Rust values. First part is available here.

Overview

When working with JSON deserialization, we’re interested in Decodable and Decoder traits.

As with serialization, hex and base64 modules are not relevant to JSON deserialization, so we should not pay attention to them.

Deserialization

In order for a type to be decodable from JSON, it must implement Decodable trait. Almost all built-in types already implement it, so you can deserialize them with:

extern crate serialize;
use serialize::{json};

fn main(){
  let numeric_s = "3.14";
  let greeting = "\"Hello, world\"";
  let pi: f64 = json::decode(numeric_s).unwrap();
  println!("{}", pi);
  let decoded_greeting: String = json::decode(greeting).unwrap();
  println!("{}", decoded_greeting);
}
// 3.14
// Hello, world

playpen

Option

Deserializing to Option is somewhat redundant, because json::decode returns DecodeResult, which is a type alias for a regular result. That means you can pattern match on DecodeResult and handle potential failure.

Vector

// assuming in main with json in scope
let raw_json = "[42, 43, 44, 45]";
let vec: Vec<i32> = json::decode(raw_json).unwrap();
println!("{}", vec);
// [42, 43, 44, 56]

playpen

Tuple

Decoding JSON to a tuple is identical to vector, just specify the correct type:

// assuming in main with json in scope
let raw_json = "[42, 43]";
let tup: (i32, i32) = json::decode(raw_json).unwrap();
println!("{}", tup);
// (42, 43)

playpen

HashMap

extern crate serialize;
use serialize::{json};
use std::collections::HashMap;

fn main(){
  let raw_json = "{\"e\":2.71,\"pi\":3.14}";
  let map: HashMap<String, f64> = json::decode(raw_json).unwrap();
  println!("{}", map);
}
// {pi: 3.14, e: 2.71}

playpen

Array

As with serializing, Rust cannot automatically deserialize JSON string into a fixed-length array. The reason for this is the same: arrays’ type signature contain length as part of the type, but Rust currently (and most likely not until after v1.0) can’t be generic over array’s length.

let raw_json = "[42, 43]";
let arr: [i32,..2] = json::decode(raw_json).unwrap();
println!("{}", arr);
// failed to find an implementation of trait
// serialize::serialize::Decodable 
// for [i32, .. 2]

playpen

To remedy this, we will use custom decoding, as we did with custom array encoding in part 1. I’ll show an example of this below.

Structs

As with serialization, it’s possible to have Rust deserialize structs automatically for you. You will need to add the #[deriving(Decodable)] attribute to your struct:

extern crate serialize;
use serialize::{json};

fn main(){
  let raw_json = "{\"name\":\"John Doe\",\"age\":33}";
  let person: Person = json::decode(raw_json).unwrap();
  println!("{}", person);
}

#[deriving(Encodable, Decodable, Show)]
struct Person {
  name: String,
  age: int
}
// Person { name: John Doe, age: 33 }

playpen

Note that I’m using 3 deriving trait implementations for a struct: Encodable, Decodable and Show. This is to make my struct fully JSON (de)serializable and printable automatically.

Custom deserialization

This is probably the cornerstone of the JSON infrastructure. In real life you often cannot control the shape of the JSON that comes to you, so you must be able to convert arbitrary JSON strings into your objects. Luckily, Rust decoding capabilities will help us here.

Let us continue with our Person struct example and deserialize the object from a complex JSON where our object is in the data key. The example might be contrived, but it serves the demo purpose.

To make a type JSON deserializable we need to implement the Decodable trait.

extern crate serialize;
use serialize::{json, Decodable, Decoder};

fn main(){
  let raw_json = r#"{"type": "Person", "data": {"name": "John Doe", "age": 33}}"#;
  let person: Person = json::decode(raw_json).unwrap();
  println!("{}", person);
}

#[deriving(Show)]
struct Person {
  name: String,
  age: int
}

impl<S: Decoder<E>, E> Decodable<S, E> for Person {
  fn decode(decoder: &mut S) -> Result<Person, E> {
    decoder.read_struct("root", 0, |decoder| {
      decoder.read_struct_field("data", 0, |decoder| {
         Ok(Person{
          name: try!(decoder.read_struct_field("name", 0, |decoder| Decodable::decode(decoder))),
          age: try!(decoder.read_struct_field("age", 0, |decoder| Decodable::decode(decoder)))
        })
      })
    })
  }
}
// Person { name: John Doe, age: 33 }

playpen

Let’s break down the code line by line to see what’s going on here.

Line 2: We need to bring both Decodable and Decoder traits into scope. Decodable trait is for the struct to implement, to conform to the JSON deserialization interface, while the Decoder is the low level workhorse of deserialization, which tokenizes and parses the JSON string to convert it to Rust values later.

Line 5: I’m using a raw string literal to avoid escaping double quotes.

Line 6: The line where I’m decoding JSON string into an instance of Person struct. Note that I need to type-annotate the variable when decoding it.

Line 10: We no longer need to use the #[deriving(Decodable)] attribute, because we implement the Decodable trait ourselves.

Line 16: This is the Decodable implementation. It is very similar to the Encodable implementation from part 1 with the exception of S being type restricted to Decoder trait now.

Line 17: Two differences from encode method counterpart: we’re no longer accepting &self as the first argument, because decode is an associated function, rather than a method. The analogy is class methods in Ruby or static methods in Java. Second difference is the return type. It is now Result.

Line 18: This is where the parsing starts. We do actual parsing with read_* family of methods on Decoder instance. Here we’re reading the top-level struct with read_struct method. First argument is the name of the sturct(not used), second is the length (not applicable). The third argument is an instance of anonymous function (lambda). Why are the first and the second arguments not used? I think this is because the entire family of read_* methods of Encoder strives to be uniform and thus a unified set of arguments is used, even when the encoder does not need them.

You can think of the read_struct call as “opening” the top-level JSON object to be able to move inside to read actual values. The lambda is where we descend and continue with reading.

Line 19: The object we’re trying to read is in the data field, so we’re reading it on this line with read_struct_field method. This time the first argument is necessary, because it tells the decoder the actual name of the field. 3rd argument is the lambda again to descend further into the the object in the data field.

Lines 20-21: Field reading happens here. By this time the parser has reached the contents of the data object so we can now just read the fields we’re interested in one-by-one. We’re using read_struct_field again, passing it the name of the field and the index(not used). The third argument is the value of the field, correctly decoded from JSON representation. Since all primitive values in Rust already implement the Decodable trait, we can safely call Decodable::decode on them to deserialize them as the Person struct fields.

Deserializing fixed length arrays

As in part 1, let’s use this knowledge to deserialize a fixed-length array from JSON.

extern crate serialize;
use std::default::Default;
use serialize::{json, Decodable, Decoder};

static BUFFER_SIZE: uint = 4;

fn main(){
  let raw_json = "[42, 43, 44, 45]";
  let buf: Buffer<i32> = json::decode(raw_json).unwrap();
  let Buffer(arr) = buf;
  for i in arr.iter() {
    println!("{}", i);
  }
}

struct Buffer<T>([T,..BUFFER_SIZE]);

impl<S: Decoder<E>, T: Default+Copy+Decodable<S, E>, E> Decodable<S, E> for Buffer<T> {
  fn decode(decoder: &mut S) -> Result<Buffer<T>, E> {
    decoder.read_seq(|decoder, len| {
      if len != BUFFER_SIZE {
        return Err(decoder.error(format!("Expecting array of length: {}, but found {}", BUFFER_SIZE, len).as_slice()));
      }
      let mut arr: [T,..BUFFER_SIZE] = [Default::default(),..BUFFER_SIZE];
      for (i, val) in arr.mut_iter().enumerate() {
        *val = try!(decoder.read_seq_elt(i, Decodable::decode));
      }
      Ok(Buffer(arr))
    })
  }
}
// 42
// 43
// 44
// 45

playpen

Since rust will not allow to provide the implementation of a trait for a type where both the trait and the type were defined in the external crate, we need to create a tuple struct (newtype) for the array.

Overall, this implementation looks similar to the previous, but there are nuances I’d like to point out.

Line 18: Note that the implementation signature adds new T type parameter, which is the type of the array. It can be anything that implements Default+Copy+Encodable traits. Default is to be able to fill array with default values (line 24). Copy is to be able to copy the default values into the new array, while the Decodable is to be able to decode the array elements from JSON.

Lines 22-24: Here I’m checking if the array we’re about to decode contains exactly the number of elements we expect. If not, I exit early with an error.

Lines 26-28: Here I’m iterating the array, obtaining mutable references to its elements and filling them from JSON, using decoder::read_seq_elt.

Line 29: Here I’m returning the result wrapped in Buffer newtype.

deriving(Decodable)

As in part 1, let’s look at the expanded implementation of the #[deriving(Decodable)] attribute. Let’s use the Person struct example again and compile it with --pretty expanded flag:

rustc app.rs --pretty expanded

The output:

#![feature(phase)]
#![no_std]
#![feature(globs)]
#[phase(plugin, link)]
extern crate std = "std";
extern crate rt = "native";
extern crate serialize;
#[prelude_import]
use std::prelude::*;
use serialize::{Decodable};
struct Person {
    name: String,
    age: int,
}
#[automatically_derived]
impl <__D: ::serialize::Decoder<__E>, __E> ::serialize::Decodable<__D, __E>
     for Person {
    fn decode(__arg_0: &mut __D) -> ::std::result::Result<Person, __E> {
        __arg_0.read_struct("Person", 2u,
                            ref |_d|
        ::std::result::Ok(Person{name:
           match _d.read_struct_field("name",
                                      0u,
                                      ref
                                          |_d|
                                          ::serialize::Decodable::decode(_d))
               {
               Ok(__try_var)
               => __try_var,
               Err(__try_var)
               =>
               return Err(__try_var),
           },
       age:
           match _d.read_struct_field("age",
                                      1u,
                                      ref
                                          |_d|
                                          ::serialize::Decodable::decode(_d))
               {
               Ok(__try_var)
               => __try_var,
               Err(__try_var)
               =>
               return Err(__try_var),
           },}))
    }
}

The output is very similar to the manual deserialization code we saw earlier, except that compiler further expanded the try! macros into Err and Ok branches.

JSON serialization in Rust, part 1

2014-08-25T10:08:00+04:00

This post is intended for people coming from high level languages, such as Ruby or JavaScript and who may be surprised with the complexity of the JSON serialization infrastructure in Rust.

This is the first part of the 2 part post that deals with encoding Rust values into JSON. Second part will deal with converting JSON strings back into Rust values.

Overview

JSON serialization lives in the serialize crate. It contains json module where low-level implementation details live and two traits which we are interested in: Encodable and Encoder.

Please note that hex and base64 modules are not relevant to JSON serialization, so we should not pay attention to them.

Serialization

In order for a type to be JSON serializable, it must implement Encodable trait. Almost all built-in types already implement it, so you can serialize them as easily as:

extern crate serialize;
use serialize::json;
fn main(){
  let numeric = 3.14f64;
  let str = "Hello, world";
  println!("{}", json::encode(&numeric));
  println!("{}", json::encode(&str));
}
// 3.14
// "Hello, world"

playpen

Option

// assuming inside main and json in scope
let opt = Some(3.14)
println!("{}", json::encode(&opt));
let opt2: Option<f64> = None;
println!("{}", json::encode(&opt2));
// 3.14
// null

playpen

Vector

let vec = vec!(1939i, 1945);
println!("{}", json::encode(&vec));
// [1939,1945]

playpen

HashMap

let mut map = HashMap::new();
map.insert("pi", 3.14f64);
map.insert("e", 2.71);
println!("{}", json::encode(&map));
// {"e":2.71,"pi":3.14}

playpen

Array

Currently Rust cannot automatically serialize fixed sized arrays to JSON.

let numbers = [1i, 2, 3];
json::encode(&numbers);
// failed to find an implementation of trait 
// serialize::serialize::Encodable,std::io::IoError> for [int, .. 3]
// json::encode(&numbers);

Array’s type signature includes its length, but Rust can’t be generic with array’s length. So in order to serialize an array into JSON, we’ll need to use custom serialization, which I’ll explain further.

UPDATE: Thanks to Reddit user ASeriesOfTubes

It is possible to serialize an array to JSON automatically. You just need to convert it to slice.

let numbers = [1i, 2, 3];
println!("{}", json::encode(&numbers.as_slice()));
// [1,2,3]

playpen

Structs

It’s possible to have Rust automatically implement JSON serialization for your structs. You’ll need to adorn the struct with deriving(Encodable) attribute.

extern crate serialize;
use serialize::{json, Encodable};
fn main(){
  let person = Person{name: "John Doe".to_string(), age: 33};
  println!("{}", json::encode(&person));
}
#[deriving(Encodable)]
struct Person {
  name: String,
  age: int
}
// {"name":"John Doe","age":33}

playpen

Custom serialization

You will inevitably come to the point when Rust’s supplied serialization will not work for you. Luckily we have full control over the serialization process. In order to serialize your type the way you want it, you will need to implement the Encodable trait. Let’s continue with our Person struct example and change it to include a summary field.

extern crate serialize;
use serialize::{json, Encodable, Encoder};
fn main() {
  let person = Person{name: "John Doe".to_string(), age: 33,};
  println!("{}" , json::encode(&person));
}

struct Person {
  name: String,
  age: int,
}

impl <S: Encoder<E>, E> Encodable<S, E> for Person {
  fn encode(&self, encoder: &mut S) -> Result<(), E> {
    match *self {
      Person{name: ref p_name, age: ref p_age} => {
        encoder.emit_struct("Person", 0, |encoder| {
          try!(encoder.emit_struct_field( "age", 0u, |encoder| p_age.encode(encoder)));
          try!(encoder.emit_struct_field( "name", 1u, |encoder| p_name.encode(encoder)));
          try!(encoder.emit_struct_field( "summary", 2u, |encoder| {
            (format!("Nice person named {}, {} years of age", p_name, p_age)).encode(encoder)
          }));
          Ok(())
        })
      }
    }
  }
}
// {"age":33,"name":"John Doe","summary":"Nice person named John Doe, 33 years of age"}

playpen

Let us break down this code line by line to understand what’s going on.

Line 2: We need to bring both Encodable and Encoder traits into scope. Encodable trait is for the struct to implement, to conform to the JSON serialization interface, while the Encoder is the low level workhorse of serialization, which transforms primitive values into JSON bits and combines them all together.

Line 8: We no longer need to use the #[deriving(Encodable)] attribute, because we’re implementing the Encodable trait ourselves.

Line 13: We implement Encodable trait for the Person struct. Encodable full type signature is Encodable where S should be an instance of Encoder. E is a type parameter for Result, which our implementation returns.

Line 14: In order to implement the Encodable trait, we need to write the encode method, which accepts a single S argument. Remember, that S is an instance of Encoder, which is a low level JSON emitter.

Lines 15-16: We’re destructuring (decomposing) the struct to access its fields. To do that we use pattern matching and assign the person fields to p_name and p_age variables.

Line 17: This is where JSON writing begins. We call emit_struct on our encoder and pass it 3 arguments: the name of the struct, current index and an anonymous function(aka lambda). The name of the struct is not used; current index is not used too. What is important is the anonymous function that we’re passing as the 3rd argument. The emit_struct method simply writes {, calls the lambda and then writes closing }. Why are the 1st and the 2nd arguments not used? I think they are there to conform to the uniform style of encoder’s emit_* methods, but they don’t make any sense when writing a JSON object.

Lines 18-22: This is where the body of the JSON object is written. Each field is written with emit_struct_field method that accepts same 3 arguments: name, index and lambda. Name is how you want your object field to be named, index is to correctly insert comma after each field and the lambda’s job is to return correctly escaped JSON representation of the struct’s field value. Since the built-in types already implement the Encodable trait, we can safely call encode on integers and strings to encode their values into JSON.

Line 23: To indicate the successful JSON encoding, we return unit wrapped in Ok enum value of the Result.

Line 24: The line where the closing } of the object is written, because lambda finishes here.

Serializing fixed length arrays

Now armed with the knowledge to write our own implementation of Encodable, we can convert an array into JSON.

extern crate serialize;
use serialize::{json, Encodable, Encoder};
static BUFFER_SIZE: uint = 4;
fn main() {
  let buffer = Buffer([42i, 43, 44, 45]);
  println!("{}", json::encode(&buffer));
}

struct Buffer<T>([T,..BUFFER_SIZE]);


impl <S: Encoder<E>, T: Encodable<S,E>, E> Encodable<S, E> for Buffer<T>{
  fn encode(&self, encoder: &mut S) -> Result<(), E> {
    match *self {
      Buffer(ref data) => {
        let mut counter = 0u;
        encoder.emit_seq(BUFFER_SIZE, |encoder| {
          for i in data.iter() {
            try!(encoder.emit_seq_elt(counter, |encoder| i.encode(encoder)));
            counter += 1;
          }
          Ok(())
        })
      }
    }
  }
}
// [42,43,44,45]

playpen

Overall, this implementation looks similar to the previous, except we’re using the combination of emit_seq + emit_seq_elt to emit [ + elements + ]. We also keep a counter variable to correctly handle the comma.

Note that the implementation signature adds new T type parameter, which is the type of the array. It can be anything that implements Encodable trait.

deriving(Encodable)

Now you’re ready to understand what happens when you use #[deriving(Encodable)]. Let’s use the Person struct example again and compile it with --pretty expanded flag:

rustc app.rs --pretty expanded

We see the output:

#![feature(phase)]
#![no_std]
#![feature(globs)]
#[phase(plugin, link)]
extern crate std = "std";
extern crate serialize;
#[prelude_import]
use std::prelude::*;
use serialize::{Encodable};
struct Person {
    name: String,
    age: int,
}
#[automatically_derived]
impl <__S: ::serialize::Encoder<__E>, __E> ::serialize::Encodable<__S, __E>
     for Person {
    fn encode(&self, __arg_0: &mut __S) -> ::std::result::Result<(), __E> {
        match *self {
            Person { name: ref __self_0_0, age: ref __self_0_1 } =>
            __arg_0.emit_struct("Person", 2u, ref |_e| {
                                match _e.emit_struct_field("name", 0u,
                                                           ref |_e|
                                                               (*__self_0_0).encode(_e))
                                    {
                                    Ok(__try_var) => __try_var,
                                    Err(__try_var) => return Err(__try_var),
                                };
                                return _e.emit_struct_field("age", 1u,
                                                            ref |_e|
                                                                (*__self_0_1).encode(_e));
                            }),
        }
    }
}

The implementation provided by the Rust compiler is almost identical to ours, except that it further expanded the try! macros into Err and Ok branches.

Second part of this article will explain the reverse process: how to decode Rust objects from JSON string.

immutability in Ruby

2014-07-04T15:30:00+04:00

Like many other developers, I’ve been intrigued with functional programming for a long while. I remember myself reading articles promising programming heaven for those who are brave enough to go functional. I bought a used Real World Haskell on Ebay, but sadly never finished it. I then bought Scala for the Impatient, but this time had the persistence to finish the book.

All these years functional programming seemed like a holy grail, but as a true holy grail, I was afraid it was meant to stay undiscovered.

All these years I paid my bills writing Ruby-on-Rails and JavaScript code and never made the functional leap. I never became a full-time Haskell or Scala developer and probably will never become one.

But you know what? It’s possible to be slightly more functional with normal languages we’re using every day. This article will try to demonstrate several concrete examples where functional programming is useful or elegant. I will show you the old way of doing things in Ruby and the new, more functional way of doing similar things in Ruby again.

Let me start by saying that this article assumes you’re interested in functional programming. It also assumes that you’ve probably seen other examples of functional code before.

I’m going to split this article into several parts and each part will elaborate upon a specific example.

Part 1: Immutability

What is immutability? When people speak about immutability they usually mean immutable objects. Quoting from wikipedia:

an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created.

A very simple concept with far reaching consequences.

First let’s define what ‘whose state cannot be modified’ really means. At first you may think that such an object is useless. How can we possibly use an object if we cannot change it? Usually an immutable object creates a copy of itself with desired modifications. The original object remains unchanged. You will see the examples of it further in the article.

Immutability and functional programming

Now, another foundational question: why does functional programming favor immutable values and data structures over mutable ones? Is real functional programming possible with mutable values? You probably know that functional programming is more than ‘programming with functions’. It also requires the functions to be pure. I’m not a mathematician and my explanation of pure functions may not be scientifically correct, but you can think of them simply as functions that: always accept an argument, always return a result and the computing of the result depends solely on the input. In other words, a pure function cannot depend on some other data, existing elsewhere, called state, to influence how the result is computed. The only thing that dictates how the result is computed is the function’s argument. Pure functions cannot change the external state either. This is called creating side effects.

Sometimes programmers call the external state “the world” and refer to pure functions as functions that cannot depend on “the world” and read “the world” state, nor change “the world” while making its job.

Why worry at all about the purity of functions? Composability. When your functions are pure, you can compose large programs from small functions. Knowing that a function is pure provides guarantees that it will not change the external state.

Is it possible to write a real program using only pure functions? How can you talk to the database, write to files, charge credit cards and do everything else real programs do? Functional applications are usually built using a pure core (where the bulk of the logic lives) and a thin, impure shell (that provides access to the pure core from the outside world). This way you have a large part of the code that is easy to reason about, easy to test and easy to understand.

Example of a pure function:

def sum_two_numbers(a,b)
  a + b
end

You can see that this function computes the result only using its arguments.

Example of an impure function:

def sum_two_numbers(a,b)
  logger.info("calculating sum of two numbers")
  a + b
end

This function writes to the file system in addition to computing the result. In other words, this function changes “the world” by creating side effects.

Using v2 of this function you hurt composability; you limit yourself in the ways you can use this function in other parts of your program.

Immutability and purity

Now let’s look why function purity demands immutability with a concrete example. We all know that strings in ruby are mutable. You can mutate the string with:

s = "Hello"
# mutating with '<<'
s << ", world"
# mutating with bang methods
s.upcase!
puts s
# => "HELLO, WORLD"

This code fragment modifies the string in-place, mutating it. Now let’s use the string as a function argument:

def upcase_string(input)
  input.upcase!
  input
end

This method mutates the argument and returns it. On the surface, this looks OK, but we have just inadvertently created a side effect. Any external code that relies on this string may break.

Let’s create an example of this:

def upcase_string(input)
  input.upcase!
  input
end
current_user_name = get_current_user.name
upcased_user_name = upcase_string(current_user_name)
# ...
# ...
# somewhere else still thinking that current_user_name is downcased
if current_user_name == 'admin'
  # this will never be true
  # ...
end

You see now that in order to keep function pure we should never mutate its arguments, but create new objects and return them instead. Same function, but this time implemented as pure:

def upcase_string(input)
  input.upcase
end

Just a minor modification gives us many benefits: we’re no longer modifying “the world” and only return a new string with the required modifications.

How can we guarantee that functions never mutate their arguments? By making the arguments immutable, of course!

The key thing to take away here is that by making each object immutable, we can guarantee that functions do not create side effects and remain pure.

Hopefully, by now I have convinced you that immutable objects are useful. Now you probably understand that by limiting the “reach” of the function to only the local function’s scope you automatically decrease the number of potential bugs and unpleasant surprises. However, you may still be unsure about the performance of immutable objects, and think that it is wasteful to create a copy of an object each time it needs to be modified. The following part of the article will hopefully make everything clear.

Immutability and primitives

Let’s define what primitives are. For our purposes, we can refer to primitives as data types, that serve as basic building blocks of the language. Usually the primitives are directly supported by the language. Ints, floats, characters and booleans are primitives and are usually treated in a special way by languages.

You don’t need to do something like:

# in fact you can't do this in Ruby
num = Integer.new(99)

You can use primitives directly:

num = 99
fnum = 3.14

Why does a language usually divide objects into, well, objects and primitives? The reason is performance. Primitives are closer to computer hardware and creating an object for every number is slow.

However, Ruby does not have true primitives, because in Ruby, everything is an object. You can call methods and properties on numbers and extend them with user-defined methods. I will still call them primitives, because it’s what they are on a conceptual level.

On one hand, primitives behave like immutable objects in Ruby:

i = 99
puts i.object_id
# => 7
i += 1
puts i.object_id
# => 12

This snippet demonstrates that you cannot modify a number. In real life this doesn’t make sense either, if you have the number 4 it’s the number 4 — eternal and beautiful. If you add 1 to it, you get completely different number 5, the old 4 stays the same.

On the other hand, you can define your own methods and properties:

class TrueClass
  attr_accessor :name
end
true.name = "one"
false.name = "two"

Integers and floats are frozen by default, while booleans are not.

1.frozen? # true
3.14.frozen? # true
true.frozen? # false

So while some primitives are not frozen, Ruby does not provide mutation methods for them and they usually can be treated as immutable objects. You should remember that this is easily overridden (as is everything in Ruby) and can cause potential problems.

Strings

Before diving into the specifics of Ruby strings, let’s talk about string mutability in general. In most languages strings are immutable: string concatenation or upcasing produces a new string rather than modifying it in-place.

Why do language designers usually make their string implementations immutable? To answer that we need to remember that strings are one of the most used data structures in any programming language.

Let’s consider the cases when string immutability is useful.

Concurrency.

This is a complex topic and I will talk about it later in the article. What you should know at this point is that when any data structure is immutable, it can be freely shared across threads without any locking or synchronization. Immutable data structures don’t need synchronisation at all when used in multithreaded environments.

Modern programming languages are designed from the ground up to be concurrent (go, rust), and having a single string instance to be shared across multiple threads helps to save a lot of memory and avoid the necessity of defensive copying when passing immutable strings around.

Hash table keys

More often than other data types, strings are used as keys in hash tables. This usage demands for strings to return the same hash code after the key and value were added to the hash table. With mutable strings a hash table would need to copy the string in order to guarantee the hash code staying the same. With immutable strings this is not needed.

Security

As I’ve mentioned, strings are used very frequently in any program. This entails a special treatment in terms of security. Strings are used when comparing user-names and passwords, storing credit card numbers and much more. Immutable strings guarantee that a malicious party is unable to tamper with the string after creation.

However, there is a performance downside of immutable strings. Mutable strings allow fast indexing and modifying in-place, as with regular arrays.

String immutability and Ruby

As with primitives, Ruby has no real immutable strings. To be precise, Ruby strings are mutable behind an immutable facade. That is, most operations on strings return new strings, while some of them still allow in-place modification.

Consider these examples:

# immutable operations
s = "hello"
puts s.object_id # 70093095097920
s += ", world"
puts s.object_id # 70093096228400
s = s.upcase
puts s.object_id # 70093096177000

# mutable operations
s = "hello"
puts s.object_id # 70093096113460
s << ", world"
puts s.object_id # 70093096113460
s = s.upcase!
puts s.object_id # 70093096113460

As you can see mutable operations do not create new strings but rather modify existing strings in-place.

Strings as hash keys

Earlier I mentioned that mutable strings do not make good hash keys. Let me prove this:

bad_key = "hal9000"
h = {bad_key => "Odyssey"}
h[bad_key] # "Odyssey"
bad_key << "!"
h[bad_key] # nil

After I modified the string key we can no longer find the value, because the key’s hashcode has changed! Since it’s so easy to mutate the Ruby string, you can end up with a useless hash. This is why it is not recommended to use mutable strings as hash keys.

How can we remedy it? The first option is to freeze the string:

better_key = "hall9000".freeze
h = {better_key => "Odyssey"}
better_key << "!"
# RuntimeError: can't modify frozen String

A second and better option is to use symbols, which are immutable versions of strings often used as identifiers.

best_key = :hal9000
h = {best_key => "Odyssey"}
best_key << :a
NoMethodError: undefined method `<<' for :hal9000:Symbol

When using literal symbols as hash keys, Ruby provides a shorter syntax:

h = {hal9000: "Odyssey"}
# hal9000: gets converted to :hal9000 =>

You might say at this point, “Why can’t I just use symbols instead of strings if they’re immutable equivalents?”. The short answer is you may not be able to, depending on your use case. One reason is that symbols don’t have immutable equivalents of string’s many methods, so it’s inconvenient to use symbols as an immutable replacement. Just compare the number of methods in Symbol and String to see the difference.

Immutable data structures

So far my discussion was around built-in data types and their relationships with immutability. Real-life applications, however, require using data structures in order to be efficient.

What is a data structure?

It’s a complex question, but you can think of it as a way to organize other, simpler data structures in a convenient or efficient way. Some data structures are designed for ease of use, while others are built solely with efficiency in mind.

We all know about lists, queues, hash tables, arrays, trees and many, many more. These data structres can have both mutable and immutable implementations.

Mutable implementations are considered ‘classic’, because they are more widely used, have been around for longer and generally are easier to implement. Immutable counterparts offer advantages in concurrency and security.

While some people use ‘immutable’ and ‘persistent’ interchangeably, they are not the same. Persistent data structure is immutable and keeps and reuses large parts of itself while constructing an immutable copy. As an example, you can think of a persistent linked list that reuses its tail when appending a new node. If you’re interested in functional, persistent data structures, have a look at Purely functional data structures by Chris Okasaki.

Let me also add that many modern programming languages that focus on concurrency have their data structures implemented in an immutable fashion: Scala offers both immutable and mutable collections. Clojure and C# offer immutable collections as well.

Let’s go ahead and implement a classic, mutable stack in Ruby and then reimplement it as immutable. A stack is a data structure that follows this interface:

self push(item)
self pop()
item peek()
bool empty?

Here is mutable implementation that uses a Ruby array as a backing store:

class MutableStack
  def initialize
    @store = []
  end

  def push(item)
    @store.push(item)
    self
  end

  def pop
    @store.pop
    self
  end

  def peek
    @store[-1]
  end

  def empty?
    @store.empty?
  end
end

This implementation is basically a thin wrapper around array. Whenever you call stack.push(item), you’re modifying this array in-place. This implementation possesses all the weaknesses that we discussed previously.

Now to an immutable implementation:

class ImmutableStack
  class EmptyStack
    def empty?
      true
    end

    def push(item)
      ImmutableStack.new(item, self)
    end

    def pop
      raise 'Cannot pop empty stack'
    end

    def peek
      raise 'Cannot peek empty stack'
    end
  end

  def self.empty
    EmptyStack.new
  end

  def initialize(head, tail)
    @head = head
    @tail = tail
  end

  attr_reader :head, :tail

  def peek
    head
  end

  def push(item)
    ImmutableStack.new(item, self)
  end

  def pop
    tail
  end

  def empty?
    false
  end
end

Usage pattern:

s = ImmutableStack.empty
s = s.push(99)
s = s.push(100)
puts s.peek # 100
s = s.pop
puts s.peek # 99
s.peek # Cannot peek empty stack (RuntimeError)

Each destructive operation does not mutate the stack but rather returns a copy of itself with the required modifications. What’s more, it reuses a large portion of itself while doing so, thus making this stack a persistent data structure.

Unfortunately, Ruby does not allow you to directly create private constructors and users can potentially call

ImmutableStack.new(1, ImmutableStack.empty)

If you want a good ruby library of immutable collections, I suggest using hamster.

Immutable data structures and multithreading

When writing a multi-threaded applications, follow these rules:

Avoid sharing data across threads.
If you have to share your data across threads, make this data immutable.
If you can’t avoid sharing mutable data, synchronize access to that data with synchronization constructs, such as Mutex.

In our two stack implementations it is safe to share an immutable version across multiple threads, because they will not be able to modify it in place. Whenever a thread makes a push or a pop, a new instance of the stack is created and returned so that the existing instance is never changed.

Conclusion

Now that you’ve read the article, you might have the impression that immutability is a silver bullet. It is not. It is only one possible way to design software and has its own strengths and weaknesses. Immutability let’s you design your functions and data structures in a new way, gaining much and losing much too. We’ve all been living in a world where the sequential computation was the de-facto standard. In the past immutability was not worth it. For a single core computer, immutability has too much overhead. You must carefully control the state and pay close attention to reusing and copying in order to be efficient. The performance impact that some of the immutable data structures incur can be too significant.

But the world is changing and the sequential model is disappearing. We all have smartphones with 2 or 4 cores. Our smart watches will have 8 cores in a couple of years, which means that concurrent will become the new sequential. If we want to exploit the power of modern hardware, we need to embrace the concurrent way of doing things. This is where immutability advantages outweigh the bad parts. I think that immutability is the way you should design your software now in order to be prepared for the concurrent future.

Introduction to full text search for Rails developers

2014-02-22T15:05:00+04:00

Every developer has heard of full-text search. However, most developers search with SQL and relational databases.

Almost every developer knows deep inside that full-text search is better suited for searching text, but continues to use old LIKE '%?%' queries.

I was one of those developers who never used full-text search, but I have changed and I invite others to join me and discover the other side of search with Solr.

This article assumes you’re comfortable with Ruby, Rails and PostgreSQL. I’ll build a simple people near me application using Solr in small incremental steps and hopefully help readers to overcome the feeling of uncomfortable uneasiness when thinking about full text search technology.

Disclaimer: My goal here is to familiarize the reader with full-text search, not create an ideal rails application structure. I’ll be using long views and JavaScript inside ERB templates. The point is to make a small but complete application in a single article and it is possible to do so only by keeping it really simple.

OK, enough talk, let’s build the app!

Let’s call this app Neibo:

personal  ruby -v
ruby 2.0.0p353 (2013-11-22 revision 43784) [x86_64-darwin13.0.0]
personal  rails -v
Rails 4.0.2
personal  rails new neibo
      create
      create  README.rdoc
      create  Rakefile
      create  config.ru
      create  .gitignore
      create  Gemfile
      create  app
      ...

Let’s remove the sqlite3, turbolinks, coffee-rails, jbuilder and jquery-rails gems as we will not need them. We should also add the pg gem to talk to Postgres DB.

My Gemfile is now:

source 'https://rubygems.org'

gem 'rails', '4.0.2'
gem 'pg'
gem 'sass-rails', '~> 4.0.0'
gem 'uglifier', '>= 1.3.0'

Now you need to set the pg connection in config/database.yml file:

# config/database.yml
common: &common
  adapter: postgresql
  encoding: unicode
  pool: 5
  timeout: 5000
  min_messages: warning

development:
  <<: *common
  database: neibo

test:
  <<: *common
  database: neibo_test

production:
  <<: *common
  database: neibo_production

Then let’s create a Person model and generate a migration for it:

# app/models/person.rb
class Person < ActiveRecord::Base
end

neibo  rails g migration create_people
    invoke  active_record
    create    db/migrate/20140222113048_create_people.rb

# db/migrate/20140222113048_create_people.rb
class CreatePeople < ActiveRecord::Migration
  def change
    create_table :people do |t|
      t.string :name, null: false
      t.text :about, null: false
      t.string :likes
      t.string :dislikes
      t.float :lat, null: true
      t.float :lon, null: true

      t.timestamps
    end
  end
end

For every person we store a name, an about – this is where a person can tell the world about himself, likes – things person likes and dislikes. We also want to store a person’s location, so that other people can locate him within certain radius.

We store a location using two floating point numbers, lat – for latitude, and lon – for longitude. It’s possible to use a specialized Point data type, but I want to keep it simple here.

I make lat & lon attributes nullable in case a user denies the browser geolocation permission.

Let’s create the databases and run the migration.

neibo  rake db:create
 neibo  rake db:migrate
==  CreatePeople: migrating ===================================================
-- create_table(:people)
   -> 0.0080s
==  CreatePeople: migrated (0.0080s) ==========================================

We now need to create a controller, a route and a view:

# app/controllers/people_controller.rb
class PeopleController < ApplicationController
  def index
  end
end

# config/routes.rb
Neibo::Application.routes.draw do
  root to: 'people#index'
  resources :people
end

app/views/people/index.html.erb

<% if flash[:alert].present? %>
  <%= flash[:alert] %>
<% end %>
<% if current_user %>
  Hello, <%= current_user.name %>

  Search people near you: 
  <%= form_tag people_path, method: :get do %>

      <%= text_field_tag :search, params[:search],
      placeholder: 'Search nearby people', type: :search, style: 'width: 400px' %>
      Search within mile radius:
        <%= select_tag :radius, options_for_select([10, 50, 100]) %>
      
    

<%= submit_tag 'Search' %>

  <% end %>
<% else %>
  Hello, guest
  
    Fill your profile so that people could find you.
    Allow browser to access your location if you want to be found by people near you.
  

  <%= form_for @new_person do |f| %>
 <%= f.text_field :name, placeholder: 'Enter your name' %> 

 <%= f.text_area :about, placeholder: 'Tell about yourself' %> 

 <%= f.text_field :likes, placeholder: 'Stuff you like' %> 

 <%= f.text_field :dislikes, placeholder: 'Stuff you don\'t ' %> 

 <%= f.submit 'Save' %> 

  <% end %>
<% end %>

This UI has two parts: if a user has already filled his details, he can use the search form and search for people nearby. If this is a new user, he fills his details, optionally allows a browser to get his location and saves his profile in the database.

We now need to modify the app/assets/javascripts/application.js and remove the files we’re not using. I remove them all and leave the application.js empty.

The view code checks the current_user method to see if the current user profile has been filled. Let’s create this method:

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  protect_from_forgery with: :exception
  helper_method :current_user

  private

  def current_user
    @current_user ||= Person.find(session[:current_user_id]) if session[:current_user_id]
  end
end

I’ll be storing current user’s id in session and get the user object from the database.

Let’s concentrate on the new user scenario. In order for the application to learn the user’s location, we need to grab it from the browser and save.

Adding the code to the view:

  Hello, guest
  
    Fill your profile so that people could find you.
    Allow browser to access your location if you want to be found by people near you.
  
  <%= form_for @new_person do |f| %>
     <%= f.text_field :name, placeholder: 'Enter your name' %> 

     <%= f.text_area :about, placeholder: 'Tell about yourself' %> 

     <%= f.text_field :likes, placeholder: 'Stuff you like' %> 

     <%= f.text_field :dislikes, placeholder: 'Stuff you don\'t ' %> 

    <%= f.hidden_field :lat %>
    <%= f.hidden_field :lon %>
     <%= f.submit 'Save' %> 

  <% end %>

The lines we’re interested in are:

    <%= f.hidden_field :lat %>
    <%= f.hidden_field :lon %>

And the JavaScript:

  <script>
    navigator.geolocation.getCurrentPosition(function(position) {
      document.querySelector('#person_lat').value = position.coords.latitude;
      document.querySelector('#person_lon').value = position.coords.longitude;
    });
  </script>

When a view loads, JavaScripts asks a user for permission to get his location. If the user agrees, the callback is invoked and the location is saved in the hidden fields so that the form can submit them back to the server.

Now the controller part:

# app/controllers/people_controller.rb
def index
  @new_person = Person.new
end

def create
  @new_person = Person.new(person_params)
  if @new_person.save
    session[:current_user_id] = @new_person.id
    redirect_to people_path
  else
    flash.now.alert = 'Please fill your profile'
    render :index
  end
end

private

def person_params
  params.require(:person).permit(:name, :about, :likes, :dislikes, :lat, :lon)
end

Here we have boilerplate ruby code, we’re using strong parameters to only allow a known set of attributes. We then try to create a user and save the new user ID in the session. This way current_user helper method will retrieve the current user from the database. If the validation fails, we just display the message about it and render the view again.

Let’s add those validations:

# app/models/person.rb
class Person < ActiveRecord::Base
  validates :name, :about, :likes, :dislikes, presence: true
end

Now when we go back to the browser and reload the page we can enter the profile data, allow the browser to get our geolocation and click save.

At this point we introduced ourselves to the system and current_user.id is stored in the encrypted cookie.

Next part is where the fun starts: we need to be able to search for other users nearby. We should allow limiting the search radius, specify the search term and see the results.

We must remove people who have our search term in their dislikes attribute. For example if a person dislikes Chinese couisine, and we’re searching for people who like it,… you get the idea.

Let’s take a little detour and speak about Solr and the gems that enable it in Rails. We’ll be using sunspot – an excellent gem that adds a nice DSL (really, it’s nice) on top of rsolr.

At this point you might be asking: “Wait! What’s RSolr? I’m now totally confused between Solr, RSolr and Sunspot and how they relate to each other”. I totally understand your confusion. Let’s break this mess into pieces:

Solr – a Java server that runs as a separate service and communicates via XML over HTTP API. It is generally considered a robust and full-featured
yet hard to learn full-text search solution. The only way you can communicate directly with Solr from a Rails application is to send rather cryptic XML requests.
Nobody wants to mess with raw XML over HTTP, so here enters RSolr – a wrapper around the Solr HTTP API that allows interacting with Solr from Ruby.
However RSolr is still rather low-level and does not provide any DSL or convenience methods to define which Rails models should be searchable and how the indexes will be updated. The need for a new library was apparent, so the Sunspot was born. A really nice DSL that integrates directly into ActiveRecord models and allows specifying which attributes we need to index, as well as how to transform and query the data.

Now you’re saying: “I still don’t understand, if Solr is a Java service it means I need to install and configure it on my system? That’s a horrible perspective, get me out of this!”. Absolutely not. The Sunspot gem is bundled with a development version of Solr and has a nice set of rake tasks to manage it. You can start, stop, reindex the data, all using rake tasks. There is no need to install Solr manually, all you need is to add two gems:

sunspot_solr and sunspot_rails.

sunspot_solr is the pre-packaged development version of Solr and sunspot_rails is the Sunspot gem itself. So you need to make sure you place the sunspot_solr into :development group in your Gemfile.

Now you can start bundled Solr with rake sunspot:solr:start, stop it with rake sunspot:solr:stop and reindex all data with rake sunspot:reindex.

OK, now that confusion is hopefully out of the way, let’s continue with our person search scenario.

Let us define the searchable attributes on our Person model:

class Person < ActiveRecord::Base
  validates :name, :about, :likes, :dislikes, presence: true

  searchable do
    text :name, boost: 5.0
    text :about, :likes
    latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
  end
end

Let’s break it down piece by piece:

searchable block is a place where you define the full-text indexing behavior. Inside this block you can specify various rules describing which attributes should be indexed, their pre-index transformations, facets, filters and so on.
text :name – A person should be searchable by his name.
boost: 5.0 – boost option tells Solr to prioritize the results found by this particular attribute. If you’re searching for John Doe, all the people with this name will come first, and only after those who dislike John Does.
text :about, :likes – Person should be searchable by these attributes.
latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) } – create a geo-spatial index on person’s location using lat and lon attributes. This will allow searching for people within a certain mile radius.

Great, wasn’t that simple? We’ve defined a set of searchable attributes on a Person model. Now we’re ready to actually search for people.

Let us add a _person partial where search result item will be displayed:

<%# app/views/people/_person.html.erb %>

 <%= person.name %> 
 About:
 <%= person.about %>
 Likes:
 <%= person.likes %>
 Dislikes:
 <%= person.dislikes %>

We also need to add the iteration to the index view:

<%# app/views/people/index.html.erb %>
<% @people.each do |person| %>
  <%= render partial: 'person', locals: {person: person} %>
<% end %>

So the view is ready, let’s modify the controller code:

# app/controllers/people_controller.rb
def index
  if current_user
    if params[:search].present? || params[:radius].present?
      search = Person.search do
        fulltext params[:search]
        if current_user.has_location?
          with(:location).in_radius(current_user.lat, current_user.lon, params[:radius])
        end
      end
      @people = search.results
    else
      @people = []
    end
  else
    @new_person = Person.new
  end
end

On line 3 we check if current user is saved, on line 4 we verify we have something to search by, either a search term or a radius. Then on lines 5 - 10 is where the actual full-text search happens. We use a Model.search method and pass it a block. Inside this block we need to specify the logic of the search. In our case we call the fulltext method and pass it our search term.

Let me be clear, we have two phases: indexing and searching. Indexing is defined inside a model in a searchable block. You use text method to specify which attributes should be full-text searchable.

Search by calling Model.search method and passing it a block too. But this time we call fulltext method to actually do full-text search on indexed attributes.

OK, we now understand how to do full-text search on text attributes, we’re already doing it on name, about and likes attributes. What we also need is a way to restrict the results to a certain radius on a map. This is what lines 7 - 9 are for.

In our application it’s possible for a user to deny geolocation permissions and his profile to be saved without coordinates. So, we need a convenience method to see if the current user has a location:

# app/models/person.rb
def has_location?
  lat && lon
end

This method is useful in Person.search block where we specify the search radius:

# app/controllers/people_controller.rb
if current_user.has_location?
  with(:location).in_radius(current_user.lat, current_user.lon, params[:radius])
end

We’re using the current user’s lat & lon attributes and the radius from params to perform the filtering. You should remember to convert miles to kilometers, because Sunspot operates on kilometers.

OK, first version of the people search is ready to try, let’s run it.

Works fine, but when I search for someone within 10 mile radius, I find myself too. There should be a way to search for other people, excluding myself. Let’s fix it.

Sunspot allows using attributes as filters. For this we should call methods like integer, string, datetime etc. In this case we need to search for all people except those with :id equal to the :id of current user. We also need to filter out the people with dislikes equal to the search term:

# app/models/person.rb
searchable do
  text :name, boost: 5.0
  text :about, :likes
  integer (:id)
  string(:dislikes)
  latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
end

On line 5 we’re creating an indexed filter on :id column, and on the next line a filter on :dislikes column.

Now the filtering itself:

# app/controllers/people_controller.rb
def index
  if current_user
    if params[:search].present? || params[:radius].present?
      search = Person.search do
        without(:id, current_user.id)
        without(:dislikes, params[:search]) if params[:search].present?
        fulltext params[:search]
        if current_user.has_location?
          with(:location).in_radius(current_user.lat, current_user.lon, params[:radius])
        end
      end
      @people = search.results
    else
      @people = []
    end
  else
    @new_person = Person.new
  end
end

On line 6 we’re filtering out people with :id equal to current user’s id. On line 7 we’re filtering out people who dislike stuff I’m searching for.

What does Person.search return? It’s a special Sunspot object that has a results method. So to grab actual active record items, we use @people = search.results code.

Finally we have all pieces of the puzzle. If we run the app now we should be able to save current user’s profile and then go search for other people.

In this article I’ve barely scratched the surface of the Solr & Sunspot capabilities. You should definitely look for more in the documentation if you want to create a full-featured application.

But why should I use fulltext search if I can do everything in SQL?

You’re right, except you can’t. Full text search is a huge topic with a huge set of capabilities. It can do synonym search, wildcard search, stemming and a lot, lot more.

Solr can be as intelligent as to perform word decomposition during a search, operate on word parts and generally behave as a human (almost).

Full-text search is faster too. How much faster? This is a tricky question, because it all depends on the indexed data, but one can safely assume it can be at least several times faster than equivalent SQL searching. For complex searches Solr can be orders of magnitude faster than SQL.

How is new data indexed?

Sunspot handles it for you. It registers a set of hooks that trigger the automatic indexing of updated and new records. If you look into rails log, you’ll see something like:

SOLR Request (455.4ms)  [ path=update parameters={} ]
   (1.1ms)  COMMIT
Redirected to http://localhost:3000/people
  SOLR Request (60.9ms)  [ path=update parameters={} ]

How do I test it?

You should generally avoid touching Solr in unit tests. Either design your tests to avoid talking to Solr in unit tests, or just stub Solr to return pre-canned results.

As for integration tests, indexing data before running them worked best for me. I first prepare some test data, then I reindex it with: rake sunspot:reindex and then run the integration tests.

If you find the topic of testing interesting, drop me a line, I’ll cover it in the next article.

Code

https://github.com/Valve/neibo

Well, I hope the explanation wasn’t too packed, share your ideas in the comments :)

Constant resolution in Ruby

2013-10-26T11:47:00+04:00

Ruby constant resolution has always been somewhat confusing to me. In this article I’m going to demistify it for myself and hopefully help other readers.

What is a constant?

Ruby constant is anything that starts with a capital.

PI = 3.1415

MINUTES_IN_ONE_HOUR = 60

LOOK_MA = "I'm a constant!"

module A
end

class Person
end

module Screen::Widget::Button
end

Yes, regular ALL_CAPITAL are constants, module and class names are constants too.

How Ruby searches constants.

When Ruby tries to resolve a constant, it starts looking in current lexical scope by searching the current module or class. If it can’t find it there, it searches the enclosing scope and so on.

It’s easy to see the lexical scopes search chain with Module::nesting method:

module A
  A_CONSTANT = 'I am defined in module A'
  module B
    module C
      def self.inspect_nesting

        puts Module.nesting.inspect
        puts A_CONSTANT
      end
    end
  end
end

A::B::C.inspect_nesting
# => [A::B::C, A::B, A]
# => I am defined in module A

Module::nesting returns an array of searcheable lexical scopes, starting from current. In above case the search for A_CONSTANT starts from module C, then goes to enclosing scope – module B, and then to module A where it finally finds it.

Nesting modules using alternative syntax

You’ve probably seen the alternative way of defining the enclosing modules:

module Screen
  DEFAULT_RESOLUTION = [1024, 768]
  module Widgets
    module MacOS
    end
  end
end

# Alternative syntax

module Screen::Widgets::MacOS::Button
  def self.inspect_nesting
    puts Module.nesting.inspect
    puts DEFAULT_RESOLUTION
  end
end

Screen::Widgets::MacOS::Button.inspect_nesting
# => [Screen::Widgets::MacOS::Button]
NameError: uninitialized constant Screen::Widgets::MacOS::Button::DEFAULT_RESOLUTION
  from (irb):26:in `inspect_nesting'
  from (irb):29

See the difference? Constant resolution only uses the innermost module for searching, ignoring the enclosing scopes. By defining the modules with this shorter syntax you lose the ability to search for constants in enclosing scopes.

Inheritance

Enclosing scopes is the first place where Ruby searches the constants. Second place is the inheritance hierarchy. Consider this code:

class Person
  DRIVING_LICENSE_AGE = 18
end

class BusDriver < Person
  def can_drive_from
    DRIVING_LICENSE_AGE
  end
end

bus_driver = BusDriver.new
puts bus_driver.can_drive_from

# => 18

Mixins

Ruby can mixin modules into classes as an alternative to inheritance. When a class mixes in a module, this module inserts itself between the class being mixed in and the parent class in the inheritance hierarchy. The simple way to see this is using ancestors method.

module Insurable
  LIFE_INSURANCE_AMOUNT = 150_000
end

class Person
  DRIVING_LICENSE_AGE = 18
end

class BusDriver < Person
  include Insurable
  def can_drive_from
    "Can drive from #{DRIVING_LICENSE_AGE}, with life insurance of $#{LIFE_INSURANCE_AMOUNT}"
  end
end

puts BusDriver.ancestors.inspect
puts BusDriver.new.can_drive_from

# => [BusDriver, Insurable, Person, Object, Kernel, BasicObject]
# => Can drive from 18, with life insurance of $150000

What’s going on here? We’ve defined a base class Person, a child class BusDriver that inherits from Person. We also defined a Insurable module which we mixed into our BusDriver class. When we call the ancestors class method, we see the BusDriver class first, then Insurable module which was wedged between BusDriver and Person. Then goes the Person class, then, obviously, Object. This is all nice and clear.

But why do we see Kernel between Object and BasicObject? This is because Kernel is a module that is mixed into Object thus inserting itself into the inheritance hierarchy. This ancestors array is how the name resolution works throughout the inheritance chain.

Full search path

Now that you’ve seen the inheritance part of the name search, you can see the full picture:

# searching from left to right
full_path = [Module.nesting + Module.ancestors].uniq

const_missing method

When Ruby has finished searching the constants up the nesting and ancestors chain and didn’t find it, it gives the calling code the last chance by calling the const_missing method.

module Person
  def self.const_missing(name)
    puts "Oh me oh my, can't find the constant: #{name}"
  end
end

Person::LOL
# => Oh me oh my, can't find the constant: LOL

NameError

This error is called when Ruby can’t find the constant and there is no const_missing method defined.

Object::BLASTER

# => NameError: uninitialized constant BLASTER
  from (irb):8

Word about autoloading

Let’s say you’d like to be flexible about your constants and load them automatically, following some naming convention? Turns out, there is a way, it’s called autoloading.

If we were to implememt autoloading from scratch, it would be something like this:

def Object.const_missing(name)
  @looked_for ||= {}
  str_name = name.to_s
  raise "Class not found: #{name}" if @looked_for[str_name]
  @looked_for[str_name] = 1
  file = str_name.downcase
  require file
  klass = const_get(name)
  return klass if klass
  raise "Class not found: #{name}"
end

Turns out, we don’t have to, because autoloading is built into Ruby. We have Kernel#autoload, Module#autoload and more sophisticated ActiveSupport::Autoload. I’m not going to cover these topics here but will try to do it in a future post.

Ambiguity

Here comes the tricky part: what if you have multiple constants with the same name? Consider this example:

module Insurable
  LIFE_INSURANCE_AMOUNT = 150_000
end

class Person
  LIFE_INSURANCE_AMOUNT = 50_000
end

class Pilot < Person
  INSURANCE_AMOUNT = 300_000
  include Insurable
end

puts Pilot::INSURANCE_AMOUNT
puts Pilot::LIFE_INSURANCE_AMOUNT

# => 300_000
# => 150_000

Results might seem strange at first, but please remember the full search path:

[Module.nesting + Module.ancestors].uniq

First comes the lexical scope searching and only after the inheritance chain, where mixins are inserted between child and parent classes. Also, when Ruby finds a constant with a given name, it stops looking further.

jQuery inputmask plugin + AngularJS

2013-08-01T13:09:00+04:00

The plugin I was going to use was jquery inputmask by Robin Herbots.

I was working on a small angularjs application and needed a text input masking to force users to enter their telephone numbers in a certain format. “So, what’s the problem?” – I thought.

I added the neccessary code:

 ng-submit="saveApplication()">
   type="text" ng-model="application.phone" id="phone"/>
  ...

// ....
// phone mask for Russia 
$('#phone').inputmask('+7(999)999-99-99');

// My controller
...
$scope.saveApplication = function(){
  phone = application.phone;
  ... saving phone here
}

… only to find out that $scope.application was undefined and no phone value never made it to controller.

I decided to use any angular-specific masking library and found one, but it was still buggy and couldn’t handle my simple +7(999)999-99-99 mask.

Exasperated, I tried several input masking plugins for jQuery, but none of them worked with angular.

Then I stumbled upon this and this and understood:

One must wrap a jQuery plugin into a directive in order for the plugin to work.

Easy-peasy, let’s do it.

Having prepared myself with the documentation on the subject, I wrote version #1:

... assuming I have a module ngApp

ngApp.directive('inputMask', function(){
  return {
    restrict: 'A',
    link: function(scope, el, attrs){
      $(el).inputmask(scope.$eval(attrs.inputMask));
    }
  };
});

And the html:

 input-mask="{mask: '+7(999)999-99-99'}" ng-model="application.phone"/>

Refreshing the browser, I saw the working mask, but the $scope.application was still undefined after filling in the phone, i.e mask was correctly initialized but the value never propagated to the controller. So I added some jQuery into the directive:

... assuming I have a module ngApp

ngApp.directive('inputMask', function(){
  return {
    restrict: 'A',
    link: function(scope, el, attrs){
      $(el).inputmask(scope.$eval(attrs.inputMask));
      $(el).on('change', function(e){
        scope.application == scope.application || {}
        scope.application.phone = $(e.target).val();
      });
    }
  };
});

How do I get rid of hardcoded bindings? We can access the value of ng-model attribute and set the appropriate scope value:

... assuming I have a module ngApp

ngApp.directive('inputMask', function(){
  return {
    restrict: 'A',
    link: function(scope, el, attrs){
      $(el).inputmask(scope.$eval(attrs.inputMask));
      $(el).on('change', function(){
        scope.$eval(attrs.ngModel + "='" + el.val() + "'");
        // or scope[attrs.ngModel] = el.val() if your expression doesn't contain dot.
      });
    }
  };
});

I achieved what I wanted, but I’m still not quite satisfied because of the nudging feeling that this code could still be made more idiomatic and robust.

anonymous browser fingerprinting

2013-07-14T12:29:00+04:00

What is fingerprinting?

Fingerprinting is a technique, outlined in the research by Electronic Frontier Foundation, of anonymously identifying a web browser with accuracy of up to 94%.

Browser is queried its agent string, screen color depth, language, installed plugins with supported mime types, timezone offset and other capabilities, such as local storage and session storage. Then these values are passed through a hashing function to produce a fingerprint that gives weak guarantees of uniqueness.

No cookies are stored to identify a browser.

It’s worth noting that a mobile share of browsers is much more uniform, so fingerprinting should be used only as a supplementary identifying mechanism there.

In this post I’m going to explain how it works in detail and give you real-life statistics accumulated over the period of 4 months of production usage.

Why

I was given an experimental task to implement the fingerprinting for both anonymous and logged-in users of one of our web sites. We wanted to see if it was possible at all to rely on identifying someone this way and not leave cookies. The idea was to accumulate the fingerprints and associated preferences and then pre-filter the information on front page based on what’s known about a user.

Implementation

So I got to work and started making a basic outline in my head. What is that identifies a browser? I gathered it would be: browser agent, browser language, screen color depth, installed plugins and their mime types, timezone offset, local storage, and session storage.

Initially I added the screen resolution as well, but a colleague adviced that one can use multiple monitors with a single laptop, for example connect an external monitor when working in office, so I removed it.

On my laptop browser the values are:

// Assuming jQuery in scope

navigator.userAgent
// "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.110 Safari/537.36"

navigator.language
// "en-US"

var plugins = $.map(navigator.plugins, function(p){
   var mimeTypes = $.map(p, function(mimeType){
    return [mimeType.type, mimeType.suffixes].join('~');
   }).join(',');
  return [p.name, p.description, mimeTypes].join('::');
});


$.each(plugins, function(i, p){
  // truncate only for blog example
  if(p.length > 80){
    console.log(p.substring(0, 77) + '...');
  } else{
    console.log(p);
  }
});

/*
Shockwave Flash:Shockwave Flash 11.7 r700:application/x-shockwave-flash~swf,a... 
Chrome Remote Desktop Viewer:This plugin allows you to securely access other ... 
Widevine Content Decryption Module:Enables Widevine licenses for playback of ... 
Native Client::application/x-nacl~nexe 
Chrome PDF Viewer::application/pdf~pdf,application/x-google-chrome-print-prev... 
Google Talk Plugin Video Accelerator:Google Talk Plugin Video Accelerator ver... 
Google Talk Plugin:Version: 4.0.1.0:application/googletalk~googletalk 
Google Talk Plugin Video Renderer:Version: 4.0.1.0:application/o1d~o1d 
Shockwave Flash:Shockwave Flash 11.2 r202:application/x-shockwave-flash~swf,a...
*/

screen.colorDepth
// 24

new Date().getTimezoneOffset();
// -240

!!window.localStorage
// true

!!window.sessionStorage
// true

So I now knew all my browser had, and I needed to produce the fingerprint itself. For that I wanted to use a fast, non-cryptographic hashing function, such as murmur hashing.

Murmur hashing produces 32-bit integer as a result and works really well. When compared to other popular hash functions, MurmurHash performed well in a random distribution of regular keys.

I picked this implementation and added it to the code.

The last step was to combine all browser’s capabilities into a long string and pass it through hashing.

The end result on my laptop was: 3723825959

As a finishing touch, I wanted to get rid of jQuery, so I implemented the each and map methods and got a no-dependencies script.

How to improve accuracy?

The above research states that the identification accuracy is surprisingly high. But to improve it even further, Flash or Java integration is required to get a list of installed fonts, thus making each browser even more unique.

What about hash collisions?

My tests show that for random strings Murmurh hashing indeed produces collisions, but their number is negligible for my purposes: 5-7 collisions per ~200K of capabilities strings.

What about mobile browsers?

It’s simple: browser fingerprinting is not good with mobile browsers, unless you want to distinguish Android users from iPhone ones.

Results

After having had the fingerprinting on production for 4 months, I have some data to analyze. First of all, let me say that I’m not at liberty to tell the exact number of visitors to the web site, but I can say it is several millions a month, so we have some data to play with. All numbers below represent our usage and do not represent what you might have.

89% of fingerprints are unique

20% of our users have more than one fingerprint, i.e. several browsers or devices.

Very few users have a staggering amount of fingerprints, for example 20-25. I don’t know if they have a lot of devices, use different browsers or something else.

After viewing the results we removed the fingerprinting because of poor identification, especially with mobile devices. If your traffic mostly comes from desktops and you’re OK with 10-12% of false identifications you might want to try it.

Show me the code

code on github – the version I had in production

test your browser

existential operator in CoffeeScript

2013-07-13T12:39:00+04:00

Existential operator ? can be used in three useful ways in CoffeeScript.

Checking the existence of a variable

In JavaScript there is no built-in way of checking the existence of a variable. You can try testing the existence with if(variable){...} but it won’t work in these cases:

if(0){
  console.log('this will not print');
}

if(""){
  console.log('this will not print');
}

if(false){
  console.log('this will not print');
}

// abc was never declared
if(abc){
  //ReferenceError: abc is not defined
}

The correct way to test if variable was both declared and initialized:

if(typeof variable !== 'undefined' && variable !== null){
  console.log('variable was declared and initialized with a value');
}

You may be tempted to use direct comparison with undefined:

if(variable !== undefined ...

But this is asking for trouble because in EcmaScript 3 (all older browsers, such as IE 6-8) undefined can be overwritten:

window.undefined = 'pancakes';

if('pancakes' === undefined){
  console.log('This will print, if you are unfortunate enough to use IE 6-8');
}

All ES5 compatible browsers have undefined immutable, i.e. it can’t be changed, but it’s better to play it safe.

CoffeeScript has a syntactic shortcut for testing existence:

CoffeeScript

if variable?
  console.log('variable is both declared and initialized with a non-null value');

This code will be transpiled to:

if (typeof variable !== "undefined" && variable !== null) {
  console.log('If variable was both declared and initialized with a non-null value');
}

Conditional assignment

What if you want to initialize a variable only if it has not been already initialized? In JavaScript you’d usually do something like:

function getUserLocale(){
  if(this.locale == null){
    this.locale = DB.getLocaleByUser(User.current);
  }
  return this.locale;
}

This technique caches the result of an expensive computation or database query in a variable. In above example, all subsequent getUserLocale function calls will not query the database. The important part is comparing with null using ==, rather than ===, because == will evaluate to true if variable is either undefined or null.

Such cache on first call pattern is widely used in many programming languages, but CoffeeScript has a special syntax for it:

CoffeeScript

getUserLocale = ->
  @locale ?= DB.getLocaleByUser(User.current)

?= is the operator that performs conditional assignment.

This will be transpiled to roughly the same JS as above, using ternary operator:

getUserLocale = function() {
  return this.locale != null ? this.locale : this.locale = DB.getLocaleByUser(User.current);
};

What if you try to conditionally assign an undeclared variable? If you try this:

CoffeeScript

# abc is not declared 
abc ?= 99

This will result in a compile-time error: the variable "abc" can't be assigned with ?= because it has not been declared before. This compile-time checking is very helpful, because it prevents a ReferenceError at run time.

If you know Ruby, ||= is the same thing there.

Safe property / function chaining

Chaining function calls is a great way to write terse yet fluent code. A good example is working with jQuery:

$('#header').css('color', '#fadfad').show('slow').off();

This is made possible because these jQuery functions return a reference to this. But what if one of the function returns null or undefined?

var zip = User.current.address.zip

If current user’s address is null or undefined, the .zip property call will result in TypeError.

A simple but ugly solution would be to use a lot of if checks:

var zip = null;
if(User.current != null && User.current.address != null){
  // only now this is safe
  zip = User.current.address.zip;
}

But this can quickly get out of hand with deep nesting.

CoffeeScript has a safe way of accessing long property chains using ?. variant of existential operator:

CoffeeScript

zip = User.current?.address?.zip

This will either soak up the null or undefined references and safely return undefined or return the final property value. In our case the generated code looks like this:

var zip, _ref;

zip = (_ref = User.current) != null ? _ref.address.zip : void 0;

What is this weird void 0 thing? This is to fight the pre AS5 undefined mutability I referred to earlier. JavaScript defines void as a unary operator that returns undefined for any argument. In other words, CoffeeScript compiler uses a set of nested ternary operators to safely return either last property value or undefined with void 0.

Calling a function safely works similarly:

CoffeeScript

noSuchFunction?()

This transpiles to:

if (typeof noSuchFunction === 'function') {
  noSuchFunction();
}

Key thing to take away here is that CoffeeScript first tests that callable function is defined and is a function. It does this using typeof bla === 'function'. The function is called only if it is defined.

Safe function invocation can be chained as well with other function or property calls:

example from coffeescript.org

lottery.drawWinner?().address?.zip

This transpiles to:

var zip, _ref;
zip = typeof lottery.drawWinner === "function"
  ?
    (_ref = lottery.drawWinner().address) != null ? _ref.zipcode : void 0
  :
    void 0;

Drawing analogy with Ruby-on-Rails ActiveSupport, the safe chaining can be seen as try method.

Conclusion:

CoffeeScript existential operator is a useful tool to cut down the verbosity of JavaScript when dealing with existence and null checks. It also can be used to shield inexperienced JavaScript developers from JavaScript bad parts.

different SMTP settings for ActionMailer action

2013-07-03T16:48:00+04:00

Sometimes you may want to set per-action SMTP settings that are different from site-wide settings. You might try to set them in mailer:

class UserMailer < ActionMailer::Base
  ActionMailer::Base.smtp_settings = {address: 'some.domain'}

  def user_registered(user)
    mail(to: user.email, subject: 'You registered here!')
  end
end

But this will override the smtp settings for the rest of mailers.

Instead, set them for the email instance:

# assuming in controller action
mail = UserMailer.user_registered(current_user)
custom_smtp_settings = {address: 'some.domain', port: 25}
mail.delivery_method.settings.merge! custom_smtp_settings
# now this will send using custom SMTP settings
mail.deliver

rails-3.1

reading command line arguments in rust

2013-07-03T13:31:00+04:00

Since all rust hello world tutorials usually start like this:

fn main(){
  println("Hello, world");
}

It’s unclear where to take command line arguments.

The solution:

use std::os;

fn main(){
  let args: ~[~str] = os::args();
  println(args.to_str());
}

This will give you a vector with strings where first string will be the app being run and the rest is the arguments provided.

rust-0.7

valve's

JSON serialization in Rust, part 2

Overview

Deserialization

Option

Vector

Tuple

HashMap

Array

Structs

Custom deserialization

Deserializing fixed length arrays

deriving(Decodable)

Further reading

JSON serialization in Rust, part 1

Overview

Serialization

Option

Vector

HashMap

Array

UPDATE: Thanks to Reddit user ASeriesOfTubes

Structs

Custom serialization

Serializing fixed length arrays

deriving(Encodable)

immutability in Ruby

Part 1: Immutability

Immutability and functional programming

Immutability and purity

Immutability and primitives

Strings

Concurrency.

Hash table keys

Security

String immutability and Ruby

Strings as hash keys

Immutable data structures

Immutable data structures and multithreading

Conclusion

Introduction to full text search for Rails developers

Hello, <%= current_user.name %>

Search people near you:

Hello, guest

Fill your profile so that people could find you. Allow browser to access your location if you want to be found by people near you.

Hello, guest

Fill your profile so that people could find you. Allow browser to access your location if you want to be found by people near you.

<%= person.name %>

About:

Likes:

Dislikes:

But why should I use fulltext search if I can do everything in SQL?

How is new data indexed?

How do I test it?

Code

Constant resolution in Ruby

What is a constant?

How Ruby searches constants.

Nesting modules using alternative syntax

Inheritance

Mixins

Full search path

const_missing method

NameError

Word about autoloading

Ambiguity

jQuery inputmask plugin + AngularJS

One must wrap a jQuery plugin into a directive in order for the plugin to work.

anonymous browser fingerprinting

What is fingerprinting?

Why

Implementation

How to improve accuracy?

What about hash collisions?

What about mobile browsers?

Results

Show me the code

code on github – the version I had in production

test your browser

existential operator in CoffeeScript