This is the second part of a two-part post that deals with decoding JSON strings into Rust values. The first part is available here.
When working with JSON deserialization, we're interested in the Decodable and Decoder traits.
As with serialization, the hex and base64 modules are not relevant to JSON deserialization, so we can ignore them.
In order for a type to be decodable from JSON, it must implement the Decodable trait. Almost all built-in types already implement it, so you can deserialize them with:
Deserializing to Option is somewhat redundant, because json::decode returns a DecodeResult, which is a type alias for a regular Result. That means you can pattern match on the DecodeResult and handle potential failure.
Decoding JSON to a tuple is identical to decoding a vector; just specify the correct type:
As with serializing, Rust cannot automatically deserialize a JSON string into a fixed-length array. The reason is the same: an array's type signature contains its length as part of the type, and Rust currently can't be generic over an array's length (and most likely won't be able to until after v1.0).
To remedy this, we will use custom decoding, as we did with custom array encoding in part 1. I’ll show an example of this below.
As with serialization, it’s possible to have Rust deserialize structs
automatically for you. You will need to add the
#[deriving(Decodable)]
attribute to your struct:
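The original listing did not survive in this copy. It likely resembled the following sketch, written in the pre-1.0 Rust of the time (the serialize crate and #[deriving] have long since been removed, so this is historical syntax that will not compile on modern Rust):

```rust
extern crate serialize;

use serialize::{json, Encodable, Decodable};

#[deriving(Encodable, Decodable, Show)]
struct Person {
    name: String,
    age: uint,
}

fn main() {
    let incoming = r#"{"name": "John", "age": 30}"#;
    let person: Person = json::decode(incoming).unwrap();
    println!("{}", person);                // printable thanks to Show
    println!("{}", json::encode(&person)); // round-trip back to JSON
}
```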
Note that I'm deriving three trait implementations for the struct: Encodable, Decodable and Show. This makes the struct fully JSON-(de)serializable and printable automatically.
This is probably the cornerstone of the JSON infrastructure. In real life you often cannot control the shape of the JSON that comes to you, so you must be able to convert arbitrary JSON strings into your objects. Luckily, Rust's decoding capabilities will help us here.
Let us continue with our Person
struct example and deserialize the
object from a complex JSON where our object is in the data
key. The example might be contrived, but it serves the demo purpose.
To make a type JSON deserializable we need to implement the Decodable
trait.
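The 28-line listing is missing from this copy. Here is a reconstruction in the era's pre-1.0 Rust (historical syntax; it will not compile with a modern toolchain, and the exact wiring is my best guess). The line numbers cited in the breakdown are marked in comments:

```rust
extern crate serialize;                                         // line 1
use serialize::{json, Decodable, Decoder};                      // line 2

fn main() {
    let json_str = r#"{"data": {"name": "John", "age": 30}}"#;  // line 5
    let person: Person = json::decode(json_str).unwrap();       // line 6
    println!("{}", person);
}

#[deriving(Show)]                                               // line 10
struct Person {
    name: String,
    age: uint,
}

impl<S: Decoder<E>, E> Decodable<S, E> for Person {             // line 16
    fn decode(decoder: &mut S) -> Result<Person, E> {           // line 17
        decoder.read_struct("root", 0, |decoder| {              // line 18
            decoder.read_struct_field("data", 0, |decoder| {    // line 19
                // lines 20-21: read the actual fields
                let name = try!(decoder.read_struct_field("name", 0, |d| Decodable::decode(d)));
                let age = try!(decoder.read_struct_field("age", 1, |d| Decodable::decode(d)));
                Ok(Person { name: name, age: age })
            })
        })
    }
}
```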
Let’s break down the code line by line to see what’s going on here.
Line 2:
We need to bring both the Decodable and Decoder traits into scope. The Decodable trait is for the struct to implement, to conform to the JSON deserialization interface, while the Decoder is the low-level workhorse of deserialization, which tokenizes and parses the JSON string so it can later be converted to Rust values.
Line 5:
I’m using a raw string literal to avoid escaping double quotes.
Line 6:
The line where I'm decoding the JSON string into an instance of the Person struct. Note that I need to type-annotate the variable when decoding it.
Line 10:
We no longer need to use the #[deriving(Decodable)]
attribute, because we implement the Decodable
trait ourselves.
Line 16:
This is the Decodable implementation. It is very similar to the Encodable implementation from part 1, except that S is now bounded by the Decoder trait.
Line 17:
Two differences from the encode method counterpart: first, we're no longer accepting &self as the first argument, because decode is an associated function rather than a method (the analogy is class methods in Ruby or static methods in Java). Second, the return type is now Result<Person, E>.
Line 18:
This is where the parsing starts. We do the actual parsing with the read_* family of methods on the Decoder instance. Here we're reading the top-level struct with the read_struct method. The first argument is the name of the struct (not used), the second is the length (not applicable), and the third is an anonymous function (lambda). Why are the first and second arguments not used?
I think this is because the entire family of read_* methods on Decoder strives to be uniform, and thus a unified set of arguments is used even when the decoder does not need them.
You can think of the read_struct
call as “opening” the top-level JSON
object to be able to move inside to read actual values. The lambda is
where we descend and continue with reading.
Line 19:
The object we're trying to read is in the data field, so we're reading it on this line with the read_struct_field method. This time the first argument is necessary, because it tells the decoder the actual name of the field. The third argument is the lambda again, to descend further into the object in the data field.
Lines 20-21:
Field reading happens here. By this time the parser has reached the contents of the data object, so we can now read the fields we're interested in one by one. We're using read_struct_field again, passing it the name of the field and the index (not used). The third argument is the lambda that produces the value of the field, correctly decoded from its JSON representation. Since all primitive values in Rust already implement the Decodable trait, we can safely call Decodable::decode on them to deserialize them as the Person struct fields.
As in part 1, let’s use this knowledge to deserialize a fixed-length array from JSON.
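The 35-line listing is gone from this copy. A condensed reconstruction in the period's pre-1.0 Rust (historical syntax; the array length 4, the Buffer name and the decoder.error call are my assumptions based on the surrounding prose) might be:

```rust
extern crate serialize;

use std::default::Default;
use serialize::{Decodable, Decoder};

// Newtype wrapper: we can't implement the external Decodable trait
// for the external fixed-length array type directly.
struct Buffer<T>([T, ..4]);

impl<S: Decoder<E>, E, T: Default + Copy + Decodable<S, E>> Decodable<S, E> for Buffer<T> {
    fn decode(decoder: &mut S) -> Result<Buffer<T>, E> {
        decoder.read_seq(|decoder, len| {
            // exit early if the JSON array has the wrong length
            if len != 4 {
                return Err(decoder.error("expected an array of length 4"));
            }
            // fill the array with default values (this is why T: Default + Copy)
            let mut arr: [T, ..4] = [Default::default(), ..4];
            // read each element in place with read_seq_elt
            for (i, slot) in arr.mut_iter().enumerate() {
                *slot = try!(decoder.read_seq_elt(i, |d| Decodable::decode(d)));
            }
            // return the result wrapped in the Buffer newtype
            Ok(Buffer(arr))
        })
    }
}
```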
Since Rust will not allow you to implement a trait for a type when both the trait and the type are defined in an external crate, we need to create a tuple struct (a newtype) for the array.
Overall, this implementation looks similar to the previous, but there are nuances I’d like to point out.
Line 18:
Note that the implementation signature adds a new type parameter T, which is the element type of the array. It can be anything that implements the Default + Copy + Decodable<S, E> traits. Default is needed to fill the array with default values (line 24), Copy to copy those defaults into the new array, and Decodable<S, E> to decode the array elements from JSON.
Lines 22-24:
Here I’m checking if the array we’re about to decode contains
exactly the number of elements we expect. If not, I exit early with an
error.
Lines 26-28:
Here I'm iterating over the array, obtaining mutable references to its elements and filling them from JSON using the decoder's read_seq_elt method.
Line 29:
Here I'm returning the result wrapped in the Buffer newtype.
As in part 1, let’s look at the expanded implementation of the
#[deriving(Decodable)]
attribute.
Let’s use the Person
struct example again
and compile it with --pretty expanded
flag:
rustc app.rs --pretty expanded
The output:
The output is very similar to the manual deserialization code we saw earlier, except that the compiler has further expanded the try! macros into Err and Ok branches.
There is a convenience ToJson trait that allows implementors to quickly convert themselves into JSON using the intermediate Json enum representation, but I recommend using it only for small and relatively simple data structures.
This is the first part of a two-part post that deals with encoding Rust values into JSON. The second part will deal with converting JSON strings back into Rust values.
JSON serialization lives in the serialize crate. It contains the json module, where low-level implementation details live, and the two traits we are interested in: Encodable and Encoder.
Please note that the hex and base64 modules are not relevant to JSON serialization, so we can ignore them.
In order for a type to be JSON serializable, it must implement the Encodable trait. Almost all built-in types already implement it, so you can serialize them as easily as:
Currently Rust cannot automatically serialize fixed-size arrays to JSON.
An array's type signature includes its length, but Rust can't be generic over an array's length. So in order to serialize an array into JSON, we'll need to use custom serialization, which I'll explain further on.
It is, however, possible to serialize an array to JSON automatically: you just need to convert it to a slice.
It's possible to have Rust automatically implement JSON serialization for your structs. You'll need to adorn the struct with the #[deriving(Encodable)] attribute.
You will inevitably come to the point when Rust’s supplied serialization
will not work for you. Luckily we have full control over the serialization
process. In order to serialize your type the way you want it, you will
need to implement the Encodable
trait. Let’s continue with our Person
struct example and change it to include a summary field.
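The 29-line listing is missing from this copy. Here is a reconstruction in the era's pre-1.0 Rust (historical syntax that will not compile today; the summary wording is my guess). The line numbers cited in the breakdown are marked in comments:

```rust
extern crate serialize;                                       // line 1
use serialize::{json, Encodable, Encoder};                    // line 2

fn main() {
    let person = Person { name: "John".to_string(), age: 30 };
    println!("{}", json::encode(&person));
}

struct Person {                                               // line 8: no #[deriving] needed
    name: String,
    age: uint,
}

impl<S: Encoder<E>, E> Encodable<S, E> for Person {           // line 13
    fn encode(&self, encoder: &mut S) -> Result<(), E> {      // line 14
        match *self {                                         // lines 15-16: destructure
            Person { name: ref p_name, age: p_age } => {
                encoder.emit_struct("Person", 0, |encoder| {  // line 17
                    // lines 18-22: one emit_struct_field per JSON field
                    try!(encoder.emit_struct_field("name", 0, |e| p_name.encode(e)));
                    try!(encoder.emit_struct_field("age", 1, |e| p_age.encode(e)));
                    try!(encoder.emit_struct_field("summary", 2, |e| {
                        format!("{} is {} years old", p_name, p_age).encode(e)
                    }));
                    Ok(())                                    // line 23
                })                                            // line 24: closing } is emitted
            }
        }
    }
}
```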
Let us break down this code line by line to understand what’s going on.
Line 2:
We need to bring both the Encodable and Encoder traits into scope. The Encodable trait is for the struct to implement, to conform to the JSON serialization interface, while the Encoder is the low-level workhorse of serialization, which transforms primitive values into JSON bits and combines them all together.
Line 8:
We no longer need to use the #[deriving(Encodable)]
attribute, because we’re implementing the Encodable
trait ourselves.
Line 13:
We implement the Encodable trait for the Person struct. Encodable's full type signature is Encodable<S, E>, where S should be an instance of Encoder<E> and E is the type parameter of the Result<T, E> which our implementation returns.
Line 14:
In order to implement the Encodable trait, we need to write the encode method, which accepts a single argument of type S. Remember that S is an instance of Encoder, which is a low-level JSON emitter.
Lines 15-16:
We’re destructuring (decomposing) the struct to access
its fields. To do that we use pattern matching and assign the person
fields to p_name
and p_age
variables.
Line 17:
This is where JSON writing begins. We call emit_struct on our encoder and pass it three arguments: the name of the struct, the current index, and an anonymous function (aka lambda). The name of the struct is not used; the current index is not used either. What matters is the anonymous function that we're passing as the third argument.
The emit_struct method simply writes {, calls the lambda, and then writes the closing }. Why are the first and second arguments not used? I think they are there to conform to the uniform style of the encoder's emit_* methods, but they don't make any sense when writing a JSON object.
Lines 18-22:
This is where the body of the JSON object is written. Each field is written with the emit_struct_field method, which accepts the same three arguments: a name, an index and a lambda. The name is how you want your object field to be named, the index is used to correctly insert a comma after each field, and the lambda's job is to emit a correctly escaped JSON representation of the struct field's value. Since the built-in types already implement the Encodable trait, we can safely call encode on integers and strings to encode their values into JSON.
Line 23:
To indicate successful JSON encoding, we return unit wrapped in the Ok variant of the Result.
Line 24:
This is where the closing } of the object is written, because the lambda finishes here.
Now armed with the knowledge to write our own implementation of
Encodable
, we can convert an array into JSON.
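The 28-line listing is gone from this copy. A condensed reconstruction in the period's pre-1.0 Rust (historical syntax; the array length 4 and the Buffer name are my assumptions) might be:

```rust
extern crate serialize;

use serialize::{Encodable, Encoder};

// Newtype over the fixed-length array, since we can't implement the
// external Encodable trait for the external array type directly.
struct Buffer<T>([T, ..4]);

impl<S: Encoder<E>, E, T: Encodable<S, E>> Encodable<S, E> for Buffer<T> {
    fn encode(&self, encoder: &mut S) -> Result<(), E> {
        let Buffer(ref arr) = *self;
        // emit_seq writes the [ and ], emit_seq_elt writes each element;
        // the running index lets the encoder place commas correctly
        encoder.emit_seq(arr.len(), |encoder| {
            for (i, elem) in arr.iter().enumerate() {
                try!(encoder.emit_seq_elt(i, |e| elem.encode(e)));
            }
            Ok(())
        })
    }
}
```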
Since Rust will not allow you to implement a trait for a type when both the trait and the type are defined in an external crate, we need to create a tuple struct (a newtype) for the array.
Overall, this implementation looks similar to the previous, except we’re using
the combination of emit_seq
+ emit_seq_elt
to emit [
+ elements +
]
. We also keep a counter variable to correctly handle the comma.
Note that the implementation signature adds a new type parameter T, which is the element type of the array. It can be anything that implements the Encodable<S, E> trait.
Now you’re ready to understand what happens when you use
#[deriving(Encodable)]
. Let’s use the Person
struct example again
and compile it with --pretty expanded
flag:
rustc app.rs --pretty expanded
We see the output:
The implementation provided by the Rust compiler is almost identical to
ours, except that it further expanded the try!
macros into Err
and
Ok
branches.
The second part of this article will explain the reverse process: how to decode Rust objects from a JSON string.
All these years functional programming seemed like a holy grail, but like a true holy grail, I was afraid it was meant to stay undiscovered.
All these years I paid my bills writing Ruby-on-Rails and JavaScript code and never made the functional leap. I never became a full-time Haskell or Scala developer and probably will never become one.
But you know what? It’s possible to be slightly more functional with normal languages we’re using every day. This article will try to demonstrate several concrete examples where functional programming is useful or elegant. I will show you the old way of doing things in Ruby and the new, more functional way of doing similar things in Ruby again.
Let me start by saying that this article assumes you’re interested in functional programming. It also assumes that you’ve probably seen other examples of functional code before.
I’m going to split this article into several parts and each part will elaborate upon a specific example.
What is immutability? When people speak about immutability they usually mean immutable objects. Quoting from wikipedia:
an immutable object is an object whose state cannot be modified after it is created. This is in contrast to a mutable object, which can be modified after it is created.
A very simple concept with far reaching consequences.
First let’s define what ‘whose state cannot be modified’ really means. At first you may think that such an object is useless. How can we possibly use an object if we cannot change it? Usually an immutable object creates a copy of itself with desired modifications. The original object remains unchanged. You will see the examples of it further in the article.
Now, another foundational question: why does functional programming favor immutable values and data structures over mutable ones? Is real functional programming possible with mutable values? You probably know that functional programming is more than ‘programming with functions’. It also requires the functions to be pure. I’m not a mathematician and my explanation of pure functions may not be scientifically correct, but you can think of them simply as functions that: always accept an argument, always return a result and the computing of the result depends solely on the input. In other words, a pure function cannot depend on some other data, existing elsewhere, called state, to influence how the result is computed. The only thing that dictates how the result is computed is the function’s argument. Pure functions cannot change the external state either. This is called creating side effects.
Sometimes programmers call the external state “the world” and refer to pure functions as functions that cannot depend on “the world”, read its state, or change it while doing their job.
Why worry at all about the purity of functions? Composability. When your functions are pure, you can compose large programs from small functions. Knowing that a function is pure provides guarantees that it will not change the external state.
Is it possible to write a real program using only pure functions? How can you talk to the database, write to files, charge credit cards and do everything else real programs do? Functional applications are usually built using a pure core (where the bulk of the logic lives) and a thin, impure shell (that provides access to the pure core from the outside world). This way you have a large part of the code that is easy to reason about, easy to test and easy to understand.
Example of a pure function:
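The original snippet was lost from this copy; a minimal sketch of such a pure function (the name add is my own) could be:

```ruby
# A pure function: the result depends only on the arguments,
# and no external state is read or modified.
def add(a, b)
  a + b
end

add(2, 3)  # => 5, every time, regardless of what "the world" looks like
```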
You can see that this function computes the result only using its arguments.
Example of an impure function:
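The original snippet is missing; a sketch in its spirit (the name add_v2 and the log file name are my assumptions) could be:

```ruby
# An impure variant: besides computing the sum, it writes
# to the file system, creating a side effect.
def add_v2(a, b)
  sum = a + b
  File.write("add.log", "#{sum}\n")  # touches "the world"
  sum
end
```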
This function writes to the file system in addition to computing the result. In other words, this function changes “the world” by creating side effects.
Using v2 of this function hurts composability; you limit yourself in the ways you can use it in other parts of your program.
Now let's look at why function purity demands immutability, with a concrete example. We all know that strings in Ruby are mutable. You can mutate a string with:
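The original listing is not preserved; a sketch of in-place string mutation looks like this:

```ruby
s = String.new("hello")
id = s.object_id

s << " world"  # append in place
s.upcase!      # uppercase in place

s                  # => "HELLO WORLD"
s.object_id == id  # => true: still the very same object, mutated
```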
This code fragment modifies the string in-place, mutating it. Now let’s use the string as a function argument:
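The original method was lost; a sketch of a method that mutates its argument (the name exclaim! is hypothetical):

```ruby
# Mutates the caller's string in place and returns it: a hidden side effect.
def exclaim!(str)
  str << "!"
end
```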
This method mutates the argument and returns it. On the surface, this looks OK, but we have just inadvertently created a side effect. Any external code that relies on this string may break.
Let’s create an example of this:
You can see now that in order to keep a function pure we should never mutate its arguments, but create new objects and return them instead. Here is the same function, this time implemented as pure:
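A sketch of the pure reimplementation (the name exclaim is mine):

```ruby
# Builds and returns a new string; the argument is left untouched.
def exclaim(str)
  str + "!"
end
```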
Just a minor modification gives us many benefits: we're no longer modifying “the world”; we only return a new string with the required modifications.
How can we guarantee that functions never mutate their arguments? By making the arguments immutable, of course!
The key thing to take away here is that by making each object immutable, we can guarantee that functions do not create side effects and remain pure.
Hopefully, by now I have convinced you that immutable objects are useful. Now you probably understand that by limiting the “reach” of the function to only the local function’s scope you automatically decrease the number of potential bugs and unpleasant surprises. However, you may still be unsure about the performance of immutable objects, and think that it is wasteful to create a copy of an object each time it needs to be modified. The following part of the article will hopefully make everything clear.
Let's define what primitives are. For our purposes, we can refer to primitives as data types that serve as the basic building blocks of the language. Usually primitives are directly supported by the language. Ints, floats, characters and booleans are primitives and are usually treated in a special way by languages.
You don’t need to do something like:
You can use primitives directly:
Why does a language usually divide objects into, well, objects and primitives? The reason is performance. Primitives are closer to computer hardware and creating an object for every number is slow.
However, Ruby does not have true primitives, because in Ruby everything is an object. You can call methods and properties on numbers and extend them with user-defined methods. I will still call them primitives, because that's what they are on a conceptual level.
On one hand, primitives behave like immutable objects in Ruby:
This snippet demonstrates that you cannot modify a number. In real life this doesn't make sense either: if you have the number 4, it's the number 4, eternal and beautiful. If you add 1 to it, you get a completely different number, 5; the old 4 stays the same.
On the other hand, you can define your own methods and properties:
Integers and floats are frozen by default, while booleans are not.
So while some primitives are not frozen, Ruby does not provide mutation methods for them and they usually can be treated as immutable objects. You should remember that this is easily overridden (as is everything in Ruby) and can cause potential problems.
Before diving into the specifics of Ruby strings, let’s talk about string mutability in general. In most languages strings are immutable: string concatenation or upcasing produces a new string rather than modifying it in-place.
Why do language designers usually make their string implementations immutable? To answer that we need to remember that strings are one of the most used data structures in any programming language.
Let’s consider the cases when string immutability is useful.
This is a complex topic and I will talk about it later in the article. What you should know at this point is that when a data structure is immutable, it can be freely shared across threads without any locking or synchronization. Immutable data structures don't need synchronization at all when used in multithreaded environments.
Modern programming languages are designed from the ground up to be concurrent (go, rust), and having a single string instance to be shared across multiple threads helps to save a lot of memory and avoid the necessity of defensive copying when passing immutable strings around.
More often than other data types, strings are used as keys in hash tables. This usage demands that strings return the same hash code after the key and value have been added to the hash table. With mutable strings, a hash table would need to copy the string in order to guarantee that the hash code stays the same. With immutable strings this is not needed.
As I’ve mentioned, strings are used very frequently in any program. This entails a special treatment in terms of security. Strings are used when comparing user-names and passwords, storing credit card numbers and much more. Immutable strings guarantee that a malicious party is unable to tamper with the string after creation.
However, there is a performance downside of immutable strings. Mutable strings allow fast indexing and modifying in-place, as with regular arrays.
As with primitives, Ruby has no real immutable strings. To be precise, Ruby strings are mutable behind an immutable facade. That is, most operations on strings return new strings, while some of them still allow in-place modification.
Consider these examples:
As you can see mutable operations do not create new strings but rather modify existing strings in-place.
Earlier I mentioned that mutable strings do not make good hash keys. Let me prove this:
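The original demo is not preserved; a sketch of the failing lookup (key name and value are mine):

```ruby
key = String.new("name")
h = {}
h[key] = "John"

key << "!"  # mutate the key variable after insertion

h[key]  # => nil: looking the value up through the mutated variable fails
```

(As a caveat: CRuby actually defends strings here by storing a frozen copy of string keys, so h["name"] still works; the lookup through the mutated variable fails because the variable now holds different contents. Other mutable key types, such as arrays, genuinely leave the hash stale until Hash#rehash is called.)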
After I modified the string key we can no longer find the value, because the key’s hashcode has changed! Since it’s so easy to mutate the Ruby string, you can end up with a useless hash. This is why it is not recommended to use mutable strings as hash keys.
How can we remedy it? The first option is to freeze the string:
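A sketch of the freezing approach (names are mine):

```ruby
key = "name".freeze
h = {}
h[key] = "John"

begin
  key << "!"  # any mutation attempt now raises
rescue FrozenError  # a plain RuntimeError before Ruby 2.5
end

h["name"]  # => "John": the key can never drift
```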
A second and better option is to use symbols, which are immutable versions of strings often used as identifiers.
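With symbols (again, a sketch), the problem disappears entirely:

```ruby
h = {}
h[:name] = "John"  # symbols are immutable, so the key can never change
h[:name]           # => "John"
:name.frozen?      # => true
```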
When using literal symbols as hash keys, Ruby provides a shorter syntax:
You might say at this point: “Why can't I just use symbols instead of strings, if they're immutable equivalents?”. The short answer is that you may not be able to, depending on your use case. One reason is that symbols don't have equivalents of many of String's methods, so it's inconvenient to use symbols as an immutable replacement. Just compare the number of methods on Symbol and String to see the difference.
So far my discussion was around built-in data types and their relationships with immutability. Real-life applications, however, require using data structures in order to be efficient.
What is a data structure?
It’s a complex question, but you can think of it as a way to organize other, simpler data structures in a convenient or efficient way. Some data structures are designed for ease of use, while others are built solely with efficiency in mind.
We all know about lists, queues, hash tables, arrays, trees and many, many more. These data structures can have both mutable and immutable implementations.
Mutable implementations are considered ‘classic’, because they are more widely used, have been around for longer and generally are easier to implement. Immutable counterparts offer advantages in concurrency and security.
While some people use ‘immutable’ and ‘persistent’ interchangeably, they are not the same. A persistent data structure is immutable and keeps and reuses large parts of itself while constructing a modified copy. As an example, you can think of a persistent linked list that reuses its entire tail when a new node is added at the head. If you're interested in functional, persistent data structures, have a look at Purely Functional Data Structures by Chris Okasaki.
Let me also add that many modern programming languages that focus on concurrency have their data structures implemented in an immutable fashion: Scala offers both immutable and mutable collections. Clojure and C# offer immutable collections as well.
Let’s go ahead and implement a classic, mutable stack in Ruby and then reimplement it as immutable. A stack is a data structure that follows this interface:
Here is a mutable implementation that uses a Ruby array as a backing store:
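The original 23-line listing is not preserved; a sketch of the classic mutable stack (the method names push/pop/depth are my assumptions about the interface) could be:

```ruby
# A classic mutable stack backed by a Ruby array.
class Stack
  def initialize
    @store = []
  end

  def push(item)
    @store.push(item)  # mutates the backing array in place
    self
  end

  def pop
    @store.pop
  end

  def depth
    @store.size
  end
end
```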
This implementation is basically a thin wrapper around an array. Whenever you call stack.push(item), you're modifying this array in-place. This implementation has all the weaknesses we discussed previously.
Now to an immutable implementation:
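The original 46-line listing is gone; a condensed sketch of a persistent stack in its spirit (class and method names are my assumptions) could be:

```ruby
# A persistent stack: a frozen cons cell that shares its entire tail
# with every stack derived from it.
class ImmutableStack
  attr_reader :head, :tail, :depth

  def self.empty
    new(nil, nil, 0)
  end

  def initialize(head, tail, depth)
    @head  = head
    @tail  = tail
    @depth = depth
    freeze  # this instance can never be modified again
  end

  def push(item)
    # no mutation: build a new cell that reuses self as its tail
    ImmutableStack.new(item, self, depth + 1)
  end

  def pop
    # returns the top value and the previous stack, which still exists
    [head, tail]
  end
end
```

For example, stack = ImmutableStack.empty.push(1).push(2) leaves every intermediate stack intact, and stack.pop hands back the one-element stack unchanged.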
Usage pattern:
Each destructive operation does not mutate the stack but rather returns a copy of itself with the required modifications. What’s more, it reuses a large portion of itself while doing so, thus making this stack a persistent data structure.
Unfortunately, Ruby does not allow you to directly create private constructors, so users can potentially call
If you want a good ruby library of immutable collections, I suggest using hamster.
When writing multi-threaded applications, follow these rules:
In our two stack implementations, it is safe to share the immutable version across multiple threads, because threads will not be able to modify it in place. Whenever a thread makes a push or a pop, a new instance of the stack is created and returned, so the existing instance is never changed.
Now that you've read the article, you might have the impression that immutability is a silver bullet. It is not. It is only one possible way to design software, with its own strengths and weaknesses. Immutability lets you design your functions and data structures in a new way, gaining much and losing much too. We've all been living in a world where sequential computation was the de-facto standard, and in that world immutability was often not worth it: on a single-core computer it carries too much overhead. You must carefully control the state and pay close attention to reusing and copying in order to be efficient. The performance impact that some immutable data structures incur can be too significant.
But the world is changing and the sequential model is disappearing. We all have smartphones with 2 or 4 cores. Our smart watches will have 8 cores in a couple of years, which means that concurrent will become the new sequential. If we want to exploit the power of modern hardware, we need to embrace the concurrent way of doing things. This is where immutability advantages outweigh the bad parts. I think that immutability is the way you should design your software now in order to be prepared for the concurrent future.
Almost every developer knows deep inside that full-text search is better suited for searching text, but continues to use old LIKE '%?%' queries.
I was one of those developers who never used full-text search, but I have changed and I invite others to join me and discover the other side of search with Solr.
This article assumes you're comfortable with Ruby, Rails and PostgreSQL. I'll build a simple “people near me” application using Solr in small incremental steps and hopefully help readers overcome the feeling of uneasiness when thinking about full-text search technology.
Disclaimer: My goal here is to familiarize the reader with full-text search, not create an ideal rails application structure. I’ll be using long views and JavaScript inside ERB templates. The point is to make a small but complete application in a single article and it is possible to do so only by keeping it really simple.
OK, enough talk, let’s build the app!
Let’s call this app Neibo
:
Let’s remove the sqlite3
, turbolinks
, coffee-rails
, jbuilder
and jquery-rails
gems
as we will not need them. We should also add the pg
gem to talk to Postgres DB.
My Gemfile is now:
Now you need to set the pg connection in config/database.yml
file:
Then let’s create a Person
model and generate a migration for it:
For every person we store a name; an about – this is where a person can tell the world about himself; likes – things the person likes; and dislikes.
We also want to store a person's location, so that other people can find him within a certain radius. We store the location as two floating-point numbers: lat for latitude and lon for longitude. It's possible to use a specialized Point data type, but I want to keep it simple here. I make the lat and lon attributes nullable in case a user denies the browser geolocation permission.
Let’s create the databases and run the migration.
We now need to create a controller, a route and a view:
app/views/people/index.html.erb
This UI has two parts: if a user has already filled his details, he can use the search form and search for people nearby. If this is a new user, he fills his details, optionally allows a browser to get his location and saves his profile in the database.
We now need to modify the app/assets/javascripts/application.js
and remove the files we’re not using.
I remove them all and leave the application.js
empty.
The view code checks the current_user method to see if the current user's profile has been filled in. Let's create this method:
I'll store the current user's id in the session and get the user object from the database.
Let’s concentrate on the new user
scenario.
In order for the application to learn the user's location, we need to grab it from the browser and save it.
Adding the code to the view:
The lines we’re interested in are:
And the JavaScript:
When the view loads, the JavaScript asks the user for permission to get his location. If the user agrees, the callback is invoked and the location is saved in the hidden fields, so that the form can submit it back to the server.
Now the controller part:
Here we have boilerplate Ruby code: we're using strong parameters to allow only a known set of attributes. We then try to create a user and save the new user's ID in the session. This way the current_user helper method will retrieve the current user from the database. If the validation fails, we display a message about it and render the view again.
Let’s add those validations:
Now when we go back to the browser and reload the page we can enter the profile data, allow the browser to get our geolocation and click save.
At this point we have introduced ourselves to the system and current_user.id is stored in the encrypted cookie.
The next part is where the fun starts: we need to be able to search for other users nearby. We should allow limiting the search radius, specifying the search term and seeing the results.
We must exclude people who have our search term in their dislikes attribute. For example, if a person dislikes Chinese cuisine and we're searching for people who like it,… you get the idea.
Let’s take a little detour and speak about Solr and the gems that enable it in Rails. We’ll be using sunspot – an excellent gem that adds a nice DSL (really, it’s nice) on top of rsolr.
At this point you might be asking: “Wait! What’s RSolr? I’m now totally confused between Solr, RSolr and Sunspot and how they relate to each other”. I totally understand your confusion. Let’s break this mess into pieces:
Now you’re saying: “I still don’t understand, if Solr is a Java service it means I need to install and configure it on my system? That’s a horrible perspective, get me out of this!”. Absolutely not. The Sunspot gem is bundled with a development version of Solr and has a nice set of rake tasks to manage it. You can start, stop, reindex the data, all using rake tasks. There is no need to install Solr manually, all you need is to add two gems:
sunspot_solr
and sunspot_rails
.
sunspot_solr is the pre-packaged development version of Solr, and sunspot_rails is the Sunspot gem itself. So make sure you place sunspot_solr into the :development group in your Gemfile.
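In the Gemfile that arrangement looks roughly like this (versions omitted; the group placement is the important part):

```ruby
# Gemfile
gem "sunspot_rails"        # the Sunspot DSL, needed in every environment

group :development do
  gem "sunspot_solr"       # pre-packaged development Solr only
end
```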
Now you can start the bundled Solr with rake sunspot:solr:start, stop it with rake sunspot:solr:stop, and reindex all data with rake sunspot:reindex.
OK, now that confusion is hopefully out of the way, let’s continue with our person search scenario.
Let us define the searchable attributes on our Person
model:
[code listing omitted]
Let’s break it down piece by piece:
searchable
block is a place where you define the full-text indexing behavior.
Inside this block you can specify various rules describing which attributes should
be indexed, their pre-index transformations, facets, filters and so on.text :name
– A person should be searchable by his name.boost: 5.0
– boost option tells Solr to prioritize the results found by this particular attribute.
If you’re searching for John Doe
,
all the people with this name will come first,
and only after those who dislike John Does.text :about, :likes
– Person
should be searchable by these attributes.latlon(:location) { Sunspot::Util::Coordinates.new(lat, lon) }
– create a geo-spatial
index on person’s location using lat
and lon
attributes.
This will allow searching for people within a certain mile radius.Great, wasn’t that simple? We’ve defined a set of searchable attributes on a Person
model.
Now we’re ready to actually search for people.
Let us add a _person partial where a search result item will be displayed:
[code listing omitted]
We also need to add the iteration to the index
view:
[code listing omitted]
So the view is ready, let’s modify the controller code:
[code listing omitted]
On line 3 we check that the current user is saved, and on line 4 we verify that we have something to search by: either a search term or a radius. Lines 5–10 are where the actual full-text search happens: we use the Model.search method and pass it a block. Inside this block we specify the logic of the search; in our case we call the fulltext method and pass it our search term.
Let me be clear: we have two phases, indexing and searching. Indexing is defined inside a model in a searchable block, where the text method specifies which attributes should be full-text searchable. Searching is done by calling the Model.search method and passing it a block too, but this time we call the fulltext method to actually run a full-text search on the indexed attributes.
OK, we now understand how to do full-text search on text attributes; we're already doing it on the name, about and likes attributes. What we also need is a way to restrict the results to a certain radius on the map; this is what lines 7–9 are for.
In our application it’s possible for a user to deny geolocation permissions and his profile to be saved without coordinates. So, we need a convenience method to see if the current user has a location:
[code listing omitted]
This method is useful in Person.search
block where we specify the search radius:
[code listing omitted]
We’re using the current user’s lat
& lon
attributes and the radius from params to perform the
filtering. You should remember to convert miles to kilometers, because Sunspot operates on
kilometers.
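The conversion itself is trivial; a helper along these lines (the helper name is mine) keeps it out of the search block:

```ruby
KM_PER_MILE = 1.609344

# Sunspot's radius filtering works in kilometers,
# so convert the user-supplied miles first.
def miles_to_km(miles)
  miles * KM_PER_MILE
end

puts miles_to_km(10)  # roughly 16.09 km
```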
OK, the first version of the people search is ready to try; let's run it. It works fine, but when I search for someone within a 10-mile radius, I find myself too. There should be a way to search for other people while excluding myself. Let's fix that.
Sunspot allows using attributes as filters. For this we call methods like integer, string, datetime, etc. In this case we need to search for all people except the one whose :id equals the current user's :id. We also need to filter out people whose dislikes match the search term:
[code listing omitted]
On line 5 we're creating an indexed filter on the :id column, and on the next line a filter on the :dislikes column.
Now the filtering itself:
[code listing omitted]
On line 6 we're filtering out people with :id equal to the current user's id, and on line 7 people who dislike what I'm searching for. What does Person.search return? A special Sunspot object with a results method; to grab the actual ActiveRecord items we use @people = search.results.
Finally we have all pieces of the puzzle. If we run the app now we should be able to save current user’s profile and then go search for other people.
In this article I’ve barely scratched the surface of the Solr & Sunspot capabilities. You should definitely look for more in the documentation if you want to create a full-featured application.
You’re right, except you can’t. Full text search is a huge topic with a huge set of capabilities. It can do synonym search, wildcard search, stemming and a lot, lot more.
Solr can be as intelligent as to perform word decomposition during a search, operate on word parts and generally behave as a human (almost).
Full-text search is faster too. How much faster? This is a tricky question, because it all depends on the indexed data, but one can safely assume it can be at least several times faster than equivalent SQL searching. For complex searches Solr can be orders of magnitude faster than SQL.
Sunspot handles the indexing for you: it registers a set of hooks that trigger automatic reindexing of updated and new records. If you look into the Rails log, you'll see something like:
[code listing omitted]
You should generally avoid touching Solr in unit tests: either design them so they don't talk to Solr at all, or stub Solr to return pre-canned results. As for integration tests, indexing data before running them worked best for me: I first prepare some test data, reindex it with rake sunspot:reindex, and then run the integration tests.
If you find the topic of testing interesting, drop me a line, I’ll cover it in the next article.
https://github.com/Valve/neibo
Well, I hope the explanation wasn't too dense; share your ideas in the comments :)
A Ruby constant is anything that starts with a capital letter.
[code listing omitted]
Yes, regular ALL_CAPS identifiers are constants, and module and class names are constants too.
When Ruby tries to resolve a constant, it starts looking in current lexical scope by searching the current module or class. If it can’t find it there, it searches the enclosing scope and so on.
It’s easy to see the lexical scopes search chain with Module::nesting method:
[code listing omitted]
Module::nesting returns an array of searchable lexical scopes, starting from the current one. In the above case the search for A_CONSTANT starts from module C, then goes to the enclosing scope (module B), and then to module A, where it finally finds it.
You’ve probably seen the alternative way of defining the enclosing modules:
[code listing omitted]
See the difference? Constant resolution only uses the innermost module for searching, ignoring the enclosing scopes. By defining the modules with this shorter syntax you lose the ability to search for constants in enclosing scopes.
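A quick sketch of that difference (the names are mine): with the compact module X::Y syntax, only X::Y itself is in the lexical nesting, so a constant defined in X is not found.

```ruby
module X
  SECRET = :from_x
end

# Compact definition: the lexical nesting here is just [X::Y], not [X::Y, X]
module X::Y
  def self.lookup
    SECRET
  rescue NameError
    :not_found    # enclosing scope X was never searched
  end
end

p X::Y.lookup
```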
Enclosing scopes is the first place where Ruby searches the constants. Second place is the inheritance hierarchy. Consider this code:
[code listing omitted]
Ruby can mix modules into classes as an alternative to inheritance. When a class mixes in a module, the module inserts itself between that class and its parent class in the inheritance hierarchy. The simplest way to see this is with the ancestors method.
[code listing omitted]
What’s going on here? We’ve defined a base class Person
, a child class BusDriver
that inherits from Person
. We also defined an Insurable module, which we mixed into our BusDriver
class. When we call the ancestors
class method, we see the BusDriver
class first, then Insurable
module which was wedged between BusDriver
and Person
. Then goes the Person
class, then, obviously, Object
. This is all nice and clear.
But why do we see Kernel
between Object
and BasicObject
? This is because Kernel
is a module that is mixed into Object
thus inserting itself into the inheritance hierarchy. This ancestors
array is how the name resolution works throughout the inheritance chain.
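The example from the text can be reconstructed in a few lines:

```ruby
class Person; end

module Insurable; end

class BusDriver < Person
  include Insurable   # wedges Insurable between BusDriver and Person
end

p BusDriver.ancestors
# [BusDriver, Insurable, Person, Object, Kernel, BasicObject]
```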
Now that you’ve seen the inheritance part of the name search, you can see the full picture:
[code listing omitted]
When Ruby has searched for a constant up the nesting and ancestors chains and hasn't found it, it gives the calling code one last chance by calling the const_missing method.
[code listing omitted]
This error is raised when Ruby can't find the constant and no const_missing method is defined.
[code listing omitted]
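A minimal sketch of that hook (the class and the return value are mine):

```ruby
class Settings
  # Called for each unknown constant instead of raising NameError
  def self.const_missing(name)
    "no constant #{name}, falling back to a default"
  end
end

puts Settings::TIMEOUT
# prints: no constant TIMEOUT, falling back to a default
```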
Let’s say you’d like to be flexible about your constants and load them automatically, following some naming convention? Turns out, there is a way, it’s called autoloading.
If we were to implement autoloading from scratch, it would look something like this:
[code listing omitted]
Turns out, we don’t have to, because autoloading is built into Ruby. We have Kernel#autoload, Module#autoload and more sophisticated ActiveSupport::Autoload. I’m not going to cover these topics here but will try to do it in a future post.
Here comes the tricky part: what if you have multiple constants with the same name? Consider this example:
[code listing omitted]
Results might seem strange at first, but please remember the full search path:
[code listing omitted]
First comes the lexical scope search, and only afterwards the inheritance chain, where mixins are inserted between child and parent classes. Also, once Ruby finds a constant with the given name, it stops looking.
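One more reconstruction to make that precedence concrete: a constant from the lexical scope wins over one from the superclass (the names are mine):

```ruby
class Base
  VALUE = :from_superclass
end

module Wrapper
  VALUE = :from_lexical_scope

  class Child < Base
    # Lexical nesting [Wrapper::Child, Wrapper] is searched before ancestors,
    # so Wrapper::VALUE shadows Base::VALUE here
    RESOLVED = VALUE
  end
end

p Wrapper::Child::RESOLVED   # :from_lexical_scope
```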
I was working on a small AngularJS application and needed text input masking to force users to enter their telephone numbers in a certain format. "So, what's the problem?", I thought, and added the necessary code:
[code listing omitted]
[code listing omitted]
… only to find out that $scope.application was undefined and the phone value never made it to the controller.
I decided to find an Angular-specific masking library. I found one, but it was buggy and couldn't handle my simple +7(999)999-99-99 mask.
Exasperated, I tried several input masking plugins for jQuery, but none of them worked with angular.
Then I stumbled upon this and this and understood:
Easy-peasy, let’s do it.
Having prepared myself with the documentation on the subject, I wrote version #1:
[code listing omitted]
And the html:
[code listing omitted]
Refreshing the browser, I saw the working mask, but $scope.application was still undefined after filling in the phone, i.e. the mask was correctly initialized but the value never propagated to the controller. So I added some jQuery to the directive:
[code listing omitted]
How do I get rid of the hardcoded bindings? We can read the value of the ng-model attribute and set the appropriate scope value:
[code listing omitted]
I achieved what I wanted, but I'm still not quite satisfied, because of the nagging feeling that this code could be made more idiomatic and robust.
Fingerprinting is a technique, outlined in research by the Electronic Frontier Foundation, for anonymously identifying a web browser with an accuracy of up to 94%.
The browser is queried for its user agent string, screen color depth, language, installed plugins with their supported MIME types, timezone offset, and other capabilities such as local storage and session storage. These values are then passed through a hashing function to produce a fingerprint that gives weak guarantees of uniqueness.
No cookies are stored to identify a browser.
It’s worth noting that a mobile share of browsers is much more uniform, so fingerprinting should be used only as a supplementary identifying mechanism there.
In this post I’m going to explain how it works in detail and give you real-life statistics accumulated over the period of 4 months of production usage.
I was given an experimental task to implement the fingerprinting for both anonymous and logged-in users of one of our web sites. We wanted to see if it was possible at all to rely on identifying someone this way and not leave cookies. The idea was to accumulate the fingerprints and associated preferences and then pre-filter the information on front page based on what’s known about a user.
So I got to work and started making a basic outline in my head. What is it that identifies a browser? I gathered it would be: user agent, browser language, screen color depth, installed plugins and their MIME types, timezone offset, local storage, and session storage.
Initially I added the screen resolution as well, but a colleague advised that one can use multiple monitors with a single laptop, for example connecting an external monitor when working in the office, so I removed it.
On my laptop browser the values are:
[code listing omitted]
So I now knew everything my browser had, and I needed to produce the fingerprint itself. For that I wanted a fast, non-cryptographic hashing function, such as MurmurHash. MurmurHash produces a 32-bit integer and works really well: compared to other popular hash functions, it shows a good random distribution of regular keys.
I picked this implementation and added it to the code.
The last step was to combine all browser’s capabilities into a long string and pass it through hashing.
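The combining step can be sketched in Ruby. The post uses MurmurHash, so Zlib.crc32 below is just a stand-in fast non-cryptographic hash, and the capability values are illustrative:

```ruby
require "zlib"

capabilities = [
  "Mozilla/5.0 (X11; Linux x86_64) ...",     # user agent
  "en-US",                                   # language
  24,                                        # color depth
  -180,                                      # timezone offset
  ["Shockwave Flash", "QuickTime Plug-in"],  # plugins
  true,                                      # local storage
  true                                       # session storage
]

# Join everything into one long string and hash it into a 32-bit integer
joined      = capabilities.flatten.join("###")
fingerprint = Zlib.crc32(joined)

puts fingerprint
```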
The end result on my laptop was: 3723825959
As a finishing touch, I wanted to get rid of jQuery, so I implemented the each
and map
methods and got a no-dependencies script.
The above research states that the identification accuracy is surprisingly high. But to improve it even further, Flash or Java integration is required to get a list of installed fonts, thus making each browser even more unique.
My tests show that MurmurHash does produce collisions on random strings, but their number is negligible for my purposes: 5-7 collisions per ~200K capability strings.
It’s simple: browser fingerprinting is not good with mobile browsers, unless you want to distinguish Android users from iPhone ones.
After having had the fingerprinting on production for 4 months, I have some data to analyze. First of all, let me say that I’m not at liberty to tell the exact number of visitors to the web site, but I can say it is several millions a month, so we have some data to play with. All numbers below represent our usage and do not represent what you might have.
89% of fingerprints are unique
20% of our users have more than one fingerprint, i.e. several browsers or devices.
Very few users have a staggering amount of fingerprints, for example 20-25. I don’t know if they have a lot of devices, use different browsers or something else.
After viewing the results we removed the fingerprinting because of poor identification, especially with mobile devices. If your traffic mostly comes from desktops and you’re OK with 10-12% of false identifications you might want to try it.
The existential operator ? can be used in three useful ways in CoffeeScript.
In JavaScript there is no built-in way of checking the existence of a variable.
You can try testing the existence with if(variable){...}
but it won’t work in these cases:
[code listing omitted]
The correct way to test whether a variable is both declared and initialized:
[code listing omitted]
You may be tempted to use direct comparison with undefined
:
[code listing omitted]
But this is asking for trouble, because in ECMAScript 3 (all older browsers, such as IE 6-8) undefined can be overwritten:
[code listing omitted]
All ES5-compatible browsers make undefined immutable, i.e. it can't be changed, but it's better to play it safe.
CoffeeScript has a syntactic shortcut for testing existence:
[code listing omitted]
This code will be transpiled to:
[code listing omitted]
What if you want to initialize a variable only if it has not been already initialized? In JavaScript you’d usually do something like:
[code listing omitted]
This technique caches the result of an expensive computation or database query in a variable. In the above example, subsequent getUserLocale calls will not query the database.
The important part is comparing with null using ==, rather than ===, because == evaluates to true if the variable is either undefined or null.
This cache-on-first-call pattern is widely used in many programming languages, but CoffeeScript has special syntax for it:
[code listing omitted]
?= is the operator that performs conditional assignment. It transpiles to roughly the same JS as above, using a ternary operator:
[code listing omitted]
What if you try to conditionally assign a variable that was never declared?
[code listing omitted]
This will result in a compile-time error:
the variable "abc" can't be assigned with ?= because it has not been declared before
.
This compile-time checking is very helpful, because it prevents a ReferenceError
at run time.
If you know Ruby, ||=
is the same thing there.
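Since the post draws the Ruby parallel, here is the same cache-on-first-call pattern with ||= (the class and method names are contrived for the example):

```ruby
class UserPrefs
  attr_reader :db_queries

  def initialize
    @db_queries = 0
  end

  # ||= assigns only when @locale is nil (or false), so the
  # "database query" below runs at most once
  def locale
    @locale ||= fetch_locale_from_db
  end

  private

  def fetch_locale_from_db
    @db_queries += 1
    "en-US"
  end
end

prefs = UserPrefs.new
prefs.locale
prefs.locale
puts prefs.db_queries   # the expensive call ran only once
```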
Chaining function calls is a great way to write terse yet fluent code. A good example is working with jQuery:
[code listing omitted]
This is made possible because these jQuery functions return a reference to this
.
But what if one of the functions returns null or undefined?
[code listing omitted]
If current user’s address is null
or undefined
, the .zip
property call will result in TypeError
.
A simple but ugly solution would be to use a lot of if
checks:
[code listing omitted]
But this can quickly get out of hand with deep nesting.
CoffeeScript has a safe way of accessing long property chains using the ?. variant of the existential operator:
[code listing omitted]
This will soak up null or undefined references and safely return undefined, or return the final property value. In our case the generated code looks like this:
[code listing omitted]
What is this weird void 0 thing? This is to fight the pre-ES5 undefined mutability I referred to earlier.
JavaScript defines void as a unary operator that returns undefined for any operand. In other words, the CoffeeScript compiler uses a set of nested ternary operators to safely return either the last property value or undefined via void 0.
Calling a function safely works similarly:
[code listing omitted]
This transpiles to:
[code listing omitted]
The key thing to take away here is that CoffeeScript first tests that the callee is defined and is a function, using typeof bla === 'function'. The function is called only if this check passes.
Safe function invocation can be chained as well with other function or property calls:
[code listing omitted]
This transpiles to:
[code listing omitted]
Drawing an analogy with Ruby on Rails' ActiveSupport, the safe chaining can be seen as the try method.
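Modern Ruby (2.3+) actually has this built in as the &. safe-navigation operator, which behaves much like CoffeeScript's ?. operator. A small illustration with stand-in structs:

```ruby
Address = Struct.new(:zip)
User    = Struct.new(:address)

with_address    = User.new(Address.new("10001"))
without_address = User.new(nil)

# &. returns nil instead of raising NoMethodError on a nil receiver
puts with_address.address&.zip
p    without_address.address&.zip   # nil, no exception
```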
The CoffeeScript existential operator is a useful tool to cut down the verbosity of JavaScript when dealing with existence and null checks. It can also be used to shield inexperienced JavaScript developers from JavaScript's bad parts.
[code listing omitted]
But this will override the SMTP settings for all the other mailers. Instead, set them on the email instance:
[code listing omitted]
rails-3.1
[code listing omitted]
It’s unclear where to take command line arguments.
The solution:
[code listing omitted]
This will give you a vector of strings, where the first string is the name of the program being run and the rest are the arguments provided.
rust-0.7