-
@ verbiricha
2023-02-28 09:27:18I developed an anonymous URL shortener service in Clojure. In this post I'll talk about how the implementation uses Redis for storage. The code is open source and you can find it on GitHub.
URL Shortener
A URL shortener is a web service that takes URLs and assigns them an alphanumeric ID. Visiting the URL associated with the ID redirects to the original. Short links look better for sharing them with the world.
A key-value store such as Redis fits our use case considering our storage needs: given a URL we need to store it associated with an alphanumeric ID and then want to be able to retrieve the original URL given the alphanumeric ID.
Redis
Redis is an in-memory data structure key-value store that can be used as a database, cache or message broker. The strings are the most fundamental Redis data type and the only one we will need.
Even though is a key-value store and doesn't have a query language, Redis supports many data types with a rich set of commands, transactions and LUA scripting.
Clients interact with a Redis server using a protocol called "REdis Serialization Protocol" (RESP): a simple, human-readable protocol with a request-response model. The redis-cli command-line client can be used to interact with a Redis server from the console.
$ redis-cli 127.0.0.1:6379> PING PONG 127.0.0.1:6379>
For talking to Redis from Clojure we use the excellent Carmine library. It provides a
wcar
macro that acquires a connection to Redis and inside its body we can use functions from thetaoensso.carmine
namespace that correspond to Redis commands.```clojure (ns shortener (:require [taoensso.carmine :as car :refer [wcar]]))
(def redis {:spec {:host "127.0.0.1"}})
(wcar redis (car/ping)) ;;=> "PONG" ```
ID generation
Redis allows to treat strings as 64-bit signed integers provided that the string can be represented as an integer. Commands like INCR can be used to manipulate keys containing numerical values.
Redis stores integers in their integer representation, so for string values that actually hold an integer, there is no overhead for storing the string representation of the integer.
Using
INCR
on a key that doesn't exist sets it to 0 before performing the increment, so we can useINCR
to treat a key as a counter.```clojure (defn unique-id [conn] (wcar conn (car/incr "unique-id.counter")))
(unique-id redis) ;;=> 1 (unique-id redis) ;;=> 2 (unique-id redis) ;;=> 3 ```
Since we want the IDs to be short, human-readable and URL friendly we'll use base-encoding. The most popular base encoding for URL strings is base 64, but encogio uses a custom alphabet which includes the letter Ñ.
```clojure (def alphabet "abcdefghijklmnñopqrstuvwxyzABCDEFGHIJKLMNÑOPQRSTUVWXYZ1234567890_-")
(def base (count alphabet)) ;;=> 66 ```
It has a couple more characters than the base-64 alphabet so we'll have to write the base encoding function for the custom alphabet.
```clojure (defn base-encode [input] (loop [n input res ""] (cond (zero? input) (subs alphabet 0 1) (zero? n) res :else (recur (quot n base) (str (nth alphabet (rem n base)) res)))))
(base-encode 0) ;;=> "a"
(base-encode 1) ;;=> "b"
(base-encode 9999999999999) ;;=> "b1-gJÑ4j" ```
Storing URLs
With the ID generation and encoding solved we can write the functions for storing and retrieving URLs. The
store-url!
function takes a Redis connection and a URL, returning its ID after storing it in Redis.```clojure (def urls-prefix "url-shortener.id:")
(defn store-url! [conn url] (let [id (unique-id conn) id-key (base-encode id) redis-key (str urls-prefix id-key)] (wcar conn (car/set redis-key url)) id-key))
(store-url! redis "http://habla.news") ;;=> "b"
(store-url! redis "http://github.com/verbiricha/habla") ;;=> "c" ```
For retrieval we use the GET command, returning
nil
when the given ID does exist.```clojure (defn get-url! [conn id] (let [redis-key (str urls-prefix id)] (wcar conn (car/get redis-key))))
(get-url! redis "b") ;;=> "http://habla.news"
(get-url! redis "c") ;;=> "http://github.com/verbiricha/habla"
(get-url! redis "d") ;;=> nil ```
Aliases
The basics for the URL shortener are in place but we haven't taken into account that we want users to be able to provide a custom alias. Since our IDs are already alphanumeric we'll treat aliases as IDs and use them as keys for storing aliased URLs.
Once a key in our URLs keyspace is picked we don't want to overwrite it so we have to make sure that we only write them once.
Atomic set
Redis supports running LUA scripts transactionally with the EVAL command so we can guarantee the atomicity of a set operation by doing it inside a LUA script.
Redis guarantees that a script is executed in an atomic way: no other script or Redis command will be executed while a script is being executed.
There are two types of arguments we pass to LUA scripts: key arguments (assumed to be used as Redis keys) and regular arguments. They are available in the
KEYS
andARGV
special variables respectively. Note that LUA uses 1-based indexing.The Redis EVAL command takes a LUA script, the number of key arguments and the script arguments and returns the result returned by the script. Here's a few examples of how we can use it from Clojure:
```clojure (wcar redis (car/eval "return {KEYS[1]};" 1 "a-key")) ;;=> ["a-key"]
(wcar redis (car/eval "return {KEYS[1], ARGV[1]};" 1 "a-key" 42)) ;;=> ["a-key" "42"] ```
Redis commands can be called from Lua using
redis.call
.```clojure (wcar redis (car/eval "return redis.call('SET', KEYS[1], ARGV[1]);" 1 "a-key" 42)) ;;=> "OK"
(wcar redis (car/get "a-key")) ;;=> "42" ```
With that in mind we write a LUA script that will result in an error if the key we are trying to set already exists.
lua -- Receives a key (KEYS[1]) and a value (ARGV[1]) and atomically -- sets the key to value if it doesn't exist. -- -- Returns "OK" if successful, "duplicate key" if not. -- local exists = tonumber(redis.call("EXISTS", KEYS[1])); if exists == 0 then return redis.call("SET", KEYS[1], ARGV[1]); else return redis.status_reply("duplicate key"); end;
Since we'll be using this script quite frequently, we'll use Carmine's eval* helper which optimistically tries the Redis EVALSHA command to save bandwidth.
```clojure (def atomic-set-lua " local exists = tonumber(redis.call('EXISTS', KEYS[1])); if exists == 0 then return redis.call('SET', KEYS[1], ARGV[1]); else return redis.status_reply('duplicate key'); end;")
(defn atomic-set! [conn k v] (let [set? (wcar conn (car/eval* atomic-set-lua 1 k v))] (if (= set? "OK") {:key k :value v} {:error :conflict}))) ```
When trying to set a key using the
atomic-set!
function it will only be written if it doesn't exist already.```clojure (atomic-set! redis "write-once" 42) ;;=> {:key "write-once" :value 42}
(atomic-set! redis "write-once" 42) ;;=> {:error :conflict} ```
atomic-set!
can be used now to support aliasing URLs.```clojure (defn alias-url! [conn url alias] (let [redis-key (str urls-prefix alias) result (atomic-set! conn redis-key url)] (when-not (:error result) alias)))
(get-url! redis "blog") ;;=> nil
(alias-url! redis "http://habla.news" "blog") ;;=> "blog"
(get-url! redis "blog") ;;=> "http://habla.news"
(alias-url! redis "http://habla.news" "blog") ;;=> nil ```
Finally the
store-url!
function needs to useatomic-set!
as well and try another autogenerated ID when running into a conflict.clojure (defn store-url! [conn url] (let [id (unique-id conn) id-key (base-encode id) redis-key (str urls-prefix id-key) result (atomic-set! conn redis-key url)] (if-not (:error result) id-key (recur conn url)))
Conclusion
We have used #[4] and #[5] to implement the basis of an URL shortener service that supports both autogenerated IDs and user supplied aliases. In future posts we'll be looking at how to expose our URL shortener to the world through HTTP as well as protecting it from bad actors by rate limiting.
Happy hacking!