Gossip: Switching to LMDB
Mike Dilger
2023-07-29 03:13:59
Unlike a number of other nostr clients, Gossip has always cached events and related data in a local data store. Until recently, SQLite3 served this purpose.
SQLite3 offers a full ACID SQL relational database service.
Unfortunately, it has presented a number of downsides:
1. It is not as parallel as you might think.
2. It is not as fast as you might hope.
3. If you want to preserve the benefit of using SQL and doing joins, then you must break your objects into columns, and map columns back into objects. The code that does this object-relational mapping (ORM) is not trivial and can be error-prone. It is especially tricky when working with different types (Rust language types and SQLite3 types are not a 1:1 match).
4. Because of the potential slowness, our UI has been forbidden from direct database access, as that would make the UI unresponsive if a query took too long.
5. Because of (4), we have been firing off separate threads to do the database actions, and storing the results into global variables that can be accessed by the interested code at a later time.
6. Because of (4), we have been caching database data in memory, essentially coding for yet another storage layer that can (and often did) get out of sync with the database.
LMDB offers solutions:
1. It is highly parallel.
2. It is ridiculously fast when used appropriately.
3. Because you cannot run arbitrary SQL, there is no need to represent the fields within your objects separately. You can serialize/deserialize entire objects into the database, and the database doesn't care what is inside the blob (yes, you can do that with an SQLite field, but if you did, you would lose the power of SQL). A sketch of this follows the list.
4. Because of the speed, the UI can look stuff up directly.
5. We no longer need to fork separate threads for database actions.
6. We no longer need in-memory caches of data. The LMDB data is already in memory (it is memory-mapped), so we just access it directly.
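Gossip is written in Rust and reaches LMDB through a wrapper crate. As a rough illustration of the whole-object approach, here is a minimal sketch assuming the heed crate (0.11-era API); the names and data are made up, and the real Gossip code differs.

```rust
use heed::types::ByteSlice;
use heed::{Database, EnvOpenOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // LMDB wants the environment path to be an existing directory.
    std::fs::create_dir_all("gossip.mdb")?;

    // Open (or create) the environment. map_size is an upper bound on the
    // memory map; LMDB does not allocate it all up front.
    let env = EnvOpenOptions::new()
        .map_size(1024 * 1024 * 1024) // 1 GiB
        .max_dbs(10)
        .open("gossip.mdb")?;

    // One named key-value table inside the environment.
    let events: Database<ByteSlice, ByteSlice> = env.create_database(Some("events"))?;

    // Write: the value is an opaque blob; LMDB does not care what is inside it.
    let id = [0u8; 32]; // stand-in for a real event Id
    let serialized = vec![1u8, 2, 3]; // stand-in for speedy-serialized event bytes
    let mut wtxn = env.write_txn()?;
    events.put(&mut wtxn, &id, &serialized)?;
    wtxn.commit()?;

    // Read: many readers can do this in parallel, even while a writer writes.
    let rtxn = env.read_txn()?;
    if let Some(bytes) = events.get(&rtxn, &id)? {
        // deserialize `bytes` back into an Event here
        println!("found {} bytes", bytes.len());
    }
    Ok(())
}
```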
The one obvious downside is that we lose SQL. We lose the query planner. We cannot ask arbitrary questions and get answers. Instead, we have to conceive in advance of all the kinds of questions we want to ask, and we have to write code that answers them efficiently. Often this involves building and maintaining indices.
Indices
Let's say I want to look at fiatjaf's posts. How do I efficiently pull out just his recent feed-related events in reverse chronological order? It is easy if we first construct the following index:
key: EventKind + PublicKey + ReverseTime
value: Event Id
In the above, '+' is just a concatenation operator, and ReverseTime is some distant future time minus the event's time, so that keys sort backwards in time.
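In Rust, building such a key might look like the following sketch. The field widths and the FAR_FUTURE constant are assumptions for illustration, not Gossip's actual layout:

```rust
/// Build the index key: EventKind + PublicKey + ReverseTime.
/// All integers are big-endian so that byte-wise (lexicographic)
/// comparison matches numeric comparison.
fn feed_index_key(kind: u32, pubkey: &[u8; 32], created_at: u64) -> Vec<u8> {
    // Any timestamp safely beyond all real ones works as the "distant time".
    const FAR_FUTURE: u64 = u64::MAX;
    let reverse_time = FAR_FUTURE - created_at;

    let mut key = Vec::with_capacity(4 + 32 + 8);
    key.extend_from_slice(&kind.to_be_bytes());
    key.extend_from_slice(pubkey);
    key.extend_from_slice(&reverse_time.to_be_bytes());
    key
}
```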
Now I just ask LMDB to start from (EventKind=1 + PublicKey=fiatjaf + now) and scan until either one of the first two fields changes or, more likely, the time field gets too old (e.g. one month ago). Then I do it again for the next event kind, etc.
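Using heed again, the scan might look roughly like this sketch. It assumes the key layout from the previous sketch; the database handle and cutoff logic are hypothetical:

```rust
use heed::types::ByteSlice;
use heed::{Database, Env};

/// Hypothetical: collect the Ids of one person's recent kind-1 events,
/// newest first, using the index described above.
fn recent_text_notes(
    env: &Env,
    index: Database<ByteSlice, ByteSlice>,
    pubkey: &[u8; 32],
    one_month_ago: u64,
) -> Result<Vec<Vec<u8>>, heed::Error> {
    // The prefix pins the first two key fields (EventKind and PublicKey),
    // so the iterator stops by construction when either of them changes.
    let mut prefix = Vec::with_capacity(36);
    prefix.extend_from_slice(&1u32.to_be_bytes()); // EventKind = 1
    prefix.extend_from_slice(pubkey);

    let rtxn = env.read_txn()?;
    let mut ids = Vec::new();
    for entry in index.prefix_iter(&rtxn, &prefix)? {
        let (key, id) = entry?;
        // Decode ReverseTime from the tail of the key; keys sort newest-first.
        let reverse_time = u64::from_be_bytes(key[36..44].try_into().unwrap());
        let created_at = u64::MAX - reverse_time;
        if created_at < one_month_ago {
            break; // everything after this is older still
        }
        ids.push(id.to_vec());
    }
    Ok(ids)
}
```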
For a generalized feed, I have to scan a region for each person I follow.
Smarter indexes can be imagined. Since we often want only feed-related event kinds, that can be implicit in an index that only indexes those kinds of events.
You get the idea.
A Special Event Map
At first I stored events in a K-V database under the Id of the event. Then I had indexes on events that yielded a set of Ids (as in the example above).
But when it comes to storing and retrieving events, we can go even faster than LMDB.
We can build an append-only memory map that is just a sequence of all the events we have, serialized, and in no particular order. Readers do not need a lock and multiple readers can read simultaneously. Writers will need to acquire a lock to append to the map and there may only be one writer at a time. However, readers can continue reading even while a writer is writing.
We can then have a K-V database that maps Id -> Offset. To get the event you just do a direct lookup in the event memory map at that offset.
The real benefit comes from the other indexes that yield events: they can yield offsets instead of Ids. Then we don't need to do a second lookup from the Id to the Event; we can just look directly at the offset.
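As a sketch, an offset lookup against such a map could use the memmap2 crate. The length-prefixed record format here is an assumption for illustration; the real map's format may differ:

```rust
use memmap2::Mmap;
use std::fs::File;

// Hypothetical record format: [u32 length (little-endian)][serialized event bytes].
// Readers take no lock; the map is append-only, so existing bytes never change.
fn event_bytes_at(map: &Mmap, offset: usize) -> &[u8] {
    let len = u32::from_le_bytes(map[offset..offset + 4].try_into().unwrap()) as usize;
    &map[offset + 4..offset + 4 + len]
}

fn main() -> std::io::Result<()> {
    let file = File::open("events.map")?; // the append-only event map file
    // Safety: we rely on the append-only discipline described above; the
    // bytes at any offset we were handed are fully written and immutable.
    let map = unsafe { Mmap::map(&file)? };

    let offset = 0; // imagine this came from the Id -> Offset database
    let bytes = event_bytes_at(&map, offset);
    println!("event is {} bytes", bytes.len());
    Ok(())
}
```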
Avoiding deserialization
Deserialization has a price. Sometimes it requires memory allocation (if the object is not already linear, e.g. variable-length data like strings and vectors are allocated on the heap), which can be very expensive if you are trying to scan 150,000 or so events.
We serialize events (and other objects where we can) with a serialization library called speedy. It does its best to preserve the data much like it is represented in memory, but linearized. Because events start with fixed-length fields, we know the offset into the serialized event where these first fields occur and we can directly extract the value of those fields without deserializing the data before it.
This comes in useful whenever we need to scan a large number of events. Search is the one situation where I know that we must do this. We can search by matching against the content of every feed-related event without fully deserializing any of them.
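As an illustration, a fixed-length field near the front can be read straight out of the serialized bytes. The offsets below are hypothetical; the real layout depends on the Event struct and on how speedy encodes it:

```rust
// Hypothetical serialized layout, fixed-length fields first:
//   bytes  0..32   id
//   bytes 32..64   pubkey
//   bytes 64..72   created_at (u64, little-endian)
//   bytes 72..76   kind       (u32, little-endian)
//   bytes 76..     variable-length content, tags, sig, ...

/// Read created_at without deserializing the whole event.
fn created_at(serialized: &[u8]) -> u64 {
    u64::from_le_bytes(serialized[64..72].try_into().unwrap())
}

/// Read kind without deserializing the whole event.
fn kind(serialized: &[u8]) -> u32 {
    u32::from_le_bytes(serialized[72..76].try_into().unwrap())
}
```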