ssb-flume-follower-sql

(work in progress 🚧 ) A sql-based database for secure scuttlebutt, written in rust, with bindings to js

This module parses the flume append only log file and inserts each message into a sql database.

This is conceptually very similar to a flume-view but it isn't a flume view that plugs into the rest of flume. @mmckegg wrote a post about his dream of a "consensus free scuttlestack." This module intenionally isn't a flume view because we want clients to be able to take ownership of the indexes they need without needing consensus about how they modify the .ssb folder.

Features

Easy to use
- models all the common relationships between messages, enabling powerful queries that are generally hard to do
- uses knex for powerful async queries
Fast + flexible
- built on Rust + Sqlite3
- supports processing the log and inserting into the db in spare cpu time, so your application never chokes
- bulding + queries are FAST (benchmarked!)
Advanced features
- query the view at any time - for when waiting for the view to be in sync isn't relevant
- supports multiple identities (WIP) (multiple keypairs), with decryption done with Rust
- supports multiple clients running different versions of this view, by giving you control over where this view is saved
Friendly. We have a code of conduct and are commited to abiding by it.
- Found something missing from the docs? Can't understand the code? Found an un-tested edge case? Spotted a poorly named variable? Raise an issue! We'd love to help.
Well tested
Well documented

Example

const SqlView = require('ssb-flume-follower-sql')
const config = // load ssb config 
const keys =  // load ssb keys
const logPath = Path.join(config.path, 'flume', 'log.offset')
const secret = ssbKeys.ssbSecretKeyToPrivateBoxSecret(keys)

// Constructing a new sqlView doesn't do that much automatically. A new sqlite db is created if it doesn't exist. No indexing is happening automatically.
const sqlView = SqlView(logPath, '/tmp/patchwork.sqlite3', secret, keys.id)

// The sql view has the knex instance 
var { knex } = sqlView

//Query for content of 20 most recent posts. 
knex
  .select(['content'])
  .from('messages_raw')
  .join('authors', 'messages_raw.author_id', 'authors.id')
  .orderBy('flume_seq', 'desc')
  .limit(20)
  .asCallback(function(err, result) {
    console.log(result) 
  })

// Process data from the offset log into the db as it comes in, but throttled so it's not too cpu hungry. (Assumes you can use `requestIdleCallback`)
window.requestIdleCallback(function processMore (deadline) {
  window.requestIdleCallback(processMore)

  //uses the latest sequence from flumedb to check if there are new things in the offset log.
  const sbotLatest = api.sbot.obs.latestSequence()()

  if (sqlView.getLatest() === sbotLatest) return

  // Process chunks until we've used up the available free time.
  while (deadline.timeRemaining() > 0) {
    sqlView.process({ chunkSize: 250 })
  }

  sqlViewLatest = sqlView.getLatest()
})

See more example queries below

Schema

Tables:

Views

sql views of the db that do joins you're likely to use.

messages

Query that generates view:

SELECT 
  flume_seq,
  key_id,
  seq,
  received_time,
  asserted_time,
  root_id,
  fork_id,
  author_id,
  content,
  content_type,
  is_decrypted,
  keys.key as key,
  root_keys.key as root,
  fork_keys.key as fork,
  authors.author as author
FROM messages_raw 
JOIN keys ON keys.id=messages_raw.key_id
LEFT JOIN keys AS root_keys ON root_keys.id=messages_raw.root_id
LEFT JOIN keys AS fork_keys ON fork_keys.id=messages_raw.fork_id
JOIN authors ON authors.id=messages_raw.author_id

`links`

`abouts`

`contacts`

`mentions`

API

var SqlView = require('ssb-flume-follower-sql')
var sqlView = SqlView('/path/to/log.offset', '/path/to/view.sqlite', <pubKey>, <secreKey> )

sqlView.process(opts = {})

opts is mandatory and has one optional field:

opts.chunkSize (optional) - Sets the maximum number of items to process. If this is omitted it will process all entries, bringing the view up to date with the log.
Note that processing will block this thread while executing. If you want to limit resource use of processing, use something like requestIdleCallback like in the example. Also be careful not to make opts.chunkSize too large. As a starting point, my machine processes 10000 entries in 140ms.

sqlView.getLatest()

Gets the latest flume sequence value processed by the db.

sqlView.knex

A knex instance ready to do read only queries on the db.

TBD if I can get knex writes working. Sqlite theortically supports multiple db connections.
TBD if you should mess with the db.

sqlView.modifiers

TBD if these are a good idea. They are syntactic sugar for common queries using knex modify. I'll know more once I write a whole lot of queries for patchwork.

More Example Queries

Content of my most recent 20 posts

db.knex
  .select(['content', 'author'])
  .from('messages_raw')
  .join('authors', 'messages_raw.author_id', 'authors.id')
  .where('authors.is_me', 1)
  .orderBy('flume_seq', 'desc')
  .limit(20)

Content of my most recent posts, pages of 20, third page.

db.knex
  .select(['content', 'author'])
  .from('messages_raw')
  .join('authors', 'messages_raw.author_id', 'authors.id')
  .where('authors.is_me', 1)
  .orderBy('flume_seq', 'desc')
  .limit(20)
  .offset(40)

Authors I block

db.knex
  .select(['author'])
  .from('contacts_raw')
  .join('authors', 'contacts_raw.author_id', 'authors.id')
  .where('authors.is_me', 1)
  .where('state', -1)

Authors I follow

db.knex
  .select(['author'])
  .from('contacts_raw')
  .join('authors', 'contacts_raw.author_id', 'authors.id')
  .where('authors.is_me', 1)
  .where('state', 1)

How many mentions do I have since a given flume sequence

var last_seq_num = 12345

db.knex('mentions_raw')
  .count('id')
  .join('authors', 'mentions_raw.link_to_author_id', 'authors.id')
  .join('messages_raw', 'mentions_raw.link_from_key_id', 'messages_raw.key_id')
  .where('authors.is_me', 1)
  .where('messages_raw.flume_seq', '>', last_seq_num )

All the about messages about me

db.knex
  .select(['content'])
  .from('abouts_raw')
  .join('authors', 'abouts_raw.link_to_author_id', 'authors.id')
  .join('messages_raw', 'abouts_raw.link_from_key_id', 'messages_raw.key_id')
  .where('authors.is_me', 1)

All recent posts by me and people I follow (1 hop)

TODO

What messages reference a given blob

TODO

Which authors reference a given blob

TODO

Which blobs haven't been referenced by me or people I follow in the last year.

TODO

Hints for developing queries:

install sqlite3 on your system.
in the sqlite command prompt run:
- sqlite> .timer on. The timer lets you see how long your queries are taking. There are often multiple ways to write a query. Try them out to see which one is faster. If something is slow, pay attention to which columns are indexed (marked with IDX in the schema diagram.)
- sqlite> .headers on. Gives you column header names at the top of your results.

Performance

Building the db:

Roughly 10x faster!

Sqlite db rebuild as fast as possible: 40s

Sqlite db rebuild, chunks of 250, running on patchwork's main thread without lagging the ui: 64s.

Flume rebuild of indexes used by patchwork: 404s

NB: This is a bit hard to do an exact comparison. Expect these numbers to change.

Querying:

WIP.

From rust (not using knex) querying for all authors I block on a db of 100k messages: 54 micro seconds.

Disk use:

Roughly 65% of the offset log.

Roughly 50% of the existing flume indexes.

My Sqlite db is 379MB.

My offset log is 598MB,

My flumedb indexes are 771MB.

Install

With npm installed, run

$ npm install ssb-flume-follower-sql

Development

Build for your environment

Dev Dependencies:

rust + cargo
cargo-make
cmake

$ ./build-native.sh

Cross compile

Dev Dependencies

same as above
cross
dockcross

Cross compiling is still a work in progress. Still todo:

split out scripts to cross in build-native.sh
add script to tar up cross compiled binding with correct prebuild compatible filename
windows builds don't work

Acknowledgments

@ahdinosaur for orchestrating the sunrise choir.
@mmckegg for the infectious enthusiasm.
@mixmix for awesome feedback on the readme.
@dominictarr for making it work before making it nice.
@AljoschaMeyer for making it nice before making it work.
@noffle for common-readme.

All the lovely people on scuttlebutt who make it the place it is.

Code of Conduct

License

LGPL-3.0

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
docs/images		docs/images
native		native
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build-native.sh		build-native.sh
code-of-conduct.md		code-of-conduct.md
index.js		index.js
modifiers.js		modifiers.js
package-lock.json		package-lock.json
package.json		package.json

License

sunrise-choir/ssb-flumeview-sql

Folders and files

Latest commit

History

Repository files navigation

ssb-flume-follower-sql

Contents

Features

Example

Schema

Tables:

Views

messages

links

abouts

contacts

mentions

API

sqlView.process(opts = {})

sqlView.getLatest()

sqlView.knex

sqlView.modifiers

More Example Queries

Content of my most recent 20 posts

Content of my most recent posts, pages of 20, third page.

Authors I block

Authors I follow

How many mentions do I have since a given flume sequence

All the about messages about me

All recent posts by me and people I follow (1 hop)

What messages reference a given blob

Which authors reference a given blob

Which blobs haven't been referenced by me or people I follow in the last year.

Hints for developing queries:

Performance

Building the db:

Querying:

Disk use:

Install

Development

Build for your environment

Cross compile

Acknowledgments

See Also

License

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Sponsor this project

Languages

`links`

`abouts`

`contacts`

`mentions`