
Auto-compaction and compaction options. #1

Open
kevinswiber opened this issue Mar 18, 2013 · 12 comments

@kevinswiber
Member

The log-structured hash table creates data on each operation. A separate merge process is needed to clean up the .data and .hint files.

@kevinswiber
Member Author

So I'm looking at adding an option to Medea like so:

var db = new Medea({
  compaction: 'auto' // value: 'manual' or 'auto'
});

Compaction would spawn a new process to get the work done.

Question: Should this default to 'manual' or 'auto'? What would be least surprising? What would be the most kick-ass option in your opinion?

/cc @sandfox @kevinohara80 @mdobson

@mdobson
Contributor

mdobson commented Apr 22, 2013

Automatic by default. Keep it well documented that processes are going to be spawned for cleanup. I'd rather have a disposable process spawned to clean this up than have it grow out of control.

-Matt

@sandfox
Contributor

sandfox commented Apr 22, 2013

Auto, and let grown-ups turn it off? That way everything keeps working, at least for new users.

Aside from making a messy file structure and eventually losing all your disk space, what are the other downsides to never running compaction?

@kevinswiber
Member Author

Thanks, fellas. I'm on the same page. Just wanted to make sure the page wasn't torn out of some twisted version of Alice in Wonderland.

@sandfox Another downside is a longer startup time. Medea iterates through all the hint files on startup to build the keydir. If it has to parse through lots of dead entries, that takes a while.

Out of the box, Medea has a maximum file size limit and will open new data/hint files as needed. You can set this as an option on the Medea constructor (maxFileSize). Not sure if that's documented or not. It should keep chugging until it eats up all the disk space.
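The wrap behavior described here can be sketched with a small in-memory model (all names below are illustrative, not Medea's actual internals): when the active data file would exceed maxFileSize, close it and start a new one under the next file ID.

```javascript
// Hypothetical model of Medea's "wrap" behavior: once the active file
// would exceed maxFileSize, writes roll over to a new file ID.
class FileWrapModel {
  constructor(maxFileSize) {
    this.maxFileSize = maxFileSize;
    this.fileId = 1;        // Medea starts file IDs at 1
    this.bytesWritten = 0;
  }

  // Returns the file ID the entry lands in.
  write(entrySize) {
    if (this.bytesWritten + entrySize > this.maxFileSize) {
      this.fileId += 1;     // wrap: open a new data/hint file pair
      this.bytesWritten = 0;
    }
    this.bytesWritten += entrySize;
    return this.fileId;
  }
}

const model = new FileWrapModel(100);
const ids = [60, 60, 60].map((size) => model.write(size));
console.log(ids); // [ 1, 2, 3 ] — the first write fits, the next two wrap
```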

For some use cases, this probably won't matter much. For a lot, it will.

Scheduling the merge process is another feature of Bitcask (always, never, or within a certain time range). I'm wondering how system-intensive this process will be. I guess I won't know until it's built. :)

@sandfox
Contributor

sandfox commented Apr 22, 2013

@kevinswiber Does compaction ever block? (In my head I think it must, at least some of the time.)

@kevinswiber
Member Author

@sandfox Yep. Trying to figure out how to make this work elegantly in Node. In Bitcask, the keydir is shared between multiple Erlang processes. Without using a memory mapping trick, I can't do that with legit OS processes.

I'm afraid we might have to block for the entire duration of the merge process. I still need to map out what this process will do in Node-land.

Because of this, merge strategies might be:

  1. Early and often - The tidier it is, the less time it will take for cleanups. Pay a small price a larger number of times.
  2. Within a time window - Do it only when expecting downtime (or on startup?). Pay a larger price a fewer number of times.
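The two strategies above could be expressed as a simple decision helper (a sketch with made-up names, not a proposed Medea API): an "eager" mode merges whenever dead data exists, while a "window" mode only merges inside an expected-downtime window.

```javascript
// Sketch of the two merge strategies: 'eager' (early and often) vs.
// 'window' (only within an expected-downtime time range).
function shouldMerge(strategy, state) {
  if (strategy === 'eager') {
    // Pay a small price often: merge whenever there is dead data.
    return state.deadBytes > 0;
  }
  if (strategy === 'window') {
    // Pay a larger price rarely: merge only inside the downtime window.
    const inWindow = state.hour >= state.windowStart && state.hour < state.windowEnd;
    return state.deadBytes > 0 && inWindow;
  }
  return false;
}

const eager = shouldMerge('eager', { deadBytes: 10 });
const offHours = shouldMerge('window', {
  deadBytes: 10, hour: 14, windowStart: 2, windowEnd: 4,
});
console.log(eager, offHours); // true false — 14:00 is outside the 02:00-04:00 window
```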

Thoughts?

@sandfox
Contributor

sandfox commented Apr 22, 2013

I think the default should be on start-up, throwing out a little log message saying what's happening. This way it's very non-surprising when getting started. (Nothing is more annoying than things stopping midway through being on for "unknown" reasons.) Otherwise, let people specify attempting compaction every [x timeframe], in combination with some configurable threshold for when compaction should happen, so it doesn't do it needlessly?

@kevinswiber
Member Author

There are a few thresholds in Bitcask.

  • File too small.
  • Dead bytes exceed limit - how many bytes are being used for dead keys.
  • Fragmentation exceeds limit - ratio of dead keys to total keys.
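These thresholds could be checked with a predicate along these lines (field and parameter names here are made up for illustration; Bitcask's actual option names differ):

```javascript
// Sketch of Bitcask-style merge triggers: a file qualifies for
// compaction if it is too small, holds too many dead bytes, or its
// dead-to-total key ratio (fragmentation) is too high.
function needsCompaction(file, limits) {
  if (file.totalBytes < limits.smallFileBytes) return true;   // file too small
  if (file.deadBytes > limits.deadBytesLimit) return true;    // dead bytes exceed limit
  const fragmentation = file.deadKeys / file.totalKeys;       // dead keys / total keys
  return fragmentation > limits.fragmentationLimit;
}

const limits = { smallFileBytes: 1024, deadBytesLimit: 1e6, fragmentationLimit: 0.4 };
const staleFile = { totalBytes: 5e6, deadBytes: 2e6, deadKeys: 10, totalKeys: 100 };
const freshFile = { totalBytes: 5e6, deadBytes: 0, deadKeys: 0, totalKeys: 100 };
console.log(needsCompaction(staleFile, limits)); // true — dead bytes exceed the limit
console.log(needsCompaction(freshFile, limits)); // false — nothing to reclaim
```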

I think I'm going to try something in-process. We might be able to do it with minimal or no blocking.

  1. Load keydir (or use keydir loaded on open).
  2. Create new data file for writing.
  3. Iterate over data files.
  4. Iterate over entries in each data file.
  5. For each data file entry, check the keydir value.
  6. If keydir value points to a file with a greater file ID (or same file ID with greater offset), do not add to new file.
  7. If the file entry equals the key dir value and it's not a tombstone, add it to the new file.
  8. If the entry is a tombstone value, remove it from the keydir.
  9. Update the keydir with the new value size and position.
  10. Wrap entries into new file if file exceeds max size limit.
  11. Afterward, clean up by deleting old files.
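Steps 3-8 above can be sketched in memory (hypothetical structures, not Medea's real file format): the keydir maps each key to the location of its latest value, and only entries the keydir still points at survive the merge.

```javascript
// In-memory sketch of the merge loop: keep an entry only if the keydir
// says it is the latest value for its key and it is not a tombstone.
function merge(entries, keydir) {
  const merged = [];
  for (const entry of entries) {                  // steps 3-4: iterate entries
    const current = keydir.get(entry.key);        // step 5: check the keydir
    if (!current) continue;
    const isLatest =
      current.fileId === entry.fileId && current.offset === entry.offset;
    if (!isLatest) continue;                      // step 6: a newer value exists elsewhere
    if (entry.tombstone) {
      keydir.delete(entry.key);                   // step 8: drop deleted keys
      continue;
    }
    merged.push(entry);                           // step 7: keep live entries
  }
  return merged;                                  // step 9 would then re-point the keydir
}

const keydir = new Map([
  ['a', { fileId: 2, offset: 0 }],
  ['b', { fileId: 1, offset: 8 }],
  ['c', { fileId: 2, offset: 8 }],
]);
const entries = [
  { key: 'a', fileId: 1, offset: 0, value: 'old' },     // superseded by a later write
  { key: 'b', fileId: 1, offset: 8, value: 'live' },    // still current
  { key: 'a', fileId: 2, offset: 0, value: 'new' },     // current
  { key: 'c', fileId: 2, offset: 8, tombstone: true },  // deleted key
];
const merged = merge(entries, keydir);
console.log(merged.map((e) => e.value)); // [ 'live', 'new' ]
```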

This might make everything slower during a merge as more system resources will be utilized, but we may be able to get away with not blocking (for very long). That would be effin' sweet.

@kevinswiber
Member Author

I've got the compaction code working now. You can find it in master at commit 26681ef. This doesn't do any automatic scheduling yet.

Below is some output. I had 47MB worth of key-value pairs in file 81 that I duplicated in a subsequent call to ./example/bench. This shows how Medea adds a new file of that size. Then I run compaction. You can see a new file created that is only 51MB with the remainder squashed.

» ls -hl medea 
total 136424
-rw-r--r--  1 kevin  staff    51M Apr 23 14:45 81.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:45 81.medea.hint
» node ./example/bench.js 
set small >>
    time: 19611.69 ops/s
set medium >>
    time: 20597.32 ops/s
set large >>
    time: 19212.3 ops/s
get large >>
    time: 49578.58 ops/s
get medium >>
    time: 56433.41 ops/s
get small >>
    time: 58788.95 ops/s
» ls -hl medea
total 253184
-rw-r--r--  1 kevin  staff    51M Apr 23 14:45 81.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:45 81.medea.hint
-rw-r--r--  1 kevin  staff    47M Apr 23 14:54 82.medea.data
-rw-r--r--  1 kevin  staff    10M Apr 23 14:54 82.medea.hint
» node ./example/test_compaction.js
done
» ls -hl medea 
total 136424
-rw-r--r--  1 kevin  staff    51M Apr 23 14:57 84.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:57 84.medea.hint
»

@kevinswiber
Member Author

FYI: You may notice the file number jumps from 82 to 84. That's because Medea holds file 83 open while compacting. File 83 is supposed to record ongoing file updates while compaction is running. File 84 becomes the newly merged data file. There were no updates in this case, and when Medea was properly closed, it deleted the empty file. (Note: These numbers don't mean anything special. Medea just increments the file ID for data and hint files. It starts at 1.)

Medea has a "wrap" concept that closes an active file and starts a new one when it reaches a configurable size limit. That should also apply in merge cases, though I haven't tested it yet.

I expect ongoing reads/writes to work as compaction is running, but both will be impacted as resource usage spikes on the machine.

@kesla
Member

kesla commented Jul 21, 2014

Hey, let's get this discussion going again!

I'd like to opt for automatic compaction! And running it in the same process, and running it on startup.

Perhaps we could count how many keys have been changed/added/removed and then run compaction when that number is above a certain threshold?

Since the compaction runs mostly asynchronously, but in series (only one asynchronous operation running at a time), the performance impact shouldn't be too large. But that should definitely be benchmarked, so we can find a good balance.
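The counter-based trigger suggested here could look something like this (a sketch with a hypothetical API, not a proposed Medea interface): count mutations and fire a compaction callback once a threshold is crossed.

```javascript
// Sketch of a mutation-count trigger: every write/remove bumps a
// counter, and crossing the threshold schedules a compaction.
function makeCompactionTrigger(threshold, onCompact) {
  let mutations = 0;
  return function recordMutation() {
    mutations += 1;
    if (mutations >= threshold) {
      mutations = 0;   // reset so the next window starts fresh
      onCompact();
    }
  };
}

let compactions = 0;
const record = makeCompactionTrigger(3, () => { compactions += 1; });
for (let i = 0; i < 7; i += 1) record(); // 7 mutations against a threshold of 3
console.log(compactions); // 2 — triggered at the 3rd and 6th mutation
```

The threshold would be the tunable knob kesla mentions: low values approximate "early and often," high values approximate batched, infrequent merges.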

@kevinswiber
Member Author

We need smart defaults for auto-compaction that apply to multiple use cases. Just not sure what those are at this time.

@kevinswiber kevinswiber changed the title Create a separate merge process. Auto-compaction and compaction options. Sep 18, 2014