
Auto-compaction and compaction options. #1

Open
kevinswiber opened this issue Mar 18, 2013 · 12 comments

@kevinswiber
Member

The log-structured hash table creates data on each operation. A separate merge process is needed to clean up the .data and .hint files.

@kevinswiber
Member Author

So I'm looking at adding an option to Medea like so:

var db = new Medea({
  compaction: 'auto' // value: 'manual' or 'auto'
});

Compaction would spawn a new process to get the work done.

Question: Should this default to 'manual' or 'auto'? What would be least surprising? What would be the most kick-ass option in your opinion?

/cc @sandfox @kevinohara80 @mdobson

@mdobson
Contributor

mdobson commented Apr 22, 2013

Automatic by default. Keep it well documented that processes are going to be spawned for cleanup. I'd rather have a disposable process spawned to clean this up than have it grow out of control.

-Matt

@sandfox
Contributor

sandfox commented Apr 22, 2013

Auto, and let grown-ups turn it off? That way everything keeps working, at least for new users.

Aside from making a messy file structure and eventually losing all your disk space, what are the other downsides to never running compaction?

@kevinswiber
Member Author

Thanks, fellas. I'm on the same page. Just wanted to make sure the page wasn't torn out of some twisted version of Alice in Wonderland.

@sandfox Another downside is a longer startup time. Medea iterates through all the hint files on startup to build the keydir. If it has to parse through lots of dead entries, that takes a while.

Out of the box, Medea has a maximum file size limit and will open new data/hint files as needed. You can set this as an option on the Medea constructor (maxFileSize). Not sure if that's documented or not. It should keep chugging until it eats up all the disk space.
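The wrap behavior described here can be sketched with a small in-memory model (all names below are illustrative, not Medea's actual internals): when the active data file would exceed maxFileSize, close it and start a new one under the next file ID.

```javascript
// Hypothetical model of Medea's "wrap" behavior: once the active file
// would exceed maxFileSize, writes roll over to a new file ID.
class FileWrapModel {
  constructor(maxFileSize) {
    this.maxFileSize = maxFileSize;
    this.fileId = 1;        // Medea starts file IDs at 1
    this.bytesWritten = 0;
  }

  // Returns the file ID the entry lands in.
  write(entrySize) {
    if (this.bytesWritten + entrySize > this.maxFileSize) {
      this.fileId += 1;     // wrap: open a new data/hint file pair
      this.bytesWritten = 0;
    }
    this.bytesWritten += entrySize;
    return this.fileId;
  }
}

const model = new FileWrapModel(100);
const ids = [60, 60, 60].map((size) => model.write(size));
console.log(ids); // [ 1, 2, 3 ] — the first write fits, the next two wrap
```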

For some use cases, this probably won't matter much. For a lot, it will.

Scheduling the merge process is another feature of Bitcask (always, never, or within a certain time range). I'm wondering how system-intensive this process will be. I guess I won't know until it's built. :)

@sandfox
Contributor

sandfox commented Apr 22, 2013

@kevinswiber Does compaction ever block? (In my head I think it must, at least some of the time.)

@kevinswiber
Member Author

@sandfox Yep. Trying to figure out how to make this work elegantly in Node. In Bitcask, the keydir is shared between multiple Erlang processes. Without using a memory mapping trick, I can't do that with legit OS processes.

I'm afraid we might have to block for the entire duration of the merge process. I still need to map out what this process will do in Node-land.

Because of this, merge strategies might be:

  1. Early and often - The tidier it is, the less time it will take for cleanups. Pay a small price a larger number of times.
  2. Within a time window - Do it only when expecting downtime (or on startup?). Pay a larger price a fewer number of times.
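The two strategies above could be expressed as a simple decision helper (a sketch with made-up names, not a proposed Medea API): an "eager" mode merges whenever dead data exists, while a "window" mode only merges inside an expected-downtime window.

```javascript
// Sketch of the two merge strategies: 'eager' (early and often) vs.
// 'window' (only within an expected-downtime time range).
function shouldMerge(strategy, state) {
  if (strategy === 'eager') {
    // Pay a small price often: merge whenever there is dead data.
    return state.deadBytes > 0;
  }
  if (strategy === 'window') {
    // Pay a larger price rarely: merge only inside the downtime window.
    const inWindow = state.hour >= state.windowStart && state.hour < state.windowEnd;
    return state.deadBytes > 0 && inWindow;
  }
  return false;
}

const eager = shouldMerge('eager', { deadBytes: 10 });
const offHours = shouldMerge('window', {
  deadBytes: 10, hour: 14, windowStart: 2, windowEnd: 4,
});
console.log(eager, offHours); // true false — 14:00 is outside the 02:00-04:00 window
```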

Thoughts?

@sandfox
Contributor

sandfox commented Apr 22, 2013

I think the default should be on start-up, throwing out a little log message saying what's happening. This way it's very non-surprising when getting started. (Nothing is more annoying than things stopping midway through being on for "unknown" reasons.) Otherwise, let people specify attempting compaction every [x timeframe], in combination with some configurable threshold for when compaction should happen, so it doesn't do it needlessly?

@kevinswiber
Member Author

There are a few thresholds in Bitcask.

  • File too small.
  • Dead bytes exceed limit - how many bytes are being used for dead keys.
  • Fragmentation exceeds limit - ratio of dead keys to total keys.
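These thresholds could be checked with a predicate along these lines (field and parameter names here are made up for illustration; Bitcask's actual option names differ):

```javascript
// Sketch of Bitcask-style merge triggers: a file qualifies for
// compaction if it is too small, holds too many dead bytes, or its
// dead-to-total key ratio (fragmentation) is too high.
function needsCompaction(file, limits) {
  if (file.totalBytes < limits.smallFileBytes) return true;   // file too small
  if (file.deadBytes > limits.deadBytesLimit) return true;    // dead bytes exceed limit
  const fragmentation = file.deadKeys / file.totalKeys;       // dead keys / total keys
  return fragmentation > limits.fragmentationLimit;
}

const limits = { smallFileBytes: 1024, deadBytesLimit: 1e6, fragmentationLimit: 0.4 };
const staleFile = { totalBytes: 5e6, deadBytes: 2e6, deadKeys: 10, totalKeys: 100 };
const freshFile = { totalBytes: 5e6, deadBytes: 0, deadKeys: 0, totalKeys: 100 };
console.log(needsCompaction(staleFile, limits)); // true — dead bytes exceed the limit
console.log(needsCompaction(freshFile, limits)); // false — nothing to reclaim
```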

I think I'm going to try something in-process. We might be able to do it with minimal or no blocking.

  1. Load keydir (or use keydir loaded on open).
  2. Create new data file for writing.
  3. Iterate over data files.
  4. Iterate over entries in each data file.
  5. For each data file entry, check the keydir value.
  6. If keydir value points to a file with a greater file ID (or same file ID with greater offset), do not add to new file.
  7. If the file entry equals the key dir value and it's not a tombstone, add it to the new file.
  8. If the entry is a tombstone value, remove it from the keydir.
  9. Update the keydir with the new value size and position.
  10. Wrap entries into new file if file exceeds max size limit.
  11. Afterward, clean up by deleting old files.
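Steps 3-8 above can be sketched in memory (hypothetical structures, not Medea's real file format): the keydir maps each key to the location of its latest value, and only entries the keydir still points at survive the merge.

```javascript
// In-memory sketch of the merge loop: keep an entry only if the keydir
// says it is the latest value for its key and it is not a tombstone.
function merge(entries, keydir) {
  const merged = [];
  for (const entry of entries) {                  // steps 3-4: iterate entries
    const current = keydir.get(entry.key);        // step 5: check the keydir
    if (!current) continue;
    const isLatest =
      current.fileId === entry.fileId && current.offset === entry.offset;
    if (!isLatest) continue;                      // step 6: a newer value exists elsewhere
    if (entry.tombstone) {
      keydir.delete(entry.key);                   // step 8: drop deleted keys
      continue;
    }
    merged.push(entry);                           // step 7: keep live entries
  }
  return merged;                                  // step 9 would then re-point the keydir
}

const keydir = new Map([
  ['a', { fileId: 2, offset: 0 }],
  ['b', { fileId: 1, offset: 8 }],
  ['c', { fileId: 2, offset: 8 }],
]);
const entries = [
  { key: 'a', fileId: 1, offset: 0, value: 'old' },     // superseded by a later write
  { key: 'b', fileId: 1, offset: 8, value: 'live' },    // still current
  { key: 'a', fileId: 2, offset: 0, value: 'new' },     // current
  { key: 'c', fileId: 2, offset: 8, tombstone: true },  // deleted key
];
const merged = merge(entries, keydir);
console.log(merged.map((e) => e.value)); // [ 'live', 'new' ]
```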

This might make everything slower during a merge as more system resources will be utilized, but we may be able to get away with not blocking (for very long). That would be effin' sweet.

@kevinswiber
Member Author

I've got the compaction code working now. You can find it in master at commit 26681ef. This doesn't do any automatic scheduling yet.

Below is some output. I had 47MB worth of key-value pairs in file 81 that I duplicated in a subsequent call to ./example/bench. This shows how Medea adds a new file of that size. Then I run compaction. You can see a new file created that is only 51MB with the remainder squashed.

» ls -hl medea 
total 136424
-rw-r--r--  1 kevin  staff    51M Apr 23 14:45 81.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:45 81.medea.hint
» node ./example/bench.js 
set small >>
    time: 19611.69 ops/s
set medium >>
    time: 20597.32 ops/s
set large >>
    time: 19212.3 ops/s
get large >>
    time: 49578.58 ops/s
get medium >>
    time: 56433.41 ops/s
get small >>
    time: 58788.95 ops/s
» ls -hl medea
total 253184
-rw-r--r--  1 kevin  staff    51M Apr 23 14:45 81.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:45 81.medea.hint
-rw-r--r--  1 kevin  staff    47M Apr 23 14:54 82.medea.data
-rw-r--r--  1 kevin  staff    10M Apr 23 14:54 82.medea.hint
» node ./example/test_compaction.js
done
» ls -hl medea 
total 136424
-rw-r--r--  1 kevin  staff    51M Apr 23 14:57 84.medea.data
-rw-r--r--  1 kevin  staff    15M Apr 23 14:57 84.medea.hint
»

@kevinswiber
Member Author

FYI: You may notice the file number jumps from 82 to 84. That's because Medea holds file 83 open while compacting. File 83 is supposed to record ongoing file updates while compaction is running. File 84 becomes the newly merged data file. There were no updates in this case, and when Medea was properly closed, it deleted the empty file. (Note: These numbers don't mean anything special. Medea just increments the file ID for data and hint files. It starts at 1.)

Medea has a "wrap" concept that closes an active file and starts a new one when it reaches a configurable size limit. That should also apply in merge cases, though I haven't tested it yet.

I expect ongoing reads/writes to work as compaction is running, but both will be impacted as resource usage spikes on the machine.

@kesla
Member

kesla commented Jul 21, 2014

Hey, let's get this discussion going again!

I'd like to opt for automatic compaction! And running it in the same process, and running it on startup.

Perhaps we could count how many keys have been changed/added/removed and then run compaction when that number is above a certain threshold?

Since the compaction runs mostly asynchronously, but in series (only one asynchronous operation running at a time), the performance impact shouldn't be too large. But that should definitely be benchmarked, so we can find a good balance.
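The counter-based trigger suggested here could look something like this (a sketch with a hypothetical API, not a proposed Medea interface): count mutations and fire a compaction callback once a threshold is crossed.

```javascript
// Sketch of a mutation-count trigger: every write/remove bumps a
// counter, and crossing the threshold schedules a compaction.
function makeCompactionTrigger(threshold, onCompact) {
  let mutations = 0;
  return function recordMutation() {
    mutations += 1;
    if (mutations >= threshold) {
      mutations = 0;   // reset so the next window starts fresh
      onCompact();
    }
  };
}

let compactions = 0;
const record = makeCompactionTrigger(3, () => { compactions += 1; });
for (let i = 0; i < 7; i += 1) record(); // 7 mutations against a threshold of 3
console.log(compactions); // 2 — triggered at the 3rd and 6th mutation
```

The threshold would be the tunable knob kesla mentions: low values approximate "early and often," high values approximate batched, infrequent merges.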

@kevinswiber
Member Author

We need smart defaults for auto-compaction that apply to multiple use cases. Just not sure what those are at this time.

@kevinswiber kevinswiber changed the title Create a separate merge process. Auto-compaction and compaction options. Sep 18, 2014