Improve history serialization and persistence #135

nathansobo · 2016-02-26T05:47:45Z

@as-cii I wanted to some up some of my thoughts on history serialization in case you're interested. Here's a potential set of steps that I think could put is in a better spot both design-wise and in terms of performance.

Compact Binary Patch Format

Switch the undo/redo stack to store patches rather than the current list of temporally-ordered changes.
Implement Patch.prototype.compact that converts the internal representation of the patch from a tree to a flat buffer. This format would only support getChanges by directly iterating the serialized data, which is easy with that library. Call this method at the end of each transaction to keep the history in a memory-efficient way.
Implement Patch.prototype.serialize in terms of the same buffer format.

It might make more sense to just serialize patches to buffers before storing them in the history and deserializing them on an as-needed basis. The compact approach where you actually store Patch instances with a different backing store just seemed cool in that it maintains a nice programmatic interface, but it may be more trouble than it's worth in other ways.

Incremental Persistence

I'm still concerned about repeatedly serializing the same history over and over again. It might not be a big deal once it's smaller, but if it is, I have this idea for incrementally persisting the state.

Create a global buffer pool in IndexedDB... We'd store opened buffers in a single pool, regardless of window or project with a key scheme like atom::buffers::/path/to/buffer.txt (there may be precedent for the key delimiter so we should research).
Associated with each buffer could be its modified text in a record at atom::buffers::/path/to/buffer.txt::modifiedContents
Also associated with each buffer could be an undo and redo stack, stored via lexicographically-sorted keys as follows: atom::buffers::/path/to/buffer.txt::undo::1, atom::buffers::/path/to/buffer.txt::undo::2, atom::buffers::/path/to/buffer.txt::undo::3, etc. If you pop an undo entry, delete its record. If you build a new undo entry, increment the integer in the key. (You might actually want to decouple the undo and redo entry keys from the buffer's path and just synthesize an id that you reference from the path once to simplify rename.)

It may be that the history never gets big enough to require this, but it would be a cool scheme for basically associate an infinite history with any file on disk in a persistent way. I think it might be pretty cool.

The text was updated successfully, but these errors were encountered:

as-cii · 2016-03-02T15:38:32Z

Today I have worked on changing the History to reason in terms of patches but I still have a couple of concerns about the design that I think it's worth sharing here. 💭

Our API allows to undo individual changes and, while we call groupChangesSinceCheckpoint at the end of each transaction, History might be changed in the middle of it. This leaves us with an heterogeneous history that requires us to take into account scenarios where e.g. a call to truncateUndoStack needs to combine entries that may be patches or individual changes, or when iterating the stack during popUndoStack we need to apply the same kind of differentiation. We might use a Patch to represent individual changes, but that feels somewhat overkill in terms of performance.

Other than that, we cannot use the patch that we pop from the undo stack to compose it later when emitting a stopped changing event, because we would need to compose the inverse of it. This requires us to actually create a different patch, and not reusing the same one makes having oldText in Patch less appealing in terms of performance. Thus, for this particular problem, I believe @maxbrunsfeld's suggestion of creating an inverse patch on the fly might actually be better in terms of design.

Nonetheless, now that I have more background on how History needs to work, I am wondering whether having patches is really a design improvement in the way we track changes. Allowing to undo changes in the middle of a transaction makes the Patch model less perfect because it increases the complexity of an otherwise homogeneous piece of state.

@nathansobo @maxbrunsfeld: how would you feel about not changing how we store History changes? I am starting to have the sense that the design tradeoffs outweigh the advantages, considering also that we can achieve the same speedups in serialization by replacing arrays/objects with flat buffers. Thoughts? 💡

maxbrunsfeld · 2016-03-02T17:01:17Z

how would you feel about not changing how we store History changes?

I am ok with that for now. Refactoring to use the Patch feels a bit orthogonal to the concern of making the serialization format more compact. That said, I do think that using the Patch in the history would eventually be a nice change, because of the current poor performance of undo when there are lots of changes, e.g. in vim-mode after terminating a long insert-mode session. It's definitely a less important concern than serialization though.

nathansobo added the enhancement label Feb 26, 2016

as-cii mentioned this issue Feb 29, 2016

Reintroduce metadata splicing atom-archive/atom-patch#6

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve history serialization and persistence #135

Improve history serialization and persistence #135

nathansobo commented Feb 26, 2016

as-cii commented Mar 2, 2016

maxbrunsfeld commented Mar 2, 2016

Improve history serialization and persistence #135

Improve history serialization and persistence #135

Comments

nathansobo commented Feb 26, 2016

Compact Binary Patch Format

Incremental Persistence

as-cii commented Mar 2, 2016

maxbrunsfeld commented Mar 2, 2016