Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for createDocument to be loaded for subsequent createConnections #822

Merged
merged 5 commits into from
May 17, 2024

Conversation

jordangarcia
Copy link
Contributor

@jordangarcia jordangarcia commented May 10, 2024

Changes

  • in Hocuspocus add a mapping loadingDocuments of Map<string, Promise<Document>> to store loading documents
  • createDocument will use loadingDocuments to dedupe docs, always returning a promise of the fully loaded document
  • Eliminate tests where it was testing clients connecting to a doc still loading.

Fixes #821

nperez0111
nperez0111 previously approved these changes May 11, 2024
Copy link

@nperez0111 nperez0111 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, but otherwise this looks like a good change to me

Comment on lines 167 to 176
test('disconnects all clients related to the document when an error is thrown in onLoadDocument', async t => {
const resolvesNeeded = 4

await new Promise(async resolve => {

const server = await newHocuspocus({
async onLoadDocument() {
return new Promise((resolve, fail) => {
setTimeout(() => {
// eslint-disable-next-line prefer-promise-reject-errors

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is removed because this behavior should not exist anymore. This test is asserting that if the onLoadDocument throws then provider2 should still be able to connect to it, which is what we do not want.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the error may be transient, so if it throws it should be able to retry, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure i follow. Wouldn't it be up to the HocuspocusProvider to handle retry logic once the connection fails to be established.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I add the following console logs to the test i see:

test.only('disconnects all clients related to the document when an error is thrown in onLoadDocument', async t => {
  const resolvesNeeded = 4

  await new Promise(async resolve => {

    const server = await newHocuspocus({
      async onLoadDocument() {
        return new Promise((resolve, fail) => {
          setTimeout(() => {
            // eslint-disable-next-line prefer-promise-reject-errors
            fail('ERROR')
          }, 250)
        })
      },
      async onStoreDocument(data) {
        t.fail('MUST NOT call onStoreDocument')
      },
    })

    let resolvedNumber = 0
    const resolver = () => {
      resolvedNumber += 1

      if (resolvedNumber >= resolvesNeeded) {
        t.is(server.documents.size, 0)
        t.is(server.getConnectionsCount(), 0)
        resolve('done')
      }
    }

    const provider1 = newHocuspocusProvider(server, {
      onConnect() {
        console.log('provider1 connect')
        resolver()
      },
      onClose(event) {
        provider1.disconnect()
        console.log('provider1 disconnect')
        resolver()
      },
    })

    const provider2 = newHocuspocusProvider(server, {
      onConnect() {
        console.log('provider2 connect')
        resolver()
      },
      onClose() {
        provider2.disconnect()
        console.log('provider2 disconnect')
        resolver()
      },
    })

  })

})
provider2 connect
provider1 disconnect
provider2 disconnect
provider2 disconnect

I feel like this is not what i would expect this behavior to do in the first place.

Comment on lines 428 to 433
const loadDocPromise = this.loadDocument(documentName, request, socketId, connection, context)

this.loadingDocuments.set(documentName, loadDocPromise)

return loadDocPromise
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This map will just grow in memory and always short circuit fetching the document (since it will have the promise in memory).
I'd suggest doing a .finally to remove this documentName from the map afterwards to clean it up. We don't want this to act as a cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch here on this growing infinitely!

However, I dont quite follow using .finally here. The issue that I am thinking of is that we use loadingDocuments as the state of whether a doc has been loaded in the form of a promise. The only time it seems acceptable to delete from this map is when we call unloadDocument. I just pushed a change that removes the Promise<Document> when unloadDocument is called.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jordangarcia I'm unsure that your change got pushed up here, can you check that?

I think that we are thinking of this mapping a bit differently. I am thinking of this map only being useful for "re-using" a fetch to the loading document. This would make it so that two clients requesting the same document while loading would wait for the same document to finish loading.

Client A fetches Doc 1
Client A waits for Doc 1 to load
Client B fetches Doc 1
Client B waits on the same promise as Client A
Doc loads
Client A syncs
Client B syncs

Whereas if I'm understanding you correctly, you want to store the promise here and use this as the source of truth for documents that are loaded or not. When this is what the documents map is meant to do.

Please let me know your thoughts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My interpretation of the code is that the this.documents map isn't quite intended to keep track of "loaded" documents. Since we insert into that map before the document is loaded. this.documents seems to represent the documents that this hocuspocus instance is managing, regardless of their loading state. Changing this.documents to represent documents to be loaded seems like a much bigger API change since many functions rely on this mapping to understand the state of the system.

The loadDocuments map would be used as the state of the loading documents and a way for createDocument to guarantee a singular doc always being returned (or a promise to the resolved value). The fact that loadDocuments sets the created document in this.documents then returns the document gaurantees referential equality between the two maps. I do think this may seem a bit fragile to code changes that wouldn't keep these two in sync.

The key issue here is that createDocument was using the this.documents map as a cache of the documents. To be consistent we always want createDocument to be Promise<Document> of the loaded document. I dont really see a way around not using loadingDocuments as a cache here for createDocument. We could add more state to the system and keep track of Map<string, boolean> for whether a document is loaded and then use this.documents instead of this.loadingDocuments but that just seems more complex to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea this makes sense to me as well. createDocument is idempotent and will always return a Promise that resolves to the singular (referential) source of truth Document.

I do see how this technically is adding more "redundant" state, but I think it's quite clear and simple.

@nperez0111 Curious to hear your thoughts!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this with @janthurau and we came up with the commit he added on top.

How we were thinking of it is that loadingDocument is just a temporary cache for documents that are still in progress of loading and that anyone who calls createDocument will just receive the same promise that is currently loading if there is one.

We also maintained the signature of Promise<Document> with Promise.resolve of this.documents.get(documentName)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good solution as well. Thanks for helping out!

Would love to get this out as soon as possible.

@janthurau janthurau merged commit bc42146 into ueberdosis:main May 17, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connections to doc are possible while onLoadDocument is still in progress
4 participants