Add skip conform to schema #1572

Open
wants to merge 1 commit into base: main
Conversation


@vasily-polonsky vasily-polonsky commented Feb 8, 2023

Summary:

The library shows poor performance when you need to query a large number of items from the database, especially if they have a nested structure. This is particularly true for serverless, where the environment is not a cluster/EC2 instance and the CPU is usually limited. For example, during development on a local machine, scanning 250 items with dynamoose takes about 3 seconds; in a Lambda with 0.1 vCPU it takes over 30 seconds, which makes it unusable in production. Using the aws-sdk directly, the same requests take about 600 ms in both environments (local and serverless).

The problem is that dynamoose consumes huge CPU resources in the conformToSchema method call, so the solution I propose is to make it possible to skip this call for large queries where many items must be returned. Mongoose has a solution to this problem with lean. Lean works a little differently, but the idea is the same: make it possible to handle a large number of items with less performance degradation. Separately, I think the conformToSchema method itself needs to be refactored.

I haven't written documentation for the skipConformToSchema method, because I think we should first agree on the name and whether it's appropriate to introduce this method.
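For illustration, the idea behind the flag can be sketched in a few lines. This is a minimal, self-contained sketch only: the function names, the schema shape, and the options object are illustrative, not the actual Dynamoose internals or API.

```javascript
// Illustrative sketch of the proposal, not Dynamoose's real implementation.
// conformToSchema stands in for the expensive per-item, per-field
// validation/coercion pass that dominates CPU time on large result sets.
function conformToSchema(item, schema) {
  const out = {};
  for (const [key, type] of Object.entries(schema)) {
    const value = item[key];
    if (value === undefined) continue;
    if (typeof value !== type) {
      throw new TypeError(`Expected "${key}" to be a ${type}`);
    }
    out[key] = value;
  }
  return out;
}

// The result-mapping step branches on the proposed flag: when set, the raw
// DynamoDB items are returned untouched and the conformance pass is skipped.
function mapQueryResults(rawItems, schema, { skipConformToSchema = false } = {}) {
  if (skipConformToSchema) return rawItems;
  return rawItems.map((item) => conformToSchema(item, schema));
}
```

The point of the sketch is that the skip path does no per-item work at all, which is where the 5-10x difference on large scans would come from.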

GitHub linked issue:

#1260
https://stackoverflow.com/questions/74356120/slow-on-fetching-records-from-dynamodb-table


Other information:

Type (select 1):

  • Bug fix
  • [ X ] Feature implementation
  • Documentation improvement
  • Testing improvement
  • Test added to report bug (GitHub issue #---- @---)
  • Something not listed here

Is this a breaking change? (select 1):

  • 🚨 YES 🚨
  • [ X ] No
  • I'm not sure

Is this ready to be merged into Dynamoose? (select 1):

  • [ X ] Yes
  • No

Are all the tests currently passing on this PR? (select 1):

  • [ X ] Yes
  • No

Other:

  • [ X ] I have read through and followed the Contributing Guidelines
  • [ X ] I have searched through the GitHub pull requests to ensure this PR has not already been submitted
  • I have updated the Dynamoose documentation (if required) given the changes I made
  • I have added/updated the Dynamoose test cases (if required) given the changes I made
  • [ X ] I agree that all changes made in this pull request may be distributed and are made available in accordance with the Dynamoose license
  • [ X ] All of my commits and commit messages are detailed, explain what changes were made, and are easy to follow and understand
  • I have filled out all fields above


github-actions bot commented Feb 8, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@vasily-polonsky
Author

I dedicate any and all copyright interest in this software to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

@github-actions
Contributor

This pull request is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.


nopol10 commented Sep 16, 2023

Hi, any chance for this or something similar to get merged soon? 🙂
The >5-10 times speed improvement is sorely needed as dynamoose becomes unusable in parts of my application 😢
I'm open to helping out in any way if needed

@fishcharlie
Member

@nopol10 & @vasily-polonsky I'm debating this one a lot. I get the performance benefit. But I then question what the value is in Dynamoose. The whole idea of Dynamoose is to have strict schemas for your DynamoDB data.

I'm curious if either of you could explain why with this PR you still choose Dynamoose over numerous other options out there? It seems like this is getting rid of a huge benefit and purpose of Dynamoose. So I'm curious about your thoughts and how you are viewing that.

Regardless, this PR needs unit tests.


vasily-polonsky commented Sep 17, 2023

Hi @fishcharlie
I understand your concerns, but there is another side to it. When I use Dynamoose to work with DDB tables, I'm sure the data there is correct, and Dynamoose works really well. If you need to fetch one record, or say 20, the performance issue is not critical. But sometimes I need to fetch 300-500 records (sometimes it's actually a whole table, because I need to do some aggregation or calculations), and this is where problems arise. If I know for sure that the data is correct, can I have a flag to skip this check to improve my response time? This issue is really critical for serverless environments, where the vCPU allocation is usually lower than in a Kubernetes pod.

Regarding the skipConformToSchema flag, I see that there is another PR with a similar idea: #1586. I would suggest comparing the performance of these two solutions and choosing the better one.


nopol10 commented Sep 18, 2023

Yup, as @vasily-polonsky says, there are cases where I am certain that the schema will be correct so I will not be worried about non-conformance.

In addition, Dynamoose provides an API that makes DynamoDB-related code easier to write and maintain, so I'd like to retain that with a small code change if possible, rather than rewriting the queries to use the DynamoDB SDK directly.
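For context, the raw-SDK rewrite being avoided here looks roughly like the following. This is a sketch under assumptions: the page-fetching function is injected so the pagination loop is visible without AWS credentials; with the real @aws-sdk/lib-dynamodb you would pass something like `(input) => docClient.send(new ScanCommand(input))`, and the table name is illustrative.

```javascript
// Sketch of a full-table Scan against the raw DynamoDB SDK. DynamoDB returns
// at most 1 MB of items per Scan call, so LastEvaluatedKey must be followed
// until the table is exhausted.
async function scanAllPages(scanPage, tableName) {
  // scanPage(input) -> Promise<{ Items, LastEvaluatedKey }>
  const items = [];
  let lastKey;
  do {
    const page = await scanPage({
      TableName: tableName,
      ExclusiveStartKey: lastKey,
    });
    items.push(...(page.Items || []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey !== undefined);
  return items;
}
```

Even this small loop shows the maintenance cost of dropping to the SDK: modeling, marshalling, and pagination concerns leak back into application code, which is exactly what Dynamoose otherwise handles.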

@ThienNDDi

Hi @fishcharlie, it has been a while since this PR was opened. Sadly, I think I hit this issue too.
I used scan on a table with more than 200 records, and any request to that table immediately took nearly 25 seconds.
I hope this will be merged soon.

@fishcharlie
Member

This can't be merged until unit tests are added.
