Add skip conform to schema #1572

Open
wants to merge 1 commit into base: main
Conversation


@vasily-polonsky vasily-polonsky commented Feb 8, 2023

Summary:

The library shows poor performance when you need to query a large number of items from the database, especially if they have a nested structure. This is particularly true for serverless, where the environment is not a cluster/EC2 instance and the CPU is usually limited. For example, during development on a local machine, scanning 250 items with dynamoose takes about 3 seconds; in a Lambda with 0.1 vCPU it takes over 30 seconds, which makes it unusable in production. Using the aws-sdk directly, the same requests take about 600 ms in both environments (local and serverless).

The problem is that dynamoose consumes huge CPU resources in the conformToSchema method call, so the solution I propose is to make it possible to skip this call for large queries where many items must be returned. Mongoose has a solution to this problem with lean. Lean works a little differently, but the idea is the same: make it possible to handle a large number of items with less performance degradation. Separately, I think the conformToSchema method itself needs to be refactored.

I haven't written documentation for the skipConformToSchema method, because I think we should first agree on the name and whether it's appropriate to introduce this method.
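For illustration, the idea behind the flag can be sketched in a few lines. This is a minimal, self-contained sketch only: the function names, the schema shape, and the options object are illustrative, not the actual Dynamoose internals or API.

```javascript
// Illustrative sketch of the proposal, not Dynamoose's real implementation.
// conformToSchema stands in for the expensive per-item, per-field
// validation/coercion pass that dominates CPU time on large result sets.
function conformToSchema(item, schema) {
  const out = {};
  for (const [key, type] of Object.entries(schema)) {
    const value = item[key];
    if (value === undefined) continue;
    if (typeof value !== type) {
      throw new TypeError(`Expected "${key}" to be a ${type}`);
    }
    out[key] = value;
  }
  return out;
}

// The result-mapping step branches on the proposed flag: when set, the raw
// DynamoDB items are returned untouched and the conformance pass is skipped.
function mapQueryResults(rawItems, schema, { skipConformToSchema = false } = {}) {
  if (skipConformToSchema) return rawItems;
  return rawItems.map((item) => conformToSchema(item, schema));
}
```

The point of the sketch is that the skip path does no per-item work at all, which is where the 5-10x difference on large scans would come from.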

GitHub linked issue:

#1260
https://stackoverflow.com/questions/74356120/slow-on-fetching-records-from-dynamodb-table


Other information:

Type (select 1):

  • Bug fix
  • [ X ] Feature implementation
  • Documentation improvement
  • Testing improvement
  • Test added to report bug (GitHub issue #---- @---)
  • Something not listed here

Is this a breaking change? (select 1):

  • 🚨 YES 🚨
  • [ X ] No
  • I'm not sure

Is this ready to be merged into Dynamoose? (select 1):

  • [ X ] Yes
  • No

Are all the tests currently passing on this PR? (select 1):

  • [ X ] Yes
  • No

Other:

  • [ X ] I have read through and followed the Contributing Guidelines
  • [ X ] I have searched through the GitHub pull requests to ensure this PR has not already been submitted
  • I have updated the Dynamoose documentation (if required) given the changes I made
  • I have added/updated the Dynamoose test cases (if required) given the changes I made
  • [ X ] I agree that all changes made in this pull request may be distributed and are made available in accordance with the Dynamoose license
  • [ X ] All of my commits and commit messages are detailed, explain what changes were made, and are easy to follow and understand
  • I have filled out all fields above


github-actions bot commented Feb 8, 2023

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@vasily-polonsky
Author

I dedicate any and all copyright interest in this software to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

@github-actions
Contributor

This pull request is stale because it has been open 7 days with no activity. Remove stale label or comment or this will be closed in 3 days.


nopol10 commented Sep 16, 2023

Hi, any chance for this or something similar to get merged soon? 🙂
The >5-10 times speed improvement is sorely needed as dynamoose becomes unusable in parts of my application 😢
I'm open to helping out in any way if needed

@fishcharlie
Member

@nopol10 & @vasily-polonsky I'm debating this one a lot. I get the performance benefit. But I then question what the value is in Dynamoose. The whole idea of Dynamoose is to have strict schemas for your DynamoDB data.

I'm curious if either of you could explain why with this PR you still choose Dynamoose over numerous other options out there? It seems like this is getting rid of a huge benefit and purpose of Dynamoose. So I'm curious about your thoughts and how you are viewing that.

Regardless, this PR needs unit tests.


vasily-polonsky commented Sep 17, 2023

Hi @fishcharlie
I understand your concerns, but there is another side to it. When I use Dynamoose to work with DDB tables, I'm sure the data there is correct, and Dynamoose works really well. If you need to fetch one record, or say 20, the performance issue is not critical. But sometimes I need to fetch 300-500 records (sometimes it's actually a whole table, because I need to do some aggregation or calculations), and this is where problems arise. If I know for sure that the data is correct, can I have a flag to skip this check to improve my response time? This issue is really critical for serverless environments, where the vCPU allocation is usually lower than in a Kubernetes pod.

Regarding the skipConformToSchema flag, I see that there is another PR with a similar idea: #1586. I would suggest comparing the performance of these two solutions and choosing the better one.


nopol10 commented Sep 18, 2023

Yup, as @vasily-polonsky says, there are cases where I am certain that the schema will be correct so I will not be worried about non-conformance.

In addition, Dynamoose provides an API that makes DynamoDB-related code easier to write and maintain, so I'd like to retain that with a small code change if possible, rather than rewriting the queries to use the DynamoDB SDK directly.
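For context, the raw-SDK rewrite being avoided here looks roughly like the following. This is a sketch under assumptions: the page-fetching function is injected so the pagination loop is visible without AWS credentials; with the real @aws-sdk/lib-dynamodb you would pass something like `(input) => docClient.send(new ScanCommand(input))`, and the table name is illustrative.

```javascript
// Sketch of a full-table Scan against the raw DynamoDB SDK. DynamoDB returns
// at most 1 MB of items per Scan call, so LastEvaluatedKey must be followed
// until the table is exhausted.
async function scanAllPages(scanPage, tableName) {
  // scanPage(input) -> Promise<{ Items, LastEvaluatedKey }>
  const items = [];
  let lastKey;
  do {
    const page = await scanPage({
      TableName: tableName,
      ExclusiveStartKey: lastKey,
    });
    items.push(...(page.Items || []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey !== undefined);
  return items;
}
```

Even this small loop shows the maintenance cost of dropping to the SDK: modeling, marshalling, and pagination concerns leak back into application code, which is exactly what Dynamoose otherwise handles.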

@ThienNDDi

Hi @fishcharlie, it has been a while since this PR was opened. Sadly, I think I hit this issue too.
I used scan on a table with more than 200 records, and any request to that table immediately took nearly 25 seconds.
I hope this will be merged soon.

@fishcharlie
Member

This can't be merged until unit tests are added.
