
Cache get_decoder_class #1833

Closed · thewhaleking opened this issue May 1, 2024 · 3 comments · Fixed by #1834
thewhaleking commented May 1, 2024

Currently, a large portion of the time spent on RPC calls is taken up by decoding (29,000+ decoder calls for a single get_delegates). Profiling this shows that the vast majority of the decoding time is actually spent in calls to the scalecodec.base.RuntimeConfiguration.get_decoder_class method.

Because there is a fairly limited number of decoder classes, we should be able to cache this method with functools.cache and see large speed improvements.
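
For reference, a minimal sketch of how this profiling can be reproduced with cProfile (the endpoint and call mirror the example further down the thread; the filter string is just one way to narrow the output):

import cProfile
import pstats

import bittensor as bt

sub = bt.subtensor("finney")

# Profile a single RPC-heavy call and see where the time goes.
with cProfile.Profile() as profiler:
    sub.get_delegates()

stats = pstats.Stats(profiler)
stats.sort_stats(pstats.SortKey.CUMULATIVE)
# Restrict the output to scalecodec frames; get_decoder_class dominates.
stats.print_stats("scalecodec")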

thewhaleking self-assigned this May 1, 2024
thewhaleking (Contributor, Author) commented:

Because the decoding is done by a third-party library (scalecodec), we will have to monkey-patch in a functools.cache wrapper, like so:

import functools
from scalecodec import base as scalecodec_base
import bittensor as bt

# Keep a reference to the original method so the cached wrapper can delegate to it.
original_get_decoder_class = scalecodec_base.RuntimeConfiguration.get_decoder_class


@functools.cache
def patched_get_decoder_class(self, type_string):
    return original_get_decoder_class(self, type_string)


# Monkey-patch the cached wrapper onto RuntimeConfiguration.
scalecodec_base.RuntimeConfiguration.get_decoder_class = patched_get_decoder_class

sub = bt.subtensor("finney")
sub.get_delegates()

With this patch, the number of calls to get_decoder_class during get_delegates drops from 94,542 to 332.

In real-world terms, the total execution time of this script improves by ~48% with the patch applied. Note that this is based on only five runs each, so results may vary with time of day, ping, etc.:

Run      Original (s)   Patched (s)
0        3.754          2.336
1        4.516          2.071
2        4.193          2.084
3        4.192          2.050
4        3.994          2.227
Average  4.130          2.154
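
For anyone wanting to reproduce these numbers, a rough sketch of the timing loop (the helper name and structure are illustrative, not the exact benchmark script):

import time

import bittensor as bt

def time_get_delegates(runs=5):
    # Wall-clock time for repeated get_delegates calls against finney.
    timings = []
    for _ in range(runs):
        sub = bt.subtensor("finney")
        start = time.perf_counter()
        sub.get_delegates()
        timings.append(time.perf_counter() - start)
    return timings

# Run once without the patch and once with it applied, then compare the averages.
results = time_get_delegates()
print(results, sum(results) / len(results))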

I believe implementing this in the code base will drastically reduce overall decode time.


thewhaleking commented May 2, 2024

@RomanCh-OT had some concerns about a potential memory leak caused by the caching, so I investigated.

Uncached: [memory profile screenshot]

Cached: [memory profile screenshot]

Given the way functools.cache (an unbounded lru_cache) works, we should never run into a situation where this becomes a problem: the memory footprint is nearly identical with and without the cache. Note that the times in these images are slower than stated above due to my own network latency.
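
For anyone wanting to repeat the memory comparison, a minimal sketch using tracemalloc (the screenshots above may have been produced with different tooling):

import tracemalloc

import bittensor as bt

# Measure current and peak Python heap usage for the call,
# once with the patch applied and once without.
tracemalloc.start()

sub = bt.subtensor("finney")
sub.get_delegates()

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")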

thewhaleking (Contributor, Author) commented:

Opened polkascan/py-scale-codec#117 to add this caching functionality to the scalecodec library.
