[WIP] Store metadata of known files and folders in memory to check against in recursive upload #149
base: master
Conversation
osfclient/models/storage.py
@@ -67,34 +71,52 @@ def create_file(self, path, fp, update=False):
        directories = directory.split(os.path.sep)
        # navigate to the right parent object for our file
        parent = self
        if os.path.dirname(path) not in self.known_folder_set:
indentation error?
ack, you're right, missed this one. I recently switched from Vim to mostly VSCode, and I learned the hard way that there's a `diffEditor.ignoreTrimWhitespace` setting with a default of `true` (a terrible default in my opinion), which messed things up when I did line-by-line staging.
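For context, the caching idea this diff implements can be sketched in isolation: pre-load the paths of folders that already exist on the remote, then only pay for a folder-creation request when a path is genuinely new. This is a minimal standalone sketch; the names (`known_folder_set`, `ensure_folder`) and the call counter are illustrative, not the exact PR code.

```python
import os

class StorageSketch:
    """Illustrative stand-in for a storage object that caches known
    remote folder paths to skip redundant per-folder API calls."""

    def __init__(self, existing_folders):
        # in the real PR this would be populated once from a remote listing
        self.known_folder_set = set(existing_folders)
        self.create_calls = 0  # stands in for network round-trips

    def ensure_folder(self, path):
        parent_dir = os.path.dirname(path)
        if parent_dir not in self.known_folder_set:
            # only unknown folders cost a (simulated) creation request
            self.create_calls += 1
            self.known_folder_set.add(parent_dir)

storage = StorageSketch(existing_folders={"data"})
for f in ["data/a.txt", "data/b.txt", "new/c.txt", "new/d.txt"]:
    storage.ensure_folder(f)

print(storage.create_calls)  # only "new" triggers a creation call -> 1
```

Without the cache, every file would trigger an existence check or creation attempt for its folder; with it, each folder costs at most one request per run.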
osfclient/models/storage.py
@@ -35,6 +37,8 @@ def _update_attributes(self, storage):
        self._new_folder_url = self._get_attribute(storage,
                                                   'links', 'new_folder')
        self._new_file_url = self._get_attribute(storage, 'links', 'upload')
        self.known_file_dict = dict()
I'd call these `self.known_files` and `self.known_folders`. To avoid having both a dict and a set, can you store them together in `self.known_paths = dict()`, where for folders we use `None` as the value? Performance-wise, testing for membership in a `set` and a `dict` should be about the same.
Yeah that does seem a little cleaner. Will do.
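The single-dict suggestion can be sketched as follows. The metadata stored per file (an md5 entry) is illustrative, not what the PR actually stores:

```python
# One dict keyed by remote path: folders map to None, files map to
# their metadata, so one structure answers both "does this exist?"
# and "what do we know about this file?".
known_paths = {}

# folders get None as their value ...
known_paths["data"] = None
known_paths["data/sub"] = None
# ... while files store metadata (a hypothetical md5 entry here)
known_paths["data/report.txt"] = {"md5": "abc123"}

def is_known_folder(path):
    # dict membership is O(1) on average, same as a set
    return path in known_paths and known_paths[path] is None

print(is_known_folder("data"))             # True
print(is_known_folder("data/report.txt"))  # False: known, but a file
print("missing" in known_paths)            # False
```

The design trade-off is that folder lookups now need the extra `is None` check, in exchange for maintaining a single cache instead of two parallel ones.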
Looks like a good approach. Some nitpicks as inline comments.
@betatim My latest commit implements your suggestion of using a single dict. It still doesn't address what you mentioned in #148, since I'm still not sure what that would look like. I still need to add tests as well. Any advice you have would be appreciated.
Conflicts:
    osfclient/cli.py
    osfclient/models/storage.py
    osfclient/tests/mocks.py
Codecov Report
@@ Coverage Diff @@
## master #149 +/- ##
==========================================
+ Coverage 92.17% 92.47% +0.29%
==========================================
Files 13 13
Lines 588 611 +23
==========================================
+ Hits 542 565 +23
Misses 46 46
I've got what I think is a reasonable set of tests for this. Now I just want to do some timing tests to make sure this is practically doing what I want.
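As a rough illustration, tests for this kind of cache could take the following shape. This is a standalone sketch with hypothetical names and a fake storage object, not the actual osfclient test suite:

```python
import os

class FakeStorage:
    """Stand-in for a Storage object implementing the PR's caching
    idea; attribute and method names here are illustrative."""

    def __init__(self):
        self.known_paths = {}  # folders -> None, files -> metadata
        self.requests = []     # record of simulated API calls

    def _remote_listing(self):
        # pretend one call returns everything already on the server
        self.requests.append("list")
        return {"docs": None, "docs/readme.md": {"size": 10}}

    def populate_cache(self):
        self.known_paths = dict(self._remote_listing())

    def create_file(self, path):
        folder = os.path.dirname(path)
        if folder and folder not in self.known_paths:
            self.requests.append(("mkdir", folder))
            self.known_paths[folder] = None
        if path not in self.known_paths:
            self.requests.append(("upload", path))
            self.known_paths[path] = {}

def test_cached_upload_skips_existing_folders():
    s = FakeStorage()
    s.populate_cache()
    s.create_file("docs/new.md")
    # the pre-existing "docs" folder must not trigger a mkdir call
    assert ("mkdir", "docs") not in s.requests
    assert ("upload", "docs/new.md") in s.requests

test_cached_upload_skips_existing_folders()
```

The key behavior under test is that a cached, pre-existing folder never costs an extra request, while genuinely new paths still do.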
I did some quick tests on uploading a single folder with 10 empty files. Uploading the directory to a blank project on osf.io with the command
took about 45 seconds for me. Deleting those files on osf.io and uploading with
reduced the time to about 28 seconds. Next I tried doing recursive force overwrites of those files. The command
took about 80 seconds, while
took about 40 seconds. My only question for @betatim and/or @ctb, and/or anyone else interested in weighing in, is whether we should make the cache option True by default for recursive uploads. Are there any tests you'd want to see before making that call?
Also, can I propose bumping the version and pushing to pypi after this version? |
Is there any news on this PR? I am trying to upload data using
@sappelhoff you can install @benlindsay's version with
Fixes #148
Not sure if this is the right way to do it, but to test this branch out I needed the fix from #147, so this builds on that branch even though it isn't merged yet. This is my preliminary attempt to speed up recursive uploads when lots of files are present, although it doesn't address @betatim's concern from #148.