
[BUG] Geo-rep transfers large files multiple times while using Tar+ssh #4321

Open
harikrishnannm opened this issue Mar 20, 2024 · 3 comments

harikrishnannm commented Mar 20, 2024

Description of problem:

I am using glusterfs version 6.6 with geo-replication enabled. Geo-rep uses tar+ssh to transfer files from the primary site to the secondary site.

When a large file is uploaded to the primary site, it is sent multiple times.

For example, when a 92 GB file was uploaded, it took more than 20 hours to sync. On checking this, we found that the same file got recorded in multiple changelog files, 127 changelogs to be exact. This caused the file to be sent 127 times to the secondary site, which effectively increased the data transfer to around 11 TB (92 GB x 127 = 11684 GB ~ 11 TB).

What if we make a change in geo-replication/syncdaemon/master.py, something like this:

diff --git a/geo-replication/syncdaemon/master.py b/geo-replication/syncdaemon/master.py
index f02cdb4c7f..430f8fff29 100644
--- a/geo-replication/syncdaemon/master.py
+++ b/geo-replication/syncdaemon/master.py
@@ -1281,6 +1281,7 @@ class GMasterChangelogMixin(GMasterCommon):
             self.batch_stats["DATA_START_TIME"] = time.time()
 
         # sync data
+        datas = datas - self.datas_in_batch
         if datas:
             self.a_syncdata(datas)
             self.datas_in_batch.update(datas)
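
For illustration only, here is a minimal standalone sketch of the idea behind that one-liner (the names below are hypothetical stand-ins, not the real geo-rep code): data entries already queued for sync in the current batch are dropped, so a GFID that shows up in several changelogs of the same batch is scheduled only once.

# Hypothetical, self-contained sketch of the proposed dedup; not the real geo-rep code.
datas_in_batch = set()

def a_syncdata(datas):
    # Stand-in for the real a_syncdata(), which queues data for the tar+ssh worker.
    print("queueing sync for", sorted(datas))

def process_changelog(datas):
    datas = set(datas) - datas_in_batch  # proposed change: drop GFIDs already queued in this batch
    if datas:
        a_syncdata(datas)
        datas_in_batch.update(datas)

# The same GFID recorded in three changelogs of one batch is queued only once.
process_changelog({"gfid-big-file", "gfid-a"})
process_changelog({"gfid-big-file"})
process_changelog({"gfid-big-file", "gfid-b"})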

The exact command to reproduce the issue:

Create a file using the command below on the primary site:

dd if=/dev/zero of=/path/to/file bs=32k count=2949120

It will take some time for the file to be created, so the write gets recorded in multiple changelogs.
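
As a rough illustration of why a long-running write lands in many changelogs, here is a back-of-the-envelope sketch; the 15-second figure is GlusterFS's default changelog.rollover-time and the write duration is a hypothetical value, so neither number is taken from this report.

# Back-of-the-envelope sketch, not measured data: every changelog rollover
# that happens while the write is still in progress records the file again.
ROLLOVER_SECONDS = 15            # assumed default changelog.rollover-time
write_seconds = 30 * 60          # hypothetical: the dd above runs for ~30 minutes

changelogs_touched = write_seconds // ROLLOVER_SECONDS  # ~120 changelogs
file_size_gb = 92
retransferred_gb = file_size_gb * changelogs_touched    # each changelog triggers a full resend

print(f"file recorded in ~{changelogs_touched} changelogs")
print(f"tar+ssh would resend ~{retransferred_gb} GB (~{retransferred_gb / 1024:.1f} TB)")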

Additional info:

Commands used in geo-replication (snippet from the geo-replication/syncdaemon/resource.py file), shown with the changes we applied:

On the primary side:

tar_cmd = ["tar"] + \
-    ["--sparse", "-cf", "-", "--files-from", "-"]
+    ["--record-size=128K", "-cf", "-", "--files-from", "-"]

On the secondary side:

ssh_cmd = gconf.get("ssh-command").split() + \
    gconf.get("ssh-options-tar").split() + \
    ["-p", str(gconf.get("ssh-port"))] + \
    [host, "tar"] + \
-    ["--overwrite", "-xf", "-", "-C", rdir]
+    ["--overwrite", "--record-size=128K","-xf", "-", "-C", rdir]

Other info:

Volume Name: xxxx
Type: Distributed-Replicate
Volume ID: xxxxx
Status: Started
Snapshot Count: 0
Number of Bricks: 30 x 3 = 90
Transport-type: tcp

For geo-replication we are using a cert-based unprivileged user (geo-rep-user).

sync_method:tarssh

The operating system / glusterfs version:

OS: Ubuntu Focal
Glusterfs version: 6.6

@pranithk (Member)

@aravindavk Do you see any issue in the suggested fix? We are also going to upstream the tar changes he mentioned. We achieved a 15x speedup using --record-size=128K.

I also verified that the issue is present in the latest release.

@aravindavk (Member)

Removing the duplicate data entries looks good, but it will not solve the duplication issue completely. Geo-rep processes the changelog files in a loop; if the changes are recorded in different batches, there is a chance the file gets picked up again in the next batch. I think it is a good patch since it reduces the chances.

--record-size=128K looks good.

The --sparse option is required to handle sparse files separately. If it is removed only for testing with the file created using dd, then that is fine (the patch mentioned in the issue removed this option).

@pranithk (Member)

Removing the duplicate data entries looks good, but it will not solve the duplication issue completely. Geo-rep processes the changelog files in a loop; if the changes are recorded in different batches, there is a chance the file gets picked up again in the next batch. I think it is a good patch since it reduces the chances.

Correct. This is only an optimisation.

--record-size=128K looks good.

We will send a PR for this as well.

The --sparse option is required to handle sparse files separately. If it is removed only for testing with the file created using dd, then that is fine (the patch mentioned in the issue removed this option).

Our application doesn't produce sparse files. To get the proper fix we would have to upgrade, so we went with this approach until the upgrade.
