<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent posts to Discussion</title><link>https://sourceforge.net/p/fpart/discussion/</link><description>Recent posts to Discussion</description><atom:link href="https://sourceforge.net/p/fpart/discussion/feed.rss" rel="self"/><language>en</language><lastBuildDate>Tue, 24 Oct 2023 18:57:17 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/fpart/discussion/feed.rss" rel="self" type="application/rss+xml"/><item><title>fpsync parts number and intermediate runs</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/5d001d74b6/?limit=25#fc1f</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;Sorry to come back with a reply so late.&lt;/p&gt;
&lt;p&gt;I increased the number of files per part from 1,000 to 10,000, but parts were still generated way faster than rsync could consume them. I then moved to 100,000, which seems to fit better: in the worst case, the last part was generated just an hour before the end of the sync (half an hour on average), for jobs lasting 7 to 9 hours.&lt;/p&gt;
&lt;p&gt;It is difficult to say whether it was quicker, and by how much, because it was a second pass anyway: a big part of the time rsync spent copying files during the first pass did not have to be spent again.&lt;/p&gt;
&lt;p&gt;To give an overview of the jobs, they were pretty well balanced. Each job generated around 1,900 parts of 100,000 files, lasting, as said above, between 7 and 9 hours and generating around 10 GB of part files. Still using 24 parallel rsync processes. Same volumes, each one being 8 TB.&lt;/p&gt;
&lt;p&gt;I couldn't compare an fpsync replay job with a single rsync using a single file list. Indeed, I was able to generate lists of only the files modified in the source (from the application using those data), so I ended up with an optimized list.&lt;/p&gt;
&lt;p&gt;So, finding the right settings is not easy... But the tool is very powerful and hugely speeds up such a copy process compared to the regular tools (cp, mv, rsync, robocopy, ...), so the big added value is there anyway. &lt;/p&gt;
&lt;p&gt;I have other suggestions regarding log files and jobs:&lt;br/&gt;
- It would be nice to have an option to replace the job id (run id), which looks like a timestamp, with a human-readable format such as YYYYMMDD-HHMMSS, or even with a custom name&lt;br/&gt;
- Having &lt;code&gt;fpsync -l&lt;/code&gt; also show a status, to quickly identify whether there were errors during the run. This could be retrieved from the last lines of &lt;code&gt;log/&amp;lt;run_id&amp;gt;/fpsync.log&lt;/code&gt;, which output such a status?&lt;br/&gt;
- It would be great to be able to replay only the errors of a job, so the process would be even faster. As errors are all logged in &lt;code&gt;log/&amp;lt;run_id&amp;gt;/*.stderr&lt;/code&gt;, I was wondering whether new lists could be generated from there&lt;/p&gt;
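&lt;p&gt;For reference, the timestamp half of a run ID is a Unix epoch, so it can already be converted by hand. A minimal sketch, assuming GNU date and a run ID such as 1690106684-1860:&lt;/p&gt;

```shell
# The leading figure of a run ID (e.g. 1690106684-1860) is a Unix epoch.
# GNU date (Linux) can render it in a human-readable form; BSD date would
# use "date -u -r 1690106684" instead.
date -u -d @1690106684 +%Y%m%d-%H%M%S
```

&lt;p&gt;This prints 20230723-100444 (UTC), which is the kind of name the suggested option could produce.&lt;/p&gt;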
&lt;p&gt;What do you think about all this?&lt;/p&gt;
&lt;p&gt;Best regards.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Fab11</dc:creator><pubDate>Tue, 24 Oct 2023 18:57:17 -0000</pubDate><guid>https://sourceforge.nete01180bd7ed6c0f9b8ad377e29f9afe8c140685e</guid></item><item><title>fpsync parts number and intermediate runs</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/5d001d74b6/?limit=25#dfc6</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;I will try that and let you know.&lt;br/&gt;
You are welcome for the ideas :) Thanks for your consideration.&lt;/p&gt;
&lt;p&gt;Best regards.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Fab11</dc:creator><pubDate>Fri, 28 Jul 2023 08:21:05 -0000</pubDate><guid>https://sourceforge.net61d03b9be0c490a8eea3ea2e2143f8f0a9a807d6</guid></item><item><title>fpsync parts number and intermediate runs</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/5d001d74b6/?limit=25#c0ca</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;Many thanks for your detailed answer!&lt;/p&gt;
&lt;p&gt;You confirmed that 1,000 files per part is not enough; that is my opinion too. I realised it while frequently watching the &lt;em&gt;top&lt;/em&gt; command output: the part numbers visible on the running rsync command lines were far behind the last generated part, meaning fpart was generating parts faster than rsync was running them. I will maybe try to generate bigger parts for intermediate runs.&lt;/p&gt;
&lt;p&gt;Another question regarding jobs (thanks for the details on their usage). I was wondering which would be the most efficient, or the fastest, between a replay (which skips file system crawling, as it already has all the generated part files) and a single rsync. The difference would be passing a single file list to rsync (created with a &lt;em&gt;cat&lt;/em&gt; of all part files), as I guess fpsync reads all its parts and runs as many rsync processes as there are parts.&lt;br/&gt;
Would that make sense?&lt;/p&gt;
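&lt;p&gt;The "cat of all part files" idea above could be sketched as follows. This is a minimal, self-contained sketch: the directory layout and part file names are made up, and the rsync invocation is only shown as a comment (rsync's --files-from option would consume the merged list):&lt;/p&gt;

```shell
# Build a fake parts directory (names made up for illustration).
PARTS=$(mktemp -d)
printf 'dir1/file1\ndir1/file2\n' | tee "$PARTS/part.0"
printf 'dir2/file3\n' | tee "$PARTS/part.1"

# Merge every part file into a single list...
LIST=$(mktemp)
cat "$PARTS"/part.* | tee "$LIST"

# ...which a single rsync could then consume in one go, e.g.:
#   rsync -av --files-from="$LIST" /data/src/ /data/dst/
wc -l "$LIST"   # 3 entries in the merged list
```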
&lt;p&gt;I would also like to ask you about log files (not only logs but everything related to a job, in /tmp/fpsync by default), which turned out to be really big in my case (more than 20 GB for an 8 TB job). Aside from disk space, there is also a quirk because many files are generated. For each part, I saw at least three files: the part itself, a stdout log file and a stderr log file. As the last two are in the same directory, I ended up having directory listing issues (long waits, basically). As most, if not all, stderr files are empty when everything works well, they are somewhat useless.&lt;br/&gt;
Would it make sense to consider adding an option to "clean" the log directory of empty files at the end of a job, or even at the end of each rsync before switching to the next one?&lt;/p&gt;
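&lt;p&gt;Such a cleanup can be emulated today with find; a sketch, assuming the per-part stderr logs live together in one log directory as described:&lt;/p&gt;

```shell
# Stand-in for fpsync's log directory (e.g. /tmp/fpsync/log/RUN_ID).
LOGDIR=$(mktemp -d)
touch "$LOGDIR/part-1.stderr"                              # empty: no rsync errors
printf 'rsync: some error\n' | tee "$LOGDIR/part-2.stderr" # non-empty: keep it

# Remove only the empty *.stderr files, leaving real error logs in place.
find "$LOGDIR" -type f -name '*.stderr' -size 0 -delete
```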
&lt;p&gt;Thanks again.&lt;/p&gt;
&lt;p&gt;Best regards.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Fab11</dc:creator><pubDate>Tue, 25 Jul 2023 13:52:16 -0000</pubDate><guid>https://sourceforge.net9c48c87fdce0696886137c0259cb52853ce92cdd</guid></item><item><title>fpsync parts number and intermediate runs</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/5d001d74b6/?limit=25#450b</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;Sorry for my delayed answer, I am just back from holidays :p&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;span&gt;[...]&lt;/span&gt;&lt;br/&gt;
1) To avoid waiting for big lists to be generated by fpart before rsync starts, I set fpsync to use parts of 1000 files (-f). But I realised the number of parts is huge (45,000+). I have to say the data are spread across several 8 TB volumes, processed one after the other. The server running the jobs has 48 cores and fpsync is set up to run 24 parallel rsync processes (-n).&lt;br/&gt;
Is this a mistake? Should I use bigger parts, for instance 10,000 files each, to reduce the total number of parts?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, I would probably try to generate fewer (bigger) partitions, but there is no easy way to compute the ideal number:&lt;/p&gt;
&lt;p&gt;1) If you generate too many (small) partitions, you will lose time forking very small rsync processes.&lt;/p&gt;
&lt;p&gt;2) Fpsync is able to start transfers as soon as a single partition has been generated; it generates the next ones during that transfer. If you start 24 parallel rsync jobs, you probably want those first 24 partitions to be generated as fast as possible, so you don't want them to be too big.&lt;/p&gt;
&lt;p&gt;You have to find a good balance here. Anyway, 1000 files definitely seems too small (IMHO).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;2) I didn't find, in the article or on the web, information regarding the intermediate synchronisations (the "middle" ones, before the final one). How should this be handled? Just by running the same fpsync command again? Because re-running existing jobs would skip part generation but would not take care of potential new files in the source, so restarting from scratch seems mandatory. But it also means it will take almost as long as the first pass?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, re-running fpsync the same way will take the same time as the first pass.&lt;/p&gt;
&lt;p&gt;To avoid crawling time, you can use fpsync's replay feature (-R) but, as you say, it will only update known files' contents. If your files only change in content, this can be a good solution. If they are frequently deleted and replaced by other ones (i.e. names change), it will not work, as fpsync would skip most of them. There is no easy solution here: the only way to get new file names is to crawl the filesystem again (re-run fpsync from scratch).&lt;/p&gt;
&lt;p&gt;For the final pass, you can have a look at fpsync's -E option, which makes it work on a directory basis and enables rsync's --delete option; but if you have very few directories it will not work very well (it will not be able to produce enough partitions).&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Another solution could be to retrieve all the parts of a job and merge them into a single file (or a few files) to pass to rsync, so as to run a few syncs instead of so many? But again, new files won't be taken into account.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That would just replay the synchronization. If that's what you want, you probably want to use option -E; it will be easier to handle.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The documentation regarding jobs restart/replaying lacks&lt;br/&gt;
details and examples in my opinion.&lt;br/&gt;
How would you do?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thanks for that feedback, I'll add that to my TODO list.&lt;/p&gt;
&lt;p&gt;Here is a small example:&lt;/p&gt;
&lt;p&gt;$ fpsync -l&lt;br/&gt;
&amp;lt;=== Listing runs&lt;/p&gt;
&lt;p&gt;Nothing has been run here.&lt;/p&gt;
&lt;p&gt;$ fpsync -n 2 /usr/src/ /var/tmp/src/&lt;/p&gt;
&lt;p&gt;That command starts a first run...&lt;/p&gt;
&lt;p&gt;$ fpsync -l&lt;br/&gt;
&amp;lt;=== Listing runs&lt;br/&gt;
===&amp;gt; Run: ID: 1690106684-1860, status: replayable (synchronization complete, &lt;br/&gt;
use -R to replay)&lt;/p&gt;
&lt;p&gt;...which becomes replayable once complete and can be replayed with this command:&lt;/p&gt;
&lt;p&gt;$ fpsync -R -r 1690106684-1860&lt;/p&gt;
&lt;p&gt;You can prepare a run (i.e. &lt;em&gt;not&lt;/em&gt; run rsync commands, just generate the jobs) by adding -p to the initial command:&lt;/p&gt;
&lt;p&gt;$ fpsync -p -n 2 /usr/src/ /var/tmp/src/&lt;br/&gt;
1690106994 &amp;lt;=== Successfully prepared run: 1690106992-4040&lt;br/&gt;
$ fpsync -l&lt;br/&gt;
&amp;lt;=== Listing runs&lt;br/&gt;
===&amp;gt; Run: ID: 1690106684-1860, status: replayable (synchronization complete, &lt;br/&gt;
use -R to replay)&lt;br/&gt;
===&amp;gt; Run: ID: 1690106992-4040, status: resumable (synchronization not &lt;br/&gt;
complete, use -r to resume)&lt;/p&gt;
&lt;p&gt;It is then listed as resumable and can be started this way:&lt;/p&gt;
&lt;p&gt;$ fpsync -r 1690106992-4040&lt;/p&gt;
&lt;p&gt;A side note: you probably want to use fpsync's current git version, as a bug regarding resume/replay has been fixed there:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/martymac/fpart/commit/" rel="nofollow"&gt;https://github.com/martymac/fpart/commit/&lt;/a&gt;&lt;br/&gt;
be14d1c172daca70a2502a231e75d72f9e398265&lt;/p&gt;
&lt;p&gt;Hope this helps,&lt;br/&gt;
Best regards,&lt;br/&gt;
(and thanks for your interest in fpart/fpsync !)&lt;/p&gt;
&lt;p&gt;Ganael.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ganael Laplanche</dc:creator><pubDate>Sun, 23 Jul 2023 10:37:43 -0000</pubDate><guid>https://sourceforge.net7bdd926c34f5b463cc921ac890f88a0953d36d9a</guid></item><item><title>fpsync parts number and intermediate runs</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/5d001d74b6/?limit=25#1ac0</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hello,&lt;/p&gt;
&lt;p&gt;I have to migrate a large amount of data (around 100 TB) to a new storage array. It is mostly composed of small files (max 100 KB) located in a directory structure 7 levels deep. The transfer to the new storage array has to go through an SMB/CIFS share. So, Samba and small files: the worst-case scenario...&lt;br/&gt;
Using rsync or robocopy alone would take ages. I discovered fpart and fpsync by accident and, according to my speed tests, they turned out to be the magic solution: they would let me copy these data in two weeks instead of six months.&lt;/p&gt;
&lt;p&gt;I read with a lot of interest the article "Parallélisez vos transferts de fichiers" (I'm French), but I would need some advice on two subjects:&lt;/p&gt;
&lt;p&gt;1) To avoid waiting for big lists to be generated by fpart before rsync starts, I set fpsync to use parts of 1000 files (-f). But I realised the number of parts is huge (45,000+). I have to say the data are spread across several 8 TB volumes, processed one after the other. The server running the jobs has 48 cores and fpsync is set up to run 24 parallel rsync processes (-n).&lt;br/&gt;
Is this a mistake? Should I use bigger parts, for instance 10,000 files each, to reduce the total number of parts?&lt;/p&gt;
&lt;p&gt;2) I didn't find, in the article or on the web, information regarding the intermediate synchronisations (the "middle" ones, before the final one). How should this be handled? Just by running the same fpsync command again? Because re-running existing jobs would skip part generation but would not take care of potential new files in the source, so restarting from scratch seems mandatory. But it also means it will take almost as long as the first pass? In my case, files are so small that I doubt rsync'ing them again will make much difference.&lt;br/&gt;
Another solution could be to retrieve all the parts of a job and merge them into a single file (or a few files) to pass to rsync, so as to run a few syncs instead of so many? But again, new files won't be taken into account.&lt;br/&gt;
The documentation regarding job restart/replay lacks details and examples, in my opinion.&lt;/p&gt;
&lt;p&gt;How would you do?&lt;/p&gt;
&lt;p&gt;Thanks for your help and advice.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Fab11</dc:creator><pubDate>Wed, 19 Jul 2023 18:52:31 -0000</pubDate><guid>https://sourceforge.netf8ab0144661b395c26f778a99b218a308fedb222</guid></item><item><title>fpart and -x exclusions</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/6185e11427/?limit=25#7493</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;You're welcome. Thanks for using fpart!&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ganael Laplanche</dc:creator><pubDate>Thu, 04 Jul 2019 10:06:00 -0000</pubDate><guid>https://sourceforge.nete97d24420be5e5e3eee1b90adde74701df3643da</guid></item><item><title>fpart and -x exclusions</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/6185e11427/?limit=25#714e</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Thanks for that Ganael.&lt;/p&gt;
&lt;p&gt;It must just be the number of files swamping my metadata server even with the exclusions.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brett Worth</dc:creator><pubDate>Wed, 03 Jul 2019 21:49:20 -0000</pubDate><guid>https://sourceforge.net34d2ab43af4286b2fc1cda1afd59c6c9d74fe495</guid></item><item><title>fpart and -x exclusions</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/6185e11427/?limit=25#122a</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi Brett,&lt;/p&gt;
&lt;p&gt;That should not be the case: when excluding a pattern, fpart sets the FTS_SKIP option to avoid stat()ing useless files. See:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sourceforge.net/p/fpart/code/ci/master/tree/src/file_entry.c#l779"&gt;https://sourceforge.net/p/fpart/code/ci/master/tree/src/file_entry.c#l779&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You can test that behaviour by creating a small set of files and dirs and running truss (or strace) on the process:&lt;/p&gt;
&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ find .
.
./level1-dir1
./level1-dir1/level2-file2
./level1-dir1/level2-file1
./level1-dir1/level2-dir2
./level1-dir1/level2-dir1
./level1-dir1/level2-dir1/level3-file2
./level1-dir1/level2-dir1/level3-dir2
./level1-dir1/level2-dir1/level3-dir1
./level1-dir1/level2-dir1/level3-file1
./level1-dir2
./level1-file1
./level1-file2
&lt;/pre&gt;&lt;/div&gt;


&lt;div class="codehilite"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;$ truss -f fpart -n &lt;span class="m"&gt;1&lt;/span&gt; -x level2-dir1 . &lt;span class="m"&gt;2&lt;/span&gt;&amp;gt;&lt;span class="p"&gt;&amp;amp;&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; grep fstatat
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"."&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;drwxr-xr-x ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30429&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level1-dir1"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;drwxr-xr-x ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30435&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level1-dir2"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;drwxr-xr-x ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30436&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level1-file1"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-rw-r--r-- ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30444&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level1-file2"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-rw-r--r-- ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30445&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level2-file2"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-rw-r--r-- ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30440&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level2-file1"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-rw-r--r-- ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30439&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level2-dir2"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;drwxr-xr-x ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30438&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="m"&gt;58347&lt;/span&gt;: fstatat&lt;span class="o"&gt;(&lt;/span&gt;AT_FDCWD,&lt;span class="s2"&gt;"level2-dir1"&lt;/span&gt;,&lt;span class="o"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;drwxr-xr-x ,inode&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;30437&lt;/span&gt;,size&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;6&lt;/span&gt;,blksize&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;131072&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;,AT_SYMLINK_NOFOLLOW&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;0x0&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;


&lt;p&gt;In the above example, you can see that nothing has been stat()ed within level2-dir1.&lt;/p&gt;
&lt;p&gt;Hope this helps,&lt;/p&gt;
&lt;p&gt;Best regards,&lt;/p&gt;
&lt;p&gt;Ganael.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ganael Laplanche</dc:creator><pubDate>Tue, 02 Jul 2019 11:33:17 -0000</pubDate><guid>https://sourceforge.net76fd375a0416c894efe17706cbd6fe49e6f09537</guid></item><item><title>fpart and -x exclusions</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/6185e11427/?limit=25#1545</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi again. The problem I now have is that the source filesystem contains 160M files and is on a Lustre version that predates DNE, so there is a single metadata server. That metadata server is a real bottleneck.&lt;/p&gt;
&lt;p&gt;I have done an initial copy which took 8 days for 986 TB, which is fine, but now that I am doing another sync it looks like it's going to take a few days for fpart to complete. I could reduce this by about 90% if I could just exclude a couple of directories. The pattern limitation that you describe above would be fine, but I now suspect that even though fpart is getting a match on the directory and excluding it, it is still going down into the subtree and getting a match on a path component for everything under there. In other words, it will still be looking at all 160M files and excluding most of them.&lt;/p&gt;
&lt;p&gt;Am I wrong about that assumption?  It's been running for more than 32 hours now even with the names of those two directories excluded.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brett Worth</dc:creator><pubDate>Tue, 02 Jul 2019 06:32:11 -0000</pubDate><guid>https://sourceforge.neta8190dc8b2ed5156e3c0a96f8ef7bec6f85ddc5d</guid></item><item><title>fpart and -x exclusions</title><link>https://sourceforge.net/p/fpart/discussion/general/thread/6185e11427/?limit=25#2a6d</link><description>&lt;div class="markdown_content"&gt;&lt;p&gt;Hi Brett,&lt;/p&gt;
&lt;p&gt;Thanks :)&lt;/p&gt;
&lt;p&gt;Options -x/-X/-y/-Y only match file or directory names, not their paths (like find's -name option).&lt;/p&gt;
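&lt;p&gt;The same name-only matching can be illustrated with find itself: -name matches the base name, and -prune (like the FTS_SKIP behaviour mentioned elsewhere in this thread) stops descent into the matched directory. A small, self-contained illustration:&lt;/p&gt;

```shell
# Build a tiny tree, then exclude the "skipme" directory by name.
D=$(mktemp -d)
mkdir -p "$D/keep/skipme/sub"
touch "$D/keep/skipme/inner" "$D/keep/outer"

# Prints only "$D/keep/outer": "skipme" is matched by name and never entered.
find "$D" -name skipme -prune -o -type f -print
```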
&lt;p&gt;Regards,&lt;br/&gt;
Ganael.&lt;/p&gt;&lt;/div&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ganael Laplanche</dc:creator><pubDate>Mon, 01 Jul 2019 10:07:49 -0000</pubDate><guid>https://sourceforge.netd486789b217da74f767c655494fd0ff8657d95c1</guid></item></channel></rss>