mpiFileUtils - File utilities designed for scalability and performance.



mpiFileUtils provides both a library called libmfu and a suite of MPI-based tools to manage large datasets, which may vary from large directory trees to large files. High-performance computing users often generate large datasets with parallel applications that run with many processes (millions in some cases). However, those users are then stuck with single-process tools like cp and rm to manage their datasets. This suite provides MPI-based tools to handle typical jobs like copy, remove, and compare for such datasets, providing speedups of up to 20-30x. It also provides a library that simplifies the creation of new tools or can be used in applications.

Documentation is available on ReadTheDocs.

DAOS Support

mpiFileUtils supports a DAOS backend for dcp, dsync, and dcmp. Custom serialization and deserialization for DAOS containers to and from a POSIX filesystem is provided with daos-serialize and daos-deserialize. Details and usage examples are provided in DAOS Support.


We welcome contributions to the project. For details on how to help, see our Contributor Guide.


Copyright (c) 2013-2015, Lawrence Livermore National Security, LLC. Produced at the Lawrence Livermore National Laboratory CODE-673838

Copyright (c) 2006-2007,2011-2015, Los Alamos National Security, LLC. (LA-CC-06-077, LA-CC-10-066, LA-CC-14-046)

Copyright (2013-2015) UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the Department of Energy.

Copyright (c) 2015, DataDirect Networks, Inc.

All rights reserved.

  dcp: support DAOS

    - Use mfu_file_t struct to wrap fd vs obj
    - create DAOS specific io functions
    - Create mfu_file_* functions that call DAOS backend
    - Add DAOS I/O functions to common library
    - call daos io backend in dcp
    - Add FindCART.cmake and FindDAOS.cmake
    - Add ifdef to DAOS specific code
    - create mfu_file in all tools for walk

    Signed-off-by: Danielle Sikich [email protected]

    opened by dsikich 32
  Dcmp improve

    Here are some minor improvements for dcmp. The commits "dcmp: improve the usage()" and "dcmp: print the matched count of each conjunctions group in summary" improve the usage text and the summary log to make them friendlier to users.

    "dcmp: record the rawtime directly rather than converting from wtime": this patch uses localtime() to record the time rather than MPI_Wtime (with an additional conversion), which avoids unexpected conditions such as time overflow.
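
    As a rough illustration of recording wall-clock time directly, the sketch below formats a time_t with localtime() and strftime(). The helper name format_timestamp and the chosen format string are assumptions for this example, not the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format a raw time_t as local time, e.g.
 * "2020-05-24T20:14:01". Returns the number of characters written,
 * or 0 if the conversion failed or the buffer was too small. */
static size_t format_timestamp(time_t raw, char* buf, size_t len)
{
    assert(buf != NULL && len > 0);

    /* localtime() converts seconds since the epoch to broken-down local time */
    struct tm* tm = localtime(&raw);
    if (tm == NULL) {
        buf[0] = '\0';
        return 0;
    }

    /* strftime() returns 0 when the result does not fit in len bytes */
    return strftime(buf, len, "%Y-%m-%dT%H:%M:%S", tm);
}
```

    A caller would pass time(NULL) captured at the moment of interest, so no conversion from an MPI_Wtime() double is involved.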

    "dcmp: skip the size differ check for dir": this commit removes the size-difference check for directories. The check is meant for files, not directories, and the size of a directory may differ even when the contained files are the same, depending on the file system.

    Please refer to the commits for details.

    opened by KnightKu 29
  dcp and dsync: add --dereference and --no-dereference

    Support DAOS --preserve

    Added new IO functions:

    • llistxattr
      • mfu_file_llistxattr
      • mfu_llistxattr
      • daos_llistxattr
    • lgetxattr
      • mfu_file_lgetxattr
      • mfu_lgetxattr
      • daos_lgetxattr
    • lsetxattr
      • mfu_file_lsetxattr
      • mfu_lsetxattr
      • daos_lsetxattr
    • lchown
      • mfu_file_lchown
      • mfu_lchown
      • daos_lchown
        • DFS has owner and group permissions only at the container level so this is just a placeholder to avoid errors.
    • utimensat
      • mfu_file_utimensat
      • mfu_utimensat
      • daos_utimensat

    #closes #404

    Implement --dereference and --no-dereference


    • Added dereference option
      • default 0


    • Added no-dereference option
      • default 0

    mfu_flist_walk.c -> walk_stat_process

    • Added DEREFERENCE global, similar to existing globals
    • If DEREFERENCE call stat
      • Effectively follows symlinks to files/dirs pointed to
    • Otherwise call lstat
    • Added error message for failed stat/lstat
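
    The stat/lstat choice described above can be sketched as follows. The function name walk_stat and its dereference parameter are hypothetical stand-ins for illustration (the change itself uses a DEREFERENCE global inside walk_stat_process).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: stat a path, following symlinks only when
 * dereference is nonzero. Returns 0 on success, -1 on failure. */
static int walk_stat(const char* path, int dereference, struct stat* st)
{
    assert(path != NULL && st != NULL);

    /* stat() reports on the link target, lstat() on the link itself */
    int rc = dereference ? stat(path, st) : lstat(path, st);
    if (rc != 0) {
        fprintf(stderr, "Failed to %s `%s' (errno=%d %s)\n",
                dereference ? "stat" : "lstat", path, errno, strerror(errno));
    }
    return rc;
}
```

    With dereference set, a symlink to a directory is reported as a directory, so a walk would descend into it; without it, the entry is reported as a symlink.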


    • Altered daos_hash_lookup to set errno on error
      • Adjusted callers to account for this
    • Fixed missing parent lookup in daos_access
    • Added mfu_file_faccessat, mfu_faccessat, daos_faccessat
      • Parameters allow for not following symlinks
    • Renamed daos_stat to daos_lstat to be more accurate
    • Added mfu_file_stat, mfu_stat, daos_stat
      • Similar to lstat, but follows symlinks
      • daos_stat is a WIP, pending DFS library changes


    • Explicitly initialize some DAOS structs to silence erroneous valgrind errors.

    mfu_param_path.c -> mfu_param_path_check_copy

    • Added no_dereference parameter
    • If no_dereference, call faccessat(..., AT_SYMLINK_NOFOLLOW)
      • Essentially allows broken links to be copied
    • Otherwise, call access


    • Added --dereference
      • "-L"
    • Added --no-dereference
      • "-l" (short name hidden)


    • Added --dereference
      • "-L"
    • Added --no-dereference
      • "-x" (short name hidden)


    • Added --dereference

    #closes #29

    Signed-off-by: Dalton Bohning [email protected]

    opened by daltonbohning 28
  daos: add support for copying daos containers at obj level

    This adds support for copying any type of daos container in parallel.

    * add obj_id_lo and obj_id_hi to elem_t struct
    * add total_oids field to flist
    * add a pack and unpack call for obj_id_lo and obj_id_hi
    * create mfu_flist_global_oid_size to return total oids
    * add boolean is_posix_copy for non-posix copies
    * adds daos_obj_copy function and helper functions

    This copy starts by retrieving all obj ids on rank 0, then it uses the mfu_flist_spread function to distribute all obj ids evenly across ranks. After the obj ids are distributed, the obj ids are copied using the local flist.

    Also, daos will split the anchor used to retrieve all obj ids and enumerate the dkeys, akeys, and records. There is work that still needs to be done at the daos level to split the anchor for the object iterator table used to retrieve obj ids. In the case that a large record extent needs to be copied, daos already chunks the record extents using their index ranges. Each record is of a fixed size, and multiple records make up a record extent.

    Signed-off-by: Danielle Sikich [email protected]

    opened by dsikich 23
  dwalk: add default file histogram

    When the file_histogram option is used, dwalk creates a histogram of all walked files. This option does not require any range inputs. It gives the user a count of files in each bin. The bins are created dynamically based on the max file size: if the max file size is between 2^20 and 2^30, then that is the last bin. The first bin is for 0-byte files, and the following bins go up by 2^10, 2^20, 2^30, etc.
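
    The binning can be sketched as below. The exact bin boundaries are an assumption made for illustration (bin 0 for 0-byte files, bin 1 for [1, 2^10), bin 2 for [2^10, 2^20), and so on), and the names size_to_bin, build_histogram, and MAX_BINS are hypothetical; dwalk's own boundaries may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BINS 8  /* enough for any 64-bit size when bins grow by 2^10 */

/* Map a file size to a bin index under the assumed boundaries:
 * bin 0 = 0 bytes, bin 1 = [1, 2^10), bin 2 = [2^10, 2^20), ... */
static int size_to_bin(uint64_t size)
{
    if (size == 0) {
        return 0;
    }
    int bin = 1;
    while (size >= 1024) {
        size >>= 10;
        bin++;
    }
    return bin;
}

/* Count files per bin. Returns the number of bins in use, which is
 * determined by the largest file size seen (at least 1). */
static int build_histogram(const uint64_t* sizes, size_t n,
                           uint64_t counts[MAX_BINS])
{
    assert(counts != NULL);

    int used = 1;
    for (int i = 0; i < MAX_BINS; i++) {
        counts[i] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        int bin = size_to_bin(sizes[i]);
        counts[bin]++;
        if (bin + 1 > used) {
            used = bin + 1;  /* last bin follows the max file size */
        }
    }
    return used;
}
```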

    opened by dsikich 22
  daos: add daos-serialize and daos-deserialize tools

    * add optional HDF5_SUPPORT to build
    * add helper functions to mfu_daos
    * daos-serialize serializes daos containers into hdf5
      files, for long term storage on a POSIX filesystem.
    * daos-deserialize deserializes an hdf5 file by
      restoring it back into a new daos container

    daos-deserialize can only be used when daos-serialize was the tool used to create the hdf5 file. The serialize tool evenly distributes the container's object ids among ranks, and each rank then writes to its own HDF5 file. The deserialize tool evenly distributes the HDF5 files among ranks.

    examples:
      mpirun -np 2 daos-serialize -v /pool/cont
      mpirun -np 2 daos-deserialize -v --pool h5file h5file ...

    TODO: add obj level statistics after dsync obj level support PR is merged
    TODO: investigate using libcircle for distribution of files among ranks in daos-deserialize

    Signed-off-by: Danielle Sikich [email protected]

    opened by dsikich 20
  feeding binary cache file from dfind or dwalk to drm causes seg fault

    Tried with both 0.10 and current master. Same behavior. Host is CentOS 7.

    1) gcc/8.2.0 2) openmpi/4.0.3 3) cmake/3.13.2

    I'm trying to build lists of files for a scratch purge, and I'm having trouble passing data between tools like dfind and dwalk. I notice that if I run strings on the cache file, it stores much more data than the single entry my test data should produce. I don't know the format, so I'm not sure, but I'm seeing file paths outside the path I'm running in. If I use text output or print to screen, the results are what I would expect.

    mpirun -np 4 ~/mpifileutils/build/src/dwalk/dwalk --output /tmp/output -p .

     mpirun -np 4 ~/mpifileutils/build/src/drm/drm --input /tmp/output --dryrun
    [2020-05-24T20:14:01] Reading from input file: /tmp/output
    [gl-login1:58972:0:58972] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
    ==== backtrace (tid:  58972) ====
     0 0x0000000000010e52 mfu_flist_compute_depth()  ???:0
     1 0x0000000000021efd list_elem_decode()  mfu_flist_io.c:0
     2 0x0000000000022325 list_insert_decode()  mfu_flist_io.c:0
     3 0x0000000000022838 read_cache_variable()  mfu_flist_io.c:0
     4 0x0000000000023ea7 mfu_flist_read_cache()  ???:0
     5 0x0000000000401d18 main()  ???:0
     6 0x0000000000022505 __libc_start_main()  ???:0
     7 0x00000000004013c9 _start()  ???:0
    opened by brockpalen 17
  drm: add traceless mode support

    Add traceless mode support to drm (-T/--traceless). If the option is enabled, unlinking a file, directory, or link does not change the timestamps (atime/mtime) of its parent directory. This is useful when someone wants to remove extra files under the destination directory after a dcp without changing the timestamps of the destination tree, so that they stay the same as in the source.

    Signed-off-by: Gu Zheng [email protected]

    opened by KnightKu 17
  Reporting job progress during DCP transfer

    During our data migration from the ORNL Spider 1 file system to Spider 2, it is not uncommon to have a transfer job that lasts more than a day. One question users and Ops staff ask is: how much has been transferred so far, and how much is left? Is there a solution to this problem? Thanks.

    question usecase wish list 
    opened by fwang2 16
  Make chunksize/blocksize runtime option

    This changeset makes both chunksize and blocksize runtime options instead of compile-time options. It provides two command line options:

    -k chunksize (MB unit, default is 1MB as it is)
    -b blocksize (MB unit, default is 1MB as it is)

    In my test with a 500GB dataset of five 100GB files (the large-file case), the default settings gave an average transfer speed of 392 MB/s over 3 test runs. With the runtime options -k 16 -b 4, I got an average transfer speed of 709 MB/s over 3 test runs.
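
    To give a feel for how the chunk size changes the number of work units per file, the arithmetic can be sketched as below. The helper count_chunks is hypothetical, and treating the -k value as MiB (2^20 bytes) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks needed to cover file_size bytes when each chunk is
 * chunk_mb MiB (the final chunk may be partial). */
static uint64_t count_chunks(uint64_t file_size, uint64_t chunk_mb)
{
    assert(chunk_mb > 0);

    uint64_t chunk_bytes = chunk_mb * 1024 * 1024;

    /* ceiling division without risking overflow in the numerator */
    return file_size / chunk_bytes + (file_size % chunk_bytes != 0 ? 1 : 0);
}
```

    Under these assumptions, a 100 GiB file splits into 102400 chunks at 1 MiB and 6400 chunks at 16 MiB.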

    So the improvement seems to be tangible, and changes to the system to be minimal. ORNL is putting these changes into further testing.

    Please let me know if this is something of interest to merge.



    opened by fwang2 16
  Inclusion of parallel compression and decompression based on bzip2

    I am working with Dr. Feiyi Wang and am in the process of developing parallel compression and decompression using libcircle and bzip2. Could I have permission to merge my changes? There are also changes to enable a user to use dtar with compression and/or decompression. This makes some changes to dtar, but will not affect simple dtar or untar. It will use an additional -j flag to enable compression.

    opened by ahanagemini 15
  [Request] Add exclude option to dtar

    I would like to be able to exclude files following a pattern, like tar does:

    mpirun -np 128 dtar --verbose --exclude="n*" --create --file /tmp/toto.tar *

    This is a feature request for dtar.

    opened by cessenat 0
  Feature request : dtar append mode

    When one runs dtar inside Slurm for instance, and when the allocated time is not sufficient, the tar file is not complete. I would be happy to have some append mode, where repeating the same command exactly would complete the tar.

    opened by cessenat 0
  dcp: User ID and Group ID

    As part of Near Node Flash (NNF) Data Movement implementation, there is the need to specify the User ID and Group ID when running dcp (explained here).

    For dcp, this work seems pretty straightforward to add with -U [uid] and -G [gid].

    For example: dcp -U 1001 -G 1002 file.out

    Does this work for everyone? Any other comments before I begin?

    opened by NateThornton 1
  ddup hashing and openssl 3.x

    Was looking through the code to see if there was a less computationally expensive way to handle duplicate files (SHA256 is unnecessary IMO, but that's a separate issue/PR) and as it turns out openssl 3.x drops support for the low level interface to hash functions.

    Considering openssl 1.1.1 is EOL in under a year, this may be worth looking at. From what I'm reading, the correct functions are in evp.h per

    opened by theAeon 0
  Add O_NOATIME to the flags for file open so the source file's atime i…

    …s preserved (if the platform supports it). Signed-off-by: Doug Johnson [email protected].

    Lightly tested dsync on Linux (Red Hat 7.9). I'm not sure this covers everything needed to reliably avoid changing the source file's atime. I went off a comment at src/common/mfu_flist.h:29, where the _GNU_SOURCE define appears to have been intended for this purpose. I checked the open and closed issues and pull requests to see if this was ever discussed; surprisingly, it was not. This feature would be useful for those who migrate data and use atime for purging or migration purposes: the source file's atime needs to be preserved in the interregnum between the initial copy, subsequent invocations of dsync, and the cutover to the new location of the data.
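
    A minimal sketch of the idea, assuming a hypothetical helper named open_source_noatime. O_NOATIME is Linux-specific and needs _GNU_SOURCE to be defined before the system headers are included; on Linux, open() fails with EPERM when the caller neither owns the file nor has CAP_FOWNER, so the sketch retries without the flag in that case. This is an illustration, not the code in the pull request.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes O_NOATIME in <fcntl.h> on Linux */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a source file read-only, asking the kernel
 * not to update its atime where the platform and permissions allow it.
 * Returns a file descriptor, or -1 with errno set. */
static int open_source_noatime(const char* path)
{
    assert(path != NULL);

#ifdef O_NOATIME
    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd >= 0 || errno != EPERM) {
        return fd;
    }
    /* EPERM: not the owner and no CAP_FOWNER, fall through and retry */
#endif

    return open(path, O_RDONLY);
}
```

    When the fallback path is taken, reads through the returned descriptor can still update atime, subject to the file system's mount options.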

    opened by dpjohnson 0
  • v0.11.1(Feb 5, 2022)

    New features:

    • Release tarball to package mpiFileUtils with appropriate versions of LWGRP, DTCMP, and libcircle to simplify builds. See
    • dcp, dsync: added --xattrs={none, all, non-lustre, libattr} option for better control when copying extended attributes. Lustre extended attributes are no longer copied by default. Copying of extended attributes is now independent of the --preserve option to copy owner, group, permissions, and timestamps.

    New DAOS features:

    • New daos-serialize and daos-deserialize tools for moving containers to/from HDF5 format.
    • Support for paths formatted as daos://<pool>/<cont>/[path]

    Bug Fixes:

    • libmfu: incorrect item count in mfu_flist_copy progress message
    • libmfu: segfault in strmap


    • liblwgrp v1.0.4
    • libdtcmp v1.1.4
    • libcircle v0.3
    • libarchive v3.5.1
    mpifileutils-v0.11.1.tgz(555.61 KB)
  • v0.11(Feb 1, 2021)

    New features:

    • libmfu: mfu_flist_archive function to create and extract tar (tape archive) files
    • libmfu: updated default I/O buffer and chunk sizes from 1MB to 4MB
    • libmfu: define major, minor, patch versioning on
    • dcmp, dcp, dsync: renamed --synchronous to --direct as the option to enable O_DIRECT
    • dcmp, dsync: added --chunksize and --blocksize options to size work units when slicing files
    • dcp: log more errors and return 1 from main on error
    • dcp, dsync, dwalk: added --dereference option for dereferencing symbolic links
    • dtar: promoted from experimental to production tool

    New DAOS features:

    • dcmp, dcp, dsync: added DAOS support for DAOS POSIX containers
    • dcp: added DAOS support for generic DAOS containers
    • see for documentation on DAOS usage

    Bug Fixes:

    • libmfu: improved O_DIRECT buffer and file offset alignments when copying files
    • libmfu: compile bug to define SYS_getdents on aarch64 systems


    • liblwgrp v1.0.3
    • libdtcmp v1.1.1
    • libarchive v3.5.1
    mpifileutils-v0.11.tgz(594.61 KB)
  • v0.10.1(Jul 6, 2020)

    This is primarily a bug fix release of v0.10. Among other improvements, it includes fixes to support newer Lustre, Open MPI, and GCC versions.

    New features:

    • dchmod: check for CAP_FOWNER and CAP_CHOWN to reduce need for --force option

    Bug fixes:

    • libmfu: drop dead Lustre-specific code that prevented compilation with Lustre v2.13 and newer
    • libmfu: convert deprecated MPI functions to use newer MPI_Comm_create_keyval, MPI_Comm_set_attr, and MPI_Comm_get_attr
    • libmfu: support empty lists in mfu_flist_sort
    • libmfu: switch from "external32" to "native" MPI I/O data representation to better support Open MPI v4.0.3, and add error checking around MPI I/O calls
    • libmfu: pad short path names with null when writing binary cache files
    • libmfu: avoid multiple definitions of global variables to allow easier builds with the GCC v10.1 compiler
    • libmfu: update file offset alignment, buffer alignment, and transfer size for improved O_DIRECT support in mfu_flist_copy
    • dfilemaker: correct usage message
    • dfind: support file names containing spaces in --exec {} substitutions and avoid seg fault when missing terminating ;
  • v0.10(Jan 28, 2020)

    New features:

    • libmfu: reduced verbosity of debug messages when copying GPFS ACLs
    • libmfu: tweaked lite walk progress to count directories after reading them, rather than discovering them
    • libmfu: include item rate in walk progress messages
    • libmfu: added mfu_progress_start/update/complete functions to periodically print progress information
    • progress messages enabled in dchmod, dcmp, dcp, dreln, drm, dstripe, dsync, dwalk, see new --progress option for more details
    • dchmod: enable --user and --group options to accept numeric user id and group id values in addition to names
    • dchmod: added algorithm to avoid stat on walk when stat info is not needed
    • dchmod: skip chown/chmod calls on items that do not need to be changed
    • dchmod: added --force option to always call chown/chmod
    • dchmod: added --silent option to suppress EPERM error messages
    • dcp, dcp1: added --chunksize and --blocksize options to control slicing of large files during copy
    • drm: added --stat, --text, and --output options to record list of files drm attempts to remove
    • dsync: added --link-dest option to create hardlinks to conserve storage space and inodes during incremental backups
    • dsync: added --sparse option to write sparse files

    Bug Fixes:

    • libmfu: fix call to segmented scan leading to false positives and false negatives when detecting file content differences in dcmp and dsync --contents
    • dfind: mask file mode with S_IFMT instead of file type to avoid collisions between different file types, previously -type f would also return symlinks


    • libcircle v0.3 or higher
  • v0.9.1(Apr 17, 2019)

    This is primarily a bug fix release for v0.9, but we also promote dbz2 to be a released tool.

    New features:

    • libmfu: added functions for parallel compression / decompression of dbz2 files
    • dbz2: compress and decompress a large file in parallel. Thanks to Ahana Roy Choudhury for contributing the original dbz2 implementation using libcircle for parallel compression and decompression and for proposing the original dbz2 file format that facilitates parallel decompression.
    • dwalk: --file_histogram option prints a default size histogram
    • dwalk: --text option is now documented. Use this option with --output to create a text file.
    • per common user request, tools now verbose by default, use new --quiet option to silence
    • simplified informational messages to be more concise
    • many tools have had their usage and output text updated

    Bug fixes:

    • cmake: rpaths now supported in both build and install directories
    • cmake: tests added to enable Lustre APIs
    • cmake: add path to find FindGPFS.cmake when using -DENABLE_GPFS=ON
    • libmfu: fixed bug in cache file format that was introduced in v0.9
    • dbcast: fixed to no longer create target directory if it already exists
    • dfind: output file was mistakenly hardcoded to write to text format
    • dwalk: --distribution option led to a hang when using incorrect syntax
  • v0.9(Jan 28, 2019)

    We've officially converted to CMake! Instructions for building are here:

    New Features:

    • dcmp: include nanoseconds when comparing timestamps
    • dcmp: new --lite option to compare files based on file type, file size, and modification time rather than file content
    • drm: new --aggressive option to delete files while walking
    • dsync: default behavior no longer deletes files at the destination, deleting now requires new --delete option
    • dsync: optionally copies in batches with --batch-files option as form of self-checkpointing long running dsync jobs
    • drm: fix segfault when deleting a large number of files
    • libmfu: avoid problematic MPI I/O external32 for more consistent file format
    • libmfu: support for GPFS ACLs

    New Tools:

    • dfind: filters file list based on different criteria
    • dreln: update symlinks whose targets use absolute paths, useful after dsync
  • v0.8.1(Sep 25, 2018)

  • v0.8(Aug 31, 2018)

    New features:

    • dchmod: added --owner option to change user on files
    • dchmod: fixed bug when using 'a' option in symbolic notation
    • dcmp: added expressions to compare permissions and ACL on files
    • dcp: fix bug that prevented some subdirectories from being created
    • dcp1: original dcp from LANL, may be faster than dcp for directory trees with lots of small files, a similar algorithm to be merged into dcp in future release
    • dfilemaker: updated to create multiple directory levels, files of different sizes with random content, and symlinks
    • dfilemaker1: dfilemaker from v0.7, to be merged into dfilemaker in future release
    • dstripe: added support for Lustre 2.5
    • dsync: new tool to synchronize one directory tree with another (good for backups or completing partial copies)
    • libmfu: added API calls to define new list elements and set their properties
    • libmfu: added function to write file list to text file

    New experimental tools (work in progress):

    • dbz2: compress a single file with bz2
    • dfind: filter file lists with find-like tests
    • dgrep: parallel grep
    • dtar: parallel tar (incomplete)

    Known bugs:

    • dtar uses libarchive that assumes a single process is writing the tar archive. It likely will generate corrupt tar archives if run with more than one process. This will be fixed in a future release.
    • The binary format for reading and writing filelists to files uses MPI I/O external32 data representation. This has proved to be buggy across MPI implementations. In a future release, mpiFileUtils will be changed to read/write these files with POSIX I/O. If a work around is needed, one may change "external32" to "native" in src/common/mfu_flist_io.c.
    • dsync --contents has a bug when computing its offset during lseek for overwriting an existing file (fixed in v0.8.1)

    Updated dependency:

    • Requires update of DTCMP from v1.0.3 to v1.1.0 for new DTCMP_Segmented_scanv/exscanv calls.

    md5sum: 1082600e7ac4e6b2c13d91bbec40cffb

    mpifileutils-0.8.tar.gz(652.25 KB)
  • v0.7(Jun 23, 2017)

    • dbcast: now creates destination directory before broadcasting file
    • dbcast: added --size option to set file segment size
    • dchmod: process umask if user provides no ugoa letter in symbolic notation
    • dcmp: added --sync option to synchronize source and target directory trees
    • dsh: added support for wildcard filters, limit ls output to 100 items by default
    • dsh: added --output and --file options to save modified flist on exit
    • dstripe: now recursively processes files under a directory
    • dstripe: added --report to print file striping info
    • dstripe: added --count and --size options to set stripe parameters
    • dstripe: added --minsize option to only process files above a certain size
    • libmfu: moved copy logic from dcp to new mfu_flist_copy routine
    • libmfu: write flist cache files using multiple stripes, one per MPI rank
    • configure: fixed --enable-experimental builds

    Known bugs:

    • using an input list with dcp is broken in v0.7, but this is patched on the main branch

    md5sum: c081f7f72c4521dddccdcf9e087c5a2b

    mpifileutils-0.7.tar.gz(545.19 KB)
  • v0.6(May 11, 2017)

    This is an intermediate release along the roadmap for v1.0.

    • dchmod: added dchmod to set access permissions and group
    • added node2 test suite and integrated with
    • dcmp: parallelized compare for large data files (now in addition to large directories)
    • dcp: promote dcp2 to dcp, renamed dcp to dcp1
    • renamed common library from libbayer to libmfu (mpiFileUtils)
    • moved significant code to libmfu
    • numerous bug fixes and other enhancements
    mpifileutils-0.6.tar.gz(595.89 KB)
  • v0.0.1-alpha.3(Jan 14, 2014)

  • v0.0.1-alpha.2(Dec 13, 2013)

    • A basic system for building man pages.
    • Configure time options to enable or disable experimental utilities.
    • Many small documentation fixes.
    • Initial travis-ci configuration.
    • Minor build warning fixes.
    • Initial stubs for nose test suite and dfilemaker.
  • v0.0.1-alpha.1(Dec 6, 2013)

    • Initial tag, since a good portion of the codebase compiles with autotools.
    • This release has not been tested at all. Please don't expect it to work well.
