Releases: ttkb-oss/dedup
dedup-0.0.6
Full Changelog: release-0.0.5...release-0.0.6
`dedup` now supports `-l`/`--link` and `-s`/`--symlink` options, which use hard links and symlinks, respectively, instead of cloned blocks. There are a few use cases where this makes sense: file systems (like HFS+) that support links but not `clonefile(2)`, integration with tools like `tar(1)` that understand multiple entries pointing to the same inode but not files with the same CLONEID, etc. These options are not as well tested as the `clonefile(2)` path and bypass some previous checks that are no longer relevant (e.g. file system support for clones).
Important:
This is a pre-release build. There are no known bugs which cause data loss, but that doesn't mean there aren't unknown bugs that might cause data loss. While the author has not experienced any data loss with this release, and care has been taken to prevent data loss from occurring, users should understand that data loss could occur due to unforeseen circumstances.
`dedup` is only recommended for:
- Data that is backed up
- Data that can be recreated
- Data that can be lost due to unforeseen circumstances
dedup-0.0.5
Full Changelog: release-0.0.4...release-0.0.5
- Resolves #1: `clonefile(2)` Creates Files with Different `ATTR_CMNEXT_CLONEID`
- Resolves #2: `copyfile(3)` Replaces Data with Broken HFS Compress Resource Fork
`dedup` now handles files using HFS transparent compression more gracefully. dedup-0.0.4 would identify compressed files by how they were mangled by `copyfile(3)`, but dedup-0.0.5 checks flags prior to any mangling and avoids using a compressed file as the source of other clones. Because of the way compressed files work and how APFS shares data blocks, it is not possible to clone compressed files that share data. Compressed files will be replaced by clones if there is at least one uncompressed matching file.
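A sketch of the kind of flag check described above (assumed, not necessarily dedup's exact code): on macOS, files stored with transparent compression carry the `UF_COMPRESSED` flag in `st_flags`, and such a file should not be chosen as the source of new clones.

```c
/*
 * macOS-specific check: a file with UF_COMPRESSED set is stored with
 * HFS/APFS transparent compression and should not be used as a clone
 * source.
 */
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

static bool is_transparently_compressed(const char *path) {
    struct stat st;
    if (lstat(path, &st) != 0) {
        perror("lstat");
        return false;   /* treat unreadable files as not compressed */
    }
    return (st.st_flags & UF_COMPRESSED) != 0;
}
```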
Important:
This is a pre-release build. There are no known bugs which cause data loss, but that doesn't mean there aren't unknown bugs that might cause data loss. While the author has not experienced any data loss with this release, and care has been taken to prevent data loss from occurring, users should understand that data loss could occur due to unforeseen circumstances.
`dedup` is only recommended for:
- Data that is backed up
- Data that can be recreated
- Data that can be lost due to unforeseen circumstances
dedup-0.0.4
Full Changelog: release-0.0.3...release-0.0.4
A critical bug related to lazy hashing is fixed in this release. When more than two files had the same [device, size, first byte, last byte] tuple, the SHA-256 wasn't populated on any file after the second. This resulted in a significant number of files with an "empty" SHA-256 pattern that matched all of the other files with the tuple, regardless of contents.
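To make the lazy-hashing behavior concrete, here is an illustrative sketch (hypothetical structures and helper, not dedup's actual ones): files are grouped by the cheap [device, size, first byte, last byte] tuple, and SHA-256 digests are computed only for groups with more than one member. The fix amounts to hashing every member of such a group, not just the first two.

```c
#include <stdbool.h>
#include <stddef.h>

struct file_entry {
    const char   *path;
    bool          hashed;          /* has sha256[] been filled in? */
    unsigned char sha256[32];
};

/* Assumed helper that reads `path` and fills `out[32]`. */
extern int sha256_file(const char *path, unsigned char out[32]);

static void hash_candidate_group(struct file_entry *group, size_t n) {
    if (n < 2)
        return;                    /* unique tuple: never pay for a hash */
    for (size_t i = 0; i < n; i++) {
        if (!group[i].hashed && sha256_file(group[i].path, group[i].sha256) == 0)
            group[i].hashed = true;
    }
}
```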
While most allocations are needed for most of the duration of the application, leaving cleanup to program termination makes future use of several functions impractical in longer-running applications. All dynamically allocated memory is now freed prior to program termination.
Some use of `mmap(2)` was replaced with `fopen`/`ftell`/`fgetc`. There is a race between `fstat` and `mmap` when the size of the file changes, making the size passed to `mmap` incorrect. If the file becomes smaller, reading further into the file than has been mapped results in a `SIGBUS`. Reading the file the old-fashioned POSIX way is still subject to the race, but will not raise a signal. As mentioned below, it is not recommended to run `dedup` on a directory tree that may be changing while `dedup` is running. Options to exclude files may make this more practical in the future.
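The stdio-based approach might look like the following sketch (an assumed helper, not dedup's exact code): read the first and last byte with `fopen`/`fseek`/`fgetc`, so a file that shrinks mid-read produces an `EOF` return value rather than a `SIGBUS`.

```c
#include <stdio.h>

/* Read the first and last byte of a file without mmap(2). */
static int read_first_last(const char *path, int *first, int *last) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    *first = fgetc(f);                    /* EOF for an empty file */

    int rc = fseek(f, -1L, SEEK_END);     /* seek to the final byte */
    *last = (rc == 0) ? fgetc(f) : EOF;

    fclose(f);
    return (*first == EOF || *last == EOF) ? -1 : 0;
}
```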
Important:
This is a pre-release build. There are no known bugs which cause data loss, but that doesn't mean there aren't unknown bugs that might cause data loss. While the author has not experienced any data loss, and care has been taken to prevent data loss from occurring, users should understand that data loss could occur due to unforeseen circumstances. At this point `dedup` is only recommended for:
- Data that is backed up
- Data that can be recreated
- Data that can be lost due to unforeseen circumstances
There are two known bugs that will not cause data loss, but do result in unexpected behavior (duplicates may not be cloned) and unexpected accounting (clones do not use the same clone ID):
- #1: `clonefile(2)` Creates Files with Different `ATTR_CMNEXT_CLONEID`
- #2: `copyfile(3)` Replaces Data with Broken HFS Compress Resource Fork

How does `dedup` mitigate failures that may result in data loss? Clones are created alongside the file that is going to be replaced. After sanity checks (especially related to #2), the original file is atomically swapped with the clone.
dedup-0.0.3
Warning:
This release contains a bug that could result in data loss. Artifacts have been removed. This bug was introduced in 51e677a and resolved in b43477f.
Full Changelog: release-0.0.2...release-0.0.3
Lazy hashing significantly reduces the time it takes to process trees containing large, unique files.
dedup-0.0.2
Full Changelog: release-0.0.1...release-0.0.2
dedup-0.0.1
Full Changelog: release-0.0.0...release-0.0.1
Cleaned up documentation.
dedup-0.0.0
Full Changelog: https://github.com/ttkb-oss/dedup/commits/release-0.0.0
Initial release.