How freedup "extra styles" work
This concept was introduced in version 1.1 because I wanted files to be
linked even though they differed. I am talking about mp3 files whose tags showed minor
variations. At first I considered retagging all files, but I would have had to either
remove all tags or complete all of them (n.b. the ID3v1 tag, called mp3v1 below, sits at
the end of an mp3 file and the ID3v2 tag, called mp3v2 below, at the beginning; both are optional).
An extra style therefore compares only the essential file content, i.e. the
MPEG-encoded sound data in the case of mp3 files. Currently the following
rules are established:
- mp3 strips the mp3v1 and mp3v2 tags and provides comparison of the remaining body.
- mp4 strips all sections up to the first mdat section, and everything
from the first non-mdat section after it onward. This should work for
iPod files, AAC/FAAC encoded sounds, and files that usually have extensions like MP4, M4A, M4V, etc.
- mpc strips the APETAGEX-labelled tail from Musepack audio files.
- ogg strips all information up to the sequence "vorbis.BCV", where the dot stands for an arbitrary byte.
Minor trailing information (less than 128 bytes) is also cut off.
- jpg tries to strip the comments at the beginning of each file. Since some comments
were found after the quantization table, that table is stripped, too.
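As an illustration of the mp3 rule above, here is a minimal sketch of how the body range could be located. It is not freedup's actual code: the function name mp3_body_range is hypothetical, it works on a file image already held in memory, and it assumes the tags follow the usual ID3v2 (10-byte header with a 7-bit-per-byte size) and ID3v1 (fixed 128-byte trailer starting with "TAG") layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the freedup sources): compute the byte
 * range [*start, *end) of the audio body inside an in-memory mp3 image. */
static void mp3_body_range(const unsigned char *buf, size_t len,
                           size_t *start, size_t *end)
{
    assert(buf != NULL && start != NULL && end != NULL);
    *start = 0;
    *end = len;

    /* Leading tag: "ID3", 2 version bytes, 1 flag byte, 4 size bytes.
     * The size field uses only the low 7 bits of each byte. */
    if (len >= 10 && memcmp(buf, "ID3", 3) == 0) {
        size_t body = ((size_t)(buf[6] & 0x7f) << 21) |
                      ((size_t)(buf[7] & 0x7f) << 14) |
                      ((size_t)(buf[8] & 0x7f) << 7)  |
                       (size_t)(buf[9] & 0x7f);
        size_t total = 10 + body;
        if (buf[5] & 0x10)          /* footer flag: 10 more bytes */
            total += 10;
        if (total <= len)           /* ignore a tag that claims too much */
            *start = total;
    }

    /* Trailing tag: fixed 128 bytes beginning with "TAG". */
    if (*end - *start >= 128 && memcmp(buf + *end - 128, "TAG", 3) == 0)
        *end -= 128;
}
```

Two files would then count as equal under this style when the bytes between start and end match, regardless of what the tags contain.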
Since exactly one method exists for each file type (this might change in the future),
an automated mode will call the respective method according to the file magic.
The names of the files do not matter.
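A rough sketch of such a magic-based selection follows. The struct layout, the table name styles and the function style_for are hypothetical and need not resemble the real extra[] table in auto.c; the magic values are the commonly seen signatures of these formats, and which of them freedup really checks is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; the real table in auto.c may look different. */
struct style_entry {
    const char *name;       /* style name as used in this text */
    size_t      offset;     /* where the magic bytes are expected */
    const char *magic;      /* bytes to match */
    size_t      magic_len;  /* number of bytes to match */
};

/* Common signatures only. An mp3 file without a leading tag starts with
 * a frame sync instead of "ID3" and is not covered by this sketch. */
static const struct style_entry styles[] = {
    { "mp3", 0, "ID3",          3 },
    { "ogg", 0, "OggS",         4 },
    { "jpg", 0, "\xff\xd8\xff", 3 },
    { "mp4", 4, "ftyp",         4 },
    { "mpc", 0, "MP+",          3 },
};

/* Return the style name for the first bytes of a file, or NULL if no
 * entry matches. The file name is never consulted. */
static const char *style_for(const unsigned char *head, size_t len)
{
    assert(head != NULL);
    for (size_t i = 0; i < sizeof styles / sizeof styles[0]; i++) {
        const struct style_entry *s = &styles[i];
        if (len >= s->offset + s->magic_len &&
            memcmp(head + s->offset, s->magic, s->magic_len) == 0)
            return s->name;
    }
    return NULL;
}
```

A table like this keeps the selection in one place, so supporting a new file type comes down to one more line plus the style's own source file.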
Please note that these styles change the behaviour according to the file contents.
They change the size of the compared contents, but this does not affect the options
that relate to the files themselves, like ownerships or file names.
If you would like to contribute, this is quite simple. There are source files for each style.
Start with a copy of my.c and my.h. Rename the functions, fill in your way of determining
the irrelevant bytes at the start and at the end, as well as a way to find the size and the magic.
Add a matching line to the extra[] table in auto.c, compile, test and submit it to me.
How freedup works
- scan all directory trees recursively for all regular files
- build a list of those files and keep their name, lstat() and arg position
- sort the files by comparing their sizes using qsort()
- if two files turn out to have equal size,
additional properties are compared
- most property checks have to be added using command line options
- if all demands are fulfilled, the files are compared block by block (4k)
- if both files are identical and on the same file system, one of them is
renamed, replaced by a hard link to the other, and the renamed file is removed.
- if hard linking is not possible, soft links are tried,
unless one of the paths does not start at root (but this can be forced)
- sorting is repeated; the reason why this is needed has not been checked yet
- finally a short report is delivered