Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-19 Thread Kamal Mostafa
> On Mon, Jan 16, 2012 at 12:58:13PM -0800, Kamal Mostafa wrote: > > * Package name: duff > > * URL : http://duff.sourceforge.net/ On Tue, 2012-01-17 at 09:56 +0100, Simon Josefsson wrote: > If there aren't warnings about use of SHA1 in the tool, there should > be. While I don't re
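Simon's concern is that a matching SHA1 digest on its own does not prove two files are identical. One common mitigation is to use a digest only to nominate candidates and then confirm each candidate group byte-by-byte. The sketch below shows that approach. It is not taken from duff's source. It swaps in SHA-256, relies only on Python's standard `hashlib` and `filecmp` modules, and uses hypothetical function names (`sha256_of`, `hash_then_verify`).

```python
import filecmp
import hashlib
from collections import defaultdict


def sha256_of(path, block_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def hash_then_verify(paths):
    """Group files by digest, then confirm every group byte-by-byte.

    A shared digest only nominates candidates. filecmp.cmp(shallow=False)
    does the final content comparison, so a hash collision alone cannot
    make two different files be reported as duplicates.
    """
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[sha256_of(p)].append(p)

    groups = []
    for candidates in by_digest.values():
        remaining = list(candidates)
        # Split each digest group into sets of files with truly equal content.
        while len(remaining) > 1:
            ref = remaining.pop(0)
            same = [p for p in remaining if filecmp.cmp(ref, p, shallow=False)]
            remaining = [p for p in remaining if p not in same]
            if same:
                groups.append([ref] + same)
    return groups
```

Hashing reads each file once. The verification pass costs extra reads only for files whose digests already match.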

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Andy Smith
Hello, On Tue, Jan 17, 2012 at 09:12:58AM +, Lars Wirzenius wrote: > rdfind seems to be quickest one, but duff compares well with hardlink, > which (see http://liw.fi/dupfiles/) was the fastest one I knew of in > Debian so far. Does anyone know of a duplicate file finder that can keep its dat

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Johan Henriksson
> > Ah, right. So you'll start writing yet another tool? ;) > > I've implemented pretty much that (http://liw.fi/dupfiles), but my > duplicate file finder is not so much better than existing ones in > Debian that I would inflict it on Debian. But the algorithm works > nicely, and works even for peo

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Samuel Thibault wrote on Tue 17 Jan 2012 14:02:45 +0100: > On my PhD work directory, with various stuff in it (500MiB, 18000 files, > big but also small files (svn/git checkouts etc)), everything being in > cache already (no disk I/O): > > hardlink -t --dry-run . > /dev/null 1,06s user

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Tue, Jan 17, 2012 at 02:05:10PM +0100, Samuel Thibault wrote: > Roland Mas wrote on Tue 17 Jan 2012 13:41:23 +0100: > > Samuel Thibault, 2012-01-17 12:03:41 +0100 : > > > > [...] > > > > > I'm not sure to understand what you mean exactly. If you have even > > > just a hundred files of the

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Roland Mas wrote on Tue 17 Jan 2012 13:41:23 +0100: > Samuel Thibault, 2012-01-17 12:03:41 +0100 : > > [...] > > > I'm not sure to understand what you mean exactly. If you have even > > just a hundred files of the same size, you will need ten thousand file > > comparisons! > > I'm sure th
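Samuel's estimate follows from the pairwise approach. If every file in a same-size group of n files is compared with every other file in that group, the number of comparisons grows quadratically with n:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
\binom{100}{2} = \frac{100 \cdot 99}{2} = 4950
```

A hundred same-size files therefore need 4,950 comparisons of distinct pairs, or 9,900 if each pair is checked in both directions. Either figure is on the order of the n^2 = 10,000 quoted.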

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Samuel Thibault wrote on Tue 17 Jan 2012 12:15:16 +0100: > Lars Wirzenius wrote on Tue 17 Jan 2012 10:45:20 +: > > On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: > > > Lars Wirzenius wrote on Tue 17 Jan 2012 09:12:58 +: > > > > real user system max RSS e

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Roland Mas
Samuel Thibault, 2012-01-17 12:03:41 +0100 : [...] > I'm not sure to understand what you mean exactly. If you have even > just a hundred files of the same size, you will need ten thousand file > comparisons! I'm sure that can be optimised. Read all 100 files in parallel, comparing blocks of s
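Roland's suggestion is to read all same-size files in parallel and compare them block by block, which avoids the pairwise loop. A minimal sketch of that idea follows. The function name `partition_by_content` and the 64 KiB block size are illustrative assumptions. The sketch also assumes every path in the group has the same size and that the process may keep the whole group open at once.

```python
from collections import defaultdict


def partition_by_content(paths, block_size=65536):
    """Split same-size files into groups with identical content.

    Files in a group are read in lockstep, one block at a time. Whenever
    their current blocks differ, the group is split by block value, so
    each file is read at most once and no pairwise comparison is needed.
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        pending = [list(paths)]  # groups whose files are still equal so far
        result = []
        while pending:
            group = pending.pop()
            if len(group) < 2:
                continue  # a single file cannot have a duplicate
            # Bucket the group's files by the content of their next block.
            buckets = defaultdict(list)
            for p in group:
                buckets[handles[p].read(block_size)].append(p)
            for block, members in buckets.items():
                if len(members) < 2:
                    continue  # this file differs from every other one
                if block == b"":
                    result.append(members)  # reached EOF with equal content
                else:
                    pending.append(members)  # still equal; keep reading
        return result
    finally:
        for f in handles.values():
            f.close()
```

Memory use stays bounded by one block per file in the group being examined. Very large groups may hit the per-process open-file limit, so a real tool would need to batch them.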

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote on Tue 17 Jan 2012 10:45:20 +: > On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: > > Lars Wirzenius wrote on Tue 17 Jan 2012 09:12:58 +: > > > real user system max RSS elapsed cmd > > > > > > (s) (s)

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote on Tue 17 Jan 2012 10:45:20 +: > > > Personally, I would be wary of using checksums for file comparisons, > > > since comparing files byte-by-byte isn't slow (you only need to > > > do it to files that are identical in size, and you need to read > > > all the files any
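Lars's suggestion combines a cheap size filter with an exact byte-by-byte comparison and uses no checksums at all. It can be sketched as follows. The function names are hypothetical, and only Python's standard library is used.

```python
import os
from collections import defaultdict


def group_by_size(paths):
    """Bucket files by size; only buckets with 2+ files can hold duplicates."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    return [group for group in by_size.values() if len(group) > 1]


def same_bytes(a, b, block_size=65536):
    """Compare two files block by block, stopping at the first difference."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ba, bb = fa.read(block_size), fb.read(block_size)
            if ba != bb:
                return False
            if not ba:  # both files exhausted with no difference found
                return True


def find_duplicates(paths):
    """Size filter first, then exact byte-by-byte comparison."""
    result = []
    for group in group_by_size(paths):
        remaining = list(group)
        while len(remaining) > 1:
            ref = remaining.pop(0)
            same = [p for p in remaining if same_bytes(ref, p)]
            remaining = [p for p in remaining if p not in same]
            if same:
                result.append([ref] + same)
    return result
```

Files with a unique size are never opened. Within a size group, comparison usually stops at the first differing block, so non-duplicates tend to be rejected cheaply.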

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Tue, Jan 17, 2012 at 10:30:20AM +0100, Samuel Thibault wrote: > Lars Wirzenius, le Tue 17 Jan 2012 09:12:58 +, a écrit : > > real user system max RSS elapsed cmd > > (s) (s) (s)(KiB) (s) > > 3.2

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Samuel Thibault
Lars Wirzenius wrote on Tue 17 Jan 2012 09:12:58 +: > real user system max RSS elapsed cmd > (s) (s) (s)(KiB) (s) > 3.2 2.4 5.862784 5.8 hardlink --dry-run files > /dev/null >

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-17 Thread Lars Wirzenius
On Mon, Jan 16, 2012 at 12:58:13PM -0800, Kamal Mostafa wrote: > * Package name: duff > * URL : http://duff.sourceforge.net/ A quick speed comparison: real user system max RSS elapsed cmd (s) (s) (s)(KiB) (s)

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread martin f krafft
Thus spoke Kamal Mostafa [2012.01.17.0049 +0100]: > In my humble opinion, that would be an unreasonable pre-condition for > inclusion in Debian. Our standard for inclusion should not be that a > new package must be "vastly better" than other similar packages. That > would deny a new package the

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Kamal Mostafa
On Mon, 2012-01-16 at 23:07 +0100, Joerg Jaspert wrote: > >> What is it the benefit over fdupes, rdfind, ...? > > ..., hardlink, ... > finddup from perforate After a quick evaluation of the various "find dupe files" tools, I was attracted to try duff because: 1. It looked easier to use than the o

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Joerg Jaspert
>> What is it the benefit over fdupes, rdfind, ...? > ..., hardlink, ... finddup from perforate > Was thinking about packaging it myself already, so I may also sponsor > Kamal's package when it's ready. You just listed the third duplicate (and me no. 4), and still go blind right on "ohoh, i spon

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Axel Beckert
Hi, Samuel Thibault wrote: > > * Package name: duff > > Version : 0.5 > > Upstream Author : Camilla Berglund > > * URL : http://duff.sourceforge.net/ > > * License : Zlib > > Programming Lang: C > > Description : Duplicate file finder > > > > Duff is a

Re: Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Samuel Thibault
Kamal Mostafa wrote on Mon 16 Jan 2012 12:58:13 -0800: > Package: wnpp > Severity: wishlist > Owner: Kamal Mostafa > > > * Package name: duff > Version : 0.5 > Upstream Author : Camilla Berglund > * URL : http://duff.sourceforge.net/ > * License : Zlib >

Bug#656142: ITP: duff -- Duplicate file finder

2012-01-16 Thread Kamal Mostafa
Package: wnpp Severity: wishlist Owner: Kamal Mostafa * Package name: duff Version : 0.5 Upstream Author : Camilla Berglund * URL : http://duff.sourceforge.net/ * License : Zlib Programming Lang: C Description : Duplicate file finder Duff is a comman