Gregory Piñero wrote:
> Wow, that looks excellent. I'll definitely try it out. I'm assuming
> this is an existing project, i.e. you didn't write it after reading
> this thread?
Yes, it is an existing project of course ;)
Right now I've no time to improve it.
I hope that later this summer I
Wow, that looks excellent. I'll definitely try it out. I'm assuming
this is an existing project, i.e. you didn't write it after reading
this thread?
-Greg
On 2/9/06, name <[EMAIL PROTECTED]> wrote:
> Gregory Piñero wrote:
> :
> > If anyone would be kind enough to improve it I'd love to have these
> > features but I'm swamped this week!
Gregory Piñero wrote:
:
> If anyone would be kind enough to improve it I'd love to have these
> features but I'm swamped this week!
>
> - MD5 checking to find exact matches regardless of name
> - Put each set of duplicates in its own subfolder.
Done? http://pyfdupes.sourceforge.net/
Bye,
l
Diez B. Roggisch wrote:
> I did a levenshtein-fuzzy-search myself; however, I enhanced my version by
> normalizing the distance the following way:
>
> def relative(a, b):
>     """
>     Computes a relative distance between two strings. It's in the range
>     (0-1] where 1 means total equality.
>
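The quoted function is cut off above. A minimal sketch of a plausible
completion, assuming the normalization Diez describes (the raw Levenshtein
distance scaled against the string lengths), with a hand-rolled distance()
helper standing in for whatever implementation the posters are using:

def distance(a, b):
    # Plain dynamic-programming Levenshtein distance.
    m, n = len(a), len(b)
    cur = list(range(n + 1))
    for i in range(1, m + 1):
        prev, cur = cur, [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                          # deletion
                         cur[j - 1] + 1,                       # insertion
                         prev[j - 1] + (a[i - 1] != b[j - 1])) # substitution
    return cur[n]

def relative(a, b):
    """
    Computes a relative distance between two strings. It's in the range
    (0-1] where 1 means total equality.
    """
    longer = float(max(len(a), len(b)))
    shorter = float(min(len(a), len(b)))
    # Scale the raw distance by the longer length, then damp scores for
    # strings of very different lengths; the exact formula is an
    # assumption, since the original body never made it into the quote.
    return ((longer - distance(a, b)) / longer) * (shorter / longer)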
On Thu, 1 Feb 2006, it was written:
> Tom Anderson <[EMAIL PROTECTED]> writes:
>
>>> The obvious way is to make a list of hashes, and sort the list.
>>
>> Obvious, perhaps; prudent, no. To make the list of hashes, you have to
>> read all of every single file first, which could take a while. If your
Steven D'Aprano wrote:
> This isn't a criticism, it is a genuine question. Why do people compare
> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
> a caching thing (once you have the checksum, you don't need to read the
> file again)? Are there any other reasons?
Becau
Steven D'Aprano <[EMAIL PROTECTED]> writes:
> Sure. But if you are just comparing two files, is there any reason to
> bother with a checksum? (MD5 or other.)
No, of course not, except in special situations, like some problem
opening and reading both files simultaneously. E.g.: the files are on
two
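For the plain two-file case the standard library already does the
byte-to-byte compare being discussed; a short sketch (the hand-rolled loop
only illustrates what shallow=False amounts to):

import filecmp

def same_file_contents(path1, path2):
    # shallow=False forces a real content comparison instead of trusting
    # the os.stat() signature (size, mtime) alone.
    return filecmp.cmp(path1, path2, shallow=False)

def identical(path1, path2, chunksize=64 * 1024):
    # Chunked byte-to-byte compare: neither file is read into memory
    # whole, and we bail out at the first differing block.
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1, b2 = f1.read(chunksize), f2.read(chunksize)
            if b1 != b2:
                return False
            if not b1:  # both files hit EOF at the same point
                return True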
Tom Anderson <[EMAIL PROTECTED]> writes:
> > The obvious way is to make a list of hashes, and sort the list.
>
> Obvious, perhaps; prudent, no. To make the list of hashes, you have to
> read all of every single file first, which could take a while. If your
> files are reasonably random at the beginning
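The cheaper strategy being hinted at here, as a sketch: rule candidates out
by size first, then by a checksum of just the first block, and only read
whole files for the groups that still collide (hashlib's md5 stands in for
whichever checksum one prefers):

import hashlib
import os
from collections import defaultdict

def duplicate_groups(paths, blocksize=64 * 1024):
    # Pass 1: file size comes from stat() alone and already separates
    # most non-duplicates without reading a single byte.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    # Pass 2: hash only the first block of same-sized files; if files
    # are reasonably random at the beginning, this settles nearly all
    # candidates without reading anything in full.
    by_head = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue
        for p in group:
            with open(p, "rb") as f:
                head = hashlib.md5(f.read(blocksize)).digest()
            by_head[(size, head)].append(p)

    # Pass 3: full-content hash for whatever still collides.
    by_full = defaultdict(list)
    for group in by_head.values():
        if len(group) < 2:
            continue
        for p in group:
            h = hashlib.md5()
            with open(p, "rb") as f:
                for chunk in iter(lambda: f.read(blocksize), b""):
                    h.update(chunk)
            by_full[h.digest()].append(p)

    return [g for g in by_full.values() if len(g) > 1]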
On Tue, 31 Jan 2006 13:38:50 -0800, Paul Rubin wrote:
> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>> This isn't a criticism, it is a genuine question. Why do people compare
>> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
>> a caching thing (once you have the checksum, you don't need to read the
>> file again)? Are there any other reasons?
On Tue, 31 Jan 2006, it was written:
> Steven D'Aprano <[EMAIL PROTECTED]> writes:
>
>> This isn't a criticism, it is a genuine question. Why do people compare
>> local files with MD5 instead of doing a byte-to-byte compare?
I often wonder that!
>> Is it purely a caching thing (once you have the checksum, you don't need
>> to read the file again)? Are there any other reasons?
Steven D'Aprano <[EMAIL PROTECTED]> writes:
> This isn't a criticism, it is a genuine question. Why do people compare
> local files with MD5 instead of doing a byte-to-byte compare? Is it purely
> a caching thing (once you have the checksum, you don't need to read the
> file again)? Are there any other reasons?
On Tue, 31 Jan 2006 10:51:44 -0500, Gregory Piñero wrote:
> http://www.blendedtechnologies.com/removing-duplicate-mp3s-with-python-a-naive-yet-fuzzy-approach/60
>
> If anyone would be kind enough to improve it I'd love to have these
> features but I'm swamped this week!
>
> - MD5 checking to find exact matches regardless of name
I wonder which algorithm is better at determining the similarity between two strings.
On 1/31/06, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Gregory Piñero wrote:
> > Ok, ok, I got it! The Pythonic way is to use an existing library ;-)
> >
> > import difflib
> > CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
Gregory Piñero wrote:
> Ok, ok, I got it! The Pythonic way is to use an existing library ;-)
>
> import difflib
> CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
>
> I wrote a script to delete duplicate mp3's by filename a few years
> back with this. If anyone's interested in seeing it, I'll post a blog
> entry on it.
> Thanks for that, I'll have a look. (So many packages, so little
> time...)
Yes, there's a standard library for everything it seems! Except for a
MySQL API :-(
> > I wrote a script to delete duplicate mp3's by filename a few years
> > back with this. If anyone's interested in seeing it, I'll post a blog
> > entry on it.
Ok, ok, I got it! The Pythonic way is to use an existing library ;-)
import difflib
CloseMatches=difflib.get_close_matches(AFileName,AllFiles,20,.7)
I wrote a script to delete duplicate mp3's by filename a few years
back with this. If anyone's interested in seeing it, I'll post a blog
entry on it.
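For reference, get_close_matches(word, possibilities, n, cutoff) returns up
to n entries of possibilities whose SequenceMatcher ratio against word is at
least cutoff, best matches first. A self-contained toy run (the file names
are invented):

import difflib

all_files = ["Madonna - Like A Prayer.mp3",
             "madonna_like_a_prayer.mp3",
             "Led Zeppelin - Stairway.mp3"]

# Up to 20 matches scoring at least 0.7 against the query.
print(difflib.get_close_matches("madonna like a prayer.mp3",
                                all_files, n=20, cutoff=0.7))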
BBands wrote:
> I have some CDs and have been archiving them on a PC. I wrote a Python
> script that spans the archive and returns a list of its contents:
> [[genre, artist, album, song]...]. I wanted to add a search function to
> locate all the versions of a particular song. This is harder than you
> might think.
BBands wrote:
> Diez B. Roggisch wrote:
> > I did a levenshtein-fuzzy-search myself; however, I enhanced my version by
> > normalizing the distance the following way:
>
> Thanks for the snippet. I agree that normalizing is important. A
> distance of three is one thing when your strings are long, but quite
> another when they are short.
Diez B. Roggisch wrote:
> I did a levenshtein-fuzzy-search myself; however, I enhanced my version by
> normalizing the distance the following way:
Thanks for the snippet. I agree that normalizing is important. A
distance of three is one thing when your strings are long, but quite
another when they are short.
Diez B. Roggisch wrote:
> The advantage becomes apparent when you try to e.g. compare
>
> "Angelina Jolie"
>
> with
>
> "AngelinaJolei"
>
> and
>
> "Bob"
>
> Both have an l-dist of 3
>>> distance("Angelina Jolie", "AngelinaJolei")
3
>>> distance("Angelina Jolie", "Bob")
13
What did I miss?
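Presumably the raw distances of 3 and 13 are exactly the point: it is the
normalization, not the raw distance, that separates the near-duplicate from
the unrelated name. With a relative() like the completion sketched earlier
(an assumption, since the thread never shows the full body):

print(relative("Angelina Jolie", "AngelinaJolei"))  # ((14-3)/14)*(13/14), about 0.73
print(relative("Angelina Jolie", "Bob"))            # ((14-13)/14)*(3/14), about 0.015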
Fredrik Lundh wrote:
> Diez B. Roggisch wrote:
>
>> The advantage becomes apparent when you try to e.g. compare
>>
>> "Angelina Jolie"
>>
>> with
>>
>> "AngelinaJolei"
>>
>> and
>>
>> "Bob"
>>
>> Both have an l-dist of 3
>
distance("Angelina Jolie", "AngelinaJolei")
> 3
distance("Angeli
I have some CDs and have been archiving them on a PC. I wrote a Python
script that spans the archive and returns a list of its contents:
[[genre, artist, album, song]...]. I wanted to add a search function to
locate all the versions of a particular song. This is harder than you
might think. For exa
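One way such a lookup might go, using difflib over the
[[genre, artist, album, song]] rows (the archive contents here are invented):

import difflib

archive = [
    ["rock", "Led Zeppelin", "IV", "Stairway to Heaven"],
    ["rock", "Led Zeppelin", "BBC Sessions", "Stairway To Heaven (live)"],
    ["pop", "Madonna", "Like a Prayer", "Like a Prayer"],
]

def find_versions(song, archive, cutoff=0.6):
    # Compare titles case-insensitively so capitalization quirks
    # between rips don't hide a version.
    titles = [row[3].lower() for row in archive]
    hits = set(difflib.get_close_matches(song.lower(), titles,
                                         n=len(titles), cutoff=cutoff))
    return [row for row in archive if row[3].lower() in hits]

for row in find_versions("stairway to heaven", archive):
    print(row)  # both Stairway rows match; the Madonna row doesn't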
> As mentioned above this works quite well and I am happy with it, but I
> wonder if there is a more Pythonic way of doing this type of lookup?
I did a levenshtein-fuzzy-search myself; however, I enhanced my version by
normalizing the distance the following way:
def relative(a, b):
    """
    Computes a relative distance between two strings. It's in the range
    (0-1] where 1 means total equality.