I've written my first program. It takes a given directory and looks in all the directories below it for duplicate files (a duplicate being defined as a file with the same MD5 hash, which I know isn't a perfect solution, but it's good enough for what I'm doing).
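For context, the hashing side of this could look something like the sketch below. This is my own illustration, not the code from the linked repo; the function name `find_duplicates` and the 64 KB chunk size are assumptions I made up:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    """Walk the tree under root and group file paths by MD5 digest.

    A hypothetical sketch, not the repo's actual code.
    """
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            md5 = hashlib.md5()
            # Read in chunks so large files don't have to fit in memory
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    md5.update(chunk)
            by_hash[md5.hexdigest()].append(path)
    # Only hashes seen at more than one path are duplicates
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

The result maps each digest to the list of paths sharing it, which is the "jumble of paths" that then needs formatting.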
My problem now is that my output file is a rather confusing jumble of paths, and I'm not sure of the best way to make it more readable. My gut reaction would be to go through and list by first directory, but is there a logical way to do it so that all the groupings that have files in the same two directories are grouped together? So I'm thinking I'd have:

    First file dir: /some/directory/
    Duplicate directories:
      some/other/directory/
        original file 1, duplicate file 1
        original file 2, duplicate file 2
      some/third directory/
        original file 3, duplicate file 3

and so forth, where the original file would be the file name in the first directory, so that all of those are the same there.

I fear I'm not explaining this well, but I'm hoping someone can either ask questions to help get out of my head what I'm trying to do, or can decipher this enough to help me.

Here's a git repo of my code if it helps: https://github.com/CyberCowboy/FindDuplicates

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
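One way to get the layout sketched above: from a `{hash: [paths]}` mapping, pick the first file of each group, then bucket the remaining duplicates by their directory. This is only an illustration under my own assumptions (function names and the sort-to-pick-the-"original" rule are made up, not from the repo):

```python
import os
from collections import defaultdict

def group_by_directory(duplicates):
    """Turn {hash: [path, ...]} into
    {first_dir: {dup_dir: [(orig_name, dup_name), ...]}}.

    A sketch of the grouping described above; the "original" is
    arbitrarily chosen as the lexicographically first path.
    """
    report = defaultdict(lambda: defaultdict(list))
    for paths in duplicates.values():
        ordered = sorted(paths)
        first = ordered[0]
        first_dir = os.path.dirname(first)
        for dup in ordered[1:]:
            report[first_dir][os.path.dirname(dup)].append(
                (os.path.basename(first), os.path.basename(dup)))
    return report

def print_report(report):
    """Print the nested structure in the indented form sketched above."""
    for first_dir in sorted(report):
        print("First file dir:", first_dir)
        print("Duplicate directories:")
        for dup_dir in sorted(report[first_dir]):
            print("  " + dup_dir)
            for orig, dup in report[first_dir][dup_dir]:
                print("    %s, %s" % (orig, dup))
```

With this shape, every pair of duplicates that lives in the same two directories ends up under the same heading, which I think is what's being asked for.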