On Thu, 2010-10-14 at 12:05 -0700, Jim Nelson wrote:
> There's a number of different issues and problems in play here, and
> I'm afraid they're all being conflated under the banner of "duplicate
> detection". It's important to understand at least a couple of basic
> concepts.

Thanks for the clarification, Jim - see my interpolated comments below:

>
> First, in regards to a file that is twice the size of another being
> regarded as a duplicate, that is most likely due to this bug:
> http://trac.yorba.org/ticket/2587 It was an oversight on my part and
> I would like to fix this for 0.8.

Excellent.

>
> Now, beyond that, there are three kinds of duplicate detection (that
> banner I was mentioning before):
>
> 1. The same filepath: If you import /home/jim/photo.jpg and then
> import again /home/jim/photo.jpg, Shotwell will treat that as a
> duplicate. This is a special case of duplicate detection; Shotwell
> will not allow you to have two photo "objects" in your library
> pointing to the same file. Note that this is true even if you
> update/change the file outside of Shotwell, i.e. modify its EXIF.
> Shotwell 0.7 does *not* detect external changes and update the
> library.
>
> Shotwell 0.8, on the other hand, will do that exact thing by detecting
> the change at startup: http://trac.yorba.org/ticket/2476

If I understand this correctly, in Shotwell 0.8 it will be necessary to
close Shotwell and open it again if one of the files is changed (in my
case, if I change the date-stamp in the EXIF data) - an attempt to import
that file again will still be seen as an attempt to import a duplicate
because the file has the same name, but all will be well the next time I
start Shotwell.

>
> 2. The same file contents: If two photos are byte-for-byte identical,
> they are considered duplicates and Shotwell will only import one of
> them. This comparison is of the entire file; changing the EXIF in one
> is enough to make the files different.

Precisely what I was expecting from duplicate detection.

>
> 3. The same file on camera and on disk: Camera import introduces a
> problem.
> Because it takes so long to download a file and it's
> desirable to be able to see which photos you've already imported
> before importing them, we want to detect duplicates before pulling
> down the file. We do this by comparing the photo thumbnails (which is
> what led to that bug I mentioned at the top of this message).
> Otherwise, we would download the file, compare it to the library,
> realize it was a duplicate, and then delete the file.
>

OK - I found this useful today after I'd failed to delete image files
I'd already downloaded from my camera. I think I'd prefer to have the
option of overwriting the already-downloaded images, though.

> Now, apart from all this, the trash can creates a special case. If
> you move a photo to the trash and then import a file that is a
> duplicate -- i.e. is either the same filepath OR the same contents --
> Shotwell will restore the photo object from the trash and not import
> the new file. (In the case of the same filepath, that's essentially
> what you're asking for.) We assume if you import a duplicate file,
> you're saying "You know what, I want this photo after all," and we
> restore it from the trash. It's also restoring all the stuff you've
> done to it while it was in Shotwell, i.e. its tags, transformations,
> and so on. We see this as valuable stuff to be preserving.
>

I hear what you're saying, but I have to say I find its implementation
in practice totally counterintuitive! If I had put something in the
wastebasket and then realised I wanted it after all, I would take it out
of the wastebasket myself. I agree that transferring the tags associated
with the original file would be helpful - this could be implemented as a
dialog box asking something like:

Importing file with same name and path - keep tags?
[yes]...[yes to all]...[no]...[no to all]

> This is all quite distinct from a different operation we refer to as
> "re-import": To take an existing photo in the library and re-examine
> all its properties and reflect any changes in the database
> (thumbnails, tags, etc.). I think what Michael is asking for when
> importing a file already in the trash is for Shotwell to re-import it.
> In 0.8 we're attacking the problem slightly differently, but detecting
> the change automatically and re-importing it in the background. But
> 0.7 does not have this capability in any manner.

Yes, I think 0.8 will deal with my complaints. How will it handle the
preservation of tags when it discovers a changed file on startup? And
how about the case of a changed file which has at some stage been edited
via Shotwell? My preference would be to transfer the tags and drop the
edit trail.

>
> I know this is a lot to absorb, but I feel like some concepts and
> terms are being muddied. Some of it is my own fault, as we're trying
> to keep it simple for users, and some of it is due to a bug that's
> being grouped in as a part of designed behavior. By being aware of
> the different cases of "duplicate detection", I think we can figure
> what is useful, what can be tweaked, and what needs to be changed.
>
> -- Jim
>

"Duplicate" means "identical" to me - you can't have shades of
duplicate-ness. If the process of checking a potential duplicate is
going to take a long time, then the user should be given the option to
abort or continue.

Michael
_______________________________________________
Shotwell mailing list
[email protected]
http://lists.yorba.org/cgi-bin/mailman/listinfo/shotwell
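
For anyone following the thread, the first two checks Jim describes (same filepath, same file contents) can be illustrated with a rough sketch. This is not Shotwell's code, which is not shown anywhere in this message; the `Library` class, its methods, and the choice of SHA-256 as a stand-in for a byte-for-byte comparison are all assumptions made for illustration. The camera/thumbnail case is left out.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Hash the whole file, so any metadata change alters the digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class Library:
    """Hypothetical in-memory photo library (illustration only)."""

    def __init__(self):
        self.by_path = {}    # resolved path -> digest recorded at import
        self.by_digest = {}  # digest -> resolved path

    def import_file(self, path):
        """Import the file if it is new; return the verdict as a string."""
        real = os.path.realpath(path)
        # Case 1: same filepath. Checked first, so a file modified on
        # disk after import is still reported as a duplicate.
        if real in self.by_path:
            return "same-path"
        # Case 2: same contents, compared over the entire file.
        digest = file_digest(real)
        if digest in self.by_digest:
            return "same-contents"
        self.by_path[real] = digest
        self.by_digest[digest] = real
        return "new"


def demo():
    """Import an original, the original again, a copy, and an edited copy."""
    with tempfile.TemporaryDirectory() as folder:
        payload = b"\xff\xd8 stand-in image bytes"
        files = {
            "photo.jpg": payload,
            "copy.jpg": payload,
            "edited.jpg": payload + b"!",  # one extra byte, like an EXIF edit
        }
        paths = {}
        for name, data in files.items():
            paths[name] = os.path.join(folder, name)
            with open(paths[name], "wb") as handle:
                handle.write(data)
        library = Library()
        order = ["photo.jpg", "photo.jpg", "copy.jpg", "edited.jpg"]
        return [library.import_file(paths[name]) for name in order]


if __name__ == "__main__":
    print(demo())
```

Running `demo()` gives `['new', 'same-path', 'same-contents', 'new']`: the second import of the same path is rejected, the identical copy is rejected on contents, and the copy with one changed byte is treated as a different photo. Because the path check comes first, the sketch also shows why a re-import of an externally edited file at the same path is still seen as a duplicate.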
