On Tue, 2012-06-26 at 04:54 -0400, Zdenek Pavlas wrote:
> > Ok, so what is the desire for both cases here? Above that we say:
> >
> >   reget = None   [None|'simple'|'check_timestamp']
> >
> >     whether to attempt to reget a partially-downloaded file. Reget
> >     only applies to .urlgrab and (obviously) only if there is a
> >     partially downloaded file. Reget has two modes:
> >
> > ...which implies this is getting extra data or all data, the above
> > kind of implies we are getting extra data or nothing (or maybe all data or
> > nothing). But there is also a problem with the idea...
>
> Yes, the idea is to use reget=simple when we have unique URLs,
> and reget=check_timestamp for URLs where content changes over time.
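
For illustration, the split between the two modes might look roughly like the sketch below. This is not urlgrabber's code: `build_reget_headers()` is a hypothetical helper, and it assumes that `simple` resumes with a Range request starting at the local file size, while `check_timestamp` ignores the partial data (as with opts.range being ignored unless reget==simple) and sends the local mtime as If-Modified-Since.

```python
import os
from email.utils import formatdate


def build_reget_headers(local_path, reget):
    """Hypothetical helper: pick request headers for a reget mode.

    reget=None              -> plain GET, download everything
    reget='simple'          -> resume with a Range request (unique URLs)
    reget='check_timestamp' -> conditional GET, no Range (changing URLs)
    """
    headers = {}
    if not os.path.exists(local_path):
        # Nothing local yet, so there is nothing to resume or compare.
        return headers
    st = os.stat(local_path)
    if reget == 'simple' and st.st_size > 0:
        # Ask only for the bytes we do not have yet.
        headers['Range'] = 'bytes=%d-' % st.st_size
    elif reget == 'check_timestamp':
        # Assumption: Range is dropped here and the local mtime is sent,
        # so the server may answer 304 if it thinks nothing changed.
        headers['If-Modified-Since'] = formatdate(st.st_mtime, usegmt=True)
    return headers
```

With an 8-byte partial file, `simple` yields `{'Range': 'bytes=8-'}`, while `check_timestamp` yields only an If-Modified-Since header built from whatever the local mtime happens to be, which is exactly where the three cases below come in.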

But if we use this to get "primary" when it's not using unique names,
people are not going to be happy if their download restarts from scratch
after 8MB of 12MB. I understand that you are thinking of this in the
context of repomd.xml, but checking the timestamps/ETags/whatever and
dealing with small files (where it doesn't matter if you skip "resume"
and just re-download everything) are distinct things.

> > This timestamp is going to be one of three things:
> >
> > 1. The timestamp we last tried to download FOO, and stopped before we
> >    got it all.
> >
> > 2. The timestamp we last downloaded all of FOO, but didn't have a
> >    last-modified.
> >
> > 3. The timestamp of the server last-modified when we last downloaded
> >    all of FOO and had a last-modified so urlgrabber used utimes().
> >
> > ...which is problematic.
>
> 1. This implies timestamp check fails for every partially downloaded
> file. That's why I ignore opts.range unless reget==simple.
>
> 2. We'd always reget the whole file (it's a special case of 1).
>
> 3. Yes, I rely on utime() being used on completed files only.
> Why is that problematic?

Not sure what you mean by #1, but a _partial_ download will always have
a newer timestamp than the timestamp on the server, and (RFC 2616,
section 14.25):

    The If-Modified-Since request-header field is used with a method to
    make it conditional: if the requested variant has not been modified
    since the time specified in this field, an entity will not be
    returned from the server; instead, a 304 (not modified) response
    will be returned without any message-body.

...so not only will we fail to verify that the data is good, urlgrabber
will also fail to (re)download anything, because the local timestamp is
newer and it just gets 304s.

The problem with #3 is the same ... servers are not _required_ to return
Last-Modified, and if they don't we can't use utime(); and if we haven't
used utime() we really shouldn't be passing the mtime we do have to the
server.
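
To make the 304 problem concrete, here is a toy simulation. All names are hypothetical and times are integer epoch seconds: `server_status()` applies the If-Modified-Since rule quoted above, `naive_condition()` always sends the local mtime, and `guarded_condition()` only sends it when the download completed and its mtime was copied from the server's Last-Modified via utime().

```python
from collections import namedtuple

# Hypothetical record of what the client knows about its local copy.
LocalCopy = namedtuple('LocalCopy', 'mtime complete mtime_from_server')


def server_status(server_last_modified, if_modified_since):
    """Toy server applying the If-Modified-Since rule (RFC 2616, 14.25).

    Times are integer seconds since the epoch. A server with no
    Last-Modified (None) is modelled as ignoring the condition and
    always returning the full entity (200).
    """
    if if_modified_since is None or server_last_modified is None:
        return 200
    # Not modified since the given time -> 304 with no body.
    return 304 if server_last_modified <= if_modified_since else 200


def naive_condition(local):
    """Send the local mtime whatever it means (the problematic policy)."""
    return local.mtime


def guarded_condition(local):
    """Send the local mtime only when it really came from the server.

    That holds only for a completed download whose mtime was set with
    utime() from the server's Last-Modified; otherwise send no condition.
    """
    if local.complete and local.mtime_from_server:
        return local.mtime
    return None


if __name__ == '__main__':
    # Partial download stopped at t=5000; server copy last changed at t=1000.
    partial = LocalCopy(mtime=5000, complete=False, mtime_from_server=False)
    print(server_status(1000, naive_condition(partial)))    # 304
    print(server_status(1000, guarded_condition(partial)))  # 200
```

With a partial file whose mtime (5000) is newer than the server's Last-Modified (1000), the naive policy gets a 304 and never re-downloads, while the guarded policy falls back to a full download. The guard also covers case #2 and servers that omit Last-Modified, since no utime() was done there.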

With multiple servers we can also be downloading from ftp one day, and
then from http the next ... and we shouldn't be using the timestamps in
those cases either.

This is probably really hard (if not impossible) to trigger with
repomd.xml ... because it's so small, but then, as you said, it's not a
measurable improvement even if it works ... because it's so small.

Also there's the problem that anything using metalink files implies that
checking the timestamps is a noop anyway (or should be; in some weird
server failure cases it could cause problems).

_______________________________________________
Yum-devel mailing list
Yum-devel@lists.baseurl.org
http://lists.baseurl.org/mailman/listinfo/yum-devel