https://bugzilla.wikimedia.org/show_bug.cgi?id=47406

Christian Puehringer <c...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |c...@gmx.at

--- Comment #8 from Christian Puehringer <c...@gmx.at> ---
I agree that it is a good idea to support both replacement of entire
articles and diffs.
This is because I believe that omitting diffs could have a large impact on
file size: in particular, articles that are updated infrequently (and there
are a lot of them) often have changes that affect only a single line, such as
spelling fixes or interwiki link updates. Without diff support, every such
article would have to be stored in full, which could make the diff file very
large.
On the other hand, even when diff support is available, it may still make
sense to store complete articles instead of diffs for some articles, for
example to reduce the processing effort of both diff creation and merging.
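To illustrate the per-article choice between a diff and a full replacement,
here is a minimal Python sketch. It assumes a simple line-based delta computed
with the standard library's difflib.SequenceMatcher; the delta format, the
names make_delta, apply_delta and choose_update, the size estimate and the 0.5
threshold are all hypothetical and are not part of the zim format or the
zimlib.

```python
import difflib
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical delta format: ("copy", i1, i2) reuses old lines [i1:i2],
# ("insert", lines) adds new lines. Deleted old lines are simply not copied.
Op = Union[Tuple[str, int, int], Tuple[str, List[str]]]


@dataclass
class Update:
    kind: str        # "diff" or "full" (hypothetical record kinds)
    payload: object  # list of ops for "diff", full article text for "full"


def make_delta(old: List[str], new: List[str]) -> List[Op]:
    """Compute a line-based delta that turns `old` into `new`."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # tag == "delete": nothing to emit
    return ops


def apply_delta(old: List[str], ops: List[Op]) -> List[str]:
    """Rebuild the new article's lines from the old lines and a delta."""
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def delta_size(ops: List[Op]) -> int:
    """Rough size estimate: a copy op costs two indices, an insert its text."""
    size = 0
    for op in ops:
        if op[0] == "copy":
            size += 8  # assumed cost of storing two 4-byte line indices
        else:
            size += sum(len(line) + 1 for line in op[1])
    return size


def choose_update(old_text: str, new_text: str, ratio: float = 0.5) -> Update:
    """Store a diff if it is clearly smaller than the article, else the full text.

    Line-based only; trailing-newline handling is ignored in this sketch.
    """
    old, new = old_text.splitlines(), new_text.splitlines()
    ops = make_delta(old, new)
    if delta_size(ops) < ratio * len(new_text):
        return Update("diff", ops)
    return Update("full", new_text)
```

A single-line spelling fix in a long article yields a delta of a few dozen
bytes and is stored as a diff, while an article that was rewritten completely
falls back to full replacement, which also avoids diff processing on merge.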

One other point worth considering is disk space usage during the merge. In
particular on mobile devices, once the planned download manager is
implemented, the incremental update capability would be very useful. However,
if the update process requires free space for the old file, the diff, and the
new file at the same time, it won't be usable in many cases because there is
not enough free space on mobile devices.
Therefore, something like updating the old zim file in place, and merging
while downloading the diff, would make sense.
This is certainly not trivial and hardly possible to implement within the
GSoC project, but it would make sense to keep it in mind when defining the
diff and merge processes and the diff file format, so that this feature could
be added in the future. For example, with such an approach it is important
that the zim file stays consistent during the update and that an interrupted
update can be resumed later.
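Consistency and resumability are commonly achieved with a small progress
journal that is updated atomically after each step. The Python sketch below
assumes that an update can be broken into idempotent steps (for instance
rewriting one chunk or cluster); the resumable_update helper and its JSON
journal are hypothetical illustrations, not an existing zimlib feature. It
relies on os.replace, which performs an atomic rename on POSIX file systems.

```python
import json
import os
import tempfile
from typing import Callable, Sequence


def _atomic_write(path: str, data: str) -> None:
    """Write via a temporary file plus rename, so the journal is never seen half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def resumable_update(steps: Sequence[Callable[[], None]], journal_path: str) -> int:
    """Run update steps in order, skipping those already recorded as done.

    Returns the number of steps executed in this call. A step interrupted
    before its journal entry is written is simply run again on resume,
    which is why every step must be idempotent.
    """
    done = 0
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            done = json.load(f)["done"]
    ran = 0
    for index in range(done, len(steps)):
        steps[index]()
        _atomic_write(journal_path, json.dumps({"done": index + 1}))
        ran += 1
    return ran
```

If the device runs out of battery halfway through, calling resumable_update
again with the same steps continues at the first unfinished step instead of
starting over.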
Note: If actual in-place updating is not feasible, the zim-split feature could
be used instead: e.g. on download, the full zim file is split into chunks of,
say, 64 MB. On update, each new chunk (whose size may change) is written
completely before the old one is deleted. Thus during the update only about
64 MB of additional storage is required, whereas without in-place updating
tens of GB could be necessary.
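The chunked variant can be sketched in Python as follows, assuming the archive
is already stored as a list of chunk files. The helper names and the way
changed chunks are passed in are hypothetical; what matters is the order of
operations (write the new chunk completely, then swap it in for the old one)
and the resulting bound on extra storage. The storage estimate ignores the
space taken by the diff itself.

```python
import os
import tempfile
from typing import Dict, List

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB, the example chunk size from the comment


def replace_chunk(path: str, new_data: bytes) -> None:
    """Write a new chunk completely, then replace the old chunk with it.

    Only this one chunk exists twice on disk at any moment.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # the old chunk disappears only now
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def update_chunks(chunk_paths: List[str], new_chunks: Dict[int, bytes]) -> None:
    """Rewrite only the chunks that changed, one at a time."""
    for index, data in sorted(new_chunks.items()):
        replace_chunk(chunk_paths[index], data)


def peak_extra_storage(new_chunk_sizes: List[int], chunked: bool) -> int:
    """Extra bytes needed on top of the old archive (diff file not counted)."""
    if chunked:
        # One new chunk coexists with its old version at a time.
        return max(new_chunk_sizes, default=0)
    # The whole new file coexists with the whole old file.
    return sum(new_chunk_sizes)
```

With 320 chunks of 64 MiB (a 20 GiB archive), peak_extra_storage gives
64 MiB for the chunked update and 20 GiB for rewriting the whole file.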

Another interesting feature could be support for on-the-fly merging: when
this feature is used, the diff zim file is not merged during the update.
Instead it is simply stored alongside the old zim file, and the zimlib merges
the articles from both (or more) files on access. The benefit is that the end
user does not need to run the probably long-running merge task. In addition
it is faster, because no recompression needs to be done. Furthermore, no
additional storage is required during the update. The drawback is that it
uses more storage overall, so whether this approach is actually feasible
depends on the size of the diff files.
Anyway, I think it's worth considering this feature, as it should not require
much additional effort; in fact, it could even be an intermediate step towards
the currently planned approach.
For this on-the-fly approach, supporting multiple deltas would certainly make
sense, but this could be implemented later.
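On-the-fly merging amounts to a layered lookup: check the newest delta first
and fall back to older deltas and finally to the base file. The Python sketch
below models each zim file as a plain dictionary and marks deleted articles
with a hypothetical DELETED tombstone; the LayeredArchive class is an
illustration of the idea, not the zimlib API.

```python
from typing import Dict, List, Optional

DELETED = object()  # hypothetical tombstone: the article was removed by a delta


class LayeredArchive:
    """Serve articles from a base archive plus delta archives without merging.

    Each archive is modelled as a dict of URL -> article text; a real
    implementation would read the zim files themselves.
    """

    def __init__(self, base: Dict[str, object],
                 deltas: Optional[List[Dict[str, object]]] = None) -> None:
        # Deltas are given oldest first; store newest first so it wins.
        self.layers: List[Dict[str, object]] = list(reversed(deltas or [])) + [base]

    def add_delta(self, delta: Dict[str, object]) -> None:
        """Stack a newly downloaded delta on top without touching older files."""
        self.layers.insert(0, delta)

    def get(self, url: str) -> Optional[str]:
        """Return the newest version of an article, or None if absent or deleted."""
        for layer in self.layers:
            if url in layer:
                value = layer[url]
                return None if value is DELETED else str(value)
        return None
```

Stacking any number of deltas falls out naturally, at the cost of one extra
lookup per layer on every access, which is another reason the feasibility
depends on how many and how large the diff files are.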

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
