https://bugzilla.wikimedia.org/show_bug.cgi?id=54328

--- Comment #4 from Brad Jorsch <bjor...@wikimedia.org> ---
(In reply to Aaron Halfaker from comment #3)
> 1. Character vs. line offset
> I'd much rather represent diffs based on a character offset. I'm afraid of
> representing position with something like lineno, since line breaks are
> defined differently between systems.

Isn't that an argument for line-based rather than character-based offsets?
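
To illustrate the question, here is a minimal Python sketch. The helper names
char_offset and line_number and the sample text are made up for illustration
and are not part of any proposed API. It shows that the character offset of
the same word shifts when LF line endings become CRLF, while its line number
does not:

```python
def char_offset(text: str, needle: str) -> int:
    """Return the 0-based character offset of the first match (hypothetical helper)."""
    return text.index(needle)


def line_number(text: str, needle: str) -> int:
    """Return the 1-based number of the first line containing needle (hypothetical helper)."""
    # str.splitlines() treats "\n" and "\r\n" each as a single line break.
    for lineno, line in enumerate(text.splitlines(), start=1):
        if needle in line:
            return lineno
    raise ValueError(f"{needle!r} not found")


# The same content with Unix (LF) and DOS (CRLF) line endings.
unix_text = "first line\nsecond line\nthird line\n"
dos_text = unix_text.replace("\n", "\r\n")

unix_offset = char_offset(unix_text, "third")
dos_offset = char_offset(dos_text, "third")
unix_line = line_number(unix_text, "third")
dos_line = line_number(dos_text, "third")

print("char offsets:", unix_offset, dos_offset)
print("line numbers:", unix_line, dos_line)
```

Each preceding CRLF adds one character, so the offsets diverge further the
deeper into the text the change sits.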

>  Character offsets would also allow us
> to make changes to our diff detection strategy without changing the output.
> 
> 2. Machine readable vs. human readable diffs
> Machine readable diff opcode formats tend to represent the full set of
> operations used to recreate a revision -- not just the context.

OTOH, what is the usual use of querying the diffs? I suspect the client more
often wants to display a human-readable diff to the end user than to do the
equivalent of the 'patch' utility on an already-downloaded local copy of the
article.

> and diffs tend to be represented in few operations.

On talk pages, maybe. But someone heavily copyediting an article is likely to
generate a huge number of operations. With the way the diff algorithm works,
even some simple edits will generate many operations as it tries to match up
individual letters in the old vs new paragraphs.
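
A rough way to see this effect, sketched with Python's difflib. This is an
assumption for illustration only: difflib is not the diff engine MediaWiki
uses, and the sample sentences and the edit_ops helper are made up. The sketch
compares how many non-equal operations a character-level match produces for a
light copyedit against a word-level match of the same text:

```python
from difflib import SequenceMatcher

# Made-up "before" and "after" versions of a lightly copyedited sentence.
old = "The quick brown fox jumps over the lazy dog."
new = "A fast brown fox leaped over a sleepy dog."


def edit_ops(a, b):
    """Return the non-'equal' opcodes needed to turn sequence a into b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


# Character-level matching tries to line up individual letters.
char_ops = edit_ops(old, new)
# Word-level matching treats each whitespace-separated token as one unit.
word_ops = edit_ops(old.split(), new.split())

print("character-level operations:", len(char_ops))
print("word-level operations:", len(word_ops))
```

The character-level run needs more operations than the word-level one for the
same edit, because stray matching letters split each rewritten phrase into
several small operations.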
