[Bug 21860] Add checksum field to text table; expose it in API

2011-02-07 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Diederik van Liere  changed:

   What|Removed |Added

   Keywords||analytics

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2011-02-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #24 from Rob Lanphier  2011-02-02 21:41:40 UTC 
---
Domas, I don't have an opinion on whether it belongs in the index.  The most
important use case (including in the XML dumps) doesn't seem to require it. 
There are compelling use cases discussed above that would make adding it to the
index interesting, but I know that's a much more expensive request.  If the
primary objection to adding these checksums revolves around adding the field to
the index, we should split that conversation off into a separate request.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2011-02-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #23 from Domas Mituzas  2011-02-02 
21:00:34 UTC ---
are you going to include it into any 'revision' index?

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2011-02-02 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Rob Lanphier  changed:

   What|Removed |Added

 CC||ro...@wikimedia.org

--- Comment #22 from Rob Lanphier  2011-02-02 20:47:26 UTC 
---
There's been a few offline discussions about how MD5 sums might relate to
performance, so I figured I better document my take here.  At our current rate
of editing (if I'm reading the numbers correctly), we get somewhere on the
order of 10-12 million edits per month, which translates into a rate of
something like 5 edits per second.  On my laptop, the built-in PHP function
(which is presumably
what we'd use) takes somewhere on the order of 2 milliseconds to compute the
MD5 sum of the built in American English dictionary file on my computer (which
is approximately 931k).  This is all back-of-the-envelope stuff, so I could be
wildly off somewhere, but it seems like pretty negligible load given the
benefit of not forcing everyone else to compute these values downstream.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-09-25 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Rob Lanphier  changed:

   What|Removed |Added

 Blocks||25312

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #21 from John Erling Blad  2010-07-16 01:54:41 UTC 
---
Seems like the arguments starts to change and it is more a discussion about
wetter such functionality should be allowed than if there are real reasons for
not supplying them. To somehow stop or hinder the additional services seems
rather counterproductive so I don't want to argue about why and how that shall
be done.

If the API supply some functionality it must be assumed that someone will use
the functionality to make available some additional services. In this case the
base functionality is to supply the text for the revisions. This will take a
long time to transfer and make some additional services difficult to implement.
It will not make them impossible to implement. At the same time the servers
will get additional load to transfer the content.

The services _are_ possible to implement and someone will build them. I have
made some such services and surely someone else surely will do too. In this
situation it is wise to carefully consider which measure can be taken to
decrease the load on the system. Right now we are stuck with a solution that
creates a very large load on the servers.

By calculating a digest in the servers instead of in the clients the total load
from transferring the information will be lower. If the digests are calculated
for each call it would probably be wise to use digests which are easy to
calculate. This still puts some strain on the database servers as they has to
serve the text for each revision. By storing the digests in the database more
heavy digests can be used, and even some heavy to compute locality hashing
functions can be used. All such digests will be fast to transfer, decreasing
the load on the servers.

I propose that we use store at least one digest in the database. This should be
stored in such a way that it is possible to use it for revert detection on the
recent changes page. Probably this should use something like MD5 on a
uncompacted version of the content, check out string grammar if its unknown to
you. It should also be possible to use it for revisions in the API, this to
support services like WikiDashboard. There are probably several lists that
should be adjusted accordingly. To support services like history flow we need
something like a nilsimsa digest on identifiable blocks in the content. I would
propose that each paragraph has a nilsimsa digest. I'm not sure if this needs
to be stored in the database, as this will be requested rather seldom. Probably
they can be calculated on the fly.

I guess at least one digest should be stored in the recent changes table for
page revisions. If this lacks digests for older entries that shouldn't do much
harm, the interesting thing here is to detect wetter a new version is unique
within a short timeframe. I would keep digests for comparison in memcached for
lets say the 4-5 last revisions of an article. Note that this digest must be
reasonable collision free for the complete set of articles and revisions, that
is something like MD5 or other very long digests.

In the revisions table it will be stored at least one digest to identify
revisions. Such digests can be more lightweight and use compacted versions of
the content. The digests should be rather short integers. Remember that those
shall be used to detect similar versions, not the larger sets from the recent
changes table.

For history flow I would compute the digests on the fly in the web server. Such
computations would be rather heavy, but they will be done seldom.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #20 from MZMcBride  2010-07-14 16:20:00 UTC ---
(In reply to comment #19)
> John Erling Blad's proposal to generate the checksums only when requested
> should not significantly increase database load.  In general, hashing
> algorithms are O(n) (linear) in complexity (very fast, even for large text
> chunks).  The requested checksums could be cached to improve the performance,
> but given the speed at which checksums (even the "heavy" ones) can be
> generated, this benefit is likely to be small at best. 

I think you're overlooking the fact that Wikimedia wikis (like the English
Wikipedia) are exceptional. For a normal MediaWiki installation, the database
load would likely be negligible. I doubt the same is true for English Wikipedia
database. It seems like a waste of resources to constantly re-compute hashes
when it's trivial to store them.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #19 from Aaron Halfaker  2010-07-14 
15:49:02 UTC ---
> This bug was closed exactly because it will put an extra load on database, not
> due to bandwidth issues.

John Erling Blad's proposal to generate the checksums only when requested
should not significantly increase database load.  In general, hashing
algorithms are O(n) (linear) in complexity (very fast, even for large text
chunks).  The requested checksums could be cached to improve the performance,
but given the speed at which checksums (even the "heavy" ones) can be
generated, this benefit is likely to be small at best. 

I was under the impression that this bug was closed because the developers
either deemed that it was in unimportant new feature to support and/or not
worth developer time.  I cannot argue with that.  Database load, on the other
hand, is not a good reason to close this bug.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #18 from Domas Mituzas  2010-07-14 
08:52:18 UTC ---
it may not be as expensive storing the column, but eventually people will start
querying by it :-)

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

MZMcBride  changed:

   What|Removed |Added

 CC||b...@mzmcbride.com

--- Comment #17 from MZMcBride  2010-07-13 17:06:15 UTC ---
There's a definite use-case for adding a column to the recentchanges table.
It's not that expensive to store an extra column.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #16 from Max Semenik  2010-07-13 17:03:47 
UTC ---
> The API function must get the content for the revisions from the database, but
> only the digest is transfered. The database request will be heavy but serving
> the request to the client will not. For most uses there will never be a need 
> to
> calculate the more heavy hash functions.

This bug was closed exactly because it will put an extra load on database, not
due to bandwidth issues.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

John Erling Blad  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|INVALID |

--- Comment #15 from John Erling Blad  2010-07-13 01:02:00 UTC 
---
A solution to avoid changing the database schema is to add a digest function to
the API, possibly allowing several digest functions. If this is used for
identifying distinct versions within a returned set of revisions with similar
sizes the digest can be very simple. We don't have to create 2¹²⁸ -ish possible
values, 2⁸ -ish values is more than enough if combined with the size. 

The API function must get the content for the revisions from the database, but
only the digest is transfered. The database request will be heavy but serving
the request to the client will not. For most uses there will never be a need to
calculate the more heavy hash functions.

Something like "rvdigest=pearson", perhaps also using a variant of FNV-1 or
even SHA1 or AES. It could also be interesting to use sone kind of locality
sensitive hashing [1] to make similar systems as described in[2][3].

The computational heavier methods could be given a maximum number of revisions
as a hard limit for each request, forcing the tool developer to choose the
computational easy methods if they suffice for the purpose.

[1] http://en.wikipedia.org/wiki/Locality_sensitive_hashing
[2] http://www.grouplens.org/node/427
[3] http://doi.acm.org/10.1145/985692.985765

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-07-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

John Erling Blad  changed:

   What|Removed |Added

 CC||vac...@jeb.no

--- Comment #14 from John Erling Blad  2010-07-12 20:13:52 UTC 
---
An additional field for one or more digests would be very useful for analysis
purposes. Examples are WikiDasboard-like gadgets that shows who wrote the bulk
of the content [1] (screendump from Norwegian (bokmål) Wikipedia [2]), history
trees showing revert activity [3], identification of type of action, etc. This
will have uses at least on the history page, the user contributions page and
the recent changes page.

For some types of analysis the size of the content is sufficient, and if better
identification is necessary the content for the revisions that seems similar
because of the size can be downloaded, yet it would be better to have a digest
instead. By using digests it won't be necessary to branch out to some special
code to download additional information, the information would be readily
available.

Calculating the value in the server each time they are used does not seems to
be viable, yet it could be calculated if it is missing and then stored in the
database. Different use may need different digests, but a MD5-like digest of
the complete article and a Nilsimsa-like digest should be sufficient for most
purposes. Not sure how Nilsimsa would turn out for large articles... 

[1] http://blog.wikimedia.org/2010/wikidashboard-revisited/
[2] http://no.wikipedia.org/wiki/Fil:Skjermdump-history-graphic.png 
[3] http://www.grouplens.org/node/427

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #13 from Gurch  2010-02-17 23:44:41 
UTC ---
(In reply to comment #12)
> I'm not sure how the conversation got steered to recent changes monitoring
> since I don't understand the usefulness of knowing when no-op edits are saved
> to articles

No-op edits aren't saved at all, they have no entry in the page history.

> I'd like to bring back the discussion of how a checksum could
be used to quickly determine which revisions are reverts computationally

Which is the reason it would be useful for recent changes monitoring.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #12 from Aaron Halfaker  2010-02-17 
21:06:33 UTC ---
I'm not sure how the conversation got steered to recent changes monitoring
since I don't understand the usefulness of knowing when no-op edits are saved
to articles, but I'd like to bring back the discussion of how a checksum could
be used to quickly determine which revisions are reverts computationally. 
Reverts and reverting are functions of MediaWiki and the way it manages
history. 

Research suggests that most reverts in the English Wikipedia are identity
reverts[1] (reverts that can be detected by comparing checksums between
revisions).  Detecting identity reverts through the current API requires the
retrieval of the complete text of the revisions which consumes appreciable disk
access and network resources.  With the addition of checksums, this expensive
retrieval process could be forgone and writing tools that interact with
MediaWiki's API would be easier and more straightforward.

As for the possibility of collision, an MD5 checksum could contain 16^32
possible values which means that, so long as the realm of possible outputs is
uniform, the probability of a collision is about 1 in sqrt(16^32)[2] or
1:18,446,744,073,709,552,000.  

1. http://doi.acm.org/10.1145/1240624.1240698
2. http://en.wikipedia.org/wiki/Birthday_paradox

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Gurch  changed:

   What|Removed |Added

 CC||matthew.brit...@btinternet.
   ||com

--- Comment #11 from Gurch  2010-02-17 11:58:20 
UTC ---
(In reply to comment #10)
> If anyone wants to reopen this bug, feel free to provide collision percentages
> on the projects data, and how it deviates from 'length' collision data.

I think it's more that the collision odds for a checksum are low enough (on the
order of, say, 1 in 2^32, even if not actually that low) that one can
confidently assume they represent different data, which you can't do for
length.

For heuristic "these might be the same revision" purposes, length is enough.
For "this is definitely the same revision as revision X, therefore since we've
already checked that revision, we don't have to check this one" purposes, it is
not sufficient.

It would be a useful optimization to recent changes monitoring, but not that
important from my perspective (at least, there are much more important issues
to resolve).

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Domas Mituzas  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution||INVALID

--- Comment #10 from Domas Mituzas  2010-02-17 
11:49:08 UTC ---
After more thinking, I'm marking this feature request as 'INVALID'. 

Nobody has brought in any discussion here yet how useful that would be, except
unaccounted assumptions.

If anyone wants to reopen this bug, feel free to provide collision percentages
on the projects data, and how it deviates from 'length' collision data.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #9 from Domas Mituzas  2010-02-17 11:43:34 
UTC ---
otoh, I personally think that is really useless change, as for a low chance of
collisions, that can be calculated in other ways, we end up storing lots of
data all around.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

--- Comment #8 from Reedy  2010-02-17 11:41:51 UTC ---
(In reply to comment #6)
> (In reply to comment #5)
> > Shouldn't we maybe really split this into 2 bugs?
> 
> Not really needed, I think. The bulk of the work to be done is in adding the
> field, exposing it through the API is trivial.

True. At least it being assigned to the right components helps.. And with the
API peoples being CC'd, it can be dealt with when necessary!

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Domas Mituzas  changed:

   What|Removed |Added

 CC||domas.mitu...@gmail.com

--- Comment #7 from Domas Mituzas  2010-02-17 11:39:59 
UTC ---
Why 'revision' table, if it is just for recent change monitoring? If we add it
to 'revision', people will want it on 'recentchanges' too, once we add it to
'recentchanges', there's not much win in having it on 'revision'... :)

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Roan Kattouw  changed:

   What|Removed |Added

  Component|API |Database
 AssignedTo|roan.katt...@gmail.com  |wikibug...@lists.wikimedia.
   ||org

--- Comment #6 from Roan Kattouw  2010-02-17 11:18:22 
UTC ---
(In reply to comment #5)
> Shouldn't we maybe really split this into 2 bugs?

Not really needed, I think. The bulk of the work to be done is in adding the
field, exposing it through the API is trivial.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-02-17 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860

Reedy  changed:

   What|Removed |Added

 CC||s...@reedyboy.net

--- Comment #5 from Reedy  2010-02-17 09:59:56 UTC ---
Shouldn't we maybe really split this into 2 bugs?

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2010-01-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860


Alexandre Emsenhuber [IAlex]  changed:

   What|Removed |Added

 CC||alex.emsenhu...@bluewin.ch
   Keywords||schema-change




-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2009-12-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860


Alexandre Emsenhuber [IAlex]  changed:

   What|Removed |Added

   Keywords|bot-interface   |




-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2009-12-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860





--- Comment #4 from Roan Kattouw   2009-12-16 14:23:15 
UTC ---
(In reply to comment #3)
> Interesting idea. Remember that the text table can also be an external store,
> so any implementation needs to keep that in mind.
> 

The checksum field (MD5 hash of the content?) would probably be in the revision
table. It would also make the optimization of reusing text rows more feasible
(this probably wouldn't have too much of an impact on Wikipedia due to our
usage of gzip in external storage, but it could save a lot of space on
third-party wikis not using ES).


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2009-12-16 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860


Chad H.  changed:

   What|Removed |Added

 CC||innocentkil...@gmail.com




--- Comment #3 from Chad H.   2009-12-16 14:19:10 UTC 
---
Interesting idea. Remember that the text table can also be an external store,
so any implementation needs to keep that in mind.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2009-12-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860


Roan Kattouw  changed:

   What|Removed |Added

 CC||roan.katt...@gmail.com




--- Comment #2 from Roan Kattouw   2009-12-15 22:40:00 
UTC ---
(In reply to comment #1)
> Such a thing would definitely be useful for recent changes monitoring (can see
> quickly if the current revision of a page is identical to a previous revision,
> without retrieving the whole text of the previous revision). However since the
> field wouldn't be populated for older revisions, that would make it actually
> rather useless, at least to begin with.
> 

I don't think it'd be that useless in the beginning. Most reverts revert to a
version that's at most 24-48h old (experience, no actual data to back this up),
so "to begin with" would be a very small timeframe.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 21860] Add checksum field to text table; expose it in API

2009-12-15 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=21860





--- Comment #1 from Gurch   2009-12-15 22:35:03 
UTC ---
Such a thing would definitely be useful for recent changes monitoring (can see
quickly if the current revision of a page is identical to a previous revision,
without retrieving the whole text of the previous revision). However since the
field wouldn't be populated for older revisions, that would make it actually
rather useless, at least to begin with.


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l