Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread Михајло Анђелковић
Hm, that is quite right. The data I have does not include this info. But I won't run such a query again soon, since what I have still does the job: for now I only want to note when somebody left sr.wp and to record the reason by reviewing the talk and other relevant pages from that time. Thank yo

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread River Tarnell
Михајло Анђелковић:
> Namespaces are easily determined from the page prefix, I am not
> bothered if there are any anomalies out there (i.e. page starting with
> "User talk:" being in NS 0)

There are no page namespace prefixes in the databases. IOW, "
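To illustrate the point about prefixes: in the MediaWiki schema the `page` table keeps the namespace as a number in `page_namespace`, and `page_title` holds only the bare title with underscores. The sketch below rebuilds a display title from the two columns. The helper name `full_title`, the lookup table, and the `NS<number>` fallback are assumptions made for illustration; a localized wiki such as srwiki shows translated namespace names that are not listed here.

```python
# Canonical MediaWiki names for a few well-known namespace numbers.
# Illustrative only: a localized wiki displays translated names.
CANONICAL_NAMESPACES = {
    0: "",            # main (article) namespace has no prefix
    1: "Talk",
    2: "User",
    3: "User talk",
    4: "Project",
    5: "Project talk",
}


def full_title(page_namespace, page_title):
    """Combine page_namespace and page_title into a display title."""
    prefix = CANONICAL_NAMESPACES.get(page_namespace)
    if prefix is None:
        # Hypothetical placeholder for namespaces missing from the table.
        prefix = f"NS{page_namespace}"
    title = page_title.replace("_", " ")  # stored titles use underscores
    return f"{prefix}:{title}" if prefix else title


if __name__ == "__main__":
    print(full_title(3, "Example_user"))
    # A main-namespace page whose stored title merely looks prefixed:
    print(full_title(0, "User_talk:Odd"))
```

The second call is the anomaly mentioned in the quoted text: both rows produce a title beginning with "User talk:", so only `page_namespace` tells them apart.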

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread Михајло Анђелковић
Thank you, guys, I've already got what I needed. Namespaces are easily determined from the page prefix; I am not bothered by any anomalies out there (e.g. a page starting with "User talk:" being in NS 0), and the query is lighter if the namespace isn't pulled from the DB. Overall,

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread MZMcBride
Михајло Анђелковић wrote:
> I would ask for allowance to run a request that can be resource
> consuming if not properly scaled:
>
> SELECT page.page_title as title, rev_user_text as user, rev_timestamp
> as timestamp, rev_len as len FROM revision JOIN page ON page.page_id =
> rev_page WHERE rev_id

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread Platonides
Михајло Анђелковић wrote:
> Hello,
>
> I would ask for allowance to run a request that can be resource
> consuming if not properly scaled:
>
> SELECT page.page_title as title, rev_user_text as user, rev_timestamp
> as timestamp, rev_len as len FROM revision JOIN page ON page.page_id =
> rev_page

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread River Tarnell
Михајло Анђелковић:
> WHERE rev_id > 0 AND rev_id < [...] AND rev_deleted = 0;

Please check that MySQL plans this correctly (using the rev_id index).

> If this is generally allowed to do, my question is how large chunks of
> data can I take at once,
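One way to keep each request small is to walk the `rev_id` range in fixed-size windows, so every query is a bounded range scan on `rev_id`. The sketch below is only an illustration under stated assumptions: it uses Python's `sqlite3` with a tiny in-memory copy of the relevant columns as a stand-in for the MySQL replica, and the names `iter_revision_chunks`, `make_demo_db`, `max_rev_id` and `chunk_size` are hypothetical. The thread does not state an acceptable chunk size, so none is suggested here. On MySQL itself, prefixing the SELECT with `EXPLAIN` shows which index the planner picks.

```python
import sqlite3


def iter_revision_chunks(conn, max_rev_id, chunk_size):
    """Yield lists of (title, user, timestamp, len) rows, one rev_id window at a time.

    Each window is the half-open range (low, high], so no revision is
    fetched twice and none is skipped.
    """
    query = (
        "SELECT page.page_title, rev_user_text, rev_timestamp, rev_len "
        "FROM revision JOIN page ON page.page_id = rev_page "
        "WHERE rev_id > ? AND rev_id <= ? AND rev_deleted = 0 "
        "ORDER BY rev_id"
    )
    low = 0
    while low < max_rev_id:
        high = min(low + chunk_size, max_rev_id)
        yield conn.execute(query, (low, high)).fetchall()
        low = high


def make_demo_db():
    """Build a tiny in-memory database with only the columns the query needs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT,
            rev_timestamp TEXT, rev_len INTEGER, rev_deleted INTEGER);
        """
    )
    conn.execute("INSERT INTO page VALUES (1, 'Example')")
    # Seven demo revisions; revision 3 is marked deleted and must be skipped.
    conn.executemany(
        "INSERT INTO revision VALUES (?, 1, 'UserA', '20101129000000', 10, ?)",
        [(i, 1 if i == 3 else 0) for i in range(1, 8)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    for rows in iter_revision_chunks(demo, max_rev_id=7, chunk_size=3):
        print(len(rows), "rows")
```

Because the generator yields one window at a time, the caller can write each batch to disk and pause between requests instead of holding one long-running query open.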

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread Михајло Анђелковић
Unfortunately, the complete dumps contain lots of data I don't actually need, and I am not willing to burden my small HDD with them. What is more, they have been unavailable since 10.11, which is already quite a long time. Right now I have time for this research and I want to

Re: [Toolserver-l] Extracting basic revision data

2010-11-29 Thread Petr Kadlec
2010/11/29 Михајло Анђелковић:
> This is intended to extract basic data about all publicly visible
> revisions from 1 to [...]. Info about each revision would be a 4-tuple
> title/user name/time/length. I need this data to start generating a
> timeline of editing of srwiki, so it is intended to be

[Toolserver-l] Extracting basic revision data

2010-11-28 Thread Михајло Анђелковић
Hello,

I would ask for allowance to run a request that can be resource consuming if not properly scaled:

SELECT page.page_title as title, rev_user_text as user, rev_timestamp as timestamp, rev_len as len FROM revision JOIN page ON page.page_id = rev_page WHERE rev_id > 0 AND rev_id < [...] AND re