On Tue, Jul 7, 2009 at 04:49, Steve Bennett <stevag...@gmail.com> wrote:
> On Mon, Jul 6, 2009 at 9:05 PM, Amir E. Aharoni <amir.ahar...@gmail.com> wrote:
>> 2. The info won't be up-to-date. Would it be too much to ask to search
>> the database directly using regexes?
>
> What's your use case? Obviously all the points below are valid and
> rule out directly regex searching on the entire Wikipedia database,
> for instance, but I wonder if you could have hybrid cases like "return
> pages that contain X and regex Y". Since X can be indexed, you're
> immediately working on a (much) smaller subset.
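
A minimal sketch of the "contain X and regex Y" idea, run against a local XML dump rather than the live database. It assumes the standard MediaWiki XML export layout (`<page>` elements holding `<title>` and `<revision>/<text>`). Because a dump has no index, a plain substring test stands in for the indexed "X" here: it still reads every page, but it spares the regexes most of the text. The names `iter_pages`, `count_variants` and `SAMPLE` are hypothetical, and the Hebrew words (plene vs. defective spellings of "software", filtered on "computer") are only illustrative.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, IO, Iterable, Iterator, Tuple

# Hypothetical miniature dump for the demo; real dumps use the same
# <page>/<title>/<revision>/<text> layout, but the namespace version varies.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>A</title><revision><text>מחשב תוכנה תוכנה תכנה</text></revision></page>
  <page><title>B</title><revision><text>תכנה בלי מילת הסינון</text></revision></page>
</mediawiki>""".encode("utf-8")


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to each tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML export without loading it whole."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""  # last <text> wins if a page has several revisions
        yield title, text
        elem.clear()  # release the finished page's subtree to limit memory use


def count_variants(pages: Iterable[Tuple[str, str]], literal: str,
                   variants: Dict[str, str]) -> Counter:
    """Count regex matches per spelling variant, only on pages containing `literal`.

    The substring test plays the role of "pages that contain X"; the
    more expensive regexes ("regex Y") then run only on that subset.
    """
    compiled = {name: re.compile(pattern) for name, pattern in variants.items()}
    counts: Counter = Counter({name: 0 for name in variants})
    for _title, text in pages:
        if literal not in text:
            continue  # cheap prefilter: skip pages without the literal term
        for name, rx in compiled.items():
            counts[name] += sum(1 for _ in rx.finditer(text))
    return counts


if __name__ == "__main__":
    # Illustrative: count two spellings of "software" only on pages mentioning "computer".
    demo = count_variants(iter_pages(io.BytesIO(SAMPLE)), "מחשב",
                          {"plene": "תוכנה", "defective": "תכנה"})
    print(demo)
```

For a real dump, the compressed file can be opened with `bz2.open(path, "rb")` and the file object passed straight to `iter_pages`; the exact dump file name is an assumption left to the reader. Since Hebrew attaches prefix letters (ה, ו, ב, ל and others) directly to words, patterns anchored with `\b` would likely miss prefixed forms, which is why the sketch uses unanchored patterns.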

It is not really that important for me to search the live Wikipedia. Currently I mostly want to satisfy my linguistic curiosity by gathering statistics on the usage of different spellings in the Hebrew Wikipedia (modern [[Hebrew spelling]] is wildly inconsistent). The regular search engine, which is tailored mostly for English, is almost useless for this task, but searching a dump would be enough. (AWB, i.e. AutoWikiBrowser, is ruled out, because I frequently need to do this on GNU/Linux.)

-- 
אמיר אלישע אהרוני
Amir Elisha Aharoni
http://aharoni.wordpress.com
"We're living in pieces, I want to live in peace." - T. Moore

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l