On Tue, Jul 7, 2009 at 04:49, Steve Bennett<stevag...@gmail.com> wrote:
> On Mon, Jul 6, 2009 at 9:05 PM, Amir E. Aharoni<amir.ahar...@gmail.com> wrote:
>> 2. The info won't be up-to-date. Would it be too much to ask to search
>> the database directly using regexes?
>
> What's your use case? Obviously all the points below are valid and
> rule out directly regex searching on the entire Wikipedia database,
> for instance, but I wonder if you could have hybrid cases like "return
> pages that contain X and regex Y". Since X can be indexed, you're
> immediately working on a (much) smaller subset.

It is not really that important for me to search the live Wikipedia.

Currently I mostly want to satisfy my linguistic curiosity and find
out statistics about usage of different spellings in the Hebrew
Wikipedia (modern [[Hebrew spelling]] is wildly inconsistent). The
regular search engine, which is mostly tailored for English, is almost
useless for this task. But searching a dump would be enough.
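
A rough sketch of what searching a dump for spelling statistics could
look like, in Python. This is only an illustration under assumptions:
the function name count_spellings and the sample patterns are made up
for the example, and it assumes the dump is a MediaWiki XML export whose
page text sits in <text> elements.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import Counter


def count_spellings(dump_file, patterns):
    """Count regex matches per named pattern across all <text> elements.

    dump_file: a binary file-like object holding MediaWiki export XML.
    patterns:  dict mapping a label to a regex string (hypothetical input).
    """
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = Counter()
    # iterparse streams the file, so the whole dump is never parsed at once.
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        # Drop the XML namespace, which differs between export versions.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "text" and elem.text:
            for name, rx in compiled.items():
                counts[name] += sum(1 for _ in rx.finditer(elem.text))
        elif tag == "page":
            elem.clear()  # release the text of the page we just finished
    return counts


# Tiny in-memory stand-in for a dump, with two spellings of one word.
sample = io.BytesIO(
    b"<mediawiki>"
    b"<page><title>A</title><revision>"
    b"<text>color colour color</text></revision></page>"
    b"<page><title>B</title><revision>"
    b"<text>colour</text></revision></page>"
    b"</mediawiki>"
)
counts = count_spellings(
    sample, {"color": r"\bcolor\b", "colour": r"\bcolour\b"}
)
print(counts["color"], counts["colour"])
```

For a real compressed dump, the same function should accept the file
object returned by bz2.open(path, "rb"), where path is wherever the
downloaded dump file lives.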

(AWB is ruled out, because I frequently need to run these searches on GNU/Linux.)

-- 
אמיר אלישע אהרוני
Amir Elisha Aharoni

http://aharoni.wordpress.com

"We're living in pieces,
 I want to live in peace." - T. Moore

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
