-Original Message-
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: Thursday, March 7, 2013 9:59 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Indexing non-text content in LuceneSearch
On Thu, Mar 7, 2013
Hi all!
I would like to ask for your input on how non-wikitext content can
be indexed by LuceneSearch.
Background is the fact that full text search (Special:Search) is nearly useless
on wikidata.org at the moment, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=42234.
The reason
On Thu, Mar 7, 2013 at 11:45 AM, Daniel Kinzler dan...@brightbyte.de wrote:
1) create a specialized XML dump that contains the text generated by
getTextForSearchIndex() instead of actual page content.
That probably makes the most sense; alternately, make a dump that
includes both raw data and
On 07.03.2013 20:58, Brion Vibber wrote:
3) The indexer code (without plugins) should not know about Wikibase, but it
may have hard coded knowledge about JSON. It could have a special indexing
mode for JSON, in which the structure is deserialized and traversed, and any
values are added
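To make option 3 concrete, here is a minimal sketch of what such a generic JSON indexing mode might do: deserialize the blob, walk it recursively, and collect every scalar value into one text string for the index. It is written in Python purely for illustration and is not the indexer's actual code. The function names and the sample document are hypothetical, and the sample is not the real Wikibase serialization.

```python
import json


def collect_values(node):
    """Recursively yield every scalar value found in a deserialized JSON structure."""
    if isinstance(node, dict):
        # Keys are structural, so only the values are descended into.
        for value in node.values():
            yield from collect_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_values(item)
    elif node is not None and not isinstance(node, bool):
        # Strings and numbers become index text; null and booleans are skipped.
        yield str(node)


def json_to_index_text(raw):
    """Turn a raw JSON string into a flat, space-separated text for indexing."""
    return " ".join(collect_values(json.loads(raw)))


# Hypothetical sample, NOT the actual Wikibase entity format.
sample = '{"label": {"en": "Berlin", "de": "Berlin"}, "aliases": ["Berlin, Germany"], "count": 3}'
print(json_to_index_text(sample))
```

Skipping keys keeps structural names out of the index, which is an assumption of this sketch; an indexer could equally decide to keep them.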
(1) seems like the right way to go to me too.
There may be other ways, but puppet/files/lucene/lucene.jobs.sh has a
function called import-db() which creates a dump like this:
php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname --current $dumpfile
Ram
On Thu, Mar 7, 2013