(1) seems like the right way to go to me too. There may be other ways, but puppet/files/lucene/lucene.jobs.sh has a function called import-db() which creates a dump like this:
php $MWinstall/common/multiversion/MWScript.php dumpBackup.php $dbname --current > $dumpfile

Ram

On Thu, Mar 7, 2013 at 1:05 PM, Daniel Kinzler <dan...@brightbyte.de> wrote:
> On 07.03.2013 20:58, Brion Vibber wrote:
> >> 3) The indexer code (without plugins) should not know about Wikibase, but it may
> >> have hard-coded knowledge about JSON. It could have a special indexing mode for
> >> JSON, in which the structure is deserialized and traversed, and any values are
> >> added to the index (while the keys used in the structure would be ignored). We
> >> may still be indexing useless internals from the JSON, but at least there would be
> >> a lot fewer false negatives.
> >
> > Indexing structured data could be awesome -- again I think of file
> > metadata as well as wikidata-style stuff. But I'm not sure how easy
> > that'll be. Should probably be in addition to the text indexing,
> > rather than replacing.
>
> Indeed, but option 3 is about *blindly* indexing *JSON*. We definitely want
> indexed structured data, the question is just how to get that into the
> LSearch infrastructure.
>
> -- daniel
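P.S. For anyone who doesn't want to dig through puppet: a rough sketch of what that import-db() function might look like around the command above. Only the MWScript.php/dumpBackup.php invocation is actually from lucene.jobs.sh as quoted; the argument handling, variable values, and dump path here are my assumptions:

    # Sketch of an import-db()-style wrapper (assumed structure; only the
    # MWScript.php / dumpBackup.php line comes from the quoted script).
    import-db() {
        local dbname="$1"                              # wiki DB to dump, e.g. enwiki (assumed argument)
        local MWinstall="/usr/local/apache"            # assumed MediaWiki install root
        local dumpfile="/a/search/dumps/$dbname.xml"   # assumed dump destination

        # Dump only the current revision of each page as XML for the indexer.
        php "$MWinstall/common/multiversion/MWScript.php" dumpBackup.php "$dbname" \
            --current > "$dumpfile"
    }

The key bit for this thread is just the --current flag: dumpBackup.php emits the latest revision of every page as XML, which is what the Lucene indexer consumes.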