Re: Wikipedia or reuters like index for testing facets?

Jason Rutherglen Wed, 15 Jul 2009 11:35:53 -0700

Yeah, that's what I was thinking of as an alternative: use enwiki
and randomly generate facet data along with it. However, for
consistent benchmarking the random data would need to stay the
same, so that people could run the same benchmark and get
comparable results in their own environment.
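
One way to keep the random data identical everywhere is to derive it
from a fixed seed. A minimal sketch, assuming one category value and one
numeric price value per document; the class name SeededFacets, the
"cat_N,price" output format, and the parameter names are hypothetical and
not part of Lucene or Solr:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: derives repeatable facet values for benchmark documents. */
final class SeededFacets {

    /**
     * Returns one "category,price" string per document.
     * The same (seed, numDocs, uniqueCategories, maxPrice) always yields the
     * same list, because the algorithm of java.util.Random is fixed by its
     * specification.
     */
    static List<String> generate(long seed, int numDocs, int uniqueCategories, int maxPrice) {
        Random random = new Random(seed);
        List<String> facets = new ArrayList<>(numDocs);
        for (int doc = 0; doc < numDocs; doc++) {
            // Bounds the number of unique facet values.
            int category = random.nextInt(uniqueCategories);
            // Bounds the numeric range to [0, maxPrice].
            int price = random.nextInt(maxPrice + 1);
            facets.add("cat_" + category + "," + price);
        }
        return facets;
    }

    public static void main(String[] args) {
        // Print the facet values for the first few documents of a fixed-seed run.
        for (String facet : generate(42L, 5, 10, 1000)) {
            System.out.println(facet);
        }
    }
}
```

Publishing the seed and the parameters alongside the benchmark would let
anyone regenerate the same facet values next to the enwiki docs, without
having to ship the generated data itself.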


On Tue, Jul 14, 2009 at 6:28 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Why don't you just randomly generate the facet data? That's probably the best
> way, right? You can control the uniques and ranges.
>
> On Wed, Jul 15, 2009 at 1:21 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
>> Probably not as generated by the EnwikiDocMaker, but the WikipediaTokenizer
>> in Lucene can pull out richer syntax which could then be Teed/Sinked to
>> other fields.  Things like categories, related links, etc.  Mostly, though,
>> I was just commenting on the fact that it isn't hard to at least use it for
>> getting docs into Solr.
>>
>> -Grant
>>
>> On Jul 14, 2009, at 7:38 PM, Jason Rutherglen wrote:
>>
>>> You think enwiki has enough data for faceting?
>>>
>>> On Tue, Jul 14, 2009 at 2:56 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>>>
>>>> At a minimum, it is trivial to use the EnwikiDocMaker and then send the
>>>> doc over SolrJ...
>>>>
>>>> On Jul 14, 2009, at 4:07 PM, Mark Miller wrote:
>>>>
>>>>> On Tue, Jul 14, 2009 at 3:36 PM, Jason Rutherglen
>>>>> <jason.rutherg...@gmail.com> wrote:
>>>>>
>>>>>> Is there a standard index, like what Lucene uses for contrib/benchmark,
>>>>>> for executing faceted queries over? Or maybe we can randomly generate
>>>>>> one that works in conjunction with Wikipedia? That way we can execute
>>>>>> real-world queries against faceted data. Or we could use the Lucene/Solr
>>>>>> mailing lists and other data (a la Lucid's faceted site) as a standard
>>>>>> index?
>>>>>>
>>>>>>
>>>>> I don't think there is any standard set of docs for Solr testing. There
>>>>> is not a real benchmark contrib, though I know more than a few of us have
>>>>> hacked up pieces of Lucene benchmark to work with Solr. I think I've done
>>>>> it twice now ;)
>>>>>
>>>>> Would be nice to get things going. I was thinking the other day: I wonder
>>>>> how hard it would be to make Lucene Benchmark generic enough to accept
>>>>> Solr impls and Solr algs?
>>>>>
>>>>> It does a lot that would suck to duplicate.
>>>>>
>>>>> --
>>>>> --
>>>>> - Mark
>>>>>
>>>>> http://www.lucidimagination.com
>>>>>
>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>>
