Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-04 Thread YOGENDRA SONI
Any good text Information Retrieval dataset may be a good candidate. https://github.com/harpribot/awesome-information-retrieval#datasets These datasets also have benchmarks and sample queries.
On Fri, Sep 4, 2020 at 11:26 AM David Smiley wrote:
> It's tempting to accomplish two goals at once

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-03 Thread David Smiley
It's tempting to accomplish two goals at once (tutorial & searchable ref guide) but I think the realities of making a *good* searchable ref guide may distract someone from learning as it tries to do both well. A searchable ref-guide could very well be its own project that we point people learning

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Alexandre Rafalovitch
That Jeopardy set looks very dubious: the content was collected by scraping and is available on various sharing sites (including Mega!). I would not feel comfortable working with that in our context. There are other dataset sources. I like the ones that the Data is Plural newsletter collects:

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Jan Høydahl
What about 200,000 Jeopardy questions in JSON format? https://www.reddit.com/r/datasets/comments/1uyd0t/20_jeopardy_questions_in_a_json_file/ I downloaded the file in a few seconds, and it also has

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Alexandre Rafalovitch
I've thought of providing instructions. But for good indexing, we should use the adoc format as the source rather than HTML (as Cassandra's presentation showed), which means the user has to build dependencies to get the Asciidoctor library. They also need a way to get the content, so either git clone or download the whole
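
To illustrate why adoc is an attractive source for indexing, here is a minimal sketch that splits an AsciiDoc page into one record per section, using heading lines (`=`, `==`, ...) as boundaries. It is a plain-regex stand-in, not a real AsciiDoc parser and not the Asciidoctor library discussed above. The function name and the `id`/`level`/`title`/`body` field names are hypothetical and do not correspond to any Solr schema.

```python
import re

# An AsciiDoc section title: 1-6 '=' characters, whitespace, then the title text.
HEADING = re.compile(r"^(={1,6})\s+(\S.*)$")


def split_adoc_sections(source, page_id):
    """Split AsciiDoc text into one dict per section (hypothetical field names)."""
    docs = []
    current = None
    in_listing = False  # True while inside a ---- delimited listing block
    for line in source.splitlines():
        # Toggle on listing-block delimiters so code samples are not read as headings.
        if line.startswith("----") and set(line.strip()) == {"-"}:
            in_listing = not in_listing
        match = None if in_listing else HEADING.match(line)
        if match:
            current = {
                "id": f"{page_id}#{len(docs)}",
                "level": len(match.group(1)),
                "title": match.group(2).strip(),
                "body": [],
            }
            docs.append(current)
        elif current is not None:
            current["body"].append(line)
    # Join the collected lines into a single text field per section.
    for doc in docs:
        doc["body"] = "\n".join(doc["body"]).strip()
    return docs
```

Text before the first heading is ignored in this sketch, and other block types (tables, passthroughs, includes) are not handled, which is part of why a real parser may be the better choice.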

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-09-01 Thread Jan Høydahl
I’d rather ship a tutorial and tooling that explain how to index the ref-guide than ship a binary index. What other full-text datasets have you considered as candidates for getting-started examples?
Jan
> On 1 Sep 2020 at 05:53, Alexandre Rafalovitch wrote:
>
> I did not say it was

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Alexandre Rafalovitch
I did not say it was trivial, but I also did not quite mention the previous research. https://github.com/arafalov/solr-refguide-indexing/blob/master/src/com/solrstart/refguide/Indexer.java It uses the official AsciidoctorJ library directly. Not sure if that's just JRuby version of Asciidoctor we

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Gus Heck
Some background to consider before committing to that... it might not be as trivial as you think. (I've often thought it ironic that we don't have real search for our ref guide...) https://www.youtube.com/watch?v=DixlnxAk08s
-Gus
On Mon, Aug 31, 2020 at 2:06 PM Ishan Chattopadhyaya <

Re: SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Ishan Chattopadhyaya
I love the idea of using the ref guide itself as an example dataset. That way, we won't need to ship anything separately. Python's Beautiful Soup can extract text from the HTML pages. I'm sure there may be such things in Java too (can Tika do this?).
On Mon, 31 Aug, 2020, 11:18 pm Alexandre
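
The HTML-to-text step mentioned above can be sketched in Python. This is a minimal sketch that uses the standard library's `html.parser` in place of Beautiful Soup, so it runs without extra installs. The class and function names are hypothetical, and a real ref guide page would likely also need navigation and sidebar markup filtered out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<h1>Faceting</h1><p>Use <code>facet.field</code> here.</p>")` yields the heading and paragraph text joined by single spaces, which could then be sent to Solr as a full-text field.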

SIP-10: Solr 9 examples: Can we use Ref Guide as a dogfood example?

2020-08-31 Thread Alexandre Rafalovitch
Hi, I need a sanity check. I am in the planning stages for the new example datasets to ship with Solr 9. The one I am looking at is great for structured information, but is quite light on full-text content. So, I am thinking of how important that is and what other sources could be used. One -