Injecting synonyms into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
Hi, Does anyone know a faster way to populate the synonyms.txt file than manually typing the words into it? There could be thousands of synonyms. Regards, Edwin

Re: How to register a custom QParserPlugin

2015-04-30 Thread Oliver Obenland
Hi Hoss, thank you for your help. This helps a lot. I can see the plugin neither in the log nor in the plugin list, but it works now (I got an exception from our class, so I know it is being called). Thanks a lot! Oliver On 29.04.2015 at 18:40, Chris Hostetter wrote: : snippet queryparser

Re: Re: solr issue with pdf forms

2015-04-30 Thread Steve.Scholl
Hey, thanks a lot for the hint about pdfbox-app.jar. For testing purposes I have now extracted an affected pdf form and a normal pdf file. The result is the following: Normal pdf file: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et d pdf form:

Re: Re: solr issue with pdf forms

2015-04-30 Thread Steve.Scholl
Thank you very much for the detailed information. I have now checked the properties of the content field. In my opinion it is indexed, right? Field: content; Properties: Indexed, Tokenized, Stored, TermVector Stored; Schema: Indexed, Tokenized, Stored, TermVector Stored; Index: Indexed, Tokenized,

Need help with Nested docs situation

2015-04-30 Thread roySolr
Hello, I have a situation and I'm a bit stuck on how to fix it. For example, take the following data structure: *Deal* All Coca Cola 20% off *Products* Coca Cola light, Coca Cola Zero 1L, Coca Cola Zero 20CL, Coca Cola 1L. When somebody searches for Cola discount I want the result of the

Proximity Search

2015-04-30 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, I have created my index with the default configuration. Now I am trying to use proximity search. However, I am not sure about the results or where it's going wrong. For example, I want to find two phrases, "this is phrase one" and another phrase "this is the second phrase", with not more than a

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-30 Thread Dan Davis
Hi Doug, nice write-up and 2 questions: - You write your own QParser plugins - can one keep the features of edismax for field boosting/phrase-match boosting by subclassing edismax? Assuming yes... - What do pf2 and pf3 do in the edismax query parser? hon-lucene-synonyms plugin links

Re: Proximity Search

2015-04-30 Thread Vijaya Narayana Reddy Bhoomi Reddy
I just tried with simple proximity search like word1 word2 ~3 and it is not working. Just wondering whether I have to make any configuration changes to solrconfig.xml to make proximity search work? Thanks Vijay On 30 April 2015 at 14:32, Vijaya Narayana Reddy Bhoomi Reddy
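In the standard Lucene/Solr query syntax the slop applies to a quoted phrase, as in "word1 word2"~3. As a rough illustration of what the number means for a two-term phrase, this plain-Java sketch (no Solr classes; the class and method names are made up for the example) computes the smallest slop that lets the phrase match, given the token positions of the two terms after analysis. The phrase expects the second term one position after the first, so the slop is the distance from that expected position; swapping two adjacent words therefore costs 2.

```java
public class PhraseSlop {

    /**
     * Smallest slop N for which the two-term phrase "first second"~N matches when
     * "first" is at token position firstPos and "second" at secondPos.
     * The phrase expects "second" at firstPos + 1; the slop is the distance from that.
     */
    static int minSlop(int firstPos, int secondPos) {
        return Math.abs(secondPos - firstPos - 1);
    }

    public static void main(String[] args) {
        System.out.println(minSlop(0, 1)); // adjacent, in order: 0
        System.out.println(minSlop(0, 4)); // three words in between: 3
        System.out.println(minSlop(1, 0)); // adjacent but reversed: 2
    }
}
```

So "word1 word2"~3 should match documents where at most three words separate the two terms in order, or where they appear reversed with at most one word between them.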

Negative Boosting documents with a certain word

2015-04-30 Thread O. Olson
Hi, My Solr documents contain descriptions of products, similar to a BestBuy or a NewEgg catalog. I'm wondering whether it is possible to push a product down the ranking if it contains a certain word. By this I mean it would still appear in the search results. However, instead of appearing

Re: Proximity Search

2015-04-30 Thread Rajani Maski
Hi Vijaya, I just quickly tried proximity search with the example set shipped with Solr 5 and it seemed to work for me. Perhaps what you could do is debug the query by enabling debugQuery=true. Here are the steps that I tried (assuming you are on Solr 5, though this term proximity
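Following the debugQuery suggestion, a minimal sketch of building such a request URL in plain Java. The host, port and the core name techproducts (one of the Solr 5 example setups) are assumptions. The quotes around the words are what turn ~3 into a phrase slop; in the debug section of the response, the parsedquery entry should show whether the input really became a sloppy phrase query.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ProximityDebugUrl {

    /** Encodes request parameters as a URL query string (uses the Java 10+ URLEncoder overload). */
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "\"word1 word2\"~3"); // quoted phrase with a slop of 3
        params.put("debugQuery", "true");     // adds parsed query and score explanations
        params.put("wt", "json");
        // Hypothetical host and core name.
        System.out.println("http://localhost:8983/solr/techproducts/select?" + queryString(params));
    }
}
```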

Avoiding a schema.xml

2015-04-30 Thread Sznajder ForMailingList
Hi, I am interested in indexing some documents in Solr the way I did in Lucene, that is, supplying via SolrJ all the information about each field I am adding (tokenize, store, facet, etc.). Can we do that? Or is it mandatory to define a schema on the collection? Thanks a lot! Benjamin

Re: Injecting synonyms into Solr

2015-04-30 Thread Vincenzo D'Amore
Which version of solr? On Thu, Apr 30, 2015 at 9:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Does anyone knows any faster method of populating the synonyms.txt file instead of manually typing in the words into the file, which there could be thousands of synonyms around?

Re: Injecting synonyms into Solr

2015-04-30 Thread Kaushik
I am facing the same problem; currently I am resorting to a custom program to create this file. Hopefully there is a better solution out there. Thanks, Kaushik On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Does anyone knows any faster method of
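A minimal sketch of such a custom program, assuming the source thesaurus has already been read into a map from a headword to its synonyms (parsing any particular source is left out, and the class name is hypothetical). Each group becomes one line of comma-separated equivalent terms, the basic synonyms.txt format. Terms are trimmed, lower-cased and de-duplicated, and are assumed not to contain commas (those would need escaping).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SynonymFileWriter {

    /**
     * Converts headword -> synonyms entries into synonyms.txt lines such as
     * "tv, television, telly". Groups left with fewer than two distinct terms are skipped.
     */
    static List<String> toLines(Map<String, List<String>> thesaurus) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : thesaurus.entrySet()) {
            Set<String> terms = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
            addTerm(terms, entry.getKey());
            for (String synonym : entry.getValue()) {
                addTerm(terms, synonym);
            }
            if (terms.size() >= 2) {
                lines.add(String.join(", ", terms));
            }
        }
        return lines;
    }

    private static void addTerm(Set<String> terms, String raw) {
        String term = raw.trim().toLowerCase(Locale.ROOT);
        if (!term.isEmpty()) {
            terms.add(term);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> thesaurus = new LinkedHashMap<>();
        thesaurus.put("TV", List.of("television", "telly"));
        thesaurus.put("car", List.of("automobile", "Car"));
        thesaurus.put("solo", List.of("SOLO")); // collapses to one term, so it is skipped
        Files.write(Path.of("synonyms.txt"), toLines(thesaurus), StandardCharsets.UTF_8);
    }
}
```

Lower-casing only makes sense if the field's analyzer lower-cases before the synonym filter, or if the filter is configured with ignoreCase="true".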

Re: Injecting synonyms into Solr

2015-04-30 Thread Scott Dawson
There is a possible solution here: https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR Synonym format). I don't have personal experience with it. I only know about it because it's mentioned on page 184 of the 'Solr in Action' book by Trey Grainger and Timothy Potter. Maybe

RE: Proximity Search

2015-04-30 Thread Allison, Timothy B.
You'll need the ComplexPhraseQueryParser [1] to handle multiterm (wildcard/fuzzy/regex) terms in proximity. Beware, though, that it does not perform analysis on fuzzy/wildcard terms (IIRC). The SurroundQueryParser can probably do both phrase near phrase and multiterm within proximity. Same

Re: Injecting synonyms into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
I'm using Solr-5.0.0 and ZooKeeper-3.4.6. I've gotten some samples from the Moby Treasure List http://www.gutenberg.org/catalog/world/results?title=moby+list to try it out. However, currently I can only have up to around 2100 lines in my synonyms.txt when I load the configuration into

Re: Proximity Search

2015-04-30 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Rajani. I could get proximity search to work for individual words. However, I still could not make it work for two phrases, each containing more than one word. Also, the results seem unexpected for proximity queries with wildcards. Thanks Regards Vijay On 30 April 2015 at 15:19, Rajani

Re: Collections API Overseer status and statistics

2015-04-30 Thread Shawn Heisey
On 4/30/2015 11:22 AM, Ryan Steele wrote: What time unit is the Solr collections API overseerstatus action using in the returned data? For example, given the following XML: <double name="avgTimePerRequest">0.15491020578778136</double> Is the avgTimePerRequest in seconds? Most timing data in Solr

Re: Proximity Search

2015-04-30 Thread Sujit Pal
Hi Vijay, I haven't tried this myself, but perhaps you could build the two phrases as PhraseQueries and connect them up with a SpanQuery? Something like this (using your original example): PhraseQuery p1 = new PhraseQuery(); for (String word : "this is phrase 1".split(" ")) { p1.add(new

RE: Re: solr issue with pdf forms

2015-04-30 Thread Davis, Daniel (NIH/NLM) [C]
Steve, Another possibility is to use the Linux pdftotext command-line utility or a software daemon linked with the libraries it uses, usually part of the poppler-utils package. pdfbox should have the same basic capabilities, but may run a little slower. If you have very many filled pdf
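A minimal sketch of calling pdftotext from Java with ProcessBuilder, assuming the binary (from poppler-utils) is on the PATH; the class name is hypothetical and sending the text on to Solr is not shown. Passing - as the output file makes pdftotext write to standard output, and -enc UTF-8 selects the output encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class PdfToText {

    /** Command line for pdftotext; "-" as the output file sends the text to stdout. */
    static List<String> command(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    /** Runs pdftotext and returns the extracted text; error output goes to our stderr. */
    static String extract(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdf))
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String text = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with code " + exit);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(Path.of(args[0])));
    }
}
```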

Re: Proximity Search

2015-04-30 Thread Dmitry Kan
Hi, If adding PhraseQuery objects does not work, then using SpanNearQuery with slop 0 and inOrder true for p1 and p2 should work (I tried it). Dmitry On Thu, Apr 30, 2015 at 8:43 PM, Sujit Pal sujit@comcast.net wrote: Hi Vijay, I haven't tried this myself, but perhaps you could build the two
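To make the matching rule behind the span suggestion concrete, here is a plain-Java illustration (not the Lucene span API) of an in-order "phrase near phrase" check: the second phrase must start after the first one ends, with at most slop tokens in between, so slop 0 means the phrases must be adjacent. Tokenization is simplified to lower-casing and splitting on whitespace, which is an assumption; a real field's analyzer may drop or change tokens and shift positions. The class name and sample text are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PhraseNearPhrase {

    /** All token positions at which the phrase starts. */
    static List<Integer> starts(String[] tokens, String[] phrase) {
        List<Integer> result = new ArrayList<>();
        outer:
        for (int i = 0; i + phrase.length <= tokens.length; i++) {
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    continue outer;
                }
            }
            result.add(i);
        }
        return result;
    }

    /** True if p1 is followed by p2, without overlap, with at most `slop` tokens between them. */
    static boolean nearInOrder(String text, String p1, String p2, int slop) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\s+");
        String[] a = p1.toLowerCase(Locale.ROOT).split("\\s+");
        String[] b = p2.toLowerCase(Locale.ROOT).split("\\s+");
        for (int s1 : starts(tokens, a)) {
            for (int s2 : starts(tokens, b)) {
                int gap = s2 - (s1 + a.length); // tokens between end of p1 and start of p2
                if (gap >= 0 && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "this is phrase one and then this is the second phrase";
        System.out.println(nearInOrder(doc, "this is phrase one", "this is the second phrase", 2)); // true
        System.out.println(nearInOrder(doc, "this is phrase one", "this is the second phrase", 1)); // false
    }
}
```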

Re: Negative Boosting documents with a certain word

2015-04-30 Thread Chris Hostetter
: My Solr documents contain descriptions of products, similar to a BestBuy or : a NewEgg catalog. I'm wondering if it were possible to push a product down : the ranking if it contains a certain word. By this I mean it would still
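Hoss's reply is cut off above, so what follows is not necessarily his suggestion. One commonly used workaround, rather than trying to give the word a negative boost, is to boost every document that does not contain it, for example with an edismax bq parameter of the form (*:* -field:word)^N; documents containing the word still match, they just rank lower. A small sketch that builds such a clause; the field name description, the word refurbished and the boost of 5 are hypothetical, and the word has to match the indexed (analyzed) form of the term.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DemoteByWord {

    /** Boost clause that favors documents WITHOUT the word, pushing the others down. */
    static String demoteClause(String field, String word, double boost) {
        return "(*:* -" + field + ":" + word + ")^" + boost;
    }

    public static void main(String[] args) {
        // Hypothetical field, word and boost.
        String bq = demoteClause("description", "refurbished", 5.0);
        System.out.println(bq); // (*:* -description:refurbished)^5.0
        System.out.println("defType=edismax&q=laptop&bq="
                + URLEncoder.encode(bq, StandardCharsets.UTF_8));
    }
}
```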

Collections API Overseer status and statistics

2015-04-30 Thread Ryan Steele
What time unit is the Solr collections API overseerstatus action using in the returned data? For example, given the following XML: <double name="avgTimePerRequest">0.15491020578778136</double> Is the avgTimePerRequest in seconds? Thanks, Ryan

Re: Proximity Search

2015-04-30 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Tim for the information. I shall have a look at them. Thanks Regards Vijay On 30 April 2015 at 18:13, Allison, Timothy B. talli...@mitre.org wrote: You'll need the ComplexPhraseQueryParser [1] to handle multiterm (wildcard/fuzzy/regex) terms in proximity. Beware, though, that that

RE: Proximity Search

2015-04-30 Thread Vijay Bhoomireddy
Thanks All, I shall try out the options and see how the results are. Thanks Regards Vijay -----Original Message----- From: Dmitry Kan [mailto:solrexp...@gmail.com] Sent: 30 April 2015 18:58 To: solr-user@lucene.apache.org Subject: Re: Proximity Search Hi, If adding PhraseQuery objects does

Optimal configuration for high throughput indexing

2015-04-30 Thread Vinay Pothnis
Hello, I have a use case with the following characteristics: - High index update rate (adds/updates) - High query rate - Low index size (~800MB for 2.4 million docs) - The documents that are created at the high rate eventually expire and are deleted regularly at half-hour intervals I

Lucene/Solr Revolution 2015 - Austin Oct 13-16 - CFP ends next Week

2015-04-30 Thread Chris Hostetter
(cross posted, please confine any replies to general@lucene) A quick reminder and/or heads-up for those who haven't heard yet: this year's Lucene/Solr Revolution is happening in Austin, Texas in October. The CFP and Early bird registration are currently open. (CFP ends May 8, Early Bird ends

Re: Collections API Overseer status and statistics

2015-04-30 Thread Shalin Shekhar Mangar
Hi Ryan, That is in milliseconds. On Thu, Apr 30, 2015 at 10:52 PM, Ryan Steele ryan.ste...@pgi.com wrote: What time unit is the Solr collections API overseerstatus action using in the returned data? For example, given the following XML: double

Re: Injecting synonyms into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
Just to populate it with general synonym words. I've managed to populate it from some sources online, but is there a limit to what it can contain? I can't load the configuration into ZooKeeper if the synonyms.txt file contains more than 2100 lines. Regards, Edwin On 1 May 2015 05:44, Chris

Re: Injecting synonyms into Solr

2015-04-30 Thread Zheng Lin Edwin Yeo
Thank you for the info. Yup, this works. I found out that we can't load files larger than 1MB into ZooKeeper; this applies to any file larger than 1MB, not just the synonym files. But I'm not sure whether there will be an impact on the system, as the number of synonym text file

Bug with full text search fields in multiple languages (solr 5)

2015-04-30 Thread erantone
Dear all, I have defined two dynamic fields: <dynamicField name="*_texts_en" stored="true" type="text_en" multiValued="true" indexed="true"/> <dynamicField name="*_texts_pt" stored="true" type="text_pt" multiValued="true" indexed="true"/> for documents in English and in Portuguese, with the following index and

Re: Injecting synonyms into Solr

2015-04-30 Thread Philippe Soares
Split your synonyms into multiple files and configure the SynonymFilterFactory with a comma-separated list of files, e.g.: synonyms="syn1.txt,syn2.txt,syn3.txt" On Thu, Apr 30, 2015 at 8:07 PM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Just to populate it with the general synonym words. I've
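A minimal sketch of that split: pack whole synonym lines into chunks that each stay under a byte limit, then build the comma-separated value for the synonyms attribute. The limit is a parameter; ZooKeeper's default znode size limit (jute.maxbuffer) is just under 1 MB, which matches the 1MB ceiling Edwin ran into, so a value such as 900000 bytes would leave some headroom. The class name is hypothetical, the syn prefix mirrors the example above, and writing the files and uploading them to ZooKeeper are not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SynonymSplitter {

    /**
     * Packs lines into chunks whose UTF-8 size (counting one '\n' per line)
     * is at most maxBytes. A single line longer than maxBytes is rejected.
     */
    static List<String> split(List<String> lines, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for the newline
            if (lineBytes > maxBytes) {
                throw new IllegalArgumentException("line too long: " + line);
            }
            if (currentBytes + lineBytes > maxBytes) { // start a new file
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Value for the filter's synonyms attribute, e.g. "syn1.txt,syn2.txt,syn3.txt". */
    static String synonymsAttribute(String prefix, int fileCount) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= fileCount; i++) {
            names.add(prefix + i + ".txt");
        }
        return String.join(",", names);
    }
}
```

Splitting only at line boundaries keeps each synonym rule intact, since every line of the file is parsed as a separate rule.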

Re: Injecting synonyms into Solr

2015-04-30 Thread Chris Hostetter
: There is a possible solution here: : https://issues.apache.org/jira/browse/LUCENE-2347 (Dump WordNet to SOLR : Synonym format). If you have WordNet synonyms you don't need any special code/tools to convert them -- the current solr.SynonymFilterFactory supports wordnet files (just specify

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-30 Thread Doug Turnbull
- You write your own QParser plugins - can one keep the features of edismax for field boosting/phrase-match boosting by subclassing edismax? Assuming yes... hon-lucene-synonyms does this, but largely by copy-pasting the code (sorry about the broken link!). pf2 and pf3 take the query hello my
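To illustrate: pf2 and pf3 work like pf, but instead of the whole query they take consecutive word pairs (pf2) and word triples (pf3) of the query as phrases, boosting documents where those phrases occur in the listed fields. A plain-Java sketch of just the shingling step, using hello my friend as a hypothetical completion of the truncated example; edismax builds the actual phrase queries internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EdismaxShingles {

    /** Word n-grams ("shingles") of the query: n = 2 for pf2, n = 3 for pf3. */
    static List<String> shingles(String query, int n) {
        String[] words = query.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        String q = "hello my friend";
        System.out.println("pf2 phrases: " + shingles(q, 2)); // [hello my, my friend]
        System.out.println("pf3 phrases: " + shingles(q, 3)); // [hello my friend]
    }
}
```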

Re: Avoiding a schema.xml

2015-04-30 Thread Erick Erickson
Could you explain a bit more _why_ you want to do this? As you're probably well aware, there are multiple ways to shoot yourself in the foot in lower-level Lucene. If you have some situation where you're creating indexes on the fly that may vary, then you could consider the managed schema that

Re: Avoiding a schema.xml

2015-04-30 Thread Shawn Heisey
On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote: I am interested to index some documents in Solr, as I did in Lucene. I mean: giving via solrJ all the information about the field I am adding (Tokenize, store, facet etc...) can we do that? Or is it mandatory to define a schema on the
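For the managed-schema route, field properties still live in the schema rather than in each document, but fields can be added over HTTP with the Schema API instead of editing schema.xml by hand. A minimal sketch that posts an add-field command using java.net.http (Java 11+). The collection name mycollection, the field product_name and the type text_general are assumptions; the collection must use a mutable managed schema, and field names are not JSON-escaped here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AddFieldViaSchemaApi {

    /** JSON body for a Schema API "add-field" command (names assumed to need no escaping). */
    static String addFieldJson(String name, String type, boolean indexed, boolean stored, boolean multiValued) {
        return String.format(
                "{\"add-field\":{\"name\":\"%s\",\"type\":\"%s\",\"indexed\":%b,\"stored\":%b,\"multiValued\":%b}}",
                name, type, indexed, stored, multiValued);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical collection and field; text_general must exist in the schema.
        String body = addFieldJson("product_name", "text_general", true, true, false);
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8983/solr/mycollection/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```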

Re: Re: solr issue with pdf forms

2015-04-30 Thread Erick Erickson
OK, given all that, Tika _is_ sending the weird characters to Solr. You can get them out of the index by using something like PatternReplaceFilterFactory or PatternReplaceCharFilterFactory in your analysis chain. However, you'll still be stuck with the odd characters showing up in your browser.

Re: Re: solr issue with pdf forms

2015-04-30 Thread Jack Krupansky
Or use a Solr update processor to scrub the source values. The regex pattern replacement processor could do the trick: http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html -- Jack Krupansky On Thu, Apr 30, 2015 at 11:17 AM, Erick
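Which characters Tika actually emitted is not visible here, so this sketch assumes they are control or private-use code points. It shows the kind of Java regular expression that could do the scrubbing; RegexReplaceProcessorFactory applies a Java regex to field values before they are indexed and stored, which is why it would also clean what the browser shows, whereas an analysis-chain filter only changes the indexed tokens. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class ScrubControlChars {

    // Assumption: the unwanted characters are controls (Cc) or private-use (Co) code points.
    // Tab, carriage return and line feed are kept via the character-class intersection.
    private static final Pattern UNWANTED = Pattern.compile("[\\p{Cc}\\p{Co}&&[^\\t\\r\\n]]");

    static String scrub(String value) {
        return UNWANTED.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(scrub("Name:\u0001 John\uE000 Doe")); // Name: John Doe
    }
}
```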

Re: Re: solr issue with pdf forms

2015-04-30 Thread Erick Erickson
Jack: I keep forgetting those things exist, thanks for the reminder! On Thu, Apr 30, 2015 at 8:23 AM, Jack Krupansky jack.krupan...@gmail.com wrote: Or use a Solr update processor to scrub the source values. The regex pattern replacement processor could do the trick:

RE: analyzer, indexAnalyzer and queryAnalyzer

2015-04-30 Thread Davis, Daniel (NIH/NLM) [C]
Thank you. -----Original Message----- From: Doug Turnbull [mailto:dturnb...@opensourceconnections.com] Sent: Thursday, April 30, 2015 11:33 AM To: solr-user@lucene.apache.org; Dan Davis Subject: Re: analyzer, indexAnalyzer and queryAnalyzer - You write your own QParser plugins - can one keep