Re: Another japanese analysis problem

2014-04-18 Thread Alexandre Rafalovitch
Did you read through the CJK article series? Maybe there is something in there? http://discovery-grindstone.blogspot.com/2013/10/cjk-with-solr-for-libraries-part-1.html Sorry, no help on actual Japanese. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project:

'qt' parameter is not working in search call of SolrPhpClient

2014-04-18 Thread harshrossi
I am using SolrPhpClient for interacting with Solr via PHP. I am using a custom request handler ( /select_test ) with 'edismax' feature in Solr config file: <requestHandler name="/select_test" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <str

solr parallel update and total indexing Issue

2014-04-18 Thread ~$alpha`
There is a big issue with Solr parallel update and total indexing. Total import syntax (working): dataimport?command=full-import&commit=true&optimize=true Update syntax (working): solr/update?softCommit=true' -H 'Content-type:application/json' -d '[{"id":1870719,"column":{"set":11}}]'
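For readers following this thread, the update body quoted above can be built programmatically instead of by hand. This is only a minimal sketch: the helper name atomic_set_payload is hypothetical, and the id, field name, and value are copied from the message as illustrative data. It shows the shape of a "set" atomic update body and does not send anything to Solr.

```python
import json


def atomic_set_payload(doc_id, field, value):
    """Build the JSON body for an atomic update that sets one field.

    Hypothetical helper for illustration; it only serializes the payload.
    """
    return json.dumps([{"id": doc_id, field: {"set": value}}])


# Values taken from the message above.
payload = atomic_set_payload(1870719, "column", 11)
print(payload)
```

Serializing with json.dumps avoids the quoting mistakes that are easy to make when the body is typed inline in a shell command.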

Re: Another japanese analysis problem

2014-04-18 Thread Shawn Heisey
On 4/18/2014 12:04 AM, Alexandre Rafalovitch wrote: Did you read through the CJK article series? Maybe there is something in there? http://discovery-grindstone.blogspot.com/2013/10/cjk-with-solr-for-libraries-part-1.html Sorry, no help on actual Japanese. Almost everything I know about the

Re: Where to specify numShards when startup up a cloud setup

2014-04-18 Thread Liu Bo
Hi zzT Putting numShards in core.properties also works. I struggled a little bit while figuring out this configuration approach. I knew I am not alone! ;-) On 2 April 2014 18:06, zzT zis@gmail.com wrote: It seems that I've figured out a configuration approach to this issue. I'm having

Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Alistair
Hello all, I'm a fairly new Solr user and I need my search function to handle compound words in German. I've searched through the archives and found that Solr already has a Filter Factory made for such words called DictionaryCompoundWordTokenFilterFactory. I've already built a list of words that

Re: Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Jack Krupansky
Make sure your field type has the autoGeneratePhraseQueries=true attribute (default is false). q.op only applies to explicit terms, not to terms which decompose into multiple terms. Confusing? Yes! -- Jack Krupansky -Original Message- From: Alistair Sent: Friday, April 18, 2014 6:11

space between search terms

2014-04-18 Thread kumar
Hi, I have a field called title. It has values such as "indira nagar" as well as "indiranagar". If I type either of these keywords, both results should be displayed. Can anybody help with how we can do this? I am using the title field in the following way: <fieldType name="title" class="solr.TextField"

Re: space between search terms

2014-04-18 Thread Jack Krupansky
Use an index-time synonym filter with a synonym entry: indira nagar,indiranagar But do not use that same filter at query time. But, that may mess up some exact phrase queries, such as: q=indiranagar xyz since the following term is actually positioned after the longest synonym. To resolve
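The idea of expanding synonyms only at index time can be illustrated with a toy model. This is not how Solr's synonym filter is implemented: the function index_time_variants is hypothetical, it works on raw substrings rather than analyzed tokens, and it ignores the token-position issue mentioned in the message. It only shows why an unexpanded query for either spelling can still match when both forms were stored at index time.

```python
def index_time_variants(text, synonym_groups):
    """Return every surface form of `text` produced by swapping synonyms.

    Toy model: plain substring replacement, no tokenization or positions.
    """
    variants = {text}
    for group in synonym_groups:
        for phrase in group:
            if phrase in text:
                for other in group:
                    variants.add(text.replace(phrase, other))
    return variants


# Synonym entry from the message above: indira nagar,indiranagar
groups = [["indira nagar", "indiranagar"]]
indexed = index_time_variants("indira nagar road", groups)

# The query is NOT expanded; it matches because both forms were indexed.
print("indiranagar road" in indexed)
print("indira nagar road" in indexed)
```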

Re: multi word search for elevator (QueryElevationComponent) not working

2014-04-18 Thread Niranjan
Hi Remi, Thanks for your reply. I tried setting the query_text for apple ipod and added the required doc_id to elevate. I got the result but again I am not able to get the desired result for NLP queries such as ipod nano generation 5 or apple ipod best music. As in both the queries

Re: Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Alistair
Hey Jack, thanks for the reply. I added autoGeneratePhraseQueries=true to the fieldType and now it's giving me even more results! I'm not sure if the debug of my query will be helpful but I'll paste it just in case someone might have an idea. This produces 113524 results, whereas if I manually

QueryElevationComponent always reads config from zookeeper

2014-04-18 Thread ronak kirit
Hello, I was looking into QueryElevationComponent. As per the spec (http://wiki.apache.org/solr/QueryElevationComponent), if the config is not found in ZooKeeper, it should be loaded from the data directory. However, I see a bug: it doesn't seem to work even in the latest 4.7.2 release. I

Re: cache warming questions

2014-04-18 Thread Kranti Parisa
cool, thanks. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Thu, Apr 17, 2014 at 11:37 PM, Erick Erickson erickerick...@gmail.com wrote: No, the 5 most recently used in a query will be used to autowarm. If you have things you _know_ are going to be popular fqs, you

Re: Filtering Solr Queries

2014-04-18 Thread Erick Erickson
Is this a manageable list? That is, not a zillion names? If so, it seems like you could do this with synonyms. Assuming your string_ci bit is a string type, you'd need to change that to something like KeywordTokenizerFactory followed by filters, and you might want to add something like

Re: Having trouble with German compound words in Solr 4.7

2014-04-18 Thread Siegfried Goeschl
Hi Alistair, quick email before getting my plane - I worked with similar requirements in the past and tuning SOLR can be tricky * are you hitting the same SOLR query handler (application versus manual checking)? * turn on debugging for your application SOLR queries so you see what query is

Re: 'qt' parameter is not working in search call of SolrPhpClient

2014-04-18 Thread Erick Erickson
You're confusing a couple of things here. The /select_test handler can be accessed by pointing your URL at it rather than using qt, i.e. the destination you're going to will be http://server:port/solr/collection/select_test rather than http://server:port/solr/collection/select Best, Erick On Thu, Apr
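The advice above amounts to putting the handler name in the URL path instead of passing it as a qt parameter. A minimal sketch of that URL construction follows; the helper handler_url, the host localhost:8983, and the query parameters are assumptions for illustration, while the /select_test handler and the /solr/collection path shape come from the thread.

```python
from urllib.parse import urlencode


def handler_url(base, handler, params):
    """Join a collection base URL, a handler path, and a query string."""
    return f"{base.rstrip('/')}/{handler.lstrip('/')}?{urlencode(params)}"


# Hypothetical host and query; the handler name is the one from the thread.
url = handler_url(
    "http://localhost:8983/solr/collection",
    "/select_test",
    {"q": "ipod", "wt": "json"},
)
print(url)
```

A client that only exposes a fixed search path would need that path changed to reach the custom handler; how SolrPhpClient allows this is not covered here.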

multi-field suggestions

2014-04-18 Thread Michael Sokolov
I've been working on getting AnalyzingInfixSuggester to make suggestions using tokens drawn from multiple fields. I've done this by copying tokens from each of those fields into a destination field, and building suggestions using that destination field. This allows me to use different

Re: solr parallel update and total indexing Issue

2014-04-18 Thread Erick Erickson
Try not setting softCommit=true; that's going to take the current state of your index and make it visible. If your DIH process has deleted all your records, then that's the current state. Personally I wouldn't try to mix-n-match like this, the results will take forever to get right. If you

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Michael Sokolov
I believe you could use term vectors to retrieve all the terms in a document, with their offsets. Retrieving them from the inverted index would be expensive since the index is term-oriented, not document-oriented. Without tv, I think you essentially have to scan the entire term dictionary

Re: Indexing Big Data With or Without Solr

2014-04-18 Thread Vineet Mishra
Thanks Furkan, I will definitely give it a try then. Thanks again! On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Vineet; I've been using SolrCloud for such kind of Big Data and I think that you should consider to use it. If you have any problems you can

Boost Search results

2014-04-18 Thread A Laxmi
Hi, When I started to compare the search results with the two options below, I see a lot of difference in the search results, especially the URLs that show up at the top (from a relevancy perspective). (1) Nutch 2.2.1 (with Solr 4.0) (2) Bing custom search set-up I wonder how should I tweak the boost

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Ramkumar R. Aiyengar
Sorry, didn't think this through. You're right, still the same problem.. On 16 Apr 2014 17:40, Alexandre Rafalovitch arafa...@gmail.com wrote: Why? I want stored=false, at which point multivalued field is just offset values in the dictionary. Still have to reconstruct from offsets. Or am I

Re: Boost Search results

2014-04-18 Thread Markus Jelsma
Hi, replicating full-featured search engine behaviour is not going to work with Nutch and Solr out of the box. You are missing a thousand features such as proper main content extraction, deduplication, classification of content and hub or link pages, and much more. These things are possible to

Re: Boost Search results

2014-04-18 Thread A Laxmi
Hi Markus, Yes, you are right. I passed the qf from my front-end framework (PHP which uses SolrClient). This is how I got it set-up: $this->solr->set_param('defType','edismax'); $this->solr->set_param('qf','title^10 content^5 url^5'); where you can see qf = title^10 content^5 url^5
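To make the boost setup above easier to experiment with, the qf string can be assembled from a field-to-boost mapping and URL-encoded with the other request parameters. This is a sketch under assumptions: the helper names and the query term "solr" are hypothetical; only defType=edismax and the title, content, and url boosts come from the message.

```python
from urllib.parse import parse_qs, urlencode


def build_qf(boosts):
    """Render {'title': 10, ...} as the qf string 'title^10 ...'."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


def edismax_query_string(q, boosts):
    """URL-encode an edismax request with per-field boosts in qf."""
    return urlencode({"defType": "edismax", "q": q, "qf": build_qf(boosts)})


# Boost values from the message; the query term is made up.
qs = edismax_query_string("solr", {"title": 10, "content": 5, "url": 5})
print(qs)

# Decoding the query string recovers the original qf value.
print(parse_qs(qs)["qf"][0])
```

Keeping the boosts in a mapping makes it simple to try different weightings, for example lowering the url boost as suggested later in the thread.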

Re: Boost Search results

2014-04-18 Thread A Laxmi
Markus, like I mentioned in my last email, I have got the qf with title, content and url. That doesn't help a whole lot. Could you please advise if there are any other parameters that I should consider for solr request handler config or the numbers I have got for title, content, url in qf

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Erick Erickson
Luke actually does this, or attempts to. The doc you assemble is lossy though: it doesn't have stop words; all capitalization is lost; original terms for synonyms are lost; all punctuation is lost. I don't think you can do this unless you store term information. It's slow. Original words that are

Re: space between search terms

2014-04-18 Thread Ahmet Arslan
Hi Jack, I am planning to extract and publish such words for the Turkish language. But I am not sure how to utilize them. I wonder if there is a more flexible solution that will work at query time only. That would not require reindexing every time a new item is added. Ahmet On Friday, April 18,

Re: space between search terms

2014-04-18 Thread Erick Erickson
Ahmet: Yeah, the index vs. query time bit is a pain. Often what people will do is take their best shot at index time, then accumulate omissions and use that list for query time. Then whenever they can/need to re-index, merge the query-time list into the index-time list and start over. Not an

Re: space between search terms

2014-04-18 Thread Jack Krupansky
The LucidWorks Search query parser does indeed support multi-word synonyms at query time. I vaguely recall some Jira traffic on supporting multi-word synonyms at query time for some special cases, but a review of CHANGES.txt does not find any such changes that made it into a release, yet.

need help from hard core solr experts - out of memory error

2014-04-18 Thread Candygram For Mongo
I have lots of log files and other files to support this issue (sometimes referenced in the text below) but I am not sure the best way to submit. I don't want to overwhelm and I am not sure if this email will accept graphs and charts. Please provide direction and I will send them. *Issue

Re: need help from hard core solr experts - out of memory error

2014-04-18 Thread Walter Underwood
I see heap size commands for 128 Meg and 512 Meg. That will certainly run out of memory. Why do you think you have 6G of heap with these settings? –Xmx128m –Xms128m –Xmx512m –Xms512m wunder On Apr 18, 2014, at 5:15 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: I have lots of

Re: need help from hard core solr experts - out of memory error

2014-04-18 Thread Candygram For Mongo
We consistently reproduce this problem on multiple systems configured with 6GB and 12GB of heap space. To quickly reproduce many cases for troubleshooting we reduced the heap space to 64, 128 and 512MB. With 6 or 12GB configured it takes hours to see the error. On Fri, Apr 18, 2014 at 5:54 PM,

is there any way to post images and attachments to this mailing list?

2014-04-18 Thread Candygram For Mongo

Re: is there any way to post images and attachments to this mailing list?

2014-04-18 Thread A Laxmi
Just upload them in Google Drive and share the link with this group. On Fri, Apr 18, 2014 at 9:15 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote:

Re: need help from hard core solr experts - out of memory error

2014-04-18 Thread Candygram For Mongo
I have uploaded several files including the problem description with graphics to this link on Google drive: https://drive.google.com/folderview?id=0B7UpFqsS5lSjWEhxRE1NN2tMNTQ&usp=sharing I shared it with this address solr-user@lucene.apache.org so I am hoping it can be accessed by people in the

Re: Boost Search results

2014-04-18 Thread Aman Tandon
I guess you can apply some deboost for URL. Lakshmi, it would be easier to make suggestions if you also provided an example of what you want to achieve. On Saturday, April 19, 2014, A Laxmi a.lakshmi...@gmail.com wrote: Markus, like I mentioned in my last email, I have got the qf with

Re: Indexing Big Data With or Without Solr

2014-04-18 Thread Aman Tandon
Vineet, please share after you set up SolrCloud. Are you using Jetty or Tomcat? On Saturday, April 19, 2014, Vineet Mishra clearmido...@gmail.com wrote: Thanks Furkan, I will definitely give it a try then. Thanks again! On Tue, Apr 15, 2014 at 7:53 PM, Furkan KAMACI

Re: need help from hard core solr experts - out of memory error

2014-04-18 Thread Shawn Heisey
On 4/18/2014 6:15 PM, Candygram For Mongo wrote: We are getting Out Of Memory errors when we try to execute a full import using the Data Import Handler. This error originally occurred on a production environment with a database containing 27 million records. Heap memory was configured for