Re: First query to find meta data, second to search. How to group into one?

2012-05-16 Thread Mikhail Khludnev
Hello, have you checked the MoreLikeThis feature? On Tue, May 15, 2012 at 11:26 PM, Samarendra Pratap samarz...@gmail.com wrote: - We are calculating the frequency of category IDs in these top results. We are not using facets because they give counts for all results, relevant or irrelevant.

curl or nutch

2012-05-16 Thread Tolga
Hi, I have been trying for a week and really want to get started. What should I use, curl or Nutch? I want to be able to index PDF, XML, etc. and search within them as well. Regards,

Boosting score by Geo distance

2012-05-16 Thread roySolr
Hello, I want to boost the score of the matching documents by geo distance. I use this: bf=recip(geodist(),2,1000,30). It works, but I don't know what the parameters (2,1000,30) mean. Thanks, Roy

Re: Problem with AND clause in multi core search query

2012-05-16 Thread ravicv
Hi Eric, For this scenario I wrote a custom request handler that gets the individual results from each core, and then I apply the AND clause to those results. Please let me know whether this approach will cause any other issues later, or can you suggest another approach?
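A handler like the one described essentially intersects the result sets of the cores. The Python sketch below is a hypothetical model of that step, not a Solr API: the core names and the `ids_by_core` input are illustrations, and it assumes each core returns every matching ID (not just the first page) under a shared unique key. Scores from different cores are not directly comparable, so it simply keeps the first core's order.

```python
from typing import Dict, Iterable, List

def and_across_cores(ids_by_core: Dict[str, Iterable[str]]) -> List[str]:
    """Return the IDs matched by every core (a client-side AND).

    Hypothetical helper: assumes each core returned *all* matching IDs,
    not just one page of results, and that the cores share a unique key.
    """
    id_lists = [list(ids) for ids in ids_by_core.values()]
    if not id_lists:
        return []
    common = set(id_lists[0])
    for ids in id_lists[1:]:
        common &= set(ids)  # keep only IDs this core also matched
    # Keep the first core's ranking order, dropping duplicates.
    return [doc_id for doc_id in dict.fromkeys(id_lists[0]) if doc_id in common]

print(and_across_cores({"core_a": ["1", "2", "3"], "core_b": ["3", "1", "9"]}))
```

The pagination assumption is the main weak point of this design: if each core only returns its top rows, documents that match everywhere but rank low in one core are silently lost.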

Re: First query to find meta data, second to search. How to group into one?

2012-05-16 Thread Samarendra Pratap
Thanks, Sujit and Mikhail, for your suggestions. Sujit - continuing to do it on the client side adds an extra round trip between the server and the client. Moreover, the logic is no longer centralized, so I may have to repeat the client-side logic in multiple places, depending on how it is implemented. Mikhail -

commit question

2012-05-16 Thread marco crivellaro
Hi all, this might be a silly question, but I've found different opinions on the subject. When a search is run after a commit is performed, will the results include all documents committed up to the last commit? Use case (sync): 1- add document 2- commit 3- search (faceted). Will the faceted search on

Re: curl or nutch

2012-05-16 Thread findbestopensource
You could just use Solr; it supports indexing PDF and XML files. If you want to index websites and search using page rank, then choose Nutch. Regards, Aditya www.findbestopensource.com

Re: First query to find meta data, second to search. How to group into one?

2012-05-16 Thread Mikhail Khludnev
Your approach sounds like a well-known, old-school one, pseudo-relevance feedback: http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html I believe you can hack MLT (MoreLikeThis) to do what you need. I'm working on something similar, and there are a number of approaches. One of the simplest is to build a custom
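To make the two-pass idea concrete, here is a minimal Python sketch of the pseudo-relevance-feedback flow discussed in this thread: count category IDs among the top N hits of a first query, then turn the most frequent categories into boost clauses for a second query. The `category_id` field name, the cut-offs and the use of the clauses as an edismax `bq` value are assumptions for illustration; this is client-side and is not the custom server-side component Mikhail mentions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def top_categories(hits: Sequence[Dict[str, str]], top_n: int = 50,
                   k: int = 3) -> List[Tuple[str, int]]:
    """Count category IDs among the first top_n hits; return the k most common."""
    counts = Counter(h["category_id"] for h in hits[:top_n] if "category_id" in h)
    return counts.most_common(k)

def boost_query(categories: List[Tuple[str, int]], field: str = "category_id") -> str:
    """Build boost clauses (e.g. for an edismax bq), weighted by frequency."""
    return " ".join(f"{field}:{cat}^{count}" for cat, count in categories)

# Hypothetical first-pass results (only the category field is shown).
first_pass = [{"category_id": c} for c in ("7", "3", "7", "9", "7", "3")]
print(boost_query(top_categories(first_pass, k=2)))
```

Unlike facet counts, which cover every matching document, this only counts the top N hits, which is the "relevant results only" behaviour Samarendra describes.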

Re: Boosting score by Geo distance

2012-05-16 Thread Mikhail Khludnev
See http://wiki.apache.org/solr/FunctionQuery#recip. You are welcome.
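The recip function on that wiki page computes recip(x,m,a,b) = a/(m*x+b), so bf=recip(geodist(),2,1000,30) adds a boost of 1000/(2*d+30), where d is the value returned by geodist(). The small Python sketch below only evaluates that formula, to show how the boost decays with distance; it does not call Solr.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x,m,a,b) function: a / (m*x + b)."""
    return a / (m * x + b)

def geo_boost(distance: float) -> float:
    """Boost contributed by bf=recip(geodist(),2,1000,30) at a given distance."""
    return recip(distance, m=2, a=1000, b=30)

for d in (0, 15, 50, 100, 500):
    print(f"distance={d:>4}: boost={geo_boost(d):.2f}")
```

Reading the parameters this way: a/b (1000/30, about 33.33) is the maximum boost at distance 0, the boost halves when m*x equals b (here at distance 15), and a larger m makes it fall off faster.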

Adding config to SolrCloud without creating any shards/slices

2012-05-16 Thread Per Steffensen
Hi, We want to create a Solr config in ZK during installation of our product, but we don't want to create any shards in that phase. Our application (which uses SolrCloud) will create the shards when it starts up and will also automatically maintain the set of shards. The

Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
Hi, guys! I need some advice. When I send the same dismax query to Solr 1.4 and 3.6, the results for search words analyzed by WordDelimiterFilterFactory differ, as shown below: [Search Word] test.pdf [Result] Solr 1.4: search results are analyzed as test AND pdf. Solr 3.6: search results are

SolrJ 4, soft commit

2012-05-16 Thread marco crivellaro
Hi all, I am evaluating Solr 4.0 for its NRT (near-real-time) capabilities. How can you perform a soft commit with SolrJ 4.0? The HttpSolrServer.commit method doesn't have a softCommit option, which appears to be an option available for the commit command:

indexing Dublin core xml files

2012-05-16 Thread ggggGuys
Hello, I'd like to index XML files in the Dublin Core format in Solr. Which files should I modify, and how? Thank you :)
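Solr's XML update handler expects its own <add><doc><field name="..."> format rather than arbitrary XML, so one option is to transform each Dublin Core record into that format before posting it (the DataImportHandler's XPathEntityProcessor is another route). Below is a minimal Python sketch using only the standard library; the Solr field names in FIELD_MAP (id, title, creator, subject) are hypothetical and would need matching fields in schema.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from Dublin Core elements to Solr field names.
FIELD_MAP: Dict[str, str] = {
    "identifier": "id",
    "title": "title",
    "creator": "creator",
    "subject": "subject",
}

def dc_to_solr_add(dc_xml: str) -> str:
    """Convert one Dublin Core record into a Solr <add><doc> update message."""
    root = ET.fromstring(dc_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for dc_name, solr_name in FIELD_MAP.items():
        # A DC element may repeat (e.g. several subjects): emit one field each.
        for elem in root.iter(f"{{{DC_NS}}}{dc_name}"):
            if elem.text and elem.text.strip():
                field = ET.SubElement(doc, "field", name=solr_name)
                field.text = elem.text.strip()
    return ET.tostring(add, encoding="unicode")

sample = ('<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
          '<dc:identifier>doc-1</dc:identifier><dc:title>On Search</dc:title>'
          '<dc:subject>solr</dc:subject><dc:subject>lucene</dc:subject></record>')
print(dc_to_solr_add(sample))
```

The resulting string can then be posted to the /update handler like the files in example/exampledocs; repeated elements such as dc:subject need multiValued fields in the schema.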

Solr Single Core vs Multiple Cores installation for localization

2012-05-16 Thread Ivan Hrytsyuk
Hello, We are going to add multi-language support to our Solr-based project. We are considering the following Solr installation types: 1. Single core: all fields for all languages reside in a single core, i.e. title_en, description_en, title_de, description_de, title_fr, description_fr. 2.
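With the single-core layout (option 1), the client typically picks the per-language fields at query time. A minimal, hypothetical Python sketch: it builds edismax qf and fl values from the title_xx/description_xx naming scheme above. The id field, the boost weights and the fall-back-to-English rule are assumptions for illustration.

```python
from typing import Dict

SUPPORTED = ("en", "de", "fr")                      # languages in the example fields
BASE_FIELDS = {"title": 2.0, "description": 1.0}    # hypothetical boosts

def language_params(lang: str, default: str = "en") -> Dict[str, str]:
    """Return edismax request parameters targeting one language's fields."""
    code = lang.lower() if lang.lower() in SUPPORTED else default
    qf = " ".join(f"{name}_{code}^{boost:g}" for name, boost in BASE_FIELDS.items())
    fl = ",".join(["id"] + [f"{name}_{code}" for name in BASE_FIELDS])
    return {"defType": "edismax", "qf": qf, "fl": fl}

print(language_params("DE"))
```

Adding a language in this layout means adding fields and one entry in SUPPORTED, whereas the multi-core layout would instead route the request to a different core.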

PermGen OOM Error

2012-05-16 Thread richard.pog...@holidaylettings.co.uk
When running Solr, we are experiencing PermGen OOM exceptions; the problem gets worse the more documents are added and committed. Stopping the Java process does not seem to free the memory. Has anyone experienced issues like this? Kind regards, Richard

Re: curl or nutch

2012-05-16 Thread Tirthankar Chatterjee
If you use curl, you will need to track every document yourself, recurse into folders, etc. If you use Nutch, it takes care of incremental crawling of the configured locations and submits the documents that changed since its previous run. The lack of a simple file system crawler around Solr is a big

Re: PermGen OOM Error

2012-05-16 Thread SH
So you have to increase the memory available to the JVM. What servlet container are you using? SH

Re: curl or nutch

2012-05-16 Thread Tolga
Can Nutch crawl/index files as well?

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Tanguy Moal
Does anyone have an idea? I think this is important, since it could produce weird results on collections with numbers mixed into the text. From my understanding, there are a few options to address the issue: 1) make *LightStemmer token-type aware and don't try to stem things that are not text

Re: Adding config to SolrCloud without creating any shards/slices

2012-05-16 Thread Mark Miller
k On May 16, 2012, at 5:35 AM, Per Steffensen wrote: Hi We want to create a Solr config in ZK during installation of our product, but we dont want to create any shards in that phase. We will create shards from our application when it starts up and also automatically maintain the set of

Re: SolrJ 4, soft commit

2012-05-16 Thread Mark Miller
On May 16, 2012, at 6:07 AM, marco crivellaro wrote: Hi all, I am evaluating Solr 4.0 for its NRT capabilities. How can you perform a soft commit with SolrJ 4.0? The HttpSolrServer.commit method doesn't have a softCommit option, which appears to be an option available for the commit command:

Re: commit question

2012-05-16 Thread Mark Miller
On May 16, 2012, at 5:23 AM, marco crivellaro wrote: Hi all, this might be a silly question, but I've found different opinions on the subject. When a search is run after a commit is performed, will the results include all documents committed up to the last commit? Use case (sync): 1- add

Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
The slave index does indeed grow over time, regardless of restarts. We do run on 1.4, however. We will be upgrading to 3.6 very soon, so I will see how that works out. Actually, we should be able to see this on our staging platform. Thanks, everyone. Kind regards, Jasper

Facing Problem while testing solr 3.6 with Tomcat 6

2012-05-16 Thread Amit Handa
Hi all, Kindly guide me in resolving the following issue, which occurs while testing Apache Solr 3.6 with Tomcat 6 when trying to access http://localhost:8080/solr-example/: HTTP Status 500 - type: Exception report, message: (empty), description: The server

Re: SolrJ 4, soft commit

2012-05-16 Thread Ahmet Arslan
You can still access the raw params for the update request, though, and then just look at http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22. Just get the modifiable params from the request and set softCommit. Does this code work? SolrServer server
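At the HTTP level (not SolrJ), setting the soft commit amounts to adding softCommit=true to the update request parameters, or posting the commit XML message with a softCommit attribute as described on the UpdateXmlMessages page. The Python sketch below assumes a Solr 4.0 server at the example URL http://localhost:8983/solr; it only builds the requests and deliberately leaves out waitFlush, which Solr 4.0 rejects.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # assumed default example URL

def soft_commit_url(base: str = SOLR_URL) -> str:
    """Update URL asking Solr 4.x for a soft commit via request parameters."""
    params = {"commit": "true", "softCommit": "true"}
    return f"{base}/update?{urlencode(params)}"

def soft_commit_request(base: str = SOLR_URL) -> Request:
    """Equivalent XML form: <commit softCommit="true"/> posted to /update."""
    body = b'<commit softCommit="true"/>'
    return Request(f"{base}/update", data=body,
                   headers={"Content-Type": "text/xml"}, method="POST")

print(soft_commit_url())
# urllib.request.urlopen(soft_commit_request())  # send to a running Solr
```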

Re: SolrJ 4, soft commit

2012-05-16 Thread crive
I will have a go at it in a bit; in the meantime, I've worked around it by setting autoSoftCommit maxDocs to 1.

Language analyzers

2012-05-16 Thread anarchos78
Hello, Is it possible to use two language analyzers for one fieldType, let's say Greek and English (for both indexing and querying)? Thanks

Re: Language analyzers

2012-05-16 Thread Sven Maurmann
Hi! Could you explain this in a little more detail? Thanks, Sven

Re: Language analyzers

2012-05-16 Thread Robert Muir
For Greek and English, it's easy: they use totally different characters, so none of

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Robert Muir
I agree. I think we should just add 'Character.isLetter(ch)' to the undoublet check?
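To illustrate the suggested fix, here is a simplified Python model of the "undoubling" step described in this thread: in tokens longer than four characters, a character equal to the preceding one is dropped. It is not the Lucene FrenchLightStemmer source, only a sketch of the idea; the letters_only flag stands in for adding a Character.isLetter(ch) check (Python's str.isalpha() is used as the analogue), so runs of digits survive.

```python
def undouble(token: str, letters_only: bool = True) -> str:
    """Collapse repeated characters in tokens longer than 4 characters.

    Simplified model of the light stemmer's normalization; with
    letters_only=True, repeated digits (e.g. in "1000000") are kept.
    """
    if len(token) <= 4:
        return token
    out = [token[0]]
    for ch in token[1:]:
        if ch == out[-1] and (ch.isalpha() or not letters_only):
            continue  # drop the duplicated character
        out.append(ch)
    return "".join(out)

print(undouble("1000000", letters_only=False), undouble("1000000"))
```

Without the letter check, "1000000" and "10" would normalize to the same token, which is exactly the kind of odd match on numbers that prompted the bug report.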

RE: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Steven A Rowe
Hi Tanguy, I looked at the code, and I can see where the problem you describe is happening. I think it's a bug: if numbers are search terms, stemming them by compressing repeated digits makes little sense. Could you file a bug in JIRA? Please include the examples you gave in your earlier

Re: slave index not cleaned

2012-05-16 Thread Jasper Floor
By the way, I confirmed that this doesn't happen on our development stage with 3.6.

Must match and terms with only one letter

2012-05-16 Thread roySolr
Hello, I use the mm (minimum should match) parameter on my edismax request handler (70%). This works great, but I have one problem: when I search for "A Cole", only one term has to match (mm = 70%). The problem is the "A": it returns 9200 documents with an "a" in them. Is there a possibility to skip terms with only one
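The reason "A Cole" needs only one matching term is how a percentage mm becomes a clause count: Solr multiplies the number of optional clauses by the percentage and truncates the result. A small Python sketch of that arithmetic, covering positive percentages only (negative values and conditional expressions such as "2<-25%" are not modelled):

```python
def min_should_match(optional_clauses: int, percent: int) -> int:
    """Clauses required for a positive percentage mm such as mm=70%.

    Solr truncates optional_clauses * percent / 100, so fractions round down.
    """
    required = optional_clauses * percent // 100
    return max(0, min(required, optional_clauses))

for terms in (2, 3, 4):
    print(f"{terms} terms at mm=70% -> {min_should_match(terms, 70)} must match")
```

With two terms, 2 * 70% = 1.4 rounds down to 1, so a document containing only "a" satisfies the query.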

Re: FrenchLightStemFilterFactory : normalizing tokens longer than 4 characters and having repeated characters in it

2012-05-16 Thread Tanguy Moal
Thank you! JIRA issue filed: https://issues.apache.org/jira/browse/SOLR-3463 -- Tanguy

Sort by length percentage match

2012-05-16 Thread Alejandro Cuesta
Hi, I have a field containing cities, and I'd like to sort the results by how much of the field's length the query matches. Example: assuming I've got these cities in the index: london, south west london, londonderry, oxford. If I search for london, I'd like to get a list sorted like this: london

RE: Sort by length percentage match

2012-05-16 Thread Steven A Rowe
Hi Alejandro, N-grams (http://en.wikipedia.org/wiki/N-gram) might be a good fit. Using bigrams (n-grams of length 2) for london, you'd get the tokens lo, on, nd, do, on. This should provide the hit ordering you want. Although it's not listed on Solr's analysis factories wiki page
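To see why character bigrams favour the shorter, closer match, here is an offline Python sketch that scores each city from Alejandro's example by the Dice coefficient of its bigram multiset against the query's bigrams. Solr's own relevance scoring (tf-idf with length normalization) is different; this only illustrates the intuition behind the n-gram suggestion.

```python
from collections import Counter
from typing import List

def bigrams(text: str) -> Counter:
    """Multiset of overlapping 2-character grams: london -> lo, on, nd, do, on."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def dice(query: str, candidate: str) -> float:
    """Dice coefficient: 2 * shared bigrams / (query bigrams + candidate bigrams)."""
    q, c = bigrams(query), bigrams(candidate)
    shared = sum((q & c).values())
    total = sum(q.values()) + sum(c.values())
    return 2 * shared / total if total else 0.0

def rank(query: str, candidates: List[str]) -> List[str]:
    """Sort candidates by descending bigram overlap with the query."""
    return sorted(candidates, key=lambda c: dice(query, c), reverse=True)

cities = ["london", "south west london", "londonderry", "oxford"]
print(rank("london", cities))
```

All three London entries share the query's five bigrams, so the ranking is driven by how many extra bigrams each candidate carries, i.e. by its length.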

Solr query and double quotes

2012-05-16 Thread anarchos78
Hello friends, When I pass queries to Solr, I pass them as strings (“blah blah”). I am doing this because I have encoding problems with Greek (my input field accepts Greek characters only as a string). But Solr treats the characters inside the quotes as an “exact match” term. Is there a way to

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread rjain15
For http://localhost:8983/solr/#/~cloud I get a 404 error: "Loading of undefined failed with HTTP-Status 404". I am using the nightly build apache-solr-4.0-2012-05-15_08-20-37. Thanks, Rajesh

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi, I have tried with the latest nightly build, apache-solr-4.0-2012-05-15_08-20-37. I am trying on a 64-bit Windows OS; I believe you have tested this on a Linux box (based on the shell script). I'm not sure what I am missing, but it doesn't seem to work: I have changed the URL to just call the

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread Stefan Matheis
And you're running SolrCloud and not just 'java -jar start.jar', right, Rajesh?

Solr request tracking

2012-05-16 Thread Rahul Warawdekar
Hi, Is there any mechanism by which we can track and trend incoming Solr search requests? For example, logging all incoming Solr requests to a log file separate from Tomcat's, and having a tool to trend the patterns? -- Thanks and Regards, Rahul A. Warawdekar
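One low-tech way to trend requests is to post-process Solr's request log. The Python sketch below assumes log lines shaped like the per-request lines Solr 3.x/4.x typically writes (…webapp=/solr path=/select params={q=...&...} hits=... status=0 QTime=...); that format and the sample lines are assumptions to check against your own logs. It reports the most frequent q values for /select and the mean QTime.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs

# Assumed shape of a Solr request log line (verify against your own logs).
REQUEST_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>[^}]*)\}.*?QTime=(?P<qtime>\d+)")

def trend(lines: Iterable[str], top: int = 3) -> Tuple[List[Tuple[str, int]], float]:
    """Return the most frequent /select queries and the mean QTime in ms."""
    queries: Counter = Counter()
    qtimes: List[int] = []
    for line in lines:
        m = REQUEST_RE.search(line)
        if not m or m.group("path") != "/select":
            continue  # skip non-request lines and non-search handlers
        params = parse_qs(m.group("params"))
        queries[params.get("q", [""])[0]] += 1
        qtimes.append(int(m.group("qtime")))
    mean = sum(qtimes) / len(qtimes) if qtimes else 0.0
    return queries.most_common(top), mean

sample_log = [
    "INFO: [] webapp=/solr path=/select params={q=ipod&wt=json} hits=3 status=0 QTime=4",
    "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=3 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=hard+drive} hits=1 status=0 QTime=6",
    "INFO: [] webapp=/solr path=/update params={} status=0 QTime=20",
]
print(trend(sample_log))
```

Routing these lines to a separate file is a logging-configuration matter for the servlet container; the parser does not depend on where the lines are stored.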

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread rjain15
I'm running java -jar start.jar -OPTIONS=jsp. What is SolrCloud? Sorry, I'm new to Solr. Thanks, Rajesh

Re: - Solr 4.0 - How do I enable JSP support ? ...

2012-05-16 Thread Stefan Matheis
That will just enable support for rendering JSPs, nothing more. For SolrCloud, you may want to read the wiki: http://wiki.apache.org/solr/SolrCloud

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 1:43 PM, rjain15 rjai...@gmail.com wrote: http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true Try switching title:monsters to name:monsters (https://issues.apache.org/jira/browse/SOLR-2598). It looks like the data was changed to use the name field instead, and

CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
Hi, I am just playing around with SolrCloud and have read in articles such as http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/ that it is sufficient to connect to the ZooKeeper instance rather than to the Solr instance. When I try to

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi, No, changing to name:monsters didn't work. Here is my guess: UpdateJSON is not adding any new documents to the existing index. The document count remains the same after I call UpdateJSON. I am new to Solr; my guess is that if there is some underlying schema that dictates what can

Re: CloudSolrServer not working with standalone Zookeeper

2012-05-16 Thread Daniel Brügge
OK, it's also not working with an internally started ZooKeeper.

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 2:36 PM, rjain15 rjai...@gmail.com wrote: No. Changing to name:monsters didn't work OK, but you'll have to do that if you get the other part working. Here is my guess, the UpdateJSON is not adding any new documents to the existing index. If that's true, the most

Re: Update JSON not working for me

2012-05-16 Thread Michael Della Bitta
Look out, the first closing quote is in the wrong spot. Michael

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Hi, first, apologies for the long post. I changed the quote to a double quote (copying from DOS windows is sometimes messy). Here is the command and the output in the Jetty server window. I am highlighting some important pieces; I have set the log level to DEBUG in the Jetty window.

Re: Update JSON not working for me

2012-05-16 Thread Yonik Seeley
On Wed, May 16, 2012 at 4:10 PM, rjain15 rjai...@gmail.com wrote: Hi Firstly, apologies for the long post, I changed the quote to double quote (and sometimes it is messy copying from DOS windows) Here is the command and the output on the Jetty Server Window. I am highlighting some important

Re: Update JSON not working for me

2012-05-16 Thread rjain15
Yonik, you are the best!!! Yes, as soon as I changed the header to Content-type:application/json, it worked. Now I can see all my updates to the book category. I am ready to roll; thanks for the patience and help. Regards, Rajesh
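Since the fix was the Content-type header, here is a minimal Python sketch of posting JSON documents with that header set. The URL http://localhost:8983/solr/update/json?commit=true follows the UpdateJSON wiki example and, like the sample document's field values, is an assumption; the request is only built here and would be sent with urllib.request.urlopen against a running Solr.

```python
import json
from typing import Dict, List
from urllib.request import Request

UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"  # assumed

def json_update_request(docs: List[Dict], url: str = UPDATE_URL) -> Request:
    """Build a POST of a JSON array of documents with the JSON content type."""
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

req = json_update_request([{"id": "book-1", "name": "A Book", "cat": "book"}])
# urllib.request.urlopen(req)  # uncomment to send to a running Solr
```

The same header applies to the XHR/jQuery case: the request must declare application/json for the body to be parsed as JSON.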

Solr 4.0 commit parameter 'waitFlush'

2012-05-16 Thread rjain15
I am using the commit parameter waitFlush, and it seems to throw an exception in 4.0. I am not sure what the purpose of this parameter is, or whether it is required: SEVERE: org.apache.solr.common.SolrException: Unknown commit parameter 'waitFlush' at

Re: Solr 4.0 commit parameter 'waitFlush'

2012-05-16 Thread Jack Krupansky
As the doc says: "In Solr 4.0 it will be removed." See http://wiki.apache.org/solr/UpdateXmlMessages. But the UpdateJSON doc certainly needs to be updated as well. -- Jack Krupansky

Re: Solr query and double quotes

2012-05-16 Thread Jack Krupansky
Change "blah blah" to "blah" "blah": two separate strings, two separate query terms. -- Jack Krupansky
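The suggestion above can be applied mechanically on the client: quote each whitespace-separated word separately, so the input still travels as quoted strings but Solr sees separate terms instead of one phrase. A minimal Python sketch; the helper name is hypothetical, and it escapes embedded backslashes and double quotes so they cannot break the query syntax.

```python
def quote_terms(user_input: str) -> str:
    """Turn 'blah blah' into '"blah" "blah"': one quoted string per term."""
    def quote(term: str) -> str:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return " ".join(quote(t) for t in user_input.split())

print(quote_terms("καλημέρα κόσμε"))
```

Whether the separate terms are then ANDed or ORed depends on the query parser's default operator (q.op) or, with dismax/edismax, on mm.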

Re: PermGen OOM Error

2012-05-16 Thread Jack Krupansky
PermGen memory has to do with the number of classes loaded, not the number of documents. Here are a couple of pages that help explain Java PermGen issues. The bottom line is that you can increase the PermGen space, enable class unloading, or at least trace class loading to see why the problem

Re: Solr Single Core vs Multiple Cores installation for localization

2012-05-16 Thread Jack Krupansky
First, you have to answer the twin questions of what you want the user experience to be and what expectations users may have independent of your intentions. Do you intend to have a separate, language-specific search UI? That would match up with separate cores, but can be done with a language

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Jack Krupansky
The query may be the same, but your analyzers are radically different. Just a hunch, but maybe GosenTokenizerFactory is treating the . as a space. In 1.4 you were using SenTokenizerFactory. Or maybe GosenBasicFormFilterFactory is treating the . as a space. In any case, my hunch is that

Re: Must match and terms with only one letter

2012-05-16 Thread Jack Krupansky
Add "a" (and maybe other single letters) to the stopwords file. Then it won't show up in the query at all. And with edismax, enable pf2 and maybe pf3 so that instances of "a cole" get boosted. -- Jack Krupansky

Re: Must match and terms with only one letter

2012-05-16 Thread Walter Underwood
Except that then you can never match "a", so that is a bad idea. So much for the query "vitamin a". wunder

Re: Must match and terms with only one letter

2012-05-16 Thread Jack Krupansky
Ah, sorry. I meant to add that you should have a stop filter in the query analyzer, but not in the index analyzer. -- Jack Krupansky

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
OK, I understand how those words are tokenized by the different tokenizer factories. My question is how I can have Solr analyze and search for test AND pdf. Since Solr 1.4 gives results for test AND pdf, I want Solr 3.6 to do the same (Solr 3.6 gives results for test OR pdf). Any idea?

Posting JSON Data to Solr using XHR?

2012-05-16 Thread rjain15
Hi, I am trying to post JSON data to Solr using XHR/jQuery, and it doesn't seem to work. I don't get any exception on the Jetty console. Has anyone tried this before, and are there any obvious gotchas in my code? Here is my code snippet: $(document).ready(function(){ var

Re: Update JSON not working for me

2012-05-16 Thread Lance Norskog
This is my JSON variant of solr/example/exampledocs/post.sh. It takes a URL as the first parameter.
#!/bin/sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Shinichiro Abe
If you want to treat test.pdf as the phrase "test pdf", it might work to set autoGeneratePhraseQueries="true" on text_sen. Regards, Shinichiro Abe
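A toy Python model of why autoGeneratePhraseQueries matters here: when the query analyzer splits one whitespace-delimited word such as test.pdf into several tokens, the query parser either combines them as separate optional terms (the OR behaviour seen in 3.6) or, with autoGeneratePhraseQueries="true" on the field type, builds a phrase. The ASCII-only splitting rule and the field name text are simplifications, not the real WordDelimiterFilter or Gosen tokenizer.

```python
import re
from typing import List

def split_word(word: str) -> List[str]:
    """Simplified word-delimiter split on non-alphanumeric ASCII characters."""
    return [part for part in re.split(r"[^0-9A-Za-z]+", word) if part]

def parse_word(word: str, field: str = "text", auto_phrase: bool = False) -> str:
    """Mimic how one query word becomes OR'd terms or a phrase query."""
    parts = split_word(word)
    if not parts:
        return ""
    if len(parts) == 1:
        return f"{field}:{parts[0]}"
    if auto_phrase:
        return f'{field}:"{" ".join(parts)}"'
    return "(" + " OR ".join(f"{field}:{p}" for p in parts) + ")"

print(parse_word("test.pdf"), parse_word("test.pdf", auto_phrase=True))
```

A phrase is stricter than test AND pdf (the tokens must also be adjacent), which suits file names like test.pdf.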

Re: curl or nutch

2012-05-16 Thread Otis Gospodnetic
It can, as can ManifoldCF. But you should ask on the nutch-user list (this may also be documented on the wiki). Otis -- Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

Re: Querying Solr

2012-05-16 Thread Lance Norskog
Yes, 'text_gr' in solr/example/conf/schema.xml is (I think) the Greek text type. It is commented out:
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
This has someone's idea of how Greek text analysis should

highlighter not respecting sentence boundary

2012-05-16 Thread abhayd
Hi, I am using the highlighter component with hl.frgmenter=regex&hl.regex.pattern=[-\w ,/\n]\']{20,200}, which is basically the configuration that comes with the regex fragmenter in the highlighting component of the solrconfig.xml file. My snippets don't begin at the start of a sentence. I also tried the boundary scanner

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Jack Krupansky
I just noticed that you used dismax in 1.4 vs. edismax in 3.6. There may be other differences that I have not yet noticed. Also, you should have separate index and query analyzers so that catenateWords=0 catenateNumbers=0 for the query analyzer. It could be that the catenateWords=1

Re: Dismax query results vary on Solr1.4 and 3.6.

2012-05-16 Thread Katsuyoshi NOGUCHI
I get the same result now! Thanks!