Hello,
Have you checked the MoreLikeThis feature?
On Tue, May 15, 2012 at 11:26 PM, Samarendra Pratap samarz...@gmail.com wrote:
- We are calculating frequency of category ids in these top results. We
are not using facets because that gives count for all, relevant or
irrelevant, results.
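A rough sketch of the counting step described in the quoted message: tally category ids over only the top results instead of faceting over every match. The `category_id` field name, the `top_n` cutoff and the sample documents are assumptions made up for illustration, not details of the original setup.

```python
from collections import Counter


def category_frequencies(results, top_n=100, field="category_id"):
    """Count how often each category id occurs in the first top_n results."""
    counts = Counter()
    for doc in results[:top_n]:
        value = doc.get(field)
        if value is None:
            continue
        # Handle single-valued and multi-valued fields the same way.
        values = value if isinstance(value, (list, tuple)) else [value]
        counts.update(values)
    return counts


# Hypothetical results, already sorted by relevance.
sample = [
    {"id": 1, "category_id": 7},
    {"id": 2, "category_id": 7},
    {"id": 3, "category_id": 3},
    {"id": 4, "category_id": 9},
]
print(category_frequencies(sample, top_n=3).most_common())
```

Doing this on the client costs one extra pass over the returned documents, which is the trade-off discussed elsewhere in this thread.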
Hi,
I have been trying for a week. I really want to get a start, so what
should I use? curl or nutch? I want to be able to index pdf, xml etc.
and search within them as well.
Regards,
Hello,
I want to boost the score of the found documents by geo distance. I use
this:
bf=recip(geodist(),2,1000,30)
It works, but I don't know what the parameters (2,1000,30) mean.
Thanks
Roy
--
View this message in context:
Hi Eric,
So for this scenario I wrote a custom request handler that gets the individual
results from each core, and then I apply an *AND* clause to the
results.
Please let me know whether this approach will cause any other
problems later.
Or can you suggest some other approach?
Thanks Sujit, Mikhail for your suggestions
Sujit -
Continuing to do it at client side increases one extra cycle between server
and the client.
Moreover it does not remain centralized, so I may have to repeat the client
side logic in multiple places, depending upon how it is implemented.
Mikhail -
Hi all,
this might be a silly question but I've found different opinions on the
subject.
When a search is run after a commit is performed, will the result include all
document(s) committed up to the last commit?
use case (sync):
1- add document
2- commit
3- search (faceted)
will faceted search on
You could very well use Solr. It has support to index the PDF and XML
files. If you want to index websites and search using page rank then choose
Nutch.
Regards
Aditya
www.findbestopensource.com
On Wed, May 16, 2012 at 1:13 PM, Tolga to...@ozses.net wrote:
Hi,
I have been trying for a week.
Your approach sounds like the well-known old-school one:
http://nlp.stanford.edu/IR-book/html/htmledition/pseudo-relevance-feedback-1.html
I believe you can hack MLT and do what you need.
I'm working on something like this, and there are a number of approaches.
One of the simple ones is to build a custom
http://wiki.apache.org/solr/FunctionQuery#recip
you are welcome
On Wed, May 16, 2012 at 12:25 PM, roySolr royrutten1...@gmail.com wrote:
Hello,
I want to boost the score of the found documents by geo distance. I use
this:
bf=recip(geodist(),2,1000,30)
It works but I don't know what
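For readers who do not follow the link: the wiki defines recip(x,m,a,b) as a/(m*x+b), so in the question x is geodist(), m=2, a=1000 and b=30. The sketch below is a plain Python restatement of that formula, not Solr code; the sample distances are made up, and geodist() is assumed here to return kilometres.

```python
def recip(x, m, a, b):
    """Plain-Python restatement of Solr's recip function query: a / (m * x + b)."""
    return a / (m * x + b)


# Parameters from the question: m=2, a=1000, b=30.
# x stands in for the value of geodist() (assumed to be kilometres).
for distance_km in (0, 10, 100):
    print(distance_km, round(recip(distance_km, 2, 1000, 30), 2))
```

The boost is largest at distance 0 (a/b) and falls off as the distance grows; m controls how quickly it falls off.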
Hi
We want to create a Solr config in ZK during installation of our
product, but we don't want to create any shards in that phase. We will
create shards from our application when it starts up and also
automatically maintain the set of shards from our application (which
uses SolrCloud). The
Hi, guys! I need some advice.
When sending the same dismax query to Solr 1.4 and 3.6,
the query results for search words analyzed by WordDelimiterFilterFactory are
different, as shown below:
[Search Word]
test.pdf
[Result]
Solr1.4: Search results are analyzed as test AND pdf
Solr3.6: Search results are
Hi all,
I am evaluating Solr 4.0 for its NRT capabilities.
How can you perform a soft commit with solrj 4.0?
HttpSolrServer.commit method doesn't have softCommit option which appears to
be an option available for the commit command:
Hello, I'd like to index XML files in the Dublin Core format in Solr. I'd
like to know which files I should modify and how. Thank you :)
--
View this message in context:
http://lucene.472066.n3.nabble.com/indexing-Dublin-core-xml-files-tp3984060.html
Sent from the Solr - User mailing list
Hello,
We are going to add multi-language support for our Solr-based project.
We are considering the following Solr installation types:
1. Single core - all fields for all languages reside in a single core.
I.e. title_en, description_en, title_de, description_de, title_fr,
description_fr
2.
When running Solr we are experiencing PermGen OOM exceptions. The problem gets
worse and worse the more documents are added and committed.
Stopping the java process does not seem to free the memory.
Has anyone experienced issues like this?
Kind regards,
Richard
If you use curl you will need to track every document and recurse inside
folders, etc.
If you use nutch it takes care of incremental crawling in the configured
locations and submits the docs which changed from its previous run.
The lack of a simple File system crawler around Solr is a big
so have to increase the memory available to the JVM, what servlet container are
you using?
SH
On 05/16/2012 01:50 PM, richard.pog...@holidaylettings.co.uk wrote:
When running Solr we are experiencing PermGen OOM exceptions. The problem gets
worse and worse the more documents are added and
Can nutch crawl/index files as well?
On 5/16/12 12:29 PM, findbestopensource wrote:
You could very well use Solr. It has support to index the PDF and XML
files. If you want to index websites and search using page rank then choose
Nutch.
Regards
Aditya
www.findbestopensource.com
On Wed, May
Any ideas, anyone?
I think this is important since this could produce weird results on
collections with numbers mixed in text.
From my understanding, there are a few options to address the issue :
1) Make *LightStemmer token type aware and don't try to stem on things that
are not text
k
On May 16, 2012, at 5:35 AM, Per Steffensen wrote:
Hi
We want to create a Solr config in ZK during installation of our product, but
we don't want to create any shards in that phase. We will create shards from
our application when it starts up and also automatically maintain the set of
On May 16, 2012, at 6:07 AM, marco crivellaro wrote:
Hi all,
I am evaluating Solr 4.0 for its NRT capabilities.
How can you perform a soft commit with solrj 4.0?
HttpSolrServer.commit method doesn't have softCommit option which appears to
be an option available for the commit command:
On May 16, 2012, at 5:23 AM, marco crivellaro wrote:
Hi all,
this might be a silly question but I've found different opinions on the
subject.
When a search is run after a commit is performed, will the result include all
document(s) committed up to the last commit?
use case (sync):
1- add
The slave index does indeed grow over a period of time regardless of
restarts. We do run on 1.4 however. We will be updating to 3.6 very
soon however so I will see how that works out. Actually we should be
able to see this on our staging platform.
thanks everyone.
mvg,
Jasper
On Mon, May 14,
hi All,
Kindly guide me in resolving the following issue which is coming while
testing Apache Solr 3.6 with Tomcat 6 while trying to access
http://localhost:8080/solr-example/;
HTTP Status 500 -
--
*type* Exception report
*message* **
*description* *The server
You can still access the raw params for the update request
though - and then just look at
http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
Just get the modifiable params from the request and set the
soft commit.
Does this code work?
SolrServer server
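As a rough illustration of what the linked wiki page describes: in Solr 4.0 the XML commit message accepts a softCommit attribute. The Python sketch below only builds that message body and does not send it; the helper name is made up for illustration, and this is not the SolrJ call asked about in the thread.

```python
import xml.etree.ElementTree as ET


def build_commit_message(soft=True):
    """Build the XML body of a commit message, optionally flagged as a soft commit."""
    commit = ET.Element("commit")
    if soft:
        # Attribute name as listed on the UpdateXmlMessages wiki page for Solr 4.0.
        commit.set("softCommit", "true")
    return ET.tostring(commit, encoding="unicode")


print(build_commit_message())
```

The same flag can presumably be set through the request parameters, which is what the advice above about the modifiable params is getting at.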
Will have a go at it in a bit. In the meantime I've kind of worked around it
by setting autoSoftCommit maxDocs to 1.
On Wed, May 16, 2012 at 3:08 PM, Ahmet Arslan iori...@yahoo.com wrote:
You can still access the raw params for the update request
though - and then just look at
Hello,
Is it possible to use two language analyzers for one fieldtype? Let's say
Greek and English (for indexing and querying).
Thanks
--
View this message in context:
http://lucene.472066.n3.nabble.com/Language-analyzers-tp3984116.html
Sent from the Solr - User mailing list archive at
Hi!
Could you explain this in a little more detail?
Thanks,
Sven
On 16.05.2012 at 16:17, anarchos78 wrote:
Hello,
Is it possible to use two language analyzers for one fieldtype? Let's say
Greek and English (for indexing and querying).
Thanks
On Wed, May 16, 2012 at 10:17 AM, anarchos78
rigasathanasio...@hotmail.com wrote:
Hello,
Is it possible to use two language analyzers for one fieldtype? Let's say
Greek and English (for indexing and querying).
For Greek and English, it's easy: they use totally different characters
so none of
On Wed, May 16, 2012 at 8:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
Any ideas, anyone?
I think this is important since this could produce weird results on
collections with numbers mixed in text.
I agree, I think we should just add 'Character.isLetter(ch)' to the
undoublet check?
Hi Tanguy,
I looked at the code, and I can see where the problem you describe is happening.
I think it's a bug: if numbers are search terms, stemming them by compressing
repeated digits makes little sense.
Could you file a bug in JIRA? Please include the examples you gave in your
earlier
Btw, confirmed that this doesn't happen on our development stage with 3.6.
On Wed, May 16, 2012 at 3:59 PM, Jasper Floor jasper.fl...@m4n.nl wrote:
The slave index does indeed grow over a period of time regardless of
restarts. We do run on 1.4 however. We will be updating to 3.6 very
soon
Hello,
I use the mm parameter on my edismax request handler (70%). This works great,
but I have one problem:
When I search for "A Cole", only one term has to match (mm = 70%).
The problem is the "A": it returns 9200 documents with an "A" in them. Is
there a possibility to skip terms with only one
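To see why only one term has to match here: with a positive percentage, the number of required optional clauses is the percentage of the clause count rounded down. The sketch below is a plain-Python restatement of that rule for the 70% case only; the function name is made up, and it ignores the other mm syntaxes (negative values, conditional expressions).

```python
def min_should_match(num_clauses, percent):
    """Required number of matching clauses for a positive percentage mm (rounded down)."""
    return (num_clauses * percent) // 100


# "A Cole" has two terms, so 70% of 2 is 1.4, which rounds down to 1.
for num_terms in (2, 3, 10):
    print(num_terms, min_should_match(num_terms, 70))
```

With two terms the single-letter term alone is therefore enough to satisfy mm, which matches the behaviour described in the question.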
Thank you!
JIRA issue filed : https://issues.apache.org/jira/browse/SOLR-3463
--
Tanguy
2012/5/16 Steven A Rowe sar...@syr.edu
Hi Tanguy,
I looked at the code, and I can see where the problem you describe is
happening.
I think it's a bug: if numbers are search terms, stemming them by
Hi,
I have a field containing cities and I'd like to sort the results based
on length percentage match.
Example:
Assuming I've got these cities in the index:
london, south west london, londonderry, oxford
And I search for london, I'd like to get a list sorted like this:
london
Hi Alejandro,
N-grams http://en.wikipedia.org/wiki/N-gram might be a good fit.
Using bigrams (n-grams of length 2) for london, you'd get tokens lo, on,
nd, do, on. This should provide the hit ordering you want.
Although it's not listed on Solr's analysis factories wiki page
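A small sketch of the bigram idea, using the city names from the question. The overlap ratio is only an illustration of why shorter, closer matches share a larger fraction of their bigrams with the query; it is not how Solr or Lucene actually scores n-gram fields, and both function names are made up.

```python
def char_ngrams(text, n=2):
    """Return the character n-grams of text, in order (bigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def overlap_ratio(query, candidate, n=2):
    """Fraction of the candidate's distinct n-grams that also occur in the query."""
    query_grams = set(char_ngrams(query, n))
    candidate_grams = set(char_ngrams(candidate, n))
    if not candidate_grams:
        return 0.0
    return len(query_grams & candidate_grams) / len(candidate_grams)


print(char_ngrams("london"))
cities = ["london", "south west london", "londonderry", "oxford"]
for city in sorted(cities, key=lambda c: overlap_ratio("london", c), reverse=True):
    print(city, round(overlap_ratio("london", city), 2))
```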
Hello friends,
When I am passing queries in Solr I pass them as strings (“blah blah”). I am
doing this because I have encoding problems with Greek (my input field
accepts Greek characters only as a string). But Solr sees the characters inside
the quotes as an “exact match” term. Is there a way to
http://localhost:8983/solr/#/~cloud
I get the 404 error
Loading of undefined failed with HTTP-Status 404
I am using the nightly build, apache-solr-4.0-2012-05-15_08-20-37
Thanks
Rajesh
Hi
I have tried with the latest nightly build
apache-solr-4.0-2012-05-15_08-20-37
I am trying on a Windows 64 bit OS, I believe you have tested this on the
LINUX box (based on the shell script)
Not sure what I am missing, but the doesn't seem to work:
I have changed the URL to just call the
And you're running SolrCloud and not just 'java -jar start.jar', right Rajesh?
On Wednesday, May 16, 2012 at 7:39 PM, rjain15 wrote:
http://localhost:8983/solr/#/~cloud
I get the 404 error
Loading of undefined failed with HTTP-Status 404
I am using the nightly build,
Hi,
Is there any mechanism by which we can track and trend the incoming Solr
search requests ?
Some mechanisms like logging all incoming Solr requests to a different log
file than Tomcat's and have a tool to trend the patterns ?
--
Thanks and Regards
Rahul A. Warawdekar
java -jar start.jar -OPTIONS=jsp
What is SolrCloud...sorry newbie to Solr.
Thanks
Rajesh
--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-4-0-How-do-I-enable-JSP-support-tp3983763p3984195.html
Sent from the Solr - User mailing list archive at Nabble.com.
That will just enable support for rendering JSPs, nothing more. For
SolrCloud you may want to read the Wiki: http://wiki.apache.org/solr/SolrCloud
On Wednesday, May 16, 2012 at 8:07 PM, rjain15 wrote:
java -jar start.jar -OPTIONS=jsp
What is SolrCloud...sorry newbie to Solr.
On Wed, May 16, 2012 at 1:43 PM, rjain15 rjai...@gmail.com wrote:
http://localhost:8983/solr/select?q=title:monsters&wt=json&indent=true
Try switching title:monsters to name:monsters
https://issues.apache.org/jira/browse/SOLR-2598
Looks like the data was changed to use the name field instead and
Hi,
I am just playing around with SolrCloud and have read in articles like
http://www.lucidimagination.com/blog/2012/03/05/scaling-solr-indexing-with-solrcloud-hadoop-and-behemoth/
that it is sufficient to create the connection to the Zookeeper instance and not to
the Solr instance.
When I try to
Hi,
No. Changing to name:monsters didn't work
Here is my guess, the UpdateJSON is not adding any new documents to the
existing index.
The document count remains the same after I call the UpdateJSON.
I am new to Solr, my guess is that if there is some underlying schema that
dictates what can
OK, it's also not working with an internal started Zookeeper.
On Wed, May 16, 2012 at 8:29 PM, Daniel Brügge
daniel.brue...@googlemail.com wrote:
Hi,
I am just playing around with SolrCloud and have read in articles like
On Wed, May 16, 2012 at 2:36 PM, rjain15 rjai...@gmail.com wrote:
No. Changing to name:monsters didn't work
OK, but you'll have to do that if you get the other part working.
Here is my guess, the UpdateJSON is not adding any new documents to the
existing index.
If that's true, the most
Look out, the first end quote is in the wrong spot.
Michael
On Wed, May 16, 2012 at 3:29 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Wed, May 16, 2012 at 2:36 PM, rjain15 rjai...@gmail.com wrote:
No. Changing to name:monsters didn't work
OK, but you'll have to do that if you get the
Hi
Firstly, apologies for the long post, I changed the quote to double quote
(and sometimes it is messy copying from DOS windows)
Here is the command and the output on the Jetty Server Window. I am
highlighting some important pieces,
I have enabled the LOG LEVEL to DEBUG on the JETTY window.
On Wed, May 16, 2012 at 4:10 PM, rjain15 rjai...@gmail.com wrote:
Hi
Firstly, apologies for the long post, I changed the quote to double quote
(and sometimes it is messy copying from DOS windows)
Here is the command and the output on the Jetty Server Window. I am
highlighting some important
Yonik
You are the best !!!
Yes, as soon as I changed the Content-type:application/json it worked.
Now I can see all my updates to the book category.
I am ready to roll, thanks for the patience and help.
regards
Rajesh
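For anyone hitting the same problem, the fix reported above amounts to sending the JSON body with a Content-type of application/json. The sketch below only builds such a request and does not send it; the /update/json path, the commit=true parameter and the sample document are assumptions for illustration and may differ in your setup.

```python
import json
import urllib.request


def build_json_update(docs, url="http://localhost:8983/solr/update/json?commit=true"):
    """Build (but do not send) a POST request carrying docs as a JSON body."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Without this header the body is not treated as JSON.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; the field names depend on your schema.
request = build_json_update([{"id": "book-1", "name": "Example book", "cat": "book"}])
print(request.get_method(), request.full_url, request.get_header("Content-type"))
# To send it against a running Solr instance: urllib.request.urlopen(request)
```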
I am using the commit parameter waitFlush, and it seems to throw an exception
in 4.0.
I am not sure what the purpose of this parameter is or whether it is
required.
SEVERE: org.apache.solr.common.SolrException: Unknown commit parameter
'waitFlush'
at
As the doc says: In Solr 4.0 it will be removed.
See:
http://wiki.apache.org/solr/UpdateXmlMessages
But, the UpdateJSON doc certainly needs to be updated as well.
-- Jack Krupansky
-Original Message-
From: rjain15
Sent: Wednesday, May 16, 2012 5:08 PM
To: solr-user@lucene.apache.org
Change "blah blah" to "blah" "blah": two separate strings, two separate
query terms.
-- Jack Krupansky
-Original Message-
From: anarchos78
Sent: Wednesday, May 16, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: Solr query and double quotes
Hello friends,
When I am passing queries
PermGen memory has to do with number of classes loaded, rather than
documents.
Here are a couple of pages that help explain Java PermGen issues. The bottom
line is that you can increase the PermGen space, or enable unloading of
classes, or at least trace class loading to see why the problem
First you have to answer the twin questions of what you want the user
experience to be and what expectations users may have independent of your
intentions.
Do you intend to have separate, language specific search UI? That would
match up with separate cores, but can be done with a language
The query may be the same, but your analyzers are radically different.
Just a hunch, but maybe GosenTokenizerFactory is treating the . as a
space. In 1.4 you were using SenTokenizerFactory. Or maybe
GosenBasicFormFilterFactory is treating the . as a space. In any case, my
hunch is that
Add "a" (and maybe other single letters) to the stopwords file. Then it
won't show up in the query at all.
And with edismax, enable PF2 and maybe PF3 so that instances of "a cole"
would get boosted.
-- Jack Krupansky
-Original Message-
From: roySolr
Sent: Wednesday, May 16, 2012 10:58
Except you can never match "a", so that is a bad idea. So much for the query
"vitamin a".
wunder
On May 16, 2012, at 5:47 PM, Jack Krupansky wrote:
Add "a" (and maybe other single letters) to the stopwords file. Then it won't
show up in the query at all.
And with edismax, enable PF2 and maybe
Ah, sorry. I meant to add that you should have a stop filter in the query
analyzer, but not in the index analyzer.
-- Jack Krupansky
-Original Message-
From: Walter Underwood
Sent: Wednesday, May 16, 2012 8:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Must match and terms with
OK, I understand how those words are tokenized by different tokenizer
factories.
My question is how I can have Solr analyze and search for test AND
pdf.
As Solr1.4 gives result of test AND pdf, I want Solr 3.6 to do the same.
(Solr3.6 gives result of test OR pdf).
Any idea?
2012/5/17 Jack
Hi
I am trying to post JSON data to Solr using XHR / jQuery and it doesn't seem
to work. I don't get any exception on the Jetty console. Has anyone tried
this before, and are there any obvious gotchas in my code?
Here is my code snippet
$(document).ready(function(){
var
This is my JSON variant of solr/example/exampledocs/post.sh. It takes
a URL as the first parameter.
#!/bin/sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information
If you want to treat test.pdf as a phrase test pdf,
it might work by setting text_sen autoGeneratePhraseQueries=true.
Regards,
Shinichiro Abe
On 2012/05/17, at 10:39, Katsuyoshi NOGUCHI wrote:
OK, I understand how those words are tokenized by different tokenizer
factories.
My question is
It can, as can ManifoldCF. But you should ask on nutch-user list (this may
also be documented on the Wiki)
Otis
Performance Monitoring for Solr / ElasticSearch / HBase -
http://sematext.com/spm
From: Tolga to...@ozses.net
To:
Yes, 'text_gr' in solr/example/conf/schema.xml is (I think) the Greek
text type. It is commented out.
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
This has someone's idea of how Greek text analysis should
hi
I am using highlighter component with
hl.fragmenter=regex&hl.regex.pattern=[-\w ,/\n]\']{20,200}
Basically the configuration that comes with fragmenter in highlighting
component in solrconfig.xml file.
My snippets don't start with start of sentence.
I also tried boundary scanner
I just noticed that you used dismax in 1.4 vs. edismax in 3.6. There may
be other differences that I have not yet noticed.
Also, you should have separate index and query analyzers so that
catenateWords=0 catenateNumbers=0 for the query analyzer. It could be
that the catenateWords=1
I can get the same result!
Thanks!
2012/5/17 Shinichiro Abe shinichiro.ab...@gmail.com
If you want to treat test.pdf as a phrase test pdf,
it might work by setting text_sen autoGeneratePhraseQueries=true.
Regards,
Shinichiro Abe
On 2012/05/17, at 10:39, Katsuyoshi NOGUCHI wrote:
OK, I