On Wed, Jul 1, 2009 at 10:28 PM, Sumit Aggarwal
wrote:
> Hi Shalin,
> Sorry for the confusion, but I don't have separate index fields. I have all
> the information in only one index field, descp. Is what you explained still
> possible?
>
>
No, you should separate the data out into multiple fields for this.
I use lucene-core-2.9-dev.jar and lucene-misc-2.9-dev.jar.
On Thu, Jul 2, 2009 at 2:02 PM, James liu wrote:
> i try http://wiki.apache.org/solr/MergingSolrIndexes
>
> system: win2003, jdk 1.6
>
> Error information:
>
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.lucene.misc.IndexMerg
i try http://wiki.apache.org/solr/MergingSolrIndexes
system: win2003, jdk 1.6
Error information:
> Caused by: java.lang.ClassNotFoundException:
> org.apache.lucene.misc.IndexMergeTool
> at java.net.URLClassLoader$1.run(Unknown Source)
> at java.security.AccessController.doPriv
On Thu, Jul 2, 2009 at 10:24 AM, Francis Yakin wrote:
>
> This is only for version 1.3.0? We are running 1.2.0 currently.
>
Yes, DIH is available since 1.3 only.
--
Regards,
Shalin Shekhar Mangar.
Thanks Noble!
This is only for version 1.3.0? We are running 1.2.0 currently.
Francis
-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Wednesday, July 01, 2009 9:43 PM
To: solr-user@lucene.apache.org
Subject: Re:
I use Windows. Did you try TortoiseSVN?
It is quick and simple.
On Tue, Jun 30, 2009 at 8:53 PM, ahammad wrote:
>
> Hello,
>
> I am trying to install a patch for Solr
> (https://issues.apache.org/jira/browse/SOLR-284) but I'm not sure how to do
> it in Windows.
>
> I have a copy of the nightly bui
StreamingServer adds docs in multiple threads using the same http connection
Or
you can use CommonsHttpSolrServer#add(Iterator docIterator)
method
if you are unhappy w/ the perf you can use the BinaryRequestWriter
http://wiki.apache.org/solr/Solrj#head-ddc28af4033350481a3cbb27bc1d25bffd801af0
Did you explore DIH? http://wiki.apache.org/solr/DataImportHandler
It has features to import from a DB, XML files, etc.
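For reference, a minimal DIH data-config.xml for pulling rows from a database could look like the sketch below. All driver, connection, table, and column names here are invented placeholders, not details taken from this thread:

```xml
<!-- Hypothetical data-config.xml sketch; driver, URL, table and
     column names are placeholders and must be adapted. -->
<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/XE"
              user="solr" password="secret"/>
  <document>
    <entity name="doc" query="select id, title, body from docs">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```

The handler itself is then registered as a requestHandler in solrconfig.xml pointing at this file.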
On Thu, Jul 2, 2009 at 3:37 AM, Francis Yakin wrote:
>
> We have several thousands of XML files in a database that we load to the Solr
> master.
> The database uses an "http" connection
Your technique seems to work best (modify the SQL).
Because DIH resides in a Solr core, distributing the load is not easy.
There are plans to make DIH work as a library where it will post docs
using SolrJ. Maybe it is possible to include this feature there.
On Thu, Jul 2, 2009 at 3:34 AM, Jay
Whatever is possible with XSL is possible. I do not think embedded
strings can be manipulated.
On Thu, Jul 2, 2009 at 8:05 AM, Matt Mitchell wrote:
> I know you can transform Solr document fields, but is it possible to have
> Solr transform XML that might be embedded (as a string) in a field?
>
>
Complete XPath is not supported.
/book/body/chapter/p
should work.
If you wish to get all the text irrespective of nesting and tag
names, use this
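As a sketch, an XPathEntityProcessor entity for a structure like the one discussed might look as follows. The file name and column names are invented for illustration, and DIH supports only a limited XPath subset, so only simple paths like these are safe to assume:

```xml
<!-- Sketch only: url and column names are placeholders. -->
<entity name="book" processor="XPathEntityProcessor"
        url="book.xml" forEach="/book">
  <field column="author" xpath="/book/author"/>
  <field column="title"  xpath="/book/title"/>
  <field column="para"   xpath="/book/body/chapter/p"/>
</entity>
```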
On Thu, Jul 2, 2009 at 5:31 AM, Jay Hill wrote:
> I'm using the XPathEntityProcessor to parse an xml structure that looks like
> this:
>
>
> J
How do you import the documents as CSV data/files from the Oracle database to the
Solr master (they are two different machines)?
And do you have the doc for using EmbeddedSolrServer?
Thanks Otis!
Francis
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, Ju
Glen,
Are you saying that we have to use LuSql to replace our Solr?
Francis
-Original Message-
From: Glen Newton [mailto:glen.new...@gmail.com]
Sent: Wednesday, July 01, 2009 8:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there any other way to load the index beside using "http"
Glen,
The database we use is Oracle. I am not the database administrator, so I am not
familiar with their scripts.
So, basically we have the Oracle SQL script to load the XML files over an HTTP
connection to our Solr master.
My question is: is there any other way, instead of using an HTTP connection, to load th
Otis,
Do you have the document how to do those things that you mentioned?
How about if I don't want to use HTTP at all? Or do we have no other option than to
use HTTP to transfer the XML files to the Solr master from the DB box?
Thanks
Francis
-Original Message-
From: Otis Gospodnetic [mai
You can directly load to the backend Lucene using LuSql[1]. It is
faster than Solr, sometimes as much as an order of magnitude faster.
Disclosure: I am the author of LuSql
-Glen
http://zzzoot.blogspot.com/
[1]http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
2009/7/1 Francis Y
Francis,
There are a number of things you can do to make indexing over HTTP faster.
You can also import documents as csv data/file.
Finally, you can use EmbeddedSolrServer.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Francis Yakin
>
I know you can transform Solr document fields, but is it possible to have
Solr transform XML that might be embedded (as a string) in a field?
Matt
Hmmm - my very limited understanding of xpath says that /book/body/chapter/p
should work.
Some quick testing with XPath Expression Testbed shows both
/book/body/chapter/p and /book/body/chapter//p selecting the right nodes.
I'm not sure what's up.
Are you actually looking for /book/body/chapter/
Hi,
I am looking at this piece of configuration in solrconfig.xml
solr
solrconfig.xml
schema.xml
q=solr&version=2.0&start=0&rows=0
server-enabled
It wasn't clear to me what 'server-enabled' means here. Is it a file name?
If it is file name, wher
I'm using the XPathEntityProcessor to parse an xml structure that looks like
this:
Joe Smith
World Atlas
Content I want is here
More content I want is here.
Still more content here.
The author and title parse out fine:
Hi Mark:
I did try that. The problem is that you can't tell
FileSystemXmlApplicationContext to load with a different ClassLoader.
-John
On Wed, Jul 1, 2009 at 4:20 PM, Mark Miller wrote:
> I haven't done much Spring in a while, and I've never done anything
> hardcore, but could you:
>
> i
I haven't done much Spring in a while, and I've never done anything
hardcore, but could you:
in your plugin code, grab the classloader of the plugin class and init the
spring context with it?
org.springframework.core.io.DefaultResourceLoader#setClassLoader
Or you could use the solr classloader
Hi guys:
What is the plan with this issue? Should there be a bug created?
I am having a similar issue from a different angle:
1) using spring which is instantiating beans when the plugins are loaded
2) classloader mismatch.
3) only way to resolve, copy my jars to solr.war's WEB-INF/lib
For performance reasons, we're attempting to build the index used with
Solr directly in Lucene. It works fine for the most part, but I'm
having an issue when it comes to stemming. I'm guessing this is due to a
mismatch between how Lucene is stemming and how Solr stems during its
queries, or somet
Gurjot -
Another option - Solr 1.3 provides runtime statistics through JMX:
http://wiki.apache.org/solr/SolrJmx
cheers,
--bemansell
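Per the SolrJmx wiki page, enabling this is a one-line addition to solrconfig.xml; you can then connect with jconsole (for example, after starting the JVM with the standard -Dcom.sun.management.jmxremote flag):

```xml
<!-- In solrconfig.xml: publish Solr statistics (including per-handler
     request counts) to the JVM's platform MBean server. -->
<jmx />
```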
On Wed, Jul 1, 2009 at 9:05 AM, Gurjot Singh wrote:
> Hi,
> Is there a way to monitor the number of search queries made on the solr
> index.
>
> Thanks
> Gurjot
>
We have several thousands of XML files in a database that we load to the Solr
master.
The database uses an "http" connection and transfers those files to the Solr master.
Solr then translates the XML files into its index.
We are experiencing an issue with close/open connections in the firewall and very,
very sl
I'm using the DIH to index records from a relational database. No problems,
everything works great. But now, due to the size of index (70GB w/ 25M+
docs) I need to shard and want the DIH to distribute documents evenly
between two shards. Current approach is to modify the sql query in the
config fil
By removing both the StopFilterFactory and the SynonymFilterFactory, the
indexing time per doc has been reduced drastically, to 2 to 5 ms per doc.
Next I will try out StreamingServer. Any distinct advantages of using
StreamingServer
Thanks,
Kalyan Manepalli
-Original Message-
From: Manepalli
Hi,
A 5GB heap sounds quite big, let alone the 8GB heap. I would try simple stuff
like jmap to see what's eating the memory, and if that doesn't work I'd try
using a profiler.
Turn off norms if you don't need them, and either use trie-based fields for
date if you have them and sort by them, o
Our max heap was configured to use 5GB. It has been running fine until we
tried to deploy a new queryConverter for our SpellcheckComponent. After
which, we upped our heap to 8GB and still had issues.
Solr is the only webapp running on Tomcat.
We are using sorting and faceting, but again, hadn'
Regarding the analysis, we do a couple of things during indexing. First, we use a
dictionary text file for the StopFilterFactory. Secondly, we use a synonym text
file for the SynonymFilterFactory. I will test the indexing speed by temporarily
removing both of them.
Thanks,
Kalyan Manepalli
-Origin
I, too, can confirm 1.4 is solid.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Ed Summers
> To: solr-user@lucene.apache.org
> Sent: Wednesday, July 1, 2009 8:47:26 AM
> Subject: solr v1.4 in production?
>
> Here at the Library of Cong
Kalyan,
150/200 ms per 1 document to index seems too long, but it really depends on how
much analysis is going on and size of docs. 32 threads seems too high, unless
your Solr server really has 32 cores.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Mess
Hi,
Could it simply be the case that you really do need all that memory that the
JVM starts consuming over time? How large of a heap are you using, is Solr the
only webapp in your Tomcat, and are you using sorting or faceting?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Here are some specs for my indexer.
Indexer is custom Java code that reads data from DB and other services builds
the solrDocument and submits it using SolrJ via Http. Indexer is doing a bit of
work for building the documents. The overhead is around 30 to 40ms. For every
document addition solr t
Kalyan,
Using SolrJ? Use the StreamingServer, it's nice and fast.
Alternatively, start multiple indexing threads (match the number of Solr server
CPU cores) and index from there.
Send batches of docs, not one by one.
Don't commit or optimize until you are done.
Otis
--
Sematext -- http://semat
Kalyan,
Tell us about your indexer. Is it DIH-powered? Custom Java code,
perhaps, using SolrJ indexing over HTTP? Is your indexer doing a lot
of work itself to preprocess documents before sending to Solr?
Erik
On Jul 1, 2009, at 3:42 PM, Manepalli, Kalyan wrote:
Hi,
On Wed, Jul 1, 2009 at 3:20 PM, Chris Harris wrote:
> There seems to be a near-universal condemnation in best practices
> guides of using floating point types to store prices, and yet this is
> exactly what Solr's example schema.xml does.
Decimals may not have exact representations in binary float
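A quick self-contained illustration of that point (plain JDK, nothing Solr-specific): summing ten float "dimes" does not give exactly 1.0, which is why integer cents or a decimal type are usually preferred for prices.

```java
public class FloatPrice {
    public static void main(String[] args) {
        // Ten additions of "ten cents" as a binary float.
        float total = 0.0f;
        for (int i = 0; i < 10; i++) {
            total += 0.1f;
        }
        System.out.println(total);         // close to, but not exactly, 1.0
        System.out.println(total == 1.0f); // false

        // Storing prices as integer cents avoids the rounding entirely.
        int cents = 10 * 10;
        System.out.println(cents);         // 100
    }
}
```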
Hi,
I have a very generic question regarding indexing. In my current
app, I have about 450,000 docs each doc size around 2k. The total indexing time
is around 2hrs.
Now due to multi language support, the number of documents is increasing to 2.0
million. The total indexing time is exc
There seems to be a near-universal condemnation in best practices
guides of using floating point types to store prices, and yet this is
exactly what Solr's example schema.xml does.
This leads to a couple of questions:
0. Is my above assessment just wrong?
1. Is there something unique about Solr
Or if you must for some reason, you can raise the limit with the following
system property:
org.mortbay.http.HttpRequest.maxFormContentSize=50
You could also do it in the servlet context, and I think there is even
a way in jetty.xml.
--
- Mark
http://www.lucidimagination.com
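For example, one way to do it in jetty.xml is to set the same system property before the server starts. This is only a sketch: the exact property name varies by Jetty version (the one below is the name quoted above), and the 1000000-byte limit is an arbitrary example value.

```xml
<!-- Sketch for jetty.xml: raise the form-content limit via the
     system property; 1000000 bytes is an arbitrary example value. -->
<Call class="java.lang.System" name="setProperty">
  <Arg>org.mortbay.http.HttpRequest.maxFormContentSize</Arg>
  <Arg>1000000</Arg>
</Call>
```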
On Thu, Jul 2, 2009 at 12:14 AM, GiriGG wrote:
>
> Hi All,
>
> I am trying to do a distributed search and getting the below error. Please
> let me know if you know how to solve this issue.
>
> 18:20:28,202 ERROR [STDERR] Caused by:
> org.apache.solr.common.SolrException:
> *Form_too_large*
> __ja
Hi All,
I am trying to do a distributed search and getting the below error. Please
let me know if you know how to solve this issue.
18:20:28,200 ERROR [STDERR]
org.apache.solr.client.solrj.SolrServerException: Error executing query
18:20:28,200 ERROR [STDERR] at
org.apache.solr.client.solrj.
We recently created a custom class for our spellchecking implementation in
Solr. We decided to include the class in a custom jar and deployed it to
the /lib directory in solr_home to use it as a plugin.
After a while (about 12 hours), the heap usage for Solr slowly starts to
rise, and we eventua
Thank you for the clarification.
Kevin
On Wed, Jul 1, 2009 at 12:30 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> Unfortunately, Trie fields cannot be used for faceting. Faceting works on
> indexed tokens and trie stores multiple tokens (in its own encoding) into
> the index. This
Hi Shalin,
Sorry for the confusion, but I don't have separate index fields. I have all
the information in only one index field, descp. Is what you explained still
possible?
Thanks,
Sumit
On Wed, Jul 1, 2009 at 10:16 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> On Wed, Jul 1, 2009 at 10
On Wed, Jul 1, 2009 at 10:01 PM, Sumit Aggarwal
wrote:
> Hi Shalin,
> Specifying facet.query='small'&facet.query=large will actually filter the
> result also, and it won't give me the facet count for both at the same
> time...
> It will give the total resultset for both these terms.
No. facet.query wil
Can anyone please give me some insight into the best/industry-accepted
configuration for Solr?
My requirement is:
1. Indexes scaled to 3 boxes. (What are the best index-partitioning strategies?)
2. Replicated index servers for each box...
3. Each box will have multiple index folders, also based on type o
Yes, I reindexed the entire repository after each of my changes. Here is the
output with debug on.
== DEBUG OUTPUT BEGIN ==
0
83
standard
10
0
content
on
*,score
on
创意或商业创新、
on
dismax
2.2
创意或商业创新、
创意或商业创新、
+Disjunct
Hi Shalin,
Specifying facet.query='small'&facet.query=large will actually filter the
result also, and it won't give me the facet count for both at the same time...
It will give the total resultset for both these terms. Since I am very new to
Solr, I don't understand how facet counting behaves in that ca
On Wed, Jul 1, 2009 at 8:19 PM, Kevin MacClay wrote:
> I'm interested to see the performance benefits of using "TrieRange" fields
> in Solr 1.4, but I am running into some problems giving them a try. When I
> retrieve facet counts against a TrieRange field, the values are garbled and
> the first
We're using recent nightly snapshots of Solr in various applications,
and also our (Lucid's) certified distributions which include many
1.4'ish goodies in a supportable fashion.
So, yeah, I definitely have no qualms about recommending trunk or
nightly builds of Solr. Granted, of course, th
Gurjot Singh schrieb:
Hi,
Is there a way to monitor the number of search queries made on the
solr index.
http://localhost:8983/solr/admin/stats.jsp
Look for "requests :".
Michael Ludwig
Also, see this thread:
http://www.lucidimagination.com/search/document/c55ea357cd4749e9/upgrade_to_solr_1_4
On Wed, Jul 1, 2009 at 9:12 PM, Matthew Runo wrote:
> We're using an svn grab of 1.4 in production mostly to get the Java
> replication code. We don't have any problems to report.
>
> Her
On Wed, Jul 1, 2009 at 9:42 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:
> On Wed, Jul 1, 2009 at 8:25 PM, Sumit Aggarwal
> wrote:
>
>> The example given says I can specify only one term as the facet prefix. My
>> requirement is that I should be able to pass a set of facet terms which will
On Wed, Jul 1, 2009 at 8:25 PM, Sumit Aggarwal wrote:
> The example given says I can specify only one term as the facet prefix. My
> requirement is that I should be able to pass a set of facet terms which will
> return the facet count for those terms only.
>
> So i wanted to do some thing like
> q=re
Hi,
Is there a way to monitor the number of search queries made on the solr
index.
Thanks
Gurjot
We're using an svn grab of 1.4 in production mostly to get the Java
replication code. We don't have any problems to report.
Here's the version we're using:
1.4-dev 749558:749756M - built on 2009-03-03 at 13:10:05
Thanks for your time!
Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.c
Hi Bill,
The example given says I can specify only one term as the facet prefix. My
requirement is that I should be able to pass a set of facet terms which will
return the facet count for those terms only.
So I wanted to do something like
q=red dress
facet=true
facet.field=descp
facet.mincount=1
fac
I'm interested to see the performance benefits of using "TrieRange" fields
in Solr 1.4, but I am running into some problems giving them a try. When I
retrieve facet counts against a TrieRange field, the values are garbled and
the first three counts are erroneous. Am I doing something wrong?
In s
I agree that faceting might be the thing that defines this app. The app is
mostly snappy during the daytime, since we optimize the index around 7.00 GMT.
However, faceting is never snappy.
We sped things up a whole bunch by creating various "less cardinal"
fields from the originating publishedDate w
Koji Sekiguchi schrieb:
I'm not a Windows user, but I think you can use Linux commands (e.g.
patch, to apply the SOLR-284 patch to a Solr nightly build) in a Cygwin
environment.
The standalone patch utility for Win32 is another option.
http://gnuwin32.sourceforge.net/packages/patch.htm
Michael Ludwig
You can use a facet query. Here is an example from the Solr Wiki:
http://wiki.apache.org/solr/SimpleFacetParameters#head-1da3ab3995bc4abcdce8e0f04be7355ba19e9b2c
Bill
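Concretely, for the "red dress" example in this thread, the request parameters could look like the sketch below (the field name descp and the size-* terms come from the thread; facet.query only adds counts on top of the normal result set, it does not filter it):

```
q=red dress
facet=true
facet.query=descp:size-medium
facet.query=descp:size-large
```

To then restrict the results to one of the sizes, a follow-up query would add a separate filter, e.g. fq=descp:size-medium.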
On Wed, Jul 1, 2009 at 8:34 AM, Sumit Aggarwal wrote:
> >
> > Suppose i wanted to search for red dress and i want to get facet
ahammad wrote:
Hello,
I am trying to install a patch for Solr
(https://issues.apache.org/jira/browse/SOLR-284) but I'm not sure how to do
it in Windows.
I have a copy of the nightly build, but I don't know how to proceed. I
looked at the HowToContribute wiki for patch installation instructions,
Hi all,
I'm interested in exploring the use of TermsComponent, but I don't want to
upgrade Solr to 1.4 until it's been officially released. I've tried
extracting the component and building it as an external lib but I'm having
problems getting it working.
I've copied TermsComponent and TermsParams
Thanks, that's what I was looking for.
On Mon, Jun 22, 2009 at 4:15 PM, Chris Hostetter
wrote:
> : and then ask,
> :- how can i set the value of query so that it is reflected in the 'q'
> : node of the search results e.g. solr.
> : the example 'process' method above works, but the original
Here at the Library of Congress we've got several production Solr
instances running v1.3. We've been itching to get at what will be v1.4
and were wondering if anyone else happens to be using it in production
yet. Any information you can provide would be most welcome.
//Ed
On Mon, 29 Jun 2009 15:10:59 +0100
Ben wrote:
> Hi Erik,
>
> I'm not sure exactly how much context you need here, so I'll try to keep
> it short and expand as needed.
>
> The column I am faceting on contains a comma-delimited set of vectors.
> Each vector is made up of {Make,Year,Model} e.g.
>
>
> Suppose I wanted to search for red dress and I want to get the facet count for
> the terms size-medium, size-large... Basically I want to get the facet count for
> some predefined terms in the result set. How can I do it?
> Once I have the facet count, I then want the result set for red dress and size-medium.
> I hope I
Suppose I wanted to search for red dress and I want to get the facet count for
the terms size-medium, size-large... Basically I want to get the facet count for
some predefined terms in the result set. How can I do it?
Once I have the facet count, I then want the result set for red dress and size-medium.
I hope I can achieve
David Baker wrote:
I am trying to index a solr server from a nightly build. I get the
following error in my catalina.out:
26-Jun-2009 5:52:06 PM
org.apache.solr.update.processor.LogUpdateProcessor
finish
On Wed, Jul 1, 2009 at 5:15 PM, con wrote:
>
> Hi
> I am setting up a solr search index which makes use of dataimporthandler.
> Here i have a column, called rank. it will contain values from 1 to 10.
> But this is of type varchar in oracle db and TextField in corresponding
> schema.
>
I think it
On Wed, Jul 1, 2009 at 5:07 PM, Ben wrote:
> my brain was switched off. I'm using SOLRJ, which means I'll need to
> specify multiple :
>
> addMultipleFields(solrDoc, "vector", "vectorvalue", 1.0f);
>
> for each value to be added to the multiValuedField.
>
> Then, with luck, the simple wildcard q
Hi
I am setting up a Solr search index which makes use of the DataImportHandler.
Here I have a column called rank. It will contain values from 1 to 10.
But this is of type varchar in the Oracle DB and a TextField in the corresponding
schema.
The issue here is when I do a:
sort=rank asc in the query, I am n
My brain was switched off. I'm using SolrJ, which means I'll need to
specify multiple:
addMultipleFields(solrDoc, "vector", "vectorvalue", 1.0f);
for each value to be added to the multiValuedField.
Then, with luck, the simple wildcard query will be executed over each
individual value when l
2009/7/1 Ben
> I'm not quite sure I understand exactly what you mean.
> The string I'm processing could have many tens of thousands of values... I
> hope you aren't implying I'd need to split it into many tens of thousands of
> "columns".
No, that is not what I meant. It will be one field (colu
I'm not quite sure I understand exactly what you mean.
The string I'm processing could have many tens of thousands of values...
I hope you aren't implying I'd need to split it into many tens of
thousands of "columns".
If you're saying what I think you're saying, you're saying that I should
le
To get the desired effect I described, you have to do the split before you
send the document to Solr. I'm not aware of an analyzer that can split one
field value into several field values. The analyzers and tokenizers do
create tokens from field values in many different ways.
As I see it you have
Is there a way in the Schema to specify that the comma should be used to
split the values up?
e.g. Can I specify my "vector" field as multivalue and also specify some
sort of tokeniser to automatically split on commas?
Ben
Uwe Klosa wrote:
You should split the strings at the comma yourself
You should split the strings at the comma yourself and store the values in a
multivalued field. Then wildcard searches like A1_* are not a problem. I don't
know so much about facets, but if they work on multivalued fields that
should then be no problem at all.
Uwe
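A minimal sketch of that client-side split (plain Java; the input values are invented for illustration). Each resulting value would then be added as one entry of the multivalued field, e.g. via SolrJ's doc.addField, shown here only as a comment:

```java
import java.util.Arrays;
import java.util.List;

public class VectorSplit {
    // Split one comma-delimited string into individual values,
    // one per multivalued-field entry.
    static List<String> splitVectors(String raw) {
        return Arrays.asList(raw.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        // "A1_2006_X" style values are hypothetical examples.
        for (String v : splitVectors("A1_2006_X, B2_1999_Y")) {
            System.out.println(v);
            // with SolrJ this would be: doc.addField("vector", v);
        }
    }
}
```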
2009/7/1 Ben
> Yes, I had done t
Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a "wildcard query", which goes via Lucene's query parser.
Lucene's query parser doesn't support the regexp idea of character
exclusion... i.e. I'm not trying to match "[", I'm trying to express
"Match
You have to escape all special characters. Even [ to \[
Have a look here http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
Uwe
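A small, self-contained escaper over the special-character list from that syntax page. Note this class is only an illustrative sketch; if you are already using SolrJ, its ClientUtils.escapeQueryChars helper does the same job. Escaping makes [ and ^ literal characters in the query; it does not give the query parser regex character-class support.

```java
public class QueryEscape {
    // Backslash-escape the characters the Lucene query parser
    // treats as special: + - ! ( ) { } [ ] ^ " ~ * ? : \ & |
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("+-!(){}[]^\"~*?:\\&|".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pattern from this thread, escaped for literal matching:
        System.out.println(escape("[^_]*_[^_]*"));  // \[\^_\]\*_\[\^_\]\*
    }
}
```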
2009/7/1 Ben
> I only just noticed that this is an exception being thrown by the
> lucene.queryParser. Should I be mailing on the lucene list, or is it ok
> h
I only just noticed that this is an exception being thrown by the
lucene.queryParser. Should I be mailing on the lucene list, or is it ok
here?
I'm beginning to wonder if the "fq" can handle the type of character
exclusion I'm trying in the RegExp.
Escaping the string also doesn't work :
Ca
On Wed, Jul 1, 2009 at 2:56 AM, ashokc wrote:
>
> I have the following fieldType that processes korean/chinese/japanese text
>
> [fieldType definition stripped by the list archive]
> When I supply korean words/phrases in the query, I do get several expected
> Korean URLs as search results,
Sorry, I was too cryptic.
If you follow this link
http://projecte01.development.barcelonamedia.org/fonetic/
you will see a "Top Words" list (in Spanish and stemmed); in the list there
is the word "si", which is in 20649 documents.
If you click on this word, the system will perform the query
Ben wrote:
The exception SOLR raises is :
org.apache.lucene.queryParser.ParseException: Cannot parse
'vector:_*[^_]*_[^_]*_[^_]*': Encountered "]" at line 1, column 12.
Was expecting one of:
"TO" ...
...
...
Ben wrote:
Passing in a RegularExpression like "[^_]*_[^_]*" (e.g. match
Michael Ludwig schrieb:
Kraus, Ralf | pixelhouse GmbH schrieb:
When I am searching for ONE word with an german umlaut like
"kräuterkeckse" (the right word is kräuterkekse) the spellchecker
gives me two corrections :
Spellcheck for kr = kren
Spellcheck for uterkeksse = butterkekse
WHY is SOLR b
akinori schrieb:
When I search "make for", Solr returns results that include both "make" and
"for", but when I type more than 3 words, such as "in order to", the
result becomes 0, though the index is sure to have several entries
containing the 3 words. 2 words are OK, but more than 3 words
resulted in zero. W