In solrconfig.xml I was experimenting with indexing performance. When I set
maxDocs (in autoCommit) to, say, 1 document, the index size is double what it
is if I just don't use autoCommit (i.e. keep it commented out and commit only
once at the end, after adding all documents).
Does autoCommit affect the index
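For reference, the block in question in solrconfig.xml looks roughly like
this (the thresholds are illustrative); every commit can flush new segments,
so a very low maxDocs produces many small segments and a larger on-disk index
until they are merged:

```xml
<!-- solrconfig.xml: commit automatically once either threshold is reached -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs> <!-- commit after this many pending documents -->
    <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```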
Hi Nagendra,
I tried to use solr-nrt-ra-3.4, but the dataimporthandler does not work.
The error message is:
INFO: created /dataimport:
org.apache.solr.handler.dataimport.DataImportHandler
Dec 6, 2011 1:16:18 AM org.apache.solr.common.SolrException log
SEVERE:
Hello everybody,
I'm trying to use the LineEntityProcessor of DIH, but so far without success.
I've created data-lep-config.xml and added the request handler in
solrconfig.xml.
During full-import I get a response saying that x rows were fetched, 0 docs
added/updated.
I also defined a very basic regex for
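A minimal data-config.xml for LineEntityProcessor with a RegexTransformer
might look like the sketch below (the file path, field and group names are
illustrative). Note that if the regex captures nothing, or no column maps to
a schema field, DIH can report rows fetched but 0 docs added:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- each input line arrives in the implicit "rawLine" field -->
    <entity name="lines"
            processor="LineEntityProcessor"
            url="/path/to/data.txt"
            transformer="RegexTransformer">
      <field column="id"   regex="^(\S+)\s"    sourceColName="rawLine" />
      <field column="text" regex="^\S+\s(.*)$" sourceColName="rawLine" />
    </entity>
  </document>
</dataConfig>
```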
is it possible to lower the score for synonym matches?
we setup...
admin = administration
but if someone searches specifically for admin, we want those
specific matches to rank higher than matches for administration
--
IntelCompute
Web Design Local Online Marketing
When searching against 1 field, is it possible to have highlighting
returned 2 different ways?
We'd like the full field returned with keywords highlighted, but then
also returned as snippets.
Any possible approaches?
--
IntelCompute
Web Design Local Online Marketing
Hi,
I want to test a custom implementation of CommonsHttpSolrServer, which is
required so that we can enable it to use SSL certificates and proxies when
accessing the Solr REST API.
One thing I want to avoid is having to have a Solr instance set up on every
developer's sandbox in order for the tests
Hello,
You could create an other field and link to it the synonym analyzer. When
querying set a lower boost for this field.
Marc.
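A sketch of Marc's suggestion in schema.xml (field and type names are
illustrative): copy the text into a second field whose type expands synonyms
at index time, then weight that field lower at query time:

```xml
<!-- schema.xml: a parallel field type that expands synonyms at index time -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title"     type="text"     indexed="true" stored="true"/>
<field name="title_syn" type="text_syn" indexed="true" stored="false"/>
<copyField source="title" dest="title_syn"/>
```

With dismax, something like qf=title^2 title_syn^0.5 would then rank exact
matches above synonym-only matches.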
On Tue, Dec 6, 2011 at 11:31 AM, Robert Brown r...@intelcompute.com wrote:
is it possible to lower the score for synonym matches?
we setup...
admin =
Within one request, it isn't possible to highlight the same field twice
differently (what's the use case here?), but you could either make multiple
requests or copyField to have two stored copies that could be highlighted
separately in a single request.
Erik
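A sketch of the copyField approach Erik describes (field names illustrative):
keep two stored copies and give each its own highlighting parameters, since
hl.fragsize=0 returns the whole field while the default produces snippets:

```xml
<!-- schema.xml: a second stored copy of the field for full-text highlighting -->
<field name="body"      type="text" indexed="true" stored="true"/>
<field name="body_full" type="text" indexed="true" stored="true"/>
<copyField source="body" dest="body_full"/>
```

A single request could then use hl.fl=body,body_full with
f.body_full.hl.fragsize=0, so body_full comes back whole while body yields
snippets.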
On Dec 6, 2011, at 06:01 ,
Mark -
So you want the *server* to be started programmatically? You could use
Jetty's API to do this... or fork a JVM.
As for client-side SolrJ, you can pass an HttpClient to CommonsHttpSolrServer's
constructor to customize how the HTTP connection is configured.
EmbeddedSolrServer - no, it
Hi all,
I have developed a Solr request handler in which I am querying the shards
and merging the results, but I do not see any queries in Fiddler. How can I
track or capture the queries issued by the request handler in Fiddler, and
what settings do I have to change for that? Please
Hi,
I am getting this weird error message `can not sort on multivalued
field: fieldname` on all the indexed fields. This is the full error message
from solr
HTTP Status 400 - can not sort on multivalued field: price
type: Status report
message: can not sort on
As for XML overloading Solr... certainly it will add processing time to the
situation as well as additional memory requirements. At worst it'd require
more RAM and slow things down, but whether it'd be prohibitive depends on the
ingestion rate and the size of the documents.
Erik
Replication is basically a background file transfer; your slave shouldn't
notice.
But what your slave will notice is two things:
1) after replication, if your first few queries are slow, you need to
autowarm your caches.
2) you will see some memory footprint increase while autowarming
is
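Autowarming is configured per cache in solrconfig.xml; autowarmCount controls
how many entries from the old searcher's cache are regenerated on the new one
(the sizes and the example query here are illustrative):

```xml
<!-- solrconfig.xml: regenerate part of each cache on the new searcher -->
<filterCache class="solr.FastLRUCache"
             size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512" autowarmCount="64"/>

<!-- static warming: run a few representative queries on each new searcher -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```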
OK, I'm not understanding here. You get the counts and the results if you facet
on a single category field. The facet counts are the counts of the
*values* in that
field. So it would help me if you showed the output of faceting on a single
category field and why that didn't work for you
But
I'm going to defer to the folks who actually know the guts here.
If you've turned down the cache entries for your Solr caches,
you're pretty much left with Lucene caching which is a mystery...
Best
Erick
On Mon, Dec 5, 2011 at 9:23 AM, Jeff Crump jeffrey.cr...@gmail.com wrote:
Yes, and without
Hi,
I've been trying to use the UUIDField in solr to maintain ids of the pages I've
crawled with nutch (as per http://wiki.apache.org/solr/UniqueKey). The use case
is that I want to have the server able to use these ids in another database for
various statistics gathering. So I want the link
Hello,
We're encountering delays of 10+ minutes when trying to delete from our Solr
3.4 instance. We have 335k documents indexed and interface using SolrJ. Our
schema basically consists of a parent object with multiple child objects.
Every object is indexed as a separate document
Hmmm, does this help?
In Solr 1.4 and prior, you should basically set mm=0 if you want the
equivalent of q.op=OR, and mm=100% if you want the equivalent of
q.op=AND. In 3.x and trunk the default value of mm is dictated by the
q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the
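For example, mm can be pinned explicitly in a dismax handler's defaults so it
no longer depends on q.op (the handler name and qf fields are illustrative):

```xml
<!-- solrconfig.xml: dismax handler with an explicit mm -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 body</str>
    <str name="mm">100%</str> <!-- require all clauses, like q.op=AND -->
  </lst>
</requestHandler>
```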
Looks like you must have a mix of old and new jars.
On Tuesday, December 6, 2011, Pawan Darira pawan.dar...@gmail.com wrote:
Hi
I am trying to upgrade my Solr version from 1.4 to 3.2, but it's giving me
the exception below. I have checked the Solr home path and it is correct.
Please help
SEVERE:
Details matter. Your analysis chain on the field may well
be the issue.
Look at the terms in the field (admin/schema browser).
Look at debugQuery=on to see how the query is parsed
Look at the admin/analysis page to see the effects of the analysis chain.
You might review:
My previous subject line was not very scannable. Apologies for the re-post;
I'm just hoping to get more eyeballs and hopefully some insights. Thank you
in advance for your time. See below.
-GS
On Mon, Dec 5, 2011 at 1:37 PM, George Stathis gstat...@gmail.com wrote:
Currently, solr grouping
I'm working with Solr on mainly MS Word, PowerPoint, Excel and PDF files.
Is there a best-practice schema.xml and/or solrconfig.xml to use in Solr when
using the ExtractingRequestHandler?
I have been tweaking the default schema to try to get facets working on date
modification times, but
Spark:
The code is compiled to be compliant with JDK 1.5 and above. So you will
need to use at least JDK 1.5 for this to work.
BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar
in your solrconfig.xml. If you want your data import to be searchable in
real time, please make
Sorry to jump into this thread, but are you saying that the facet count is
not # of result hits?
So if I have 1 document with field CAT that has 10 values and I do a query
that returns this 1 document with faceting, that the CAT facet count will
be 10 not 1? I don't seem to be seeing that
Are there any migration utilities to move from an index built by a
Solr 4.0 snapshot to Solr Trunk? The issue is referenced here
http://markmail.org/thread/4ruznwzofyrh776j
https://issues.apache.org/jira/browse/LUCENE-3490
Anyone?
On 12/5/11 11:04 AM, Mark wrote:
*pk*: The primary key for the entity. It is *optional* and only needed
when using delta-imports. It has no relation to the uniqueKey defined
in schema.xml, but they both can be the same.
When used in a nested entity, is the pk the primary key column of
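In data-config.xml the pk attribute sits on the entity and is what the delta
queries key on; a sketch with illustrative table and column names:

```xml
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/db"/>
  <document>
    <!-- pk names the column the delta-import queries are keyed on -->
    <entity name="item" pk="item_id"
            query="SELECT item_id, name FROM item"
            deltaQuery="SELECT item_id FROM item
                        WHERE updated &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT item_id, name FROM item
                              WHERE item_id = '${dataimporter.delta.item_id}'"/>
  </document>
</dataConfig>
```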
Hi.
I'm considering the option of using Solr to search a huge document
repository.
My idea is to read documents (pdf, html, outlook, excel, doc, openoffice,
powerpoint...), extract the information from them, and index it in Solr.
Basically I'm looking for a solution to search my documents.
Does anyone know if this has been finalized yet?
Hi,
I was trying Solr Join across 2 cores on the same Solr installation. For
example:
/solr/index1/select?q={!join fromIndex=index2 from=tag to=tag}restaurant
My understanding is that the restaurant query will be executed on index2
and the results of this query will be joined with the documents
On Tue, Dec 6, 2011 at 12:51 PM, Jamie Johnson jej2...@gmail.com wrote:
Does anyone know if this has been finalized yet?
It's subject to change up till release.
--
- Mark
http://www.lucidimagination.com
On 12/6/2011 1:01 AM, Husain, Yavar wrote:
In solrconfig.xml I was experimenting with indexing performance. When I set
the maxDocs (in autoCommit) to, say, 1 document, the index size is double
what it is if I just don't use autoCommit (i.e. keep it commented out and
commit only once at the end, after adding
Hello,
I need the tf-idf values from texts, and I'm now using Apache Solr.
I am a novice and have some problems.
My question is: how can I extract the tf-idf values?
There are many files in the folder apache-solr-3.5.0\example\solr\data\index,
but I can't use them.
Is the output only as a
Is there a way within Solr to instruct the system that a certain set
of values should always appear regardless of their counts when
faceting?
If you're not using Drupal, understand that Solr is an *engine*, not a full
application. You download Solr from the website and install it, which is
basically just unpacking it and executing java -jar start.jar. From there
you send documents to Solr (there are a number of ways to accomplish
this).
Hi Shawn
Absolutely perfect. It is always great reading your answers again and again as
you explain the concepts so very well. Three cheers and thanks for your reply.
Regards,
Yavar
From: Shawn Heisey [s...@elyograg.org]
Sent: Wednesday, December 07,
<field name="id" type="string" stored="true" indexed="true" required="true" />
<field name="data" type="text_en" stored="true" indexed="false" />
Then sometime later
<uniqueKey>id</uniqueKey>
(all this in your schema.xml file).
That's it. The data field isn't analyzed at all, so the type is largely
irrelevant. What you
Go for it, it's perfect for that!
Here's a good starting point for you:
http://lucene.apache.org/solr/tutorial.html
/ pål
On Dec 6, 2011, at 6:31 PM, marotosg wrote:
Hi.
I'm considering the option of using Solr to search a huge document
repository.
My idea is to read
Hi Pascal:
I have an issue similar to yours, but also need to facet the joined documents...
I've been playing with various things. There's not much documentation I can
find.
Looking at http://wiki.apache.org/solr/Join, in the fourth example you can see
the join being relegated to a filter
Is there a way to specify the index version Solr uses? We're currently
using SolrCloud, but with the index format changing it'd be preferable to be
able to specify a particular index format to avoid having to do a complete
reindex. Is this possible?
Hi,
Thanks for this! But your partner-tmo request handler is probably
configured with your ing-content index, no? In my case, I'd like to execute
a dismax query on the fromIndex.
On Tue, Dec 6, 2011 at 2:57 PM, Jeff Schmidt j...@535consulting.com wrote:
Hi Pascal:
I have an issue similar to
Yonik Seeley skrev:
On Mon, Dec 5, 2011 at 6:23 AM, Per Steffensen st...@designware.dk wrote:
Will it be possible to maintain a how-to-use section on
http://wiki.apache.org/solr/NewSolrCloudDesign with examples, e.g. like the
ones on http://wiki.apache.org/solr/SolrCloud,
Yep, it was
You're totally correct. There's actually a link on the DIH page now which
wasn't there when I had read it a long time ago. I'm really looking forward to
4.0, it's got a ton of great new features. Thanks for the links!!
-Original Message-
From: Mikhail Khludnev
Hi, I'm not sure if it would help.
in solrconfig.xml:
<!-- Controls what version of Lucene various components of Solr
     adhere to. Generally, you want to use the latest version to
     get all bug fixes and improvements. It is highly recommended
     that you fully re-index after
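That comment documents the luceneMatchVersion element, which in a 3.5
install would look something like:

```xml
<!-- solrconfig.xml: pin analyzer/component behavior to a Lucene version -->
<luceneMatchVersion>LUCENE_35</luceneMatchVersion>
```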
Andy skrev:
Hi,
add features corresponding to stuff that we used to use in ElasticSearch
Does that mean you have used ElasticSearch but decided to try SolrCloud instead?
Yes, or at least we are looking at alternatives right now. Considering
Solandra, SolrCloud, Katta, Riak Search,
Thanks, but I don't believe that will do it. From my understanding,
that does not control the index version written; it's used to control
the behavior of some analyzers (based on some googling). I'd love it
if someone told me otherwise, though.
On Tue, Dec 6, 2011 at 3:48 PM, Alireza Salimi
: I've been trying to use the UUIDField in solr to maintain ids of the
: pages I've crawled with nutch (as per
: http://wiki.apache.org/solr/UniqueKey). The use case is that I want to
: have the server able to use these ids in another database for various
: statistics gathering. So I want the
Jamie -
I think the best thing you could do here would be to lock in a version of
Lucene (all the Lucene libraries) that you use with SolrCloud. It's certainly
not out of the realm of possibility that some upcoming SolrCloud capability
will require upgrading Lucene, though, but you
Just FYI that the final piece of SOLR-2382 has not been committed, and instead
has been spun off to SOLR-2943. So if you're using Trunk and you need the
ability to persist a cache on disk and then read it back again later as a DIH
entity, you'll need both SOLR-2943 and also a cache
So if I wanted to use the Lucene 3.5 index format with SolrCloud, I should
be able to just move the 3.5 jars in and remove any of the snapshot jars
that are present when I build locally?
On Tue, Dec 6, 2011 at 4:06 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Jamie -
I think the best thing that you
Just getting started with DIH and I have a very simple setup.
My dih-config.xml is querying my postgres db and does a select on a crosstab()
table that returns just 100 rows.
When I do a full-import I see that 22 docs fail, but what debug settings do
I have to tweak to see why the docs failed?
Oh geez... no... I didn't mean 3.x JARs... I meant the trunk/4.0 ones that are
there now.
Erik
On Dec 6, 2011, at 16:22 , Jamie Johnson wrote:
So if I wanted to use the Lucene 3.5 index format with SolrCloud I should
be able to just move the 3.5 jars in and remove any of the snapshot jars
Hello Michael,
I can help you with using the UIMA UpdateRequestProcessor [1]; the current
implementation uses in-memory execution of UIMA pipelines but since I was
planning to add the support for higher scalability (with UIMA-AS [2]) that
may help you as well.
Tommaso
[1] :
Problem is, that really doesn't help me. We still have the same issue:
when 4.0 becomes final there is no migration utility from this pre-4.0
version to 4.0, right?
On Tue, Dec 6, 2011 at 4:36 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Oh geez... no... I didn't mean 3.x JARs... I
Wasn't sure which mailing list to send this to. I'm writing an application
that can be configured to run directly with Lucene or with Solr, and I'm
trying to figure out whether optimization of the index should be totally
eliminated, eliminated in the Lucene case only, or what.
If I read the 3.5
Right. Not sure what to advise you. We have worked on this problem with our
LucidWorks platform and have some tools available to do this sort of thing, I
think, but it's not generally something that you can do with Lucene going from
a snapshot to a released version. Perhaps others with
What about modifying something like SolrIndexConfig.java to change the
lucene version that is used when creating the index? (may not be the
right place, but is something like this possible?)
On Tue, Dec 6, 2011 at 5:13 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Right. Not sure what to
On Tue, Dec 6, 2011 at 5:04 PM, Scott Smith ssm...@mainstreamdata.com wrote:
If I read the 3.5 lucene javadocs, optimize() has been deprecated because it
is rarely justified with the current lucene index implementation
Its functionality is not being deprecated... it's just that the
method is
Thanks for the response, Mark. Are there any details on the expected
freeze date (not looking for exact dates)? I'm thinking I'm going to catch
hell if I tell our team we need to reindex the entire data set.
On Tue, Dec 6, 2011 at 1:25 PM, Mark Miller markrmil...@gmail.com wrote:
On Tue, Dec
In my experience with DIH, the errors for failed documents end up in the
log files. Catalina.out for Tomcat.
Can you check your log files?
Cody
-Original Message-
From: Alan Miller [mailto:alan.mill...@gmail.com]
Sent: Tuesday, December 06, 2011 1:25 PM
To: Solr
Subject: debugging
: I'm working on using trigrams for similarity matching on some data,
: where there's a canonical name and lots of personalised variants, e.g.:
:
: canonical: My Wonderful Thing
: variant: My Wonderful Thing (for Matt Patterson)
I'm really not sure why you would need trigrams for something
OK, why not just bump the boost on the site field way higher than you
already have?
A note of caution: you'll drive yourself crazy trying to get *exact*
ordering based on some arbitrary (and usually changing) set of requirements.
Put what you have working in front of product management and see if
Cool! thanks, Hoss.
On Mon, Dec 5, 2011 at 6:40 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: Have you looked at:
: http://wiki.apache.org/solr/SolrCaching
this page was actually a little light on details about fieldValueCache, so
I tried to fill in some of the blanks in the latest
(11/12/07 3:42), Nejla Karacan wrote:
Hello,
I need the tf-idf values from texts, and I'm now using Apache Solr.
I am a novice and have some problems.
My question is: how can I extract the tf-idf values?
Nejla,
You can use TermVectorComponent on your field, which needs to be set
Hi all,
I'm wondering if it's possible to configure solrconfig.xml so that the
updateHandler invokes an updateRequestProcessorChain?
At the moment I have modified the /update requestHandler to invoke an
updateRequestProcessorChain, which is working nicely. The catch is that I
have to POST
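What Jan describes might look roughly like this in solrconfig.xml (the chain
name is illustrative; depending on the release, the parameter is update.chain
on 3.2+ or update.processor on older versions):

```xml
<!-- solrconfig.xml: define a chain, then make /update use it by default -->
<updateRequestProcessorChain name="mychain">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mychain</str>
  </lst>
</requestHandler>
```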
You should use the LWE forums for questions about it.
The crawlers are hard coded to use the lucid-update-chain currently. If you
want them to use the UIMA processor you will have to modify that chain
definition to include it.
On Dec 6, 2011, at 8:16 PM, Jan wrote:
Hi all,
I'm wondering
thanks for the information
2011/12/6 Nagendra Nagarajayya nnagaraja...@transaxtions.com
Spark:
The code is compiled to be compliant with JDK 1.5 and above. So you will
need to use at least JDK 1.5 for this to work.
BTW, make sure you add the lib path to the dataimporthandler-3.4.0.jar in
I checked that. There are only the latest jars. I am not able to figure out
the issue.
On Tue, Dec 6, 2011 at 6:57 PM, Mark Miller markrmil...@gmail.com wrote:
Looks like you must have a mix of old and new jars.
On Tuesday, December 6, 2011, Pawan Darira pawan.dar...@gmail.com wrote:
Hi
I
Hi, one of the problems is now alleviated.
The number of lines with "can't identify protocol" in lsof output is now
much reduced. Earlier it kept increasing up to ulimit -n, causing the "Too
many open files" error, but now it is contained to a much smaller number.
This happened after I changed
Hello list,
We've noticed quite a heavy strain on the filterCache in facet queries against
trigram fields (see the schema at the end of this e-mail). The typical query
contains some keywords in the q parameter and a boolean filter query on other
Solr fields. It is also a facet query; the facet field is of
AFAIK the DIH jar is separate from the Solr war. Isn't there a chance to use
DIH from 4.0 in Solr 3.4?
James,
Sorry for hijacking the thread.
But have you had a chance to review
https://issues.apache.org/jira/browse/SOLR-2947? I want to provide a patch
for fixing multi-threading in DIH. But formally
hi Everyone,
I am wondering how much benefit I would get if I moved from SQL Server to
Solr in my text-based search project.
Any help is appreciated!
best
Mersad
If you mean debugging the queries, you can use eclipse+jetty plugin setup (
http://code.google.com/p/run-jetty-run/) with solr web app (
http://hokiesuns.blogspot.com/2010/01/setting-up-apache-solr-in-eclipse.html
)
On Tue, Dec 6, 2011 at 2:57 PM, Kashif Khan uplink2...@gmail.com wrote:
Hi all,