It is possible to store the entire conf/ directory somewhere else. To store
only the schema.xml file, try soft links or the XML include (XInclude)
feature: conf/schema.xml then includes its content from somewhere else.
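A rough sketch of the XInclude variant (the shared path and file name are
made up):

<?xml version="1.0" encoding="UTF-8"?>
<schema name="example" version="1.5"
        xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- pull a well-formed <fields>...</fields> file from a shared location -->
  <xi:include href="file:///shared/solr/conf/fields.xml"/>
  <uniqueKey>id</uniqueKey>
</schema>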
On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman wrote:
> Hi. For our Solr instance, we need to put the sc
There is no copyField in the schema. You have to put the parsed text in a
field that is stored! Highlighting works on stored fields.
There is no "text" field in the schema. I don't know how the DIH
automatically creates it.
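As a sketch (field and type names are just examples), the schema needs
something like:

<!-- stored, so the highlighter can read the extracted text back -->
<field name="text" type="text_general" indexed="true" stored="true"/>
<!-- copy the parsed content into it at index time -->
<copyField source="content" dest="text"/>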
On Tue, Aug 21, 2012 at 2:10 PM, anarchos78
wrote:
> Any help? Anyone
Another option is to take the minimum time interval and record every
active interval as its own employee record. Make a compound key of the
employee and the time range. (Look at the SignatureUpdateProcessor for
how to do this; a sketch follows.) Add one multi-valued field that contains
all of the time intervals for w
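The SignatureUpdateProcessor wiring in solrconfig.xml looks roughly like
this (the field names here are placeholders for the employee/time-range
fields):

<updateRequestProcessorChain name="signature">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <!-- the compound key: employee plus time range -->
    <str name="fields">employee,interval</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>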
Solr has a separate feature called 'autoCommit'. This is configured in
solrconfig.xml. You can set Solr to commit all documents every N
milliseconds or every N documents, whichever comes first. If you want
intermediate commits during a long DIH session, you have to use this
or make your own script
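For example, in solrconfig.xml (the thresholds are only illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- commit after 10k documents... -->
    <maxTime>60000</maxTime>  <!-- ...or after 60s, whichever comes first -->
  </autoCommit>
</updateHandler>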
How do you separate the documents among the shards? Can you set up the
shards such that one "collapse group" is only on a single shard? That way
you never have to do distributed grouping?
On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee
wrote:
> This won't work, see my thread on Solr 3.6 Field co
ZK has a 'chroot' feature (named after the Unix multi-tenancy feature).
http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkSessions
https://issues.apache.org/jira/browse/ZOOKEEPER-237
The last I heard, this feature could work for making a single ZK
cluster support multiple Solr
Hi. For our Solr instance, we need to put the schema.xml file in a different
location than where it resides now. Is this possible? Thanks.
Sincerely,
Alex
Hello,
- embedded server is not the best way, usually
- lucene indexes perfectly well in multiple threads concurrently. The single
writer per directory can be called concurrently.
- with solrj you can use ConcurrentUpdateSolrServer, or call
StreamingUpdateSolrServer in multiple threads, or just
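A minimal SolrJ sketch, assuming Solr 4.x (the URL, queue size, and thread
count are made-up examples):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    // buffer up to 1000 docs, drained by 4 background threads
    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/core1", 1000, 4);
    for (int i = 0; i < 1000000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      doc.addField("name", "document " + i);
      server.add(doc); // returns quickly; sending happens in the background
    }
    server.blockUntilFinished(); // wait for the queue to drain
    server.commit();
    server.shutdown();
  }
}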
We have a webapp that has embedded solr integrated in it.
It essentially handles creating separate index (core) per client and it is
currently setup such that there can only be one index write operation per
core.
Say we have 1 million documents that need to be indexed; our app reads
each docume
You can use a connect string of host:port/path to 'chroot' a path. I
think currently you have to manually create the path first though. See
the ZkCli tool (doc'd on SolrCloud wiki) for a simple way to do that.
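For example (host names and the chroot path are invented):

# create the chroot node first
sh zkcli.sh -zkhost zk1:2181 -cmd makepath /solr-a
# then point Solr at it
java -DzkHost=zk1:2181,zk2:2181,zk3:2181/solr-a -jar start.jar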
I keep meaning to look into auto making it if it doesn't exist, but
have not gotten to i
Jack
Reading through the documentation for UpdateRequestProcessor, my
understanding is that it's good for handling processing of documents before
analysis.
Is it true that processAdd (where we can have custom logic) is invoked once
per document, and is invoked before any of the analyzers gets invoked
This won't work, see my thread on Solr 3.6 Field collapsing
Thanks,
Tirthankar
-----Original Message-----
From: Tom Burton-West
Date: Tue, 21 Aug 2012 18:39:25
To: solr-user@lucene.apache.org
Reply-To: "solr-user@lucene.apache.org"
Cc: William Dueber; Phillip Farber
Subject: Scalability of Solr R
Hello all,
We are thinking about using Solr Field Collapsing on a rather large scale
and wonder if anyone has experience with performance when doing Field
Collapsing on millions or billions of documents (details below). Are
there performance issues with grouping large result sets?
Details:
W
Thanks Jack for your reply.
I don't have many documents with a null field value.
I added ReversedWildcardFilterFactory only to test for a performance
improvement, but that didn't help.
What other changes can I make to the fieldType?
thanks
Srini
: Our environment is Solr 3.6.1. I have the following fieldType. There is a
: field called 'body' of this fieldType. When I make a query: q=body:*, it is
: taking longer than expected. What changes do I need to make to this
: fieldType for better query performance? Some other fieldTypes in
Any help? Anyone?
You could try a "[* TO *]" range query. It will also match all documents
which have a non-null field value.
So: q=body:[* TO *]
Actually, I see that you have reverse wildcard enabled. Try removing that.
"*" would normally map to PrefixQuery, which is normally more efficient than
a WildcardQue
Our environment is Solr 3.6.1. I have the following fieldType. There is a
field called 'body' of this fieldType. When I make a query: q=body:*, it is
taking longer than expected. What changes do I need to make to this
fieldType for better query performance? Some other fieldTypes in our schema
Hi all,
I would like to use a single zookeeper cluster to manage multiple Solr cloud
installations. However, the current design of how Solr uses zookeeper seems to
preclude that. Have I missed a configuration option to set a zookeeper prefix
for all of a Solr cloud installation's configuration directories?
Perfect. I reindexed the whole index and everything worked fine. The
exception was just a little bit confusing.
Best
Dirk
On 21.08.2012 14:39, "Jack Krupansky" wrote:
> Did you explicitly run the IndexUpgrader before adding new documents?
>
> In theory, you don't have to do that, but... who knows for sure.
Can someone let me know how to configure DIH in a cloud environment: should
it point to one specific server, or to the load balancer to distribute DIH
across all the servers?
One problem with the load balancer approach is that there is no good way to
tell whether a DIH is already running in the clou
Yes, the mm is 100%. Thank you for a detailed answer.
Regards!
Dalius Sidlauskas
On 21/08/12 15:21, Jack Krupansky wrote:
Solr doesn't actually "know" any natural language, so it has no way of
assessing whether two token streams "have the same meaning." In your
case, the surface forms/syntax a
OK, one other piece of information that would help a lot (or maybe lead you
to the answer). Attach &debugQuery=on to the URL and look at the debug
information, particularly the parsed query down below. I'm going to guess
that you're searching on something that is actually found in nearly all your
documents
On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote:
I am doing an import of large records (with large full-text fields),
and somewhere around 30 records in, DataImportHandler runs out of
memory (heap) on a TIKA import (triggered from a custom Processor) and
does a roll-back. I am using store=false and
Thanks Jack,
On 08/20/2012 06:41 PM, Jack Krupansky wrote:
How are you ingesting the office documents? SolrCell, or some other
method?
I am using pytika, a python module that uses Tika to extract the content.
I then add it using a python tool called sunburnt.
Do you have CopyFields?
Yes I ha
Did you explicitly run the IndexUpgrader before adding new documents?
In theory, you don't have to do that, but... who knows for sure.
While you wait for one of the hard-core Lucene guys to respond, you could
try IndexUpgrader, if you haven't already.
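Roughly along these lines (the jar name and index path are only examples):

java -cp lucene-core-4.0.0-BETA.jar org.apache.lucene.index.IndexUpgrader \
     -verbose /path/to/solr/data/index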
OTOH, if you are in fact reindexing (rath
Solr doesn't actually "know" any natural language, so it has no way of
assessing whether two token streams "have the same meaning." In your case,
the surface forms/syntax are subtly different - two separate terms vs. a
single source term with embedded punctuation.
It appears that you are probably
Hello, here is my index and index analyzer configuration:
replacement=" "/>
Searching for "d Osona" and "d’Osona" produces the same "d" and "osona"
tokens, but the ParsedQuery is different:
#1 "d Osona"
+((
DisjunctionMaxQuery((search_definitions:d | search_title:d))
DisjunctionMaxQuery((search_definition
Hello,
I am trying to make our search application Solr 4.0 (Beta) ready and
elaborate on the tasks necessary to accomplish this.
When I try to reindex our documents I get the following exception:
auto commit error...:java.lang.UnsupportedOperationException: this codec
can only be used for reading
Read through the update processor stuff. Maybe that might suggest a good
place to put processing that should occur after all input has been analyzed.
http://wiki.apache.org/solr/UpdateRequestProcessor
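A bare-bones sketch of such a processor (class and field names are
hypothetical; a matching factory would create it). processAdd runs once per
added document, before the field analyzers see anything:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class CustomLogicProcessor extends UpdateRequestProcessor {
  public CustomLogicProcessor(UpdateRequestProcessor next) {
    super(next);
  }

  @Override
  public void processAdd(AddUpdateCommand cmd) throws IOException {
    SolrInputDocument doc = cmd.getSolrInputDocument();
    // per-document logic; field values are still the raw input here
    doc.setField("processed_b", true);
    super.processAdd(cmd); // hand the doc to the rest of the chain
  }
}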
-- Jack Krupansky
-----Original Message-----
From: ksu wildcats
Sent: Tuesday, August 21, 2
Hi James,
Thanks for the suggestions.
Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned, as there are other columns as well.
I actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same data,
and the indexing was successful. But I wanted
Hello,
I am doing an import of large records (with large full-text fields),
and somewhere around 30 records in, DataImportHandler runs out of
memory (heap) on a TIKA import (triggered from a custom Processor) and
does a roll-back. I am using store=false and trying some tricks and
tracking possible memo
Thanks again, Erick.
I have read some of Yonik's posts also.
I think 1M is closer to my number (I'm more interested in using Solr to improve
the quality of search over a limited doc set with lots of metadata than over
sheer quantity).
I'll make sure to stress test.
Cheers,/Steven
> Date: Tue, 21 Aug 2012 06
Hmmm, how many employees/services/dates are we talking about
here? Is the cross product 1M? 1B? 1G records?
You could try the Solr join stuff (Solr 4x), but be aware that it performs
best on join fields with a limited number of unique values.
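The syntax is along these lines (field names invented for illustration):

q={!join from=employee_id to=id}interval:[20120801 TO 20120831]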
Best
Erick
On Tue, Aug 21, 2012 at 4:05 AM, Stefan Burkar
Steven:
Nope, I don't have any benchmarks off the top of my head.
You could probably compare this pretty quickly by using one of the
benchmarking tools (http://wiki.apache.org/solr/BenchmarkingSolr)
jMeter works as well, using two different schemas and
configuring, say, an edismax request handler
On 21 August 2012 15:15, prasad deshpande wrote:
>
Use a lunar calendar to synchronise them with the phase
of the moon.
Regards,
Gora
P.S. You might wish to clarify what you mean: Synchronise
Solr import processes with what?
Hi Jack
Thanks for your answer. Do I understand correctly that I must
create a "merge-entity" that contains all the different
validFrom/validUntil dates as fields (and of course the other
search-related fields).
This would mean that the number of index entries is equal to the
number of all p
Many Thanks Erick.
Are you aware of any real world metrics or best practice/pattern samples that
use a lot of fields?
I'm looking to get an idea of the pros/cons as I scale.
From what you're saying it definitely looks like I'll try keeping a flat
structure (which means perhaps 300 fields), but given some
Hi Shalin,
Thanks very much for your detailed explanation!
Regards,
Yandong
2012/8/21 Shalin Shekhar Mangar
> On Tue, Aug 21, 2012 at 8:47 AM, Yandong Yao wrote:
>
> > Hi guys,
> >
> > From http://wiki.apache.org/solr/MergingSolrIndexes, it said 'Using
> > "srcCore", care is taken to ensure
On Tue, Aug 21, 2012 at 8:47 AM, Yandong Yao wrote:
> Hi guys,
>
> From http://wiki.apache.org/solr/MergingSolrIndexes, it said 'Using
> "srcCore", care is taken to ensure that the merged index is not corrupted
> even if writes are happening in parallel on the source index'.
>
> What does it mea
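For reference, the srcCore variant is a CoreAdmin call along these lines
(core names are placeholders):

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=target&srcCore=source1&srcCore=source2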