Re: Use a different folder for schema.xml

2012-08-21 Thread Lance Norskog
It is possible to store the entire conf/ directory somewhere. To store only the schema.xml file, try soft links or the XML include feature: conf/schema.xml includes from somewhere else. On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman wrote: > Hi. For our Solr instance, we need to put the sc

Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-21 Thread Lance Norskog
There is no copyField in the schema. You have to store the parsed text in a field which is stored! Highlighting works on stored fields. There is no "text" field in the schema. I don't know how the DIH automatically creates it. On Tue, Aug 21, 2012 at 2:10 PM, anarchos78 wrote: > Any help? Anyone

Re: How to design index for related versioned database records

2012-08-21 Thread Lance Norskog
Another option is to take the minimum time interval and record every active interval during an employee record. Make a compound key of the employee and the time range. (Look at the SignatureUpdateProcessor for how to do this.) Add one multi-valued field that contains all of the time intervals for w
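
The idea above can be sketched roughly as follows. This is only an illustration under assumptions: the minimum interval is taken to be one day, the end date is treated as exclusive, and the field names (id, employee, active_days) are hypothetical, not taken from any schema in this thread. It does not use SignatureUpdateProcessor itself; it just shows the shape of the compound key and the multi-valued interval field.

```python
from datetime import date, timedelta


def active_intervals(valid_from, valid_until, step=timedelta(days=1)):
    """List the start of every minimum interval in [valid_from, valid_until)."""
    starts = []
    current = valid_from
    while current < valid_until:
        starts.append(current.isoformat())
        current += step
    return starts


def compound_key(employee_id, valid_from, valid_until):
    """Build one key from the employee and the time range (hypothetical format)."""
    return f"{employee_id}_{valid_from.isoformat()}_{valid_until.isoformat()}"


def build_doc(employee_id, valid_from, valid_until):
    """Assemble a document dict; the field names are assumptions for illustration."""
    return {
        "id": compound_key(employee_id, valid_from, valid_until),
        "employee": employee_id,
        "active_days": active_intervals(valid_from, valid_until),
    }


doc = build_doc("emp42", date(2012, 8, 1), date(2012, 8, 4))
print(doc)
```

A query for "who was active on a given day" then becomes a plain term match on the multi-valued field, at the cost of one value per interval per record.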

Re: Does DIH commit during large import?

2012-08-21 Thread Lance Norskog
Solr has a separate feature called 'autoCommit'. This is configured in solrconfig.xml. You can set Solr to commit all documents every N milliseconds or every N documents, whichever comes first. If you want intermediate commits during a long DIH session, you have to use this or make your own script
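
For reference, a sketch of what an autoCommit block in solrconfig.xml could look like. The thresholds (10000 documents, 60000 ms) are illustrative assumptions, not recommendations from this thread. The fragment is held in a Python string only so that it can be checked for well-formedness and read back.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment; the threshold values are illustrative only.
AUTOCOMMIT_FRAGMENT = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
"""


def read_autocommit(fragment):
    """Parse the fragment and return the two autoCommit thresholds as integers."""
    root = ET.fromstring(fragment)
    auto = root.find("autoCommit")
    return {
        "maxDocs": int(auto.find("maxDocs").text),  # commit after this many added docs
        "maxTime": int(auto.find("maxTime").text),  # or after this many milliseconds
    }


settings = read_autocommit(AUTOCOMMIT_FRAGMENT)
print(settings)
```

With both limits set, whichever threshold is reached first triggers the commit, which is what gives intermediate commits during a long import.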

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-21 Thread Lance Norskog
How do you separate the documents among the shards? Can you set up the shards such that one "collapse group" is only on a single shard, so that you never have to do distributed grouping? On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee wrote: > This won't work, see my thread on Solr3.6 Field co

Re: Co-existing solr cloud installations

2012-08-21 Thread Lance Norskog
ZK has a 'chroot' feature (named after the Unix multi-tenancy feature). http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkSessions https://issues.apache.org/jira/browse/ZOOKEEPER-237 The last I heard, this feature could work for making a single ZK cluster support multiple Solr

Use a different folder for schema.xml

2012-08-21 Thread Alexander Cougarman
Hi. For our Solr instance, we need to put the schema.xml file in a different location than where it resides now. Is this possible? Thanks. Sincerely, Alex

Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-21 Thread Mikhail Khludnev
Hello, - An embedded server is usually not the best way. - Lucene indexes perfectly well from multiple threads concurrently; the single writer per directory can be called concurrently. - With SolrJ you can use ConcurrentUpdateSolrServer, or call StreamingUpdateSolrServer in multiple threads, or just

Solr - Index Concurrency - Is it possible to have multiple threads write to same index?

2012-08-21 Thread ksu wildcats
We have a webapp that has embedded Solr integrated in it. It essentially handles creating a separate index (core) per client, and it is currently set up such that there can only be one index write operation per core. Say we have 1 million documents that need to be indexed: our app reads each docume

Re: Co-existing solr cloud installations

2012-08-21 Thread Mark Miller
You can use a connect string of host:port/path to 'chroot' a path. I think currently you have to manually create the path first though. See the ZkCli tool (doc'd on SolrCloud wiki) for a simple way to do that. I keep meaning to look into auto making it if it doesn't exist, but have not gotten to i
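
A minimal sketch of the connect-string form described above, assuming hypothetical host names and chroot paths. In a ZooKeeper connect string the chroot path is appended once, after the last host:port pair. This only builds the string; as noted, the path itself may still have to be created in ZK first.

```python
def zk_connect_string(hosts, chroot=""):
    """Join host:port pairs and append an optional chroot path once, at the end."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot path must start with '/'")
    return ",".join(hosts) + chroot.rstrip("/")


# Two Solr cloud installations sharing one ZK ensemble (all names are hypothetical).
cluster_a = zk_connect_string(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr/cluster_a")
cluster_b = zk_connect_string(["zk1:2181", "zk2:2181", "zk3:2181"], "/solr/cluster_b")
print(cluster_a)
print(cluster_b)
```

Each installation then sees only the znodes below its own path.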

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-21 Thread ksu wildcats
Jack, reading through the documentation for UpdateRequestProcessor, my understanding is that it's good for handling processing of documents before analysis. Is it true that processAdd (where we can have custom logic) is invoked once per document and is invoked before any of the analyzers gets invoke

Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-21 Thread Tirthankar Chatterjee
This won't work, see my thread on Solr3.6 Field collapsing Thanks, Tirthankar -Original Message- From: Tom Burton-West Date: Tue, 21 Aug 2012 18:39:25 To: solr-user@lucene.apache.org Reply-To: "solr-user@lucene.apache.org" Cc: William Dueber; Phillip Farber Subject: Scalability of Solr R

Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?

2012-08-21 Thread Tom Burton-West
Hello all, We are thinking about using Solr Field Collapsing on a rather large scale and wonder if anyone has experience with performance when doing Field Collapsing on millions of or billions of documents (details below. ) Are there performance issues with grouping large result sets? Details: W

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-21 Thread srinalluri
Thanks Jack for your reply. I don't have many documents which have a null field value. I added ReversedWildcardFilterFactory only to test the performance improvement, but that didn't help. What other changes can I make to the fieldType? thanks Srini -- View this message in context: http://luce

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-21 Thread Chris Hostetter
: Our environment is Solr 3.6.1. I have the following fieldType. There is a : field called 'body' of this fieldType. When I make a query: q=body:*, it is : taking longer than expected. What are the changes I need to do to this : fieldType for better query performance? Some other fieldTypes in

Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-21 Thread anarchos78
Any help? Anyone?

Re: Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-21 Thread Jack Krupansky
You could try a "[* TO *]" range query. It will also match all documents which have a non-null field value. So: q=body:[* TO *] Actually, I see that you have reverse wildcard enabled. Try removing that. "*" would normally map to PrefixQuery, which is normally more efficient than a WildcardQue
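
A small sketch of sending the range query suggested above, assuming a hypothetical Solr base URL. The brackets, asterisks and spaces need URL encoding when the query is placed in a request URL, which urlencode handles.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def select_url(base_url, query):
    """Build a /select URL with the query string properly URL-encoded."""
    return f"{base_url}/select?{urlencode({'q': query})}"


# The base URL is an assumption; adjust it to the actual core.
url = select_url("http://localhost:8983/solr", "body:[* TO *]")
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["q"][0])
```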

Solr 3.6.1: query performance is slow when asterisk is in the query

2012-08-21 Thread srinalluri
Our environment is Solr 3.6.1. I have the following fieldType. There is a field called 'body' of this fieldType. When I make a query: q=body:*, it is taking longer than expected. What are the changes I need to do to this fieldType for better query performance? Some other fieldTypes in our sche

Co-existing solr cloud installations

2012-08-21 Thread Buttler, David
Hi all, I would like to use a single zookeeper cluster to manage multiple Solr cloud installations. However, the current design of how Solr uses zookeeper seems to preclude that. Have I missed a configuration option to set a zookeeper prefix for all of a Solr cloud's configuration directories?

Re: Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Perfect. I reindexed the whole index and everything worked fine. The exception was just a little bit confusing. Best Dirk Am 21.08.2012 14:39 schrieb "Jack Krupansky" : > Did you explicitly run the IndexUpgrader before adding new documents? > > In theory, you don't have to do that, but... who know

Re: SOLR 4 Alpha - distributed DIH available?

2012-08-21 Thread sausarkar
Can someone let me know how to configure DIH in a cloud environment? Should it point to one specific server or to the load balancer for distributing DIH on all the servers? One problem with the load balancer approach is that there is no good way to tell whether a DIH is already running in the clou

Re: Different queries for same meaning searches

2012-08-21 Thread Dalius Sidlauskas
Yes, the mm is 100%. Thank you for a detailed answer. Regards! Dalius Sidlauskas On 21/08/12 15:21, Jack Krupansky wrote: Solr doesn't actually "know" any natural language, so it has no way of assessing whether two token streams "have the same meaning." In your case, the surface forms/syntax a

Re: solr finds allways all documents

2012-08-21 Thread Erick Erickson
OK, one other piece of information that would help a lot (or maybe lead you to the answer). Attach &debugQuery=on to the URL and look at the debug information, particularly the parsed query down below. I'm going to guess that you're searching on something that is actually found in nearly all your d
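
A minimal sketch of attaching debugQuery=on to an existing request URL, assuming a hypothetical Solr URL and query. It replaces any debugQuery parameter already present and leaves the other parameters untouched.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url):
    """Return the same URL with debugQuery=on set exactly once."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "on"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# The URL and query below are assumptions for illustration.
debug_url = with_debug("http://localhost:8983/solr/select?q=title:report&rows=10")
print(debug_url)
```

The parsed query in the debug section of the response shows which fields and terms the search actually ran against.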

Re: Does DIH commit during large import?

2012-08-21 Thread Shawn Heisey
On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote: I am doing an import of large records (with large full-text fields) and somewhere around 30 records DataImportHandler runs out of memory (Heap) on a TIKA import (triggered from custom Processor) and does roll-back. I am using store=false and

Re: solr finds allways all documents

2012-08-21 Thread robert rottermann
Thanks Jack, On 08/20/2012 06:41 PM, Jack Krupansky wrote: How are you ingesting the office documents? SolrCell, or some other method? I am using pytika, a Python module that uses Tika to extract the content. I then add it using a Python tool called sunburnt. Do you have CopyFields? Yes I ha

Re: Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Jack Krupansky
Did you explicitly run the IndexUpgrader before adding new documents? In theory, you don't have to do that, but... who knows for sure. While you wait for one of the hard-core Lucene guys to respond, you could try IndexUpgrader, if you haven't already. OTOH, if you are in fact reindexing (rath

Re: Different queries for same meaning searches

2012-08-21 Thread Jack Krupansky
Solr doesn't actually "know" any natural language, so it has no way of assessing whether two token streams "have the same meaning." In your case, the surface forms/syntax are subtly different - two separate terms vs. a single source term with embedded punctuation. It appears that you are probb

Different queries for same meaning searches

2012-08-21 Thread Dalius Sidlauskas
Hello, here is my index and index analyzer configuration: replacement=" "/> Search for "d Osona" and "d’Osona" creates "d" and "osona" tokens. But ParsedQuery is different: #1 "d Osona" +(( DisjunctionMaxQuery((search_definitions:d | search_title:d)) DisjunctionMaxQuery((search_definition

Auto commit exception in Solr 4.0 Beta

2012-08-21 Thread Dirk Högemann
Hello, I am trying to make our search application Solr 4.0 (Beta) ready and elaborate on the tasks necessary to accomplish this. When I try to reindex our documents I get the following exception: auto commit error...:java.lang.UnsupportedOperationException: this codec can only be used for readin

Re: Solr Custom Filter Factory - How to pass parameters?

2012-08-21 Thread Jack Krupansky
Read through the update processor stuff. Maybe that might suggest a good place to put processing that should occur after all input has been analyzed. http://wiki.apache.org/solr/UpdateRequestProcessor -- Jack Krupansky -Original Message- From: ksu wildcats Sent: Tuesday, August 21, 2

RE: Dataimport Handler in solr 3.6.1

2012-08-21 Thread mechravi25
Hi James, Thanks for the suggestions. Actually it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be needing the transformers mentioned as there are other columns as well. I actually tried using the 3.5 DIH jars in 3.6.1 and indexed the same and the indexing was successful. But I wanted

Does DIH commit during large import?

2012-08-21 Thread Alexandre Rafalovitch
Hello, I am doing an import of large records (with large full-text fields) and somewhere around 30 records DataImportHandler runs out of memory (Heap) on a TIKA import (triggered from custom Processor) and does roll-back. I am using store=false and trying some tricks and tracking possible memo

RE: Many fields versus join

2012-08-21 Thread Steven Livingstone Pérez
Thanks again Erick. I have read some of Yonik's posts also. I think 1M is closer to my number (I'm more interested in using Solr to improve the quality of search over a limited doc set with lots of metadata than quantity). I'll make sure to stress test. Cheers,/Steven > Date: Tue, 21 Aug 2012 06

Re: How to design index for related versioned database records

2012-08-21 Thread Erick Erickson
Hmmm, how many employees/services/dates are we talking about here? Is the cross product 1M? 1B? 1G records? You could try the Solr join stuff (Solr 4.x), but be aware that it performs best on join fields with a limited number of unique values. Best Erick On Tue, Aug 21, 2012 at 4:05 AM, Stefan Burkar

Re: Many fields versus join

2012-08-21 Thread Erick Erickson
Steven: Nope, I don't have any benchmarks off the top of my head. You could probably compare this pretty quickly by using one of the benchmarking tools (http://wiki.apache.org/solr/BenchmarkingSolr) jMeter works as well, using two different schemas and configuring, say, an edismax request handler

Re: How to synchronize Solr import processes?

2012-08-21 Thread Gora Mohanty
On 21 August 2012 15:15, prasad deshpande wrote: > Use a lunar calendar to synchronise them with the phase of the moon. Regards, Gora P.S. You might wish to clarify what you mean: Synchronise Solr import processes with what?

Re: How to design index for related versioned database records

2012-08-21 Thread Stefan Burkard
Hi Jack Thanks for your answer. Do I understand that correctly that I must create a "merge-entity" that contains all the different validFrom/validUntil dates as fields (and of course the other search-related fields). This would mean that the number of index entries is equal to the number of all p

RE: Many fields versus join

2012-08-21 Thread Steven Livingstone Pérez
Many thanks Erick. Are you aware of any real world metrics or best practice/pattern samples that use a lot of fields? I'm looking to get an idea of the pros/cons as I scale. From what you're saying it defo looks like I'll try keeping a flat structure (which means perhaps 300 fields) but given some

Re: mergeindex: what happens if there is deletion during index merging

2012-08-21 Thread Yandong Yao
Hi Shalin, Thanks very much for your detailed explanation! Regards, Yandong 2012/8/21 Shalin Shekhar Mangar > On Tue, Aug 21, 2012 at 8:47 AM, Yandong Yao wrote: > > > Hi guys, > > > > From http://wiki.apache.org/solr/MergingSolrIndexes, it said 'Using > > "srcCore", care is taken to ensure

Re: mergeindex: what happens if there is deletion during index merging

2012-08-21 Thread Shalin Shekhar Mangar
On Tue, Aug 21, 2012 at 8:47 AM, Yandong Yao wrote: > Hi guys, > > From http://wiki.apache.org/solr/MergingSolrIndexes, it said 'Using > "srcCore", care is taken to ensure that the merged index is not corrupted > even if writes are happening in parallel on the source index'. > > What does it mea