Volatile spellcheck index

2014-02-05 Thread Alejandro Marqués Rodríguez
Hi, I'm having a problem with the spell check index building. I've configured the spell checker component to have the index built on optimize. <!-- Spell Check http://wiki.apache.org/solr/SpellCheckComponent --> <searchComponent name="spellcheck"

Re: need help in understating solr cloud stats data

2014-02-05 Thread Ramkumar R. Aiyengar
We have had success with starting up Jolokia in the same servlet container as Solr, and then using its REST/Bulk API to JMX from the application of choice. On 4 Feb 2014 17:16, Walter Underwood wun...@wunderwood.org wrote: I agree that sorting and filtering stats in Solr is not a good idea.

Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
Hi everybody, I'm a newbie and I'm working on search performance in a project without any type of documentation. I think searching is very slow because of the presence of all the Tika metadata; what do you think about it? I'm trying to disable searching in all of these technical fields to test

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Alexandre Rafalovitch
Did you reindex? Also, how are you submitting data? Are you using ExtractingRequestHandler (defined in your solrconfig.xml)? If so, there is already a mechanism for that. Just search for ignored in the documentation: http://wiki.apache.org/solr/ExtractingRequestHandler . Regards, Alex.

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Jack Krupansky
Run some test queries with the debug=true parameter and check the timing section of the response to see what search components are consuming the time. Highlighting of large documents can be very slow, for example. Or, if you return the full text of a document, the raw size can slow the response

Re: Import data from mysql to sold

2014-02-05 Thread rachun
Hi gurus, after I got Solr DIH working with MySQL, I now need to figure out how to make it work with MongoDB. I have been searching all day without any luck. Could anyone please suggest a site with a solution? Thank you very much, Chun. -- View this message in

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
I'm submitting data via Liferay, which uses Apache Lucene/Solr for its search feature. Nothing is done directly on Solr. solrconfig.xml is currently set up this way: <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler"

Re: Import data from mysql to sold

2014-02-05 Thread Jack Krupansky
I think this is the first time I have seen a request for import from MongoDB on this list. Do a Google search for solr and mongodb and you will find a bunch of links right away. Are you not seeing these? There is something called Mongo Connector, but it uses the push model, as opposed to the

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
Hi Jack, please can you give me some other details? Are you referring to a tool in particular? Mauro On Wed, Feb 5, 2014 at 12:20 PM, Jack Krupansky j...@basetechnology.com wrote: Run some test queries with the debug=true parameter and check the timing section of the response to see what

Re: Solr Searching Issue

2014-02-05 Thread Toke Eskildsen
On Wed, 2014-02-05 at 08:17 +0100, Sathya wrote: I am running single instance solr and the JVM heap space is minimum 6.3gb and maximum 24.31gb. Nothing is running to complete the 24gb except tomcat server. I have only 2 copyField entries only. Your Xmx is the same size as your RAM. It should

geofilt customization

2014-02-05 Thread Sohan Kalsariya
I am using geofilt() to filter my results according to my location. Now I don't want to filter the results by the last parameter, i.e. I don't need the last (distance) parameter. That means I want results from all over the world. How should I get it? Regards, *Sohan Kalsariya*

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Jack Krupansky
Simply post to this mail list the timing section of the query response for a test query that you feel is too slow, but be sure to add the debug=true parameter (or debug=timing.) -- Jack Krupansky -Original Message- From: Mauro Gregorio Binetti Sent: Wednesday, February 5, 2014 6:44
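
A minimal sketch of reading the timing section once the response has been fetched with debug=true (or debug=timing) and wt=json. The response layout assumed here (a "debug" object holding "timing", with per-component entries under "prepare" and "process") and the sample numbers are illustrative assumptions, not output from the poster's system.

```python
def component_timings(response, phase="process"):
    """Return (component, milliseconds) pairs for one phase, slowest first.

    Assumes the parsed JSON response has the shape
    response["debug"]["timing"][phase][component]["time"].
    """
    timing = response.get("debug", {}).get("timing", {})
    phase_block = timing.get(phase, {})
    pairs = [
        (name, info["time"])
        for name, info in phase_block.items()
        # The phase block also carries its own total under "time"; skip it.
        if isinstance(info, dict) and "time" in info
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response fragment, for illustration only.
sample_response = {
    "debug": {
        "timing": {
            "time": 1000.0,
            "prepare": {"time": 5.0, "query": {"time": 5.0}},
            "process": {
                "time": 995.0,
                "query": {"time": 40.0},
                "facet": {"time": 5.0},
                "highlight": {"time": 950.0},
            },
        }
    }
}

for name, millis in component_timings(sample_response):
    print(f"{name}: {millis} ms")
```

In this made-up sample the highlight component dominates, which is the kind of pattern the timing section is meant to expose.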

Join Scoring

2014-02-05 Thread anand chandak
Hi, why doesn't the Solr join query return the score in the response, although I see JoinScorer in the JoinQParserPlugin class? Also, to evaluate the join performance, I filed a join query against Solr's join - JoinQParserPlugin and against lucene

Re: Max Limit to Schema Fields - Solr 4.X

2014-02-05 Thread Mike L.
Thanks Shawn. This is good to know. Sent from my iPhone On Feb 5, 2014, at 12:53 AM, Shawn Heisey s...@elyograg.org wrote: On 2/4/2014 8:00 PM, Mike L. wrote: I'm just wondering here if there is any defined limit to how many fields can be created within a schema? I'm sure the

Re: SolrCloud query results order master vs replica

2014-02-05 Thread M. Flatterie
Good morning, so based on your answer, there is no guarantee that the results will be the same from one replica to the other. I ran the queries in debug mode and I see... MASTER 321240: \n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20206) [DefaultSimilarity], result of:\n 1.7046129 =

RE: Volatile spellcheck index

2014-02-05 Thread Dyer, James
Alejandro, Assuming you're using Solr 3.x, under: <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <lst name="spellchecker"> ... </lst> </searchComponent> ...you can add: <str name="spellcheckIndexDir">./spellchecker</str> ...then the spell check index will be created on-disk and not in

expungeDeletes vs optimize

2014-02-05 Thread Bryan Bende
Does calling commit with expungeDeletes=true result in a full rewrite of the index like an optimize does? or does it only merge away the documents that were deleted by commit? Every two weeks or so we run a process to rebuild our index from the original documents resulting in a large amount of

Re: Volatile spellcheck index

2014-02-05 Thread Alejandro Marqués Rodríguez
Thanks for the answer James. My fault not specifying the Solr version, we are working with solr 4.5. Anyway, thank you very much for pointing the change to DirectSolrSpellChecker. I hadn't even realized that change, and I think I wasn't using it, as the line str

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Jack Krupansky
I’m not interested in the log (although maybe somebody else can spot something there) – it’s the query response that is returned on your query HTTP request (XML or JSON.) The specific parameter to add to your HTTP query request is “debug=true”. -- Jack Krupansky From: Mauro Gregorio Binetti

How to merge collections split across multiple shards?

2014-02-05 Thread izomorfizm
--- Brief overview of the setup: --- 5 x SolrCloud (Solr 4.6.1) node instances (separate machines) The setup is intended to store last 48 hours webapp logs (which are pretty intense... ~ 3MB/sec) logs

Optimize and replication: some questions battery.

2014-02-05 Thread Luis Cappa Banda
Hello! I've got a scenario where I index very frequently on master servers and replicate to slave servers with one minute polling. Master indexes are growing fast and I would like to optimize indexes to improve search queries. However... 1. During an optimize operation, can master servers index

Re: weird exception on update

2014-02-05 Thread Dmitry Kan
Hi Hoss, Thanks for replying. I have created a jira: https://issues.apache.org/jira/browse/SOLR-5697 It contains the required configs (actually a shard) and a query parser maven project. These illustrate the issue. I had to omit the solr.war from the webapps of the shard as it exceeded the upload

Re: high memory usage with small data set

2014-02-05 Thread Johannes Siegert
Hi Erick, thanks for your reply. What exactly do you mean by "Do your used entries in your caches increase in parallel?"? I update the indices every hour and commit the changes. So a new searcher with empty or autowarmed caches should be created and the old one should be removed.

Re: SolrCloud query results order master vs replica

2014-02-05 Thread Chris Hostetter
: Just to make sure I interpret the results correctly: : - they all have a score of 1.7046129 : - the order they are presented in is therefore not related to the score, : it is just the order in which the data is internally stored (like an SQL : SELECT statement without ORDER BY clause) The

Solr4 performance

2014-02-05 Thread Joshi, Shital
Hi, We have a SolrCloud cluster (5 shards and 2 replicas) on 10 dynamic compute boxes (cloud). We're using local disk (/local/data) to store Solr index files. All hosts have 60GB RAM and the Solr4 JVMs are running with max 30GB heap size. So far we have 470 million documents. We are using custom

Re: Optimize and replication: some questions battery.

2014-02-05 Thread Chris Hostetter
: I've got a scenario where I index very frequently on master servers and : replicate to slave servers with one minute polling. Master indexes are : growing fast and I would like to optimize indexes to improve search : queries. However... For a scenario where your index is changing that

Problem querying large StrField?

2014-02-05 Thread Luis Lebolo
Hi All, It seems that I can't query on a StrField with a large value (say 70k characters). I have a Solr document with a string type: <fieldType name="string" class="solr.StrField" sortMissingLast="true"/> and field: <dynamicField name="someFieldName_*" type="string" indexed="true" stored="true" /> Note

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
Ok... I get it. I understand what you mean. But I am having trouble encoding the query logged by the application so that I can submit it to Solr Admin:

Re: Problem querying large StrField?

2014-02-05 Thread Luis Lebolo
Update: It seems I get the bad behavior (no documents returned) when the length of a value in the StrField is greater than or equal to 32,767 (2^15). Is this some type of bit overflow somewhere? On Wed, Feb 5, 2014 at 12:32 PM, Luis Lebolo luis.leb...@gmail.com wrote: Hi All, It seems that I

UTF-8 encoding problems while replicating an index using SolrCloud

2014-02-05 Thread Ugo Matrangolo
Hi, we are having problems with an installation of SolrCloud where a leader node kicks off an indexing and tries to replicate all the updates using the UpdateHandler. What we get instead is an error around a wrong UTF-8 encoding from the leader trying to call the /update endpoint on the replica:

Re: Problem querying large StrField?

2014-02-05 Thread Yonik Seeley
On Wed, Feb 5, 2014 at 1:04 PM, Luis Lebolo luis.leb...@gmail.com wrote: Update: It seems I get the bad behavior (no documents returned) when the length of a value in the StrField is greater than or equal to 32,767 (2^15). Is this some type of bit overflow somewhere? I believe that's the

Re: Problem querying large StrField?

2014-02-05 Thread Chris Hostetter
: Update: It seems I get the bad behavior (no documents returned) when the : length of a value in the StrField is greater than or equal to 32,767 : (2^15). Is this some type of bit overflow somewhere? IIRC there is a limit in the lower level lucene code to how many bytes a single term can be --
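
A rough pre-indexing check for this situation, as a sketch only. It assumes the limit in question is Lucene's per-term maximum of 32,766 bytes of UTF-8 (consistent with the failures reported at 32,767 and above); the constant and function names are made up for illustration. Note that the limit is counted in bytes, so non-ASCII values hit it at fewer characters.

```python
# Assumption: the per-term limit is 32,766 bytes of UTF-8.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Number of bytes the value occupies once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def fits_in_single_term(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if the whole value could be indexed as one term under the limit."""
    return utf8_length(value) <= limit


print(fits_in_single_term("a" * 32766))  # True: exactly at the limit
print(fits_in_single_term("a" * 32767))  # False: one byte over
print(fits_in_single_term("é" * 16384))  # False: 2 bytes per character
```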

Re: UTF-8 encoding problems while replicating an index using SolrCloud

2014-02-05 Thread David Santamauro
I had that same error. I cleared it up by commenting out all the /update/xxx handlers and changing /update class to solr.UpdateRequestHandler Hope that helps David On 02/05/2014 01:37 PM, Ugo Matrangolo wrote: Hi, we are having problems with an installation of SolrCloud where a leader

Re: Java 7u51 and Guava... Does it affect Solr?

2014-02-05 Thread Mark Miller
Based on our current use of it and the nature of the issue, I don’t think we have anything to worry about. - Mark http://about.me/markrmiller On Jan 27, 2014, 9:52:05 PM, Shawn Heisey s...@elyograg.org wrote: The Internet is buzzing about the change in Java 7u51 that breaks Google Guava.

Re: SolrCloud fails to create new collections

2014-02-05 Thread Ray Cheng
Some more information that may help developers find out the cause. INFO 2014-02-04 14:47:08,931 DistributedQueue.java (line 211) Watcher fired on path: /overseer/collection-queue-work state: SyncConnected type NodeChildrenChanged ... (there were 393 of these Watcher fired on path lines in

Re: SolrCloud query results order master vs replica

2014-02-05 Thread M. Flatterie
Thank you Sir for that confirmation! Nic On Wed, 2/5/14, Chris Hostetter hossman_luc...@fucit.org wrote: Subject: Re: SolrCloud query results order master vs replica To: solr-user@lucene.apache.org Received: Wednesday, February 5, 2014, 11:33 AM

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Jack Krupansky
(Gulp!) You could also set the debug parameter (temporarily) in the defaults section of your query request handler. But you still need to dump the text of the query response. -- Jack Krupansky -Original Message- From: Mauro Gregorio Binetti Sent: Wednesday, February 5, 2014 12:47

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mohit Sinha
Hi, if you wish to execute the query in Solr admin you can do so by appending it to the select handler: hostname:port/<solr instance name>/<core name>/select?query, e.g. for localhost at 8080 with Solr instance name solr-4.4 and core name test the select query would be
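
Since the difficulty in this thread is encoding the logged query, here is a small sketch that builds such a select URL with the query URL-encoded. The host, port, instance name and core name are the example values from the message above; the function name, the sample query and the wt=json parameter are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_select_url(query, host="localhost", port=8080,
                     instance="solr-4.4", core="test"):
    """Build a /select URL, percent-encoding the query string.

    The default host, port, instance and core are example values only.
    """
    params = urlencode({"q": query, "wt": "json"})
    return f"http://{host}:{port}/{instance}/{core}/select?{params}"


# Characters such as ':', '"' and spaces are encoded automatically.
print(build_select_url('title:"hello world"'))
```

Letting the library do the encoding avoids hand-escaping characters such as quotes, colons and spaces when copying a query out of an application log.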

RE: expungeDeletes vs optimize

2014-02-05 Thread Petersen, Robert
Hi Bryan, From what I've seen it will only get rid of the deletes in the segments that the commit merged and there will be some residual deleted docs still in the index. It doesn't do the full rewrite. Even if you play with merge factors etc, you'll still have lint. In your situation I'd

Default core for updates in multicore setup

2014-02-05 Thread Tom Burton-West
Hello, I'm running the example setup for Solr 4.6.1. In the ../example/solr/ directory, I set up a second core. I wanted to send updates to that core. I looked at .../exampledocs/post.sh and expected to see the URL as: URL= http://localhost:8983/solr/collection1/update However it does

4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-05 Thread Tim Vaillancourt
Hey guys, I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2 shards over 4 Solr instances, (which results in 1 core per Solr instance). After some time in Production without issues, we are seeing errors related to the IndexWriter all over our logs and an infinite loop of

[REMINDER] ApacheCon NA 2014 Travel Assistance Applications Due Feb 7

2014-02-05 Thread Chris Hostetter
(NOTE: cross posted, if you feel the need to reply, please keep it on general@lucene) As a reminder, Travel Assistance Applications for ApacheCon NA 2014 are due on Feb 7th (about 48 hours from now) Details are below, please note that if you have any questions about this program or the

Re: Default core for updates in multicore setup

2014-02-05 Thread Chris Hostetter
: I then tried to locate some config somewhere that would specify that the : default core would be collection1, but could not find it. in the older style solr.xml, you can specify a defaultCoreName. Moving forward, relying on the default core name is discouraged (and will hopefully be removed

Partial Word Search

2014-02-05 Thread Teague James
I cannot get Solr 4.6.0 to do partial word search on a particular field that is used for faceting. Most of the information I have found suggests modifying the fieldType text to include either the NGramFilterFactory or EdgeNGramFilterFactory in the filter. However since I am copying many other

Re: SOLR suggester with highlighting

2014-02-05 Thread Areek Zillur
This can be achieved using payloads in the suggester dictionary. The suggester based on spellcheck component does not support payloads in dictionary. You can use the new suggester component ( https://issues.apache.org/jira/browse/SOLR-5378), which allows you to highlight and return payloads. The

Re: Default core for updates in multicore setup

2014-02-05 Thread Tom Burton-West
Thanks Hoss, hardcoded default of collection1 is still used for backcompat when there is no defaultCoreName configured by the user. Aha, it's hardcoded if there is nothing set in a config. No wonder I couldn't find it by grepping around the config files. I'm still trying to sort out the old

Re: Partial Word Search

2014-02-05 Thread Jack Krupansky
1. The ngramming occurs in the index, but does not modify the original, stored value that a query will return. So, Example will be returned even though the index will have all the sub-terms indexed (but not stored.) 2. You need the ngram filters to be asymmetric with regard to indexing and
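
To illustrate the point about asymmetric analysis, the following toy sketch mimics edge n-gramming at index time only, with the query term merely lowercased. It is not Solr's analyzer; the function name and gram sizes are assumptions chosen for the example.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Return the leading-edge n-grams of a lowercased term."""
    term = term.lower()
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Index side: the stored value stays "Example", but these sub-terms are indexed.
indexed_terms = set(edge_ngrams("Example"))
print(sorted(indexed_terms, key=len))

# Query side: no n-gramming, only lowercasing, so a prefix matches directly.
print("exam".lower() in indexed_terms)  # True
print("xam".lower() in indexed_terms)   # False: edge n-grams cover prefixes only
```

If the query were n-grammed as well, a query for "exam" would also produce "ex" and "exa" and match many unrelated terms, which is presumably why the two analysis chains need to differ.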

Re: Default core for updates in multicore setup

2014-02-05 Thread Jack Krupansky
Tom, I did make an effort to sort out both the old and newer solr.xml features in my Solr 4.x Deep Dive e-book. -- Jack Krupansky -Original Message- From: Tom Burton-West Sent: Wednesday, February 5, 2014 5:56 PM To: solr-user@lucene.apache.org Subject: Re: Default core for updates

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
Ahahhaha gulp is really funny :) Back to us... Do you mean modifying solrconfig.xml? Mauro Il giorno 05/feb/2014 20:45, Jack Krupansky j...@basetechnology.com ha scritto: (Gulp!) You could also set the debug parameter (temporarily) in the defaults section of your query request handler. But

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Mauro Gregorio Binetti
Yes Mohit... What you said is what I tried but I have found really big problems in appending query with a correct syntax starting from the tracing I have posted in the last mail... Any suggestion about encoding? Il giorno 05/feb/2014 20:55, Mohit Sinha msinha1...@gmail.com ha scritto: Hi, if

Re: Disable searching on ddm tika metadata

2014-02-05 Thread Jack Krupansky
Yes. Look at the example solrconfig.xml for a section labeled defaults for the /select request handler. You should see df as one parameter. Just copy that and change df to debug and change the field name to true. -- Jack Krupansky -Original Message- From: Mauro Gregorio Binetti

RE: SolrCloud multiple data center support

2014-02-05 Thread Darrell Burgan
Let's say I was primarily interested in ensuring there is a DR copy of the search index that is replicated to the remote data center, but I do not want the Solr instances in the remote data center to be part of the SolrCloud cluster, and that I am willing to accept some downtime in bringing up

Re: Import data from mysql to sold

2014-02-05 Thread rachun
Hi Jack, Thank you very much for your reply. I've read that article already but it doesn't seem to be what I am looking for. I'm not sure whether what I want is possible or not. I can now import all data from MySQL to Solr, but my boss wants me to find a way to import from MongoDB into Solr.

Re: Import data from mysql to sold

2014-02-05 Thread Jack Krupansky
It appears that at this moment the best approach would be to write a Java program that reads from MongoDB and writes to Solr (Solr XML update requests.) Or, write a program that reads from MongoDB and outputs a CSV format text file and then import that directly into Solr. -- Jack Krupansky
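
A sketch of the CSV route, written in Python rather than Java and using only the standard library. The documents here are plain dictionaries standing in for whatever a MongoDB client would return; the field names, the field mapping and the function name are hypothetical.

```python
import csv
import io


def documents_to_csv(documents, field_map):
    """Serialize documents to CSV text.

    field_map maps each output (Solr) column name to the key to read
    from the source document; missing keys become empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(field_map.keys()))
    for doc in documents:
        writer.writerow([doc.get(source_key, "") for source_key in field_map.values()])
    return buffer.getvalue()


# Stand-in for documents read from MongoDB (hypothetical data).
sample_documents = [
    {"_id": 1, "title": "First"},
    {"_id": 2, "title": "Second, with comma"},
]

print(documents_to_csv(sample_documents, {"id": "_id", "title": "title"}))
```

The resulting text could then be saved to a file and imported into Solr as CSV, as suggested above.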

Optimize Index in solr 4.6

2014-02-05 Thread Sesha Sendhil Subramanian
Hi, I am running solr cloud with 10 shards. I do a batch indexing once everyday and once indexing is done I call optimize. I see that optimize happens on each shard one at a time and not in parallel. Is it possible for the optimize to happen in parallel? Each shard is on a separate box. Thanks

Re: Import data from mysql to sold

2014-02-05 Thread rachun
I agree with you. I finally have another solution for this problem: we will import all data directly from MySQL instead. Thank you for all comments, keep sharing keep learning :) _/|\_ Chun. -- View this message in context:

RE: Indexing multiple files in Lucene Solr

2014-02-05 Thread Anita Nair (UST, IND)
Hi Solr team, I am trying to look for a solution for indexing multiple text files with the same unique key, in Lucene. Is there a way to do this? Saw a posting in the mail archives (below), but I wonder if a solution was given to the users. Kindly respond , Anita Nair

Re: Indexing multiple files in Lucene Solr

2014-02-05 Thread Alexandre Rafalovitch
Sure you can. Use the overwrite flag in your update messages, as per http://wiki.apache.org/solr/UpdateXmlMessages Or have a different key (e.g. signature) nominated as your unique key. The issue is, if you just allow duplicates altogether, do you want to be able to delete a particular single
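
A minimal sketch of building an XML update message with the overwrite flag turned off, following the `<add overwrite="...">` form described on the UpdateXmlMessages wiki page. The field names, values and function name are hypothetical, and posting the message to Solr is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc, overwrite=True):
    """Return an <add> update message containing a single document."""
    add = ET.Element("add", {"overwrite": "true" if overwrite else "false"})
    doc_element = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_element, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Two files sharing the same key; overwrite="false" asks Solr to keep both.
print(build_add_message({"id": "report-1", "text": "contents of file A"}, overwrite=False))
print(build_add_message({"id": "report-1", "text": "contents of file B"}, overwrite=False))
```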

Re: Join Scoring

2014-02-05 Thread anand chandak
Resending, if somebody can please respond. Thanks, Anand On 2/5/2014 6:26 PM, anand chandak wrote: Hi, I have a question on join score: why doesn't the Solr join query return the scores? Looking at the code, I see there's JoinScorer defined in the JoinQParserPlugin class. If it's not