Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase

2013-04-17 Thread adfel70
Is there any rule of thumb for the document size limit when storing documents in Solr? Otis Gospodnetic-5 wrote: Use Solr. It's pretty clear you don't yet have any problems that would make you think about alternatives. Using Solr to store and not just index will make your life simpler (and

Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing

2013-04-17 Thread rulinma
You can use multiple threads. For speed, you can also cal(general hash algorithm) solrserver to add docs. -- View this message in context: http://lucene.472066.n3.nabble.com/CloudSolrServer-vs-ConcurrentUpdateSolrServer-for-indexing-tp4055772p4056606.html Sent from the Solr - User mailing list

Re: Push/pull model between leader and replica in one shard

2013-04-17 Thread Furkan KAMACI
Hi Mark; What did you use to prepare your presentation? It's really nice. 2013/4/17 Furkan KAMACI furkankam...@gmail.com Really nice presentation. 2013/4/17 Mark Miller markrmil...@gmail.com On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote: Hi, can someone explain more

Re: Function Query performance in combination with filters

2013-04-17 Thread Rogalon
Rogalon wrote: On 16 April 2013 at 14:46, Yonik Seeley-4 [via Lucene] <ml-node+s472066n4056299h21@.nabble> wrote: On Tue, Apr 16, 2013 at 7:51 AM, Rogalon [hidden email] wrote: Hi, I am using pretty complex function queries to completely customize (not only boost) the score

RE: Document Missing from Share in Solr cloud

2013-04-17 Thread Cool Techi
Field type is string and this has happened for multiple docs over the past week. Regards, Ayush Date: Tue, 16 Apr 2013 14:06:40 -0600 Subject: Re: Document Missing from Share in Solr cloud From: thelabd...@gmail.com To: solr-user@lucene.apache.org btw ... what is the field type of your

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I managed to get this done. The facet queries now facet on a multivalued field as opposed to the dynamic field names. Unfortunately it doesn't seem to have made much difference, if any at all. Some more information that might help: The JVM memory seems to be eaten up slowly. I don't think that

Re: Document Missing from Share in Solr cloud

2013-04-17 Thread Upayavira
Well, your numdocs *is* the same. Your maxdocs isn't, which sounds right to me. maxdocs is the number of documents, including deleted ones. Given deleted docs are purged by background merges, it makes sense that each index is deciding differently when to do those merges. But the number of

Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing

2013-04-17 Thread jmozah
Sorry, I didn't understand that. Did you mean to configure CloudSolrServer with a general hash algorithm? ./zahoor On 17-Apr-2013, at 1:06 PM, rulinma ruli...@gmail.com wrote: you also can cal(general hash algorithm) solrserver to add docs.

Re: first time with new keyword, solr take to much time to give the result

2013-04-17 Thread Duncan Irvine
tl;dr: retrieving 10,000 docs is a bad idea. Look into docValues for storing security info. I suspect that you'll be better served by keeping the permissions up-to-date in Solr and invalidating the caches rather than trying to return 10,000 docs. On average, you'll be attempting to read up to

RE: Document Missing from Share in Solr cloud

2013-04-17 Thread Cool Techi
Sorry, made a copy paste mistake. The numbers are different. My cloud has two shards with each shard having 1 replica. One of the shards and replica have the same number of docs, while in the other shard there is a mismatch. Regards, Ayush From: u...@odoko.co.uk To:

Max http connections in CloudSolrServer

2013-04-17 Thread J Mohamed Zahoor
Hi, I am pumping parallel select queries using CloudSolrServer. It looks like it can handle only a certain number of max connections. My question is: how many concurrent queries can a CloudSolrServer handle? An old thread tries to answer this by asking to give our own instance of

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk] wrote: I managed to get this done. The facet queries now facet on a multivalued field as opposed to the dynamic field names. Unfortunately it doesn't seem to have made much difference, if any at all. I am sorry to hear that. documents = ~1.400.000 references

Re: Document Missing from Share in Solr cloud

2013-04-17 Thread Annette Newton
I have just experienced the same thing on 4.2.1. 4 Shards - each with 2 replicas. Did some bulk loading and all but one Shard match up. Small discrepancy between the replicas, but no obvious errors either. Will be doing further loading shortly and will report findings. Regards. Netty. On 17

RE: Scaling Solr on VMWare

2013-04-17 Thread adfel70
Hi, We are currently considering running Solr Cloud on VMware. Do you have any insights regarding the issue you encountered and generally regarding using virtual machines instead of physical machines for Solr Cloud? Frank Wennerdahl wrote: Hi Otis and thanks for your response. We are indeed

Re: Scaling Solr on VMWare

2013-04-17 Thread Peter Sturge
Hi, We have run solr in VM environments extensively (3.6 not Cloud, but the issues will be similar). There are some significant things to be aware of when running Solr in a virtualized environment (these can be equally true with Hyper-V and Xen as well): If you're doing heavy indexing, the

Re: Document adds, deletes, and commits ... a question about visibility.

2013-04-17 Thread Erick Erickson
Personally I've never heard of a 500 document limit, I routinely use 1,000 doc batches (relatively small documents). Possibly your co-worker exceeded the packet size or some other outside-solr limitation? Erick On Mon, Apr 15, 2013 at 6:06 PM, Michael McCandless luc...@mikemccandless.com wrote:

Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Erick Erickson
How big are your transaction logs? They can be replayed on startup. They are truncated and a new one started when you do a hard commit (openSearcher true or false doesn't matter). So a quick test of this theory would be to just stop your indexing process, issue a hard commit on all your cores and
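
The quick test described above (stop indexing, then hard-commit every core) could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: the base URL and the core names are hypothetical, and it assumes the standard `/update` handler accepts `commit=true` and `openSearcher=false` as request parameters. The function only builds the URLs; the line that actually sends the request is left commented out.

```python
from urllib.parse import urlencode


def hard_commit_url(base_url: str, core: str, open_searcher: bool = False) -> str:
    """Build the update URL that asks one core for a hard commit."""
    params = urlencode({
        "commit": "true",
        "openSearcher": "true" if open_searcher else "false",
    })
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


# Hypothetical host and core names, for illustration only.
BASE_URL = "http://localhost:8983/solr"
CORES = ["core0", "core1"]

for core in CORES:
    url = hard_commit_url(BASE_URL, core)
    print(url)
    # To send it for real (requires a running Solr):
    # import urllib.request; urllib.request.urlopen(url).read()
```

If the theory holds, the tlog directories should shrink after the commits and the next startup should have little to replay.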

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
Sorry for not providing enough details initially. You're right, it's difficult for me to share the real code but let me try and give you an example. <dataConfig> <xi:include href="mydatasource.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/> <document>

Re: Solr using a ridiculous amount of memory

2013-04-17 Thread John Nielsen
I am surprised about the lack of UnInverted from your logs as it is logged on INFO level. Nope, no trace of it. No mention either in Logging - Level from the admin interface. It should also be available from the admin interface under collection/Plugin / Stats/CACHE/fieldValueCache. I never

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
That post lost a lot of formatting. Please find attached instead. db-data-config.xml http://lucene.472066.n3.nabble.com/file/n4056649/db-data-config.xml

Solr Example, Multi Word Search issue

2013-04-17 Thread zeroeffect
Version 4.2.0, collection1 example. I currently have indexed over 1.5 million HTML files, with more to come. Here is an issue I am running into: if I search the word "mayor" I get a great list of results. Now if I search the word "bing" I get results. Searching the words together, "mayor bing", with

Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread meghana
Hi, I need my tokenizer factory to split on everything except numbers, letters, '&', ':' and the single quote character. I use 'PatternTokenizerFactory' as below: <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9&amp;-:]" /> but it's splitting tokens by space only. Not sure what I

Re: Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread Jack Krupansky
A hyphen indicates a character range (as in a-z), so if you want to include a hyphen as a character, escape it with a single backslash. -- Jack Krupansky -Original Message- From: meghana Sent: Wednesday, April 17, 2013 7:58 AM To: solr-user@lucene.apache.org Subject: Pattern Tokenizer
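
The effect of the unescaped hyphen can be reproduced outside Solr. The sketch below uses Python's `re` module rather than Java's `java.util.regex`, on the assumption that both treat `&-:` inside a character class as a range. It also assumes the `amp;` in the quoted pattern is a stripped `&amp;` entity, and the sample sentence is made up. The range from `&` to `:` covers most ASCII punctuation (apostrophe, comma, hyphen, period, slash) plus the digits, so the negated class leaves little besides whitespace to split on.

```python
import re

# Pattern as quoted in the thread: "&-:" is read as a range from '&' to ':'.
UNESCAPED = re.compile(r"[^a-zA-Z0-9&-:]")
# Same pattern with the hyphen escaped, so '&', '-' and ':' are three literals.
ESCAPED = re.compile(r"[^a-zA-Z0-9&\-:]")


def tokenize(pattern: re.Pattern, text: str) -> list:
    """Split on every match of the pattern and drop empty tokens."""
    return [token for token in pattern.split(text) if token]


SAMPLE = "R&D dept., room 4/5 at 10:30"  # made-up input

print(tokenize(UNESCAPED, SAMPLE))  # punctuation inside the range survives
print(tokenize(ESCAPED, SAMPLE))    # '.', ',' and '/' now act as separators
```

With the unescaped pattern the sample only breaks at spaces, which matches the behaviour reported in the question.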

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread Gora Mohanty
On 17 April 2013 17:10, paulblyth blythy_...@hotmail.com wrote: That post lost a lot of formatting. Please find attached instead. db-data-config.xml http://lucene.472066.n3.nabble.com/file/n4056649/db-data-config.xml I do not see how this could be working in either case. Your select statement

Re: Pattern Tokenizer Factory not working with negation regular expression

2013-04-17 Thread meghana
Jack Krupansky-2 wrote: A hyphen indicates a character range (as in a-z), so if you want to include a hyphen as a character, escape it with a single backslash. -- Jack Krupansky -Original Message- From: meghana Sent: Wednesday, April 17, 2013 7:58 AM To: solr-user@.apache

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Yonik Seeley
On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla roman.ch...@gmail.com wrote: Is there some profound reason why the defType is not passed onto the filter query? defType is a convenience so that the main query parameter q can directly be the user query (without specifying its type like edismax).

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
John Nielsen [j...@mcb.dk]: I never seriously looked at my fieldValueCache. It never seemed to get used: http://screencast.com/t/YtKw7UQfU That was strange. As you are using a multi-valued field with the new setup, they should appear there. Can you find the facet fields in any of the other

Re: updateLog in Solr 4.2

2013-04-17 Thread vicky desai
If the updateLog tag is mandatory, then why is it given as a parameter in solrconfig.xml? I mean by default it should always be writing update logs in my data directory even if I don't use the updateLog parameter in the config file. Also the same config file works for Solr 4.0 but not Solr 4.2. I will be

Re: JavaScript transform switch statement during Data Import

2013-04-17 Thread paulblyth
Hi Gora, Please forgive the typo. This is merely a simplified example to illustrate the scenario (if/else and switch) we're trying to achieve; although the values have been changed the if/else and switch statements remain as is. The fact that the switch statement should work is the problem - it

RE: Solr using a ridiculous amount of memory

2013-04-17 Thread Toke Eskildsen
Whoops. I made some mistakes in the previous post. Toke Eskildsen [t...@statsbiblioteket.dk]: Extrapolating from 1.4M documents and 180 clients, let's say that there are 1.4M/180/5 unique terms for each sort-field and that their average length is 10. We thus have 1.4M*log2(1500*10*8) +

Doubts about solr stats component

2013-04-17 Thread kannan rbk
Hi Team, I am using Solr for indexing data. I need some statistics such as max, min and stddev from the indexed data. I read about `SolrStatsComponent` and I used this too. I read this line in `apache_solr_4_cookbook.pdf`: Please be careful when using this component on the multivalued

Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms

2013-04-17 Thread Dmitry Kan
Hi, If you are not afraid of looking into the code, you could trace and possibly fix this. Remember to commit a patch :) Another (easier?) way is to compile a repeatable test and file a Jira. Dmitry On Tue, Apr 16, 2013 at 4:12 PM, juancesarvillalba juancesarvilla...@gmail.com wrote: Hi,

Re: updateLog in Solr 4.2

2013-04-17 Thread Mark Miller
On Apr 17, 2013, at 9:17 AM, vicky desai vicky.de...@germinait.com wrote: If the updateLog tag is mandatory, then why is it given as a parameter in solrconfig.xml? Because it's not mandatory. - Mark

Re: first time with new keyword, solr take to much time to give the result

2013-04-17 Thread Montu v Boda
Hi, Thanks for your reply. We will try to index the permissions in Solr, add the filter query, and try to get an optimum number of rows (150 or 100) in the result from Solr. In future we will try with SSD as well. Thanks to all for such a great response. Thanks, Regards, Montu v Boda

Re: using maven to deploy solr on tomcat

2013-04-17 Thread jnduan
Hi Adeel, I have used Solr with Maven since 2011, and my dependency is not solr but solr-core and some other dependencies. Therefore, my project structure is just like the unpacked solr.war file without the dir 'WEB-INF/lib'. So I can write some code that works with Solr, e.g. a listener set up system

Re: Rejecting document already existing in different shard.

2013-04-17 Thread Dmitry Kan
Hi, Although we use logical sharding, there are cases in our environment as you described. We handle them manually: 0. prepare new version of a document 1. remove the old version of the document 2. post it and commit With logical sharding it is relatively easy, but we do need to store location

Re: Solr Example, Multi Word Search issue

2013-04-17 Thread Alexandre Rafalovitch
How are you searching? From WebUI Admin or from a client? If from a client, check number of rows being returned. For example SolrNet asks for 2 rows unless overruled (to force you being explicit about your paging), so you could be stuck on results serialization/deserialization. Try searching

Re: Push/pull model between leader and replica in one shard

2013-04-17 Thread Mark Miller
Thanks, the earlier presentation is done with KeyNote and the later (more animation) is done with Tumult Hype. - Mark On Apr 17, 2013, at 3:43 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; What did you use to prepare your presentation? It's really nice. 2013/4/17 Furkan

Re: updateLog in Solr 4.2

2013-04-17 Thread Jack Krupansky
updateLog is not mandatory in general for Solr, but it is mandatory for cloud mode, right? Solrconfig mentions solr cloud replica recovery, but doesn't explicitly say that's a required part of cloud mode. Maybe just a little clarification in Solrconfig would help, like solr cloud replica

Re: using maven to deploy solr on tomcat

2013-04-17 Thread Adeel Qureshi
Okay, this looks promising. I will give it a try and let you know how it goes. Thanks On Wed, Apr 17, 2013 at 9:19 AM, jnduan jnd...@gmail.com wrote: Hi Adeel, I have used Solr with Maven since 2011, and my dependency is not solr but solr-core and some other dependencies. Therefore, my project

Re: dataimporter.last_index_time SolrCloud

2013-04-17 Thread jimtronic
Is this a bug? I can create the ticket in Jira if it is, but it's not clear to me what should be happening. I noticed that it is using the value set in the home directory, but that value does not get updated, so my imports get slower and slower. I guess I could create a cron job to update

Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Umesh Prasad
Thanks Erick. A couple of questions: Our transaction logs are huge as we have disabled auto commit. The biggest one is 6.1 GB. 567M autosuggest/data/tlog; 22M avmediaCore/data/tlog; 388M booksCore/data/tlog; 4.9G books/data/tlog; 6.1G mp3-downloads/data/tlog ( 150 % of index

Re: Solr Example, Multi Word Search issue

2013-04-17 Thread Otis Gospodnetic
Hi, You probably AND them by default. Look at your mm value or the default boolean operator setting in solrconfig.xml http://search-lucene.com/?q=mm+default+boolean+operator&fc_project=Solr Otis Solr & ElasticSearch Support http://sematext.com/ On Apr 17, 2013 7:43 AM, zeroeffect

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Roman Chyla
Makes sense, thanks. One more question. Shouldn't there be a mechanism to define a default query parser? Something like (inside QParserPlugin): public static String DEFAULT_QTYPE = "default"; // now it is LuceneQParserPlugin.NAME; public static final Object[] standardPlugins = {

Re: Master slave replication with digest authentication

2013-04-17 Thread Shawn Heisey
On 4/17/2013 1:20 AM, Maciej Pestka wrote: Hi, I've configured basic authentication on tomcat my slave solr instance and it works. Any idea how to configure slave to replicate properly with digest authentication? on Solr WIKI I could find only basic authentication example:

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Upayavira
You specify it as a default parameter for a requestHandler in your solrconfig.xml, giving a default value for defType. Not sure that you can set a default that will cover filter queries too. Upayavira On Wed, Apr 17, 2013, at 05:46 PM, Roman Chyla wrote: Makes sense, thanks. One more question.

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Erik Hatcher
True, you cannot currently specify a default (other than the trick Roman showed earlier) query parser for fq parameters. I think of the bulk of my fq's in the form of fq={!term f=facet_field}value so setting a default term query parser for fq's wouldn't really help me exactly, as it needs an
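
The split discussed in this thread, where defType only selects the parser for q and each fq names its own parser through local params, can be sketched as a request builder. This is an illustration only: the query text, field name and value are hypothetical, and the `{!term f=...}` form is the one quoted in the message above.

```python
from urllib.parse import urlencode, parse_qs


def select_params(user_query: str, facet_field: str, value: str,
                  def_type: str = "edismax") -> str:
    """Encode a select request: defType governs q, the fq carries its own parser."""
    params = [
        ("q", user_query),          # parsed by the parser named in defType
        ("defType", def_type),
        ("fq", "{!term f=%s}%s" % (facet_field, value)),  # parser chosen per fq
    ]
    return urlencode(params)


# Hypothetical values, for illustration only.
query_string = select_params("solr caching", "category", "books")
print(query_string)
print(parse_qs(query_string)["fq"])
```

Because every fq states its parser explicitly, a request-handler default for defType leaves the filter queries unaffected.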

Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Shawn Heisey
On 4/17/2013 10:29 AM, Umesh Prasad wrote: We use DIH and have turned off the Auto commit because we have to sometimes build index from Scratch (clean=true) and we not want to Our master server sees a lot of restarts, sometimes 2-3 times a day. It polls other Data Sources for updates which are

Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Mark Miller
On Apr 17, 2013, at 1:42 PM, Shawn Heisey s...@elyograg.org wrote: On 4/17/2013 10:29 AM, Umesh Prasad wrote: We use DIH and have turned off the Auto commit because we have to sometimes build index from Scratch (clean=true) and we not want to Our master server sees a lot of restarts,

facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
I am doing faceting on an index of 120M documents, on the url field, using the following two queries. Note that the only difference between the two queries is that the first one uses the default facet.method, and the second one uses facet.method=enum. ( each document in the index contains a review we

Re: Doubts about solr stats component

2013-04-17 Thread Gopal Patwa
Please post the field definition from your Solr schema.xml for stats.field=login_attempts. http://localhost:8080/solr/daycore/select?q=*:*&stats=true&stats.field=login_attempts&rows=0 , it depends on how you have defined the stats field
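
For reference, the stats request quoted above can be assembled programmatically so that the parameter separators are not lost. This is a small sketch: the host, core name and field name are the ones from the message, and `stats_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def stats_url(core_url: str, field: str) -> str:
    """Build a select URL that returns only StatsComponent output for one field."""
    params = urlencode(
        {"q": "*:*", "stats": "true", "stats.field": field, "rows": 0},
        safe="*:",  # keep the match-all query readable instead of percent-encoding it
    )
    return f"{core_url.rstrip('/')}/select?{params}"


print(stats_url("http://localhost:8080/solr/daycore", "login_attempts"))
```

Setting `rows=0` keeps the response limited to the statistics block, since no documents are returned.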

Re: Max http connections in CloudSolrServer

2013-04-17 Thread Shawn Heisey
On 4/17/2013 3:46 AM, J Mohamed Zahoor wrote: Hi, I am pumping parallel select queries using CloudSolrServer. It looks like it can handle only a certain number of max connections. My question is: how many concurrent queries can a CloudSolrServer handle? Looking into the code for 4.x

Re: Spellchecker not working for Solr 4.1

2013-04-17 Thread davers
When I set distrib=false the spellchecker works perfectly. So I take it spellchecker doesn't work in solr 4.1 in cloud mode. Does anybody know if it works in 4.2.1? -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056768.html

Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-17 Thread Shawn Heisey
On 4/17/2013 11:56 AM, Mark Miller wrote: There is one additional caveat - when you disable the updateLog, you have to switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory. The NRT directory implementation will cache a portion of a commit (including hard commits) into RAM

RE: Spellchecker not working for Solr 4.1

2013-04-17 Thread Dyer, James
Spellcheck is broken when using both distributed and grouping. The fix is here: https://issues.apache.org/jira/browse/SOLR-3758 . This will be part of 4.3, which likely will be released within the next few weeks. In the mean time you can apply the patch to 4.2 or as a workaround, re-issue a

RE: Spellchecker not working for Solr 4.1

2013-04-17 Thread davers
Thank you for the response -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecker-not-working-for-Solr-4-1-tp4055450p4056776.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: facet.method enum vs fc

2013-04-17 Thread Timothy Potter
What are your results when using facet.method=fcs? On Wed, Apr 17, 2013 at 12:06 PM, Mingfeng Yang mfy...@wisewindow.com wrote: I am doing faceting on an index of 120M documents, on the url field, using the following two queries. Note that the only difference between the two queries is that

Re: Using multiple text files for Suggestor dictionarys

2013-04-17 Thread Chris Hostetter
: Is it possible to use multiple text files? I tried the following: ... : But the second list, the cities, are apparently undetected, after : restarting the tomcat and rebuilding the dictionary. Can this be done? If : not, how would you recommend managing different dictionaries? Skimming

Re: facet.method enum vs fc

2013-04-17 Thread Mingfeng Yang
Does Solr 3.6 have facet.method=fcs? I tried anyway, and got ERROR 500: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded. On Wed, Apr 17, 2013 at 12:38 PM, Timothy Potter thelabd...@gmail.com wrote: What are your results when using facet.method=fcs? On

Re: Max http connections in CloudSolrServer

2013-04-17 Thread Chris Hostetter
: Side issue: shouldn't that be setMaxConnectionsPerHost instead of including : the word Default? If there's no objection, I would plan on adding the renamed : method and using a typical deprecation procedure for the old one. I think the name comes from the effect it has on the underlying

Solr Caching

2013-04-17 Thread Furkan KAMACI
I've just started to read about Solr caching. I want to learn one thing. Let's assume that I have given 4 GB of RAM to my Solr application and I have 10 GB of RAM. When the Solr caching mechanism starts to work, does it use memory from that 4 GB part or let the operating system cache it from the 6 GB part of

Re: Solr Caching

2013-04-17 Thread Walter Underwood
On Apr 17, 2013, at 3:09 PM, Furkan KAMACI wrote: I've just started to read about Solr caching. I want to learn one thing. Let's assume that I have given 4 GB of RAM to my Solr application and I have 10 GB of RAM. When the Solr caching mechanism starts to work, does it use memory from that 4 GB part

Select Queris While Merging Indexes

2013-04-17 Thread Furkan KAMACI
I see that while merging indexes (I mean optimizing via the admin GUI), my Solr instance can still respond to select queries. How does that querying mechanism work (the merge has not finished yet, but my Solr instance can still return a consistent response)?

Re: solr 3.5 core rename issue

2013-04-17 Thread Shawn Heisey
On 4/16/2013 2:39 PM, Jie Sun wrote: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores"> <core name="default" instanceDir="./"/> <core name="413a" instanceDir="./"/> <core name="blah" instanceDir="./"/> ... </cores> </solr> the command I ran was to rename from

Re: Max http connections in CloudSolrServer

2013-04-17 Thread Shawn Heisey
On 4/17/2013 3:21 PM, Chris Hostetter wrote: I think the name comes from the effect it has on the underlying HttpClient code ... it's possible to configure a HttpConnectionManager such that it has different number of max connections per host -- ie: host1 has max connections of 23, host2 has max

Re: Select Queris While Merging Indexes

2013-04-17 Thread Jack Krupansky
merging indexes The proper terminology is merging segments. Until the new, merged segment is complete, the existing segments remain untouched and readable. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Wednesday, April 17, 2013 6:28 PM To:
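
The behaviour described above, where searches keep reading the old segments until the merged one is complete, can be pictured with a toy model. This is not Lucene's implementation: `ToyIndex` and its methods are hypothetical, and real segments are immutable files on disk rather than tuples. The sketch only shows the ordering: build the merged segment on the side, then swap it in.

```python
class ToyIndex:
    """Toy stand-in for an index made of immutable segments."""

    def __init__(self, segments):
        self._segments = tuple(tuple(segment) for segment in segments)

    def segment_count(self) -> int:
        return len(self._segments)

    def search(self, term: str) -> list:
        # A search works on whatever set of segments is current when it starts.
        snapshot = self._segments
        return [doc for segment in snapshot for doc in segment if term in doc]

    def merge(self) -> None:
        # The merged segment is built separately; old segments stay untouched.
        merged = tuple(doc for segment in self._segments for doc in segment)
        # Only once it is complete does it replace the old segments.
        self._segments = (merged,)


index = ToyIndex([["solr cloud"], ["solr cache", "lucene"]])
before = index.search("solr")
index.merge()
after = index.search("solr")
print(before == after, index.segment_count())
```

A query therefore sees either the old segments or the finished merged one, never a half-written state.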

Query Elevation Component

2013-04-17 Thread davers
I would like to use the Query Elevation Component. As I understand it only elevates based on term. I would also like it to consider the list of fq parameters. Well really just one fq parameter. ex (fq=siteid:4) since I used the same solr index for many sites. Is something like this available

Re: dataimporter.last_index_time SolrCloud

2013-04-17 Thread Chris Hostetter
: Is this a bug? I can create the ticket in Jira if it is, but it's not clear : to me what should be happening. It certainly sounds like it, but I too am not certain what is actually supposed to be happening here, or why it changed. Please open a Jira with the details of your DIH requestHandler

Re: solr 3.5 core rename issue

2013-04-17 Thread Jie Sun
Thanks Shawn for filing the issue. By the way, my solrconfig.xml has: <dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir> For now I will have to shut down Solr and write a script to modify the solr.xml manually and rename the core data directory to the new one. by the way

Re: Query Elevation Component

2013-04-17 Thread Upayavira
Perhaps you should describe the problem you are trying to solve. There may be other ways to solve it. Upayavira On Thu, Apr 18, 2013, at 01:08 AM, davers wrote: I would like to use the Query Elevation Component. As I understand it only elevates based on term. I would also like it to consider

Re: solr 3.5 core rename issue

2013-04-17 Thread Shawn Heisey
On 4/17/2013 7:07 PM, Jie Sun wrote: Thanks Shawn for filing the issue. By the way, my solrconfig.xml has: <dataDir>${MYSOLRROOT:/mysolrroot}/messages/solr/data/${solr.core.name}</dataDir> For now I will have to shut down Solr and write a script to modify the solr.xml manually and rename the