Re: solrcloud indexing completed event

2014-07-01 Thread Giovanni Bricconi
Thank you Erick. Fortunately, I can modify the data feeding process to start my post-indexing tasks. 2014-06-30 22:13 GMT+02:00 Erick Erickson erickerick...@gmail.com: The paradigm is different. In SolrCloud when a client sends an indexing request to any node in the system, when the

Re: why full-import not work well?

2014-07-01 Thread rulinma
done. There is a bug, remove something is ok. -- View this message in context: http://lucene.472066.n3.nabble.com/why-full-import-not-work-well-tp4142193p4144932.html Sent from the Solr - User mailing list archive at Nabble.com.

Disable all caches in Solr

2014-07-01 Thread vidit.asthana
I want to run some query benchmarks, so I want to disable all types of caches in Solr. I commented out filterCache, queryResultCache and documentCache in solrConfig.xml. I don't care about Result Window Size because numdocs is 10 in all the cases. Are there any other hidden caches which I should

Re: Disable all caches in Solr

2014-07-01 Thread Alexandre Rafalovitch
Have you also disabled the queries used to initialize searchers after commit? Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, Jul 1, 2014 at 3:53 PM, vidit.asthana vidit.astha...@gmail.com wrote: I want to

Re: Disable all caches in Solr

2014-07-01 Thread vidit.asthana
Yes, I have also commented out the newSearcher and firstSearcher queries in solrConfig.xml -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-all-caches-in-Solr-tp4144933p4144935.html Sent from the Solr - User mailing list archive at Nabble.com.
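For reference, a minimal solrconfig.xml sketch of that setup (cache and listener declarations as in the stock 4.x example config; sizes and all other elements omitted) might look like:

<query>
  <!-- filter/queryResult/document caches commented out for the benchmark run -->
  <!--
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  -->
  <!-- empty warming listeners (or comment the listeners out entirely) so a commit does not pre-populate the new searcher -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries"/>
  </listener>
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries"/>
  </listener>
</query>

Even with these removed, the OS disk cache (as Toke notes below) is still in play, so the numbers measure raw query work rather than a realistic production setup.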

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, here is my configuration which doesn't work: schema: field name=AllChamp type=text_general multiValued=true indexed=true required=false stored=false/ dynamicField name=*_en type=text_en indexed=true stored=true required=false multiValued=true/ dynamicField name=*_fr type=text_fr

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Alexandre Rafalovitch
I believe you were already answered. If you want to have text parsed/analyzed in different ways, you need to have them in separate fields with separate analyzer stacks. Then use disMax/eDisMax to search across those fields. copyField copies the original content and therefore when you search the
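Concretely, something like this schema.xml sketch (field names here are only illustrative; text_en/text_fr are the per-language types already quoted in the thread):

<!-- raw source field: stored, not searched directly -->
<field name="content" type="string" indexed="false" stored="true"/>
<!-- per-language copies, each with its own analyzer stack -->
<field name="content_en" type="text_en" indexed="true" stored="false" multiValued="true"/>
<field name="content_fr" type="text_fr" indexed="true" stored="false" multiValued="true"/>

<copyField source="content" dest="content_en"/>
<copyField source="content" dest="content_fr"/>

At query time you would then search with defType=edismax and qf=content_en content_fr rather than querying a single catch-all field.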

Restriction on type of uniqueKey field?

2014-07-01 Thread Alexandre Rafalovitch
Hello, I remember reading somewhere that the id field (uniqueKey) must be a String. But I cannot find a definitive confirmation, just that it should be non-analyzed. Can I use a single-valued TrieLongField type, with precision set to 0? Or am I going to hit issues? Regards, Alex. Personal

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, I have 300 fields which are copied into AllChamp. If I want to use separate fields, then I need to create 300 * the number of languages I have, which is not practical for me. Is there any other solution? Best regards Anass BENJELLOUN 2014-07-01 11:28 GMT+02:00 Alexandre Rafalovitch [via Lucene]

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Alexandre Rafalovitch
But aren't you already creating those 300 fields anyway: dynamicField name=*_fr type=text_fr indexed=true stored=true required=false multiValued=true/ If you mean you have issues specifying them in eDisMax, I believe the 'qf' parameter allows you to specify a wildcard. Alternatively, you can look at the

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
I have documents (ar, en, fr); I need to index them while keeping the analyzer and filters for each language. Here are all the fields in the schema, to help understand my problem: fields field name=IdDocument type=string multiValued=false indexed=true required=true stored=true/ field name=NomDocument type=string

Re: Disable all caches in Solr

2014-07-01 Thread Toke Eskildsen
On Tue, 2014-07-01 at 10:53 +0200, vidit.asthana wrote: Are there any other hidden caches which I should know about before running my tests? Clear the disk cache? - Toke Eskildsen, State and University Library, Denmark

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Shalin Shekhar Mangar
No, you definitely can have an int or long uniqueKey. A lot of Solr's tests use such a uniqueKey. See solr/core/src/test-files/solr/collection1/conf/schema.xml On Tue, Jul 1, 2014 at 3:20 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I remember reading somewhere that id field
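For example, a sketch of what that looks like in schema.xml (the long fieldType is the stock TrieLongField with precisionStep=0 from the example schema, which is what Alex asked about):

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>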

Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-01 Thread mskeerthi
I have to download my 5 million records from sqlserver to solr into one index. I am getting the below exception after downloading 1 million records. Is there any configuration or other way to download from sqlserver to solr? Below is the exception I am getting in solr:

Sharing single indexer for 2 different solr instance

2014-07-01 Thread deepakinniah
Hi, I have a Solr indexer on a network path and I want to share this indexer (without replication) with more than one Solr instance. Thanks, -- View this message in context: http://lucene.472066.n3.nabble.com/Sharing-single-indexer-for-2-different-solr-instance-tp4144954.html Sent from

Re: solr dedup on specific fields

2014-07-01 Thread Ali Nazemian
Any suggestion would be appreciated. Regards. On Mon, Jun 30, 2014 at 2:49 PM, Ali Nazemian alinazem...@gmail.com wrote: Hi, I used solr 4.8 for indexing the web pages that come from nutch. I know that solr deduplication operation works on uniquekey field. So I set that to URL field.

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
and I use dynamic fields for NomDocument, ContenuDocument, Postit. Example: ContenuDocument_fr, ContenuDocument_en, ContenuDocument_ar processor class=org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory lst name=defaults str

Re: Garbage collection issue and RELOADing cores

2014-07-01 Thread François Schiettecatte
Hi Just following up on my previous post about a memory leak when RELOADing cores, I narrowed it down to the SuggestComponent, specifically 'searchComponent name=suggest class=solr.SuggestComponent.../searchComponent' in solrconfig.xml. Comment that out and the leak goes away. The leak occurs
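In other words, commenting out the whole component definition, roughly like this (the suggester parameters shown are just placeholders, not François's actual config):

<!--
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
  </lst>
</searchComponent>
-->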

AUTO: Saravanan Chinnadurai is out of the office (returning 02/07/2014)

2014-07-01 Thread Saravanan . Chinnadurai
I will be out of the office starting 01/07/2014 and will not return until 02/07/2014 Please email itsta...@actionimages.com for any urgent queries. Note: This is an automated response to your message Strategy for removing an active shard from zookeeper sent on 7/1/2014 0:45:59. This is the

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Michael Della Bitta
Alex, maybe you're thinking of constraints put on shard keys? Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions

Language detection for solr 3.6.1

2014-07-01 Thread Poornima Jay
Hi, can anyone please let me know how to integrate http://code.google.com/p/language-detection/ into solr 3.6.1? I want these languages (English, Chinese Simplified, Chinese Traditional, Japanese, and Korean) added in one schema, i.e. multilingual search from a single schema file. I tried
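One hedged starting point is the langid contrib (shipped since Solr 3.5), which wraps that language-detection library and is wired in as an update processor chain in solrconfig.xml; the field names and chain name below are only placeholders:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,body</str>
    <str name="langid.langField">language</str>
    <!-- en/zh-cn/zh-tw/ja/ko are the profile codes used by the language-detection library -->
    <str name="langid.whitelist">en,zh-cn,zh-tw,ja,ko</str>
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

The jars under contrib/langid/lib also need to be on the classpath, e.g. via a lib directive in solrconfig.xml.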

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Michael Ryan
Thanks. This looks interesting... -Michael -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, June 30, 2014 8:15 AM To: solr-user@lucene.apache.org Subject: RE: Multiterm analysis in complexphrase query Ahmet, please correct me if I'm wrong, but the

Best way to fix Document contains at least one immense term?

2014-07-01 Thread Michael Ryan
In LUCENE-5472, Lucene was changed to throw an error if a term is too long, rather than just logging a message. I have fields with terms that are too long, but I don't care - I just want to ignore them and move on. The recommended solution in the docs is to use LengthFilterFactory, but this
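For reference, the LengthFilterFactory approach from the docs looks roughly like this (the fieldType name and the 10000-character cap are illustrative; Lucene's hard limit is 32766 bytes per term):

<fieldType name="text_keyword_capped" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- silently drops tokens outside the length range instead of throwing an error -->
    <filter class="solr.LengthFilterFactory" min="1" max="10000"/>
  </analyzer>
</fieldType>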

Re: solr dedup on specific fields

2014-07-01 Thread Alexandre Rafalovitch
Well, it's implemented in SignatureUpdateProcessorFactory. Worst case, you can clone that code and add your preserve-field functionality. Could even be a nice contribution. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ -

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Alexandre Rafalovitch
I wasn't thinking of shard keys, but may have been confused in the reading. Thank you everyone, the long key is working just fine for me. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue,

Re: Integrating solr with Hadoop

2014-07-01 Thread Erick Erickson
Should be fine. Things to watch: 1) solrconfig.xml has to have the HdfsDirectoryFactory enabled. 2) You probably want to configure ZooKeeper stand-alone; although it's possible to run embedded ZK, it's awkward since you can't really bounce Solr nodes running embedded ZK at
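Point 1 is roughly this block in solrconfig.xml (the HDFS URL and paths here are placeholders for your cluster):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
</directoryFactory>
<indexConfig>
  <lockType>hdfs</lockType>
</indexConfig>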

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Koji Sekiguchi
In addition, KeywordTokenizer can seemingly be used, but it should be avoided for the unique key field. One of my customers used it and got an OOM during long-term indexing. As it was difficult to find the problem, I'd like to share my experience. Koji --

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Erick Erickson
OK, back up a bit and consider alternative indexing schemes. For instance, do you really need all those fields? Could you get away with one field where you indexed the field _name_ + associated value? (you'd have to be very careful with your analysis chain, but...) Something like: C67_val_value1

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Erick Erickson
non-String uniqueKey fields have historically popped out in weird places. I think at one point, for instance, QueryElevationComponent barfed on non-string types. So, there may still be edge cases in which this can be a problem. IMO, they're all bugs though. Erick On Tue, Jul 1, 2014 at 7:43 AM,

Throwing Error Missing Mandatory uniquekey field id

2014-07-01 Thread mskeerthi
I defined id as a string in schema.xml and I copied the csv into the example docs folder. I used the below command to load the data: java -Dtype=application/csv -jar post.jar import.csv It's throwing the below error. Please help in this regard. ERROR - 2014-07-01 19:57:43.902;

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello Erick, unfortunately I can't modify the schema; my team and I analyzed the problem carefully, so all the fields you are seeing are required in the schema. Now I just tested using different fields; maybe it could work if I knew the edismax syntax: field name=AllChamp_ar type=text_ar multiValued=true

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread Daniel Collins
Ok, firstly, saying you need to fix your problem but you can't modify the schema doesn't really help. If the schema is set up badly, then no amount of help at search time will ever get you the results you want... Secondly, from what I can see in the schema, there is no AllChamp_fr, AllChamp_en,

Re: CopyField can't copy analyzers and Filters

2014-07-01 Thread benjelloun
Hello, for Cx_val, there are some fields which are multivalued :) For AllChamp_fr, AllChamp_en..., I just added them to the schema to test whether edismax works. 2014-07-01 17:13 GMT+02:00 Daniel Collins [via Lucene] ml-node+s472066n4145024...@n3.nabble.com: Ok, firstly to say you need to fix
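For what it's worth, the edismax side is just a space-separated qf list, e.g. as handler defaults (the handler name is illustrative; the AllChamp_* fields are the ones just added above):

<requestHandler name="/multilang" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">AllChamp_ar AllChamp_en AllChamp_fr</str>
  </lst>
</requestHandler>

The same parameters can also be passed per request: defType=edismax with qf=AllChamp_ar AllChamp_en AllChamp_fr.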

Re: Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-01 Thread Aman Tandon
You can try giving some more memory to Solr. On Jul 1, 2014 4:41 PM, mskeerthi mskeer...@gmail.com wrote: I have to download my 5 million records from sqlserver to solr into one index. I am getting below exception after downloading 1 Million records. Is there any configuration or another to

RE: Multiterm analysis in complexphrase query

2014-07-01 Thread Allison, Timothy B.
If there's enough interest, I might get back into the code and throw a standalone src (and jar) of the SpanQueryParser and the Solr wrapper onto github. That would make it more widely available until there's a chance to integrate it into Lucene/Solr. If you'd be interested in this, let me

Re: Out of Memory when I download 5 Million records from sqlserver to solr

2014-07-01 Thread IJ
We faced similar problems on our side. We found it more reliable to have a mechanism to extract all data from the database into a flat file - and then use a Java program to bulk index into Solr from the file via the SolrJ API. -- View this message in context:

Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-01 Thread IJ
Let's say I create a Solr collection with multiple shards (say 2 shards) and set the value of router.field to a field called CompanyName. Now, we all know that during indexing Solr computes a hash on the value indexed into CompanyName and routes the document to the appropriate shard. Let's say I index a

Re: Throwing Error Missing Mandatory uniquekey field id

2014-07-01 Thread Chris Hostetter
: I mentioned id as string in schema.xml and i copied the csv into example docs : folder. I used the below commaand to download the data Java : -Dtype=application/csv -jar post.jar import.csv : : it's throwing the below error.Please help in this regard. : : ERROR - 2014-07-01 19:57:43.902;
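This error usually means the documents built from the CSV carry no value for the uniqueKey field, i.e. the header row of import.csv needs a column whose name matches the uniqueKey declared in schema.xml. A sketch (the column names and values below are made up):

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<uniqueKey>id</uniqueKey>

<!-- import.csv then needs a matching header row, for example:
     id,name,price
     doc1,Widget,9.99
-->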

Re: Disable all caches in Solr

2014-07-01 Thread Chris Hostetter
: I want to run some query benchmarks, so I want to disable all type of caches Just to be clear: disabling all internal caching because you want to run a benchmark means you're probably going to wind up running a useless benchmark. Solr's internal caching is a key component of its performance

Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Can anyone explain the difference between these two queries? text:(+happy) AND -user:(123456789) = numFound 2912224 But text:(+happy) AND user:(-123456789) = numFound 0 Now, you may just say "then just put - in front of your field, duh!" Well, text:(+happy) = numFound 2912224

MLT weird behaviour in Solrcloud

2014-07-01 Thread Shamik Bandopadhyay
Hi, I'm trying to use the mlt request handler in a SolrCloud cluster. Apparently, it's showing some weird behavior: responses come back intermittently, and it returns results only some of the time for the same query. I'm using a SolrJ client which in turn communicates with the cluster using a zookeeper ensemble. Here's

Re: Confusion about location of + and - ?

2014-07-01 Thread Jack Krupansky
Yeah, there's a known bug that a negative-only query within parentheses doesn't match properly - you need to add a non-negative term, such as *:*. For example: text:(+happy) AND user:(*:* -123456789) -- Jack Krupansky -Original Message- From: Brett Hoerner Sent: Tuesday, July 1,

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Interesting, is there a performance impact to sending the *:*? On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky j...@basetechnology.com wrote: Yeah, there's a known bug that a negative-only query within parentheses doesn't match properly - you need to add a non-negative term, such as *:*. For

Re: Confusion about location of + and - ?

2014-07-01 Thread Brett Hoerner
Also, does anyone have the Solr or Lucene bug # for this? On Tue, Jul 1, 2014 at 3:06 PM, Brett Hoerner br...@bretthoerner.com wrote: Interesting, is there a performance impact to sending the *:*? On Tue, Jul 1, 2014 at 2:53 PM, Jack Krupansky j...@basetechnology.com wrote: Yeah, there's

Re: Confusion about location of + and - ?

2014-07-01 Thread Jack Krupansky
No, that's what Solr would do if the bug were fixed. Matching all documents (*:*) is a constant score query, so it takes no significant amount of resources. Personally, I consider this a bug in Lucene, but try convincing them of that! The issue was filed as: SOLR-3744 - Solr LuceneQParser

Continue indexing doc after error

2014-07-01 Thread tedsolr
I need to index documents from a csv file that will have 1000s of rows and 100+ columns. To help the user loading the file I must return useful errors when indexing fails (schema violations). I'm using SolrJ to read the files line by line, build the document, and index/commit. This approach allows

Re: Continue indexing doc after error

2014-07-01 Thread Tomás Fernández Löbbe
I think what you want is what’s described in https://issues.apache.org/jira/browse/SOLR-445 This has not been committed because it still doesn’t work with SolrCloud. Hoss gave me the hint to look at DistributingUpdateProcessorFactory to solve the problem described in the last comments, but I

Re: Continue indexing doc after error

2014-07-01 Thread tedsolr
Thank you. That's a useful link. Maybe not quite what I'm looking for, as it appears to deal with bulk loads of docs - returning an error for each bad doc. My question is more about getting all the errors for a single doc. I'm probably taking a performance hit by adding docs one at a time. I haven't

Re: Restriction on type of uniqueKey field?

2014-07-01 Thread Jack Krupansky
My vague recollection is that at least at one time there was a limitation somewhere in SolrCloud, but whether that is still true, I don't know. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Tuesday, July 1, 2014 9:48 AM To: solr-user@lucene.apache.org

Re: Best way to fix Document contains at least one immense term?

2014-07-01 Thread Jack Krupansky
You could develop an update processor to skip or trim long terms as you see fit. You can even code a script in JavaScript using the stateless script update processor. Can you tell us more about the nature of your data? I mean, sometimes analyzer filters strip or fold accented characters
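A sketch of such a chain; the script name and field name are placeholders, and solr.TruncateFieldUpdateProcessorFactory is an alternative that simply caps the value length without any scripting:

<updateRequestProcessorChain name="trim-immense">
  <!-- option 1: arbitrary trimming/skipping logic in a JavaScript update script
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">trim-long-values.js</str>
  </processor>
  -->
  <!-- option 2: just truncate overly long values of the named field -->
  <processor class="solr.TruncateFieldUpdateProcessorFactory">
    <str name="fieldName">my_keyword_field</str>
    <int name="maxLength">10000</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>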

Re: Does Solr move documents between shards when the value of the shard key is updated ?

2014-07-01 Thread Erick Erickson
You would end up with duplicate docs on the two shards. Solr does its doc-id lookup within a shard, not across other shards. Routing takes place before this step, so you're going to have two docs. Best, Erick On Tue, Jul 1, 2014 at 9:42 AM, IJ jay...@gmail.com wrote: Lets say I create a Solr

RE: Best way to fix Document contains at least one immense term?

2014-07-01 Thread Michael Ryan
In this particular case, the fields are just using KeywordTokenizerFactory. I have other fields that are tokenized, but they use tokenizers with a short maxTokenLength. I'm not even all that concerned about my own data, but more curious if there's a general solution to this problem. I imagine

Understanding fieldNorm differences between 3.6.1 and 4.9 solrs

2014-07-01 Thread Aaron Daubman
In trying to determine some subtle scoring differences (causing occasionally significant ordering differences) among search results, I wrote a parser to normalize debug.explain.structured JSON output. It appears that every score that is different comes down to a difference in fieldNorm, where the

Re: MLT weird behaviour in Solrcloud

2014-07-01 Thread Pramod Negi
Why is there no comma (,) in 'textlanguage' in str name=mlt.qf title,textlanguage,caaskey /str? On Wed, Jul 2, 2014 at 12:42 AM, Shamik Bandopadhyay sham...@gmail.com wrote: Hi, I'm trying to use mlt request handler in a Solrcloud cluster. Apparently, its showing some weird behavior.

Re: How to integrate nlp in solr

2014-07-01 Thread Aman Tandon
Any help here? With Regards Aman Tandon On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Alex, I was trying to learn from these tutorials: http://www.slideshare.net/teofili/natural-language-search-in-solr https://wiki.apache.org/solr/OpenNLP: this one is

Re: How to integrate nlp in solr

2014-07-01 Thread Alexandre Rafalovitch
Not from me, no. I don't have any real examples for this ready. I suspect the path beyond the basics is VERY dependent on your data and your business requirements. I would start by thinking about how YOU (as a human) would do that match. Where do the 'blue' and 'color' and 'college' and 'bags' come

Memory Leaks in solr 4.8.1

2014-07-01 Thread Aman Tandon
Hi, when I am shutting down Solr I am getting the memory leak error in the logs. Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks SEVERE: The web application [/solr] created a ThreadLocal with key of type

Re: How to integrate nlp in solr

2014-07-01 Thread Aman Tandon
Hi Alex, thanks. One more thing I want to ask: do we need to add extra fields for those entities, e.g. item (bags), color (blue), etc.? If somehow I manage to implement this NLP then I will definitely publish it on my blog :) With Regards Aman Tandon On Wed, Jul 2, 2014 at

Re: MLT weird behaviour in Solrcloud

2014-07-01 Thread shamik
Sorry, that's a typo from when I copied the mlt definition from my solrconfig, but there's a comma in my test environment. It's not the issue. -- View this message in context: http://lucene.472066.n3.nabble.com/MLT-weird-behaviour-in-Solrcloud-tp4145066p4145145.html Sent from the Solr - User mailing