Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Zheng Lin Edwin Yeo
Hi Charlee, I've followed the setup from the Solr In Action book, and assigned port 8983 to shard1 and port 8984 to shard2. Will it cause any issues? Regards, Edwin On 19 March 2015 at 13:02, Charlee Chitsuk wrote: > The http://192.168.2.2:8984/solr/ > < > http://192.168.2.2:8984/solr/logmill/up

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Zheng Lin Edwin Yeo
Oh ya. The previous log was from shard1. This log is from shard2. INFO - 2015-03-18 15:06:51.019; org.apache.solr.update.processor.LogUpdateProcessor; [logmill] webapp=/solr path=/update params={distrib.from= http://192.168.2.2:8983/solr/logmill/&update.distrib=TOLEADER&wt=javabin&version=2} {} 0

Re: Solr Deleted Docs Issue

2015-03-18 Thread vicky desai
Hi, Thanks Erick and Shawn for the reply. Just wanted to clarify that the commit size of 10 was only an example and in production commits are handled via the auto-commit feature of Solr. The requirement we have is to store around 20-30 lakh docs, out of which around 5-6 lakh docs get updated daily. What I h

Re: not able to import Data through DIH solr 4.2.1

2015-03-18 Thread Shawn Heisey
On 3/18/2015 11:00 PM, abhishek tiwari wrote: > my solrconfig : > > <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> The way that I always recommend dealing with extra jars: In your solr home, create a "lib" directory. Copy all the extra jars that you need into this direc

Re: not able to import Data through DIH solr 4.2.1

2015-03-18 Thread abhishek tiwari
But it is still not working. On Thu, Mar 19, 2015 at 10:41 AM, Alexandre Rafalovitch wrote: > Try absolute path to the jar directory. Hard to tell whether relative > path is correct without knowing exactly how you are running it. > > Regards, > Alex. > > Solr Analyzers, Tokenizers, Filter

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Shawn Heisey
On 3/18/2015 1:22 AM, Zheng Lin Edwin Yeo wrote: > I'm having some issues with indexing rich-text documents from the Solr > Cloud. When I tried to index a pdf or word document, I get the following > error: > > > org.apache.solr.common.SolrException: Bad Request > > > > request: > http://192.1

Re: not able to import Data through DIH solr 4.2.1

2015-03-18 Thread Alexandre Rafalovitch
Try absolute path to the jar directory. Hard to tell whether relative path is correct without knowing exactly how you are running it. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 19 March 2015 at 01:00, abhishek tiwari wr

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Charlee Chitsuk
In http://192.168.2.2:8984/solr/, the port number 8984 may be an HTTPS port. The HTTP port should be 8983. Hope this helps. -- Best Regards,

Re: not able to import Data through DIH solr 4.2.1

2015-03-18 Thread abhishek tiwari
Alex, thanks for replying. My solrconfig: <lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" /> ## data-config-new.xml On Thu, Mar 19, 2015 at 10:26 AM, Alexandre Rafalovitch wrote: > > Could not load driver: com.mysql.jdbc.Driver > > Looks like a custom driver. Is the dri

Re: not able to import Data through DIH solr 4.2.1

2015-03-18 Thread Alexandre Rafalovitch
> Could not load driver: com.mysql.jdbc.Driver Looks like a custom driver. Is the driver name correct? Is the library declared in solrconfig.xml? Is the library path correct (use an absolute path if in doubt)? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: h

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Damien Kamerman
It sounds like https://issues.apache.org/jira/browse/SOLR-5551 Have you checked the solr.log for all nodes? On 19 March 2015 at 14:43, Zheng Lin Edwin Yeo wrote: > This is the logs that I got from solr.log. I can't seems to figure out > what's wrong with it. Does anyone knows? > > > > ERROR - 20

not able to import Data through DIH solr 4.2.1

2015-03-18 Thread abhishek tiwari
Please provide the basic steps to resolve this issue. I am getting the following error: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Could not load driver: com.mysql.jdbc.Driver Processing Document # 1

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Zheng Lin Edwin Yeo
These are the logs that I got from solr.log. I can't seem to figure out what's wrong with it. Does anyone know? ERROR - 2015-03-18 15:06:51.019; org.apache.solr.update.StreamingSolrClients$1; error org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Damien Kamerman
I suggest you check your solr logs for more info as to the cause. On 19 March 2015 at 12:58, Zheng Lin Edwin Yeo wrote: > Hi Erick, > > No, the PDF file is a testing file which only contains 1 sentence. > > I've managed to get it to work by removing startup="lazy" in > the ExtractingRequestHandl

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Zheng Lin Edwin Yeo
Hi Erick, No, the PDF file is a test file which contains only 1 sentence. I've managed to get it to work by removing startup="lazy" in the ExtractingRequestHandler and adding the following lines: ignored_ true links ignored_ Does the presence of startup="lazy" affect th

Re: CloudSolrServer : Could not find collection : gettingstarted

2015-03-18 Thread Erick Erickson
Does the Solr admin UI>>cloud view show the gettingstarted collection? The "graph" view might help. It _sounds_ like somehow you didn't actually create the collection. What steps did you follow to create the collection in SolrCloud? It's possible you have the wrong ZK root somehow I suppose. Best

Re: schema.xml xsd file

2015-03-18 Thread Shawn Heisey
On 3/18/2015 8:45 AM, Pedro Figueiredo wrote: > Where can I find the xsd file for the schema.xml file? As Erick said, current XSD files do not exist. There are some (now probably outdated) XSD files in a patch on this issue: https://issues.apache.org/jira/browse/SOLR-1758 Thanks, Shawn

Re: High memory usage while querying with sort using cursor

2015-03-18 Thread Vaibhav Bhandari
Thanks Chris, that makes a lot of sense. On Wed, Mar 18, 2015 at 3:16 PM, Chris Hostetter wrote: > > : A simple query on the collection: ../select?q=*:* works perfectly fine. > : > : But as soon as i add sorting, it crashes the nodes with OOM: > : .../select?q=*:*&sort=unique_id asc&rows=0. >

RE: Distributed IDF performance

2015-03-18 Thread Markus Jelsma
Anshum, Jack - don't any of you have a cluster at hand to get some real results on this? After testing the actual functionality for quite some time while the final patch was in development, we have not had the chance to work on performance tests. We are still on Solr 4.10 and have to port lots

Re: High memory usage while querying with sort using cursor

2015-03-18 Thread Chris Hostetter
: A simple query on the collection: ../select?q=*:* works perfectly fine. : : But as soon as i add sorting, it crashes the nodes with OOM: : .../select?q=*:*&sort=unique_id asc&rows=0. if you don't have docValues="true" on your unique_id field, then sorting requires it to build up a large in mem

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Erick Erickson
bq: As you said, do commits after 6 seconds No, No, No. I'm NOT saying 6 seconds! That time is in _milliseconds_ as Shawn said. So setting it to 60000 is every minute. From solrconfig.xml, conveniently located immediately above the tag: maxTime - Maximum amount of time in ms that is al

High memory usage while querying with sort using cursor

2015-03-18 Thread Vaibhav Bhandari
Hi all, My setup is as follows: *Collection* size: 32GB, 2 shards, replication factor: 2 (~16GB on each replica). Number of rows: 250 million. 4 *Solr* nodes: RAM: 30GB each. Heap size: 8GB. Version: 4.9.1 Besides the collection in question, the nodes have some other collections present. The total

CloudSolrServer : Could not find collection : gettingstarted

2015-03-18 Thread Adnan Yaqoob
I'm getting the following exception while trying to upload a document to SolrCloud using CloudSolrServer. Exception in thread "main" org.apache.solr.common.SolrException: *Could not find collection :* gettingstarted at org.apache.solr.common.cloud.ClusterState.getCollection(ClusterState.java:162)

Can we used CloudSolrServer for searching data

2015-03-18 Thread Adnan Yaqoob
I am using SolrCloud with a ZooKeeper setup, but when I try to run a query using the following code snippet I get an exception. Code: CloudSolrServer server = new CloudSolrServer("localhost:2181"); server.setDefaultCollection("gettingstarted"); server.connect(); SolrQuery query = new SolrQuery();

Re: copy field from boolean to int

2015-03-18 Thread Kevin Osborn
I already use this field elsewhere, so I don't want to change its type. I did implement an UpdateRequestProcessor to copy from a bool to an int. This works, but even better would be to fix Solr so that I can use DocValues with boolean. So, I am going to try to get that working as well. On Tue, Mar

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Chris Hostetter
: Probably a historical artifact. Yeah, probably. fixing the solr example configs would be fairly trivial -- the names are just symbolic strings -- but currently they are all consistent with the lucene packaging names, which would be a more complex change from a back compat standpoint -- i've

RE: schema.xml xsd file

2015-03-18 Thread Pedro Figueiredo
:( ok, thank you. Pedro Figueiredo Senior Engineer -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 18 March 2015 15:28 To: solr-user@lucene.apache.org Subject: Re: schema.xml xsd file There isn't one. The question has been bandied back and forth several tim

Re: Solr returns incorrect results after sorting

2015-03-18 Thread jim ferenczi
Hi Raj, The group.sort you are using defines multiple criteria. The first criterion is the big solr function starting with the "max". This means that inside each group the documents will be sorted by this criterion and if the values are equal between two documents then the comparison falls back to t

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Alexandre Rafalovitch
Probably merged somewhat differently with some terms indexes repeating between segments. Check the number of segments in data directory. And do search for *:* and make sure both do have the same document counts. Also, in all these discussions, you still haven't answered about how fast after indexin

Re: SolrCloud: data is present on one shard only

2015-03-18 Thread Aman Tandon
Okay Shawn, thanks. I will try as per your suggestion and will update here. With Regards Aman Tandon On Wed, Mar 18, 2015 at 9:39 PM, Shawn Heisey wrote: > On 3/18/2015 9:47 AM, Aman Tandon wrote: > > I have 1,20,000 documents. And I am indexing the data in my solrcloud > > architecture having t

Multiple words suggestion

2015-03-18 Thread Hakim Benoudjit
Hello there, Does Solr 4.x (or even 5) support *multiple words suggestions*? I mean if my query is: "*tozota hilox*": And when I activate the spellcheck component, each word is treated separately. So "*toyota*" is suggested for "*tozota*", and "*hilux*" is suggested for " *hilox*". But what I nee

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Nitin Solanki
When I kept my configuration to 300 for soft commit and 3000 for hard commit and indexed some amount of data, I got the data size of the whole index to be 6GB after completing the indexing. When I changed the configuration to 6 for soft commit and 6 for hard commit and indexed same data th

Re: SolrCloud: data is present on one shard only

2015-03-18 Thread Shawn Heisey
On 3/18/2015 9:47 AM, Aman Tandon wrote: > I have 1,20,000 documents. And I am indexing the data in my solrcloud > architecture having the two shards. I am having the mindset that some data > will be present on both the shards. But when I am looking on data size via > admin interface, I am able to

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Shawn Heisey
On 3/18/2015 9:44 AM, Nitin Solanki wrote: > I am just saying. I want to be sure on commits difference.. > What if I do frequent commits or not? And why I am saying that I need to > commit things so very quickly because I have to index 28GB of data which > takes 7-8 hours(frequent comm

Re: SolrCloud: data is present on one shard only

2015-03-18 Thread Aman Tandon
Hi Shawn, I apologize for my unclear mail, I have 1,20,000 documents. And I am indexing the data in my solrcloud architecture having the two shards. I am having the mindset that some data will be present on both the shards. But when I am looking on data size via admin interface, I am able to see

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Nitin Solanki
Hi Erick, I am just saying. I want to be sure on commits difference.. What if I do frequent commits or not? And why I am saying that I need to commit things so very quickly because I have to index 28GB of data which takes 7-8 hours(frequent commits). As you said, do commits after 6

Re: schema.xml xsd file

2015-03-18 Thread Erick Erickson
There isn't one. The question has been bandied back and forth several times, but the reaction is that an XSD would be more trouble than it's worth, especially as it would have to handle any customizations that anyone wanted to throw at, say, custom field types. Best, Erick On Wed, Mar 18, 2015 at

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Erick Erickson
Don't do it. Really, why do you want to do this? This seems like an "XY" problem, you haven't explained why you need to commit things so very quickly. I suspect you haven't tried _searching_ while committing at such a rate, and you might as well turn all your top-level caches off in solrconfig.xml

Re: index duplicate records from data source into 1 document

2015-03-18 Thread Erick Erickson
I'd use SolrJ, pull the docs by productId order and combine records with the same product ID into a single doc. Here's a starter set for indexing from a DB with SolrJ. It has Tika processing in it as well, but you can pull that out pretty easily. https://lucidworks.com/blog/indexing-with-solrj/
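
A minimal sketch of the "combine records with the same product ID" step, using only the Java standard library. The class, record, and method names (ProductMerger, Row, merge) are hypothetical, the sample rows are the ones from the original question in this thread, and the SolrJ indexing call is deliberately left out. Each merged entry would presumably become one document with a multi-valued business-type field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProductMerger {

    /** One source row; the field names are hypothetical. */
    public static final class Row {
        final String productId;
        final String businessType;

        public Row(String productId, String businessType) {
            this.productId = productId;
            this.businessType = businessType;
        }
    }

    /**
     * Collects the business types of all rows sharing a product ID.
     * LinkedHashMap keeps IDs in first-seen order, so this in-memory
     * version does not strictly need input sorted by product ID.
     */
    public static Map<String, List<String>> merge(List<Row> rows) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Row row : rows) {
            merged.computeIfAbsent(row.productId, id -> new ArrayList<>())
                  .add(row.businessType);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("12345", "Exporter"),
            new Row("12345", "Agent"),
            new Row("12366", "Manufacturer"),
            new Row("12377", "Exporter"),
            new Row("12377", "Distributor"));

        // Each entry would become one Solr document (indexing call not shown).
        merge(rows).forEach((id, types) -> System.out.println(id + " -> " + types));
    }
}
```

For a source too large to hold in memory, reading the rows in productId order (as suggested above) would let each document be emitted as soon as the ID changes, instead of building the whole map first.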

Re: Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Erick Erickson
Shot in the dark, but is the PDF file significantly larger than the others? Perhaps you're simply exceeding the packet limits for the servlet container? Best, Erick On Wed, Mar 18, 2015 at 12:22 AM, Zheng Lin Edwin Yeo wrote: > Hi everyone, > > I'm having some issues with indexing rich-text docume

schema.xml xsd file

2015-03-18 Thread Pedro Figueiredo
Hello, Where can I find the xsd file for the schema.xml file? Thanks in advance! Best regards, Pedro Figueiredo Senior Engineer pjlfigueir...@criticalsoftware.com M. 934058150 Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da

Re: Add replica on shards

2015-03-18 Thread Nitin Solanki
Thanks Norgorn. I did the same thing but in a different manner, like this: localhost:8983/solr/admin/cores?action=CREATE&name=wikingram_shard4_replica3&collection=wikingram&property.shard=shard4 On Wed, Mar 18, 2015 at 7:20 PM, Norgorn wrote: > > U can do the same simply by something like that > > >

Re: SolrCloud: data is present on one shard only

2015-03-18 Thread Shawn Heisey
On 3/17/2015 3:54 AM, Aman Tandon wrote: > I indexed the data in my SolrCloud architecture (2 shards present on 2 > separate instance & on one instance I have the replica of both the shards > which is present on other 2 instance). > > And when I am looking at the index via admin interface, it is pre

Re: Add replica on shards

2015-03-18 Thread Norgorn
You can do the same simply with something like this: http://localhost:8983/solr/admin/cores?action=CREATE&collection=wikingram&name=ANY_NAME_HERE&shard=shard1 The main part is "shard=shard1", when you create a core with an existing shard (core name doesn't matter, we use "collection_shard1_replica2", but
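
A small sketch of building the CREATE URL shown above for each shard, assuming the same parameter names as in the message (action, collection, name, shard). The class and method names (ReplicaUrlBuilder, createCoreUrl) are hypothetical, and the code only assembles the string; it does not send any request to Solr.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ReplicaUrlBuilder {

    /** Builds a core-creation URL of the shape quoted in the message above. */
    public static String createCoreUrl(String baseUrl, String collection,
                                       String coreName, String shard) {
        return baseUrl + "/admin/cores?action=CREATE"
                + "&collection=" + encode(collection)
                + "&name=" + encode(coreName)
                + "&shard=" + encode(shard);
    }

    /** URL-encodes a single query parameter value as UTF-8. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:8983/solr"; // assumed local node
        // One URL per shard; the core naming pattern follows the message above.
        for (int i = 1; i <= 8; i++) {
            String shard = "shard" + i;
            String coreName = "wikingram_" + shard + "_replica2";
            System.out.println(createCoreUrl(baseUrl, "wikingram", coreName, shard));
        }
    }
}
```

The loop bound of 8 matches the eight shards mentioned in the original question; the replica number in the core name is only a naming choice, since the message notes that the core name itself does not matter.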

Re: Whole RAM consumed while Indexing.

2015-03-18 Thread Nitin Solanki
Hi, If I do very very fast indexing(softcommit = 300 and hardcommit = 3000) v/s slow indexing (softcommit = 6 and hardcommit = 6) as you both said. Will fast indexing fail to index some data? Any suggestion on this ? On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar < andyetitmo

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Jack Krupansky
It does indeed appear that use of the "_cz" suffix is a mistake - those suffixes are supposed to be language codes. Sure, generally, there tends to be a one-to-one relationship between language and country, but clearly that is not as absolute as a casual observer might misguidedly think. I think i

index duplicate records from data source into 1 document

2015-03-18 Thread Derek Poh
Hi, If I have duplicate records in my source data (DB or delimited files). For simplicity's sake they are of the following nature:

Product Id  Business Type
----------  -------------
12345       Exporter
12345       Agent
12366       Manufacturer
12377       Exporter
12377       Distributor

Re: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread Eduard Moraru
Hi, On Wed, Mar 18, 2015 at 9:28 AM, steve wrote: > FYI:http://www.w3schools.com/tags/ref_country_codes.asp CZECH REPUBLICCZ > No entry for CS > Exactly, steve. "CZ" is the country code, however we are talking about language codes (which is "CS"), since those Solr types deal with languages not

RE: Which one is it "cs" or "cz" for Czech language?

2015-03-18 Thread steve
FYI: http://www.w3schools.com/tags/ref_country_codes.asp CZECH REPUBLIC CZ No entry for CS > From: md...@apache.org > Date: Tue, 17 Mar 2015 12:45:57 -0500 > Subject: Re: Which one is it "cs" or "cz" for Czech language? > To: solr-user@lucene.apache.org > > Probably a historical artifact. > > cz is

Re: Add replica on shards

2015-03-18 Thread Nitin Solanki
Any help please... On Wed, Mar 18, 2015 at 12:02 PM, Nitin Solanki wrote: > Hi, > I have created 8 shards on a collection named as ***wikingram**. > Now at that time, I were not created any replica. Now, I want to add a > replica on each shard. How can I do? > I created this - ** sudo c

Unable to index rich-text documents in Solr Cloud

2015-03-18 Thread Zheng Lin Edwin Yeo
Hi everyone, I'm having some issues with indexing rich-text documents from the Solr Cloud. When I tried to index a pdf or word document, I get the following error: org.apache.solr.common.SolrException: Bad Request request: http://192.168.2.2:8984/solr/logmill/update?update.distrib=TOLEADER&d