Re: 2.1billion+ document

2013-07-05 Thread Gora Mohanty
On 6 July 2013 09:45, Ali, Saqib wrote: > Thanks Jason! That was very helpful. > > I read on the solr wiki that: > "Documents must have a unique key and the unique key must be stored > (stored="true" in schema.xml)" > > What is this unique key? Is this just a id that we define in the schema.xml >

Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Thanks Jason! That was very helpful. I read on the Solr wiki that: "Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml)" What is this unique key? Is this just an id that we define in the schema.xml that is unique across all documents? We have something as f

Re: 2.1billion+ document

2013-07-05 Thread Jason Hellman
Saqib: At the simplest level: 1) Source the machine 2) Install Java 3) Install a servlet container of your choice 4) Copy your Solr WAR and conf directories as desired (probably a rough mirror of your current single server) 5) Start it up and start sending data there 6) Query both by simpl
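Step 6 is cut off in the archive. In legacy distributed search (without SolrCloud), querying several independent servers is normally done with the "shards" request parameter, which lists each server as host:port/path without an "http://" prefix. The sketch below builds such a request URL using only the JDK; the server names server1/server2 and the class ShardsParam are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: build a /select URL for legacy (non-SolrCloud) distributed search.
// Host names, ports, and core paths are hypothetical placeholders.
final class ShardsParam {

    // Each entry is host:port/path WITHOUT an "http://" prefix; entries are comma-separated.
    static String shardsValue(List<String> shardAddresses) {
        if (shardAddresses.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        return String.join(",", shardAddresses);
    }

    // The server receiving this request coordinates: it sends the query to every
    // listed shard and merges the results.
    static String selectUrl(String baseUrl, String query, List<String> shardAddresses) {
        return baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&shards=" + URLEncoder.encode(shardsValue(shardAddresses), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> shards = List.of("server1:8983/solr", "server2:8983/solr");
        System.out.println(selectUrl("http://server1:8983/solr", "*:*", shards));
    }
}
```

From SolrJ, the same comma-separated value could be set on a SolrQuery with query.set("shards", value).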

Re: 2.1billion+ document

2013-07-05 Thread Ali, Saqib
Hello Otis, I was thinking more in terms of Solr DistributedSearch rather than SolrCloud. I was hoping to add another Solr instance when the time comes. This is a low-use application, but with a lot of data. Uptime and query speed are not important. However, we would like to be able to index mor

Re: 2.1billion+ document

2013-07-05 Thread Otis Gospodnetic
Hi, It's a broad question, but it starts with getting a few servers, putting Solr 4.3.1 on them (soon 4.4), setting up ZooKeeper, creating a Solr Collection (index) with N shards and M replicas, and reindexing your old data into this new cluster, which you can expand with new nodes over time. If you
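The "create a Collection with N shards and M replicas" step corresponds to a Collections API CREATE call. The sketch below only builds that URL and counts the cores it implies. The base URL, the collection name "docs", and the helper class are hypothetical. Note that replicationFactor counts every copy of a shard, leader included, so "M replicas" may mean replicationFactor = M or M + 1 depending on how you count.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: the Collections API request that creates a SolrCloud collection with
// numShards shards, each kept in replicationFactor copies. Names are hypothetical.
final class CollectionCreate {

    static String createUrl(String baseUrl, String name, int numShards, int replicationFactor) {
        if (numShards < 1 || replicationFactor < 1) {
            throw new IllegalArgumentException("numShards and replicationFactor must be >= 1");
        }
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor;
    }

    // replicationFactor includes the leader, so the cluster must host
    // numShards * replicationFactor cores in total.
    static int totalCores(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "docs", 4, 2));
        System.out.println(totalCores(4, 2) + " cores to place across the nodes");
    }
}
```

If numShards * replicationFactor exceeds the number of live nodes, the CREATE call may also need a higher maxShardsPerNode value.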

2.1billion+ document

2013-07-05 Thread Ali, Saqib
Question regarding the 2.1 billion+ document limit. I understand that a single instance of Solr has a limit of 2.1 billion documents. We currently have a single Solr server. If we reach the 2.1 billion document limit, what is involved in moving to Solr DistributedSearch? Thanks! :)
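The roughly 2.1 billion ceiling (Integer.MAX_VALUE, 2,147,483,647) comes from Lucene document IDs being Java ints. It therefore applies to each Lucene index, meaning each core or shard, not to a distributed setup as a whole. Documents that are deleted but not yet merged away still occupy IDs. A small sketch of the resulting shard-count arithmetic follows. The per-shard target is chosen by the caller, and the 1 billion figure in the example is hypothetical headroom, not a recommendation.

```java
// Sketch: how many shards a corpus needs for a given per-shard target.
// A single Lucene index (one core/shard) tops out around Integer.MAX_VALUE documents.
final class ShardMath {

    static final long PER_INDEX_CEILING = Integer.MAX_VALUE; // 2,147,483,647

    // Ceiling division written without (a + b - 1) to avoid overflow on huge totals.
    static long shardsNeeded(long totalDocs, long targetDocsPerShard) {
        if (totalDocs < 0) {
            throw new IllegalArgumentException("totalDocs must be >= 0");
        }
        if (targetDocsPerShard < 1 || targetDocsPerShard > PER_INDEX_CEILING) {
            throw new IllegalArgumentException("target must be in [1, Integer.MAX_VALUE]");
        }
        return totalDocs / targetDocsPerShard + (totalDocs % targetDocsPerShard == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        // 5 billion docs with a hypothetical 1 billion docs/shard target
        System.out.println(shardsNeeded(5_000_000_000L, 1_000_000_000L));
    }
}
```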

Re: Changing the number of shards?

2013-07-05 Thread Otis Gospodnetic
Correct. ES currently does not let you change the number of shards after you've created an Index (Collection in SolrCloud). It does not let you split shards either. SolrCloud has an advantage over ES around this at this point. Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performan

Changing the number of shards?

2013-07-05 Thread Jack Krupansky
According to the ElasticSearch glossary, “You cannot change the number of primary shards in an index, once the index is created.” Really? Is that true? (A “primary shard” is what Solr calls a shard, or slice.) In other words, even though you can easily “add shards” on ES, those are really just

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-05 Thread Otis Gospodnetic
Does https://issues.apache.org/jira/browse/SOLR-2112 help? Otis On Fri, Jul 5, 2013 at 5:57 PM, Valery Giner wrote: > As a simplest example, just write a query result into a file for proce

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Otis Gospodnetic
Furkan, It's perfectly fine. Some people have small indices and lots of queries, some have large indices and very few queries, and lucky ones have very large indices and lots of queries at the same time. We once helped a client take their indexing down from many hours to a couple of minutes by u

Re: [Announcement] Norch- a search engine for node.js

2013-07-05 Thread Jack Krupansky
And... is it based on Lucene/Solr? -- Jack Krupansky -----Original Message----- From: Ali, Saqib Sent: Friday, July 05, 2013 6:09 PM To: solr-user@lucene.apache.org Subject: Re: [Announcement] Norch- a search engine for node.js Very interesting. What is the upper limit on the number of docume

Re: [Announcement] Norch- a search engine for node.js

2013-07-05 Thread Ali, Saqib
Very interesting. What is the upper limit on the number of documents? Thanks! :) On Fri, Jul 5, 2013 at 11:53 AM, Fergus McDowall wrote: > Here is some news that might be of interest to users and implementers of > Solr > > > http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-f

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Furkan KAMACI
OK, I know that it is really unnecessary to start with a complex design. On the other hand, if your resources and needs warrant it and you have a bottleneck in your design, it is a real failure not to plan a new design. We have terabytes of data and we have dedicated some developers to Hado

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-05 Thread Valery Giner
As the simplest example, just write a query result into a file for processing by external programs (the programs are out of our control, and the result could contain millions of docs). Thanks, Val On 07/05/2013 04:41 PM, Walter Underwood wrote: What are you doing that start=50 is normal? --

solrj distributed solr example

2013-07-05 Thread Ali, Saqib
Hello all, Can anyone please share a SolrJ example for distributed Solr? Thanks! :)

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-05 Thread Walter Underwood
What are you doing that start=50 is normal? --wunder On Jul 5, 2013, at 1:28 PM, Valery Giner wrote: > Eric, > > We did not have any RAM problems, but just the following official limitation > makes our life too miserable to use the shards: > > "Makes it more inefficient to use a high "sta

Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-07-05 Thread Valery Giner
Eric, We did not have any RAM problems, but the following official limitation alone makes using shards too painful for us: "Makes it more inefficient to use a high "start" parameter. For example, if you request start=50&rows=25 on an index with 500,000+ docs per shard, this will
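The quoted limitation follows from how sharded paging works. For a single page, every shard must return its own top start+rows entries, because the coordinator cannot know in advance which shard holds the requested slice. In the usual two-phase distributed search this first phase carries only IDs and sort values, but the count still grows with start. The back-of-the-envelope sketch below uses a hypothetical shard count and offset.

```java
// Sketch of the deep-paging cost in a sharded index: each shard returns its own
// top (start + rows) entries and the coordinator merges all of them for one page.
final class DeepPaging {

    static long rowsPerShard(long start, long rows) {
        return start + rows;
    }

    // Total entries the coordinating node receives and merges for a single page.
    static long rowsMerged(int numShards, long start, long rows) {
        return numShards * rowsPerShard(start, rows);
    }

    public static void main(String[] args) {
        // Hypothetical: 4 shards, start=500000, rows=25
        System.out.println(rowsMerged(4, 500_000, 25)); // 2000100
    }
}
```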

Re: Filter cache pollution during sharded edismax queries

2013-07-05 Thread Otis Gospodnetic
Hi Ken, Uh, I left this email until now hoping I could find you a reference to similar reports, but I can't find them now. I am quite sure I saw somebody with a similar report within the last month. Plus, several people have reported issues with performance dropping when they went from 3.x to 4.

Re: Sorting

2013-07-05 Thread Jack Krupansky
And don't forget to test with sortable DocValues. I mean, sorting (and faceting) was one of the main motivations for DocValues. -- Jack Krupansky -----Original Message----- From: Otis Gospodnetic Sent: Friday, July 05, 2013 3:42 PM To: solr-user@lucene.apache.org Subject: Re: Sorting Hi Kowi

Re: Sorting

2013-07-05 Thread Otis Gospodnetic
Hi Kowish, Here is an easy way to find out: 1) use copyField to copy from string to tlong; 2) use ab or JMeter to hammer Solr while sorting on one or the other field (separate runs); 3) compare :) Since you have SLAs, I'm assuming you already have 2 and 3 in place. Otis

[Announcement] Norch- a search engine for node.js

2013-07-05 Thread Fergus McDowall
Here is some news that might be of interest to users and implementers of Solr http://blog.comperiosearch.com/blog/2013/07/05/norch-a-search-engine-for-node-js/ Norch (http://fergiemcdowall.github.io/norch/) is a search engine written for Node.js. Norch uses the Node search-index module which is i

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Roman Chyla
I don't want to sound negative, but I think it is a valid question to consider; the lack of information and a certain mental rigidity may make it sound bad. First of all, it is probably not meant for a few gigabytes of data, and I can imagine that building indexes on the side where the data lives is much fas

Re: Surprising score?

2013-07-05 Thread Jason Hellman
Also consider using the SweetSpotSimilarityFactory class, which lets you still engage normalization but control how intrusive it is. This, combined with the ability to set a custom Similarity class on a per-fieldType basis, may be extremely useful. More info: http://lucene.apache.org/sol

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Jack Krupansky
Software developers are sometimes compensated based on the degree of complexity that they deal with. And managers are sometimes compensated based on the number of people they manage, as well as the degree of complexity of what they manage. And... training organizations can charge more and hav

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Walter Underwood
Why is it better to require another large software system (Hadoop), when it works fine without it? That just sounds like more stuff to configure, misconfigure, and cause problems with indexing. wunder On Jul 5, 2013, at 4:48 AM, Furkan KAMACI wrote: > We are using Nutch to crawl web sites and

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-05 Thread Walter Underwood
Since it works to fetch 10K rows and doesn't work to fetch 100K rows in a single request, I very strongly suggest that you use the requests that work. Make ten requests of 10K rows each. Or even better, 100 requests of 1K rows each. Large requests make large memory demands. wunder On Jul 5, 20
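The advice above amounts to replacing one rows=100000 request with a sequence of start/rows pages. The sketch below only computes the page boundaries. It does not call Solr, and the Pager class is a hypothetical helper. The sizes in the example come from the message.

```java
// Sketch: split one huge request into fixed-size pages of start/rows.
// Page size and total are chosen by the caller; nothing here talks to Solr.
final class Pager {

    // Returns {start, rows} pairs that cover [0, total) with pages of at most pageSize.
    static long[][] pages(long total, long pageSize) {
        if (total < 0 || pageSize < 1) {
            throw new IllegalArgumentException("total >= 0 and pageSize >= 1 required");
        }
        int count = Math.toIntExact(total / pageSize + (total % pageSize == 0 ? 0 : 1));
        long[][] out = new long[count][];
        for (int i = 0; i < count; i++) {
            long start = i * pageSize;
            out[i] = new long[] {start, Math.min(pageSize, total - start)};
        }
        return out;
    }

    public static void main(String[] args) {
        for (long[] p : pages(100_000, 10_000)) {
            System.out.println("start=" + p[0] + "&rows=" + p[1]);
        }
    }
}
```

Each pair becomes one start/rows request. This caps the memory needed per response. In a sharded index, however, the work per request still grows with start.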

RE: Solr 4.3 Master/Slave Issues

2013-07-05 Thread Cool Techi
The normal Tomcat shutdown doesn't stop the server and takes a long time, so I issue a kill -9 command. Any other suggestions for doing this without the locking? I will initiate a backup again and send the logs. regards, Ayush > Date: Fri, 5 Jul 2013 19:40:12 +0530 > Subject: Re: Solr 4.3 Master/

Re: [Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-05 Thread Shalin Shekhar Mangar
SolrJ doesn't have explicit support for that param but you can always add it yourself. For example: CoreAdminRequest.Unload req = new CoreAdminRequest.Unload(false); ((ModifiableSolrParams) req.getParams()).set("deleteInstanceDir", true); req.process(server); On Thu, Jul 4, 2013 at 12:50 PM, Lyub

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-05 Thread Shalin Shekhar Mangar
Oops, I actually meant to say that search engines *are not* optimized for large pages. See https://issues.apache.org/jira/browse/SOLR-1726 Well, one of the shards involved in the request is throwing an error. Check the logs of your shards. You can also add a shards.info=true param to your search whi

Re: Concurrent Modification Exception

2013-07-05 Thread adityab
Well, our observations lead us to conclude that this happens only during spell checking. With spell check turned off, we don't see this issue occur at all in our 24-hour test run. We have JBoss 5.1 in production running Solr 4.2.1 (without spellcheck) with no issues at all. Aditya

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-05 Thread eakarsu
Thanks for your answer. I can fetch 10K documents without any issue. I don't think we are hitting an out-of-memory exception, because each Tomcat server in the cluster has 8GB of memory allocated.

Sorting

2013-07-05 Thread kowish.adamosh
Hi, What should be faster: sorting by a field of type string (solr.StrField) or long (solr.TrieLongField)? In both cases the values are numbers, so I can choose either field type. Is it possible to speed up sorting by a unique field? With sorting, my queries are 10-100 times slower and I can't meet
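Besides speed, the two field types do not necessarily order numeric values the same way. A string field compares values character by character, so "10" sorts before "9" unless all values have the same width (for example, zero-padded). The illustration below uses only the JDK and assumes ASCII digit strings. It does not measure Solr's sort performance, which still needs the benchmark described in the reply.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: numbers stored as strings sort character by character, which differs
// from numeric order unless every value has the same width (e.g. zero-padded).
final class SortOrder {

    // Roughly what a string (StrField) sort does for ASCII digits.
    static String[] lexicographic(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // What a numeric (long) field sort does.
    static String[] numeric(String[] values) {
        String[] copy = values.clone();
        Arrays.sort(copy, Comparator.comparingLong(Long::parseLong));
        return copy;
    }

    public static void main(String[] args) {
        String[] ids = {"9", "10", "2"};
        System.out.println(Arrays.toString(lexicographic(ids))); // [10, 2, 9]
        System.out.println(Arrays.toString(numeric(ids)));       // [2, 9, 10]
    }
}
```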

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Shalin Shekhar Mangar
Okay, just for the benefit of other people who dig up this thread: you had to put all the extra jar files required by TYPO3 into WEB-INF/lib to make this work. Is that right? On Fri, Jul 5, 2013 at 8:03 PM, Michael Bakonyi wrote: > Hi Shalin, > > Am 05.07.2013 um 16:23 schrieb Shalin Shekhar Mangar:

Re: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-05 Thread Shalin Shekhar Mangar
Can you try to fetch a smaller number of documents? Search engines are optimized for returning large pages. My guess is that one of the shards is returning an error (maybe an OutOfMemoryError) for this query. On Fri, Jul 5, 2013 at 7:56 PM, eakarsu wrote: > I am using Solr 4.3.1 on solrcloud with

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Michael Bakonyi
Hi Shalin, On 05.07.2013 at 16:23, Shalin Shekhar Mangar wrote: > There are plenty of use-cases for having multiple cores. You may have > two different schemas for two different kind of documents. Perhaps you > are indexing content in multiple languages and you may want a core per > language. In

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Michael Bakonyi
Hi Giovanni, damn, you were right! I would never have thought of that! Indeed, I had copied a jar into that directory because somebody recommended it in a post I found. Thanks a lot for your help; now I'll look at the next error that appears ;) Cheers, Michael On 05.07.2013 at 15:25, Giovanni Bri

Invalid version (expected 2, but 60) or the data in not in 'javabin' format

2013-07-05 Thread eakarsu
I am using Solr 4.3.1 on SolrCloud with 10 nodes. I added 3 million documents from a CSV file with this command: curl 'http://localhost:8080/solr/trcollection2/update/csv?stream.file=/home/hduser/csvFile.csv&skipLines=1&fieldnames=,cache,segment,digest,tstamp,lang,url,,content,id,title,boost&stre

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Shalin Shekhar Mangar
On Thu, Jul 4, 2013 at 4:32 PM, Michael Bakonyi wrote: > Hi everyone, > > I'm trying to get the CMS "TYPO3" connected with Solr 3.6.2. > > By now I followed the installation at http://wiki.apache.org/solr/SolrTomcat > except that I didn't copy the .war-file into the $SOLR_HOME but referencing >

RE: Question about weighted spell check

2013-07-05 Thread Dyer, James
The current implementation doesn't sort strictly on hit counts. Rather, it gives you collations that have corrections with the nearest distance from the original terms. Sorting on query result score sounds like an interesting and doable alternative, although it is not currently supported. The cave
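To illustrate what "nearest distance" can mean, the sketch below ranks candidate corrections by Levenshtein edit distance. This is an assumption made for illustration only. Solr's spellcheckers use their own configurable distance measures, and this is not their code. The EditDistance class is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch: classic Levenshtein edit distance, used to rank candidate corrections by
// closeness to the original term. Illustrative only, not the spellchecker's code.
final class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // insertion, deletion, or substitution (free when chars match)
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Closest candidates first; Arrays.sort on objects is stable, so ties keep input order.
    static String[] rankByDistance(String original, String[] candidates) {
        String[] copy = candidates.clone();
        Arrays.sort(copy, Comparator.comparingInt((String c) -> levenshtein(original, c)));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(rankByDistance("solr", new String[] {"color", "solar"})));
    }
}
```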

Re: Solr 4.3 Master/Slave Issues

2013-07-05 Thread Shalin Shekhar Mangar
On Fri, Jul 5, 2013 at 6:14 PM, Cool Techi wrote: > > 1) That was my initial suspicion, but when I run ps -aux | grep "java", but > there it doesn't show any other program running. I kill the process and start > again and it locks. How are you killing the process? A SIGKILL will leave a lock fi

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Giovanni Bricconi
I saw something similar when I placed some jars in tomcat/lib (Data Import Handler); the right place was WEB-INF/lib instead. I would try placing all needed jars there. 2013/7/5 Michael Bakonyi > Hm, can't anybody help me out? I still can't get my installation run > correctly ... > > What I've fo

Re: AW: Surprising score?

2013-07-05 Thread pravesh
>>Is there a way to omitNorms and still be able to use {!boost b=boost}? Or you could leave omitNorms="false" as usual and use a custom Similarity implementation whose length normalization method is overridden to return a constant value of 1. Regards, Pravesh
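The sketch below shows what that override changes by comparing the classic TF-IDF length norm, 1/sqrt(number of terms), with a constant 1. It mirrors only the formula. Lucene 4.x's DefaultSimilarity also multiplies in the field's index-time boost and encodes the norm into a single lossy byte. A real override would subclass a Similarity class rather than use this hypothetical helper.

```java
// Sketch comparing two length-normalization choices: the classic 1/sqrt(numTerms)
// formula and the constant-1 override. Formula only; not Lucene's Similarity class.
final class LengthNorm {

    static float defaultLengthNorm(int numTerms) {
        if (numTerms < 1) {
            throw new IllegalArgumentException("numTerms must be >= 1");
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Every field length gets the same norm, so field length alone no longer
    // favors short fields over long ones.
    static float constantLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 4, 100}) {
            System.out.println(n + " terms: default=" + defaultLengthNorm(n)
                    + " constant=" + constantLengthNorm(n));
        }
    }
}
```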

AW: Surprising score?

2013-07-05 Thread Lochschmied, Alexander
Thanks Jeroen and Upayavira! I read the warning about losing the ability to use index time boosts when I disable length normalization. And we actually use it; at least if it means having a boost field in the index and doing queries like this: "{!boost b=boost}( series:RCWP^10 OR otherFileds:que

RE: Solr 4.3 Master/Slave Issues

2013-07-05 Thread Cool Techi
1) That was my initial suspicion, but when I run ps -aux | grep "java", it doesn't show any other program running. I kill the process, start it again, and it locks. 2) When we fire a backup on the slave, the whole core hangs after a while and replication also stops. This was not happening w

Re: ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-05 Thread Michael Bakonyi
Hm, can't anybody help me out? I still can't get my installation to run correctly... What I've found out recently, if I understand it correctly: SolrInfoMBean has something to do with JMX. So I manually activated JMX by inserting within my solrconfig.xml as described here: http://wiki.apache.org/

Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Furkan KAMACI
We are using Nutch to crawl web sites, and it stores documents in HBase. Nutch uses SolrJ to send documents to be indexed. We have Hadoop in our ecosystem as well. I think that there should be an implementation in SolrJ that sends documents (via CloudSolrServer or something like that) as MapReduce j

Re: Solr 4.3 Master/Slave Issues

2013-07-05 Thread Shalin Shekhar Mangar
This can mean multiple things: 1. You had killed a solr process earlier which left the lock file in place 2. You have more than one Solr core pointing to the same data directory 3. A solr process is already running and you are trying to start another one with the same config. On Fri, Jul 5, 2013 a
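For cause 1, a quick check is whether Lucene's lock file, write.lock, is still in the core's index directory. This sketch assumes a hypothetical data/index path. Whether a leftover file actually blocks startup depends on the configured lockType. Only remove it after confirming that no running Solr process and no other core uses that directory, because removing a lock held by a live writer can corrupt the index.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: look for Lucene's index lock file ("write.lock") in a core's index directory.
// The data/index default below is a hypothetical example of a core's layout.
final class LockCheck {

    static final String LOCK_FILE_NAME = "write.lock";

    static Path lockPath(Path indexDir) {
        return indexDir.resolve(LOCK_FILE_NAME);
    }

    // A present file is only a hint: with lockType "simple" the file itself is the lock,
    // while with "native" the OS-level lock is released when the owning JVM dies.
    static boolean lockFilePresent(Path indexDir) {
        return Files.exists(lockPath(indexDir));
    }

    public static void main(String[] args) {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "data/index");
        System.out.println(lockPath(indexDir) + " present: " + lockFilePresent(indexDir));
    }
}
```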

Solr 4.3 Master/Slave Issues

2013-07-05 Thread Cool Techi
We have set up Solr 4.3 with a master/slave setup and are facing a couple of issues. Index locking: the index on the slave hangs at times, and when we restart the core, the core gets locked up. I have checked the logs and there are no OOM errors or anything else other than the error given below: Caused by: or