Re: solr dedup on specific fields

2014-07-07 Thread Ali Nazemian
Dear all, is there any other way I can do that? I mean, if you look at my main problem again, you will find that I have two types of fields in my documents: 1) the ones that should be overwritten on duplicates, 2) the ones that should not change during duplicates. So is there another way

Re: solr dedup on specific fields

2014-07-07 Thread Alexandre Rafalovitch
Can you use Update operation instead of Create? Then, you can supply only the fields that need to be changed and use atomic update to preserve the others. But then you will have issues when you _are_ creating new documents and you do need to store all fields. Regards, Alex. Personal website:
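A minimal sketch of the atomic-update payload described above, assuming the unique key field is called `id` and using a hypothetical `title` field. Only the fields wrapped in a `set` modifier are changed; as far as I know, Solr can preserve the other fields only if they are stored and the update log is enabled. The helper name `build_atomic_update` is made up for illustration.

```python
import json


def build_atomic_update(doc_id, changed_fields):
    """Build one Solr atomic-update document.

    Each changed field is wrapped in a {"set": value} modifier so that
    Solr replaces only that field and keeps the rest of the document.
    The "id" key is an assumption: use your schema's uniqueKey field.
    """
    doc = {"id": doc_id}
    for name, value in changed_fields.items():
        doc[name] = {"set": value}
    return doc


# JSON body that could be posted to the collection's /update handler.
payload = json.dumps([build_atomic_update("doc-1", {"title": "New title"})])
print(payload)
```

For a brand-new document this form is not enough on its own, which is the caveat raised in the message: every field still has to be supplied when the document is first created.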

Re: How to index data by hadoop and solr?

2014-07-07 Thread wanggaohang
Use map-reduce to index into Solr (like the solrindexjob in Nutch). On 2014-07-07 11:55, toothlou_t...@163.com wrote: Hello: I want to use Hadoop and Solr to index data. Can someone tell me how to do it? toothlou_t...@163.com

Need of hadoop

2014-07-07 Thread search engn dev
Currently I am exploring Hadoop with Solr. Somewhere it is written: "This does not use Hadoop Map-Reduce to process Solr data, rather it only uses the HDFS filesystem for index and transaction log file storage." Then what is the advantage of using Hadoop over the local file system? will use

Re: Language detection for solr 3.6.1

2014-07-07 Thread Poornima Jay
Hi, Please let me know if anyone had used google language detection for implementing multilanguage search in one schema. Thanks, Poornima On Tuesday, 1 July 2014 6:54 PM, Poornima Jay poornima...@rocketmail.com wrote: Hi, Can anyone please let me know how to integrate

Re: solr dedup on specific fields

2014-07-07 Thread Ali Nazemian
Updating documents will add some extra time to the indexing process (I send the documents via Apache Nutch). I prefer to make indexing as fast as possible. On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Can you use Update operation instead of Create? Then, you can

Re: solr dedup on specific fields

2014-07-07 Thread Alexandre Rafalovitch
Well, let us know when you figure out a way to satisfy all your requirements. Solr is designed for a full-document replace to be efficient at its primary function (search). Any workaround requires some sort of sacrifice. Good luck, Alex. Personal website: http://www.outerthoughts.com/ Current

Re: solr dedup on specific fields

2014-07-07 Thread Ali Nazemian
Dear Alexandre, What if I use ExternalFileField for the fields that I don't want to be changed? Would that work for me? Regards. On Mon, Jul 7, 2014 at 2:05 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Well, let us know when you figure out a way to satisfy all your requirements. Solr is

Re: solr dedup on specific fields

2014-07-07 Thread Alexandre Rafalovitch
It's an interesting thought. I haven't tried those. But I don't think the EFFs are searchable. Do you need them to be searchable? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, Jul 7,

Re: solr dedup on specific fields

2014-07-07 Thread Ali Nazemian
Yeah, unfortunately I want it to be searchable:( On Mon, Jul 7, 2014 at 2:23 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: It's an interesting thought. I haven't tried those. But I don't think the EFFs are searchable. Do you need them to be searchable? Regards, Alex. Personal

Re: java.net.SocketException: Connection reset

2014-07-07 Thread Michael Della Bitta
I don't see anything out of the ordinary thus far, except your heap looks a little big. I usually run with 6-7gb. I'm wondering if maybe you're running into a juliet pause and that's causing your sockets to time out. Have you gathered any GC stats? Also, what are you doing with respect to

Re: Need of hadoop

2014-07-07 Thread Ali Nazemian
I think this will not improve the performance of indexing but probably it would be a solution for using HDFS HA with replication factor. But I am not sure about that. On Mon, Jul 7, 2014 at 12:53 PM, search engn dev sachinyadav0...@gmail.com wrote: Currently i am exploring hadoop with solr,

Re: java.net.SocketException: Connection reset

2014-07-07 Thread heaven
Yeah. the heap is huge, need to optimize the caches. It was 8Gb previously, had to increase because there were out of memory errors. Using ConcMarkSweepGC, which is supposed to not lock the world. Had to disable optimize (previously we did so by a cron task) because the index is big and optimize

Re: java.net.SocketException: Connection reset

2014-07-07 Thread santosh sidnal
I am facing the same issue too. After a server restart, indexing runs fine once, but the second time the same issue occurs. On 3 Jul 2014 23:37, heaven aheave...@gmail.com wrote: Hi, trying DigitalOcean for Solr, everything seems well, except sometimes I see these errors:

Re: Need of hadoop

2014-07-07 Thread Erick Erickson
OK, _where_ is that written? The HdfsDirectoryFactory code? Someone's blog somewhere? Your notes? Ali has one part of the answer, using HDFS will redundantly store your index, which is good. Furthermore, the MapReduceIndexerTool (see the contribs) _will_ use HDFS to do the classic M/R indexing

Facets on Nested documents

2014-07-07 Thread adfel70
Hi, I indexed different types (with different fields) of child docs for every parent. I want to facet on a field in one type of child doc and then facet on a field in a different type of child doc. It doesn't work. Any idea how I can do something like that? Thanks.

Re: java.net.SocketException: Connection reset

2014-07-07 Thread Shawn Heisey
On 7/7/2014 7:30 AM, heaven wrote: Yeah. the heap is huge, need to optimize the caches. It was 8Gb previously, had to increase because there were out of memory errors. Using ConcMarkSweepGC, which is supposed to not lock the world. At one time the only thing I was using that was non-default

Re: Long ParNew GC pauses - even when young generation is small

2014-07-07 Thread aferdous
Hi Shawn - I was just wondering how did you resolve this issue in the end. We are seeing the same issue with our platform (similar heap size) and updater volume. It would be nice if you could provide us with your final findings/configs. -- View this message in context:

Re: Long ParNew GC pauses - even when young generation is small

2014-07-07 Thread Shawn Heisey
On 7/7/2014 10:22 AM, aferdous wrote: Hi Shawn - I was just wondering how did you resolve this issue in the end. We are seeing the same issue with our platform (similar heap size) and updater volume. It would be nice if you could provide us with your final findings/configs. I use the

Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-07 Thread Damien Dykman
I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and rebalance data accordingly. Let's add the following constraints: - 1. boxes have different characteristics (RAM, CPU, disks) - 2. different number of shards per box/node (let's pretend we have found the sweet spot for

Re: Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-07 Thread Shawn Heisey
On 7/7/2014 12:41 PM, Damien Dykman wrote: I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and rebalance data accordingly. Let's add the following constraints: - 1. boxes have different characteristics (RAM, CPU, disks) - 2. different number of shards per box/node

Re: Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-07 Thread Damien Dykman
Thanks Shawn, clean way to do it, indeed. And going your route, one could even copy the existing shards into the new collection and then delete the data which is getting reindexed on the new nodes. That would spare reindexing everything. But in my case, I add boxes after a noticeable

Exact Match first in the list.

2014-07-07 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi, I have a situation where I am applying the search rules below when I run a full-text search over columns. For the query Product Variant Name, the exact match has to come first in the list, and other matches (product, variant, name, or any combination) come next in the results. Any thoughts, why

Re: Transparently rebalancing a Solr cluster without splitting or moving shards

2014-07-07 Thread Shawn Heisey
Thanks Shawn, clean way to do it, indeed. And going your route, one could even copy the existing shards into the new collection and then delete the data which is getting reindexed on the new nodes. That would spare reindexing everything. But in my case, I add boxes after a noticeable

Re: Exact Match first in the list.

2014-07-07 Thread Shawn Heisey
Hi, I have a situation where I am applying the search rules below when I run a full-text search over columns. For the query Product Variant Name, the exact match has to come first in the list, and other matches (product, variant, name, or any combination) come next in the results. Any thoughts, why
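The reply is cut off in this listing, so the following is not necessarily what was suggested in the thread. One common way to rank full-phrase matches ahead of partial matches is the edismax query parser with a phrase-boost (`pf`) parameter. The sketch below only builds the request parameters; the field name `name` and the boost value are hypothetical, and `pf` boosts phrase matches rather than guaranteeing strict whole-field equality.

```python
from urllib.parse import urlencode


def exact_first_params(user_query, field="name", phrase_boost=10):
    """Build edismax parameters that favor full-phrase matches.

    qf: field searched term by term (partial matches still qualify).
    pf: same field, boosted when the query terms appear together as a
        phrase, so those documents score higher and sort first.
    The field name and boost value are placeholders for illustration.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": field,
        "pf": f"{field}^{phrase_boost}",
    }


params = exact_first_params("Product Variant Name")
print(urlencode(params))
```

The resulting query string can be appended to a collection's `/select` URL. Results sort by score by default, so documents that receive the phrase boost appear before term-only matches.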

Re: Need of hadoop

2014-07-07 Thread search engn dev
It is written here: https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

Re: Need of hadoop

2014-07-07 Thread Erick Erickson
And that's exactly what it means. The HdfsDirectoryFactory is intended to use the HDFS file system to store the Solr (well, actually Lucene) index. It's (by default), triply redundant and vastly reduces your chances of losing your index due to disk errors. That's what HDFS does. If that's not
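As a rough illustration of storing the index in HDFS, the sketch below assembles the JVM system properties that, to my knowledge, the "Running Solr on HDFS" page uses to switch Solr to the HdfsDirectoryFactory. The namenode address and path are hypothetical placeholders, and the helper name `hdfs_solr_args` is made up; check the property names against the documentation for your Solr version.

```python
def hdfs_solr_args(hdfs_home):
    """Return JVM system properties that point Solr at HDFS storage.

    hdfs_home is the HDFS directory under which Solr keeps its index
    and transaction log files (a placeholder value is used below).
    """
    return [
        "-Dsolr.directoryFactory=HdfsDirectoryFactory",
        "-Dsolr.lock.type=hdfs",
        f"-Dsolr.hdfs.home={hdfs_home}",
    ]


# Hypothetical namenode host, port, and path.
command = ["java"] + hdfs_solr_args("hdfs://namenode:8020/solr") + ["-jar", "start.jar"]
print(" ".join(command))
```

Nothing here involves Map-Reduce: the properties only change where the Lucene index files live, which matches the explanation in the message.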