Dear all,
Is there any other way I can do this?
If you look at my main problem again, you will see that I have two types of
fields in my documents: 1) the ones that should be overwritten on duplicates,
and 2) the ones that should not change on duplicates. So, is there another
way?
Can you use Update operation instead of Create? Then, you can supply
only the fields that need to be changed and use atomic update to
preserve the others. But then you will have issues when you _are_
creating new documents and you do need to store all fields.
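For example, an atomic update over HTTP looks roughly like this (a sketch;
the core name, field name, and value are placeholders, and note that atomic
updates require all fields to be stored):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
       -H 'Content-Type: application/json' -d '
  [{"id": "doc-1",
    "price": {"set": 42.0}}]'

The "set" operation overwrites just that one field; stored fields that are
not mentioned keep their existing values.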
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Use a map-reduce indexing job for Solr (like the solrindex job in Nutch).
On 2014-07-07 11:55, toothlou_t...@163.com wrote:
Hello:
I want to use Hadoop and Solr to index data. Can someone tell me how to do
it?
toothlou_t...@163.com
Currently I am exploring Hadoop with Solr. Somewhere it is written: "This
does not use Hadoop Map-Reduce to process Solr data, rather it only uses the
HDFS filesystem for index and transaction log file storage."
So what is the advantage of using Hadoop over the local file system?
will use
Hi,
Please let me know if anyone has used Google language detection for
implementing multi-language search in one schema.
Thanks,
Poornima
On Tuesday, 1 July 2014 6:54 PM, Poornima Jay poornima...@rocketmail.com
wrote:
Hi,
Can anyone please let me know how to integrate
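If by Google language detection you mean the language-detection library on
Google Code (the one Solr's langid contrib wraps), a minimal sketch for
solrconfig.xml might look like this, with placeholder field names:

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <!-- detect language from these (placeholder) fields -->
      <str name="langid.fl">title,body</str>
      <!-- store the detected language code in this field -->
      <str name="langid.langField">language_s</str>
      <!-- rename fields to language-specific ones, e.g. title_en, title_fr -->
      <bool name="langid.map">true</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

The chain then has to be referenced from your update handler via the
update.chain parameter.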
Updating documents will add some extra time to the indexing process. (I send
the documents via Apache Nutch.) I prefer to make indexing as fast as
possible.
On Mon, Jul 7, 2014 at 12:05 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
Can you use Update operation instead of Create? Then, you can
Well, let us know when you figure out a way to satisfy all your requirements.
Solr is designed for a full-document replace to be efficient at its
primary function (search). Any workaround requires some sort of
sacrifice.
Good luck,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Dear Alexandre,
What if I use ExternalFileField for the fields that I don't want to be
changed? Would that work for me?
Regards.
On Mon, Jul 7, 2014 at 2:05 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
Well, let us know when you figure out a way to satisfy all your
requirements.
Solr is
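For reference, an ExternalFileField is declared in schema.xml along these
lines (a sketch; the names are placeholders). The values live outside the
index, in a file named external_<fieldname> in the data directory:

  <fieldType name="extPrice" class="solr.ExternalFileField"
             keyField="id" defVal="0" valType="pfloat"
             indexed="false" stored="false"/>
  <field name="price" type="extPrice"/>

with external_price containing lines like:

  doc-1=9.99
  doc-2=19.99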
It's an interesting thought. I haven't tried those.
But I don't think the EFFs are searchable. Do you need them to be searchable?
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
On Mon, Jul 7,
Yeah, unfortunately I want it to be searchable:(
On Mon, Jul 7, 2014 at 2:23 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
It's an interesting thought. I haven't tried those.
But I don't think the EFFs are searchable. Do you need them to be
searchable?
Regards,
Alex.
Personal website: http://www.outerthoughts.com/
I don't see anything out of the ordinary thus far, except your heap looks a
little big. I usually run with 6-7 GB. I'm wondering if maybe you're running
into a Juliet pause (a stop-the-world GC pause long enough that other nodes
conclude the instance is dead) and that's causing your sockets to time out.
Have you gathered any GC stats?
Also, what are you doing with respect to
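(For gathering GC stats, JVM flags along these lines are a common starting
point; a sketch only, the exact options depend on your JVM version:

  -verbose:gc
  -Xloggc:/var/log/solr/gc.log
  -XX:+PrintGCDetails
  -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime

The PrintGCApplicationStoppedTime line is the one that shows how long the
application was actually paused.)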
I think this will not improve indexing performance, but it would probably be
a solution for HDFS HA with a replication factor. But I am not sure about
that.
On Mon, Jul 7, 2014 at 12:53 PM, search engn dev sachinyadav0...@gmail.com
wrote:
Currently I am exploring Hadoop with Solr,
Yeah, the heap is huge; we need to optimize the caches. It was 8 GB
previously; we had to increase it because there were out-of-memory errors.
We are using ConcMarkSweepGC, which is supposed not to stop the world.
We had to disable optimize (previously we ran it via a cron task) because the
index is big and optimize
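(Worth noting: CMS avoids long full collections but still has brief
stop-the-world phases, the initial mark and the remark. As an illustration
only, a typical CMS starting point looks like:

  -XX:+UseConcMarkSweepGC
  -XX:+CMSParallelRemarkEnabled
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+UseCMSInitiatingOccupancyOnly

The occupancy settings make CMS start collecting before the old generation
fills up, which helps avoid falling back to a full stop-the-world GC.)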
I am facing the same issue. After doing a server restart, indexing runs fine
once, but the second time the same issue occurs.
On 3 Jul 2014 23:37, heaven aheave...@gmail.com wrote:
Hi, trying DigitalOcean for Solr, everything seems well, except sometimes I
see these errors:
OK, _where_ is that written? The HdfsDirectoryFactory code?
Someone's blog somewhere? Your notes?
Ali has one part of the answer: using HDFS will redundantly
store your index, which is good.
Furthermore, the MapReduceIndexerTool (see the contribs)
_will_ use HDFS to do the classic M/R indexing
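A typical invocation looks something like this (a sketch; the jar name,
paths, and hosts are placeholders):

  hadoop jar solr-map-reduce-*.jar org.apache.solr.hadoop.MapReduceIndexerTool \
    --morphline-file morphline.conf \
    --zk-host zk1:2181/solr \
    --collection collection1 \
    --output-dir hdfs://namenode:8020/tmp/outdir \
    --go-live \
    hdfs://namenode:8020/indir

It builds index shards with MapReduce and, with --go-live, merges them into
a running SolrCloud collection.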
Hi,
I indexed different types (with different fields) of child docs for every
parent.
I want to facet on a field in one type of child doc, and after that to facet
on a field in a different type of child doc. It doesn't work.
Any idea how I can do something like that?
thanks.
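For background, block-join parent queries in Solr 4.x look roughly like
this (a sketch with made-up field names):

  q={!parent which="doc_type:parent"}child_type:typeA AND color:red
  &facet=true
  &facet.field=some_parent_field

The result set, and therefore any facets, is the matching parent documents,
not the children, which may be why faceting directly on child fields does
not work.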
On 7/7/2014 7:30 AM, heaven wrote:
Yeah, the heap is huge; we need to optimize the caches. It was 8 GB
previously; we had to increase it because there were out-of-memory errors.
We are using ConcMarkSweepGC, which is supposed not to stop the world.
At one time the only thing I was using that was non-default
Hi Shawn - I was just wondering how you resolved this issue in the end. We
are seeing the same issue on our platform (similar heap size and update
volume).
It would be nice if you could provide us with your final findings/configs.
On 7/7/2014 10:22 AM, aferdous wrote:
Hi Shawn - I was just wondering how you resolved this issue in the end. We
are seeing the same issue on our platform (similar heap size and update
volume).
It would be nice if you could provide us with your final findings/configs.
I use the
I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes and
rebalance data accordingly.
Let's add the following constraints:
- 1. boxes have different characteristics (RAM, CPU, disks)
- 2. different number of shards per box/node (let's pretend we have
found the sweet spot for
On 7/7/2014 12:41 PM, Damien Dykman wrote:
I have a cluster of N boxes/nodes and I'd like to add M boxes/nodes
and rebalance data accordingly.
Let's add the following constraints:
- 1. boxes have different characteristics (RAM, CPU, disks)
- 2. different number of shards per box/node
Thanks Shawn, clean way to do it, indeed. And going your route, one
could even copy the existing shards into the new collection and then
delete the data which is getting reindexed on the new nodes. That would
spare reindexing everything.
But in my case, I add boxes after a noticeable
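(One common pattern, not necessarily what was proposed here: create a new
collection laid out across both the old and the new nodes, reindex or copy
the data into it, and then repoint clients with an alias. The names and
numbers below are placeholders:

  curl 'http://host:8983/solr/admin/collections?action=CREATE&name=coll2&numShards=8&replicationFactor=2&maxShardsPerNode=2'
  curl 'http://host:8983/solr/admin/collections?action=CREATEALIAS&name=coll&collections=coll2'

The alias swap makes the cutover atomic from the clients' point of view.)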
Hi, I have a situation where I am applying the search rules below.
When I do a full-text search on the columns for "Product Variant Name", the
exact match has to come first in the results, and other matches (like
product, variant, name, or any combination) should come next.
Any thoughts, why
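One common approach, sketched with made-up field names: copy the searchable
field into a string-typed "exact" field and boost exact hits, e.g. in
schema.xml:

  <field name="name_exact" type="string" indexed="true" stored="false"/>
  <copyField source="name" dest="name_exact"/>

and then query with edismax so exact matches rank first:

  q=Product Variant Name
  &defType=edismax
  &qf=name
  &bq=name_exact:"Product Variant Name"^100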
It is written here
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
And that's exactly what it means. The HdfsDirectoryFactory is intended
to use the HDFS file system to store the Solr (well, actually Lucene)
index. It's (by default) triply redundant, and vastly reduces your
chances of losing your index due to disk errors. That's what HDFS
does.
If that's not
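For reference, the configuration from the Running Solr on HDFS page linked
above looks roughly like this in solrconfig.xml (the HDFS URL is a
placeholder):

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  </directoryFactory>

  <!-- inside <indexConfig>: use the HDFS-aware lock -->
  <lockType>hdfs</lockType>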