hi:
1. Nutch will not copy the files automatically for you. You can control the
copy from your own program.
2. Nutch uses distributed search to perform searches. I don't know exactly
what technology Nutch uses for it, but you can easily configure a distributed
search.
You can find more information at http://wiki.apache.org/nutch/FrontPage
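
For point 1, here is a minimal sketch of doing the copy from a program. It
assumes the `hadoop` command is on your PATH and uses the standard
`hadoop fs -copyToLocal` shell command; the function names and the paths in
the usage comment are hypothetical, not something Nutch provides.

```python
import shutil
import subprocess


def build_copy_command(hdfs_index_dir, local_index_dir, hadoop_bin="hadoop"):
    """Return the argv list that copies an HDFS directory to local disk."""
    return [hadoop_bin, "fs", "-copyToLocal", hdfs_index_dir, local_index_dir]


def copy_index_to_local(hdfs_index_dir, local_index_dir, hadoop_bin="hadoop"):
    """Run the copy; raises CalledProcessError if the hadoop command fails."""
    # Assumption: the Hadoop client is installed on the search machine.
    if shutil.which(hadoop_bin) is None:
        raise FileNotFoundError(f"{hadoop_bin!r} not found on PATH")
    command = build_copy_command(hdfs_index_dir, local_index_dir, hadoop_bin)
    subprocess.run(command, check=True)


# Usage (hypothetical paths, adjust to your own crawl layout):
#   copy_index_to_local("/user/nutch/crawl/index", "/data/nutch/index")
```

After the copy finishes, the searcher can be pointed at the local directory
instead of HDFS.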

On Wed, Jun 30, 2010 at 6:47 PM, 罗磊 <luole...@gmail.com> wrote:
> Hi:
>
> Though I'm really not good at English, I still prefer English to let others
> know what we are talking about.
>
> As cn.jiangmingy...@gmail.com said, Nutch will copy the index files to the
> native filesystem. Could you tell me what technology Nutch uses to search? Is
> RMI or something else used?
>
> Thanks
>
> 2010/6/30 蒋明原 <cn.jiangmingy...@gmail.com>
>
>> hi luo:
>>
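>>
>> Nutch does use Lucene indexes, but the index is kept on HDFS so that the
>> computing power of the Hadoop platform can be used for operations such as
>> merging indexes. Running these operations on Hadoop works much better than
>> processing on a single machine. Once processing is finished, the index can
>> be downloaded to the local filesystem and accessed there; it does not stay
>> on HDFS while the search service is being provided.
>>
>> MapFile and SequenceFile are used so that the data can be processed on the
>> Hadoop platform to eventually produce the index. MapFile and SequenceFile
>> are not the way the index is stored; they hold the raw data, such as page
>> source... (This is only the general idea; see the introduction to MapFile
>> and SequenceFile in Hadoop: The Definitive Guide to learn their
>> characteristics.)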
>> On Wed, Jun 30, 2010 at 10:06 AM, 罗磊 <luole...@gmail.com> wrote:
>> > Hi all:
>> >
>> > I heard that Nutch puts the Lucene index files on HDFS, where they wait
>> > for the searcher. As far as I know, HDFS is not designed for low-latency
>> > access.
>> >
>> > So why does Nutch put the index files on HDFS? Why not store them on the
>> > local filesystem and use a normal RPC to call the search function?
>> >
>> > I also heard that Nutch uses MapFile a lot. Do you think putting that data
>> > on HBase would be a good alternative?
>> >
>> > Thank you in advance
>> >
>>
>
