Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread Jeff Zhang
Hi Wang, The keys of a MapFile must be in sorted order, so when you add records to a MapFile, make sure you insert them in order. Best Regards, Jeff Zhang On Wed, Oct 28, 2009 at 4:14 PM, lei wang wrote: > Hi, friends > I need store the web pages(a huge one) in the MapFile of the hadoop,

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread lei wang
But my "url" keys are not in order. Must the key be an IntWritable? Should it be comparable? How do I make sure they are in order: sort them first? I just want to insert the pages for random access by "url". On Wed, Oct 28, 2009 at 4:26 PM, Jeff Zhang wrote: > Hi Wang, > > The keys of MapFile should be in
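The ordering question above can be illustrated with a small sketch. It is not Hadoop code: `SortedWriter` and `write_sorted` are hypothetical names, and plain Python string comparison stands in for the key comparator. It assumes the pages fit in memory, so they can be sorted by URL before being appended; a writer that requires ascending keys would reject them otherwise. The point is only that any key type with a defined ordering can be used this way, so the key does not have to be an integer.

```python
class SortedWriter:
    """Toy append-only writer that rejects out-of-order keys.

    Hypothetical illustration only; this is not a Hadoop class.
    """

    def __init__(self):
        self.records = []  # list of (key, value) pairs in ascending key order

    def append(self, key, value):
        # Refuse any key that sorts before the last key already written.
        if self.records and key < self.records[-1][0]:
            last = self.records[-1][0]
            raise ValueError(f"key out of order: {key!r} after {last!r}")
        self.records.append((key, value))


def write_sorted(pages):
    """Sort the url -> page mapping by URL, then append in that order."""
    writer = SortedWriter()
    for url in sorted(pages):
        writer.append(url, pages[url])
    return writer
```

Calling `write_sorted` on an unordered dictionary of pages succeeds, while appending the same URLs to a `SortedWriter` in arbitrary order raises `ValueError` as soon as a smaller key follows a larger one.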

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread Jeff Zhang
I do not know why you need to use MapFile; could you use SequenceFile instead? MapFile's advantage is its read performance, because it builds an index on its keys, so its keys must be in order. If you really want to use MapFile, you can first write your data to a SequenceFile and then convert it to Ma
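Why an index over keys needs them sorted can be shown with a toy model. This is a sketch under stated assumptions, not the MapFile implementation: `build_index` and `lookup` are hypothetical helpers, the index interval of 2 is arbitrary, and keys are assumed unique. The index stores only every few keys with their positions; a lookup finds the nearest indexed key at or before the target by binary search, then scans a short run of records. Both steps are only correct if the records are in ascending key order.

```python
import bisect


def build_index(records, interval=2):
    """Record every `interval`-th key together with its position.

    `records` must be a list of (key, value) pairs sorted by key.
    """
    return [(records[i][0], i) for i in range(0, len(records), interval)]


def lookup(records, index, key, interval=2):
    """Return the value stored under `key`, or None if it is absent."""
    index_keys = [k for k, _ in index]
    # Last indexed key that is <= the key we are looking for.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first record
    start = index[slot][1]
    # Scan only the records between this index entry and the next one.
    for k, v in records[start:start + interval]:
        if k == key:
            return v
        if k > key:
            break  # sorted order lets us stop early
    return None
```

With sorted records, each lookup touches one index probe and at most `interval` records instead of the whole file, which is the read-performance advantage described above.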

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread lei wang
Hi Jeff, thanks for your comments. I read this book earlier. I use MapFile to store my web pages for random access. At first I considered the SequenceFile conversion as a solution; however, the problem is that I need to append new pages to the MapFile every minute or second, so I didn't think SequenceFile

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread Jeff Zhang
I guess HBase may be a good fit for you. HBase is a distributed database built on Hadoop. You can use the URL as the row key and put the other fields into columns. Then you can retrieve a web page through the HBase client API and insert new web pages into it. The performance of HBase 0.20 is good eno

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread lei wang
Oh, I tried HBase earlier. But I think HDFS may give me another option. Thanks. On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang wrote: > I guess maybe HBase will be fit for you. HBase is a distributed database > built upon Hadoop. > You can use the url as the row key and put other fields int

RE: Need Help: The problem with text key of MapFile

2009-10-28 Thread Lori Ann Martin
heck out www.HiQube.com or www.pbsgridworks.com -Original Message- From: lei wang [mailto:hadoopmaill...@gmail.com] Sent: Wednesday, October 28, 2009 7:22 PM To: general@hadoop.apache.org Subject: Re: Need Help: The problem with text key of MapFile Oh, I have tried hbase in the early

Re: Need Help: The problem with text key of MapFile

2009-10-28 Thread lei wang
8, 2009 7:22 PM > To: general@hadoop.apache.org > Subject: Re: Need Help: The problem with text key of MapFile > > Oh, I have tried hbase in the early. > But I think HDFS may give me a choice. > Thanks. > > On Thu, Oct 29, 2009 at 10:16 AM, Jeff Zhang wrote: > > >