data in yahoo / facebook hdfs

2009-06-13 Thread PORTO aLET
Hi, I am just wondering what Facebook/Yahoo do with the data in HDFS after they finish processing the log files (or whatever else is stored there). Is it simply deleted, or is it backed up to tape? What is the typical process? Also, what is the process for adding a new node to a Hadoop cluster?

hadoop hardware suggestion

2009-06-09 Thread PORTO aLET
Hi, I am trying to set up a Hadoop cluster to process our Apache logs (about 500 MB a day). I am not sure what kind of PC configuration I should use. We have a few Windows XP machines (about 100+, too many to process 'just' 500 MB?) that I am thinking of using sparingly (during the night) to

hadoop MapReduce and stop words

2009-05-16 Thread PORTO aLET
Hi, I am trying to include stop words in Hadoop MapReduce, and later on, in Hive. What is the accepted solution for handling stop words in Hadoop? All I can think of is to load all the stop words into an array in the mapper, and then check each token against the stop words (this would be

Re: hadoop MapReduce and stop words

2009-05-16 Thread PORTO aLET
did similar with polygon indexes and point data. It requires careful memory planning on the nodes if the indexes are large (mine were several GB). Just a thought, Tim On Sat, May 16, 2009 at 1:56 PM, PORTO aLET portoa...@gmail.com wrote: Hi, I am trying to include the stop words
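
The approach described in this thread (load the stop words once in the mapper, then check each token against them) can be sketched roughly as below. This is only an illustration, not code from the thread: the word list, `load_stop_words`, and `map_line` are hypothetical names, and a set is swapped in for the array the question mentions so that each lookup is a hash check rather than a scan. The whole list sits in memory on every mapper, which is the memory-planning concern the reply raises for large indexes.

```python
# Hypothetical stop-word list; a real job would load it from a file
# made available to every node.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def load_stop_words(lines):
    """Build a stop-word set from an iterable of lines, one word per line."""
    return frozenset(w.strip().lower() for w in lines if w.strip())


def map_line(line, stop_words=STOP_WORDS):
    """Yield (token, 1) for every whitespace-separated token that is not a stop word."""
    for token in line.lower().split():
        if token not in stop_words:
            yield token, 1
```

A streaming-style mapper could call `map_line` on each line it reads from standard input and write the resulting pairs as tab-separated key/value lines.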

Indexing pdfs and docs

2009-05-14 Thread PORTO aLET
Hi, my company has about 50 GB of PDFs and docs, and we would like to offer text search over them through a web interface. Is there a good tutorial that covers the hardware requirements and software specs for this? Regards

Make money from Hadoop ?

2009-05-08 Thread PORTO aLET
Hi all, just wondering if anybody has any ideas about making money with Hadoop, e.g. founding a company that provides a DFS/MapReduce service, or something like that? Or maybe something else?