Re: Most Common ways to load data into Hadoop in production systems

2010-07-23 Thread Gautam
On Wed, Jul 21, 2010 at 10:25 PM, Edward Capriolo wrote: On Wed, Jul 21, 2010 at 12:42 PM, Xavier Stevens wrote: Hi Urckle, A lot of the more "advanced" setups just record data directly to HDFS to start with. You have to write some custom code using the HDFS API but that

Re: Most Common ways to load data into Hadoop in production systems

2010-07-21 Thread Urckle
On 21/07/2010 17:55, Edward Capriolo wrote: On Wed, Jul 21, 2010 at 12:42 PM, Xavier Stevens wrote: Hi Urckle, A lot of the more "advanced" setups just record data directly to HDFS to start with. You have to write some custom code using the HDFS API but that way you don't need to import

Re: Most Common ways to load data into Hadoop in production systems

2010-07-21 Thread Edward Capriolo
On Wed, Jul 21, 2010 at 12:42 PM, Xavier Stevens wrote: Hi Urckle, A lot of the more "advanced" setups just record data directly to HDFS to start with. You have to write some custom code using the HDFS API but that way you don't need to import large masses of data. People also use "

Re: Most Common ways to load data into Hadoop in production systems

2010-07-21 Thread Urckle
Hi Xavier, thanks for replying. Your input is very much appreciated! This is exactly what I need. Thanks again, Regards Enthusiastic Hadoop newbie!! :-D On 21/07/2010 17:42, Xavier Stevens wrote: Hi Urckle, A lot of the more "advanced" setups just record data directly to HDFS to start w

Re: Most Common ways to load data into Hadoop in production systems

2010-07-21 Thread Xavier Stevens
Hi Urckle, A lot of the more "advanced" setups just record data directly to HDFS to start with. You have to write some custom code using the HDFS API but that way you don't need to import large masses of data. People also use "distcp" to do large-scale imports, but if you're hitting something l

Most Common ways to load data into Hadoop in production systems

2010-07-21 Thread Urckle
Hi, I have a newbie question. Scenario: Hadoop version: 0.20.2. MR coding will be done in Java. Just starting out with my first Hadoop setup. I would like to know whether there are any best-practice ways to load data into the DFS. I have (obviously) manually put data files into HDFS using the shell co