You don't have to copy the data to local storage to do a count.
hdfs dfs -cat file1 | wc -l
will do the job.
On Fri, Dec 12, 2014 at 1:58 AM, Susheel Kumar Gadalay wrote:
>
> Simple solution..
>
> Copy the HDFS file to local and use OS commands to count the number of lines
>
> cat file1 | wc -l
>
> and cut it based on line number.
Try Cascading multitool: http://docs.cascading.org/multitool/2.6/
- André
On Fri, Dec 12, 2014 at 10:30 AM, unmesha sreeveni wrote:
> I am trying to divide my HDFS file into 2 parts/files:
> 80% and 20% for a classification algorithm (80% for modelling and 20% for
> prediction).
> Please provide suggestions for the same.
How about doing something along the lines of bucketing: pick a field that is
unique for each record; if the hash of that field mod 10 is less than 8 the
record goes into the 80% bin, otherwise into the 20% bin.
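A minimal mapper sketch of that idea, assuming newline-delimited text whose
first tab-separated field is the unique record id (the class name, field
position, and "model"/"predict" key names are my own placeholders):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BucketSplitMapper extends Mapper<LongWritable, Text, Text, Text> {

  private static final Text MODEL = new Text("model");
  private static final Text PREDICT = new Text("predict");

  @Override
  protected void map(LongWritable offset, Text row, Context context)
      throws IOException, InterruptedException {
    // Assumption: the first tab-separated field is the unique record id.
    String id = row.toString().split("\t", 2)[0];
    // Mask the sign bit; Math.abs(Integer.MIN_VALUE) is still negative.
    int bucket = (id.hashCode() & Integer.MAX_VALUE) % 10;
    // Buckets 0-7 (~80%) feed the model set, 8-9 (~20%) the prediction set.
    context.write(bucket < 8 ? MODEL : PREDICT, row);
  }
}

A reducer (or MultipleOutputs) can then write each key's rows to separate
files; since the hash is deterministic, a record always lands in the same bin
across reruns.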
Cheers
Chris
On Dec 12, 2014 1:32 AM, "unmesha sreeveni" wrote:
> I am trying to divide my HDFS file into 2 parts/files, 80% and 20%, for a
> classification algorithm.
Hi Unmesha
With the random approach you don't need to write the MR job for counting.
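A minimal map-only sketch of that random approach, assuming plain text input
(the class and the "train"/"test" key names are just placeholders):

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RandomSplitMapper extends Mapper<LongWritable, Text, Text, Text> {

  private static final Text TRAIN = new Text("train");
  private static final Text TEST = new Text("test");
  private final Random random = new Random();

  @Override
  protected void map(LongWritable offset, Text row, Context context)
      throws IOException, InterruptedException {
    // ~80% of rows go to the "train" key, the rest to "test";
    // no record count is needed up front.
    context.write(random.nextDouble() < 0.8 ? TRAIN : TEST, row);
  }
}

The split is only approximately 80/20, but on a large data set the deviation
is negligible.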
Mikael.s
-Original Message-
From: "Hitarth"
Sent: 12/12/2014 15:20
To: "user@hadoop.apache.org"
Subject: Re: Split files into 80% and 20% for building model and prediction
>> In the mapper, call a random number generator; if the number it returns is
>> below 0.8, the row goes to the key for training, otherwise to the key for
>> the test.
>> Mikael.s
>> From: Susheel Kumar Gadalay
>> Sent: 12/12/2014 12:00
>> To: user@hadoop.apache.org
>> Subject: Re: Split files into 80% and 20% for building model and prediction
From: "Susheel Kumar Gadalay"
Sent: 12/12/2014 12:00
To: "user@hadoop.apache.org"
Subject: Re: Split files into 80% and 20% for building model and prediction
Simple solution..
Copy the HDFS file to local and use OS commands to count the number of lines
cat file1 | wc -l
and cut it based on line number.
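For the cut itself, a throwaway local sketch (file names are placeholders;
it reads everything into memory, so it only suits files that fit):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class LineSplit {
  public static void main(String[] args) throws IOException {
    List<String> lines =
        Files.readAllLines(Paths.get("file1"), StandardCharsets.UTF_8);
    int cut = (int) (lines.size() * 0.8); // first 80% of the lines
    Files.write(Paths.get("file1.model"), lines.subList(0, cut));
    Files.write(Paths.get("file1.predict"), lines.subList(cut, lines.size()));
  }
}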
On 12/12/14, unmesha sreeveni wrote:
> I am trying to divide my HDFS file into 2 parts/files
> 80% and 20% for classification algorithm (80% for modelling and 20% for
> prediction).
I am trying to divide my HDFS file into 2 parts/files:
80% and 20% for a classification algorithm (80% for modelling and 20% for
prediction).
Please provide suggestions for the same.
To take 80% and 20% into 2 separate files we need to know the exact number of
records in the data set, and that is only known after a full pass over the
data to count them.