Re: how to parse a sequence file in my local filesystem

2011-06-27 Thread ling cao
Maybe I didn't describe my question clearly. I know the hadoop fs command can do it, but I need to parse the file without an HDFS environment. The file is on my disk (for example: D://test.seq). How do I write a Java class to parse it? 2011/6/27 Joey Echeverria > If the data is text you can always prin

Loading seq file into hive

2011-06-27 Thread Mapred Learn
Hi, I have seq files where the key is the line number and the value is Ctrl-B delimited text. A sample value is: 45454^B567^Brtrt^B-7.8 56577^B345^Bdrtd^B-0.9 When I create a table like: create table temp_seq (no. int, code string, rank string, amt string) row format delimited fields terminated by '\002' lines

Re: How to select random n records using mapreduce ?

2011-06-27 Thread David Rosenstrauch
Building on this, you could do something like the following to make it more random: if (numRecordsWritten < NUM_RECORDS_DESIRED) { int n = generateARandomNumberBetween1and100(); if (n == 100) { context.write(key, value); } } The above would somewhat rando
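A minimal plain-Java sketch of the filter described above, with no Hadoop dependency. It assumes the counter is incremented on every write, which the quoted snippet does not show. The class name CappedRandomFilter, the method names, and the seed parameter are hypothetical; a list append stands in for context.write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: lets each record through with a 1-in-100 chance
// until a fixed number of records has been written.
class CappedRandomFilter {
    private final int numRecordsDesired;
    private final Random random;
    private int numRecordsWritten = 0;

    CappedRandomFilter(int numRecordsDesired, long seed) {
        this.numRecordsDesired = numRecordsDesired;
        this.random = new Random(seed);
    }

    // Returns true if the current record should be written.
    boolean accept() {
        if (numRecordsWritten >= numRecordsDesired) {
            return false; // cap reached, discard everything else
        }
        int n = random.nextInt(100) + 1; // uniform in 1..100
        if (n == 100) {
            numRecordsWritten++; // assumption: count each accepted record
            return true;
        }
        return false;
    }

    // Stand-in for a map task: keeps the accepted records in a list.
    static List<String> filter(List<String> records, int desired, long seed) {
        CappedRandomFilter f = new CappedRandomFilter(desired, seed);
        List<String> out = new ArrayList<>();
        for (String record : records) {
            if (f.accept()) {
                out.add(record);
            }
        }
        return out;
    }
}
```

Because the cap stops the selection as soon as enough records are written, records near the start of the input are more likely to be picked than records near the end, so the result is only loosely random.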

Re: How to select random n records using mapreduce ?

2011-06-27 Thread Anthony Urso
On Mon, Jun 27, 2011 at 12:11 AM, Jeff Zhang wrote: > > Hi all, > I'd like to select random N records from a large amount of data using > hadoop, just wonder how can I achieve this ? Currently my idea is that let > each mapper task select N / mapper_number records. Does anyone have such > experien

Re: How to select random n records using mapreduce ?

2011-06-27 Thread Niels Basjes
The only solution I can think of is by creating a counter in Hadoop that is incremented each time a mapper lets a record through. As soon as the value reaches a preselected value the mappers simply discard the additional input they receive. Note that this will not at all be random yet it's the
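The "let records through until a preselected value is reached" idea can be sketched in plain Java as below. This is an in-process stand-in only: an AtomicLong plays the role of the shared counter, and nothing here claims that map tasks in a real job can read a job-wide counter value while running. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gate shared by several workers: at most `limit` records pass.
class SharedLimitGate {
    private final long limit;
    private final AtomicLong passed = new AtomicLong();

    SharedLimitGate(long limit) {
        this.limit = limit;
    }

    // Returns true if the caller may let one more record through.
    boolean tryPass() {
        while (true) {
            long current = passed.get();
            if (current >= limit) {
                return false; // preselected value reached, discard the record
            }
            // Only one caller wins the increment, so the limit is never exceeded.
            if (passed.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    long passedCount() {
        return passed.get();
    }
}
```

As the message says, this selects the first records that arrive rather than a random subset.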

Re: Resend -> how to load sequence file with decimal data

2011-06-27 Thread Mapred Learn
Hi Steven, With load data you give some info about data also. As in Tom White's book: create external table external_table(dummy string) location load data Now dummy string is a field in this data. Similarly, what I have is a decimal field. How do I specify it in the create command ? On

Re: how to parse a sequence file in my local filesystem

2011-06-27 Thread Joey Echeverria
If the data is text you can always print out the sequence file using this command: hadoop fs -text file:///my/directory/file.seq This will parse the sequence file, convert each key and value to a string and print it to stdout. Notice the file:// in the path, that will cause hadoop to access the l

how to parse a sequence file in my local filesystem

2011-06-27 Thread ling cao
Hi, I have a small sequence file (about 1k) which is produced by a Hive job, and I need to parse it in my local filesystem, not in HDFS. Is there any easy way to do it? Thanks

How to select random n records using mapreduce ?

2011-06-27 Thread Jeff Zhang
Hi all, I'd like to select N random records from a large amount of data using Hadoop, and I wonder how I can achieve this. Currently my idea is to let each mapper task select N / mapper_number records. Does anyone have such experience? -- Best Regards Jeff Zhang
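One way a single mapper could pick its N / mapper_number share without knowing its input size in advance is reservoir sampling. This technique is not named in the message; it is swapped in here as an illustration. The sketch below is plain Java with hypothetical names (ReservoirSampler, sample), and it does not cover combining the per-mapper samples into the final N records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: keeps a uniform random sample of up to k records
// from a stream whose length is not known in advance.
class ReservoirSampler {
    static <T> List<T> sample(Iterable<T> records, int k, Random random) {
        List<T> reservoir = new ArrayList<>();
        if (k <= 0) {
            return reservoir;
        }
        int seen = 0; // records read so far (an int is enough for a sketch)
        for (T record : records) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(record); // fill the reservoir first
            } else {
                // Replace a random slot with probability k / seen.
                int j = random.nextInt(seen); // uniform in 0..seen-1
                if (j < k) {
                    reservoir.set(j, record);
                }
            }
        }
        return reservoir;
    }
}
```

In a mapper, k would be N / mapper_number and the reservoir would be emitted once the input is exhausted. Equal per-mapper quotas give a uniform overall sample only if every mapper reads about the same number of records.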