Maybe I didn't describe my question clearly. I know the hadoop fs command
can do it,
but I need to parse the file without an HDFS environment;
the file is on my local disk (for example: D://test.seq).
How do I write a Java class to parse it?
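A minimal sketch of such a class, assuming the classic (pre-2.x) SequenceFile.Reader constructor; the key and value classes are read from the file's own header, and the path is the one from the question:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class LocalSeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // getLocal() gives a FileSystem over the local disk; no HDFS needed.
        FileSystem fs = FileSystem.getLocal(conf);
        Path path = new Path("D://test.seq"); // the path from the question
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            // The key/value classes are recorded in the sequence file header.
            Writable key = (Writable)
                ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable)
                ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}
```

Compiling this only needs the Hadoop core jar on the classpath; no running cluster is required.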
2011/6/27 Joey Echeverria
> If the data is text you can always print out the sequence file using
> this command:
Hi,
I have sequence files where the key is the line number and the value is
Ctrl-B (\002) delimited text.
A sample value is:
45454^B567^Brtrt^B-7.8
56577^B345^Bdrtd^B-0.9
When I create a table like:
create table temp_seq (no. int, code string, rank string, amt string)
row format delimited fields terminated by '\002' lines
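For what it's worth, a complete version of that statement might look like the following. Note that `no.` is not a valid Hive identifier because of the dot (`no` or `line_no` would work), and the trailing clauses here are assumptions:

```sql
CREATE TABLE temp_seq (line_no INT, code STRING, rank STRING, amt STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\002'
  LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE;
```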
Building on this, you could do something like the following to make it
more random:
if (numRecordsWritten < NUM_RECORDS_DESIRED) {
  int n = generateARandomNumberBetween1and100();
  if (n == 100) {
    context.write(key, value);
  }
}
The above would somewhat randomize which records get selected.
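A self-contained toy version of that 1-in-100 filter (the class and method names are invented for illustration; in a real mapper the kept records would go to context.write rather than into a list):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomFilter {
    // Keep each record with probability 1/100, up to a desired cap.
    public static List<String> sample(List<String> records, int desired, long seed) {
        Random rnd = new Random(seed);
        List<String> kept = new ArrayList<>();
        for (String record : records) {
            if (kept.size() >= desired) {
                break; // stop once we have enough records
            }
            int n = rnd.nextInt(100) + 1; // a random number between 1 and 100
            if (n == 100) {               // ~1% of records pass the filter
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            input.add("record-" + i);
        }
        List<String> out = sample(input, 500, 42L);
        System.out.println("kept " + out.size() + " records");
    }
}
```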
On Mon, Jun 27, 2011 at 12:11 AM, Jeff Zhang wrote:
>
> Hi all,
> I'd like to select N random records from a large amount of data using
> Hadoop; I just wonder how I can achieve this. Currently my idea is to let
> each mapper task select N / mapper_number records. Does anyone have such
> experience?
The only solution I can think of is creating a counter in Hadoop
that is incremented each time a mapper lets a record through.
As soon as the value reaches a preselected threshold, the mappers simply
discard the additional input they receive.
Note that this will not be random at all, yet it's the
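A toy, single-process sketch of that counter idea, using an AtomicLong in place of a real Hadoop counter. Note that in a real job, counter values are not reliably visible across concurrently running mappers, so treat this purely as the shape of the logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class CappedEmitter {
    private final AtomicLong written = new AtomicLong();
    private final long cap;
    private final List<String> output = new ArrayList<>();

    public CappedEmitter(long cap) {
        this.cap = cap;
    }

    // Emit the record only while the shared counter is below the cap;
    // every record after that is silently discarded.
    public void maybeEmit(String record) {
        if (written.incrementAndGet() <= cap) {
            output.add(record); // stands in for context.write(key, value)
        }
    }

    public List<String> getOutput() {
        return output;
    }

    public static void main(String[] args) {
        CappedEmitter emitter = new CappedEmitter(10);
        for (int i = 0; i < 100; i++) {
            emitter.maybeEmit("record-" + i);
        }
        System.out.println("emitted " + emitter.getOutput().size());
    }
}
```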
Hi Steven,
With LOAD DATA you also give some info about the data. As in Tom White's book:
create external table external_table(dummy string)
location
load data
Now dummy string is a field in this data. Similarly, what I have is a decimal
field. How do I specify it in the CREATE command?
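If I read the question right, the typed field simply goes into the column list of the CREATE statement. A hypothetical sketch follows; the table name, column names, paths, and the tab delimiter are all assumptions, and Hive versions before 0.11 have no DECIMAL type, so DOUBLE stands in here:

```sql
CREATE EXTERNAL TABLE external_table (dummy STRING, amt DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/external_table';

LOAD DATA INPATH '/user/data/input.txt' INTO TABLE external_table;
```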
If the data is text you can always print out the sequence file using
this command:
hadoop fs -text file:///my/directory/file.seq
This will parse the sequence file, convert each key and value to a
string, and print them to stdout. Notice the file:// in the path; that
will cause Hadoop to access the local filesystem.
Hi,
I have a small sequence file (about 1 KB) which is produced by a Hive job, and
I need to parse it on my local filesystem, not in HDFS.
Is there any easy way to do it?
Thanks
Hi all,
I'd like to select N random records from a large amount of data using
Hadoop; I just wonder how I can achieve this. Currently my idea is to let
each mapper task select N / mapper_number records. Does anyone have such
experience?
--
Best Regards
Jeff Zhang
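For the original question of picking exactly N uniformly random records, a different technique worth naming is reservoir sampling (Algorithm R). The sketch below is single-process; on Hadoop one could run it inside each mapper and then re-sample the combined mapper outputs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {
    // Classic Algorithm R: keep a uniform sample of n items from a stream.
    public static List<String> sample(Iterable<String> stream, int n, long seed) {
        Random rnd = new Random(seed);
        List<String> reservoir = new ArrayList<>(n);
        long seen = 0;
        for (String item : stream) {
            seen++;
            if (reservoir.size() < n) {
                reservoir.add(item);  // fill the reservoir first
            } else {
                // Replace a random slot with probability n / seen.
                long j = (long) (rnd.nextDouble() * seen);
                if (j < n) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            input.add("record-" + i);
        }
        List<String> picked = sample(input, 5, 7L);
        System.out.println(picked);
    }
}
```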