Put the input path like: dir1/type1*.txt
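A minimal sketch of that suggestion, using the 0.17-era mapred API (the class name GlobInputExample is a placeholder, not from the thread):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class GlobInputExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(GlobInputExample.class);
        // FileInputFormat expands the glob when computing input splits,
        // so only the dir1/type1*.txt files are fed to the mappers.
        FileInputFormat.setInputPaths(conf, new Path("dir1/type1*.txt"));
    }
}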
Hi,
I need help setting up my map-reduce job to consider only a certain type
of files as input in a specific directory.
For example, suppose there is a directory dir1 and I have files like
type1_1.txt
type1_2.txt
type1_3.txt
type2_1.txt
type2_2.txt
and if I want the job to consider only the type1 files, how do I set the
input path?
Our ultimate goal is to basically replicate the gigablast.com search engine.
They claim to have fewer than 500 servers holding 10 billion pages
indexed, spidered, and updated on a routine basis... I am looking at
featuring 500 million pages indexed per node, with a total of 20 nodes.
Each node
On Wed, Jun 04, 2008 at 07:56:45PM -0700, Otis Gospodnetic said:
The videos from the Hadoop summit are still not available:
http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html
And at this point it looks like they never will be available :(
I followed the link
Aha, I see, I see, the videos were added to http://research.yahoo.com/node/2104
. When I checked that page last time there were only slides there. Thanks.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message -----
From: Chris Doherty [EMAIL PROTECTED]
Arun/John, Thanks for the update.
For security reasons, we also need to encrypt the files; there is no
support for encryption currently, so we will have to roll our own.
Again, I'd like to know if anybody here does encryption, and if yes, what
algorithm you use and how key/password distribution is handled.
On 6/5/08 11:38 AM, Ted Dunning [EMAIL PROTECTED] wrote:
We use encryption on log files using standard AES. I wrote an input format
to deal with it.
Key distribution should be done better than we do it. My preference would
be to insert an auth key into the job conf which is then used by
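The message is cut off here, but a minimal sketch of the idea -- standard
AES via javax.crypto, with the key carried in the job conf -- might look
like this (the conf key name job.aes.key and the class name are
hypothetical, not from the thread):

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.mapred.JobConf;

public class AesJobKey {
    // Stash the raw AES key in the job conf, Base64-encoded.
    public static void putKey(JobConf conf, byte[] rawKey) {
        conf.set("job.aes.key", new String(Base64.encodeBase64(rawKey)));
    }

    // Rebuild a decrypting Cipher inside a task from the same conf entry.
    public static Cipher decryptCipher(JobConf conf) throws Exception {
        byte[] raw = Base64.decodeBase64(conf.get("job.aes.key").getBytes());
        // "AES" alone defaults to ECB; a real deployment should pick an
        // explicit mode and IV.
        Cipher c = Cipher.getInstance("AES");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(raw, "AES"));
        return c; // wrap the file stream in a javax.crypto.CipherInputStream
    }
}

As the next message points out, anything placed in the job conf is readable
by others, which is exactly why the key would need to be time limited.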
Security and hadoop are not particularly compatible concepts. Things may
improve when user authentication exists. The lack of security on job confs
is the major motivation for making sure the auth is time limited. If and
when something like kerberos user authentication exists, then kerberos
On 6/5/08 11:57 AM, Ted Dunning [EMAIL PROTECTED] wrote:
Can you suggest an alternative way to communicate a secret to hadoop tasks
short of embedding it into source code?
This is one of the reasons why we use HOD -- job isolation helps
prevent data leaks from one job to the
On Wed, Jun 4, 2008 at 6:52 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
With the current compression codecs available in Hadoop (zlib/gzip/lzo), it
is not possible to split up a compressed file and then process it in a
parallel manner. However, once we get bzip2 to work, we could split up the
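The message is cut off here, but the reason is visible in the input format
itself. A sketch of the 0.17-era check, paraphrased from memory rather than
quoted, so treat it as an approximation:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.mapred.JobConf;

public class SplitCheck {
    // TextInputFormat declines to split any file whose name matches a
    // registered compression codec suffix (.gz, .lzo, ...); the whole
    // file then becomes a single map task.
    public static boolean isSplitable(JobConf conf, Path file) {
        return new CompressionCodecFactory(conf).getCodec(file) == null;
    }
}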
Hi,
Can I use MapWritable as an output value of a Reducer?
If yes, how will the (key, value) pairs in the MapWritable object be
written to the file? What output format should I use in this case?
Further, I want to chain the output of the first map reduce job to another
map reduce job,
Yes, that is what I meant.
Not particularly good, but possibly the best we can do with hadoop (for a
while). If hadoop handles the ticket for us in a secure way, then I would
feel better.
On Thu, Jun 5, 2008 at 3:40 PM, Haijun Cao [EMAIL PROTECTED] wrote:
If and when something like kerberos
I believe the (key, value) structure is the same for both the input and
the output file. In this case, you can consider the job flow.
Like below,
JobConf confA = new JobConf(A.class);
confA.setJobName("A");
confA.setOutputKeyClass(Text.class);
confA.setOutputValueClass(IntWritable.class);
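A hedged sketch of the full chaining, continuing the snippet above. The A
and B classes and the paths are placeholders for the user's own jobs, the
mapper/reducer setup is omitted, and the static FileInputFormat and
FileOutputFormat setters assume the 0.17-era API:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ChainAB {
    public static void main(String[] args) throws Exception {
        JobConf confA = new JobConf(A.class);   // A = first job's class (placeholder)
        confA.setJobName("A");
        confA.setOutputKeyClass(Text.class);
        confA.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(confA, new Path("in"));
        FileOutputFormat.setOutputPath(confA, new Path("tmpAB"));
        JobClient.runJob(confA);                // blocks until job A finishes

        JobConf confB = new JobConf(B.class);   // B = second job's class (placeholder)
        confB.setJobName("B");
        FileInputFormat.setInputPaths(confB, new Path("tmpAB"));  // job A's output
        FileOutputFormat.setOutputPath(confB, new Path("out"));
        JobClient.runJob(confB);
    }
}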
I noticed that the local bytes written/read stat in my map-reduce job is
really high: 2x, 3x, 4x the HDFS bytes.
When does the Hadoop mapred framework write to the local FS? Is it done
when the JVM memory is not enough and data is spilled to disk? How can I
configure it so that it does not spill to disk?
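For context, and hedged as my understanding of the 0.17-era framework: map
output is always sorted through an in-memory buffer and spilled to the
local FS, and the reduce side also fetches map output to local disk before
merging, so local bytes cannot be driven to zero -- only reduced by giving
the sort more memory. A sketch of the relevant knobs (the values here are
illustrative, not recommendations):

import org.apache.hadoop.mapred.JobConf;

public class SpillTuning {
    public static void tune(JobConf conf) {
        conf.setInt("io.sort.mb", 200);     // bigger map-side sort buffer => fewer spills
        conf.setInt("io.sort.factor", 100); // merge more spill files per pass
    }
}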