Re: FileSystem Caching in Hadoop

2009-10-06 Thread Todd Lipcon
I think this is the wrong angle to go about it - like you mentioned in your first post, the Linux file system cache *should* be taking care of this for us. That it is not is a fault of the current implementation and not an inherent problem. I think one solution is HDFS-347 - I'm putting the finish

Re: Creating Lucene index in Hadoop

2009-10-06 Thread ctam
Hi Ning, I am also looking at different approaches to indexing with Hadoop. I could index into HDFS using the contrib package for Hadoop, but since it's not designed for random access, what would be the other recommended ways to move them to the local file system? Also what would be the best approach to b
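For the "move them to the local file system" part, one straightforward route is FileSystem.copyToLocalFile; a minimal sketch, where the index paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyIndexToLocal {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical paths: the HDFS directory holding the built index,
    // and a local directory a searcher can open for random access.
    Path hdfsIndex = new Path("/user/hadoop/index-shard-0");
    Path localIndex = new Path("/data/lucene/index-shard-0");
    fs.copyToLocalFile(hdfsIndex, localIndex);
  }
}
```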

Re: A question on dfs.safemode.threshold.pct

2009-10-06 Thread Manhee Jo
Now it's clear. Thank you, Raghu. But if you set it to 1.1, the safemode is permanent :). Thanks, Manhee - Original Message - From: "Raghu Angadi" To: Sent: Wednesday, October 07, 2009 10:03 AM Subject: Re: A question on dfs.safemode.threshold.pct I am not sure what the real concer

Re: A question on dfs.safemode.threshold.pct

2009-10-06 Thread Tsz Wo (Nicholas), Sze
If I remember correctly, having dfs.safemode.threshold.pct = 1 may lead to a problem where the Namenode does not leave safemode because of floating-point round-off errors. Having dfs.safemode.threshold.pct > 1 means that the Namenode can never exit safemode since the threshold is not achievable. Nicholas Sze
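A quick sketch of why a threshold above 1.0 can never be met, assuming safe mode is exited once reportedBlocks / totalBlocks reaches the configured fraction (illustrative arithmetic, not NameNode code):

```java
public class SafemodeThresholdCheck {
  public static void main(String[] args) {
    float threshold = 1.1f;            // dfs.safemode.threshold.pct set above 1
    long totalBlocks = 1000000L;
    long reportedBlocks = totalBlocks; // even with every single block reported
    double ratio = (double) reportedBlocks / totalBlocks; // tops out at 1.0
    System.out.println("can leave safemode: " + (ratio >= threshold)); // always false
  }
}
```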

Re: A question on dfs.safemode.threshold.pct

2009-10-06 Thread Raghu Angadi
I am not sure what the real concern is... You can set it to 1.0 (or even 1.1 :)) if you prefer. Many admins do. Raghu. On Tue, Oct 6, 2009 at 5:20 PM, Manhee Jo wrote: > Thank you, Raghu. > Then, when the percentage is below 0.999, how can you tell > if some datanodes are just slower than other

Re: Having multiple values in Value field

2009-10-06 Thread akshaya iyengar
Thanks Tom. The link really was helpful. The CSVs were getting nasty to handle. On Tue, Oct 6, 2009 at 12:27 PM, Tom Chen wrote: > Hi Akshaya, > > Take a look at the yahoo hadoop tutorial for custom data types. > > http://developer.yahoo.com/hadoop/tutorial/module5.html#types > > It's actually

Reading a block of data in Map function

2009-10-06 Thread akshaya iyengar
I am wondering how to read a block of data in Map. I have a file with a single number on every line and I wish to calculate some statistics. Once the file is divided into blocks and sent to different nodes by hadoop, is it possible to read a chunk of the data in each map function? Right now each m
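One common pattern for this, sketched under the assumption that each input line holds a single number: accumulate partial statistics across all map() calls for the split and emit them once in cleanup(), then let a single reducer combine the partials. Class and output key names below are illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Accumulates count and sum over every line of the split, then emits the
// partial aggregates once in cleanup(); a single reducer can combine them.
public class StatsMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
  private long count = 0;
  private double sum = 0.0;

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    sum += Double.parseDouble(value.toString().trim());
    count++;
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    context.write(new Text("count"), new DoubleWritable(count));
    context.write(new Text("sum"), new DoubleWritable(sum));
  }
}
```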

Re: FileSystem Caching in Hadoop

2009-10-06 Thread Edward Capriolo
On Tue, Oct 6, 2009 at 6:12 PM, Aaron Kimball wrote: > Edward, > > Interesting concept. I imagine that implementing "CachedInputFormat" over > something like memcached would make for the most straightforward > implementation. You could store 64MB chunks in memcached and try to retrieve > them from

Re: A question on dfs.safemode.threshold.pct

2009-10-06 Thread Manhee Jo
Thank you, Raghu. Then, when the percentage is below 0.999, how can you tell if some datanodes are just slower than others or some of the data blocks are lost? I think "percentage 1" should have special meaning, like it guarantees integrity of data in HDFS. If it's below 1, then the integrity is

Re: Locality when placing Map tasks

2009-10-06 Thread Aaron Kimball
Map tasks are generated based on InputSplits. An InputSplit is a logical description of the input that a task should process. The array of InputSplit objects is created on the client by the InputFormat. org.apache.hadoop.mapreduce.InputSplit has an abstract method: /** * Get the list of nodes by n
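As an illustration of that contract (not code from the thread), a hypothetical split might implement getLocations() like this; the returned host names are only a scheduling hint, and tasks must still work if run elsewhere:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

// Hypothetical split: one "shard" of work plus the hosts that hold its data.
public class ShardSplit extends InputSplit implements Writable {
  private long length;
  private String[] hosts;

  public ShardSplit() {}                         // required for deserialization

  public ShardSplit(long length, String[] hosts) {
    this.length = length;
    this.hosts = hosts;
  }

  @Override
  public long getLength() { return length; }

  @Override
  public String[] getLocations() { return hosts; } // e.g. {"node1", "node2"}

  public void write(DataOutput out) throws IOException {
    out.writeLong(length);
    out.writeInt(hosts.length);
    for (String h : hosts) out.writeUTF(h);
  }

  public void readFields(DataInput in) throws IOException {
    length = in.readLong();
    hosts = new String[in.readInt()];
    for (int i = 0; i < hosts.length; i++) hosts[i] = in.readUTF();
  }
}
```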

Re: FileSystem Caching in Hadoop

2009-10-06 Thread Aaron Kimball
Edward, Interesting concept. I imagine that implementing "CachedInputFormat" over something like memcached would make for the most straightforward implementation. You could store 64MB chunks in memcached and try to retrieve them from there, falling back to the filesystem on failure. One obvious po
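A very rough sketch of the fall-back idea described here, with a hypothetical BlockCache interface standing in for a real memcached client; none of the cache-facing names are actual Hadoop or memcached APIs:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical cache interface standing in for a real memcached client.
interface BlockCache {
  byte[] get(String key);                 // null on miss
  void put(String key, byte[] value);
}

public class CachedBlockReader {
  private final BlockCache cache;
  private final Configuration conf;

  public CachedBlockReader(BlockCache cache, Configuration conf) {
    this.cache = cache;
    this.conf = conf;
  }

  // Try the cache first; on a miss, fall back to HDFS and populate the cache.
  public InputStream open(Path file, long blockStart, int blockLen) throws IOException {
    String key = file + ":" + blockStart;
    byte[] data = cache.get(key);
    if (data == null) {
      FileSystem fs = file.getFileSystem(conf);
      data = new byte[blockLen];
      FSDataInputStream in = fs.open(file);
      try {
        in.readFully(blockStart, data);   // positioned read of one chunk
      } finally {
        in.close();
      }
      cache.put(key, data);
    }
    return new ByteArrayInputStream(data);
  }
}
```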

RE: Custom Record Reader Example?

2009-10-06 Thread Omer Trajman
I found the DBRecordReader a good example - it's in o.a.h.m.lib.db.DBInputFormat -Original Message- From: Mark Vigeant [mailto:mark.vige...@riskmetrics.com] Sent: Tuesday, October 06, 2009 5:22 PM To: common-user@hadoop.apache.org Subject: Custom Record Reader Example? Hey- I'm trying t

Custom Record Reader Example?

2009-10-06 Thread Mark Vigeant
Hey- I'm trying to update a custom recordreader written for 0.18.3 and was wondering if either A) Anyone has any example code for extending RecordReader in 0.20.1 (in the mapreduce package, not the mapred interface)? or B) Anyone can give me tips on how to write getCurrentKey() and g
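For (A), a minimal sketch of the 0.20.1 mapreduce-package shape, wrapping the built-in LineRecordReader and splitting each line at the first tab (the class and the tab convention are illustrative, not from the thread): nextKeyValue() advances and fills in the current key/value, and getCurrentKey()/getCurrentValue() simply return them.

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// Delegates line reading to LineRecordReader, then splits each line into
// a (key, value) pair at the first tab character.
public class TabSplitRecordReader extends RecordReader<Text, Text> {
  private final LineRecordReader lineReader = new LineRecordReader();
  private final Text key = new Text();
  private final Text value = new Text();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    lineReader.initialize(split, context);
  }

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    if (!lineReader.nextKeyValue()) {
      return false;                              // end of split
    }
    String line = lineReader.getCurrentValue().toString();
    int tab = line.indexOf('\t');
    if (tab < 0) {
      key.set(line);
      value.set("");
    } else {
      key.set(line.substring(0, tab));
      value.set(line.substring(tab + 1));
    }
    return true;
  }

  @Override
  public Text getCurrentKey() { return key; }

  @Override
  public Text getCurrentValue() { return value; }

  @Override
  public float getProgress() throws IOException, InterruptedException {
    return lineReader.getProgress();
  }

  @Override
  public void close() throws IOException {
    lineReader.close();
  }
}
```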

state of the art WebDAV + HDFS

2009-10-06 Thread brien colwell
hi all, What would you consider the state of the art for WebDAV integration with HDFS? I'm having trouble discerning the functionality that aligns with each patch on HDFS-225 (https://issues.apache.org/jira/browse/HDFS-225). I've read that some patches do not support write operations. Not sure if

First Boston Hadoop Meetup, Wed Oct 28th

2009-10-06 Thread Dan Milstein
'lo all, We're starting a Boston Hadoop Meetup (finally ;-) -- first meeting will be on Wednesday, October 28th, 7 pm, at the HubSpot offices: http://www.meetup.com/bostonhadoop/ (HubSpot is at 1 Broadway, Cambridge on the fifth floor. There Will Be Food.) I'm stealing the organizing pl

Re: Having multiple values in Value field

2009-10-06 Thread Tom Chen
Hi Akshaya, Take a look at the yahoo hadoop tutorial for custom data types. http://developer.yahoo.com/hadoop/tutorial/module5.html#types It's actually quite easy to create your own types and stream them. You can use the void readFields(DataInput in); void write(DataOutput out); methods to
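A minimal sketch along the lines of that tutorial (field names are illustrative): a custom Writable bundles several fields and must read them back in readFields() in exactly the order write() emitted them. Such a type can then be used directly as a map output value instead of a delimited string.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative custom value type bundling several fields, instead of a CSV string.
public class PointWritable implements Writable {
  private double x;
  private double y;
  private long count;

  public PointWritable() {}                 // no-arg constructor required by Hadoop

  public PointWritable(double x, double y, long count) {
    this.x = x;
    this.y = y;
    this.count = count;
  }

  public void write(DataOutput out) throws IOException {
    out.writeDouble(x);
    out.writeDouble(y);
    out.writeLong(count);
  }

  public void readFields(DataInput in) throws IOException {
    x = in.readDouble();                    // same order as write()
    y = in.readDouble();
    count = in.readLong();
  }
}
```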

Re: A question on dfs.safemode.threshold.pct

2009-10-06 Thread Raghu Angadi
Yes, it is mostly geared towards replication greater than 1. One of the reasons for waiting for this threshold is to avoid HDFS starting unnecessary replications of blocks at the start up when some of the datanodes are slower to start up. When the replication is 1, you don't have that issue. A blo

FileSystem Caching in Hadoop

2009-10-06 Thread Edward Capriolo
After looking at the HBaseRegionServer and its functionality, I began wondering if there is a more general use case for memory caching of HDFS blocks/files. In many use cases people wish to store data on Hadoop indefinitely, however the last day, last week, last month data is probably the most acti

RE: Having multiple values in Value field

2009-10-06 Thread Amogh Vasekar
>> You can always pass them as comma delimited strings Which would be pretty expensive per right? Would avro be looking into solving such problems? Amogh -Original Message- From: Jason Venner [mailto:jason.had...@gmail.com] Sent: Tuesday, October 06, 2009 11:33 AM To: common-user@hadoo