A new way to merge up those small files!

2010-09-24 Thread Edward Capriolo
Many times a Hadoop job produces one file per reducer and the job has many reducers. Or a map-only job produces one output file per input file and you have many input files. Or you just have many small files from some external process. Hadoop has suboptimal handling of small files. There are some ways to handle…
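One common workaround (not necessarily the one this thread proposes) is to concatenate a directory of small files into a single HDFS file with FileUtil.copyMerge. A minimal sketch; the paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class SmallFileMerge {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Concatenate everything under the source directory into one file,
        // keeping the originals (pass true to delete them instead).
        FileUtil.copyMerge(fs, new Path("/data/small-files"),
                           fs, new Path("/data/merged/all.txt"),
                           false, conf, null);
      }
    }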

Re: do you need to call super in Mapper.Context.setup()?

2010-09-24 Thread Lance Norskog
Maybe. Maybe not. Maybe not this time, but next time yes. It's just bulletproofing, like checking for nulls everywhere. Mark Kerzner wrote: Hi, any need for this? protected void setup(Mapper.Context context) throws IOException, InterruptedException { super.setup(context); // TODO - d…
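In the current Mapper base class, setup() is a no-op, so the call is harmless; it only matters if you later extend a mapper that does real work in setup(). A sketch of the defensive pattern:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
      @Override
      protected void setup(Context context)
          throws IOException, InterruptedException {
        super.setup(context); // a no-op in the base class, but cheap insurance
        // ... one-time per-task initialization goes here ...
      }
    }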

tasktracker takes long time to start?

2010-09-24 Thread jiang licht
I noticed a tasktracker (not a new node) only starts heartbeating with the jobtracker after a couple of minutes. What can cause this? Here's the log output when a tasktracker is restarted after a clean shutdown of the cluster but hangs for 9 minutes (the network connection was good when the TT started):…

Re: jdbc in Hadoop mapper

2010-09-24 Thread Shi Yu
You are right. I ran stop-all.sh and start-all.sh; now it works fine. Thanks! On 2010-9-24 15:30, Harsh J wrote: My guess: Either you haven't put the JAR on all the tasktracker machines, or you have not restarted your tasktrackers and jobtracker after doing so. On Sat, Sep 25, 2010 at 1:47 AM,…

Proper blocksize and io.sort.mb setting when using compressed LZO files

2010-09-24 Thread pig
Hello, We just recently switched to using LZO-compressed file input for our Hadoop cluster, using Kevin Weil's LZO library. The files are pretty uniform in size at around 200MB compressed. Our block size is 256MB. Decompressed, the average LZO input file is around 1.0GB. I noticed lots of our jobs…
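Worth noting: plain .lzo files are not splittable until they are indexed (the library ships an indexer for this), so each compressed file may go to a single mapper regardless of block size. The relevant knobs, sketched below with illustrative values only; tune them to your own cluster:

    import org.apache.hadoop.conf.Configuration;

    public class TuningSketch {
      public static Configuration tuned() {
        Configuration conf = new Configuration();
        conf.setInt("io.sort.mb", 200);                // map-side sort buffer, in MB
        conf.setFloat("io.sort.spill.percent", 0.80f); // spill when buffer is 80% full
        conf.setLong("dfs.block.size", 268435456L);    // 256MB block size, in bytes
        return conf;
      }
    }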

Re: jdbc in Hadoop mapper

2010-09-24 Thread Harsh J
My guess: Either you haven't put the JAR on all the tasktracker machines, or you have not restarted your tasktrackers and jobtracker after doing so. On Sat, Sep 25, 2010 at 1:47 AM, Shi Yu wrote: > Hi, > > I tried to combine an in-memory MySQL database with MapReduce to do some value > exchanges. In…
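An alternative to copying the JAR onto every node is to ship it per-job from HDFS via the distributed cache. A sketch; the HDFS path and connector version are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;

    public class ShipJar {
      public static void addDriver(Configuration conf) throws Exception {
        // Upload the connector JAR to HDFS first, then add it to every
        // task's classpath for this job.
        DistributedCache.addFileToClassPath(
            new Path("/lib/mysql-connector-java-5.1.13.jar"), conf);
      }
    }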

jdbc in Hadoop mapper

2010-09-24 Thread Shi Yu
Hi, I tried to combine an in-memory MySQL database with MapReduce to do some value exchanges. In the Mapper, I declare the MySQL driver like this: import com.mysql.jdbc.*; import java.sql.DriverManager; import java.sql.SQLException; String driver =…
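The usual shape of this, sketched below with placeholder connection details: register the driver and open the connection once per task in setup(), and close it in cleanup():

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class JdbcMapper extends Mapper<LongWritable, Text, Text, Text> {
      private Connection conn;

      @Override
      protected void setup(Context context) throws IOException {
        try {
          // The driver class must be on every task's classpath
          // (see the replies above about shipping the JAR).
          Class.forName("com.mysql.jdbc.Driver");
          conn = DriverManager.getConnection(
              "jdbc:mysql://dbhost:3306/mydb", "user", "pass");
        } catch (Exception e) {
          throw new IOException("Could not open JDBC connection", e);
        }
      }

      @Override
      protected void cleanup(Context context) throws IOException {
        try {
          if (conn != null) conn.close();
        } catch (SQLException ignored) {
        }
      }
    }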

Re: Shuffle tasks getting killed

2010-09-24 Thread cliff palmer
I'm glad it helped, Aniket. I would recommend that you start working on performance improvement with your network infrastructure and the balance of data across your logical racks. Cliff On Fri, Sep 24, 2010 at 12:12 AM, aniket ray wrote: > Hi Cliff, > > Thanks, it did turn out to be speculative execution…
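For reference, speculative execution can be turned off per job with the old-API property names of this era:

    import org.apache.hadoop.conf.Configuration;

    public class NoSpeculation {
      public static void disable(Configuration conf) {
        // Stops the jobtracker from launching duplicate "backup" attempts,
        // which otherwise show up as killed shuffle/reduce tasks.
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
      }
    }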

Input splits not working correctly

2010-09-24 Thread Matthew John
Hi all, I am working on a sort function and it works perfectly fine with a single map task. When I give 2 map tasks, the entire data is replicated twice (sorted output). When giving 4 map tasks, it gives 4 times the sorted data, and so on. I modified Terasort for this. Major…
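That symptom usually means the record reader ignores its assigned split and reads the whole file from every task, so N maps produce N full copies. Each reader should honor its FileSplit's byte range; a sketch of a custom RecordReader's initialize, assuming a FileInputFormat-based job:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class SplitAwareReaderSketch {
      private long start, end;
      private FSDataInputStream in;

      public void initialize(InputSplit genericSplit, TaskAttemptContext context)
          throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        start = split.getStart();          // first byte this task owns
        end = start + split.getLength();   // read no records past this offset
        FileSystem fs = split.getPath().getFileSystem(conf);
        in = fs.open(split.getPath());
        in.seek(start);                    // otherwise every task re-reads from byte 0
        // ... emit records only while the current position is < end ...
      }
    }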

Help for Sqlserver querying with hadoop

2010-09-24 Thread Biju .B
Hi, I need urgent help using SQL Server with Hadoop. I am using the following code to connect to the database: DBConfiguration.configureDB(conf,"com.microsoft.sqlserver.jdbc.SQLServerDriver","jdbc:sqlserver://xxx.xxx.xxx.xxx;user=abc;password=abc;DatabaseName=dbname"); String [] fields = { "id", "url" }; St…
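A complete setup along these lines typically wires the same configureDB call to DBInputFormat. A sketch reusing the thread's placeholder connection string; the table name and the MyRecord class are hypothetical:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;

    public class SqlServerInputSketch {

      // Hypothetical record type holding the two queried fields.
      public static class MyRecord implements Writable, DBWritable {
        long id;
        String url;

        public void readFields(ResultSet rs) throws SQLException {
          id = rs.getLong("id");
          url = rs.getString("url");
        }
        public void write(PreparedStatement ps) throws SQLException {
          ps.setLong(1, id);
          ps.setString(2, url);
        }
        public void readFields(DataInput in) throws IOException {
          id = in.readLong();
          url = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
          out.writeLong(id);
          out.writeUTF(url);
        }
      }

      public static void configure(Job job) {
        DBConfiguration.configureDB(job.getConfiguration(),
            "com.microsoft.sqlserver.jdbc.SQLServerDriver",
            "jdbc:sqlserver://xxx.xxx.xxx.xxx;user=abc;password=abc;DatabaseName=dbname");
        String[] fields = { "id", "url" };
        DBInputFormat.setInput(job, MyRecord.class, "mytable" /* hypothetical */,
            null /* conditions */, "id" /* orderBy */, fields);
        job.setInputFormatClass(DBInputFormat.class);
      }
    }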