Many times a Hadoop job produces one file per reducer, and the job has
many reducers. Or a map-only job produces one output file per input file,
and you have many input files. Or you just have many small files from some
external process. Hadoop has suboptimal handling of small files.
There are some ways to handle this.
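One common workaround is to concatenate the small outputs into fewer large files, which is what `hadoop fs -getmerge` does when pulling results to the local filesystem. As a rough local-filesystem sketch of the same idea (the class and method names here are mine, not Hadoop's):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MergeSmallFiles {
    /** Concatenates every regular file in dir into out, in filename order. */
    public static void merge(Path dir, Path out) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts); // part-00000, part-00001, ... order
        try (OutputStream os = Files.newOutputStream(out)) {
            for (Path p : parts) {
                Files.copy(p, os); // append each part's bytes to the output
            }
        }
    }
}
```

On HDFS itself the usual options are a follow-up merge job, Hadoop archives (`hadoop archive`), or an input format that packs many small files into one split.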
Maybe. Maybe not. Maybe not this time, but next time yes. It's just
bulletproofing, like checking for nulls everywhere.
Mark Kerzner wrote:
Hi,
is there any need for this:
protected void setup(Mapper.Context context) throws IOException,
        InterruptedException {
    super.setup(context); // TODO - d
}
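In the current Mapper base class, setup() is a no-op, so the super call does nothing today; calling it anyway means the subclass keeps working if the base class ever adds behavior there. The pattern in plain Java terms (Base and Derived are illustrative names, not Hadoop classes):

```java
import java.util.ArrayList;
import java.util.List;

class Base {
    final List<String> calls = new ArrayList<>();

    // Stand-in for Mapper.setup(): imagine a future version doing real work here.
    protected void setup() {
        calls.add("base-setup");
    }
}

class Derived extends Base {
    @Override
    protected void setup() {
        super.setup();            // defensive: preserve whatever the base does
        calls.add("derived-setup");
    }
}
```

If Derived skipped the super call, any work Base.setup() does would silently be lost; with it, both run in order.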
I noticed a tasktracker (not a new node) starts heartbeating with the
jobtracker only after a couple of minutes. What can cause this?
Here's the log output from a tasktracker restarted after a clean shutdown of
the cluster; it hangs for 9 minutes (the network connection was good when the
tasktracker started):
You are right. I ran stop-all.sh and start-all.sh, now it works fine.
Thanks!
On 2010-9-24 15:30, Harsh J wrote:
My guess: Either you haven't put the JAR in all the tasktracker
machines, or you have not restarted your tasktrackers and jobtracker
after doing so.
On Sat, Sep 25, 2010 at 1:47 AM, Shi Yu wrote:
Hello,
We just recently switched to LZO-compressed file input for our Hadoop
cluster, using Kevin Weil's LZO library. The files are pretty uniform in
size, at around 200 MB compressed. Our block size is 256 MB. Decompressed, the
average LZO input file is around 1.0 GB. I noticed lots of our jo
My guess: Either you haven't put the JAR in all the tasktracker
machines, or you have not restarted your tasktrackers and jobtracker
after doing so.
On Sat, Sep 25, 2010 at 1:47 AM, Shi Yu wrote:
> Hi,
>
> I tried to combine an in-memory MySQL database with MapReduce to do some value
> exchanges. In
Hi,
I tried to combine an in-memory MySQL database with MapReduce to do some
value exchanges. In the Mapper, I declare the MySQL driver like this:
import com.mysql.jdbc.*;
import java.sql.DriverManager;
import java.sql.SQLException;
String driver =
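One thing to check in a setup like this: load and register the driver class once per task (e.g. in setup()), before calling DriverManager.getConnection, and make sure the connector JAR is actually on the task classpath. A minimal sketch using only the standard java.sql API (the helper names are mine; a real connection additionally assumes mysql-connector-java is available at runtime):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcSetup {
    /** Tries to load a JDBC driver class; loading it registers it with DriverManager. */
    static boolean driverAvailable(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // connector JAR missing from the classpath
        }
    }

    /** Opens a connection; do this once per task, not once per record. */
    static Connection connect(String url, String user, String password)
            throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }
}
```

A driverAvailable("com.mysql.jdbc.Driver") check in setup() turns the usual cryptic "No suitable driver" failure into an immediate, explicit one.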
I'm glad it helped, Aniket. I would recommend that you start working on
performance improvement with your network infrastructure and the balance of
data across your logical racks.
Cliff
On Fri, Sep 24, 2010 at 12:12 AM, aniket ray wrote:
> Hi Cliff,
>
> Thanks, it did turn out to be speculative execution
Hi all,
I am working on a sort function and it is working perfectly fine with a
single map task.
When I give 2 map tasks, the entire data (the sorted output) is replicated
twice. When giving 4 map tasks, it gives 4 times the sorted data, and so
on.
I modified the Terasort for this.
Major
Hi,
I need urgent help on using SQL Server with Hadoop.
I am using the following code to connect to the database:
DBConfiguration.configureDB(conf,"com.microsoft.sqlserver.jdbc.SQLServerDriver","jdbc:sqlserver://xxx.xxx.xxx.xxx;user=abc;password=abc;DatabaseName=dbname");
String [] fields = { "id", "url" };
St