Johannes,
A couple of questions: do you happen to know why the splits are actually taking
21 minutes to generate? Is the namenode overloaded, or is it just the large
number of files? Is the input format (assuming you're using an inputFormat for
splits) going beyond analyzing block boundaries to
That's not it. Please open a new one. Thanks!
-Original Message-
From: Johannes Zillmann [mailto:jzillm...@googlemail.com]
Sent: Thursday, March 12, 2015 1:14 PM
To: user@tez.apache.org
Subject: Re: streamed splitting
So.. it's complex ;)
Regarding the jira, closest thing i fou
So.. it's complex ;)
Regarding the jira, the closest thing I found is
https://issues.apache.org/jira/browse/TEZ-1166
Should I add to this or create a new one?
Johannes
> On 12 Mar 2015, at 15:44, Hitesh Shah wrote:
>
> Hello Johannes,
>
> This is something we have discussed quite often but have
Hello Johannes,
This is something we have discussed quite often but have not got around to
implementing. There might be an open jira related to “pipelining” of
splits. If you cannot find it, please go ahead and create one.
The general issues with these are:
- how to handle dynamic crea
Hey Jeff,
so one scenario I recently encountered was a job on about 300,000 files in
HDFS.
The splitting alone took 21 minutes. So I thought that, by the time the
splitting had completed, a lot of splits could already have been processed…
Thanks for your answer!
Johannes
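
To make the idea concrete, here is a minimal sketch of what I mean by streamed
splitting. It is plain java.util.concurrent code, not the Tez API: the class
name StreamedSplitSketch, the method run and the "split:" naming are all made
up for illustration. The producer loop stands in for the slow split
generation, and the workers pick up each split as soon as it is queued instead
of waiting for the full list.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Tez.
public class StreamedSplitSketch {
    // Sentinel telling a worker that no more splits will arrive.
    // Real splits are prefixed with "split:", so they can never equal it.
    private static final String DONE = "DONE";

    /** Queues one split per file while workers consume concurrently. */
    public static int run(List<String> files, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        // Workers start before any split exists and block on the queue.
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String split = queue.take();
                        if (DONE.equals(split)) {
                            return;
                        }
                        // Stand-in for the real processing of one split.
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Stand-in for the slow splitting: each split is handed over
            // as soon as it has been created.
            for (String file : files) {
                queue.put("split:" + file);
            }
            // One sentinel per worker so that every worker terminates.
            for (int i = 0; i < workers; i++) {
                queue.put(DONE);
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while streaming splits", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c"), 2));
    }
}
```

The sketch ignores everything that makes this hard in a real framework (the
number of tasks is not known up front, failures, locality); it only shows the
overlap of splitting and processing.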
> On 12 Mar 2015, at
Hi Johannes,
If the input initializer is not done, workers cannot be started.
What's your scenario? Why do you want to start the workers before the
splits are generated? Just to save launch time, or to let the workers do
other stuff?
Best Regards,
Jeff Zhang
On 3/12/15, 5:38 PM, "Johannes Z
Hey guys,
dumb question. With Tez, can I have an input initializer which doesn't require
creating every split before starting the processing of already created splits?
Meaning, if I have a lot of splits and my splitting process takes a long time, can
the workers start working already while still doi