subject:"streamed splitting"

Re: streamed splitting

2015-03-13 Thread Siddharth Seth

Johannes, a couple of questions: do you happen to know why the splits are actually taking 21 minutes to generate? Is the namenode overloaded, or is it just the large number of files? Is the input format (assuming you're using an inputFormat for splits) going beyond analyzing block boundaries to

RE: streamed splitting

2015-03-12 Thread Bikas Saha

That's not it. Please open a new one. Thanks!
-Original Message-
From: Johannes Zillmann [mailto:jzillm...@googlemail.com]
Sent: Thursday, March 12, 2015 1:14 PM
To: user@tez.apache.org
Subject: Re: streamed splitting
So.. its complex ;) Regarding the jira, closest thing i fou

Re: streamed splitting

2015-03-12 Thread Johannes Zillmann

So.. it's complex ;) Regarding the jira, the closest thing I found is https://issues.apache.org/jira/browse/TEZ-1166 Should I add to this or create a new one? Johannes
> On 12 Mar 2015, at 15:44, Hitesh Shah wrote:
>
> Hello Johannes,
>
> This is something we have discussed quite often but have

Re: streamed splitting

2015-03-12 Thread Hitesh Shah

Hello Johannes, This is something we have discussed quite often but have not got around to implementing this. There might be an open jira related to “pipelining” of splits. If you cannot find it, please go ahead and create one. The general issues with these are: - how to handle dynamic crea

Re: streamed splitting

2015-03-12 Thread Johannes Zillmann

Hey Jeff, so one scenario I recently encountered was a job on about 300,000 files in HDFS. The splitting alone took 21 minutes. So I thought that by the time the splitting has fully completed, a lot of splits could already have been processed… Thanks for your answer! Johannes > On 12 Mar 2015, at

Re: streamed splitting

2015-03-12 Thread Jianfeng (Jeff) Zhang

Hi Johannes, If the input initializer is not done, workers cannot be started. What's your scenario? Why do you want to start the workers before the splits are generated? Just to save the launch time, or to let the workers do other stuff? Best Regards, Jeff Zhang On 3/12/15, 5:38 PM, "Johannes Z

streamed splitting

2015-03-12 Thread Johannes Zillmann

Hey guys, dumb question. With Tez, can I have an input initializer which doesn't need to create every split before processing of the already created splits starts? That is, if I have a lot of splits and my splitting process takes a long time, can the workers start working already while still doi
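
The idea being asked about (workers consuming splits while the splitter is still producing them) can be illustrated with a generic producer/consumer sketch. This is not Tez code and uses no Tez API; the names `generate_splits`, `process_streamed`, and the one-split-per-file rule are hypothetical, chosen only to show the pattern under the assumption that splits are independent of each other.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the split stream


def generate_splits(files, out):
    """Producer: hand over each (hypothetical) split as soon as it is computed."""
    for path in files:
        out.put(("split", path))  # blocks if workers fall far behind
    out.put(SENTINEL)


def process_streamed(files, num_workers=4):
    """Start workers immediately; they consume splits while splitting continues."""
    splits = queue.Queue(maxsize=100)  # bounded, so splits are not all held in memory
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = splits.get()
            if item is SENTINEL:
                splits.put(SENTINEL)  # pass the marker on so the other workers stop too
                return
            with lock:
                results.append(item[1])  # stand-in for real per-split processing

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    producer = threading.Thread(target=generate_splits, args=(files, splits))
    producer.start()

    producer.join()
    for w in workers:
        w.join()
    return results
```

In this sketch the first split is processed as soon as it is produced, instead of after the whole split list exists. It sidesteps the harder questions a real scheduler would face, such as not knowing the total number of tasks up front.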

7 matches
