o/hadoop/inputformat
> >>>
> >>> I think the downside of #2 is that it hides hbase, which I think deserves
> >>> to be top level.
> >>>
> >>> Other comments:
> >>> It should be noted that when we have all modules use hadoop-common, we
IO transform has its own hadoop dependency"
>>>
>>> On the naming discussion: I personally prefer "inputformat" as the name of
>>> the directory, but I defer to the folks who know the Hadoop community
>>> more.
>>>
>>> S
17, 2017 at 9:38 AM, Dipti Kulkarni <
dipti_dkulka...@persistent.com> wrote:
Thank you all for your inputs!
-Original Message-
From: Dan Halperin [mailto:dhalp...@google.com.INVALID]
Sent: Friday, February 17, 2017 12:17 PM
To: dev@beam.apache.org
Subject: Re: Merge HadoopInputFormat
ub.com/apache/beam/pull/2087
Thank you all for your inputs!
-Original Message-
From: Dan Halperin [mailto:dhalp...@google.com.INVALID]
Sent: Friday, February 17, 2017 12:17 PM
To: dev@beam.apache.org
Subject: Re: Merge HadoopInputFormatIO and HDFSIO in a single module
Raghu, Amit -- +1 to your expertise :)
On Thu, Feb 16, 2017 at 3:39 PM, Amit Sela wrote:
I agree with Dan on everything regarding HdfsFileSystem - it's super
convenient for users to use TextIO with HdfsFileSystem rather than
replacing the IO and also specifying the InputFormat type.
I disagree on "HadoopIO" - I think that people who work with Hadoop would
find this name intuitive, and
FileInputFormat is extremely widely used; pretty much all the file-based
input formats extend it. All of them call into it to list the input files and
compute splits (with some tweaks on top of that). The special API
(*FileInputFormat.setMinInputSplitSize(job, desiredBundleSizeBytes)*) is how the split size is
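To make the split-size point concrete, here is a dependency-free Java sketch that models, as we understand it, how Hadoop's new-API FileInputFormat sizes the splits of a single file: the split size is max(minSize, min(maxSize, blockSize)), and full-size splits are cut while the remaining bytes exceed 1.1 times the split size. The class `SplitSizeSketch` and its methods are hypothetical illustrations, not Hadoop or Beam code; real code would call `FileInputFormat.setMinInputSplitSize(job, ...)` on a `Job` and let `getSplits` do this work.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, dependency-free model of how Hadoop's FileInputFormat
 * (new mapreduce API) sizes splits for one file. Not actual Hadoop code.
 */
public class SplitSizeSketch {
  // The last split may be up to 10% larger instead of leaving a tiny tail split.
  static final double SPLIT_SLOP = 1.1;

  // Models FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  // Returns the length of each split for a file of fileLength bytes.
  static List<Long> splitLengths(long fileLength, long blockSize, long minSize, long maxSize) {
    long splitSize = computeSplitSize(blockSize, minSize, maxSize);
    List<Long> lengths = new ArrayList<>();
    long bytesRemaining = fileLength;
    // Cut full-size splits while the remainder is more than 1.1x a split.
    while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
      lengths.add(splitSize);
      bytesRemaining -= splitSize;
    }
    // Whatever is left (at most 1.1x splitSize) becomes the final split.
    if (bytesRemaining != 0) {
      lengths.add(bytesRemaining);
    }
    return lengths;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long blockSize = 128 * mb;
    long fileLength = 1024 * mb;
    // Default minimum of 1 byte: one split per 128 MB block.
    System.out.println(splitLengths(fileLength, blockSize, 1, Long.MAX_VALUE).size()); // prints 8
    // Hypothetical desiredBundleSizeBytes of 256 MB used as the minimum split size.
    long desiredBundleSizeBytes = 256 * mb;
    System.out.println(
        splitLengths(fileLength, blockSize, desiredBundleSizeBytes, Long.MAX_VALUE).size()); // prints 4
    // A minimum below the block size has no effect.
    System.out.println(splitLengths(fileLength, blockSize, 64 * mb, Long.MAX_VALUE).size()); // prints 8
  }
}
```

If this reading of the formula is right, passing desiredBundleSizeBytes as the minimum split size only changes anything when it exceeds the block size; producing splits smaller than a block would require lowering the maximum instead (e.g. via `FileInputFormat.setMaxInputSplitSize`).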
Chiming in a bit late, but here's my 2 cents.
HdfsFileSystem vs Hadoop*InputFormatIO is a red herring:
* HdfsFileSystem is for file-format-specific, Beam-native parsers of
files. It will make TextIO, AvroIO, etc., work for files that happen to be
located at hdfs:// URIs.
* This is complementa
Dipti,
Also how about calling it just HadoopIO?
On Wed, Feb 15, 2017 at 11:13 AM, Raghu Angadi wrote:
> I skimmed through HdfsIO and I think it is essentially HadoopInputFormatIO
> with FileInputFormat. I would pretty much move most of the code to
> HadoopInputFormatIO (just make HdfsIO a speci
Hi Dipti!
It sounds like there are two possible implementation options:
1. HdfsIO that is implemented using HadoopInputFormatIO
2. HdfsIO that is implemented using IOChannelFactory (I think
BeamFileSystem is the new name?)
Either way, I agree that it makes sense to have one module that contains
t
Hi
I guess you saw my comment in the PR. Basically, I was waiting for the refactoring
of IOChannelFactory to refactor HDFS IO as a Hadoop file format on top of
IOChannelFactory. I would wait a bit, and I would be more than happy to
help you with the PR.
Regards
JB
On Feb 15, 2017, 14:55, at 14:5
Hi
It's what I said in the Hadoop file format PR.
When I discussed the refactoring of the IOChannelFactory with Davor and Pei,
I proposed refactoring HDFS IO to handle Hadoop file formats on top of the
file IO.
Regards
JB
On Feb 15, 2017, 15:13, at 15:13, Raghu Angadi
wrote:
I skimmed through HdfsIO and I think it is essentially HadoopInputFormatIO
with FileInputFormat. I would pretty much move most of the code to
HadoopInputFormatIO (just make HdfsIO a specific instance of HIF_IO).
On Wed, Feb 15, 2017 at 9:15 AM, Dipti Kulkarni <
dipti_dkulka...@persistent.com> wrot
Hello there!
I am working on writing a Read IO for Hadoop InputFormat. This will enable
reading from any data source that supports Hadoop InputFormat, i.e. any source
that provides an InputFormat for integration with Hadoop.
It makes sense for the HadoopInputFormatIO to share some code with the