We handle a similar situation using CTAS, then retrieve the resulting data via WebHDFS.
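As a rough illustration of that pattern, the sketch below builds a Hive CTAS statement and the WebHDFS URL for listing the resulting files. The table name, query, namenode address, and the assumption that the table lands under the default warehouse directory are all hypothetical; adjust for your cluster (the thread discusses temporary tables, but a plain CTAS table is shown here for a predictable HDFS location):

```python
from urllib.parse import urlencode

# Hypothetical cluster details -- adjust for your environment.
NAMENODE = "namenode.example.com:50070"
HDFS_USER = "nifi"
WAREHOUSE = "/user/hive/warehouse"  # assumes the default warehouse dir


def ctas_statement(table, query):
    """Build a Hive CTAS statement that materializes a query's
    result set as files in the table's HDFS directory."""
    return f"CREATE TABLE {table} STORED AS TEXTFILE AS {query}"


def webhdfs_url(path, op):
    """Build a WebHDFS REST URL for the given HDFS path and operation
    (e.g. LISTSTATUS to enumerate files, OPEN to download one)."""
    params = urlencode({"op": op, "user.name": HDFS_USER})
    return f"http://{NAMENODE}/webhdfs/v1{path}?{params}"


stmt = ctas_statement("extract_1234", "SELECT * FROM sales WHERE year = 2018")
listing = webhdfs_url(f"{WAREHOUSE}/extract_1234", "LISTSTATUS")
```

The flow would issue the CTAS via a Hive connection, then hit the LISTSTATUS URL (and per-file OPEN URLs) to stream the results back.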
James

On Thu, 3 May 2018, 17:18 Bryan Bende, <bbe...@gmail.com> wrote:
> The two step idea makes sense...
>
> If you did want to go with the OS call you would probably want
> ExecuteStreamCommand.
>
> On Thu, May 3, 2018 at 12:06 PM, Shawn Weeks <swe...@weeksconsulting.us> wrote:
> > I'm thinking about ways to do the operation in two steps, where the first
> > request starts the process of generating the data and returns a UUID, and
> > the second request can check on the status and download the file. I still
> > have to work out how to collect the output from the Hive table, so I'll
> > look at the REST calls. I'm not sure of a good way to make an OS call, as
> > ExecuteProcess doesn't support inputs either.
> >
> > Thanks
> >
> > Shawn
> >
> > ________________________________
> > From: Bryan Bende <bbe...@gmail.com>
> > Sent: Thursday, May 3, 2018 10:51:03 AM
> > To: users@nifi.apache.org
> > Subject: Re: Fetch Contents of HDFS Directory as a Part of a Larger Flow
> >
> > Another option: if the Hadoop client is installed on the NiFi node, you
> > could use one of the script processors to make a call to
> > "hadoop fs -ls ...".
> >
> > If the response is so large that it requires the heavy lifting of writing
> > out temp tables to HDFS, then fetching those files into NiFi, and most
> > likely merging them into a single response flow file, is that really
> > expected to happen in the context of a single web request/response?
> >
> > On Thu, May 3, 2018 at 11:45 AM, Pierre Villard
> > <pierre.villard...@gmail.com> wrote:
> >> Hi Shawn,
> >>
> >> If you know the path of the files to retrieve in HDFS, you could use
> >> the FetchHDFS processor.
> >> If you need to retrieve all the files within the directory created by
> >> Hive, I guess you could list the existing files by calling the WebHDFS
> >> REST API and then use the FetchHDFS processor.
> >>
> >> Not sure that's the best solution for your requirement, though.
> >>
> >> Pierre
> >>
> >> 2018-05-03 17:35 GMT+02:00 Shawn Weeks <swe...@weeksconsulting.us>:
> >>>
> >>> I'm building a REST service with the HTTP Request and Response
> >>> processors to support data extracts from Hive. Since some of the
> >>> extracts can be quite large, using the SelectHiveQL processor isn't a
> >>> performant option; instead, I'm trying to use on-demand Hive temporary
> >>> tables to do the heavy lifting via CTAS (Create Table As Select).
> >>> Since GetHDFS doesn't support an incoming connection, I'm trying to
> >>> figure out another way to fetch the files Hive creates and return them
> >>> as a download in the web service. Has anyone else worked out a good
> >>> solution for fetching the contents of a directory from HDFS as part of
> >>> a larger flow?
> >>>
> >>> Thanks
> >>>
> >>> Shawn
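Pierre's list-then-fetch suggestion could be sketched as follows. The JSON below is a made-up, trimmed example of a WebHDFS LISTSTATUS response (a real response carries many more fields per entry), and the directory path is hypothetical; the point is only to show how the listing maps to full HDFS paths that FetchHDFS (or a WebHDFS OPEN call) could retrieve:

```python
import json

# Made-up, trimmed example of a WebHDFS LISTSTATUS response body.
liststatus_body = json.dumps({
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "000000_0", "type": "FILE", "length": 4096},
            {"pathSuffix": "000001_0", "type": "FILE", "length": 2048},
        ]
    }
})


def files_in_directory(directory, response_body):
    """Turn a LISTSTATUS response into full HDFS file paths,
    skipping any subdirectory entries."""
    statuses = json.loads(response_body)["FileStatuses"]["FileStatus"]
    return [
        f"{directory.rstrip('/')}/{s['pathSuffix']}"
        for s in statuses
        if s["type"] == "FILE"
    ]


paths = files_in_directory("/user/hive/warehouse/extract_1234", liststatus_body)
```

In a NiFi flow, each resulting path could be written to a flow file attribute and handed to FetchHDFS, then the pieces merged into a single response.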