Thank you, Joe. The Sqoop-to-HDFS data load is outside the NiFi flow. Once the data is pushed to HDFS, I have to process each record and perform validations.
By validation I mean that we will pick a particular column from each record stored in HDFS and then run a SQL query against another database.

On Sun, Jan 10, 2016 at 9:17 AM, Joe Witt <[email protected]> wrote:
> Hello Sudeep,
>
> "Which NiFi processor can I use to split each record (separated by a
> new line character)"
>
> For this the SplitText processor is rather helpful if you want to
> split each line. I recommend you use two SplitText processors in a
> chain where one splits on every 1000 lines, for example, and then the
> next one splits each line. As long as you have back-pressure set up,
> this means you could split arbitrarily large (in terms of number of
> lines) source files and have good behavior.
>
> ..."and perform validations?"
>
> Consider if you want to validate each line in a text file and route
> valid lines one way and invalid lines another way. If this is the
> case then you may be able to avoid using SplitText and simply use
> RouteText instead, as it can operate on the original file in a
> line-by-line manner and perform expression-based validation. This
> would operate in bulk and be quite efficient.
>
> "For validations I want to verify a particular column value for each
> record using a SQL query"
>
> Our ExecuteSQL processor is designed for executing SQL against a
> JDBC-accessible database. It is not helpful at this point for
> executing queries on line-oriented data even if that data were valid
> DML or something. Interesting idea, but not something we support at
> this time.
>
> I'm interested to understand your case more if you don't mind, though.
> You mention you're getting data from Sqoop into HDFS. How is NiFi
> involved in that flow - is it after data lands in HDFS that you're
> pulling it into NiFi?
>
> Thanks
> Joe
>
> On Sat, Jan 9, 2016 at 10:32 PM, sudeep mishra <[email protected]> wrote:
> > Hi,
> >
> > I am pushing some database records into HDFS using Sqoop.
> > I want to perform some validations on each record in the HDFS data.
> > Which NiFi processor can I use to split each record (separated by a
> > new line character) and perform validations?
> >
> > For validations I want to verify a particular column value for each
> > record using a SQL query. I can see an ExecuteSQL processor. How can I
> > dynamically pass query parameters to it? Also, is there a way to
> > execute the queries in bulk rather than for each record?
> >
> > Kindly suggest.
> >
> > Appreciate your help.
> >
> > Thanks & Regards,
> > Sudeep Shekhar Mishra
> > +91-9167519029
> > [email protected]

--
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029
[email protected]
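[Editor's illustration] The two-stage SplitText chain Joe describes can be pictured outside NiFi as well. The sketch below is plain Python, not NiFi code; the function names are illustrative only. It shows the same idea: a first pass splits a large file into bounded chunks (here 1000 lines), and a second pass splits each chunk into individual records, so no single step ever has to hold every record at once.

```python
# Illustrative sketch of a two-stage split (SplitText -> SplitText).
# Names and chunk size are assumptions, not NiFi APIs.

def split_chunks(lines, size=1000):
    """Stage 1: yield chunks of up to `size` lines."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]

def split_lines(chunk):
    """Stage 2: yield individual records from one chunk."""
    yield from chunk

records = [f"record-{n}" for n in range(2500)]
chunks = list(split_chunks(records))          # 1000 + 1000 + 500 lines
flat = [line for chunk in chunks for line in split_lines(chunk)]
```

In NiFi the equivalent back-pressure between the two processors is what keeps the intermediate chunk queue bounded; in this sketch the generator laziness plays that role.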
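[Editor's illustration] The bulk-validation question at the end of the thread (one query per record vs. queries in bulk) can also be sketched in plain Python. This is not NiFi or ExecuteSQL code; it uses an in-memory SQLite table as a stand-in for the "other database", and the table and column names are invented for the example. The point is the batching: collect the key column from many records, run a single `IN (...)` query, then route each record as valid or invalid based on the result.

```python
# Illustrative batched validation: one SQL lookup for many records.
# Table name `valid_ids` and the CSV layout are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE valid_ids (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO valid_ids VALUES (?)", [("A1",), ("B2",)])

records = ["A1,foo", "B2,bar", "C3,baz"]   # CSV lines; column 0 is the key
ids = [line.split(",")[0] for line in records]

# One query for the whole batch instead of one query per record.
placeholders = ",".join("?" * len(ids))
found = {row[0] for row in conn.execute(
    f"SELECT id FROM valid_ids WHERE id IN ({placeholders})", ids)}

valid = [line for line in records if line.split(",")[0] in found]
invalid = [line for line in records if line.split(",")[0] not in found]
```

In a NiFi flow the same split into "valid" and "invalid" streams is what RouteText's relationships would express, with the batch lookup done once per chunk rather than once per line.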
