So, I added one record to your sample to match all the conditions you have in your filter statement.
New input:

[csingh]$ hadoop fs -cat test.txt
1,,2,76
1,,,76
,2,,76
1,1,2,
1,1,1,76
1,2,1,76

I modified the load statement to use PigStorage with a comma delimiter:

D = LOAD 'test.txt' USING PigStorage(',') AS (IS_REPORTED:INT, PROCESSING_STATUS_ID:INT, PROGRAM_ID:INT, AFFINITY_GROUP_ID:INT);

Output:

(1,2,1,76)

So, the IS NOT NULL checks seem to be working.

Pig logs:

grunt> D = LOAD 'test.txt' USING PigStorage(',') AS (IS_REPORTED:INT, PROCESSING_STATUS_ID:INT, PROGRAM_ID:INT, AFFINITY_GROUP_ID:INT);
grunt> X = FILTER D BY (IS_REPORTED is not null) AND (PROCESSING_STATUS_ID is not null) AND (IS_REPORTED==1) AND (PROGRAM_ID==1) AND (PROCESSING_STATUS_ID==2) AND (AFFINITY_GROUP_ID==76);
grunt> DUMP X;
2016-02-18 23:01:06,336 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2016-02-18 23:01:06,366 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier, PartitionFilterOptimizer]}
2016-02-18 23:01:06,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2016-02-18 23:01:10,798 [JobControl] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-02-18 23:01:11,345 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1454499131434_9884
2016-02-18 23:01:11,542 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1454499131434_9884
2016-02-18 23:01:11,597 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2016-02-18 23:01:31,393 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2016-02-18 23:01:36,818 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2016-02-18 23:01:36,875 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-02-18 23:01:36,878 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt  FinishedAt  Features
2.6.0-cdh5.4.8  0.12.0-cdh5.4.8  csingh  2016-02-18 23:01:06  2016-02-18 23:01:36  FILTER

Success!

Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature  Outputs
job_1454499131434_9884  1  0  8  8  8  8  n/a  n/a  n/a  n/a  D,X  MAP_ONLY

Input(s):
Successfully read 6 records (418 bytes) from:

Output(s):
Successfully stored 1 records (10 bytes) in:

Counters:
Total records written : 1
Total bytes written : 10
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1454499131434_9884

2016-02-18 23:01:36,976 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-02-18 23:01:36,992 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-02-18 23:01:36,993 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1,2,1,76)

> On Feb 18, 2016, at 10:13 PM, Parth Sawant <parth.sawan...@gmail.com> wrote:
>
> Attaching a sample input. Basically 5 rows with only 4 Integer values in
> each. Some are NULL values.
>
> Thanks.
>
> On Thu, Feb 18, 2016 at 2:03 PM, Chandeep Singh <c...@chandeep.com> wrote:
> I'm just looking for one sample record (which has NULLs) and not the entire
> input, so that it's easier for me to debug.
>
> > On Feb 18, 2016, at 9:40 PM, Parth Sawant <parth.sawan...@gmail.com> wrote:
> >
> > The input is simply too large to relay to others. A simplified schema is
> > below. I only have INT columns with some null values in them. This is my
> > Pig code snippet:
> >
> > D= LOAD 'src_locatn' as
> > IS_REPORTED:INT, PROCESSING_STATUS_ID:INT, PROGRAM_ID:INT,
> > AFFINITY_GROUP_ID:INT;
> >
> > X = FILTER D BY (IS_REPORTED is not null) AND (PROCESSING_STATUS_ID is not
> > null) AND (IS_REPORTED==1) AND (PROGRAM_ID==1) AND
> > (PROCESSING_STATUS_ID==2) AND (AFFINITY_GROUP_ID==76);
> >
> > Thanks
> >
> > On Thu, Feb 18, 2016 at 12:59 PM, Chandeep Singh <c...@chandeep.com> wrote:
> >
> >> Any chance you could share a sample record which has NULLs in it, as well
> >> as your Pig script?
> >>
> >>> On Feb 18, 2016, at 8:36 PM, Parth Sawant <parth.sawan...@gmail.com> wrote:
> >>>
> >>> I had anticipated it would throw a similar error with this suggestion as
> >>> the last one... and it did. My fields are declared as INT, just to
> >>> re-iterate. I don't think they can be compared to regexes.
> >>> Here is the error:
> >>>
> >>> ERROR 1037:
> >>> <file LeadSales.pig, line 19, column 29> Operands of Regex can be CharArray only :(Name: Regex Type: null Uid: null)
> >>>
> >>> org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1037:
> >>> <file LeadSales.pig, line 19, column 29> Operands of Regex can be CharArray only :(Name: Regex Type: null Uid: null)
> >>>
> >>> Thanks.
> >>>
> >>> On Thu, Feb 18, 2016 at 5:24 AM, Chandeep Singh <c...@chandeep.com> wrote:
> >>>
> >>>> Since you have integers in this field, can you try matching to a regular
> >>>> expression?
> >>>>
> >>>> Something like: X matches '\\d+'
> >>>>
> >>>>> On Feb 18, 2016, at 12:55 AM, Parth Sawant <parth.sawan...@gmail.com> wrote:
> >>>>>
> >>>>> Hi Chandeep. I tried that already but it gave me the following error:
> >>>>>
> >>>>> ERROR 1039:
> >>>>> <file LeadSales.pig, line 19, column 27> In alias X, incompatible types in NotEqual Operator left hand side:int right hand side:chararray.
> >>>>>
> >>>>> The error makes sense because the fields I have are INT type and hence
> >>>>> cannot be compared to a chararray.
> >>>>>
> >>>>> Thanks for the prompt response though.
> >>>>>
> >>>>> On Feb 17, 2016 16:32, "Chandeep Singh" <c...@chandeep.com> wrote:
> >>>>>
> >>>>> Try adding != '' along with IS NOT NULL.
> >>>>>>
> >>>>>>> On Feb 18, 2016, at 12:26 AM, Parth Sawant <parth.sawan...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> I'm trying to filter some null fields in Pig using 'IS NOT NULL'. For some
> >>>>>>> reason the null data values persist.
> >>>>>>> For example: the following filter, on storing its contents, contains null
> >>>>>>> values for ABC and PQR.
> >>>>>>>
> >>>>>>> X = FILTER D BY (ABC IS NOT NULL) AND (ABC==1) AND (PQR==1) AND (PQR IS NOT NULL) ;
> >>>>>>>
> >>>>>>> Can someone help with this?
> >>>>>>>
> >>>>>>> Thanks
> >>>>>>>
> >>>>>>> Parth S
>
> <Sample_in.txt>
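
As a rough cross-check of the DUMP result, here is a minimal Python sketch (not Pig itself) that models the filter over the six comma-delimited records from test.txt. It rests on two assumptions about Pig, as I understand its documented behaviour: an empty INT field loads as null, and comparing null with a constant yields null, which FILTER treats as "do not keep". The helper names (parse, eq, and3, keep) are made up for illustration only.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], ...]

# The six records shown by `hadoop fs -cat test.txt` in this thread.
RAW = """1,,2,76
1,,,76
,2,,76
1,1,2,
1,1,1,76
1,2,1,76"""


def parse(line: str) -> Row:
    # Assumption: an empty field becomes None, standing in for a Pig null.
    return tuple(int(f) if f != "" else None for f in line.split(","))


def eq(value: Optional[int], const: int) -> Optional[bool]:
    # Assumption: comparing null with anything yields null (None here).
    return None if value is None else value == const


def and3(*terms: Optional[bool]) -> Optional[bool]:
    # Three-valued AND: False dominates, otherwise None if any term is None.
    if any(t is False for t in terms):
        return False
    if any(t is None for t in terms):
        return None
    return True


def keep(row: Row) -> bool:
    is_reported, status_id, program_id, affinity_id = row
    result = and3(
        is_reported is not None,   # IS_REPORTED is not null
        status_id is not None,     # PROCESSING_STATUS_ID is not null
        eq(is_reported, 1),        # IS_REPORTED==1
        eq(program_id, 1),         # PROGRAM_ID==1
        eq(status_id, 2),          # PROCESSING_STATUS_ID==2
        eq(affinity_id, 76),       # AFFINITY_GROUP_ID==76
    )
    # A row is kept only when the whole condition is true.
    return result is True


rows: List[Row] = [parse(line) for line in RAW.splitlines()]
kept = [row for row in rows if keep(row)]
print(kept)
```

Under those assumptions only (1, 2, 1, 76) is kept, which agrees with the single record Pig stored. The model also suggests that the equality tests alone already drop rows with nulls in those columns, so the explicit "is not null" checks look redundant here rather than harmful.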