Ravi,

Thanks for the help - sorry, I am not finding the upsert statement.
Attached are the logs and output.  I specify the columns because I get errors if
I do not.

I ran a test on 10K records.  Pig states it processed 10K records, but select
count(1) says 9030.  I analyzed the 10K input data in Excel and there are no duplicates.
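(For reference, the duplicate check I did in Excel amounts to the sketch below in Python, so it can be rerun on larger samples; the file path is a placeholder:)

```python
import csv
from collections import Counter

def duplicate_rows(path, delimiter=','):
    """Count full-row duplicates in a delimited file.

    Returns (total rows, distinct rows, duplicate count); a duplicate
    count of 0 means every input line is unique end-to-end.
    """
    with open(path, newline='') as f:
        counts = Counter(tuple(row) for row in csv.reader(f, delimiter=delimiter))
    total = sum(counts.values())
    distinct = len(counts)
    return total, distinct, total - distinct

# total, distinct, dupes = duplicate_rows('sample10k.csv')  # placeholder path
```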

Thanks!
Ralph

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
[email protected]

From: Ravi Kiran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 2, 2015 at 12:23 PM
To: "[email protected]" <[email protected]>
Subject: Re: Pig vs Bulk Load record count

Hi Ralph,

   Regarding the upsert query in the logs, it should be "Phoenix Custom Upsert
Statement:", since you have explicitly specified the fields in STORE.  Is it
possible to give it a try with a smaller set of records, say 8K, to see the
behavior?

Regards
Ravi

On Mon, Feb 2, 2015 at 11:27 AM, Perko, Ralph J
<[email protected]> wrote:
Thanks for the quick response.  Here is what I have below:

========================================
Pig script:
-------------------------------
register $phoenix_jar;

Z = load '$data' USING PigStorage(',') as (
  file_name,
  rec_num,
  epoch_time,
  timet,
  site,
  proto,
  saddr,
  daddr,
  sport,
  dport,
  mf,
  cf,
  dur,
  sdata,
  ddata,
  sbyte,
  dbyte,
  spkt,
  dpkt,
  siopt,
  diopt,
  stopt,
  dtopt,
  sflags,
  dflags,
  flags,
  sfseq,
  dfseq,
  slseq,
  dlseq,
  category);

STORE Z into 
'hbase://$table_name/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY'
 using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize 
5000');

=========================
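(One thing worth noting, as a sketch rather than a diagnosis: PhoenixHBaseStorage writes via UPSERT, so any two input rows that share the table's primary key collapse into a single row, and SELECT COUNT can trail Pig's record count even when no full rows are duplicated.  Assuming you know which columns form the key, something like this would count distinct key tuples in the source CSV; the key column positions and path below are hypothetical:)

```python
import csv

def distinct_key_count(path, key_indices, delimiter=','):
    """Count distinct primary-key tuples in a delimited file.

    key_indices is hypothetical here: substitute the positions of the
    columns that actually make up the Phoenix table's primary key.
    """
    keys = set()
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=delimiter):
            keys.add(tuple(row[i] for i in key_indices))
    return len(keys)

# Example: if, say, FILE_NAME and REC_NUM (columns 0 and 1) formed the key:
# distinct_key_count('sample10k.csv', [0, 1])  # placeholder path and key
```

If this number is lower than the input row count, upsert-overwrite would explain the gap without any MR errors.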

I cannot find the upsert statement you are referring to in either the MR logs
or the Pig output, but I do have the following – Pig thinks it stored the
correct number of records:

Input(s):
Successfully read 42871627 records (1479463169 bytes) from: 
"/data/incoming/201501124931/SAMPLE"

Output(s):
Successfully stored 42871627 records in: 
"hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY"


Count command:
select count(1) from TEST;

__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
[email protected]

From: Ravi Kiran <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, February 2, 2015 at 11:01 AM
To: "[email protected]" <[email protected]>
Subject: Re: Pig vs Bulk Load record count

Hi Ralph,

   That's definitely a cause for worry.  Can you please share the UPSERT query
being built by Phoenix?  You should see it in the logs with an entry "Phoenix
Generic Upsert Statement: ..".
Also, what do the MapReduce counters say for the job?  If possible, can you
share the Pig script, as sometimes the order of columns in the STORE command
matters.

Regards
Ravi


On Mon, Feb 2, 2015 at 10:46 AM, Perko, Ralph J
<[email protected]> wrote:
Hi, I’ve run into a peculiar issue loading data using Pig vs the
CsvBulkLoadTool.  I have 42M CSV records to load and I am comparing the
performance.

In both cases the MR jobs are successful, and there are no errors.
In both cases the MR job counters state there are 42M Map input and output 
records

However, when I run count on the table after the jobs complete, something is
terribly off.
After the bulk load, select count shows all 42M records in Phoenix, as expected.
After the Pig load there are only 3M records in Phoenix – not even close.

I have no errors to send.  I have run the same test multiple times and gotten
the same results.  The Pig script does not do any transformations; it is a
simple LOAD and STORE.
I get the same result using client jars from 4.2.2 and 4.2.3-SNAPSHOT.
4.2.3-SNAPSHOT is running on the region servers.

Thanks,
Ralph



syslog
===========================
2015-02-02 12:58:26,654 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
application appattempt_1418147347769_0194_000001
2015-02-02 12:58:27,392 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2015-02-02 12:58:27,392 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, 
Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3ef88e3e)
2015-02-02 12:58:27,430 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2015-02-02 12:58:28,168 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config 
null
2015-02-02 12:58:28,411 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: 
mapred.map.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.map.speculative
2015-02-02 12:58:28,412 INFO [main] 
org.apache.hadoop.conf.Configuration.deprecation: 
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.reduce.speculative
2015-02-02 12:58:28,433 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2015-02-02 12:58:28,454 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.jobhistory.EventType for class 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2015-02-02 12:58:28,455 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2015-02-02 12:58:28,456 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2015-02-02 12:58:28,457 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2015-02-02 12:58:28,457 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2015-02-02 12:58:28,458 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2015-02-02 12:58:28,459 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2015-02-02 12:58:28,460 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for 
class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2015-02-02 12:58:28,514 INFO [main] 
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system 
[hdfs://server01:8020]
2015-02-02 12:58:28,551 INFO [main] 
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system 
[hdfs://server01:8020]
2015-02-02 12:58:28,585 INFO [main] 
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system 
[hdfs://server01:8020]
2015-02-02 12:58:28,678 INFO [main] 
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2015-02-02 12:58:28,931 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
2015-02-02 12:58:29,017 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 
60 second(s).
2015-02-02 12:58:29,017 INFO [main] 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system 
started
2015-02-02 12:58:29,029 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for 
job_1418147347769_0194 to jobTokenSecretManager
2015-02-02 12:58:29,203 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing 
job_1418147347769_0194 because: not enabled; too much RAM;
2015-02-02 12:58:29,219 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job 
job_1418147347769_0194 = 2648661. Number of splits = 1
2015-02-02 12:58:29,219 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job 
job_1418147347769_0194 = 0
2015-02-02 12:58:29,219 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job 
Transitioned from NEW to INITED
2015-02-02 12:58:29,220 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, 
non-uberized, multi-container job job_1418147347769_0194.
2015-02-02 12:58:29,250 INFO [main] org.apache.hadoop.ipc.CallQueueManager: 
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-02 12:58:29,259 INFO [Socket Reader #1 for port 45357] 
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45357
2015-02-02 12:58:29,283 INFO [main] 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding 
protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2015-02-02 12:58:29,283 INFO [IPC Server Responder] 
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-02 12:58:29,283 INFO [IPC Server listener on 45357] 
org.apache.hadoop.ipc.Server: IPC Server listener on 45357: starting
2015-02-02 12:58:29,284 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated 
MRClientService at server-dn03/192.168.243.113:45357
2015-02-02 12:58:29,344 INFO [main] org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-02-02 12:58:29,348 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http 
request log for http.requests.mapreduce is not defined
2015-02-02 12:58:29,358 INFO [main] org.apache.hadoop.http.HttpServer2: Added 
global filter 'safety' 
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-02-02 12:58:29,365 INFO [main] org.apache.hadoop.http.HttpServer2: Added 
filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
mapreduce
2015-02-02 12:58:29,365 INFO [main] org.apache.hadoop.http.HttpServer2: Added 
filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
static
2015-02-02 12:58:29,369 INFO [main] org.apache.hadoop.http.HttpServer2: adding 
path spec: /mapreduce/*
2015-02-02 12:58:29,369 INFO [main] org.apache.hadoop.http.HttpServer2: adding 
path spec: /ws/*
2015-02-02 12:58:29,380 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty 
bound to port 34110
2015-02-02 12:58:29,380 INFO [main] org.mortbay.log: jetty-6.1.26
2015-02-02 12:58:29,405 INFO [main] org.mortbay.log: Extract 
jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.4.0.2.1.5.0-695.jar!/webapps/mapreduce
 to /tmp/Jetty_0_0_0_0_34110_mapreduce____rw9s7e/webapp
2015-02-02 12:58:29,814 INFO [main] org.mortbay.log: Started 
[email protected]:34110
2015-02-02 12:58:29,814 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web 
app /mapreduce started at 34110
2015-02-02 12:58:30,208 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: 
Registered webapp guice modules
2015-02-02 12:58:30,213 INFO [main] org.apache.hadoop.ipc.CallQueueManager: 
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-02 12:58:30,213 INFO [Socket Reader #1 for port 43810] 
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 43810
2015-02-02 12:58:30,221 INFO [IPC Server Responder] 
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-02 12:58:30,221 INFO [IPC Server listener on 43810] 
org.apache.hadoop.ipc.Server: IPC Server listener on 43810: starting
2015-02-02 12:58:30,236 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
nodeBlacklistingEnabled:true
2015-02-02 12:58:30,236 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
maxTaskFailuresPerNode is 3
2015-02-02 12:58:30,236 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
blacklistDisablePercent is 33
2015-02-02 12:58:30,360 INFO [main] org.apache.hadoop.yarn.client.RMProxy: 
Connecting to ResourceManager at server01/192.168.243.110:8030
2015-02-02 12:58:30,428 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: 
maxContainerCapability: 64512
2015-02-02 12:58:30,428 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: default
2015-02-02 12:58:30,432 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit 
on the thread pool size is 500
2015-02-02 12:58:30,434 INFO [main] 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
yarn.client.max-nodemanagers-proxies : 500
2015-02-02 12:58:30,440 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job 
Transitioned from INITED to SETUP
2015-02-02 12:58:30,442 INFO [CommitterEvent Processor #0] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_SETUP
2015-02-02 12:58:30,468 INFO [CommitterEvent Processor #0] 
org.apache.hadoop.conf.Configuration.deprecation: 
mapred.map.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.map.speculative
2015-02-02 12:58:30,468 INFO [CommitterEvent Processor #0] 
org.apache.hadoop.conf.Configuration.deprecation: 
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.reduce.speculative
2015-02-02 12:58:30,483 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job 
Transitioned from SETUP to RUNNING
2015-02-02 12:58:30,499 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn03 to /default-rack
2015-02-02 12:58:30,500 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn05 to /default-rack
2015-02-02 12:58:30,500 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn01 to /default-rack
2015-02-02 12:58:30,504 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1418147347769_0194_m_000000 Task Transitioned from NEW to SCHEDULED
2015-02-02 12:58:30,505 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from NEW to 
UNASSIGNED
2015-02-02 12:58:30,506 INFO [Thread-51] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:4096
2015-02-02 12:58:30,533 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer 
setup for JobId: job_1418147347769_0194, File: 
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1.jhist
2015-02-02 12:58:31,431 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: 
PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 
CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2015-02-02 12:58:31,478 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for 
application_1418147347769_0194: ask=5 release= 0 newContainers=0 
finishedContainers=0 resourcelimit=<memory:315392, vCores:0> knownNMs=5
2015-02-02 12:58:32,488 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
containers 1
2015-02-02 12:58:32,489 INFO [RMCommunicator Allocator] 
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn04 to /default-rack
2015-02-02 12:58:32,490 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container 
container_1418147347769_0194_01_000002 to attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,491 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 
CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:1
2015-02-02 12:58:32,544 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn04 to /default-rack
2015-02-02 12:58:32,558 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-jar file 
on the remote FS is 
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job.jar
2015-02-02 12:58:32,560 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file 
on the remote FS is /user/perko/.staging/job_1418147347769_0194/job.xml
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens 
and #1 secret keys for NM use for launching container
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of 
containertokens_dob is 2
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle 
token in serviceData
2015-02-02 12:58:32,585 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from UNASSIGNED 
to ASSIGNED
2015-02-02 12:58:32,590 INFO [ContainerLauncher #0] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
the event EventType: CONTAINER_REMOTE_LAUNCH for container 
container_1418147347769_0194_01_000002 taskAttempt 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,592 INFO [ContainerLauncher #0] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,592 INFO [ContainerLauncher #0] 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
Opening proxy : server-dn04:45454
2015-02-02 12:58:32,642 INFO [ContainerLauncher #0] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port 
returned by ContainerManager for attempt_1418147347769_0194_m_000000_0 : 13562
2015-02-02 12:58:32,644 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: 
[attempt_1418147347769_0194_m_000000_0] using containerId: 
[container_1418147347769_0194_01_000002 on NM: [server-dn04:45454]
2015-02-02 12:58:32,648 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from ASSIGNED to 
RUNNING
2015-02-02 12:58:32,648 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1418147347769_0194_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-02-02 12:58:33,494 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for 
application_1418147347769_0194: ask=5 release= 0 newContainers=0 
finishedContainers=0 resourcelimit=<memory:311296, vCores:-1> knownNMs=5
2015-02-02 12:58:35,136 INFO [Socket Reader #1 for port 43810] 
SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
job_1418147347769_0194 (auth:SIMPLE)
2015-02-02 12:58:35,159 INFO [IPC Server handler 0 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
jvm_1418147347769_0194_m_000002 asked for a task
2015-02-02 12:58:35,159 INFO [IPC Server handler 0 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
jvm_1418147347769_0194_m_000002 given task: 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:45,342 INFO [IPC Server handler 1 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:45,982 INFO [IPC Server handler 2 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:46,043 INFO [IPC Server handler 6 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update 
from attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,044 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from RUNNING to 
COMMIT_PENDING
2015-02-02 12:58:46,044 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
attempt_1418147347769_0194_m_000000_0 given a go for committing the task output.
2015-02-02 12:58:46,045 INFO [IPC Server handler 5 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request from 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,046 INFO [IPC Server handler 5 on 43810] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for 
attempt_1418147347769_0194_m_000000_0:true
2015-02-02 12:58:46,102 INFO [IPC Server handler 7 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:46,105 INFO [IPC Server handler 8 on 43810] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,107 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from 
COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
2015-02-02 12:58:46,108 INFO [ContainerLauncher #1] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
the event EventType: CONTAINER_REMOTE_CLEANUP for container 
container_1418147347769_0194_01_000002 taskAttempt 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,108 INFO [ContainerLauncher #1] 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,117 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from 
SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-02-02 12:58:46,124 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
attempt attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,125 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
task_1418147347769_0194_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2015-02-02 12:58:46,127 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2015-02-02 12:58:46,128 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job 
Transitioned from RUNNING to COMMITTING
2015-02-02 12:58:46,129 INFO [CommitterEvent Processor #1] 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the 
event EventType: JOB_COMMIT
2015-02-02 12:58:46,171 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for 
JobFinishedEvent 
2015-02-02 12:58:46,172 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job 
Transitioned from COMMITTING to SUCCEEDED
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: We are finishing cleanly so 
this is the last retry
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
isAMLastRetry: true
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
notified that shouldUnregistered is: true
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
JobHistoryEventHandler notified that forceJobCompletion is true
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
services
2015-02-02 12:58:46,173 INFO [Thread-64] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping 
JobHistoryEventHandler. Size of the outstanding queue size is 0
2015-02-02 12:58:46,212 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying 
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1.jhist
 to 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
2015-02-02 12:58:46,234 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done 
location: 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
2015-02-02 12:58:46,237 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying 
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1_conf.xml
 to 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp
2015-02-02 12:58:46,258 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done 
location: 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp
2015-02-02 12:58:46,264 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to 
done: 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194.summary_tmp to 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194.summary
2015-02-02 12:58:46,265 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to 
done: 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp 
to hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml
2015-02-02 12:58:46,266 INFO [eventHandlingThread] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to 
done: 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
 to 
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist
2015-02-02 12:58:46,266 INFO [Thread-64] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped 
JobHistoryEventHandler. super.stop()
2015-02-02 12:58:46,268 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job 
diagnostics to 
2015-02-02 12:58:46,269 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is 
http://server01:19888/jobhistory/job/job_1418147347769_0194
2015-02-02 12:58:46,273 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for 
application to be successfully unregistered.
2015-02-02 12:58:47,275 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats: 
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 
CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:1
2015-02-02 12:58:47,276 INFO [Thread-64] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory 
hdfs://server01:8020 /user/perko/.staging/job_1418147347769_0194
2015-02-02 12:58:47,281 INFO [Thread-64] org.apache.hadoop.ipc.Server: Stopping 
server on 43810
2015-02-02 12:58:47,283 INFO [IPC Server listener on 43810] 
org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 43810
2015-02-02 12:58:47,283 INFO [TaskHeartbeatHandler PingChecker] 
org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler 
thread interrupted
2015-02-02 12:58:47,283 INFO [IPC Server Responder] 
org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
[perko@server01 ~]$ pig -param_file cic-pig.param -param 
data=/data/incoming/sample.csv -param table_name=TEST simple.pig
2015-02-02 12:58:14,670 [main] INFO  org.apache.pig.Main - Apache Pig version 
0.12.1.2.1.5.0-695 (rexported) compiled Aug 27 2014, 23:56:19
2015-02-02 12:58:14,671 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /home/perko/pig_1422910694669.log
2015-02-02 12:58:15,477 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated. 
Instead, use mapreduce.job.user.name
2015-02-02 12:58:15,613 [main] INFO  org.apache.pig.impl.util.Utils - Default 
bootup file /home/perko/.pigbootup not found
2015-02-02 12:58:15,787 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is 
deprecated. Instead, use mapreduce.jobtracker.address
2015-02-02 12:58:15,787 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is 
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:15,787 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: hdfs://server01.cpp:8020
2015-02-02 12:58:16,328 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is 
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:17,039 [main] INFO  org.apache.pig.tools.pigstats.ScriptState 
- Pig features used in the script: UNKNOWN
2015-02-02 12:58:17,076 [main] INFO  
org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - 
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, 
LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, 
NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, 
PushUpFilter, SplitFilter, StreamTypeCastInserter], 
RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-02-02 12:58:17,122 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - 
mapred.map.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.map.speculative
2015-02-02 12:58:17,122 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - 
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use 
mapreduce.reduce.speculative
2015-02-02 12:58:17,181 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File 
concatenation threshold: 100 optimistic? false
2015-02-02 12:58:17,235 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size before optimization: 1
2015-02-02 12:58:17,235 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
 - MR plan size after optimization: 1
2015-02-02 12:58:17,986 [main] INFO  
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service 
address: http://server01.cpp:8188/ws/v1/timeline/
2015-02-02 12:58:18,110 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - 
Connecting to ResourceManager at server01.cpp/192.168.243.110:8050
2015-02-02 12:58:18,272 [main] INFO  org.apache.pig.tools.pigstats.ScriptState 
- Pig script settings are added to the job
2015-02-02 12:58:18,278 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - 
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use 
mapreduce.reduce.markreset.buffer.percent
2015-02-02 12:58:18,278 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-02-02 12:58:18,278 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is 
deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-02-02 12:58:18,819 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- creating jar file Job3779859737544170369.jar
2015-02-02 12:58:25,469 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- jar file Job3779859737544170369.jar created
2015-02-02 12:58:25,470 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. 
Instead, use mapreduce.job.jar
2015-02-02 12:58:25,517 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler 
- Setting up single store job
2015-02-02 12:58:25,578 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 1 map-reduce job(s) waiting for submission.
2015-02-02 12:58:25,580 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - 
mapred.job.tracker.http.address is deprecated. Instead, use 
mapreduce.jobtracker.http.address
2015-02-02 12:58:25,793 [JobControl] INFO  
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service 
address: http://server01.cpp:8188/ws/v1/timeline/
2015-02-02 12:58:25,794 [JobControl] INFO  
org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at 
server01.cpp/192.168.243.110:8050
2015-02-02 12:58:25,825 [JobControl] INFO  
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is 
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:26,178 [JobControl] INFO  
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to 
process : 1
2015-02-02 12:58:26,178 [JobControl] INFO  
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
paths to process : 1
2015-02-02 12:58:26,197 [JobControl] INFO  
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
paths (combined) to process : 1
2015-02-02 12:58:26,237 [JobControl] INFO  
org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-02-02 12:58:26,547 [JobControl] INFO  
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: 
job_1418147347769_0194
2015-02-02 12:58:26,754 [JobControl] INFO  
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application 
application_1418147347769_0194
2015-02-02 12:58:26,788 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - 
The url to track the job: 
http://server01.cpp:8088/proxy/application_1418147347769_0194/
2015-02-02 12:58:26,789 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- HadoopJobId: job_1418147347769_0194
2015-02-02 12:58:26,789 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Processing aliases Z
2015-02-02 12:58:26,789 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- detailed locations: M: Z[3,4] C:  R:
2015-02-02 12:58:26,843 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2015-02-02 12:58:47,715 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 50% complete
2015-02-02 12:58:52,121 [main] INFO  
org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is 
deprecated. Instead, use mapreduce.job.reduces
2015-02-02 12:58:52,179 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2015-02-02 12:58:52,181 [main] INFO  
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.4.0.2.1.5.0-695       0.12.1.2.1.5.0-695      perko   2015-02-02 12:58:18     2015-02-02 12:58:52     UNKNOWN

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_1418147347769_0194  1       0       13      13      13      13      n/a     n/a     n/a     n/a     Z       MAP_ONLY        hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY,
Input(s):
Successfully read 10000 records (2649054 bytes) from: "/data/incoming/sample.csv"

Output(s):
Successfully stored 10000 records in: "hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY"

Counters:
Total records written : 10000
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1418147347769_0194


2015-02-02 12:58:52,282 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!
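One note on the 10000-vs-9030 discrepancy: Pig's "Total records written" counts input tuples handed to the storer, while Phoenix's SELECT COUNT(*) counts table rows after upsert. Since UPSERT overwrites rows that share the same primary key, duplicate row keys in the input would collapse on load, even if the full rows are distinct when checked in Excel. The sketch below illustrates the idea on a toy CSV; the assumption that (file_name, rec_num) forms the row key is mine, so adjust key_indices to match the actual PRIMARY KEY of the TEST table.

```python
import csv
import io

# Toy stand-in for the real input file. Rows 2 and 3 differ in their data
# column but share the hypothetical row key (file_name, rec_num), so an
# upsert-based load would keep only one of them.
SAMPLE = """\
a.csv,1,100
a.csv,2,101
a.csv,2,102
b.csv,1,103
"""

def count_distinct_keys(csv_text, key_indices):
    """Return (total_rows, distinct_key_count) for the given key columns."""
    seen = set()
    total = 0
    for row in csv.reader(io.StringIO(csv_text)):
        total += 1
        seen.add(tuple(row[i] for i in key_indices))
    return total, len(seen)

total, distinct = count_distinct_keys(SAMPLE, key_indices=(0, 1))
print(total, distinct)  # 4 rows in, 3 distinct keys -> 3 rows after upsert
```

Running the same check over sample.csv with the real key columns should tell you whether the missing 970 rows are key collisions or something else (e.g. silently dropped writes).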
