Ravi,
Thanks for the help - I am sorry, I am not finding the upsert statement.
Attached are the logs and output. I specify the columns because I get errors if
I do not.
I ran a test on 10K records. Pig states it processed 10K records; select
count() says 9030. I analyzed the 10K data in Excel and there are no duplicates.
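(Editor's note: Excel will flag fully duplicated rows, but an UPSERT-backed store only needs a collision on the table's primary-key columns to collapse two records into one row. A minimal sketch of that narrower check, assuming for illustration that the key is the first two CSV columns - adjust key_cols to the table's actual primary key:)

```python
import csv
from collections import Counter

def key_collisions(path, key_cols=(0, 1)):
    """Count rows per composite key; any count > 1 is a collision
    that an UPSERT-based load would silently collapse to one row."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[tuple(row[i] for i in key_cols)] += 1
    # keep only the colliding keys; an empty dict means no collisions
    return {k: n for k, n in counts.items() if n > 1}
```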
Thanks!
Ralph
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272
[email protected]
From: Ravi Kiran <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Monday, February 2, 2015 at 12:23 PM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Pig vs Bulk Load record count
Hi Ralph,
Regarding the upsert query in the logs, it should be "Phoenix Custom Upsert
Statement:" since you have explicitly specified the fields in STORE. Is it
possible to give it a try with a smaller set of records, say 8K, to see the
behavior?
Regards
Ravi
On Mon, Feb 2, 2015 at 11:27 AM, Perko, Ralph J
<[email protected]<mailto:[email protected]>> wrote:
Thanks for the quick response. Here is what I have:
========================================
Pig script:
-------------------------------
register $phoenix_jar;
Z = load '$data' USING PigStorage(',') as (
file_name,
rec_num,
epoch_time,
timet,
site,
proto,
saddr,
daddr,
sport,
dport,
mf,
cf,
dur,
sdata,
ddata,
sbyte,
dbyte,
spkt,
dpkt,
siopt,
diopt,
stopt,
dtopt,
sflags,
dflags,
flags,
sfseq,
dfseq,
slseq,
dlseq,
category);
STORE Z into
'hbase://$table_name/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY'
using org.apache.phoenix.pig.PhoenixHBaseStorage('$zookeeper','-batchSize
5000');
=========================
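(Editor's note: PhoenixHBaseStorage writes via UPSERT, so Pig's output counter tracks statements executed, not distinct rows that survive in the table; records sharing a primary key overwrite each other. A toy illustration of that semantics - the column names and records below are hypothetical, not Phoenix API:)

```python
def upsert_all(records, key_cols):
    """Simulate upsert semantics: a later record with the same
    primary key overwrites the earlier one instead of adding a row."""
    table = {}
    for rec in records:
        table[tuple(rec[c] for c in key_cols)] = rec
    return table

records = [
    {"FILE_NAME": "f1", "REC_NUM": 1, "SITE": "a"},
    {"FILE_NAME": "f1", "REC_NUM": 1, "SITE": "b"},  # same PK: overwrites
    {"FILE_NAME": "f1", "REC_NUM": 2, "SITE": "c"},
]
table = upsert_all(records, ("FILE_NAME", "REC_NUM"))
# 3 records "stored", but only len(table) == 2 rows survive
```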
I cannot find the upsert statement you are referring to in either the MR logs
or the Pig output, but I do have this below. Pig thinks it output the correct
number of records:
Input(s):
Successfully read 42871627 records (1479463169 bytes) from:
"/data/incoming/201501124931/SAMPLE"
Output(s):
Successfully stored 42871627 records in:
"hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY"
Count command:
select count(1) from TEST;
__________________________________________________
Ralph Perko
Pacific Northwest National Laboratory
(509) 375-2272<tel:%28509%29%20375-2272>
[email protected]<mailto:[email protected]>
From: Ravi Kiran <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Date: Monday, February 2, 2015 at 11:01 AM
To: "[email protected]<mailto:[email protected]>"
<[email protected]<mailto:[email protected]>>
Subject: Re: Pig vs Bulk Load record count
Hi Ralph,
That's definitely a cause for worry. Can you please share the UPSERT query
being built by Phoenix? You should see it in the logs in an entry starting with
"Phoenix Generic Upsert Statement: ...".
Also, what do the MapReduce counters say for the job? If possible, can you
share the Pig script, as the order of columns in the STORE command sometimes
has an impact.
Regards
Ravi
On Mon, Feb 2, 2015 at 10:46 AM, Perko, Ralph J
<[email protected]<mailto:[email protected]>> wrote:
Hi, I’ve run into a peculiar issue when loading data using Pig vs the
CsvBulkLoadTool. I have 42M CSV records to load and I am comparing the
performance.
In both cases the MR jobs are successful and there are no errors.
In both cases the MR job counters state there are 42M map input and output
records.
However, when I run a count on the table after the jobs complete, something is
terribly off.
After the bulk load, select count shows all 42M records in Phoenix, as expected.
After the Pig load there are only 3M records in Phoenix – not even close.
I have no errors to send. I have run the same test multiple times and gotten
the same results. The Pig script does no transformations; it is a simple LOAD
and STORE.
I get the same result using client jars from 4.2.2 and 4.2.3-SNAPSHOT.
4.2.3-SNAPSHOT is running on the region servers.
Thanks,
Ralph
syslog
===========================
2015-02-02 12:58:26,654 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for
application appattempt_1418147347769_0194_000001
2015-02-02 12:58:27,392 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2015-02-02 12:58:27,392 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN,
Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@3ef88e3e)
2015-02-02 12:58:27,430 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2015-02-02 12:58:28,168 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config
null
2015-02-02 12:58:28,411 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation:
mapred.map.tasks.speculative.execution is deprecated. Instead, use
mapreduce.map.speculative
2015-02-02 12:58:28,412 INFO [main]
org.apache.hadoop.conf.Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
2015-02-02 12:58:28,433 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2015-02-02 12:58:28,454 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.jobhistory.EventType for class
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
2015-02-02 12:58:28,455 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
2015-02-02 12:58:28,456 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
2015-02-02 12:58:28,457 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
2015-02-02 12:58:28,457 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2015-02-02 12:58:28,458 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
2015-02-02 12:58:28,459 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
2015-02-02 12:58:28,460 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for
class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
2015-02-02 12:58:28,514 INFO [main]
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system
[hdfs://server01:8020]
2015-02-02 12:58:28,551 INFO [main]
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system
[hdfs://server01:8020]
2015-02-02 12:58:28,585 INFO [main]
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system
[hdfs://server01:8020]
2015-02-02 12:58:28,678 INFO [main]
org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class
org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
2015-02-02 12:58:28,931 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
2015-02-02 12:58:29,017 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at
60 second(s).
2015-02-02 12:58:29,017 INFO [main]
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system
started
2015-02-02 12:58:29,029 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for
job_1418147347769_0194 to jobTokenSecretManager
2015-02-02 12:58:29,203 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing
job_1418147347769_0194 because: not enabled; too much RAM;
2015-02-02 12:58:29,219 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job
job_1418147347769_0194 = 2648661. Number of splits = 1
2015-02-02 12:58:29,219 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for job
job_1418147347769_0194 = 0
2015-02-02 12:58:29,219 INFO [main]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job
Transitioned from NEW to INITED
2015-02-02 12:58:29,220 INFO [main]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal,
non-uberized, multi-container job job_1418147347769_0194.
2015-02-02 12:58:29,250 INFO [main] org.apache.hadoop.ipc.CallQueueManager:
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-02 12:58:29,259 INFO [Socket Reader #1 for port 45357]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 45357
2015-02-02 12:58:29,283 INFO [main]
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding
protocol org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB to the server
2015-02-02 12:58:29,283 INFO [IPC Server Responder]
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-02 12:58:29,283 INFO [IPC Server listener on 45357]
org.apache.hadoop.ipc.Server: IPC Server listener on 45357: starting
2015-02-02 12:58:29,284 INFO [main]
org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated
MRClientService at server-dn03/192.168.243.113:45357
2015-02-02 12:58:29,344 INFO [main] org.mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-02-02 12:58:29,348 INFO [main] org.apache.hadoop.http.HttpRequestLog: Http
request log for http.requests.mapreduce is not defined
2015-02-02 12:58:29,358 INFO [main] org.apache.hadoop.http.HttpServer2: Added
global filter 'safety'
(class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2015-02-02 12:58:29,365 INFO [main] org.apache.hadoop.http.HttpServer2: Added
filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context
mapreduce
2015-02-02 12:58:29,365 INFO [main] org.apache.hadoop.http.HttpServer2: Added
filter AM_PROXY_FILTER
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context
static
2015-02-02 12:58:29,369 INFO [main] org.apache.hadoop.http.HttpServer2: adding
path spec: /mapreduce/*
2015-02-02 12:58:29,369 INFO [main] org.apache.hadoop.http.HttpServer2: adding
path spec: /ws/*
2015-02-02 12:58:29,380 INFO [main] org.apache.hadoop.http.HttpServer2: Jetty
bound to port 34110
2015-02-02 12:58:29,380 INFO [main] org.mortbay.log: jetty-6.1.26
2015-02-02 12:58:29,405 INFO [main] org.mortbay.log: Extract
jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.4.0.2.1.5.0-695.jar!/webapps/mapreduce
to /tmp/Jetty_0_0_0_0_34110_mapreduce____rw9s7e/webapp
2015-02-02 12:58:29,814 INFO [main] org.mortbay.log: Started
[email protected]:34110
2015-02-02 12:58:29,814 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web
app /mapreduce started at 34110
2015-02-02 12:58:30,208 INFO [main] org.apache.hadoop.yarn.webapp.WebApps:
Registered webapp guice modules
2015-02-02 12:58:30,213 INFO [main] org.apache.hadoop.ipc.CallQueueManager:
Using callQueue class java.util.concurrent.LinkedBlockingQueue
2015-02-02 12:58:30,213 INFO [Socket Reader #1 for port 43810]
org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 43810
2015-02-02 12:58:30,221 INFO [IPC Server Responder]
org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2015-02-02 12:58:30,221 INFO [IPC Server listener on 43810]
org.apache.hadoop.ipc.Server: IPC Server listener on 43810: starting
2015-02-02 12:58:30,236 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
nodeBlacklistingEnabled:true
2015-02-02 12:58:30,236 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
maxTaskFailuresPerNode is 3
2015-02-02 12:58:30,236 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor:
blacklistDisablePercent is 33
2015-02-02 12:58:30,360 INFO [main] org.apache.hadoop.yarn.client.RMProxy:
Connecting to ResourceManager at server01/192.168.243.110:8030
2015-02-02 12:58:30,428 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
maxContainerCapability: 64512
2015-02-02 12:58:30,428 INFO [main]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: queue: default
2015-02-02 12:58:30,432 INFO [main]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Upper limit
on the thread pool size is 500
2015-02-02 12:58:30,434 INFO [main]
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
yarn.client.max-nodemanagers-proxies : 500
2015-02-02 12:58:30,440 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job
Transitioned from INITED to SETUP
2015-02-02 12:58:30,442 INFO [CommitterEvent Processor #0]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the
event EventType: JOB_SETUP
2015-02-02 12:58:30,468 INFO [CommitterEvent Processor #0]
org.apache.hadoop.conf.Configuration.deprecation:
mapred.map.tasks.speculative.execution is deprecated. Instead, use
mapreduce.map.speculative
2015-02-02 12:58:30,468 INFO [CommitterEvent Processor #0]
org.apache.hadoop.conf.Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
2015-02-02 12:58:30,483 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job
Transitioned from SETUP to RUNNING
2015-02-02 12:58:30,499 INFO [AsyncDispatcher event handler]
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn03 to /default-rack
2015-02-02 12:58:30,500 INFO [AsyncDispatcher event handler]
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn05 to /default-rack
2015-02-02 12:58:30,500 INFO [AsyncDispatcher event handler]
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn01 to /default-rack
2015-02-02 12:58:30,504 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1418147347769_0194_m_000000 Task Transitioned from NEW to SCHEDULED
2015-02-02 12:58:30,505 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from NEW to
UNASSIGNED
2015-02-02 12:58:30,506 INFO [Thread-51]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: mapResourceReqt:4096
2015-02-02 12:58:30,533 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Event Writer
setup for JobId: job_1418147347769_0194, File:
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1.jhist
2015-02-02 12:58:31,431 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling:
PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 AssignedReds:0
CompletedMaps:0 CompletedReds:0 ContAlloc:0 ContRel:0 HostLocal:0 RackLocal:0
2015-02-02 12:58:31,478 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for
application_1418147347769_0194: ask=5 release= 0 newContainers=0
finishedContainers=0 resourcelimit=<memory:315392, vCores:0> knownNMs=5
2015-02-02 12:58:32,488 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
containers 1
2015-02-02 12:58:32,489 INFO [RMCommunicator Allocator]
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn04 to /default-rack
2015-02-02 12:58:32,490 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container
container_1418147347769_0194_01_000002 to attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,491 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling:
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0
CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:1
2015-02-02 12:58:32,544 INFO [AsyncDispatcher event handler]
org.apache.hadoop.yarn.util.RackResolver: Resolved server-dn04 to /default-rack
2015-02-02 12:58:32,558 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-jar file
on the remote FS is
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job.jar
2015-02-02 12:58:32,560 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: The job-conf file
on the remote FS is /user/perko/.staging/job_1418147347769_0194/job.xml
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Adding #1 tokens
and #1 secret keys for NM use for launching container
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Size of
containertokens_dob is 2
2015-02-02 12:58:32,563 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Putting shuffle
token in serviceData
2015-02-02 12:58:32,585 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from UNASSIGNED
to ASSIGNED
2015-02-02 12:58:32,590 INFO [ContainerLauncher #0]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
the event EventType: CONTAINER_REMOTE_LAUNCH for container
container_1418147347769_0194_01_000002 taskAttempt
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,592 INFO [ContainerLauncher #0]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:32,592 INFO [ContainerLauncher #0]
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy:
Opening proxy : server-dn04:45454
2015-02-02 12:58:32,642 INFO [ContainerLauncher #0]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle port
returned by ContainerManager for attempt_1418147347769_0194_m_000000_0 : 13562
2015-02-02 12:58:32,644 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt:
[attempt_1418147347769_0194_m_000000_0] using containerId:
[container_1418147347769_0194_01_000002 on NM: [server-dn04:45454]
2015-02-02 12:58:32,648 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from ASSIGNED to
RUNNING
2015-02-02 12:58:32,648 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1418147347769_0194_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-02-02 12:58:33,494 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for
application_1418147347769_0194: ask=5 release= 0 newContainers=0
finishedContainers=0 resourcelimit=<memory:311296, vCores:-1> knownNMs=5
2015-02-02 12:58:35,136 INFO [Socket Reader #1 for port 43810]
SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for
job_1418147347769_0194 (auth:SIMPLE)
2015-02-02 12:58:35,159 INFO [IPC Server handler 0 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID :
jvm_1418147347769_0194_m_000002 asked for a task
2015-02-02 12:58:35,159 INFO [IPC Server handler 0 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID:
jvm_1418147347769_0194_m_000002 given task:
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:45,342 INFO [IPC Server handler 1 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:45,982 INFO [IPC Server handler 2 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:46,043 INFO [IPC Server handler 6 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update
from attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,044 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from RUNNING to
COMMIT_PENDING
2015-02-02 12:58:46,044 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
attempt_1418147347769_0194_m_000000_0 given a go for committing the task output.
2015-02-02 12:58:46,045 INFO [IPC Server handler 5 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request from
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,046 INFO [IPC Server handler 5 on 43810]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for
attempt_1418147347769_0194_m_000000_0:true
2015-02-02 12:58:46,102 INFO [IPC Server handler 7 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1418147347769_0194_m_000000_0 is : 1.0
2015-02-02 12:58:46,105 INFO [IPC Server handler 8 on 43810]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,107 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from
COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
2015-02-02 12:58:46,108 INFO [ContainerLauncher #1]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing
the event EventType: CONTAINER_REMOTE_CLEANUP for container
container_1418147347769_0194_01_000002 taskAttempt
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,108 INFO [ContainerLauncher #1]
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING
attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,117 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1418147347769_0194_m_000000_0 TaskAttempt Transitioned from
SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2015-02-02 12:58:46,124 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
attempt attempt_1418147347769_0194_m_000000_0
2015-02-02 12:58:46,125 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1418147347769_0194_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2015-02-02 12:58:46,127 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2015-02-02 12:58:46,128 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job
Transitioned from RUNNING to COMMITTING
2015-02-02 12:58:46,129 INFO [CommitterEvent Processor #1]
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the
event EventType: JOB_COMMIT
2015-02-02 12:58:46,171 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for
JobFinishedEvent
2015-02-02 12:58:46,172 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1418147347769_0194Job
Transitioned from COMMITTING to SUCCEEDED
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: We are finishing cleanly so
this is the last retry
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator
isAMLastRetry: true
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator
notified that shouldUnregistered is: true
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: true
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler:
JobHistoryEventHandler notified that forceJobCompletion is true
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the
services
2015-02-02 12:58:46,173 INFO [Thread-64]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping
JobHistoryEventHandler. Size of the outstanding queue size is 0
2015-02-02 12:58:46,212 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1.jhist
to
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
2015-02-02 12:58:46,234 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done
location:
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
2015-02-02 12:58:46,237 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copying
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194/job_1418147347769_0194_1_conf.xml
to
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp
2015-02-02 12:58:46,258 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Copied to done
location:
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp
2015-02-02 12:58:46,264 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to
done:
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194.summary_tmp to
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194.summary
2015-02-02 12:58:46,265 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to
done:
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml_tmp
to hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194_conf.xml
2015-02-02 12:58:46,266 INFO [eventHandlingThread]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Moved tmp to
done:
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist_tmp
to
hdfs://server01:8020/mr-history/tmp/perko/job_1418147347769_0194-1422910702455-perko-PigLatin%3Asimple.pig-1422910726169-1-0-SUCCEEDED-default-1422910710435.jhist
2015-02-02 12:58:46,266 INFO [Thread-64]
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopped
JobHistoryEventHandler. super.stop()
2015-02-02 12:58:46,268 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job
diagnostics to
2015-02-02 12:58:46,269 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: History url is
http://server01:19888/jobhistory/job/job_1418147347769_0194
2015-02-02 12:58:46,273 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Waiting for
application to be successfully unregistered.
2015-02-02 12:58:47,275 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Final Stats:
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0
CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:1
2015-02-02 12:58:47,276 INFO [Thread-64]
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Deleting staging directory
hdfs://server01:8020/user/perko/.staging/job_1418147347769_0194
2015-02-02 12:58:47,281 INFO [Thread-64] org.apache.hadoop.ipc.Server: Stopping
server on 43810
2015-02-02 12:58:47,283 INFO [IPC Server listener on 43810]
org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 43810
2015-02-02 12:58:47,283 INFO [TaskHeartbeatHandler PingChecker]
org.apache.hadoop.mapreduce.v2.app.TaskHeartbeatHandler: TaskHeartbeatHandler
thread interrupted
2015-02-02 12:58:47,283 INFO [IPC Server Responder]
org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
[perko@server01 ~]$ pig -param_file cic-pig.param -param
data=/data/incoming/sample.csv -param table_name=TEST simple.pig
2015-02-02 12:58:14,670 [main] INFO org.apache.pig.Main - Apache Pig version
0.12.1.2.1.5.0-695 (rexported) compiled Aug 27 2014, 23:56:19
2015-02-02 12:58:14,671 [main] INFO org.apache.pig.Main - Logging error
messages to: /home/perko/pig_1422910694669.log
2015-02-02 12:58:15,477 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - user.name is deprecated.
Instead, use mapreduce.job.user.name
2015-02-02 12:58:15,613 [main] INFO org.apache.pig.impl.util.Utils - Default
bootup file /home/perko/.pigbootup not found
2015-02-02 12:58:15,787 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is
deprecated. Instead, use mapreduce.jobtracker.address
2015-02-02 12:58:15,787 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:15,787 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to
hadoop file system at: hdfs://server01.cpp:8020
2015-02-02 12:58:16,328 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:17,039 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig features used in the script: UNKNOWN
2015-02-02 12:58:17,076 [main] INFO
org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter,
LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach,
NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten,
PushUpFilter, SplitFilter, StreamTypeCastInserter],
RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-02-02 12:58:17,122 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.map.tasks.speculative.execution is deprecated. Instead, use
mapreduce.map.speculative
2015-02-02 12:58:17,122 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
2015-02-02 12:58:17,181 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File
concatenation threshold: 100 optimistic? false
2015-02-02 12:58:17,235 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2015-02-02 12:58:17,235 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2015-02-02 12:58:17,986 [main] INFO
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service
address: http://server01.cpp:8188/ws/v1/timeline/
2015-02-02 12:58:18,110 [main] INFO org.apache.hadoop.yarn.client.RMProxy -
Connecting to ResourceManager at server01.cpp/192.168.243.110:8050
2015-02-02 12:58:18,272 [main] INFO org.apache.pig.tools.pigstats.ScriptState
- Pig script settings are added to the job
2015-02-02 12:58:18,278 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use
mapreduce.reduce.markreset.buffer.percent
2015-02-02 12:58:18,278 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-02-02 12:58:18,278 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is
deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-02-02 12:58:18,819 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- creating jar file Job3779859737544170369.jar
2015-02-02 12:58:25,469 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- jar file Job3779859737544170369.jar created
2015-02-02 12:58:25,470 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated.
Instead, use mapreduce.job.jar
2015-02-02 12:58:25,517 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2015-02-02 12:58:25,578 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2015-02-02 12:58:25,580 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation -
mapred.job.tracker.http.address is deprecated. Instead, use
mapreduce.jobtracker.http.address
2015-02-02 12:58:25,793 [JobControl] INFO
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service
address: http://server01.cpp:8188/ws/v1/timeline/
2015-02-02 12:58:25,794 [JobControl] INFO
org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at
server01.cpp/192.168.243.110:8050
2015-02-02 12:58:25,825 [JobControl] INFO
org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is
deprecated. Instead, use fs.defaultFS
2015-02-02 12:58:26,178 [JobControl] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 1
2015-02-02 12:58:26,178 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2015-02-02 12:58:26,197 [JobControl] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2015-02-02 12:58:26,237 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-02-02 12:58:26,547 [JobControl] INFO
org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job:
job_1418147347769_0194
2015-02-02 12:58:26,754 [JobControl] INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application
application_1418147347769_0194
2015-02-02 12:58:26,788 [JobControl] INFO org.apache.hadoop.mapreduce.Job -
The url to track the job:
http://server01.cpp:8088/proxy/application_1418147347769_0194/
2015-02-02 12:58:26,789 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_1418147347769_0194
2015-02-02 12:58:26,789 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Processing aliases Z
2015-02-02 12:58:26,789 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- detailed locations: M: Z[3,4] C: R:
2015-02-02 12:58:26,843 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2015-02-02 12:58:47,715 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 50% complete
2015-02-02 12:58:52,121 [main] INFO
org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
2015-02-02 12:58:52,179 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2015-02-02 12:58:52,181 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion      PigVersion          UserId  StartedAt            FinishedAt           Features
2.4.0.2.1.5.0-695  0.12.1.2.1.5.0-695  perko   2015-02-02 12:58:18  2015-02-02 12:58:52  UNKNOWN
Success!
Job Stats (time in seconds):
JobId                   Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReducetime  Alias  Feature   Outputs
job_1418147347769_0194  1     0        13          13          13          13             n/a            n/a            n/a            n/a               Z      MAP_ONLY  hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY,
Input(s):
Successfully read 10000 records (2649054 bytes) from:
"/data/incoming/sample.csv"
Output(s):
Successfully stored 10000 records in:
"hbase://TEST/FILE_NAME,REC_NUM,EPOCH_TIME,TIMET,SITE,PROTO,SADDR,DADDR,SPORT,DPORT,MF,CF,DUR,SDATA,DDATA,SBYTE,DBYTE,SPKT,DPKT,SIOPT,DIOPT,STOPT,DTOPT,SFLAGS,DFLAGS,FLAGS,SFSEQ,DFSEQ,SLSEQ,DLSEQ,CATEGORY"
Counters:
Total records written : 10000
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1418147347769_0194
2015-02-02 12:58:52,282 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Success!
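
One note on the 10000-written / 9030-counted gap: Phoenix UPSERT overwrites any row that shares a primary key with an earlier row, so the table can end up with fewer rows than records written even when the full records are unique (as the Excel check found). A quick way to test that theory is to count distinct primary-key tuples in the input CSV rather than distinct whole rows. The sketch below is hypothetical: the key column indices (file_name, rec_num) are an assumption, since the table's PRIMARY KEY definition isn't shown in this thread -- substitute the actual key columns from your CREATE TABLE.

```python
# Hedged sketch: count distinct primary-key tuples in the input CSV.
# If distinct keys < total records, Phoenix upserts would silently
# collapse the duplicates, explaining SELECT COUNT(*) < records written.
import csv

# ASSUMPTION: the table's primary key is (file_name, rec_num),
# i.e. CSV columns 0 and 1. Adjust to match the real PRIMARY KEY.
KEY_INDICES = (0, 1)

def count_distinct_keys(rows, key_indices=KEY_INDICES):
    """Return (total_rows, distinct_key_count) for the given rows."""
    keys = {tuple(row[i] for i in key_indices) for row in rows}
    return len(rows), len(keys)

if __name__ == "__main__":
    with open("/data/incoming/sample.csv", newline="") as f:
        total, distinct = count_distinct_keys(list(csv.reader(f)))
    print(f"{total} records, {distinct} distinct keys; "
          f"{total - distinct} would be overwritten on upsert")
```

If this reports roughly 9030 distinct keys for the 10K file, the "missing" rows were overwritten by later upserts with the same key, not dropped by the load.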