Hello Aaron,
  I got the following log from the server (sorry for the delay):

job_201304231203_0004
        attempt_201304231203_0004_m_000501_0
        
        2013-04-23 16:09:14,196 INFO org.apache.hadoop.util.NativeCodeLoader: 
Loaded the native-hadoop library
2013-04-23 16:09:14,438 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/pigContext
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/pigContext
2013-04-23 16:09:14,453 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/dk
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/dk
2013-04-23 16:09:14,456 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/META-INF
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/META-INF
2013-04-23 16:09:14,459 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/org
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/org
2013-04-23 16:09:14,469 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/com
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/com
2013-04-23 16:09:14,471 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/.job.jar.crc
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/.job.jar.crc
2013-04-23 16:09:14,474 INFO 
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/jars/job.jar
 <- 
/egov/data/hadoop/mapred/local/taskTracker/cassandra/jobcache/job_201304231203_0004/attempt_201304231203_0004_m_000501_0/work/job.jar
2013-04-23 16:09:17,329 INFO org.apache.hadoop.util.ProcessTree: setsid exited 
with exit code 0
2013-04-23 16:09:17,387 INFO org.apache.hadoop.mapred.Task:  Using 
ResourceCalculatorPlugin : 
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@256ef705
2013-04-23 16:09:17,838 INFO 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: 
Current split being processed ColumnFamilySplit((9197470410121435301, '-1] 
@[p00nosql02.00, p00nosql01.00])
2013-04-23 16:09:18,088 INFO org.apache.pig.data.SchemaTupleBackend: Key 
[pig.schematuple] was not set... will not generate code.
2013-04-23 16:09:19,784 INFO 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map: 
Aliases being processed per job phase (AliasName[line,offset]): M: 
data[12,7],null[-1,-1],filtered[14,11],null[-1,-1],c1[23,5],null[-1,-1],updated[111,10]
 C:  R: 
2013-04-23 17:35:11,199 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-04-23 17:35:11,384 INFO org.apache.hadoop.io.nativeio.NativeIO: 
Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-04-23 17:35:11,385 INFO org.apache.hadoop.io.nativeio.NativeIO: Got 
UserName cassandra for UID 500 from the native implementation
2013-04-23 17:35:11,417 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: TimedOutException()
        at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
        at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
        at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
        at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
        at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: TimedOutException()
        at 
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12932)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
        at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
        at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
        ... 17 more
2013-04-23 17:35:11,427 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
for the task

These two tasks hung for a long time and then crashed with a timeout exception. The most 
interesting part is the following:
2013-04-23 16:09:17,838 INFO 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: 
Current split being processed ColumnFamilySplit((9197470410121435301, '-1] 
@[p00nosql02.00, p00nosql01.00])
Why is this split reading data from two nodes? We have a 6-node Cassandra cluster with 
co-located Hadoop slaves, so every task should get a local input split from its local 
Cassandra node - am I right?

-- 
Best regards
  Shamim A.

24.04.2013, 10:59, "Shamim" <sre...@yandex.ru>:
> Hello Aaron,
> We have built our new cluster from scratch with version 1.2 and the
> Murmur3 partitioner. We are not using vnodes at all.
> Actually the log is clean and nothing serious; we are investigating the
> logs now and will post soon if we find anything suspicious.
>
>>> Our cluster is evenly partitioned (Murmur3Partitioner)
>
> Murmur3Partitioner is only available in 1.2 and changing partitioners is
> not supported. Did you change from Random Partitioner under 1.1?
>
> Are you using virtual nodes in your 1.2 cluster?
>
>>> We have roughly 97 million rows in our cluster. Why are we getting the
>>> above behavior? Do you have any suggestion or clue to troubleshoot this
>>> issue?
>
> Can you make some of the logs from the tasks available?
>
> Cheers
>
> ---------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/04/2013, at 5:50 AM, Shamim wrote:
>> We are using Hadoop 1.0.3 and Pig 0.11.1.
>>
>> --
>> Best regards
>> Shamim A.
>>
>> 22.04.2013, 21:48, "Shamim":
>>> Hello all,
>>> We recently upgraded our cluster (6 nodes) from Cassandra 1.1.6 to
>>> 1.2.1. Our cluster is evenly partitioned (Murmur3Partitioner). We use
>>> Pig to parse and compute aggregate data.
>>>
>>> When we submit a job through Pig, what I consistently see is that,
>>> while most tasks have 20-25k rows assigned each (Map input records),
>>> only 2 of them (always 2) get more than 2 million rows. These 2 tasks
>>> always reach 100% and then hang for a long time. Most of the time we
>>> also get killed tasks (2%) with a TimeoutException.
>>>
>>> We increased rpc_timeout to 60000 and set
>>> cassandra.input.split.size=1024, but nothing helped.
>>>
>>> We have roughly 97 million rows in our cluster. Why are we getting the
>>> above behavior? Do you have any suggestion or clue to troubleshoot
>>> this issue? Any help would be highly appreciated. Thanks in advance.
>>>
>>> --
>>> Best regards
>>> Shamim A.
> --
> Best regards
>   Shamim A.
