It might not be related only to the memory issue. The memory issue is also there, as you mentioned; I have seen that one too. The fine-grained mode issue is mainly that Spark thinks it got two different block managers for the same ID, whereas if I search for that ID on the Mesos slaves, it exists on only one slave, not on several of them. This might be due to the size of the ID, as Spark prints the error as

14/09/16 08:04:29 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140822-112818-711206558-5050-25951-0

whereas in the Mesos slave logs I see

I0915 20:55:18.293903 31434 containerizer.cpp:392] Starting container '3aab2237-d32f-470d-a206-7bada454ad3f' for executor '20140822-112818-711206558-5050-25951-0' of framework '20140822-112818-711206558-5050-25951-0053'
I0915 20:53:28.039218 31437 containerizer.cpp:392] Starting container 'fe4b344f-16c9-484a-9c2f-92bd92b43f6d' for executor '20140822-112818-711206558-5050-25951-0' of framework '20140822-112818-711206558-5050-25951-0050'

So the last 3 digits of the ID are missing in Spark, whereas they differ in the Mesos slave logs.

- Gurvinder

On 09/15/2014 11:13 PM, Brenden Matthews wrote:
> I started hitting a similar problem, and it seems to be related to
> memory overhead and tasks getting OOM killed. I filed a ticket
> here:
>
> https://issues.apache.org/jira/browse/SPARK-3535
>
> On Wed, Jul 16, 2014 at 5:27 AM, Ray Rodriguez
> <rayrod2...@gmail.com> wrote:
>
> I'll set some time aside today to gather and post some logs and
> details about this issue from our end.
>
> On Wed, Jul 16, 2014 at 2:05 AM, Vinod Kone <vinodk...@gmail.com> wrote:
>
> On Tue, Jul 15, 2014 at 11:02 PM, Vinod Kone <vi...@twitter.com> wrote:
>
> On Fri, Jul 4, 2014 at 2:05 AM, Gurvinder Singh
> <gurvinder.si...@uninett.no> wrote:
>
> ERROR storage.BlockManagerMasterActor: Got two different block
> manager registrations on 201407031041-1227224054-5050-24004-0
>
> Googling about it seems that mesos is starting slaves at the same
> time and giving them the same id. So may bug in mesos ?
>
> Has this issue been resolved? We need more information to triage
> this. Maybe some logs that show the lifecycle of the duplicate
> instances?
>
> @vinodkone
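
For reference, the ID comparison described in the message above can be sketched in Python. This is only a rough sketch and not anything taken from Spark or Mesos: the function name compare_ids and both regular expressions are hypothetical, written against the exact log lines quoted in this thread, and may not match other log formats.

```python
import re

# Log lines copied verbatim from the message in this thread.
SPARK_ERROR = (
    "14/09/16 08:04:29 ERROR BlockManagerMasterActor: Got two different "
    "block manager registrations on 20140822-112818-711206558-5050-25951-0"
)
MESOS_LINES = [
    "I0915 20:55:18.293903 31434 containerizer.cpp:392] Starting container "
    "'3aab2237-d32f-470d-a206-7bada454ad3f' for executor "
    "'20140822-112818-711206558-5050-25951-0' of framework "
    "'20140822-112818-711206558-5050-25951-0053'",
    "I0915 20:53:28.039218 31437 containerizer.cpp:392] Starting container "
    "'fe4b344f-16c9-484a-9c2f-92bd92b43f6d' for executor "
    "'20140822-112818-711206558-5050-25951-0' of framework "
    "'20140822-112818-711206558-5050-25951-0050'",
]

# Hypothetical patterns, assumed from the two log formats shown above.
SPARK_RE = re.compile(r"registrations on (\S+)")
MESOS_RE = re.compile(r"for executor '([^']+)' of framework '([^']+)'")


def compare_ids(spark_error, mesos_lines):
    """Return the ID from the Spark error and, for each Mesos line,
    (executor_id, framework_id, extra), where extra is whatever the
    framework ID has beyond the Spark ID (None if it is not a prefix)."""
    spark_id = SPARK_RE.search(spark_error).group(1)
    rows = []
    for line in mesos_lines:
        match = MESOS_RE.search(line)
        if match is None:
            continue  # not a "Starting container" line
        executor_id, framework_id = match.groups()
        extra = None
        if framework_id.startswith(spark_id):
            extra = framework_id[len(spark_id):]
        rows.append((executor_id, framework_id, extra))
    return spark_id, rows


spark_id, rows = compare_ids(SPARK_ERROR, MESOS_LINES)
print("Spark ID:", spark_id)
for executor_id, framework_id, extra in rows:
    print(executor_id == spark_id, framework_id, extra)
```

On these lines, the ID in the Spark error is identical to the executor ID in both Mesos entries, and the two framework IDs differ from it only in their trailing digits.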