Re: RocksDB segfaults

2017-03-28 Thread Florian König
Thank you, Stephan, for spotting the problem. In hindsight it’s obvious that this can never work. I’ll figure something out :) > On 24.03.2017 at 10:28, Stephan Ewen wrote: > > The code will not work properly, sorry. > > The value returned by the state is whatever is stored under the key for wh

Re: RocksDB segfaults

2017-03-24 Thread Stephan Ewen
The code will not work properly, sorry. The value returned by the state is whatever is stored under the key for which the function was called the last time. In addition, the unsynchronized access is most likely causing the RocksDB fault. TL;DR - The "ValueState" / "ListState" / etc. in Flink are n
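The key-scoping behavior described in this message can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions: it has no Flink dependency, and `ToyKeyedValueState`, `setCurrentKey`, `processElement` and `readOutsideProcessing` are hypothetical names made up for illustration, not Flink classes or methods. It only shows why a read that happens outside element processing sees the value of whichever key was handled last.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedStateSketch {

    /** Hypothetical stand-in for a keyed value state; NOT a Flink class. */
    static final class ToyKeyedValueState<K, V> {
        private final Map<K, V> valuesByKey = new HashMap<>();
        private K currentKey;

        // In this toy model, the "runtime" sets the key before each function call.
        void setCurrentKey(K key) {
            this.currentKey = key;
        }

        // Returns the value stored under whatever key was set most recently.
        V value() {
            return valuesByKey.get(currentKey);
        }

        void update(V value) {
            valuesByKey.put(currentKey, value);
        }
    }

    /** Simulates processing one element: the key is set, then the state is updated. */
    static void processElement(ToyKeyedValueState<String, Integer> state, String key, int value) {
        state.setCurrentKey(key);
        state.update(value);
    }

    /** Reads the state outside of element processing, as a background thread might. */
    static Integer readOutsideProcessing(ToyKeyedValueState<String, Integer> state) {
        return state.value();
    }

    public static void main(String[] args) {
        ToyKeyedValueState<String, Integer> state = new ToyKeyedValueState<>();
        processElement(state, "a", 1);
        processElement(state, "b", 2);
        // The read is scoped to key "b", the last key processed, so this prints 2.
        System.out.println(readOutsideProcessing(state));
    }
}
```

The toy deliberately leaves out threading; the unsynchronized access mentioned in the message is a separate problem on top of the key scoping shown here.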

Re: RocksDB segfaults

2017-03-24 Thread Florian König
Hi, @Robert: I have uploaded all the log files that I could get my hands on to https://www.dropbox.com/sh/l35q6979hy7mue7/AAAe1gABW59eQt6jGxA3pAYaa?dl=0. I tried to remove all unrelated messages logged by the job itself. In flink-root-jobmanager-0-micardo-dev.log I kept the Flink startup messag

Re: RocksDB segfaults

2017-03-23 Thread Robert Metzger
Florian, can you post the log of the TaskManager where the segfault happened? On Wed, Mar 22, 2017 at 6:19 PM, Stefan Richter wrote: > Hi, > > for the first checkpoint, from the stacktrace I assume that the backend is > not accessed as part of processing an element, but by another thread. Is >

Re: RocksDB segfaults

2017-03-22 Thread Stefan Richter
Hi, for the first checkpoint, from the stacktrace I assume that the backend is not accessed as part of processing an element, but by another thread. Is that correct? RocksDB requires accessing threads to hold the task’s checkpointing lock, otherwise they might call methods on an instance that i
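The locking requirement mentioned in this message can be sketched generically in plain Java. This is a minimal illustration under stated assumptions, not Flink or RocksDB code: `checkpointLock` is a hypothetical stand-in for the task's checkpointing lock, and the `counter` field stands in for state held by the backend. The point is only that every thread, including any extra thread started by user code, takes the same lock before touching the shared state.

```java
public class CheckpointLockSketch {

    // Hypothetical stand-in for the task's checkpointing lock.
    private final Object checkpointLock = new Object();

    // Stands in for state that lives in the backend.
    private long counter;

    /** Every writer, whatever thread it runs on, takes the lock first. */
    void updateFromAnyThread() {
        synchronized (checkpointLock) {
            counter++;
        }
    }

    /** A "snapshot" also takes the lock, so it never observes a half-finished update. */
    long snapshot() {
        synchronized (checkpointLock) {
            return counter;
        }
    }

    /** Runs two writer threads against one instance and returns the final count. */
    static long runTwoWriters(int updatesPerThread) {
        CheckpointLockSketch task = new CheckpointLockSketch();
        Runnable writer = () -> {
            for (int i = 0; i < updatesPerThread; i++) {
                task.updateFromAnyThread();
            }
        };
        Thread first = new Thread(writer);
        Thread second = new Thread(writer);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return task.snapshot();
    }

    public static void main(String[] args) {
        // With the lock held around each update, no increments are lost.
        System.out.println(runTwoWriters(10_000));
    }
}
```

Without the `synchronized` blocks the two writers could interleave and lose updates; with native code such as RocksDB, the consequence of unsynchronized access can be a crash rather than a wrong number.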

Re: RocksDB segfaults

2017-03-22 Thread Florian König
Hi Stephan, you are right, the second stack trace is indeed from a run of Flink 1.1.4. Sorry, my bad. That leaves us with the first trace of a segfault for which I can guarantee that it brought down a 1.2.0 instance. Unfortunately I cannot reproduce the problem. It has happened twice so far, b

Re: RocksDB segfaults

2017-03-22 Thread Stephan Ewen
Hi! It looks like you are running the RocksDB state backend 1.1 (is an old version still packaged into your JAR file?). This line indicates that: org.apache.flink.contrib.streaming.state.RocksDBStateBackend.performSemiAsyncSnapshot (the method does not exist in 1.2 any more). Can you try and run

RocksDB segfaults

2017-03-22 Thread Florian König
Hi, I have experienced two crashes of Flink 1.2 by SIGSEGV, caused by something in RocksDB. What is the preferred way to report them? All I have at the moment are two hs_err_pid12345.log files. They are over 4000 lines long each. Is there anything significant that I should extract to help you gu