Sweet, you were right! libevent 1.4 was already installed on the machine
when I installed libevent 1.3e prior to installing thrift/scribe. Removing
libevent 1.3e and re-installing thrift/scribe from scratch fixed my issue.
Thanks David.
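
For anyone who finds this thread later, here is a rough sketch of how I
understand the failure. Everything in it is hypothetical: the two struct
layouts, the library_init_event function, and the assumption that appState_
sits directly after event_ are made up for illustration and are not the real
libevent or Thrift definitions. The idea is that the code was compiled against
one version's notion of the event struct while the library actually loaded
assumed a larger one, so a library call could write past the end of event_
and into the member next to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical layouts, NOT the real libevent structs.
// What the application's headers declared at compile time:
struct EventAsCompiled { int fd; short events; void* arg; };
// What the library loaded at run time believes the struct looks like:
struct EventAsLinked { int fd; short events; void* arg; int flags; int priority; };

enum AppState { APP_INIT = 0 };

// The application's view of a connection (assumed member order).
struct Connection {
  EventAsCompiled event_;
  int appState_;
};

// The sketch relies on the library's extra field landing exactly on appState_.
static_assert(offsetof(EventAsLinked, flags) == offsetof(Connection, appState_),
              "extra library field must overlap appState_ for this sketch");

// Stand-in for a library call such as event_base_set(): it fills in the
// struct using the library's own, larger layout.
void library_init_event(unsigned char* storage) {
  EventAsLinked ev = {};
  ev.flags = 128;  // arbitrary value planted for the demonstration
  std::memcpy(storage, &ev, sizeof ev);
}

// Returns the value of appState_ after the "library" has touched event_.
int simulate_mismatch() {
  // Raw storage, oversized so the larger write stays inside the buffer.
  unsigned char raw[sizeof(Connection) + sizeof(EventAsLinked)] = {};

  int state = APP_INIT;
  std::memcpy(raw + offsetof(Connection, appState_), &state, sizeof state);

  library_init_event(raw + offsetof(Connection, event_));

  std::memcpy(&state, raw + offsetof(Connection, appState_), sizeof state);
  return state;
}
```

Calling simulate_mismatch() returns 128 (the value planted in flags) instead
of APP_INIT, which is the same kind of jump the gdb session showed right
after event_base_set.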


On Wed, Jan 20, 2010 at 6:07 PM, David Reiss <[email protected]> wrote:

> My best guess is that you are not using a consistent version of
> libevent when compiling, linking, and running.
>
>
> --David (mobile)
>
> On Jan 20, 2010, at 5:28 PM, "Nathan Marz" <[email protected]>
> wrote:
>
> > OK... jumped into gdb and here's what I found:
> >
> > (gdb) s
> > 483      event_set(&event_, socket_, eventFlags_, TConnection::eventHandler, this);
> > (gdb) p appState_
> > $8 = apache::thrift::server::APP_INIT
> > (gdb) s
> > 484      event_base_set(server_->getEventBase(), &event_);
> > (gdb) p appState_
> > $9 = 128
> > (gdb) s
> > 487      if (event_add(&event_, 0) == -1) {
> > (gdb) p appState_
> > $10 = 128
> > (gdb) s
> > 490    }
> > (gdb) p appState_
> > $11 = 130
> >
> > It appears to be getting corrupted twice, once during "event_base_set"
> > and once during "event_add". Any ideas?
> >
> >
> >
> > On Wed, Jan 20, 2010 at 4:03 PM, David Reiss <[email protected]>
> > wrote:
> >
> >> So you're saying that this happens on the first received message?
> >> Should be relatively easy to debug.
> >>
> >> 1/ Make a debug build of Thrift and Scribe.
> >> 2/ Put a breakpoint in the constructor of TConnection.
> >> 3/ When the breakpoint hits, get the address of the appState_.
> >> 4/ Put a watchpoint on the contents of that address.  If possible,
> >>  make it conditional on the new value not being one of the valid
> >>  enum values.
> >> 5/ Continue.
> >> 6/ When the watchpoint triggers (and is not a valid enum), do a backtrace
> >>  to find out how it was corrupted.  Usually it is a memory error.
> >>
> >> If it is a memory error, it might be more efficient to just run it under
> >> valgrind.
> >>
> >> --David
> >>
> >> Nathan Marz wrote:
> >>> Could use some help on this one. I'm running into this error when using
> >>> scribe, and I traced the error back to TNonblockingServer. Here's the tail
> >>> of the log:
> >>>
> >>> Thrift: Wed Jan 20 23:11:06 2010 libevent 1.3e method epoll
> >>> Thrift: Wed Jan 20 23:14:08 2010 Totally Fucked. Application State 130
> >>> scribed: src/server/TNonblockingServer.cpp:430: void apache::thrift::server::TConnection::transition(): Assertion `0' failed.
> >>>
> >>> In the code, this message is printed whenever a switch statement doesn't
> >>> match any of the cases.
> >>>
> >>> I have scribe set up with a "master" log server that aggregates all
> >>> logs, and the "client" servers simply forward messages to the master.
> >>> The clients work fine; it's the master that crashes whenever it receives
> >>> a message. In case it's helpful, here are my scribe confs for master/client:
> >>>
> >>> master:
> >>>
> >>> port=1464
> >>>
> >>>
> >>> <store>
> >>> category=default
> >>> type=file
> >>> rotate_period=hourly
> >>> add_newlines=1
> >>> create_symlink=yes
> >>> file_path=/vol/scribe
> >>> base_filename=thisisoverwritten
> >>> fs_type=std
> >>> </store>
> >>>
> >>> client:
> >>>
> >>> port=1464
> >>>
> >>>
> >>> <store>
> >>> category=default
> >>> type=buffer
> >>>
> >>> target_write_size=20480
> >>> max_write_interval=1
> >>> buffer_send_rate=1
> >>> retry_interval=120
> >>> retry_interval_range=60
> >>>
> >>> <primary>
> >>> type=network
> >>> remote_host=XXX
> >>> remote_port=1464
> >>> </primary>
> >>>
> >>> <secondary>
> >>> type=file
> >>> fs_type=std
> >>> file_path=/mnt/scribe
> >>> base_filename=thisisoverwritten
> >>> max_size=300000000
> >>> </secondary>
> >>> </store>
> >>>
> >>>
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Nathan Marz
> > Twitter: @nathanmarz
> > http://nathanmarz.com
>



-- 
Nathan Marz
Twitter: @nathanmarz
http://nathanmarz.com
