[ https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843632#comment-15843632 ]
James Peach edited comment on MESOS-7017 at 1/27/17 11:10 PM:
--------------------------------------------------------------

Here's the partial stack trace:

{noformat}
#2 0x00007fb830734696 in google::DumpStackTraceAndExit () at src/utilities.cc:147
#3 0x00007fb83072c08d in google::LogMessage::Fail () at src/logging.cc:1458
#4 0x00007fb83072de1d in google::LogMessage::SendToLog (this=Unhandled dwarf expression opcode 0xf3 ) at src/logging.cc:1412
#5 0x00007fb83072bc12 in google::LogMessage::Flush (this=0x7fb8227f3890) at src/logging.cc:1281
#6 0x00007fb83072e7f9 in google::LogMessageFatal::~LogMessageFatal (this=Unhandled dwarf expression opcode 0xf3 ) at src/logging.cc:1984
#7 0x00007fb82fb35113 in evolve<mesos::v1::master::Response> (response=...) at ../../src/internal/evolve.cpp:63
#8 mesos::internal::evolve (response=...) at ../../src/internal/evolve.cpp:218
#9 0x00007fb82fba8dd6 in mesos::internal::master::Master::Http::<lambda(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&)>::operator()(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> > &) const (__closure=0x7fb720c7a940, approvers=Unhandled dwarf expression opcode 0xf3 ) at ../../src/master/http.cpp:3772
#10 0x00007fb82fba9068 in operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at ../../3rdparty/libprocess/include/process/deferred.hpp:225
#11 std::_Function_handler<process::Future<process::http::Response>(), process::_Deferred<G>::operator std::function<R(P0)>() const::<lambda(P0)> [with R = process::Future<process::http::Response>; P0 = const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&; F = mesos::internal::master::Master::Http::getTasks(const mesos::master::Call&, const Option<std::basic_string<char> >&, mesos::ContentType) const::<lambda(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&)>]::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#12 0x00007fb82faf73b3 in operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#13 operator() (__functor=Unhandled dwarf expression opcode 0xf3 ) at ../../3rdparty/libprocess/include/process/dispatch.hpp:112
#14 std::_Function_handler<void(process::ProcessBase*), process::internal::Dispatch<process::Future<T> >::operator()(const process::UPID&, F&&) [with F = std::function<process::Future<process::http::Response>()>&; R = process::http::Response]::<lambda(process::ProcessBase*)> >::_M_invoke(const std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf expression opcode 0xf3 ) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2039
{noformat}

So the proximate cause of the crash is that {{evolve}} does a bidirectional serialization. For large messages, this causes two large allocations even if it doesn't trigger the {{CHECK}}.

> HTTP API responses can crash the master.
> ----------------------------------------
>
>                 Key: MESOS-7017
>                 URL: https://issues.apache.org/jira/browse/MESOS-7017
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>            Reporter: James Peach
>            Priority: Critical
>
> The master can crash when generating large responses to small API requests.
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message
> was rejected because it was too big (more than 67108864 bytes). To increase
> the limit (or to disable these warnings), see
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed:
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while
> evolving from mesos.master.Response
> {noformat}
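The failure mode in this report can be illustrated outside Mesos. `evolve` converts a `mesos.master.Response` into a `mesos.v1.master.Response` by serializing one message and parsing the resulting bytes as the other. The C++ below is a minimal, self-contained sketch of that round-trip pattern and is not the actual `evolve.cpp` code. `FakeV0Response`, `FakeV1Response`, `serialize()`, `parsePartial()`, `evolveViaRoundTrip()` and `kParseLimitBytes` are all hypothetical stand-ins for the protobuf messages and the parser's size limit. The sketch assumes only the limit reported in the log: 67108864 bytes, which is 64 MiB. It shows that a successful conversion costs two large buffers, and that a serialized response over the limit fails to parse at all. In the master, that parse failure trips the {{CHECK}}.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the parse limit reported in the log:
// 67108864 bytes == 64 * 1024 * 1024 (64 MiB).
constexpr std::size_t kParseLimitBytes = 64u * 1024u * 1024u;

// Hypothetical "old" (v0) response; `payload` stands in for its encoded fields.
struct FakeV0Response {
  std::string payload;

  // Stand-in for serialization: returns a fresh copy of the bytes.
  std::string serialize() const { return payload; }
};

// Hypothetical "new" (v1) response.
struct FakeV1Response {
  std::string payload;

  // Stand-in for a size-limited parse: rejects input larger than `limit`,
  // otherwise copies the bytes into this message.
  bool parsePartial(const std::string& data, std::size_t limit) {
    if (data.size() > limit) {
      return false;
    }
    payload = data;
    return true;
  }
};

// The evolved message (if parsing succeeded) plus a count of the large
// buffers the round trip allocated.
struct EvolveResult {
  std::optional<FakeV1Response> response;
  int largeAllocations = 0;
};

// Sketch of an evolve-style conversion through a serialized intermediate:
// serialize the v0 message, then parse those bytes as v1.
EvolveResult evolveViaRoundTrip(const FakeV0Response& v0,
                                std::size_t limit = kParseLimitBytes) {
  EvolveResult result;

  const std::string data = v0.serialize();  // large allocation #1
  result.largeAllocations++;

  FakeV1Response v1;
  if (!v1.parsePartial(data, limit)) {
    // This is where the master's CHECK would abort; the sketch
    // reports the failure instead.
    return result;
  }
  result.largeAllocations++;  // large allocation #2 (copy inside parsePartial)
  result.response = std::move(v1);
  return result;
}
```

As a usage example, pass a small limit such as `evolveViaRoundTrip(v0, 1024)`. A payload under 1024 bytes then yields a response and two counted allocations. A 2048-byte payload yields no response after one allocation. Both allocations exist only because of the intermediate encoding. That cost is separate from whether the parse limit is hit.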