Bug Report
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.1 LTS (running in GKE)
- TensorFlow Serving installed from (source or binary): binary (from the tensorflow/serving:2.11.0 Docker image)
- TensorFlow Serving version: 2.11.0
Describe the problem
When running TensorFlow Serving with a custom memory allocator (tcmalloc), the server crashes with a segmentation fault after a short time in the event loop, generally less than 1 minute as long as there is load.
Similar issues (std::bad_alloc) were present in TensorFlow Serving 2.9 and later when using tcmalloc.
The issue is not present in 2.8.3.
Exact Steps to Reproduce
No reproduction steps at this time.
Source code / logs
Here are backtraces generated when the segmentation fault occurs.
#0 0x00007fbea81dbe93 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#1 0x00007fbea81dc1fe in tcmalloc::ThreadCache::Scavenge() () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#2 0x000055a1440aa6a2 in dnnl_primitive_desc_destroy ()
#3 0x000055a13b8db316 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#4 0x000055a13ec9658e in std::_Sp_counted_ptr<dnnl::inner_product_forward::primitive_desc*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() ()
#5 0x000055a13b8db316 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#6 0x000055a1401d3dbb in tensorflow::MklDnnMatMulFwdPrimitive<float, float, float, float, float>::Setup(tensorflow::MklDnnMatMulFwdParams const&) ()
#7 0x000055a1401d5b5a in tensorflow::MklDnnMatMulFwdPrimitiveFactory<float, float, float, float, float>::Get(tensorflow::MklDnnMatMulFwdParams const&, bool) ()
#8 0x000055a1401d6e18 in tensorflow::MklFusedMatMulOp<Eigen::ThreadPoolDevice, float, true>::Compute(tensorflow::OpKernelContext*) ()
#9 0x000055a142530ac8 in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
#10 0x000055a142585820 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessInline(tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*, long) ()
#11 0x000055a14258698c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) ()
#12 0x000055a148210621 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) ()
#13 0x000055a14820e573 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#14 0x000055a14803e8a5 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) ()
#15 0x00007fbea7cedb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#16 0x00007fbea7d7fa00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0 0x000055640ec953aa in tensorflow::MklDnnMatMulFwdPrimitive<float, float, float, float, float>::Execute(float const*, float const*, float const*, float*, void*, std::shared_ptr<dnnl::stream>) ()
#1 0x000055640ec9b7f5 in tensorflow::MklFusedMatMulOp<Eigen::ThreadPoolDevice, float, true>::Compute(tensorflow::OpKernelContext*) ()
#2 0x0000556410ff4ac8 in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
#3 0x0000556411049820 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessInline(tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*, long) ()
#4 0x000055641104a98c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) ()
#5 0x0000556416cd4621 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) ()
#6 0x0000556416cd2573 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#7 0x0000556416b028a5 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) ()
#8 0x00007f04b9a8eb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9 0x00007f04b9b20a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0 0x00007f2c6c1d0eeb in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#1 0x00007f2c6c1d11fe in tcmalloc::ThreadCache::Scavenge() () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#2 0x000055e25aac33f6 in Eigen::internal::TensorBlockScratchAllocator<Eigen::ThreadPoolDevice>::~TensorBlockScratchAllocator() ()
#3 0x000055e25d2f1734 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorBroadcastingOp<Eigen::IndexList<long, Eigen::type2index<1l> > const, Eigen::TensorReshapingOp<Eigen::IndexList<Eigen::type2index<1l>, long> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::ThreadPoolDevice, true, (Eigen::internal::TiledEvaluation)1>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 2, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_product_op<float, float>, Eigen::TensorBroadcastingOp<Eigen::IndexList<long, Eigen::type2index<1l> > const, Eigen::TensorReshapingOp<Eigen::IndexList<Eigen::type2index<1l>, long> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const> const> const, Eigen::TensorMap<Eigen::Tensor<float const, 2, 1, long>, 16, Eigen::MakePointer> const> const> const&, Eigen::ThreadPoolDevice const&) ()
#4 0x000055e25d318523 in tensorflow::BinaryOp<Eigen::ThreadPoolDevice, tensorflow::functor::mul<float> >::Compute(tensorflow::OpKernelContext*) ()
#5 0x000055e260ccaac8 in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
#6 0x000055e260d1f820 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessInline(tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*, long) ()
#7 0x000055e260d2098c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) ()
#8 0x000055e2669aa621 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) ()
#9 0x000055e2669a8573 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#10 0x000055e2667d88a5 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) ()
#11 0x00007f2c6bce2b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007f2c6bd74a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0 0x00007f40588a4ebb in tc_memalign () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#1 0x00007f40588a4fda in tc_posix_memalign () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#2 0x0000562df89f5cf4 in tsl::port::AlignedMalloc(unsigned long, int) ()
#3 0x0000562df856d688 in tensorflow::Tensor::Tensor(tsl::Allocator*, tensorflow::DataType, tensorflow::TensorShape const&) ()
#4 0x0000562dee05fcf8 in tensorflow::Tensor& std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >::emplace_back<tensorflow::DataType, tensorflow::TensorShape>(tensorflow::DataType&&, tensorflow::TensorShape&&) ()
#5 0x0000562df84e1972 in tensorflow::example::FastParseExample(tensorflow::example::FastParseExampleConfig const&, absl::lts_20220623::Span<tsl::tstring const>, absl::lts_20220623::Span<tsl::tstring const>, tsl::thread::ThreadPool*, tensorflow::example::Result*) ()
#6 0x0000562dedc562a6 in tensorflow::ParseExampleOp::Compute(tensorflow::OpKernelContext*) ()
#7 0x0000562df2d05ac8 in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
#8 0x0000562df2d5a820 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessInline(tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*, long) ()
#9 0x0000562df2d5b98c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) ()
#10 0x0000562df89e5621 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) ()
#11 0x0000562df89e3573 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#12 0x0000562df88138a5 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) ()
#13 0x00007f40583afb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#14 0x00007f4058441a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#0 0x00007fbf95329717 in tc_newarray () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#1 0x0000562785fcf579 in std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > >& std::vector<std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > >, std::allocator<std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > > > >::emplace_back<std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > > >(std::unordered_map<int, dnnl::memory, std::hash<int>, std::equal_to<int>, std::allocator<std::pair<int const, dnnl::memory> > >&&) ()
#2 0x0000562787a31bb6 in tensorflow::MklDnnMatMulFwdPrimitive<float, float, float, float, float>::Setup(tensorflow::MklDnnMatMulFwdParams const&) ()
#3 0x0000562787a33b5a in tensorflow::MklDnnMatMulFwdPrimitiveFactory<float, float, float, float, float>::Get(tensorflow::MklDnnMatMulFwdParams const&, bool) ()
#4 0x0000562787a34e18 in tensorflow::MklFusedMatMulOp<Eigen::ThreadPoolDevice, float, true>::Compute(tensorflow::OpKernelContext*) ()
#5 0x0000562789d8eac8 in tensorflow::ThreadPoolDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*) ()
#6 0x0000562789de3820 in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::ProcessInline(tensorflow::SimplePropagatorState::TaggedNodeReadyQueue*, long) ()
#7 0x0000562789de498c in tensorflow::(anonymous namespace)::ExecutorState<tensorflow::SimplePropagatorState>::Process(tensorflow::SimplePropagatorState::TaggedNode, long) ()
#8 0x000056278fa6e621 in Eigen::ThreadPoolTempl<tsl::thread::EigenEnvironment>::WorkerLoop(int) ()
#9 0x000056278fa6c573 in std::_Function_handler<void (), tsl::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#10 0x000056278f89c8a5 in tsl::(anonymous namespace)::PThread::ThreadFn(void*) ()
#11 0x00007fbf94e34b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#12 0x00007fbf94ec6a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
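The gdb output above contains five backtraces run together, each starting at a frame #0. In four of them the crashing frame is inside libtcmalloc_minimal.so.4; in the remaining one it is MklDnnMatMulFwdPrimitive::Execute. For anyone triaging, here is a minimal sketch of how the pasted text could be split into individual traces and checked for where the top frame lands. The helper names (`split_backtraces`, `in_tcmalloc`) are hypothetical, and the sample is a shortened excerpt of the frames above with argument lists abbreviated to `(...)`; it is not part of TensorFlow Serving or gdb.

```python
import re

# Matches a gdb frame line such as "#3 0x0000... in some::function(args) () from lib".
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.*)$")


def split_backtraces(text):
    """Split concatenated gdb output into a list of traces (lists of frame strings)."""
    traces = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a frame line
        index, rest = int(match.group(1)), match.group(2)
        if index == 0 or not traces:
            traces.append([])  # frame #0 starts a new backtrace
        traces[-1].append(rest)
    return traces


def in_tcmalloc(frame):
    """Heuristic (assumption): the frame was resolved inside the tcmalloc shared library."""
    return "libtcmalloc" in frame


# Shortened excerpt of the first two backtraces; "(...)" replaces long argument lists.
sample = """\
#0 0x00007fbea81dbe93 in tcmalloc::ThreadCache::ReleaseToCentralCache(...) () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#1 0x00007fbea81dc1fe in tcmalloc::ThreadCache::Scavenge() () from /usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4
#2 0x000055a1440aa6a2 in dnnl_primitive_desc_destroy ()
#0 0x000055640ec953aa in tensorflow::MklDnnMatMulFwdPrimitive<float, float, float, float, float>::Execute(...) ()
#1 0x000055640ec9b7f5 in tensorflow::MklFusedMatMulOp<Eigen::ThreadPoolDevice, float, true>::Compute(tensorflow::OpKernelContext*) ()
"""

traces = split_backtraces(sample)
for number, trace in enumerate(traces, start=1):
    print(f"trace {number}: {len(trace)} frames, top frame in tcmalloc: {in_tcmalloc(trace[0])}")
```

Running the same function over the full log would give one entry per crash, which makes it easier to compare which ops (for example MklFusedMatMulOp versus ParseExampleOp) were executing when the allocator faulted.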