I captured a backtrace when the problem occurred. The problem appeared after the
training program had read the training set data and then started reading the
validation set data. A deadlock appears to occur when the first data file of the
validation set is read. The training set and the validation set are read through
different dataset objects.



#0  futex_wait_cancelable (private=<optimized out>, expected=0,
futex_word=0x7f7010115e58) at ../sysdeps/nptl/futex-internal.h:183

#1  __pthread_cond_wait_common (abstime=0x0, clockid=0,
mutex=0x7f7010115e08, cond=0x7f7010115e30) at pthread_cond_wait.c:508

#2  __pthread_cond_wait (cond=0x7f7010115e30, mutex=0x7f7010115e08) at
pthread_cond_wait.c:647

#3  0x00007f74d8bfbe30 in
std::condition_variable::wait(std::unique_lock<std::mutex>&) () from
/usr/lib/x86_64-linux-gnu/libstdc++.so.6

#4  0x00007f70380a4152 in
std::condition_variable::wait<arrow::ConcreteFutureImpl::DoWait()::{lambda()#1}>(std::unique_lock<std::mutex>&,
 arrow::ConcreteFutureImpl::DoWait()::{lambda()#1}) (this=0x7f7010115e30,
__lock=..., __p=...) at /usr/include/c++/9/condition_variable:101

#5  0x00007f70380a347c in arrow::ConcreteFutureImpl::DoWait
(this=0x7f7010115dc0) at external/arrow/cpp/src/arrow/util/future.cc:348

#6  0x00007f70380a0114 in arrow::FutureImpl::Wait (this=0x7f7010115dc0) at
external/arrow/cpp/src/arrow/util/future.cc:393

#7  0x00007f7037e678c6 in arrow::Future<std::shared_ptr<arrow::Buffer>
>::Wait (this=0x7f679fffd640) at
external/arrow/cpp/src/arrow/util/future.h:435

#8  0x00007f7037e64fec in arrow::Future<std::shared_ptr<arrow::Buffer>
>::result() const & (this=0x7f679fffd640) at
external/arrow/cpp/src/arrow/util/future.h:405

#9  0x00007f7037f7f4b2 in arrow::io::internal::ReadRangeCache::Impl::Read
(this=0x7f701011b340, range=...) at
external/arrow/cpp/src/arrow/io/caching.cc:206

#10 0x00007f7037f7ebfb in arrow::io::internal::ReadRangeCache::Read
(this=0x7f70101cc040, range=...) at
external/arrow/cpp/src/arrow/io/caching.cc:310

#11 0x00007f7037ace885 in parquet::SerializedRowGroup::GetColumnPageReader
(this=0x7f70101007c0, i=121) at
external/arrow/cpp/src/parquet/file_reader.cc:203

#12 0x00007f7037aca426 in parquet::RowGroupReader::GetColumnPageReader
(this=0x7f70101ab4b0, i=121) at
external/arrow/cpp/src/parquet/file_reader.cc:126

#13 0x00007f7037d9f631 in parquet::arrow::FileColumnIterator::NextChunk
(this=0x7f7010183f90) at
external/arrow/cpp/src/parquet/arrow/reader_internal.h:80

#14 0x00007f7037d8dccd in parquet::arrow::(anonymous
namespace)::LeafReader::NextRowGroup (this=0x7f70100d5300) at
external/arrow/cpp/src/parquet/arrow/reader.cc:502

#15 0x00007f7037d8d7bf in parquet::arrow::(anonymous
namespace)::LeafReader::LeafReader (this=0x7f70100d5300,
    ctx=std::shared_ptr<parquet::arrow::ReaderContext> (empty) =
{...}, field=std::shared_ptr<arrow::Field> (empty) = {...},
    input=std::unique_ptr<parquet::arrow::FileColumnIterator> =
{...}, leaf_info=...) at external/arrow/cpp/src/parquet/arrow/reader.cc:452

#16 0x00007f7037d8fd77 in parquet::arrow::(anonymous namespace)::GetReader
(field=..., arrow_field=std::shared_ptr<arrow::Field> (use count 3, weak
count 0) = {...},
    ctx=std::shared_ptr<parquet::arrow::ReaderContext> (use count
2, weak count 0) = {...}, out=0x7f679fffdd40) at
external/arrow/cpp/src/parquet/arrow/reader.cc:845

#17 0x00007f7037d914b9 in parquet::arrow::(anonymous namespace)::GetReader
(field=...,
    ctx=std::shared_ptr<parquet::arrow::ReaderContext> (use count
2, weak count 0) = {...}, out=0x7f679fffdd40) at
external/arrow/cpp/src/parquet/arrow/reader.cc:957

#18 0x00007f7037d8fe57 in parquet::arrow::(anonymous namespace)::GetReader
(field=..., arrow_field=std::shared_ptr<arrow::Field> (use count 2, weak
count 0) = {...},
    ctx=std::shared_ptr<parquet::arrow::ReaderContext> (use count
2, weak count 0) = {...}, out=0x7f679fffdfc0) at
external/arrow/cpp/src/parquet/arrow/reader.cc:852

#19 0x00007f7037d914b9 in parquet::arrow::(anonymous namespace)::GetReader
(field=...,
    ctx=std::shared_ptr<parquet::arrow::ReaderContext> (use count
2, weak count 0) = {...}, out=0x7f679fffdfc0) at
external/arrow/cpp/src/parquet/arrow/reader.cc:957

#20 0x00007f7037d8bbda in parquet::arrow::(anonymous
namespace)::FileReaderImpl::GetFieldReader (this=0x7f70101099a0, i=121,
    included_leaves=std::shared_ptr<std::unordered_set<int,
std::hash<int>, std::equal_to<int>, std::allocator<int> >> (use
count 4, weak count 0) = {...},
    row_groups=std::vector of length 2, capacity 2 = {...},
out=0x7f679fffdfc0) at external/arrow/cpp/src/parquet/arrow/reader.cc:212

#21 0x00007f7037d8be1f in parquet::arrow::(anonymous
namespace)::FileReaderImpl::GetFieldReaders (this=0x7f70101099a0,
    column_indices=std::vector of length 154, capacity 256 = {...},
row_groups=std::vector of length 2, capacity 2 = {...},
out=0x7f679fffe0d0,
    out_schema=0x7f679fffe0b0) at
external/arrow/cpp/src/parquet/arrow/reader.cc:230

#22 0x00007f7037d93d2e in parquet::arrow::(anonymous
namespace)::FileReaderImpl::DecodeRowGroups (this=0x7f70101099a0,
    self=std::shared_ptr<parquet::arrow::(anonymous
namespace)::FileReaderImpl> (empty) = {...}, row_groups=std::vector of
length 2, capacity 2 = {...},
    column_indices=std::vector of length 154, capacity 256 = {...},
cpu_executor=0x0) at external/arrow/cpp/src/parquet/arrow/reader.cc:1228

#23 0x00007f7037d93496 in parquet::arrow::(anonymous
namespace)::FileReaderImpl::ReadRowGroups (this=0x7f70101099a0,
    row_groups=std::vector of length 2, capacity 2 = {...},
column_indices=std::vector of length 154, capacity 256 = {...},
out=0x7f679fffe3c0)
    at external/arrow/cpp/src/parquet/arrow/reader.cc:1216

#24 0x00007f7037d8ba2d in parquet::arrow::(anonymous
namespace)::FileReaderImpl::ReadTable (this=0x7f70101099a0,
    indices=std::vector of length 154, capacity 256 = {...},
out=0x7f679fffe3c0) at external/arrow/cpp/src/parquet/arrow/reader.cc:199

#25 0x00007f70378c998c in
tensorflow::data::ArrowS3DatasetOp::Dataset::Iterator::ReadFile
(this=0x3a47b620, file_index=0, background=false)
    at tensorflow_io/core/kernels/arrow/arrow_dataset_ops.cc:1237

#26 0x00007f70378c8a3d in
tensorflow::data::ArrowS3DatasetOp::Dataset::Iterator::SetupStreamsLocked
(this=0x3a47b620, env=0x1e17cd0)
    at tensorflow_io/core/kernels/arrow/arrow_dataset_ops.cc:1110

#27 0x00007f70378d7758 in
tensorflow::data::ArrowDatasetBase::ArrowBaseIterator<tensorflow::data::ArrowS3DatasetOp::Dataset>::GetNextInternal
 (this=0x3a47b620,
    ctx=0x7f7008448c10, out_tensors=0x7f679fffeac0,
end_of_sequence=0x7f7010275ea8) at
tensorflow_io/core/kernels/arrow/arrow_dataset_ops.cc:110

#28 0x00007f73fcf42aa4 in
tensorflow::data::DatasetBaseIterator::GetNext(tensorflow::data::IteratorContext*,
 std::vector<tensorflow::Tensor, std::allocator<tensorflow::Tensor> >*,
bool*) () from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/../libtensorflow_framework.so.2

#29 0x00007f74174288f6 in
tensorflow::data::ParallelMapDatasetOp::Dataset::Iterator::CallFunction(std::shared_ptr<tensorflow::data::IteratorContext>
 const&,
std::shared_ptr<tensorflow::data::ParallelMapDatasetOp::Dataset::Iterator::InvocationResult>
 const&) ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so

#30 0x00007f741742e022 in
tensorflow::data::ParallelMapDatasetOp::Dataset::Iterator::RunnerThread(std::shared_ptr<tensorflow::data::IteratorContext>
 const&) ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so

#31 0x00007f7416003145 in tensorflow::data::(anonymous
namespace)::WorkQueueFunc(std::function<void ()> const&,
std::shared_ptr<tensorflow::Notification>) ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so

#32 0x00007f7416003f5d in std::_Function_handler<void (), std::_Bind<void
(*(std::function<void ()>,
std::shared_ptr<tensorflow::Notification>))(std::function<void ()>
const&, std::shared_ptr<tensorflow::Notification>)>
>::_M_invoke(std::_Any_data const&) ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/_pywrap_tensorflow_internal.so

#33 0x00007f73fd6eb791 in tensorflow::UnboundedWorkQueue::PooledThreadFunc() ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/../libtensorflow_framework.so.2

#34 0x00007f73fd6f25d8 in tensorflow::(anonymous
namespace)::PThread::ThreadFn(void*) ()
   from
/usr/local/lib/python3.8/dist-packages/tensorflow/python/../libtensorflow_framework.so.2

#35 0x00007f74e4cc6609 in start_thread (arg=<optimized out>) at
pthread_create.c:477

#36 0x00007f74e4e00133 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
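
The trace shows a thread blocked in an untimed wait on a future whose result is
expected from a background read. One way such a wait can hang forever is pool
starvation: the waiting task occupies the only worker that could run the task
it is waiting for. I have not confirmed that this is what happens here; it is
only one possible cause. The sketch below is a minimal illustration in plain
Python, not Arrow code. `read_range` and `run` are hypothetical names, and the
timeout is added only so that the hang shows up as a result instead of blocking.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def read_range(pool):
    # Hypothetical stand-in for a cached range read: the actual work is
    # submitted to the same pool that is running this function.
    inner = pool.submit(lambda: b"column-chunk-bytes")
    try:
        # A bounded wait turns a silent hang into something observable.
        return inner.result(timeout=0.5)
    except TimeoutError:
        return None


def run(max_workers):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outer = pool.submit(read_range, pool)
        return outer.result(timeout=5)


starved = run(1)   # the single worker waits on a task it can never run
healthy = run(2)   # a second worker is free to run the inner task
print("1 worker :", starved)
print("2 workers:", healthy)
```

If something similar applies here, a backtrace of all threads (not only the
waiting one) should show whether any I/O worker is actually making progress on
the pending read.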











1057445597
------------------ Original Message ------------------
From: "user" <[email protected]>;
Sent: Thursday, March 23, 2023, 4:01
To: "user" <[email protected]>;

Subject: Re: How to troubleshoot curlCode 18 errors



Your error looks very similar to one already reported [1] that had to do with 
using a non-AWS S3 compatible storage provider (R2 in this case), though a 
solution was never provided. Are you seeing this error using AWS S3 or another 
provider?

[1] https://github.com/apache/arrow/issues/33275


On Wed, Mar 22, 2023 at 5:42 AM 1057445597 <[email protected]> wrote:

My code is here:
https://github.com/tensorflow/io/pull/1720/files#diff-7133d540dc86c9bb9e552655025061798314e226695c00b4e1d8cecb178a2920R1181

(arrow_dataset_ops.cc:1181)

I read the Parquet file column by column from S3 storage. The error is very rare
and I cannot reproduce it reliably. I would like to know the possible causes.
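
curlCode 18 is libcurl's CURLE_PARTIAL_FILE: the transfer ended with fewer
bytes than expected. Since the failure is rare, one possible mitigation is to
retry a short read. The sketch below is only an illustration of that idea in
plain Python and is not how Arrow or the AWS SDK handle retries.
`read_range_with_retry`, `PartialTransferError`, and the `fetch(offset, length)`
callable are hypothetical names introduced for the example.

```python
import time


class PartialTransferError(IOError):
    """Raised when fewer bytes arrive than the caller asked for."""


def read_range_with_retry(fetch, offset, length, attempts=3, backoff_s=0.0):
    # `fetch(offset, length)` is a hypothetical callable that returns bytes.
    last_error = None
    for attempt in range(attempts):
        try:
            data = fetch(offset, length)
            if len(data) == length:
                return data
            last_error = PartialTransferError(
                f"got {len(data)} of {length} bytes")
        except OSError as exc:
            last_error = exc
        # Exponential backoff before the next attempt.
        time.sleep(backoff_s * (2 ** attempt))
    raise last_error


# Demo with a fake source that truncates only the first response.
calls = []


def flaky_fetch(offset, length):
    calls.append((offset, length))
    chunk = bytes(range(256))[offset:offset + length]
    return chunk[: length // 2] if len(calls) == 1 else chunk


result = read_range_with_retry(flaky_fetch, offset=16, length=32)
print(len(result), "bytes after", len(calls), "calls")
```

A retry like this only hides the symptom. If the truncation comes from the
storage provider or a proxy closing connections early, that still needs to be
investigated separately.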




------------------ Original Message ------------------
From: "user" <[email protected]>;
Sent: Wednesday, March 22, 2023, 9:23
To: "user" <[email protected]>;

Subject: Re: How to troubleshoot curlCode 18 errors



Can you give a bit more detail about what you were doing that caused this
error (ideally with a reproducible code example)?


On Wed, 22 Mar 2023 at 14:14, 1057445597 <[email protected]> wrote:

The error message is as follows:
ErrorType: 99 Message: curlCode: 18, Transferred a partial file ExceptionName


This error is very unusual.



