initialization on first call to pyarrow.array()?

Max Grossman Thu, 27 Aug 2020 16:11:38 -0700

Hi all,

Say I've got a simple program like the following that converts a numpy array to a pyarrow array several times in a row, and times each of those conversions:


   import pyarrow
   import numpy as np
   import time

   arr = np.random.rand(1)

   t1 = time.time()
   pyarrow.array(arr)
   t2 = time.time()
   pyarrow.array(arr)
   t3 = time.time()
   pyarrow.array(arr)
   t4 = time.time()
   pyarrow.array(arr)
   t5 = time.time()

I'm noticing that the first call to pyarrow.array() is taking ~0.3-0.5 s while the rest are nearly instantaneous (1e-05 s).

Does anyone know what might be causing this? My assumption is some one-time initialization of pyarrow on the first call to the library, in which case I'd like to see if there's some way to explicitly trigger that initialization earlier in the program. But I'm also curious to hear if there is a different explanation.

Right now I'm working around this by just calling pyarrow.array([]) at application start up -- I realize this doesn't actually eliminate the added time, but it does move it out of the critical section for any benchmarking runs.
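
For reference, here is a minimal sketch of that workaround. The helper names time_calls and warm_up_and_time are made up for illustration and are not part of pyarrow; the only pyarrow call used is pyarrow.array(). It assumes the one-time cost is paid on the first pyarrow.array() call, which is my guess rather than something I have confirmed.

```python
import time


def time_calls(fn, repeats=4):
    """Return the wall-clock duration (seconds) of each consecutive call to fn."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def warm_up_and_time(repeats=4):
    """Hypothetical helper: warm up pyarrow once, then time the conversions."""
    # Imported here so that import time is not counted in the timings below.
    import numpy as np
    import pyarrow

    arr = np.random.rand(1)

    # Throwaway call: assumed to absorb any one-time setup cost up front.
    pyarrow.array([])

    # Only the conversions after the warm-up are measured.
    return time_calls(lambda: pyarrow.array(arr), repeats=repeats)
```

Calling warm_up_and_time() returns one duration per conversion. If the assumption holds, the first entry should no longer stand out from the rest.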


Thanks,

Max
