I have a very large CSV file (nearly 13 million records) stored in Azure 
Storage and read via the Azure Storage plugin. The drillbit configuration has a 
modest 4GB heap size. Is there an effective way to select all the records from 
the file without running out of resources in Drill?

A plain SELECT * … is too big and runs out of memory.

Paging with SELECT * plus OFFSET and LIMIT seemed like the right approach, but 
OFFSET still has to scan past every skipped record, so once the offset is large 
enough it hits the same memory issues, even with a small LIMIT.
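To put rough numbers on why OFFSET paging degrades, here is a minimal Python 
sketch of a simplified cost model. It assumes that each page's query reads and 
discards all skipped rows before returning its LIMIT rows, and that nothing is 
cached between queries. The function name is hypothetical, and this is not a 
model of Drill's actual execution.

```python
def rows_scanned_with_offset(total_rows: int, page_size: int) -> int:
    """Total rows read across all pages when every page re-scans from the start.

    Simplified model (an assumption): page k skips k * page_size rows, then
    reads up to page_size more. No index, pruning, or caching between queries.
    """
    scanned = 0
    offset = 0
    while offset < total_rows:
        # Rows read by this page: the skipped prefix plus the rows it returns,
        # capped at the end of the file for the final page.
        scanned += min(offset + page_size, total_rows)
        offset += page_size
    return scanned


if __name__ == "__main__":
    n = 13_000_000  # roughly the record count from the question
    for page in (1_000_000, 100_000):
        pages = -(-n // page)  # ceiling division: number of OFFSET/LIMIT queries
        total = rows_scanned_with_offset(n, page)
        print(f"page={page:,} pages={pages} rows scanned={total:,} "
              f"(~{total / n:.1f}x the file)")
```

In this model the last few pages each read close to the full 13 million rows, 
and making the page smaller only increases the total work.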

Would it help to switch the format to something other than CSV? Or move it to a 
different storage mechanism? Or something else?
