Hey George,

Can I ask why you aren't using a distributed file system? You would see the behavior you expect if you used the dfs plug-in configured with a distributed file system (HDFS / MapR-FS).
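
For example, a dfs plug-in pointed at HDFS might look roughly like the sketch below. The namenode host and port are placeholders I made up, and I'm assuming you want a writable tmp workspace like the one you use now, so adjust both to your cluster:

```json
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode.example.com:8020/",
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
```

With a connection like that, every Drillbit reads and writes dfs.tmp in the same shared file system, so a CTAS table should be fully visible no matter which node you query from.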
In your case, the Parquet files from CTAS are written to the local file system of a specific node, depending on which Drillbit the client connects to. And if the table is moderate to large in size, Drill may process it in a distributed manner and write data on more than one node, hence the behavior you see.

-Abhishek

On Sun, May 31, 2015 at 6:34 PM, George Lu <luwenbin...@gmail.com> wrote:

> Hi all,
>
> I use dfs.tmp as my schema and when I use CTAS create some tables over
> 10000 rows the result parquet was created in like 2 nodes in the cluster.
> However when I query the table, I only get the portion in that node. So, I
> get 700 rows in one node when I use "select * from T1" and 10000 rows in
> another.
>
> May I ask is that behavior correct? How to create or let Drill get all
> tuples when I create or query in one node using dfs.tmp local?
>
> Else the exists query doesn't work.
>
> Thanks!
>
> George