Hey George,

Can I ask why you aren't using a distributed file system? You would see the
behavior you expect if you used the dfs plug-in configured with a
distributed file system (HDFS / MapR-FS).
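For reference, a dfs storage-plugin configuration that points at HDFS instead of the local disk might look roughly like the JSON this Python sketch prints. It is a sketch under assumptions: the namenode host and port are placeholders, the single `tmp` workspace mirrors your `dfs.tmp` setup, and the exact keys can vary between Drill versions, so compare it with your existing `dfs` config in the Drill Web UI (Storage tab) before applying anything.

```python
import json


def dfs_plugin_config(connection: str) -> dict:
    """Build a dfs storage-plugin config dict for the given file system URI.

    "file:///" means each Drillbit's own local disk; an hdfs:// or
    maprfs:/// URI means one file system shared by every Drillbit.
    """
    return {
        "type": "file",
        "enabled": True,
        "connection": connection,
        "workspaces": {
            "tmp": {
                "location": "/tmp",
                "writable": True,  # CTAS needs a writable workspace
                "defaultInputFormat": None,
            },
        },
        "formats": {
            "parquet": {"type": "parquet"},
        },
    }


if __name__ == "__main__":
    # Hypothetical namenode address; replace it with your cluster's.
    config = dfs_plugin_config("hdfs://namenode.example.com:8020/")
    print(json.dumps(config, indent=2))
```

With the connection pointing at shared storage, every Drillbit lists the same `/tmp` directory, so CTAS output written by any node is visible to queries from every node.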

In your case, the Parquet files from CTAS are written to a specific
node's local file system, depending on which Drillbit the client connects
to. And if the table is moderate to large in size, Drill may process it
in a distributed manner and write data to more than one node. Since each
Drillbit can only read files on its own local disk, a later query sees only
the part of the table stored on the node that scans it, hence the behavior
you see.
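To make the local-versus-shared difference concrete, here is a toy model in Python. It is not Drill's actual planner or writer logic: the round-robin split of rows across writer nodes and the node names are hypothetical, chosen only to show why a `select *` against local storage returns a different subset depending on the node.

```python
def ctas_local(rows, writer_nodes):
    """Toy CTAS: split rows round-robin onto each writer node's local disk."""
    disks = {node: [] for node in writer_nodes}
    for i, row in enumerate(rows):
        disks[writer_nodes[i % len(writer_nodes)]].append(row)
    return disks


def select_star_local(disks, node):
    """Toy query on local storage: a node can only see its own disk."""
    return list(disks.get(node, []))


def select_star_shared(disks):
    """Toy query on a shared DFS: every node sees all written files."""
    return [row for files in disks.values() for row in files]


if __name__ == "__main__":
    disks = ctas_local(list(range(10)), ["node1", "node2"])
    print(len(select_star_local(disks, "node1")))  # only node1's share
    print(len(select_star_shared(disks)))          # the whole table
```

The point of the model is only that, with `file:///`, the answer depends on where the scan runs, while with shared storage every node lists the same set of files.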

-Abhishek

On Sun, May 31, 2015 at 6:34 PM, George Lu <luwenbin...@gmail.com> wrote:

> Hi all,
>
> I use dfs.tmp as my schema, and when I use CTAS to create tables with over
> 10000 rows, the resulting Parquet files are created on about 2 nodes in the
> cluster. However, when I query the table, I only get the portion on that
> node. So I get 700 rows on one node when I run "select * from T1" and
> 10000 rows on another.
>
> Is this behavior correct? How can I make Drill return all rows when I
> create or query a table from one node using the local dfs.tmp?
>
> Otherwise, my EXISTS query doesn't work.
>
> Thanks!
>
> George
>
