Re: [newbie]: how to query HDFS

2015-05-21 Thread Andries Engelbrecht
Alan, I don't think the path in your query is correct. It is best to set up workspaces in the HDFS plugin: http://drill.apache.org/docs/file-system-storage-plugin/ See if that works. --Andries > On May 21, 2015, at 2:04 PM, Alan Miller wrote: > > First off, this is my first attempt at drill

RE: [newbie]: how to query HDFS

2015-05-21 Thread Alan Miller
I tried that initially, but since it didn't work I tried to simplify it as much as possible. Are you saying it "should" work? I mean all I need to do is point the connection parameter to a different namenode, right?

Re: [newbie]: how to query HDFS

2015-05-21 Thread Abhishek Girish
Hi Alan, What you are attempting to do wouldn't work. Without a drillbit running on the remote cluster, there is no way I see we can access that file system from Drill. If you'd like to connect to a remote cluster (cluster B), the options I see are (1) Install Drill on cluster B and use a local

Re: [newbie]: how to query HDFS

2015-05-21 Thread Tomer Shiran
You don't need a drillbit on the cluster. It will be faster (data locality etc.) but you can just run a drillbit on your client and access any remote cluster (or even join data from multiple clusters). It looks like you've created a new storage plugin. I would recommend copying the entire JSON con
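The approach suggested here (copy the whole configuration of the default "dfs" plugin and change only the connection string) can be sketched as below. This is a minimal illustration, not the exact configuration used in the thread: the `DFS_CONFIG` dictionary is a cut-down stand-in for a real "dfs" plugin config, `clone_plugin` is a hypothetical helper, and the namenode host and port are placeholders.

```python
import copy
import json

# Hypothetical, cut-down stand-in for the default "dfs" plugin config.
# A real config has more workspaces and formats; copy yours from the
# Drill web UI rather than using this one.
DFS_CONFIG = {
    "type": "file",
    "enabled": True,
    "connection": "file:///",
    "workspaces": {
        "root": {"location": "/", "writable": False, "defaultInputFormat": None}
    },
    "formats": {"json": {"type": "json"}},
}


def clone_plugin(config, connection):
    """Return a deep copy of a plugin config with only `connection` changed."""
    new_config = copy.deepcopy(config)
    new_config["connection"] = connection
    return new_config


# "namenode.example.com:8020" is a placeholder, not a value from the thread.
hdfs_config = clone_plugin(DFS_CONFIG, "hdfs://namenode.example.com:8020/")

# Serialize with json.dumps so the text pasted into the UI is valid JSON.
hdfs_json = json.dumps(hdfs_config, indent=2)
print(hdfs_json)
```

Generating the text with a JSON serializer, rather than editing it by hand, is one way to rule out a stray quote or comma before pasting it into the storage plugin page.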

RE: [newbie]: how to query HDFS

2015-05-22 Thread Alan Miller
Thanks Shiran, I tried that but get the same error (see below). Also, strangely I couldn't create the hdfs plugin in one step by using the same config as the "dfs" plugin and changing the connection string. The UI says Invalid JSON... I had to create the hdfs plugin in 2 steps. First using the

Re: [newbie]: how to query HDFS

2015-05-22 Thread Andries Engelbrecht
In step 2 you need to have a backtick ` at the end of par and not a single quote ‘. It has been mentioned that it may not work to just point to the name node on a remote cluster. I have not tried it, but suspect there may be various issues with the HDFS plugin and how you are trying to use it
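The quoting problem described here (a file path in a Drill query must be wrapped in a matching pair of backticks, not closed with a single quote) can be guarded against by building the table reference in code. A minimal sketch follows; `drill_table` and `has_balanced_backticks` are hypothetical helpers, and the plugin name and path are made-up examples rather than values from the thread.

```python
def drill_table(plugin, path):
    """Build a table reference of the form plugin.`path`."""
    if "`" in path:
        raise ValueError("path must not contain a backtick")
    return f"{plugin}.`{path}`"


def has_balanced_backticks(sql):
    """Cheap sanity check: backticks in a query should come in pairs."""
    return sql.count("`") % 2 == 0


# Hypothetical plugin name and path, for illustration only.
query = f"select * from {drill_table('hdfs', '/tmp/example.parquet')} limit 5"
print(query)

# The mistake from the thread: opening backtick, closing single quote.
broken = "select * from hdfs.`/tmp/example.parquet' limit 5"
print(has_balanced_backticks(query), has_balanced_backticks(broken))
```

The balance check is only a quick heuristic; it catches the opening-backtick, closing-quote mix-up but does not validate the query in any other way.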

Re: [newbie]: how to query HDFS

2015-05-22 Thread Abhishek Girish
I tried out Tomer's steps on MapR and it was pretty straightforward. I have drill installed on one cluster. The only change I made was to add a new storage plug-in "dfs2" (duplicating the default dfs). I edited the connection string and changed "maprfs:///" to "maprfs://". And when I connected to

RE: [newbie]: how to query HDFS

2015-05-22 Thread Alan Miller
Thanks, yes it works now! The backtick was my problem. I can query a remote CDH5 HDFS from node1 (not part of the cluster) but I can't query a remote CDH5 HDFS. But that's not an issue for me. Thanks for the quick responses. 0: jdbc:drill:drillbit=localhost> select job_number,submit_time from

RE: [newbie]: how to query HDFS

2015-05-22 Thread Alan Miller
Sorry, I meant I can not query a remote CDH 4 cluster. -Original Message- From: Alan Miller [mailto:alan.mil...@synopsys.com] Sent: Friday, May 22, 2015 8:58 AM To: user@drill.apache.org Subject: RE: [newbie]: how to query HDFS Thanks, yes it works now! The backtick was my problem. I

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
As a special case, with MapR, you can access all clusters in an administrative group by making sure that you have /mapr/ at the beginning of your path names. This means that you can simply use different workspaces, or a workspace with a path consisting only of /mapr and still access files and tabl

Re: [newbie]: how to query HDFS

2015-05-22 Thread Ted Dunning
I should have added that there is nothing wrong with the dual plugin approach on MapR. Works fine and it is up to you as a matter of personal choice which is better. On Fri, May 22, 2015 at 4:46 PM, Ted Dunning wrote: > > As a special case, with MapR, you can access all clusters in an > admin