Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Andries Engelbrecht
Perhaps I’m missing something here. Why not create a DFS plugin for HDFS and put the file in HDFS? On May 26, 2015, at 4:54 PM, Matt bsg...@gmail.com wrote: New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears text files need to be on all nodes in a cluster? Using the

Custom UDFS slow

2015-05-26 Thread Adam Gilmore
Hi guys, I have written a couple of custom UDFs (specifically WEEK() and WEEKYEAR() to get that date information out of timestamps). I sampled two queries (on approx. 11 million records in Parquet files) select count(*) from `table` group by extract(day from `timestamp`) 750ms select count(*)

Re: Custom UDFS slow

2015-05-26 Thread Ted Dunning
On Tue, May 26, 2015 at 7:26 PM, Adam Gilmore dragoncu...@gmail.com wrote: The code for the WEEK() function is not far from the code from the source for the EXTRACT(DAY) function. Furthermore, even if I copy the exact code for the EXTRACT(DAY) function into that, it has the same performance

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt
That might be the end goal, but currently I don't have an HDFS ingest mechanism. We are not currently a Hadoop shop - can you suggest simple approaches for bulk loading data from delimited files into HDFS? On May 26, 2015, at 8:04 PM, Andries Engelbrecht aengelbre...@maprtech.com

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt
Thanks, I am incorrectly conflating the file system with data storage. Looking to experiment with the Parquet format, and was looking at CTAS queries as an import approach. Are direct queries over local files meant for an embedded Drill, whereas on a cluster files should be moved into HDFS

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt
New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears text files need to be on all nodes in a cluster? Using the dfs config below, I am only able to query if a csv file is on all 4 nodes. If the file is only on the local node and not others, I get errors in the form of: ~~~

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Andries Engelbrecht
You can use the HDFS shell, `hadoop fs -put`, to copy from the local file system to HDFS. For more robust mechanisms from remote systems you can look at using NFS; MapR has a really robust NFS integration and you can use it with the community edition. On May 26, 2015, at 5:11 PM, Matt
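
A minimal sketch of the `hadoop fs -put` suggestion above, wrapped in Python so it can be scripted for bulk loads. The helper name `hdfs_put_command` and the paths `/tmp/sample.csv` and `/data/csv/` are hypothetical, and the sketch assumes the `hadoop` client is on the PATH of the machine that holds the delimited files. It only builds and prints the command; actually running it is left as a commented-out step.

```python
import shlex
import subprocess


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for `hadoop fs -put <local_path> <hdfs_dir>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = hdfs_put_command("/tmp/sample.csv", "/data/csv/")
    print(shlex.join(cmd))
    # Uncomment to execute on a host with a configured Hadoop client:
    # subprocess.run(cmd, check=True)
```

Once the file is in HDFS, every Drillbit reads it through the same dfs storage plugin, so it no longer has to be copied to each node.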

Re: To EMRFS or not to EMRFS?

2015-05-26 Thread Paul Mogren
Thank you. This kind of summary advice is helpful to getting started. On 5/22/15, 6:37 PM, Ted Dunning ted.dunn...@gmail.com wrote: The variation will have less to do with Drill (which can read all these options such as EMR resident MapR FS or HDFS or persistent MapR FS or HDFS or S3). The

Re: Connection timeout 1.0.0

2015-05-26 Thread Davide Giannella
On 22/05/2015 10:22, Davide Giannella wrote: ... I looked up the ports drill should be using that I know of: 31010 and 2181 but both are free to be taken: `sudo lsof -i TCP:${PORT}`. Solved it. Adding the enquired hostname to /etc/hosts as 127.0.0.1 did the trick. Cheers Davide
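
The two checks described above (are the ports free, and does the hostname resolve) can be sketched in Python as an alternative to `lsof`. The function names `port_is_free` and `hostname_resolves` are hypothetical helpers, and the bind test on 127.0.0.1 is an assumption about which interface matters; only the port numbers 31010 and 2181 come from the message.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP bind to host:port succeeds, i.e. nothing holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
        return True


def hostname_resolves(name):
    """Return the IPv4 address the name resolves to, or None if lookup fails."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


if __name__ == "__main__":
    for port in (31010, 2181):  # ports named in the message
        print(port, "free" if port_is_free(port) else "in use")
    name = socket.gethostname()
    print(name, "->", hostname_resolves(name))
```

If the ports are free but the hostname lookup returns None, that points at name resolution rather than a port conflict, which matches the /etc/hosts fix reported above.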

Re: SQL query : Question

2015-05-26 Thread Andries Engelbrecht
The query will typically fail. What source data are you looking at that may cause this issue? One way of working around this is to use a predicate to filter out rows that may cause such issues. But depending on the use case, there can be other ways of dealing with this. —Andries On May 26,

Re: Drill and Spark integration

2015-05-26 Thread Hanifi Gunes
Sounds cool. On Tue, May 26, 2015 at 12:40 PM, Jacques Nadeau jacq...@apache.org wrote: Let's get the current code into the public so more people can help get it fully integrated and tested. On Tue, May 26, 2015 at 12:16 PM, Hanifi Gunes hgu...@maprtech.com wrote: We have a fully

Drill and Spark integration

2015-05-26 Thread Christopher Matta
Spark integration with Drill is mentioned in this http://drill.apache.org/blog/2014/12/16/whats-coming-in-2015/ blog post, however I can’t find a JIRA for this feature on either the Drill or Spark trackers. What’s the status on this? Is there a timeframe? Is anyone working on it? --Chris

Re: Drill and Spark integration

2015-05-26 Thread Jacques Nadeau
Let's get the current code into the public so more people can help get it fully integrated and tested. On Tue, May 26, 2015 at 12:16 PM, Hanifi Gunes hgu...@maprtech.com wrote: We have a fully functional Spark integration that is not yet pushed to Apache master as it lacks proper testing. We

Hangout happening now

2015-05-26 Thread Jason Altekruse
Come join the Drill community as we discuss what has been happening lately and what is in the pipeline. All are welcome, if you know about Drill, want to know more or just want to listen in. https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc