date:20150526

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Andries Engelbrecht

Perhaps I’m missing something here. Why not create a DFS plugin for HDFS and put the file in HDFS?
On May 26, 2015, at 4:54 PM, Matt bsg...@gmail.com wrote:
New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears text files need to be on all nodes in a cluster? Using the

Custom UDFS slow

2015-05-26 Thread Adam Gilmore

Hi guys, I have written a couple of custom UDFs (specifically WEEK() and WEEKYEAR(), to get that date information out of timestamps). I sampled two queries (on approx. 11 million records in Parquet files):
select count(*) from `table` group by extract(day from `timestamp`)
750ms
select count(*)
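For reference, here is a minimal sketch of what week-number functions such as the WEEK() and WEEKYEAR() UDFs described above might compute. The message does not say which week-numbering rule the UDFs follow, so this assumes ISO 8601 weeks. The Python function names `week` and `weekyear` are hypothetical stand-ins and are not part of Drill's API.

```python
from datetime import datetime


def week(ts):
    """ISO 8601 week number (1-53) of a timestamp (assumed semantics)."""
    return ts.isocalendar()[1]


def weekyear(ts):
    """ISO 8601 week-numbering year; can differ from ts.year near 1 January."""
    return ts.isocalendar()[0]


if __name__ == "__main__":
    # Sample timestamps chosen for illustration only.
    for ts in (datetime(2015, 5, 26, 19, 26), datetime(2016, 1, 1)):
        print(ts.isoformat(), weekyear(ts), week(ts))
```

Under the ISO assumption, the week-numbering year and the calendar year disagree around New Year: 1 January 2016 belongs to week 53 of week-year 2015, which is why a separate WEEKYEAR() is useful alongside WEEK().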

Re: Custom UDFS slow

2015-05-26 Thread Ted Dunning

On Tue, May 26, 2015 at 7:26 PM, Adam Gilmore dragoncu...@gmail.com wrote:
The code for the WEEK() function is not far from the code from the source for the EXTRACT(DAY) function. Furthermore, even if I copy the exact code for the EXTRACT(DAY) function into that, it has the same performance

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt

That might be the end goal, but currently I don't have an HDFS ingest mechanism. We are not currently a Hadoop shop - can you suggest simple approaches for bulk loading data from delimited files into HDFS?
On May 26, 2015, at 8:04 PM, Andries Engelbrecht aengelbre...@maprtech.com

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt

Thanks, I am incorrectly conflating the file system with data storage. I am looking to experiment with the Parquet format, and was looking at CTAS queries as an import approach. Are direct queries over local files meant for an embedded Drill, whereas on a cluster files should be moved into HDFS

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Matt

New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears text files need to be on all nodes in a cluster? Using the dfs config below, I am only able to query if a csv file is on all 4 nodes. If the file is only on the local node and not others, I get errors in the form of: ~~~

Re: Query local files on cluster? [Beginner]

2015-05-26 Thread Andries Engelbrecht

You can use the HDFS shell `hadoop fs -put` to copy from the local file system to HDFS. For more robust mechanisms from remote systems you can look at using NFS; MapR has a really robust NFS integration and you can use it with the community edition.
On May 26, 2015, at 5:11 PM, Matt
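As a minimal sketch of the `hadoop fs -put` approach mentioned above, the helper below builds the command for one local file and runs it only when asked. The function names, the `dry_run` flag, and the paths `/tmp/sample.csv` and `/data/csv` are hypothetical and chosen for illustration. It assumes the `hadoop` client is on the PATH of the machine holding the file.

```python
import shutil
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Return the argument list for `hadoop fs -put <local_path> <hdfs_dir>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def put_file(local_path, hdfs_dir, dry_run=True):
    """Copy one local file into HDFS; with dry_run, only return the command."""
    cmd = build_put_command(local_path, hdfs_dir)
    if dry_run or shutil.which("hadoop") is None:
        return cmd  # nothing is executed
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; prints the command instead of running it.
    print(" ".join(put_file("/tmp/sample.csv", "/data/csv")))
```

Once the file is in HDFS, every node reads the same copy through the storage plugin, so the file no longer needs to be placed on each node's local disk.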

Re: To EMRFS or not to EMRFS?

2015-05-26 Thread Paul Mogren

Thank you. This kind of summary advice is helpful for getting started.
On 5/22/15, 6:37 PM, Ted Dunning ted.dunn...@gmail.com wrote:
The variation will have less to do with Drill (which can read all these options such as EMR resident MapR FS or HDFS or persistent MapR FS or HDFS or S3). The

Re: Connection timeout 1.0.0

2015-05-26 Thread Davide Giannella

On 22/05/2015 10:22, Davide Giannella wrote:
... I looked up the ports Drill should be using that I know of: 31010 and 2181 but both are free to be taken: `sudo lsof -i TCP:${PORT}`.
Solved it. Adding the enquired hostname to /etc/hosts as 127.0.0.1 did the trick.
Cheers
Davide

Re: SQL query : Question

2015-05-26 Thread Andries Engelbrecht

The query will typically fail. What source data are you looking at that may cause this issue? One way of working around this is to use a predicate to filter out rows that may cause such issues. But depending on the use case, there can be other ways of dealing with this.
--Andries
On May 26,

Re: Drill and Spark integration

2015-05-26 Thread Hanifi Gunes

Sounds cool.
On Tue, May 26, 2015 at 12:40 PM, Jacques Nadeau jacq...@apache.org wrote:
Let's get the current code into the public so more people can help get it fully integrated and tested.
On Tue, May 26, 2015 at 12:16 PM, Hanifi Gunes hgu...@maprtech.com wrote:
We have a fully

Drill and Spark integration

2015-05-26 Thread Christopher Matta

Spark integration with Drill is mentioned in this blog post: http://drill.apache.org/blog/2014/12/16/whats-coming-in-2015/. However, I can’t find a JIRA for this feature on either the Drill or Spark trackers. What’s the status on this? Is there a timeframe? Is anyone working on it?
--Chris

Re: Drill and Spark integration

2015-05-26 Thread Jacques Nadeau

Let's get the current code into the public so more people can help get it fully integrated and tested.
On Tue, May 26, 2015 at 12:16 PM, Hanifi Gunes hgu...@maprtech.com wrote:
We have a fully functional Spark integration that is not yet pushed to Apache master as it lacks proper testing. We

Hangout happening now

2015-05-26 Thread Jason Altekruse

Come join the Drill community as we discuss what has been happening lately and what is in the pipeline. All are welcome, whether you know about Drill, want to know more, or just want to listen in.
https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

