Perhaps I’m missing something here.
Why not create a DFS plug-in for HDFS and put the file in HDFS?
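For reference, a DFS storage plugin pointing at HDFS is just a JSON config registered through the Drill web UI (Storage tab). The namenode host/port and workspace below are placeholder assumptions, not values from this thread; a minimal sketch might look like:

```json
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode:8020/",
  "workspaces": {
    "root": { "location": "/", "writable": false, "defaultInputFormat": null },
    "tmp": { "location": "/tmp", "writable": true, "defaultInputFormat": null }
  },
  "formats": {
    "csv": { "type": "text", "extensions": ["csv"], "delimiter": "," },
    "parquet": { "type": "parquet" }
  }
}
```

With that in place, every drillbit reads the file from HDFS rather than needing a local copy on each node.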
On May 26, 2015, at 4:54 PM, Matt bsg...@gmail.com wrote:
New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears text
files need to be on all nodes in a cluster?
Using the
Hi guys,
I have written a couple of custom UDFs (specifically WEEK() and WEEKYEAR()
to get that date information out of timestamps).
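As a sketch of the semantics such UDFs typically implement (the actual UDFs would be Java), here is the ISO-8601 week numbering in Python; note the week-year can differ from the calendar year near year boundaries, which is presumably why a separate WEEKYEAR() is needed:

```python
from datetime import datetime

def week(ts: datetime) -> int:
    # ISO-8601 week number (1-53)
    return ts.isocalendar()[1]

def weekyear(ts: datetime) -> int:
    # The year the ISO week belongs to; differs from the calendar
    # year near year boundaries
    return ts.isocalendar()[0]

# 2016-01-01 is a Friday in ISO week 53 of week-year 2015
print(week(datetime(2016, 1, 1)))      # 53
print(weekyear(datetime(2016, 1, 1)))  # 2015
```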
I sampled two queries (on approx. 11 million records in Parquet files)
select count(*) from `table` group by extract(day from `timestamp`)
750ms
select count(*)
On Tue, May 26, 2015 at 7:26 PM, Adam Gilmore dragoncu...@gmail.com wrote:
The code for the WEEK() function is not far from the source code for the
EXTRACT(DAY) function. Furthermore, even if I copy the exact code for the
EXTRACT(DAY) function into it, it has the same performance
That might be the end goal, but currently I don't have an HDFS ingest
mechanism.
We are not currently a Hadoop shop - can you suggest simple approaches for bulk
loading data from delimited files into HDFS?
On May 26, 2015, at 8:04 PM, Andries Engelbrecht aengelbre...@maprtech.com wrote:
Thanks, I am incorrectly conflating the file system with data storage.
Looking to experiment with the Parquet format, and considering CTAS queries
as an import approach.
Are direct queries over local files meant for an embedded Drill, where on a
cluster files should be moved into HDFS?
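A CTAS import from CSV into Parquet might look like the following; the paths and column names are hypothetical, and it assumes a writable workspace such as dfs.tmp is configured. Note that Drill's text reader exposes delimited rows as a single `columns` array, so imports usually cast individual elements:

```sql
-- Hypothetical paths/columns; assumes a writable dfs.tmp workspace
ALTER SESSION SET `store.format` = 'parquet';

CREATE TABLE dfs.tmp.`mytable` AS
SELECT CAST(columns[0] AS INT)       AS id,
       CAST(columns[1] AS TIMESTAMP) AS `timestamp`
FROM dfs.`/data/input.csv`;
```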
New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes, it appears
text files need to be on all nodes in a cluster?
Using the dfs config below, I am only able to query if a csv file is on
all 4 nodes. If the file is only on the local node and not others, I get
errors in the form of:
You can use the HDFS shell: `hadoop fs -put` copies from the local file
system to HDFS.
For more robust mechanisms from remote systems you can look at using NFS;
MapR has a really robust NFS integration, and you can use it with the
community edition.
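Concretely, the copy step is just the following (directory and file paths are examples only, and this of course assumes a running HDFS):

```shell
# Example paths; requires a running HDFS and the hadoop client on PATH
hadoop fs -mkdir -p /data
hadoop fs -put /local/path/input.csv /data/
hadoop fs -ls /data
```

Once the file is in HDFS, any drillbit in the cluster can read it through the dfs storage plugin.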
On May 26, 2015, at 5:11 PM, Matt
Thank you. This kind of summary advice is helpful for getting started.
On 5/22/15, 6:37 PM, Ted Dunning ted.dunn...@gmail.com wrote:
The variation will have less to do with Drill (which can read all these
options such as EMR resident MapR FS or HDFS or persistent MapR FS or HDFS
or S3).
The
On 22/05/2015 10:22, Davide Giannella wrote:
...
I looked up the ports Drill should be using that I know of: 31010 and
2181, but both are free to be taken (checked with `sudo lsof -i TCP:${PORT}`).
Solved it. Adding the hostname in question to /etc/hosts as 127.0.0.1 did
the trick.
Cheers
Davide
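For anyone hitting the same startup failure: the fix above amounts to an /etc/hosts entry mapping the machine's hostname to the loopback address (the hostname below is a placeholder):

```
127.0.0.1   localhost my-drill-host
```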
The query will typically fail. What source data are you looking at that may
cause this issue?
One way of working around this is to use a predicate to filter out rows that
may cause such issues. But depending on the use case, there can be other ways
of dealing with this.
—Andries
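As an illustration of the predicate workaround (table, path, and column names here are hypothetical, not from the thread):

```sql
-- Hypothetical example: filter out rows whose value would fail the cast
SELECT t.id, CAST(t.amount AS DOUBLE) AS amount
FROM dfs.`/data/input.json` t
WHERE t.amount IS NOT NULL;
```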
On May 26,
Sounds cool.
On Tue, May 26, 2015 at 12:40 PM, Jacques Nadeau jacq...@apache.org wrote:
Let's get the current code into the public so more people can help get it
fully integrated and tested.
Spark integration with Drill is mentioned in this blog post:
http://drill.apache.org/blog/2014/12/16/whats-coming-in-2015/; however, I
can’t find a JIRA for this feature on either the Drill or Spark trackers.
What’s the status on this? Is there a timeframe? Is anyone working on it?
--Chris
Let's get the current code into the public so more people can help get it
fully integrated and tested.
On Tue, May 26, 2015 at 12:16 PM, Hanifi Gunes hgu...@maprtech.com wrote:
We have a fully functional Spark integration that is not yet pushed to
Apache master as it lacks proper testing. We
Come join the Drill community as we discuss what has been happening lately
and what is in the pipeline. All are welcome, whether you know about Drill,
want to know more, or just want to listen in.
https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc