Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-17 Thread Steve Loughran
:23 PM Subject: Re: Setting up Spark/flume/? to Ingest 10TB from FTP Why do you need to use Spark or Flume for this? You can just use curl and hdfs: curl ftp://blahftp://blah/ | hdfs dfs -put - /blah On Fri, Aug 14, 2015 at 1:15 PM, Varadhan, Jawahar varad...@yahoo.com.invalidmailto:varad

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Varadhan, Jawahar
...@cloudera.com To: Varadhan, Jawahar varad...@yahoo.com Cc: d...@spark.apache.org d...@spark.apache.org Sent: Friday, August 14, 2015 3:23 PM Subject: Re: Setting up Spark/flume/? to Ingest 10TB from FTP Why do you need to use Spark or Flume for this? You can just use curl and hdfs:   curl

Re: Setting up Spark/flume/? to Ingest 10TB from FTP

2015-08-14 Thread Marcelo Vanzin
On Fri, Aug 14, 2015 at 2:11 PM, Varadhan, Jawahar varad...@yahoo.com.invalid wrote: And hence, I was planning to use Spark Streaming with Kafka or Flume with Kafka. But flume runs on a JVM and may not be the best option as the huge file will create memory issues. Please suggest someway to