Why can't my SQL UDF be registered?

2014-12-15 Thread Xuelin Cao
Hi, I tried to create a function to convert a Unix timestamp to the hour of the day. It works if the code is like this: sqlContext.registerFunction("toHour", (x:Long)=>{new java.util.Date(x*1000).getHours}) But if I do it like this, it doesn't work: def toHour
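
A minimal sketch of the two registration styles, assuming a Spark 1.x SQLContext; the explicit "toHour _" eta-expansion is my guess at the missing piece, not a confirmed fix from this thread:

    import java.util.Date

    // Registering an anonymous function value works directly
    sqlContext.registerFunction("toHour", (x: Long) => new Date(x * 1000).getHours)

    // A named method usually has to be turned into a function value explicitly
    def toHour(x: Long): Int = new Date(x * 1000).getHours
    sqlContext.registerFunction("toHourDef", toHour _)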

When will Spark SQL support building DB indexes natively?

2014-12-17 Thread Xuelin Cao
Hi, In the Spark SQL documentation, it says "Some of these (such as indexes) are less important due to Spark SQL’s in-memory computational model. Others are slotted for future releases of Spark SQL. - Block level bitmap indexes and virtual columns (used to build indexes)" For our

Re: When will Spark SQL support building DB indexes natively?

2014-12-17 Thread Xuelin Cao
ioned table support? That would only scan data where the predicate matches the partition. Depending on the cardinality of the customerId column, that could be a good option for you. On Wed, Dec 17, 2014 at 2:25 AM, Xuelin Cao wrote: Hi, In the Spark SQL documentation, it says "Some of thes
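
For reference, a hedged sketch of the partitioned-table idea suggested above, assuming a HiveContext and hypothetical table and column names; a predicate on the partition column lets Spark skip the partitions that don't match:

    // hypothetical table, partitioned by the column used in the predicate
    hiveContext.sql("""
      CREATE TABLE ad_clicks (url STRING, ts BIGINT)
      PARTITIONED BY (customerId INT)""")

    // only the customerId = 42 partition's data should be read
    hiveContext.sql("SELECT count(*) FROM ad_clicks WHERE customerId = 42")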

Why doesn't Parquet predicate pushdown work?

2015-01-06 Thread Xuelin Cao
Hi, I'm testing the Parquet file format, and predicate pushdown is a very useful feature for us. However, it looks like predicate pushdown doesn't work after I set sqlContext.sql("SET spark.sql.parquet.filterPushdown=true") Here is my SQL: sqlContext.sql("
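
A minimal sketch of the setup being tested, assuming Spark 1.2; the Parquet path, table name and column are hypothetical (see the reply below for why the flag is off by default):

    sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

    // hypothetical Parquet data set
    val events = sqlContext.parquetFile("/data/events.parquet")
    events.registerTempTable("events")

    // with pushdown enabled, the user_id > 1000 filter can be evaluated inside the Parquet reader
    sqlContext.sql("SELECT count(*) FROM events WHERE user_id > 1000").collect()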

Spark SQL: The cached columnar table is not columnar?

2015-01-07 Thread Xuelin Cao
Hi, Curious and curious. I'm puzzled by the Spark SQL cached table. Theoretically, the cached table should be a columnar table, and only the columns included in my SQL should be scanned. However, in my test, I always see the whole table being scanned even though I only "select" one column i
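
For context, a sketch of caching a table in the in-memory columnar format, assuming Spark 1.2 and a hypothetical table name; with cacheTable the cached data is stored column by column, so a single-column query should only need that column's batches:

    // hypothetical registered table
    sqlContext.cacheTable("adTable")

    // materialize the cache, then query a single column
    sqlContext.sql("SELECT count(*) FROM adTable").collect()
    sqlContext.sql("SELECT ad_user_id FROM adTable").collect()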

Re: Why doesn't Parquet predicate pushdown work?

2015-01-07 Thread Xuelin Cao
null pointers >> when there are full row groups that are null. >> >> https://issues.apache.org/jira/browse/SPARK-4258 >> >> You can turn it on if you want: >> http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration >> >> Daniel >> >>

Can Spark support task-level resource management?

2015-01-07 Thread Xuelin Cao
Hi, Currently, we are building up a mid-scale Spark cluster (100 nodes) in our company. One thing bothering us is how Spark manages resources (CPU, memory). I know there are 3 resource management modes: standalone, Mesos, and YARN. In standalone mode, the cluster maste

Re: Can Spark support task-level resource management?

2015-01-07 Thread Xuelin Cao
on, I think it's important to see > what other applications you want to be running besides Spark in the same > cluster and also your use cases, to see what resource management fits your > need. > > Tim > > > On Wed, Jan 7, 2015 at 10:55 PM, Xuelin Cao > wrote: > >

Re: Can Spark support task-level resource management?

2015-01-07 Thread Xuelin Cao
not. > > Tim > > On Wed, Jan 7, 2015 at 11:19 PM, Xuelin Cao > wrote: > >> >> Hi, >> >> Thanks for the information. >> >> One more thing I want to clarify, when does Mesos or Yarn allocate >> and release the resource? Aka,

Re: Spark SQL: The cached columnar table is not columnar?

2015-01-08 Thread Xuelin Cao
the input data for each task is also 1212.5MB On Thu, Jan 8, 2015 at 6:40 PM, Cheng Lian wrote: > Hey Xuelin, which data item in the Web UI did you check? > > > On 1/7/15 5:37 PM, Xuelin Cao wrote: > > > Hi, > >Curious and curious. I'm puzzl

Re: Spark SQL: The cached columnar table is not columnar?

2015-01-08 Thread Xuelin Cao
) > > The input data of the first statement is 292KB, the second is 49.1KB. > > The JSON file I used is examples/src/main/resources/people.json, I copied > its contents multiple times to generate a larger file. > > Cheng > > On 1/8/15 7:43 PM, Xuelin Cao wrote: > > &

Did anyone try overcommitting CPU cores?

2015-01-08 Thread Xuelin Cao
Hi, I'm wondering whether it is a good idea to overcommit CPU cores on the Spark cluster. For example, in our testing cluster, each worker machine has 24 physical CPU cores. However, we are allowed to set the CPU core number to 48 or more in the Spark configuration file. As a result,
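
For reference, a sketch of the standalone-mode knob being described (the memory value is just a placeholder); as far as I know SPARK_WORKER_CORES is simply the number of cores the worker advertises to the master, so nothing stops it from exceeding the physical count:

    # conf/spark-env.sh on a 24-core worker (sketch)
    export SPARK_WORKER_CORES=48     # advertise twice the physical cores
    export SPARK_WORKER_MEMORY=64g   # placeholder value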

Re: Did anyone try overcommitting CPU cores?

2015-01-09 Thread Xuelin Cao
ng etc.). > > Why not increase the tasks per core? > > Best regards > On 9 Jan 2015 at 06:46, "Xuelin Cao" wrote: > > >> Hi, >> >> I'm wondering whether it is a good idea to overcommit CPU cores on >> the spark cluster. >> >>

How to create an empty RDD with a given type?

2015-01-12 Thread Xuelin Cao
Hi, I'd like to create a transform function that converts RDD[String] to RDD[Int]. Occasionally, the input RDD could be an empty RDD. I just want to directly create an empty RDD[Int] if the input RDD is empty. And I don't want to return None as the result. Is there an easy way to do
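
A minimal sketch, assuming Spark 1.2, of two ways to build an empty RDD of a specific element type:

    import org.apache.spark.rdd.RDD

    val empty: RDD[Int] = sc.emptyRDD[Int]
    // or, equivalently:
    val empty2: RDD[Int] = sc.parallelize(Seq.empty[Int])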

Re: How to create an empty RDD with a given type?

2015-01-12 Thread Xuelin Cao
ustin > > On Mon, Jan 12, 2015 at 9:50 PM, Xuelin Cao > wrote: > >> >> >> Hi, >> >> I'd like to create a transform function, that convert RDD[String] to >> RDD[Int] >> >> Occasionally, the input RDD could be an empty RDD. I ju

IF statement doesn't work in Spark-SQL?

2015-01-20 Thread Xuelin Cao
Hi, I'm trying to migrate some Hive scripts to Spark SQL. However, I found some statements are incompatible with Spark SQL. Here is my SQL, and the same SQL works fine in the Hive environment. SELECT *if(ad_user_id>1000, 1000, ad_user_id) as user_id* FROM ad_search_keywor

Re: IF statement doesn't work in Spark-SQL?

2015-01-20 Thread Xuelin Cao
Hi, I'm using Spark 1.2 On Tue, Jan 20, 2015 at 5:59 PM, Wang, Daoyuan wrote: > Hi Xuelin, > > > > What version of Spark are you using? > > > > Thanks, > > Daoyuan > > > > *From:* Xuelin Cao [mailto:xuelincao2...@gmail.com] > *Sent:* Tues

Re: IF statement doesn't work in Spark-SQL?

2015-01-20 Thread Xuelin Cao
APEETHAM | Amritapuri | Cell +919946535290 | > > > On Tue, Jan 20, 2015 at 4:45 PM, DEVAN M.S. wrote: > >> Which context are you using HiveContext or SQLContext ? Can you try with >> HiveContext >> ?? >> >> >> Devan M.S. | Research Associate | Cyber
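
A sketch of the suggestion above, assuming a Spark 1.2 build with Hive support; the table name is a placeholder based on the truncated one in the original post. The HiveContext parser understands Hive functions such as if(), which the plain SQLContext parser of that era may not:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    hiveContext.sql(
      "SELECT if(ad_user_id > 1000, 1000, ad_user_id) AS user_id FROM ad_search_keywords")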

In HA master mode, how do we identify the alive master?

2015-03-04 Thread Xuelin Cao
Hi, In our project, we use "standalone dual masters" + "ZooKeeper" for HA of the Spark master. Now the problem is, how do we know which master is currently the alive one? We tried to read the info that the master stores in ZooKeeper, but we found there is no information to
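
One hedged way to check, assuming both masters expose the standalone web UI on port 8080 and that its /json endpoint reports a status field (ALIVE vs STANDBY); the hostnames here are hypothetical:

    import scala.io.Source

    val masters = Seq("master1:8080", "master2:8080")   // hypothetical hosts
    val alive = masters.find { m =>
      try {
        // strip whitespace so the check works regardless of JSON pretty-printing
        val body = Source.fromURL(s"http://$m/json").mkString.replaceAll("\\s", "")
        body.contains("\"status\":\"ALIVE\"")
      } catch { case _: Exception => false }
    }
    println(alive.getOrElse("no ALIVE master found"))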

Why does the Spark master consume 100% CPU when we kill a Spark Streaming app?

2015-03-10 Thread Xuelin Cao
Hey, recently we found in our cluster that when we kill a Spark Streaming app, the whole cluster cannot respond for 10 minutes. We investigated the master node and found that the master process consumes 100% CPU when we kill the Spark Streaming app. How could this happen? Did any

Is there a way to turn on the Spark eventLog on the worker node?

2014-11-21 Thread Xuelin Cao
Hi, I'm going to debug some Spark applications on our testing platform, and it would be helpful if we could see the eventLog on the *worker* node. I've tried to turn on *spark.eventLog.enabled* and set the *spark.eventLog.dir* parameter on the worker node. However, it doesn't work. I do

Is there a way to turn on the Spark eventLog on the worker node?

2014-11-24 Thread Xuelin Cao
Hi, I'm going to debug some Spark applications on our testing platform, and it would be helpful if we could see the eventLog on the worker node. I've tried to turn on spark.eventLog.enabled and set the spark.eventLog.dir parameter on the worker node. However, it doesn't work. I do ha
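
For what it's worth, the event log is written by the application (driver) side rather than by the worker daemon, so a sketch of setting it where the application is launched, with a placeholder app name and log directory:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("debug-app")                                // placeholder name
      .set("spark.eventLog.enabled", "true")
      .set("spark.eventLog.dir", "hdfs:///spark/event-logs")  // placeholder directory
    val sc = new SparkContext(conf)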

Is it possible to just change the value of items in an RDD without making a full copy?

2014-12-02 Thread Xuelin Cao
Hi, I'd like to make an operation on an RDD that ONLY changes the value of some items, without making a full copy or a full scan of the data. It is useful when I need to handle a large RDD where each time I only need to change a small fraction of the data and keep the other data unchanged.
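
Since RDDs are immutable, a sketch of the usual route is a map that rewrites only the matching items (the element types and the toChange set are hypothetical); no full copy is materialized until an action runs, although every element still passes through the function:

    // hypothetical: an RDD[(Long, Int)] where only a few keys need a new value
    val toChange = Set(1L, 42L)
    val updated = rdd.map { case (k, v) => if (toChange.contains(k)) (k, v + 1) else (k, v) }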

Spark SQL: How to get a hierarchical element with SQL?

2014-12-07 Thread Xuelin Cao
Hi, I'm generating a Spark SQL table from an offline JSON file. The difficulty is that the original JSON file has a hierarchical structure, and as a result, this is what I get: scala> tb.printSchema root |-- budget: double (nullable = true) |-- filterIp: array (nullable = true) |
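
A sketch of querying the nested fields, assuming the schema above and that "tb" is registered as a temp table; the array-index syntax follows HiveQL, so it may require going through a HiveContext:

    tb.registerTempTable("tb")

    // struct fields can be reached with dot notation; filterIp[0] picks the first array element
    sqlContext.sql("SELECT budget, filterIp[0] FROM tb")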

Is there an efficient way to append new data to a registered Spark SQL Table?

2014-12-08 Thread Xuelin Cao
Hi, I'm wondering whether there is an efficient way to continuously append new data to a registered Spark SQL table. This is what I want: I want to make an ad-hoc query service on top of a JSON-formatted system log. Of course, the system log is continuously generated. I will use spark
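
A sketch of one way this is commonly handled in Spark 1.x, with hypothetical file paths and table name; I'm not aware of an in-place append to a registered temp table, so the pattern is to union the new batch and re-register under the same name:

    // initial load (hypothetical paths)
    var logs = sqlContext.jsonFile("/logs/2014-12-08.json")
    logs.registerTempTable("syslog")

    // later, when a new chunk of the log arrives:
    val newBatch = sqlContext.jsonFile("/logs/2014-12-08-part2.json")
    logs = logs.unionAll(newBatch)
    logs.registerTempTable("syslog")   // re-register under the same name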