> Using I/O Size 64 Kbytes for data pages.
> With LRU Buffer Replacement Strategy for data pages.
>
> Total estimated I/O cost for statement 4 (at line 4): 2147483647.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
suggest. Now the question is, how should Spark decide when to do what?
>
>
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io
>
> On Thu, Mar 31, 2016 at 2:28 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com
So this looks like a bug!
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
Hi,
I have filed ticket SPARK-13900. There was an initial reply from a
developer, but I have not received any further response since. How can we do
multiple hash joins together for OR-condition-based joins? Could someone
please guide us on how to fix this?
Regards
Ashok
Any input on this? Does it have something to do with the SQL engine's parser or
optimizer? Please help.
Regards
Ashok
On Fri, Mar 11, 2016 at 3:22 PM, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
Hi All,
I have a large table with a few billion rows and a very small table with 4
dimension values. I would like to get the rows that match any of these
dimensions. For example,
Select field1, field2 from A, B where A.dimension1 = B.dimension1 OR
A.dimension2 = B.dimension2 OR A.dimension3
rewrite them.
Regards
Ashok
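A workaround often suggested for OR-based join conditions is to rewrite the query as a union of equi-joins, one per disjunct, so that each piece can run as an ordinary hash join, and then de-duplicate pairs that matched more than one dimension. Below is a minimal pure-Python sketch of that rewrite's semantics (not Spark API code; the table and column names follow the example above):

```python
def hash_join(left, right, key):
    """Equi-join two lists of dicts on a single key via a hash table."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # Prefix the right-side columns to avoid name collisions.
            out.append({**lrow, **{f"b_{k}": v for k, v in rrow.items()}})
    return out

def or_join(left, right, keys):
    """Union of per-key hash joins, deduplicated.

    Semantically equivalent to a join on `k1 = k1 OR k2 = k2 OR ...`:
    each (left, right) pair appears once even if several keys match.
    """
    seen, out = set(), []
    for key in keys:
        for row in hash_join(left, right, key):
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                out.append(row)
    return out

A = [{"dimension1": 1, "dimension2": 9}, {"dimension1": 5, "dimension2": 2}]
B = [{"dimension1": 1, "dimension2": 2}]
rows = or_join(A, B, ["dimension1", "dimension2"])
```

In SQL terms the equivalent rewrite is one `SELECT ... WHERE A.dimension1 = B.dimension1` per dimension combined with `UNION` (not `UNION ALL`), so that rows matching several dimensions are not duplicated.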
On Fri, Mar 4, 2016 at 3:52 PM, ayan guha wrote:
> Most likely you are missing an import of org.apache.spark.sql.functions.
>
> In any case, you can write your own floor function and use it as a UDF.
>
> On Fri, Mar 4, 2016 at 7:34 PM, ashok
Hi,
I load a json file that has a timestamp (as a long, in milliseconds) and
several other attributes. I would like to group the records into 5-minute
buckets and store each bucket as a separate file.
I am facing a couple of problems here:
1. Using the floor function in the select clause (to bucket by 5 mins) gives me
an error saying "java.
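As a sanity check of the bucketing arithmetic itself (independent of Spark), here is a small pure-Python sketch that floors millisecond timestamps to their 5-minute boundary and groups records by bucket. In Spark SQL the same expression would be roughly `floor(ts / 300000) * 300000` using the built-in floor, per the reply above; the record values here are illustrative:

```python
from collections import defaultdict

BUCKET_MS = 5 * 60 * 1000  # 5 minutes in milliseconds

def bucket_of(ts_ms: int) -> int:
    """Floor a millisecond epoch timestamp to the start of its 5-minute bucket."""
    return (ts_ms // BUCKET_MS) * BUCKET_MS

def group_by_bucket(records):
    """records: iterable of (timestamp_ms, payload) pairs.

    Returns a dict mapping bucket start time -> list of payloads; each
    bucket's list could then be written out as a separate file.
    """
    groups = defaultdict(list)
    for ts_ms, payload in records:
        groups[bucket_of(ts_ms)].append(payload)
    return dict(groups)

events = [(1457082000000, "a"), (1457082120000, "b"), (1457082300000, "c")]
groups = group_by_bucket(events)
```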
…the right encoding to use for you.
>
> Xinh
>
> > On Mar 3, 2016, at 7:32 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
developers.
Regards
Ashok
On Fri, Mar 4, 2016 at 11:01 AM, Ted Yu wrote:
> Have you taken a look at https://parquet.apache.org/community/ ?
>
> On Thu, Mar 3, 2016 at 7:32 PM, ashokkumar rajendran <
> ashokkumar.rajend...@gmail.com> wrote:
>
Hi,
I am exploring using Apache Parquet with Spark SQL in our project. I notice
that Apache Parquet uses different encodings for different columns. The
dictionary encoding in Parquet would be one of the good ones for our
performance. I do not see much documentation in Spark or Parquet on how to
con
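For reference, Parquet's dictionary encoding is enabled by default, and the writer is tuned through Hadoop configuration properties that Spark forwards with a `spark.hadoop.` prefix. A sketch of the relevant options (property names from Parquet's documentation; the page-size value shown is only illustrative):

```properties
# Pass to Spark as e.g. --conf spark.hadoop.parquet.enable.dictionary=true
parquet.enable.dictionary=true        # dictionary encoding (on by default)
parquet.dictionary.page.size=1048576  # per-column dictionary size cap, bytes
```

When a column's dictionary outgrows the page-size cap, the writer falls back to plain encoding for that column, so the cap effectively bounds how much cardinality still benefits from dictionary encoding.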
Hi Folks,
I am exploring Spark for streaming from two sources, (a) Kinesis and (b) HDFS,
for some of our use cases. Since we maintain state gathered over the last
x hours in Spark Streaming, we would like to replay the data from the last x
hours as batches during deployment. I have gone through the Spark