FileNotFoundExceptions while running CarbonData

2017-07-16 Thread Swapnil Shinde
Hello, I am new to carbon data and we are trying to use carbon data in production. I built and installed it on Spark edge nodes as per the given instructions - *Build -* No major issues. *Installation -* Followed yarn installation (http://carbondata.apache.org/installation-guide.html) instructions

Re: FileNotFoundExceptions while running CarbonData

2017-07-17 Thread Swapnil Shinde
> path. And also make sure that storelocation inside carbon properties and store location while creating carbon session must be same.
>
> Regards,
> Ravindra.
>
> On 17 July 2017 at 11:25, Swapnil Shinde wrote:
>
> > Hello
> > I am new to carbon data

carbon data performance doubts

2017-07-19 Thread Swapnil Shinde
Hello All, I am trying carbon data for the first time and have a few questions on improving performance - 1. What is the use of the *carbon.number.of.cores* property and how is it different from Spark's executor cores? 2. Documentation says, by default, all non-numeric columns (except complex type

Re: carbon data performance doubts

2017-07-19 Thread Swapnil Shinde
Thank you, Manish. Is dictionary exclude supported for datatypes other than String? https://github.com/apache/carbondata/blob/6488bc018a2ec715b31407d12290680d388a43b3/integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala#L706 - Swapnil On Wed, Jul 19, 2017

Vectorized reader exceptions

2017-07-20 Thread Swapnil Shinde
Hi All, I am not sure if this is a random exception, but this is what I have observed - Create and load carbondata table from Spark dataframe - Without dictionary_include on two INT columns - Works fine. I can "select *" on it. Create and load carbondata table from the same Spark dataframe - With dicti

Re: carbon data performance doubts

2017-07-20 Thread Swapnil Shinde
Ok. Just curious - Any reason not to support numeric columns with dictionary_exclude? Wouldn't it be useful for a unique numeric column that should be a dimension but should not get a dictionary (as it may not be beneficial)? Thanks Swapnil On Thu, Jul 20, 2017 at 4:20 AM, manishgupta88 wrote: > N

Re: carbon data performance doubts

2017-07-21 Thread Swapnil Shinde
> ;), on apache/encoding_override branch. Once it is done and stable it will be merged into master.
>
> Please advise if you have any suggestions.
>
> Regards,
> Jacky
>
> > On July 21, 2017, at 12:12 AM, Swapnil Shinde wrote:
> >
> > Ok. Just curious - Any reason

Carbon data vs parquet performance

2017-07-22 Thread Swapnil Shinde
Hello, I am not sure what I am doing wrong, but I am observing parquet running faster than carbon data - *Carbondata version -* 1.1.0 *Data cardinality -* lineorder - 6B rows & date - 39,000 rows *Query -* select sum(loExtendedprice*loDiscount) as revenue from lineorder, date where loOrderda

Re: carbon data performance doubts

2017-07-22 Thread Swapnil Shinde
Thank you, Liang. I couldn't find this property "sort_columns" in the documentation. It would be good to have it there. - Swapnil
On Fri, Jul 21, 2017 at 9:31 PM, Liang Chen wrote:
>
> Hi
>
> Some more info:
> In release 1.1.1, there was a good improvement "measure filter optimization", system w

[POSSIBLE BUG] Carbondata 1.1.1 inaccurate results

2017-08-23 Thread Swapnil Shinde
compared to parquet. However, when you run the above join query, carbondata generates a very small subset of the expected rows. If we run a filter query for any specific key, that also returns no results. Not sure why v1.1.1 is producing incorrect results. My guess is that carbondata is skipping rows that it shouldn't in v1.1.1. Any help and suggestions are very much appreciated! Thanks in advance. Thanks Swapnil Shinde