+1.
This is a necessary requirement for users.
Suggestion:
change CarbonSDKUID to a common name.
--
Sent from:
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
+1,
If the CarbonData SDK can support loading data from Parquet, ORC, CSV, Avro, and
JSON files, it will be more convenient for users to use CarbonData. It avoids
every user having to parse different file formats and convert them to the
CarbonData format by coding.
The CarbonData SDK can refer to the spark-sql implementation, but
-1!
Why isn't PyCarbon listed among the key features and improvements?
PyCarbon: provide a Python interface for users to use CarbonData from Python
code
https://issues.apache.org/jira/browse/CARBONDATA-3254
Including:
1. PySDK: provide a Python interface to read and write CarbonData
2. Integrating deep learning
the number of S3 API calls. But it's not easy for them to use
carbon from Java/Scala/C++, so it's better to provide a Python interface for
these users to use CarbonData from Python code.
We have already worked on these features for several months in
https://github.com/xubo245/pycarbon
*Goals:
1. Apache CarbonData
not yet
-- Original Message --
From: "melin li";
Sent: Monday, July 22, 2019, 00:12
To: "dev";
Subject: sql parser use antlr4?
sql parser use antlr4?
There are some problems when users handle AI data. For example, it's very slow
when users upload or download lots of images from S3. It takes about 10 hours
for a user to upload 10 million images (40 GB) to S3 using 1 thread. AI
developers also want to manage structured data and unstructured data for
CarbonData supports binary data type
Version | Changes                                  | Owner | Date
0.1     | Init doc for supporting binary data type | Xubo  | 2019-04-10
Background:
Binary is a basic data type and is widely used in various scenarios, so it's
better to support the binary data type in CarbonData. Download data from
Dear all,
Hive is popular data warehouse software in the big data domain. It would be
better to enhance Hive + CarbonData, which will make it convenient for Hive
users to read CarbonData. CarbonData supported Hive before, and the Hive test
cases can run on CarbonData-1.5.2, but the hive module is very old and not
It's OK; we can use // scalastyle:off println
Optimize the properties documentation or comments:
Some properties have no documentation or comments, which makes them hard for
users to understand.
We should add documentation or comments for these properties.
Unify documentation:
Some properties have no documentation or comments in code, such as
Enable non-dynamic configurations to be configured dynamically:
Only 29 properties can currently be configured dynamically; many properties
can't be. We should analyze the related properties to determine which ones can
be configured dynamically, and then support it.
It will be more
+1, it would be better if we can unify "carbon" and "carbondata";
SparkCarbonFileFormat uses carbon while SparkCarbonTableFormat uses carbondata.
The SDK should support both transactional and non-transactional tables.
DataFrame should also support different types of carbon data.
+1
Carbon already supports RENAME TABLE; it would be better if carbon could also
support renaming a column and changing its data type.
Can we support something like this?
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name
[column_type];
column_type is optional; the default is to keep the same data type as the old
column.
Why are there two mails?
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Complex-Delimiter-support-as-per-Hive-format-td69879.html
+1 for supporting transactional tables in the SDK.
The SDK should be able to read transactional tables written by CarbonSession.
Does a different data type affect performance? Have you tested with a long
string column?
'\001' and '\002' are invisible characters; a string usually won't contain
them. But sometimes a string will contain ¥, #, and other visible characters.
The CSDK also uses '\001' and '\002' for Array; I think it's better and more
common across different scenarios.
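To make the delimiter convention concrete, here is a small self-contained sketch (plain Java, no CarbonData dependency; the class and method names are invented for illustration) of how array fields could be encoded with the Hive-style '\001' and '\002' delimiters:

```java
// Hive-style complex-type delimiters: '\001' separates level-1 (array)
// elements and '\002' separates level-2 (nested) elements.
class ComplexDelimiters {
    static final String LEVEL1 = "\001";
    static final String LEVEL2 = "\002";

    // Encode a one-level array field for a CSV-style load: a\001b\001c
    static String encodeArray(String... elements) {
        return String.join(LEVEL1, elements);
    }

    // Encode an array of arrays: inner elements are joined by '\002',
    // and the inner arrays themselves are joined by '\001'.
    static String encodeNestedArray(String[][] arrays) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < arrays.length; i++) {
            if (i > 0) sb.append(LEVEL1);
            sb.append(String.join(LEVEL2, arrays[i]));
        }
        return sb.toString();
    }
}
```

Because both delimiters are non-printable, they are unlikely to collide with user data, which is the argument made above.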
Thanks all. I am very glad that the Apache CarbonData PMC invited me to be a
committer.
I will continue to work hard to contribute to the Apache CarbonData
community.
Thank you!
Best wishes!
Xubo
When users use the SDK and want to use the local dictionary, they can't use
LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE because the SDK only
supports local_dictionary_threshold and local_dictionary_enable.
So we should support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE
in the SDK, then use
Nice
Please update CarbonData-1.5.1 on http://carbondata.apache.org
This bug can be fixed in the next version (1.5.2).
Are there any limitations to supporting Spark-2.4?
Hi, all
Spark released Spark-2.4 more than one month ago. CarbonData should start to
support Spark-2.4.
I want to develop this, and I raised a JIRA for it:
https://issues.apache.org/jira/browse/CARBONDATA-3144
Is it OK?
@jackylk @ravipesala @KanakaKumar @kunal642
https://github.com/apache/carbondata/pull/2940 addresses a bug; please check it.
Not mandatory.
CarbonData supports the local file system, HDFS, and S3 (Huawei OBS).
There are some examples, for example:
./carbondata/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala
+1,
some users require this feature; we can implement it.
Is this a repeat?
The list of topics is blank.
Is this a repeat?
It's great.
Is there a live link for this meetup? Can you share the docs/slides/videos
after the meetup?
I moved readAllParallel to
https://github.com/xubo245/carbondata/tree/CARBONDATA-3094_cocurrentReadBackupreadAllParallel
and did not include it in this PR; after discussion, I will raise a new PR for
it.
Hi, does anyone have a good suggestion for this? I want to improve its
performance.
Hi, all
The SDK/CSDK doesn't support reading the schema from S3, which limits SDK/CSDK
users. For example, some users save data in S3 and want to read the schema from
that data with the SDK/CSDK, but it throws an exception.
So we should support reading the schema from S3.
Thank you.
I raised a PR for it and wrote a demo:
https://github.com/apache/carbondata/pull/2914
Please check it.
If the demo is OK, I will change the other properties in this PR.
Hi, does anyone have a good suggestion for this? If not, we will start
integrating googletest for the CSDK.
+1 for 1, 2, 3, 4, 5, 6, 8, 9, 10.
For 7, can we run some tests automatically, including performance tests for
common functions?
It's great for the CarbonData community.
Can we arrange someone or a manager to manage all JIRAs and PRs, and urge
reviewers to review quickly? The time will become slower after adding these
Maybe some developers have some unfinished low-priority bugs/issues; it would
be nice if you shared them with new users.
In China, it's not convenient to access networks outside the country, so I
downloaded some videos from YouTube and uploaded them to Tencent Video in
China, which can be accessed by Chinese users. If there are materials in other
countries, like India, please tell me or send them to me, thank you very much.
rent language), slides and others.
I collected some CarbonData learning materials in
https://github.com/xubo245/CarbonDataLearning/blob/master/docs/learningMaterials/CarbonData%20Learning%20Materials.md
If you find other related materials, please tell me and add them in the
comments.
Hi, all
Recently, we had an Apache CarbonData & Spark meetup in Shenzhen. There are
many new users who want to learn CarbonData or Spark, and we can guide new
users to contribute code to CarbonData. So we can collect minor bugs or small
requirements/features for them; then they can learn and fix
+1. Will it affect the SDK/CSDK reader after parallelizing block pruning?
Please check. The SDK and CSDK need to keep the carbon files in
sequence/order.
-1,
the mv module's parent version is incorrect; Jonathan.Wei will raise a PR to
fix it this week.
The SDK has supported the local dictionary:
org.apache.carbondata.sdk.file.CarbonWriterBuilder#localDictionaryThreshold
org.apache.carbondata.sdk.file.CarbonWriterBuilder#enableLocalDictionary
But it doesn't support LOCAL_DICTIONARY_INCLUDE and LOCAL_DICTIONARY_EXCLUDE.
I think we should support them. There
+1,
There have been some random errors in CI recently, and performance shows only a
small improvement between search mode and non-search mode, including filter
queries.
OK, this is better:
public static final Property CARBON_BAD_RECORDS_ACTION =
    Property.buildStringProperty()
        .name("carbon.bad.records.action")
        .defaultValue("FAIL")
        .doc("keep the same description as the .md file")
        .dynamic(true)
        .build();
I will raise
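As a self-contained sketch of what such a builder could look like (illustrative only, not the actual CarbonData API; `default` appears as `defaultValue` because `default` is a reserved word in Java):

```java
// Illustrative property builder following the proposal above.
final class Property {
    private final String name;
    private final String defaultValue;
    private final String doc;
    private final boolean dynamic;  // can it be changed at runtime?

    private Property(Builder b) {
        this.name = b.name;
        this.defaultValue = b.defaultValue;
        this.doc = b.doc;
        this.dynamic = b.dynamic;
    }

    static Builder buildStringProperty() { return new Builder(); }

    String name() { return name; }
    String defaultValue() { return defaultValue; }
    String doc() { return doc; }
    boolean isDynamic() { return dynamic; }

    static final class Builder {
        private String name;
        private String defaultValue;
        private String doc = "";
        private boolean dynamic = false;

        Builder name(String name) { this.name = name; return this; }
        Builder defaultValue(String v) { this.defaultValue = v; return this; }
        Builder doc(String doc) { this.doc = doc; return this; }
        Builder dynamic(boolean dynamic) { this.dynamic = dynamic; return this; }
        Property build() { return new Property(this); }
    }
}
```

With this shape, the `.dynamic(true)` flag makes it possible to validate dynamic SET commands against the property definition itself instead of against a separate list.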
Some users want to build one reader and then read with different filters many
times.
But the Carbon SDK only supports adding a filter before build and then reading
with that filter; users can't change the filter after build.
Do we need to implement a reader that can read with different filters many
times?
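A real implementation would need the CarbonData SDK, but the requested design (build once, supply the filter per read) can be sketched in plain Java, with a `List<T>` standing in for the opened carbon files (all names here are invented for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: a reader built once, then queried many times with a
// different filter on each read, instead of fixing the filter at build time.
final class ReusableReader<T> {
    private final List<T> rows;  // stands in for the opened carbon files

    ReusableReader(List<T> rows) { this.rows = rows; }

    // The filter is a per-read argument, so one built reader serves many queries.
    List<T> read(Predicate<T> filter) {
        return rows.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReusableReader<Integer> reader =
            new ReusableReader<>(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(reader.read(x -> x > 3));
        System.out.println(reader.read(x -> x % 2 == 0));  // same reader, new filter
    }
}
```

The key point is that the expensive build step (opening files, reading footers) happens once, while the cheap filter step moves to read time.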
For the C++ SDK of CarbonData, we need to add a test framework to manage test
cases, including unit tests.
So which test framework should we add for the C++ SDK? We should discuss it
here.
I did some research before and found that googletest is a popular test
framework; we can try to use it. Are there any other better test frameworks?
The annotation mainly provides a literal explanation of whether this parameter
can be configured dynamically. What's more, it will throw an exception if
@CarbonProperty is added but the parameter can't be configured dynamically. If
@CarbonProperty is not added for a parameter, it won't throw an exception but
also won't take effect.
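For illustration, the annotation described above might be declared like this (the name follows the thread; the `dynamicConfigurable` attribute and retention details are assumptions, not the actual CarbonData source):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of a marker annotation for CarbonData properties. The
// dynamicConfigurable flag records whether the property may be changed at
// runtime, so a SET command can be validated against the declaration.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface CarbonProperty {
    boolean dynamicConfigurable() default false;
}
```

Keeping the flag on the annotation puts the "is this dynamic?" answer next to the constant it describes, which is what makes the runtime check described above possible.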
*Background:*
In CarbonData, there are many configurations in:
org.apache.carbondata.core.constants.CarbonCommonConstants,
org.apache.carbondata.core.constants.CarbonV3DataFormatConstants,
org.apache.carbondata.core.constants.CarbonLoadOptionConstants;
and so on. Which ones can be dynamically
test name
-- Original Message --
From: "xubo245" <601450...@qq.com>
Sent: Tuesday, Oct 30, 2018, 8:15 PM
To: "dev"
Subject: Re: [Discussion] CarbonReader performance improvement
1. There are some users who want to use a filter and have