Hi,
I have a Spark job whose output DataFrame contains a column named Id, which
is a GUID string.
We will use Id to filter data in another Spark application, so it should be
a partition key.
I found these two methods on the Internet:
1.
DataFrame.write.save("Id") method will help, but the possible v
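For later readers, a minimal sketch of writing the output partitioned by a column, assuming Spark 1.4+ (the DataFrame name df and the output path are placeholders; note that partitioning by a GUID creates one directory per distinct value, which can be a very large number):
// Sketch only: write the output DataFrame with one folder per Id value.
df.write.partitionBy("Id").parquet("hdfs:///data/output")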
… Cheng Lian :
> Is there any chance that " spark.sql.hive.convertMetastoreParquet" is
> turned off?
>
> Cheng
>
> On 11/4/15 5:15 PM, Rex Xiong wrote:
>
> Thanks Cheng Lian.
> I found that in 1.5, if I use Spark to create this table with partition
> discovery, the partition pr
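For reference, the setting Cheng asks about can be checked and changed from spark-shell (a minimal sketch; the second argument to getConf is just the fallback value):
// Sketch: read the current value, then set it explicitly.
sqlContext.getConf("spark.sql.hive.convertMetastoreParquet", "true")
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")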
plan optimization is different.
2015-11-03 23:10 GMT+08:00 Cheng Lian :
> SPARK-11153 should be irrelevant because you are filtering on a partition
> key while SPARK-11153 is about Parquet filter push-down and doesn't affect
> partition pruning.
>
> Cheng
>
>
> On 11/
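To make the distinction concrete, here is a small sketch (the path, column names, and the day=... folder layout are assumptions, not from this thread): a filter on a partition column is answered by partition pruning, while a filter on an ordinary data column is what Parquet push-down (and SPARK-11153) covers.
// Sketch: `day` is a partition column discovered from day=... folders, `value` is a data column.
val events = sqlContext.read.parquet("hdfs:///data/events")
events.filter("day = '2015-11-03'").count()   // partition pruning: only the matching folder is scanned
events.filter("value > 100").count()          // relies on Parquet filter push-down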
On Oct 31, 2015 at 7:38 PM, "Rex Xiong" wrote:
> Add back this thread to email list, forgot to reply all.
> On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" wrote:
>
>> Not that I know of.
>>
>> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong wrote:
>>
>>> Good to kn
Add back this thread to email list, forgot to reply all.
On Oct 31, 2015 at 7:23 PM, "Michael Armbrust" wrote:
> Not that I know of.
>
> On Sat, Oct 31, 2015 at 12:22 PM, Rex Xiong wrote:
>
>> Good to know that, will have a try.
>> So there is no easy way to achieve it in p
Hi folks,
I have a Hive external table with partitions.
Every day, an app will generate a new partition day=yyyy-MM-dd stored as
Parquet and run a Hive add-partition command.
In some cases, we will add an additional column to new partitions and update
the Hive table schema, then a query across new and old
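For illustration, a rough sketch of the daily step described above (hiveContext, the table name, and the paths are placeholders, not the actual job):
// Sketch: write the new day's Parquet data, then register the folder as a partition.
df.write.parquet("hdfs:///data/my_table/day=2015-11-04")
hiveContext.sql(
  "ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (day='2015-11-04') " +
  "LOCATION 'hdfs:///data/my_table/day=2015-11-04'")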
I finally resolved this issue by adding --conf spark.executor.extraClassPath=snakeyaml-1.10.jar
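For anyone hitting the same thing, the fix combines with --jars roughly like this (a sketch, not the exact command from this thread; the application jar path is a placeholder):
spark-submit --master yarn-cluster --jars snakeyaml-1.10.jar --conf spark.executor.extraClassPath=snakeyaml-1.10.jar hdfs:///path/to/app.jar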
2015-10-16 22:57 GMT+08:00 Rex Xiong :
> Hi folks,
>
> In my Spark application, the executor tasks depend on snakeyaml-1.10.jar.
> I built it with Maven and it works fine:
> spark-submit
Hi folks,
In my Spark application, the executor tasks depend on snakeyaml-1.10.jar.
I built it with Maven and it works fine:
spark-submit --master local --jars d:\snakeyaml-1.10.jar ...
But when I try to run it on YARN, I have an issue; it seems the Spark executor
cannot find the jar file:
spark-subm
I use "spark-submit --master yarn-cluster hdfs://.../a.jar .." to submit
my app to YARN.
Then I updated a.jar in HDFS and ran the command again, and I found that a log
line which had been removed still shows up in the "yarn logs" output.
Is there a cache mechanism I need to disable?
Thanks
In YARN client mode, the Spark driver URL will be redirected to the YARN web proxy
server, but I don't want to use this dynamic name; is it possible to still
use host:port as in standalone mode?
Hi,
I try to use it for one table created in Spark, but it seems the results are
all empty. I want to get metadata for the table; what are the other options?
Thanks
+-----------+
|     result|
+-----------+
| # col_name|
|
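In case it helps later readers, two other ways to pull table metadata from spark-shell (a sketch; the table name is a placeholder):
// Sketch: print the schema from the catalog, or run DESCRIBE through SQL.
sqlContext.table("my_table").printSchema()
sqlContext.sql("DESCRIBE EXTENDED my_table").collect().foreach(println)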
I remember that in a previous PR, schema merging could be disabled by
setting spark.sql.hive.convertMetastoreParquet.mergeSchema to false.
But in the 1.4 release I don't see this config anymore; is there a new way to
do it?
Thanks
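I'm not certain about 1.4, but in 1.5 the knobs look roughly like this (a sketch; the path is a placeholder): merging is off by default and can be enabled globally or per read.
// Sketch: global default via SQL conf, plus a per-read override through the Parquet option.
sqlContext.setConf("spark.sql.parquet.mergeSchema", "false")
val merged = sqlContext.read.option("mergeSchema", "true").parquet("hdfs:///data/some_table")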
Hi,
We have a 3-node master setup with ZooKeeper HA.
The driver can find the master with spark://xxx:xxx,xxx:xxx,xxx:xxx,
but how can I find out the active Master's UI without looping through all 3
nodes?
Thanks
sses will hit
> the metadata cache.
>
> Thanks,
>
> Yin
>
> On Tue, Apr 21, 2015 at 1:13 AM, Rex Xiong wrote:
>
>> We have a similar issue with massive Parquet files. Cheng Lian, could
>> you have a look?
>>
>> 2015-04-08 15:47 GMT+08:00 Zheng, Xu
We have a similar issue with massive Parquet files. Cheng Lian, could you
have a look?
2015-04-08 15:47 GMT+08:00 Zheng, Xudong :
> Hi Cheng,
>
> I tried both of these patches, and they still don't seem to resolve my issue. I
> found that most of the time is spent on this line in newParquet.scala:
>
> ParquetF
Hi Spark Users,
I'm testing the new Parquet partition discovery feature in 1.3.
I have 2 subfolders, each with 800 rows.
/data/table1/key=1
/data/table1/key=2
In spark-shell, run this command:
val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1",
"parquet")
t.count
It shows
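As a side check from the same shell, a filter on the discovered partition column should prune down to a single folder (a sketch, not part of the original test):
// Sketch: `key` is the partition column discovered from the folder names.
t.filter("key = 1").count()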
Hi,
I got this error when creating a Hive table from a Parquet file:
DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.UnsupportedOperationException: Parquet does not support
timestamp. See HIVE-6384
I checked HIVE-6384; it's fixed in 0.14.
The Hive in the Spark build is a customized v
Hi there,
I have an app talking to Spark Hive Server using Hive ODBC, and querying is OK.
But through this interface I can't get many running details when my query goes
wrong; only one error message is shown.
I want to get the job id for my query, so that I can go to the Application Detail
UI to see what's going o