Cool, thanks for the info.
> I think this is something we are going to change to completely decouple the
> Hive support and catalog.
Is there a ticket for this? I searched JIRA and only found
"SPARK-16275: Implement all the Hive fallback functions", which seems to be
related to it.
On Mon
Reynold mentioned the direction we are heading in. Many of the PRs the
community has submitted work toward this goal. To achieve it, we still have
a lot of work to do.
For example, for some SerDes, the Hive metastore will infer the schema when the
schema is not provided, but our InMemoryCatalog does not have
Hi Stan,
What OS/version are you using?
Michael
> On Jan 22, 2017, at 11:36 PM, StanZhai wrote:
>
> I'm using Parallel GC.
> rxin wrote
>> Are you using G1 GC? G1 sometimes uses a lot more memory than the size
>> allocated.
>>
>>
>> On Sun, Jan 22, 2017 at 12:58 AM StanZhai <mail@
Hi Spark dev,
Here is the voting thread for parquet 1.8.2 release.
Cheng, or someone else, we would appreciate it if you could verify it as well
and reply to the thread.
On Mon, Jan 23, 2017 at 11:40 AM, Julien Le Dem wrote:
> +1
> Followed: https://cwiki.apache.org/confluence/display/PARQUET/How+To+Verify
Sorry for being late, I'm building a Spark branch based on the most
recent master to test out 1.8.2-rc1, will post my result here ASAP.
Cheng
On 1/23/17 11:43 AM, Julien Le Dem wrote:
Hi Spark dev,
Here is the voting thread for parquet 1.8.2 release.
Cheng, or someone else, we would appreciate
Thank you Cheng!
On Mon, Jan 23, 2017 at 12:02 PM, Cheng Lian wrote:
> Sorry for being late, I'm building a Spark branch based on the most recent
> master to test out 1.8.2-rc1, will post my result here ASAP.
>
> Cheng
>
> On 1/23/17 11:43 AM, Julien Le Dem wrote:
>
> Hi Spark dev,
> Here is the
Hi Seth,
The proposal is geared towards exactly the issue you're describing:
providing more visibility into the capacity and intentions of committers.
If there are things you'd add to it or change to improve further, it would
be great to hear ideas! The past roadmap JIRA has some more background
This thread is split off from the "Feedback on MLlib roadmap process
proposal" thread for discussing the high-level mission and goals for
MLlib. I hope this thread will collect feedback and ideas, not necessarily
lead to huge decisions.
Copying from the previous thread:
*Seth:*
"""
I would love
Along the lines of #1: Spark Packages seemed to get off to a good start
about two years ago, but now no more than a handful are in general
use, e.g. Databricks CSV.
When the available packages are browsed the majority are incomplete, empty,
unmaintained, or unclear.
Any ideas on how to