How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq
You can also confirm the above by running the pmap utility against your
process; most of the virtual memory will show up under 'anon'.
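For example (illustrative; replace <pid> with your executor JVM's process
id):

  pmap -x <pid> | grep anon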
On Fri, 13 May 2016 09:11 jone, wrote:
> The virtual memory is 9G when I run org.apache.spark.example
You should be able to cast the object to its real underlying type:
GenericRecord (if generic, which is the default) or the actual generated
class (if specific). The underlying implementation of KafkaAvroDecoder
appears to use one or the other depending on a config switch:
https://github.com/co
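For illustration, a minimal sketch of the cast (names are hypothetical;
specific.avro.reader is the Confluent config switch referred to above):

  import org.apache.avro.generic.GenericRecord

  // Sketch: `decoded` stands for the Object returned by
  // KafkaAvroDecoder.fromBytes for a message. With the default (generic)
  // reader it can be cast to GenericRecord; with specific.avro.reader=true,
  // cast to your generated SpecificRecord class instead.
  def readField(decoded: Object): AnyRef = {
    val record = decoded.asInstanceOf[GenericRecord]
    record.get("someField") // "someField" is a hypothetical field name
  }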
General note: /root is a protected local directory, so if your program runs
as a non-root user it will never be able to access files under it.
On Sat, Dec 12, 2015 at 12:21 AM Zhan Zhang wrote:
> As Sean mentioned, you cannot refer to a local file on your remote
> machine (executors).
Do you have all your Hive jars listed in classpath.txt / the
SPARK_DIST_CLASSPATH environment variable, specifically the hive-exec jar?
Is that jar also at the same location on all of the distributed hosts?
Passing an explicit executor classpath string may also help overcome this
(replace HIVE_BASE_DIR with the root of your Hive installation).
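For instance, a sketch in code (the jar path is a placeholder; the same
value can also be passed with --conf at submit time):

  import org.apache.spark.SparkConf

  // Make the hive-exec jar visible on every executor's classpath; replace
  // the placeholder path with the jar's real, host-consistent location.
  val conf = new SparkConf()
    .set("spark.executor.extraClassPath", "/opt/hive/lib/hive-exec.jar")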
The option spark.sql.hive.metastorePartitionPruning=true will not help
unless you have a partition-column predicate in your query, and your query
of "select * from temp.log" has none. The slowdown appears to be due to the
need to load all of the partition metadata.
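For instance (illustrative; this assumes temp.log is partitioned by a
hypothetical column dt):

  // With a predicate on the partition column, the metastore can prune
  // partitions instead of returning metadata for all of them.
  val pruned = sqlContext.sql("SELECT * FROM temp.log WHERE dt = '2015-12-01'")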
Have you also tried to see
You could take a look at Livy also:
https://github.com/cloudera/livy#welcome-to-livy-the-rest-spark-server
On Fri, Dec 11, 2015 at 8:17 AM Andrew Or wrote:
> Hello,
>
> The hidden API was implemented for use internally and there are no plans
> to make it public at this point. It was originally i
Are you sure you do not have any messages preceding the trace, such as one
quoting which class is found to be missing? That would be helpful to see and
would suggest what (exactly) may be going wrong. It appears similar to
https://issues.apache.org/jira/browse/SPARK-8368, but I cannot tell for
certain because I
Are you certain you are providing Spark with the right Hive configuration?
Is there a valid HIVE_CONF_DIR defined in your spark-env.sh, with a
hive-site.xml detailing the location/etc. of the metastore service and/or
DB?
Without a valid metastore config, Hive may switch to using a local
(embedded) metastore instead.
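For reference, an illustrative hive-site.xml fragment pointing at a remote
metastore service (the host is a placeholder; 9083 is the conventional
metastore port):

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>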
I couldn't spot it anywhere on the web, so it doesn't appear to have been
contributed yet, but note that the HDFS APIs are already available per
https://issues.apache.org/jira/browse/HDFS-6634 (you can see the test case
for an implementation guideline in Java:
https://github.com/apache/hadoop/blob/trunk/hado
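A rough Scala sketch of calling that API (the path and length are
illustrative; requires Hadoop 2.7+ client jars):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  val fs = FileSystem.get(new Configuration())
  // truncate() returns true if the file was truncated immediately; false
  // means the last block is still under recovery and not yet writable.
  val done = fs.truncate(new Path("/tmp/data.log"), 1024L)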
> and then calling getRowID() in the lambda, because the function gets sent
> to the executor, right?
Yes, that is correct (vs. a one-time evaluation, as with your earlier
assignment).
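A small sketch of the difference (getRowID here is a hypothetical stand-in
for your function):

  def getRowID(): Long = scala.util.Random.nextLong() // stand-in implementation

  val once = getRowID() // evaluated a single time, on the driver
  val sameEverywhere = sc.parallelize(1 to 3).map(x => (x, once))
  val perElement = sc.parallelize(1 to 3).map(x => (x, getRowID())) // runs on executors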
On Thu, Dec 10, 2015 at 3:34 AM Pinela wrote:
> Hey Bryan,
>
> Thanks for the answer ;) I knew it was a basic
While the DataFrame lookups can identify that anonymous column name, Spark
SQL does not appear to do so. You should use an alias instead:
import org.apache.spark.sql.functions.sum
import sqlContext.implicits._ // for rdd.toDF
val rows = Seq(("X", 1), ("Y", 5), ("Z", 4))
val rdd = sc.parallelize(rows)
val dataFrame = rdd.toDF("user", "clicks")
// The original snippet cut off at agg(; completed with an illustrative aliased sum:
val sumDf = dataFrame.groupBy("user").agg(sum("clicks").as("sumClicks"))
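With the alias in place, Spark SQL can then reference the column by name
(the temp table name is illustrative):

  sumDf.registerTempTable("sums")
  sqlContext.sql("SELECT user, sumClicks FROM sums").show()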