*Dear all,*
*I have no idea why an error is raised when I run the following code.*

from pyspark import SparkContext
from pyspark.sql import HiveContext

def getRow(data):
    return data.msg

first_sql = "select * from logs.event where dt = '20150120' and et = 'ppc' LIMIT 10"  # error
#first_sql = "select * from hivecrawler.vip_crawler where src='xx' and dt='" + timestamp + "'"  # correct

sc = SparkContext(appName="parse")
sqlContext = HiveContext(sc)
data = sqlContext.sql(first_sql)
file_target = "/tmp/test/logdd"
data.map(getRow).saveAsTextFile(file_target)
sc.stop()
print 'stop'
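(As a sanity check, the row-extraction logic itself can be exercised outside Spark; in the sketch below `Row` is just a stand-in namedtuple of my own, not a real table schema, used to mimic how `getRow` pulls the `msg` field from each row.)

```python
from collections import namedtuple

# Hypothetical stand-in for a Hive row with dt/et/msg columns
Row = namedtuple("Row", ["dt", "et", "msg"])

def getRow(data):
    # Extract only the msg column from each row
    return data.msg

rows = [Row("20150120", "ppc", "hello"), Row("20150120", "ppc", "world")]
print([getRow(r) for r in rows])  # ['hello', 'world']
```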

*I submit the code with the following command:*


/usr/local/spark-default/bin/spark-submit --master yarn-client \
  --executor-memory 8G --num-executors 20 --executor-cores 2 --py-files a.py



*It raises an error.*
*The Spark log shows:*

15/01/30 09:46:39 ERROR metastore.RetryingHMSHandler:
java.lang.OutOfMemoryError: GC overhead limit exceeded

*and the Python side shows:*

py4j.protocol.Py4JJavaError: An error occurred while calling o26.javaToPython.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:333)
        at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:819)
        at com.mysql.jdbc.ByteArrayRow.getString(ByteArrayRow.java:70)
        at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5811)
        at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5688)
        at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4985)
        at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
        at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
        at org.datanucleus.store.rdbms.query.ResultClassROF.getResultObject(ResultClassROF.java:666)
        at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:309)
        at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
        at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:403)
        at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:665)
        at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:429)
        at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
        at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
        at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
        at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1559)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
        at com.sun.proxy.$Proxy25.getPartitions(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2516)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)



*It looks like a memory problem, but if I switch to another Hive table to get the data, the code works fine.*
*Any idea which direction I should start with? Configuration?*

*Thanks.*

-- 
A photographer who can't finish a marathon is no good as a backpacker.
The next goal should be a 6,000-meter peak, right? Yeah.
