Dear all,

I don't understand why the following code raises an error when I run it:
from pyspark import SparkContext
from pyspark.sql import HiveContext

def getRow(data):
    return data.msg

first_sql = "select * from logs.event where dt = '20150120' and et = 'ppc' LIMIT 10"  # raises the error
# first_sql = "select * from hivecrawler.vip_crawler where src='xx' and dt='" + timestamp + "'"  # works correctly

sc = SparkContext(appName="parse")
sqlContext = HiveContext(sc)
data = sqlContext.sql(first_sql)
file_target = "/tmp/test/logdd"
data.map(getRow).saveAsTextFile(file_target)
sc.stop()
print 'stop'

I submit the code with the following command:

/usr/local/spark-default/bin/spark-submit --master yarn-client --executor-memory 8G --num-executors 20 --executor-cores 2 --py-files a.py

It raises an error. The Spark log shows:

15/01/30 09:46:39 ERROR metastore.RetryingHMSHandler: java.lang.OutOfMemoryError: GC overhead limit exceeded

and the Python side shows:

py4j.protocol.Py4JJavaError: An error occurred while calling o26.javaToPython.
: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:333)
    at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:819)
    at com.mysql.jdbc.ByteArrayRow.getString(ByteArrayRow.java:70)
    at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5811)
    at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5688)
    at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4985)
    at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
    at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
    at org.datanucleus.store.rdbms.query.ResultClassROF.getResultObject(ResultClassROF.java:666)
    at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:309)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
    at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:403)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:665)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:429)
    at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
    at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
    at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
    at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1559)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
    at com.sun.proxy.$Proxy25.getPartitions(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2516)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)

It looks like a memory problem.
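Since the frames in the trace (RetryingHMSHandler, DataNucleus, the MySQL JDBC driver) come from the Hive metastore client, which I believe runs inside the driver JVM in yarn-client mode, one direction I am considering is simply giving the driver more heap. This is only a guess; a hypothetical variant of my submit command (the 4G value is arbitrary) would be:

```shell
# Guess: the partition-listing OOM happens in the driver-side metastore
# client, so raise the driver heap in addition to the executor settings.
/usr/local/spark-default/bin/spark-submit \
  --master yarn-client \
  --driver-memory 4G \
  --executor-memory 8G \
  --num-executors 20 \
  --executor-cores 2 \
  a.py
```

Is that the right knob to turn, or is there a metastore-side setting I should look at instead?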
But if I switch to another Hive table to get the data, the same code works fine.

Any idea which direction I should start with? Configuration?

Thanks.

--
A photographer who can't finish a marathon is no true backpacker. The next goal should be a 6,000 m peak, I suppose.