On Thu, Jan 29, 2015 at 6:36 PM, QiuxuanZhu <ilsh1...@gmail.com> wrote:
> Dear all,
>
> I have no idea why it raises an error when I run the following code.
>
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
>
> def getRow(data):
>     return data.msg
>
> first_sql = "select * from logs.event where dt = '20150120' and et = 'ppc' LIMIT 10"  # error
> # first_sql = "select * from hivecrawler.vip_crawler where src='xx' and dt='" + timestamp + "'"  # correct
>
> sc = SparkContext(appName="parse")
> sqlContext = HiveContext(sc)
> data = sqlContext.sql(first_sql)
> file_target = "/tmp/test/logdd"
> data.map(getRow).saveAsTextFile(file_target)
> sc.stop()
> print 'stop'
>
> I submit the code with the following command:
>
> /usr/local/spark-default/bin/spark-submit --master yarn-client --executor-memory 8G --num-executors 20 --executor-cores 2 --py-files a.py
>
> It raises an error. The Spark log shows:
>
> 15/01/30 09:46:39 ERROR metastore.RetryingHMSHandler:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
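One hedged first thing to try (a suggestion, not a confirmed fix): in yarn-client mode with HiveContext, the Hive metastore client — the RetryingHMSHandler, DataNucleus, and MySQL JDBC frames in the log — runs inside the driver JVM, and the spark-submit command above raises only executor memory, leaving the driver at its Spark 1.x default. Adding `--driver-memory` gives that metastore client more heap:

```shell
# Sketch of the same submit command with driver heap raised; the 4G
# value is an assumption -- size it to your partition count.
/usr/local/spark-default/bin/spark-submit \
  --master yarn-client \
  --driver-memory 4G \
  --executor-memory 8G \
  --num-executors 20 \
  --executor-cores 2 \
  --py-files a.py
```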
This says that the metastore is out of memory.

> and the Python side shows:
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o26.javaToPython.
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:333)
>         at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:819)
>         at com.mysql.jdbc.ByteArrayRow.getString(ByteArrayRow.java:70)
>         at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5811)
>         at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5688)
>         at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4985)
>         at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
>         at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
>         at org.datanucleus.store.rdbms.query.ResultClassROF.getResultObject(ResultClassROF.java:666)
>         at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:309)
>         at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
>         at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:403)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:665)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:429)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
>         at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
>         at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
>         at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1559)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
>         at com.sun.proxy.$Proxy25.getPartitions(Unknown Source)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2516)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>
> It looks like a memory problem, but if I switch to another Hive table to get the data, the code works fine.
>
> Any idea which direction I should start with? Config?
>
> Thanks.
>
> --
> A photographer who can't finish a marathon is no good as a backpacker.
> The next goal should be a 6,000 m peak, right? Yeah.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
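On "which direction should I start with": the trace dies in MetaStoreDirectSql.getPartitions while enumerating the table's partitions, which also explains why another table (with fewer partitions) works fine. One hedged workaround is to skip the metastore entirely and read the single dt partition's files directly. This is a sketch under assumptions — the warehouse root, the `<db>.db/<table>/dt=<dt>` layout, and the `partition_path` helper are all hypothetical, and it only applies if the table's files are plain text:

```python
# Workaround sketch: read one partition's HDFS directory directly so the
# Hive metastore's partition enumeration (the part that ran out of
# memory) is never invoked. Path layout below is an assumption.

def partition_path(warehouse, db, table, dt):
    """Build the HDFS directory of a single dt= partition, assuming the
    standard Hive layout <warehouse>/<db>.db/<table>/dt=<dt>."""
    return "{0}/{1}.db/{2}/dt={3}".format(warehouse, db, table, dt)

try:
    from pyspark import SparkContext
except ImportError:
    SparkContext = None  # no Spark on this machine; the helper still works

if SparkContext is not None:
    sc = SparkContext(appName="parse")
    path = partition_path("hdfs:///user/hive/warehouse",
                          "logs", "event", "20150120")
    # sc.textFile never talks to the metastore, unlike sqlContext.sql.
    data = sc.textFile(path)
    data.saveAsTextFile("/tmp/test/logdd")
    sc.stop()
```

If the table is stored as Parquet rather than text, the same idea applies with the corresponding reader instead of textFile.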