On Thu, Jan 29, 2015 at 6:36 PM, QiuxuanZhu <ilsh1...@gmail.com> wrote:
> Dear all,
>
> I have no idea why it raises an error when I run the following code.
>
> from pyspark import SparkContext
> from pyspark.sql import HiveContext
>
> def getRow(data):
>     return data.msg
>
> first_sql = "select * from logs.event where dt = '20150120' and et = 'ppc' LIMIT 10"  # error
> # first_sql = "select * from hivecrawler.vip_crawler where src='xx' and dt='" + timestamp + "'"  # correct
>
> sc = SparkContext(appName="parse")
> sqlContext = HiveContext(sc)
> data = sqlContext.sql(first_sql)
> file_target = "/tmp/test/logdd"
> data.map(getRow).saveAsTextFile(file_target)
> sc.stop()
> print 'stop'
>
> I submit the code with the following command:
>
> /usr/local/spark-default/bin/spark-submit --master yarn-client --executor-memory 8G --num-executors 20 --executor-cores 2 --py-files a.py
>
> It raises an error. The Spark log shows:
>
> 15/01/30 09:46:39 ERROR metastore.RetryingHMSHandler:
> java.lang.OutOfMemoryError: GC overhead limit exceeded
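One hedged first thing to try (a suggestion, not a confirmed fix): in yarn-client mode with HiveContext, the Hive metastore client — the RetryingHMSHandler, DataNucleus, and MySQL JDBC frames in the log — runs inside the driver JVM, and the spark-submit command above raises only executor memory, leaving the driver at its Spark 1.x default. Adding `--driver-memory` gives that metastore client more heap:

```shell
# Sketch of the same submit command with driver heap raised; the 4G
# value is an assumption -- size it to your partition count.
/usr/local/spark-default/bin/spark-submit \
  --master yarn-client \
  --driver-memory 4G \
  --executor-memory 8G \
  --num-executors 20 \
  --executor-cores 2 \
  --py-files a.py
```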
This says that the metastore is out of memory.

> and the Python side shows:
>
> py4j.protocol.Py4JJavaError: An error occurred while calling o26.javaToPython.
> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at com.mysql.jdbc.SingleByteCharsetConverter.toString(SingleByteCharsetConverter.java:333)
>         at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:819)
>         at com.mysql.jdbc.ByteArrayRow.getString(ByteArrayRow.java:70)
>         at com.mysql.jdbc.ResultSetImpl.getStringInternal(ResultSetImpl.java:5811)
>         at com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5688)
>         at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4985)
>         at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
>         at org.datanucleus.store.rdbms.datasource.dbcp.DelegatingResultSet.getObject(DelegatingResultSet.java:325)
>         at org.datanucleus.store.rdbms.query.ResultClassROF.getResultObject(ResultClassROF.java:666)
>         at org.datanucleus.store.rdbms.query.ResultClassROF.getObject(ResultClassROF.java:309)
>         at org.datanucleus.store.rdbms.query.ForwardQueryResult.nextResultSetElement(ForwardQueryResult.java:181)
>         at org.datanucleus.store.rdbms.query.ForwardQueryResult$QueryResultIterator.next(ForwardQueryResult.java:403)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.loopJoinOrderedResult(MetaStoreDirectSql.java:665)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:429)
>         at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitions(MetaStoreDirectSql.java:224)
>         at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1563)
>         at org.apache.hadoop.hive.metastore.ObjectStore$1.getSqlResult(ObjectStore.java:1559)
>         at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2208)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1559)
>         at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1553)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
>         at com.sun.proxy.$Proxy25.getPartitions(Unknown Source)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:2516)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>
> It looks like a memory problem, but if I switch to another Hive table to get the data, the code works fine.
>
> Any idea which direction I should start with? Config?
>
> Thanks.
>
> --
> A photographer who can't finish a marathon is no good as a backpacker.
> The next goal should be a 6,000 m peak, right? Yeah.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
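On "which direction should I start with": the trace dies in MetaStoreDirectSql.getPartitions while enumerating the table's partitions, which also explains why another table (with fewer partitions) works fine. One hedged workaround is to skip the metastore entirely and read the single dt partition's files directly. This is a sketch under assumptions — the warehouse root, the `<db>.db/<table>/dt=<dt>` layout, and the `partition_path` helper are all hypothetical, and it only applies if the table's files are plain text:

```python
# Workaround sketch: read one partition's HDFS directory directly so the
# Hive metastore's partition enumeration (the part that ran out of
# memory) is never invoked. Path layout below is an assumption.

def partition_path(warehouse, db, table, dt):
    """Build the HDFS directory of a single dt= partition, assuming the
    standard Hive layout <warehouse>/<db>.db/<table>/dt=<dt>."""
    return "{0}/{1}.db/{2}/dt={3}".format(warehouse, db, table, dt)

try:
    from pyspark import SparkContext
except ImportError:
    SparkContext = None  # no Spark on this machine; the helper still works

if SparkContext is not None:
    sc = SparkContext(appName="parse")
    path = partition_path("hdfs:///user/hive/warehouse",
                          "logs", "event", "20150120")
    # sc.textFile never talks to the metastore, unlike sqlContext.sql.
    data = sc.textFile(path)
    data.saveAsTextFile("/tmp/test/logdd")
    sc.stop()
```

If the table is stored as Parquet rather than text, the same idea applies with the corresponding reader instead of textFile.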