Well, I don't know about Postgres, but you can limit the number of columns and rows fetched via JDBC at source rather than loading and filtering them in Spark:
val c = HiveContext.load("jdbc",
  Map("url" -> _ORACLEserver,
      "dbtable" -> "(SELECT to_char(CHANNEL_ID) AS CHANNEL_ID, CHANNEL_DESC FROM sh.channels WHERE ROWNUM <= 10000)",
      "user" -> _username,
      "password" -> _password))

or in your case:

"dbtable" -> "(SELECT COUNT(1) FROM sh.channels WHERE ROWNUM <= 10000)"

c.show()

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 14 May 2016 at 04:56, Jyun-Fan Tsai <jyun...@gmail.com> wrote:

> I try to load some rows from a big SQL table. Here is my code:
>
> ===
> jdbcDF = sqlContext.read.format("jdbc").options(
>     url="jdbc:postgresql://...",
>     dbtable="mytable",
>     partitionColumn="t",
>     lowerBound=1451577600,
>     upperBound=1454256000,
>     numPartitions=1).load()
> print(jdbcDF.count())
> ===
>
> The code runs very slowly because Spark tries to load the whole table.
> I know there is a solution that uses a subquery. I can use:
>
> dbtable="(SELECT * FROM mytable WHERE t >= 1451577600 AND t <= 1454256000) tmp"
>
> However, it's still slow because the subquery creates a temp table.
>
> I would like to know how I can specify WHERE filters so I don't need
> to load the whole table.
>
> From the Spark source code I guess the filter in JDBCRelation is the
> solution I'm looking for. However, I don't know how to create a
> filter and pass it to the JDBC driver:
>
> ===
> https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
> ===
>
> --
> Thanks for help,
> Jyun-Fan Tsai
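For the JDBCRelation question in the quoted mail: there is no need to construct a Filter by hand. A minimal sketch (assuming Spark 1.x with the sqlContext that spark-shell provides, and reusing the placeholder URL, table and column names from the quoted code): the JDBC source pushes simple comparison predicates given to .filter()/.where() down into the WHERE clause of the query it sends over JDBC, and DataFrameReader.jdbc also accepts an explicit array of predicate strings, one per partition.

===
import java.util.Properties

// The "..." placeholders stand in for real connection details,
// as in the quoted code.
val props = new Properties()
props.setProperty("user", "...")
props.setProperty("password", "...")

// Option 1: rely on pushdown. Simple comparisons passed to
// .filter()/.where() end up in the WHERE clause of the query
// Spark sends to the database, so only matching rows are fetched.
val df = sqlContext.read
  .jdbc("jdbc:postgresql://...", "mytable", props)
  .filter("t >= 1451577600 AND t <= 1454256000")

// Option 2: explicit predicates. Each array element becomes the
// WHERE clause of one partition's query.
val df2 = sqlContext.read.jdbc(
  "jdbc:postgresql://...",
  "mytable",
  Array("t >= 1451577600 AND t <= 1454256000"),
  props)

println(df.count())
===

Either way the filtering happens in the database rather than in Spark, so the whole table is never loaded.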