Yes, that is exactly what I mean. Maybe you missed my last response: you can use the API jsonRDD(json: RDD[String], schema: StructType) to specify your schema explicitly instead of having it inferred. For numbers bigger than Long, we can use DecimalType.
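A minimal sketch of the explicit-schema call described above. It assumes the Spark 1.2 package layout, where the SQL types are reachable through org.apache.spark.sql._ and an unbounded decimal is spelled DecimalType.Unlimited; later releases differ on both points. The field name "id", the sample records, and the local master are hypothetical and only serve the illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: Spark 1.2 layout. Later releases moved the types
// to org.apache.spark.sql.types._
import org.apache.spark.sql._

object ExplicitJsonSchema {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("explicit-json-schema")
      .setMaster("local[2]") // hypothetical local run
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical input: the second value does not fit into a Long.
    val json = sc.parallelize(Seq(
      """{"id": 42}""",
      """{"id": 12345678901234567890}"""
    ))

    // Declare the column as a decimal up front, so no type is
    // inferred from a (possibly unrepresentative) sample.
    val schema = StructType(Seq(
      StructField("id", DecimalType.Unlimited, nullable = true)
    ))

    val rows = sqlContext.jsonRDD(json, schema)
    rows.printSchema()
    rows.collect().foreach(println)

    sc.stop()
  }
}
```

Because the schema is passed in, the sampling ratio plays no role here; the trade-off is that every field of interest has to be declared by hand.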
Thanks,
Daoyuan

From: Tobias Pfeiffer [mailto:t...@preferred.jp]
Sent: Tuesday, January 20, 2015 9:26 AM
To: Wang, Daoyuan
Cc: user
Subject: Re: MatchError in JsonRDD.toLong

Hi,

On Fri, Jan 16, 2015 at 6:14 PM, Wang, Daoyuan <daoyuan.w...@intel.com> wrote:
> The second parameter of jsonRDD is the sampling ratio when we infer schema.

OK, I was aware of this, but I guess I understand the problem now. My sampling ratio is so low that I only see the Long values of data items and infer it's a Long. When I meet the data that's actually longer than Long, I get the error I posted; basically it's the same situation as when specifying a wrong schema manually.

So is there any way around this other than increasing the sample ratio to also discover the BigDecimal-sized numbers?

Thanks
Tobias