I'm new to this field, but it seems like most "Big Data" examples --
Spark's included -- begin with reading in flat lines of text from a file.

How would I go about having Spark turn a large JSON file into an RDD?

So the file would just be a text file that looks like this:

[{...}, {...}, ...]


where the individual JSON objects are arbitrarily complex (i.e. not
necessarily flat) and may or may not be on separate lines.

Basically, I'm guessing Spark would need to parse the JSON itself, since it
can't rely on newlines as record delimiters. That sounds costly.

Is JSON a "bad" format to have to deal with, or can Spark efficiently
ingest and work with data in this format? If it can, can I get a pointer as
to how I would do that?
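For concreteness, here's the kind of thing I have in mind, in plain Python
with the standard json module standing in for whatever Spark would do
internally -- the sc.parallelize step at the end is just my guess at how
you'd hand the parsed records to Spark:

```python
import json

# Hypothetical input: one JSON array of arbitrarily nested objects,
# possibly spread across multiple lines.
raw = '[{"a": 1}, {"b": {"c": [2, 3]}},\n {"d": 4}]'

# Parse the whole array up front. This is the step I suspect can't be
# split across workers, since newlines aren't record delimiters here.
records = json.loads(raw)

# Given a SparkContext `sc`, I imagine you'd then distribute the parsed
# list, something like:
#   rdd = sc.parallelize(records)
print(records)
```

My worry is that json.loads has to read the entire file on one machine
before Spark ever sees a record, which seems to defeat the point.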

Nick

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Having-Spark-read-a-JSON-file-tp1963.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
