Actually, sorry, my mistake, you're calling

    DataFrame df = sqlContext.createDataFrame(data,
        org.apache.spark.sql.types.NumericType.class);

and giving it a list of objects which aren't NumericTypes, but the wildcards in the signature let it happen. I'm curious what'd happen if you gave it Integer.class, but I suspect it still won't work because Integer may not have the bean-style getters.

On Fri, Jul 22, 2016 at 9:37 AM, Everett Anderson <ever...@nuna.com> wrote:

> Hey,
>
> I think what's happening is that you're calling this createDataFrame method
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/SQLContext.html#createDataFrame(java.util.List,%20java.lang.Class)>:
>
>     createDataFrame(java.util.List<?> data, java.lang.Class<?> beanClass)
>
> which expects a JavaBean-style class with get and set methods for the
> members, but Integer doesn't have such a getter.
>
> I bet there's an easier way if you just want a single-column DataFrame of
> a primitive type, but one way that would work is to manually construct the
> Rows using RowFactory.create()
> <https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/RowFactory.html#create(java.lang.Object...)>
> and assemble the DataFrame from that, like:
>
>     List<Row> rows = ...; // convert your List<Integer> in a loop with RowFactory.create()
>
>     StructType schema = DataTypes.createStructType(Collections.singletonList(
>         DataTypes.createStructField("int_field", DataTypes.IntegerType, true)));
>
>     DataFrame intDataFrame = sqlContext.createDataFrame(rows, schema);
>
> On Fri, Jul 22, 2016 at 7:53 AM, Jean Georges Perrin <j...@jgp.net> wrote:
>
>> I am trying to build a DataFrame from a list, here is the code:
>>
>>     private void start() {
>>         SparkConf conf = new SparkConf().setAppName("Data Set from Array")
>>             .setMaster("local");
>>         SparkContext sc = new SparkContext(conf);
>>         SQLContext sqlContext = new SQLContext(sc);
>>
>>         Integer[] l = new Integer[] { 1, 2, 3, 4, 5, 6, 7 };
>>         List<Integer> data = Arrays.asList(l);
>>
>>         System.out.println(data);
>>
>>         DataFrame df = sqlContext.createDataFrame(data,
>>             org.apache.spark.sql.types.NumericType.class);
>>         df.show();
>>     }
>>
>> My result is (unpleasantly):
>>
>>     [1, 2, 3, 4, 5, 6, 7]
>>     ++
>>     ||
>>     ++
>>     ||
>>     ||
>>     ||
>>     ||
>>     ||
>>     ||
>>     ||
>>     ++
>>
>> I also tried with:
>>
>>     org.apache.spark.sql.types.NumericType.class
>>     org.apache.spark.sql.types.IntegerType.class
>>     org.apache.spark.sql.types.ArrayType.class
>>
>> I am probably missing something super obvious :(
>>
>> Thanks!
>>
>> jg
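To see concretely why Integer fails the bean check while a hand-written wrapper class passes it, here is a small stdlib-only sketch (no Spark on the classpath required). It uses java.beans.Introspector, which performs the same kind of getter/setter discovery that bean-based schema inference relies on; the IntBean class and its "value" property name are illustrative inventions, not part of any Spark API.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

public class BeanCheck {

    // A minimal JavaBean wrapper: one field with a public no-arg
    // constructor, getter, and setter. Introspection finds exactly
    // one property ("value") on this class.
    public static class IntBean {
        private int value;
        public IntBean() {}
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
    }

    // Count the properties that have BOTH a getter and a setter,
    // stopping at Object so getClass() is excluded.
    static long beanProperties(Class<?> cls) {
        try {
            BeanInfo info = Introspector.getBeanInfo(cls, Object.class);
            long n = 0;
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    n++;
                }
            }
            return n;
        } catch (IntrospectionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Integer is immutable and has no getX/setX pairs, so bean-style
        // introspection finds zero properties -> zero columns, which is
        // why df.show() printed the empty "++" table above.
        System.out.println("Integer properties: " + beanProperties(Integer.class)); // prints 0
        System.out.println("IntBean properties: " + beanProperties(IntBean.class)); // prints 1
    }
}
```

This is also why passing a small wrapper bean like this (rather than Integer.class) to the list-plus-beanClass overload of createDataFrame would give Spark something to build a schema from, at the cost of boxing each int into a bean instance; the RowFactory-plus-StructType route in the reply above avoids the wrapper class entirely.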