Did you happen to try this?
JavaPairRDD<LongWritable, Text> hadoopFile = sc.hadoopFile(
    "/sigmoid", DataInputFormat.class, LongWritable.class, Text.class);

Thanks
Best Regards

On Tue, Jun 23, 2015 at 6:58 AM, 付雅丹 <yadanfu1...@gmail.com> wrote:

> Hello, everyone! I'm new to Spark. I have already written programs on
> Hadoop 2.5.2, where I defined my own InputFormat and OutputFormat. Now I
> want to move my code to Spark, using Java. The first problem I encountered
> is how to turn a big txt file in local storage into an RDD that is
> compatible with my Hadoop program. I found functions in SparkContext that
> may be helpful, but I don't know how to use them.
> E.g.:
>
> public <K,V,F extends org.apache.hadoop.mapreduce.InputFormat<K,V>>
> RDD<scala.Tuple2<K,V>> newAPIHadoopFile(String path,
>     Class<F> fClass,
>     Class<K> kClass,
>     Class<V> vClass,
>     org.apache.hadoop.conf.Configuration conf)
>
> Get an RDD for a given Hadoop file with an arbitrary new API InputFormat
> and extra configuration options to pass to the input format.
>
> '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable
> object for each record, directly caching the returned RDD or directly
> passing it to an aggregation or shuffle operation will create many
> references to the same object. If you plan to directly cache, sort, or
> aggregate Hadoop writable objects, you should first copy them using a map
> function.
>
> In Java, the following is wrong.
> ///// option one
> Configuration confHadoop = new Configuration();
> JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
>     "hdfs://cMaster:9000/wcinput/data.txt",
>     DataInputFormat, LongWritable, Text, confHadoop);
>
> ///// option two
> Configuration confHadoop = new Configuration();
> DataInputFormat input = new DataInputFormat();
> LongWritable longType = new LongWritable();
> Text text = new Text();
> JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
>     "hdfs://cMaster:9000/wcinput/data.txt",
>     input, longType, text, confHadoop);
>
> Can anyone help me? Thank you so much.
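The compile error in both options is the same: newAPIHadoopFile takes Class objects, so you pass class literals (Foo.class), not bare type names (option one) or instances (option two). A minimal sketch of the corrected call is below. Since DataInputFormat is your own class, the built-in new-API TextInputFormat stands in for it here (you would write DataInputFormat.class in its place), and the path and master are placeholders. The map at the end copies the re-used Writables into plain Strings, per the javadoc note quoted above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NewApiHadoopFileExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("newAPIHadoopFile-example")
                           .setMaster("local[*]"));
        Configuration confHadoop = new Configuration();

        // Pass class literals for the InputFormat, key, and value types.
        // With your own format this would be DataInputFormat.class;
        // TextInputFormat is used here so the sketch compiles standalone.
        JavaPairRDD<LongWritable, Text> distFile = sc.newAPIHadoopFile(
            "hdfs://cMaster:9000/wcinput/data.txt",
            TextInputFormat.class, LongWritable.class, Text.class,
            confHadoop);

        // The RecordReader re-uses the same Writable objects, so copy
        // the values out before caching, sorting, or aggregating.
        JavaRDD<String> lines = distFile.map(pair -> pair._2().toString());
        lines.cache();

        sc.stop();
    }
}
```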