Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Bryan Duxbury
Agreed, we use Thrift at Rapleaf for this purpose. It's trivial to write a ThriftWritable if you want to be crafty, but you can also just use byte[]s and do the serialization and deserialization yourself. -Bryan On Nov 1, 2008, at 8:01 PM, Alex Loddengaard wrote: Take a look at Thrift:
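
The do-it-yourself byte[] route mentioned in this reply comes down to turning each record into bytes when writing and parsing the bytes back when reading. A minimal sketch in Python, assuming a hypothetical framing (a 4-byte big-endian length prefix followed by the payload); the function names are illustrative, and this is neither Thrift's wire format nor a Hadoop API:

```python
import io
import struct

def write_record(stream, payload: bytes) -> None:
    """Write one record as a 4-byte big-endian length followed by the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield payloads until the stream is exhausted; fail loudly on truncation."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield payload

# Round-trip a few hypothetical records through an in-memory buffer.
buf = io.BytesIO()
for text in ("alpha", "beta", ""):
    write_record(buf, text.encode("utf-8"))
buf.seek(0)
decoded = [p.decode("utf-8") for p in read_records(buf)]
```

Any language that can read a 4-byte integer and a byte string can consume this framing, which is the property the original question is after.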

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Chris Collins
Consider talking to Doug Cutting. He is playing with the idea of a variant of JSON, and I am sure he would love your help. Specifically, he is looking at a coding scheme that is easy to read, does not duplicate key names per record, and supports file splits. C On Nov 1, 2008, at 8:20 PM, Zhou,
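
One way to picture "does not duplicate key names per record" is a file whose first line lists the key names once, followed by one JSON array of values per line. This is only an illustrative sketch under that assumption, not the design Doug Cutting was working on; `encode`, `decode`, and the sample records are hypothetical:

```python
import json

def encode(records, keys):
    """Write the key names once as a header line, then one JSON array per record."""
    lines = [json.dumps(keys)]
    for rec in records:
        lines.append(json.dumps([rec.get(k) for k in keys]))
    return "\n".join(lines) + "\n"

def decode(text):
    """Rebuild dict records by pairing the header keys with each value row."""
    lines = text.splitlines()
    keys = json.loads(lines[0])
    return [dict(zip(keys, json.loads(line))) for line in lines[1:] if line]

records = [
    {"id": 1, "url": "http://a.example"},
    {"id": 2, "url": "http://b.example"},
]
encoded = encode(records, ["id", "url"])
roundtrip = decode(encoded)
```

Because each record sits on its own line, a reader starting in the middle of a file can resynchronize at the next newline, although it would still need the header from the start of the file to name the values.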

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
Can Thrift be used easily in Hadoop? A lot of things would need to be written: input/output formats, Writables, a split method, etc. On Sun, Nov 2, 2008 at 11:01 AM, Alex Loddengaard <[EMAIL PROTECTED]> wrote: > Take a look at Thrift: > > > Alex > > On Sat, Nov 1, 200

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
An embedded database cannot handle large-scale data and is not very efficient. I have about 1 billion records, and these records have to pass through several modules. I mean a data exchange format similar to XML but more flexible and efficient. On Sun, Nov 2, 2008 at 10:49 AM, lamfeeling <[EMAIL PROTECTED]> wr

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Chris Collins
Sleepycat has a Java edition: http://www.oracle.com/technology/products/berkeley-db/index.html It has an "interesting" open source license. If you don't need to ship it on an install disk, you're probably good to go with that too. You could also consider Derby. C On Nov 1, 2008, at 7:49 PM, lam

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Alex Loddengaard
Take a look at Thrift: Alex On Sat, Nov 1, 2008 at 7:15 PM, Zhou, Yunqing <[EMAIL PROTECTED]> wrote: > The project I focused on has many modules written in different languages > (several modules are hadoop jobs). > So I'd like to utilize a common record b

Re: Can anyone recommend me a inter-language data file format?

2008-11-01 Thread lamfeeling
Have you considered an embedded database? Berkeley DB is written in C++ and has interfaces for many languages. On 2008-11-02 10:15:22, "Zhou, Yunqing" <[EMAIL PROTECTED]> wrote: >The project I focused on has many modules written in different languages >(several modules are hadoop jobs). >So I'd like to utilize a

Can anyone recommend me a inter-language data file format?

2008-11-01 Thread Zhou, Yunqing
The project I focused on has many modules written in different languages (several modules are Hadoop jobs). So I'd like to use a common record-based data file format for data exchange. XML is not efficient for appending new records. SequenceFile does not seem to have an API for languages other than Ja

Any Way to Skip Mapping?

2008-11-01 Thread Billy Pearson
I have a job that merges multiple output directories of MR jobs that run over time. Their output is all the same, and the MR job that merges them uses a mapper that just outputs the same key and value it is given, so it is basically the same as the IdentityMapper. The problem I am seeing is as I add

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-01 Thread Alex Loddengaard
I suppose it depends on what you're trying to do. One approach would be to output SQL insert statements and import them into a database that a web app could query. On the other hand, you could output XML or JSON that can be queried by an AJAX app. Read more about MySQL connectivity here:
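
The JSON option can be sketched as a small conversion step between the job output and the web app. The sketch below assumes the job wrote plain text lines of the form key, tab, value (the default layout of Hadoop's TextOutputFormat) and that the lines have already been read from HDFS; the function name and sample data are hypothetical:

```python
import json

def reducer_output_to_json(lines):
    """Convert tab-separated key/value lines into a JSON array of objects."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so values may themselves contain tabs.
        key, _, value = line.partition("\t")
        rows.append({"key": key, "value": value})
    return json.dumps(rows)

# Hypothetical sample of job output lines.
sample = ["hadoop\t42\n", "thrift\t7\n"]
payload = reducer_output_to_json(sample)
```

A web application could serve `payload` as-is to an AJAX client; values stay strings here because the output file itself carries no type information.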

How to cache a folder in HDFS?

2008-11-01 Thread lamfeeling
Hi all! I have a problem. My program reads some config files in a folder, but it always fails on Hadoop and says "Can not find the file...". I looked up the reference, which told me to cache the files in HDFS instead of reading them from the local filesystem. Now I can cache a file, but I

How to read mapreduce output in HDFS directory from Web Application

2008-11-01 Thread GO-HADOOP
I am new to Hadoop and am trying to understand the most efficient way to read the output file from HDFS and display the result in a simple web application. Thanks -- View this message in context: http://www.nabble.com/How-to-read-mapreduce-output-in-HDFS-directory-from-Web-Application-tp202

Please help, don't know how to solve--java.io.IOException: WritableName can't load class

2008-11-01 Thread Mudong Lu
Hello, guys, I am very new to Hadoop. I was trying to read Nutch data files using a script I found on http://wiki.apache.org/nutch/Getting_Started . After 2 days of trying, I still cannot get it to work. Now the error I get is "java.lang.RuntimeException: java.io.IOException: WritableName can'