Re: Wrapping around BitSet with the Writable interface

2013-05-13 Thread Jim Twensky
Thanks for the suggestions. I ended up switching to JDK 1.7+ just to make the code more readable. I will take a look at the EWAH implementation as well. Jim On Sun, May 12, 2013 at 3:40 PM, Bertrand Dechoux wrote: > You can disregard my links as they are only valid for Java 1.7+. > The JavaSer

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Bertrand Dechoux
You can disregard my links as they are only valid for Java 1.7+. The JavaSerialization might clean up your code but shouldn't bring a significant boost in performance. The EWAH implementation has, at least, the methods you are looking for: serialize / deserialize. Regards Bertrand Note to myself

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Ted Dunning
Another interesting alternative is the EWAH implementation of Java bitsets, which provides efficient compressed bitsets with very fast OR operations. https://github.com/lemire/javaewah See also https://code.google.com/p/sparsebitmap/ by the same authors. On Sun, May 12, 2013 at 1:11 PM, Bertrand Dec

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Bertrand Dechoux
In order to make the code more readable, you could start by using the methods toByteArray() and valueOf(bytes) http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#toByteArray%28%29 http://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html#valueOf%28byte[]%29 Regards Bertrand On
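
To illustrate the suggestion above, here is a minimal sketch of a round trip through toByteArray() and valueOf(byte[]), assuming Java 7 or later. The class name BitSetBytes and its helper methods are hypothetical and only serve the illustration.

```java
import java.util.BitSet;

// Hypothetical helper class, not part of the original thread.
class BitSetBytes {
    // Convert a BitSet to a little-endian byte array (Java 7+).
    static byte[] toBytes(BitSet bits) {
        return bits.toByteArray();
    }

    // Rebuild a BitSet from the bytes produced by toBytes().
    static BitSet fromBytes(byte[] bytes) {
        return BitSet.valueOf(bytes);
    }

    public static void main(String[] args) {
        BitSet original = new BitSet();
        original.set(3);
        original.set(70);

        byte[] bytes = toBytes(original);
        BitSet copy = fromBytes(bytes);

        // The copy should contain exactly the same set bits.
        System.out.println("bytes=" + bytes.length + " equal=" + copy.equals(original));
    }
}
```

The byte array only covers bits up to the highest set bit, so trailing zero bits are not stored.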

Re: Wrapping around BitSet with the Writable interface

2013-05-12 Thread Harsh J
You could perhaps consider using the experimental JavaSerialization [1] enhancement to skip transforming to Writables or other serialization formats. It may be slower, but it looks like you want a way to avoid transforming objects. Enable by adding the class org.apache.hadoop.io.serializer.JavaS

Wrapping around BitSet with the Writable interface

2013-05-12 Thread Jim Twensky
I have large java.util.BitSet objects that I want to bitwise-OR using a MapReduce job. I decided to wrap each object in a class that implements the Writable interface. Right now I convert each BitSet to a byte array and serialize the byte array to disk. Converting them to byte arrays is a bit inefficient but I
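
One possible shape for such a wrapper is sketched below. It is not the poster's code: the class name BitSetWritable, the or() and roundTrip() helpers, and the choice of toLongArray() (which needs Java 7+) are all assumptions. To keep the sketch runnable without Hadoop on the classpath it depends only on java.io; in a real job the class would declare "implements org.apache.hadoop.io.Writable", whose two methods are write(DataOutput) and readFields(DataInput).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.BitSet;

// Hypothetical wrapper; add "implements org.apache.hadoop.io.Writable" in a real job.
class BitSetWritable {
    private BitSet bits = new BitSet();

    BitSetWritable() {}

    BitSetWritable(BitSet bits) { this.bits = bits; }

    BitSet get() { return bits; }

    // Merge another wrapper into this one with a bitwise OR (e.g. in a reducer).
    void or(BitSetWritable other) { bits.or(other.bits); }

    // Write the word count followed by the 64-bit words of the bit set.
    public void write(DataOutput out) throws IOException {
        long[] words = bits.toLongArray();
        out.writeInt(words.length);
        for (long word : words) {
            out.writeLong(word);
        }
    }

    // Read the fields back in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        int count = in.readInt();
        long[] words = new long[count];
        for (int i = 0; i < count; i++) {
            words[i] = in.readLong();
        }
        bits = BitSet.valueOf(words);
    }

    // Demo helper: serialize to an in-memory buffer and read it back.
    static BitSetWritable roundTrip(BitSetWritable source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            BitSetWritable copy = new BitSetWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(1);
        BitSet b = new BitSet();
        b.set(130);

        BitSetWritable merged = roundTrip(new BitSetWritable(a));
        merged.or(roundTrip(new BitSetWritable(b)));
        System.out.println(merged.get());
    }
}
```

Writing whole 64-bit words avoids the intermediate byte array, at the cost of up to seven padding bytes per bit set.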