expect specific record but get generic

2013-10-21 Thread Koert Kuipers
i am observing that on a particular system (spark) my code breaks in that avro does not return the specific record i expected but instead returns generic records. i suspect this is some class loading issue on the distributed system (something about how the classpath is constructed for the spark

Re: expect specific record but get generic

2013-10-21 Thread Koert Kuipers
then the generic representation is used. So, yes, this sounds like a classpath problem. On Mon, Oct 21, 2013 at 8:41 AM, Koert Kuipers ko...@tresata.com wrote: i am observing that on a particular system (spark) my code breaks in that avro does not return the specific record i expected

Re: AVRO M/R: ClassNotFoundException: ...Paranamer

2012-12-14 Thread Koert Kuipers
i had this too at some point. i just added paranamer to distributed cache (or classpath on hadoop) and it went away. On Thu, Dec 13, 2012 at 2:21 PM, Terry Healy the...@bnl.gov wrote: paran

Re: version of avro

2012-10-20 Thread Koert Kuipers
try this first before going down the self patching route. Regards Jacob -- From: kkrugler_li...@transpac.com Subject: Re: version of avro Date: Fri, 19 Oct 2012 13:16:24 -0700 To: user@avro.apache.org On Oct 19, 2012, at 1:03pm, Koert Kuipers wrote: i

record schema names... a nuisance?

2012-10-20 Thread Koert Kuipers
we are on a fairly old avro (1.5.4) so not sure my observations apply to newer versions. i noticed that when i read from avro files in hadoop it does not expect the reader's schema (fully qualified) name to be equal to the writer's schema (fully qualified) name. this allows me to read from files

version of avro

2012-10-19 Thread Koert Kuipers
i noticed avro version 1.5.4 is included with some version/distros of hadoop and hive... is there a reason why 1.5.4 is included specifically and not newer ones? are there some incompatibilities to be aware of? i would like to use a newer version thanks! koert

using strings instead of utf8

2012-10-19 Thread Koert Kuipers
how do i tell (generic) avro to use strings for values instead of its own utf8 class? i saw a way of doing it by modifying the schemas (adding a property). i also saw mention of a way to do it if you use maven (which i don't). is there a generic way to do this? like a system property perhaps?

Re: abuse of aliases?

2012-02-03 Thread Koert Kuipers
thanks doug On Fri, Feb 3, 2012 at 3:58 PM, Doug Cutting cutt...@apache.org wrote: On 02/02/2012 08:03 PM, Koert Kuipers wrote: i have many avro files with similar data (same meaning, same type, etc.) but different names for the fields. can i create a reader schema that for each field

Re: re-use or copy a Field

2012-02-03 Thread Koert Kuipers
ok i will do thanks On Fri, Feb 3, 2012 at 7:26 PM, Doug Cutting cutt...@apache.org wrote: On 02/03/2012 01:57 PM, Koert Kuipers wrote: I could create a copy myself using the Field constructor, however that way i lose the aliases and props. In avro 1.5.4 there is no way to get

abuse of aliases?

2012-02-02 Thread Koert Kuipers
i have many avro files with similar data (same meaning, same type, etc.) but different names for the fields. can i create a reader schema that for each field that i am interested in maps it to all the different possible fields in the files by using aliases, and then run map-reduce over the files

using avro schemas to select columns (abusing versioning?)

2012-01-23 Thread Koert Kuipers
we are working on a very sparse table with say 500 columns where we do batch uploads that typically only contain a subset of the columns (say 100), and we run multiple map-reduce queries on subsets of the columns (typically less than 50 columns go into a single map-reduce job). my question is the

override avro generic representation

2011-12-01 Thread Koert Kuipers
Is there a way to override the avro generic representation, or perhaps an easy way to create my own? For example, for FIXED i would like Byte[] instead of ByteBuffer, for STRING i would prefer String over CharArray, for arrays i would like to have a List instead of a Collection, etc. Right now i

reader in hadoop without reader's schema

2011-12-01 Thread Koert Kuipers
I am reading from avro container files in hadoop. I know the container files have a (writer's) schema stored in them. My reader specifies its schema using the avro.input.schema job parameter. This way any schema changes are gracefully handled with both schemas present. However, i don't always need

avro compare() operation in hadoop

2011-11-03 Thread Koert Kuipers
If i use Avro in hadoop (and read my data from Avro container files), will i automatically get a very fast comparison for sorting in Hadoop (similar to what WritableComparator provides)? Are there benchmarks on sorting with Avro vs Writables? Best, Koert