Re: Accumulo and OSGi

2014-04-09 Thread Corey Nolet
Geoffry, As Josh pointed out, you should only need the Hadoop libraries on the client side to use the Text object. This means you won't have to go through the pain of placing the XML files in your root bundles. Did you try the JAAS export from the packages in your container? Did that help? I agr

Re: Accumulo and OSGi

2014-04-09 Thread Geoffry Roberts
Corey, Interesting you have Hadoop working in Karaf. I'm using Equinox. It also sounds as if I don't need to access HDFS in order to get Accumulo to work in OSGi. If I understand you correctly, I only need to have Text available. I'll look into that. It does answer my question and maybe I can avo

Re: Accumulo and OSGi

2014-04-09 Thread Corey Nolet
Geoffry, Interesting you have Hadoop working in Karaf. I'm using equinox. Sure, but are we talking Karaf-specific features here? You just want a Hadoop Client bundle that works, right? The author of the Karaf-Hadoop project already worked through the classpath issues so it's not a bad starting p

Re: Advice on increasing ingest rate

2014-04-09 Thread Mike Hugo
On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs wrote: > Mike, > > What version of Accumulo are you using, how many tablets do you have, and > how many threads are you using for minor and major compaction pools? Also, > how big are the keys and values that you are using? > > 1.4.5 6 threads each for m

Re: Advice on increasing ingest rate

2014-04-09 Thread Mike Hugo
On Tue, Apr 8, 2014 at 5:35 PM, David Medinets wrote: > 20 cores and just one SSD? Is there a standard recommendation for a core > to SSD ratio? > > Other questions: > > How are you sharding your data (i.e., what does your row look like)? > we do something kind of like the entity attribute / grap

Re: Advice on increasing ingest rate

2014-04-09 Thread Mike Hugo
Currently using the default AccumuloOutputFormat settings which I believe is 2 threads On Tue, Apr 8, 2014 at 5:36 PM, wrote: > > > How many threads are you using in the AccumuloOutputFormat? What is your > latency set to? > > > > *From:* Adam Fuchs [mailto:afu...@apache.org] > *Sent:* Tuesday,

Re: Advice on increasing ingest rate

2014-04-09 Thread David Medinets
Pre-split as much as possible. Accumulo will take care of spreading the tablets across the cluster. You have five nodes but only four tablets. This means that one node is not getting any data during the ingest process. If you can't pre-split because you don't have enough variability in the row value, an
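The pre-splitting advice above can be sketched as follows. This is a minimal illustration, not the method used in the thread: it assumes row IDs start with a uniformly distributed two-digit hex prefix (for example, a hash), which nothing in the thread confirms. The class and method names are hypothetical. In a real client the resulting strings would be wrapped in Hadoop `Text` objects and passed to `TableOperations.addSplits`.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: computes evenly spaced split points for a table
// whose row IDs are ASSUMED to begin with a two-digit hex prefix (00..ff).
public class SplitPoints {

    // Returns numTablets - 1 split points, so the table ends up with
    // numTablets tablets. Meaningful for 2 <= numTablets <= 256.
    static SortedSet<String> hexSplits(int numTablets) {
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i < numTablets; i++) {
            // Divide the 0..255 prefix space into numTablets equal ranges.
            splits.add(String.format("%02x", i * 256 / numTablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four tablets need three split points.
        System.out.println(hexSplits(4)); // prints [40, 80, c0]
        // With an Accumulo Connector available, each split would be wrapped
        // in org.apache.hadoop.io.Text and handed to
        // connector.tableOperations().addSplits(tableName, textSplits).
    }
}
```

Choosing at least as many tablets as tablet servers gives every node a share of the ingest load, which is the point David raises about five nodes and four tablets.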

Re: Advice on increasing ingest rate

2014-04-09 Thread Mike Hugo
Sorry, my mistake: I read tables, not tablets. The map reduce jobs insert into 4 different tables, each of which has between 100 and 200 tablets. On Wed, Apr 9, 2014 at 4:03 PM, David Medinets wrote: > Pre-split as much as possible. Accumulo will take care of spreading the tablets > across the cluster. You hav

Re: Advice on increasing ingest rate

2014-04-09 Thread Adam Fuchs
If the average is around 1k per k/v entry, then I would say that 400MB/s is very good performance for incremental/streaming ingest into Accumulo on that cluster. However, I suspect that your entries are probably not that big on average. Do you have a measurement for MB/s ingest? Adam On Apr 9, 201
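The relationship Adam describes between entry rate, entry size, and MB/s can be sketched as simple arithmetic. This is only an illustration: the 400,000 entries per second used below is what his figures would imply (400 MB/s at about 1 kB per entry), not a measurement from the thread. The 100-byte entry size is a made-up comparison point, and 1 MB is taken as 1,000,000 bytes. The class and method names are hypothetical.

```java
// Hypothetical helper for converting an entry rate into a byte rate.
public class IngestRate {

    // Ingest rate in MB/s, taking 1 MB = 1,000,000 bytes.
    static double megabytesPerSecond(long entriesPerSecond, long avgEntryBytes) {
        return entriesPerSecond * (double) avgEntryBytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        // ASSUMED entry rate, inferred from "400MB/s at around 1k per entry".
        long entriesPerSecond = 400_000L;

        // About 1 kB per key/value entry, the case Adam calls very good.
        System.out.println(megabytesPerSecond(entriesPerSecond, 1_000L));

        // An illustrative smaller entry (100 bytes) at the same entry rate
        // yields a tenth of the byte throughput.
        System.out.println(megabytesPerSecond(entriesPerSecond, 100L));
    }
}
```

This is why the average key/value size matters when judging an entries-per-second figure: the same entry rate can correspond to very different MB/s.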