If you have only 4 or 8 tablets for 4 tservers, you're not really parallelizing well.
That doesn't explain a 5 minute hold time though, that is strange. How large is your in-memory map size?

On Fri, Jul 31, 2015 at 11:53 AM Hai Pham <[email protected]> wrote:

> Hi Josh and John,
>
> Correct. Since one of my constraints was time, I tested with WAL flush
> and with the WAL disabled, and the lost-data case happened in the
> WAL-disabled mode - my mistake for not having described that.
>
> I have 1 master + 16 Hadoop slaves under Accumulo, all CentOS 6.5
> physical boxes with at least 500 GB of disk and 24 GB of RAM each, but
> the network is only 1G. DFS replication = 3 by default. I tested with 4
> and 8 splits; the hold-time problem happened more often with 4 splits.
> And you are right, changing the flushing scheme remediated the problem.
>
> Thank you a lot!
>
> Hai
> ------------------------------
> *From:* John Vines <[email protected]>
> *Sent:* Friday, July 31, 2015 10:29 AM
> *To:* [email protected]
> *Subject:* Re: How to control Minor Compaction by programming
>
> Data could be lost if walogs were disabled or configured to use a poor
> flushing mechanism.
>
> However, I'm also concerned about the hold times from a single ingest
> being enough to bring down a server. What's the environment you're
> running in? Are these virtualized or real servers? How many splits did
> you make? How many disks per node do you have? And are you using default
> HDFS replication?
>
> On Fri, Jul 31, 2015 at 11:11 AM Josh Elser <[email protected]> wrote:
>
>> Hai Pham wrote:
>> > Hi Keith,
>> >
>> > I have 4 tablet servers + 1 master. I also did a pre-split before
>> > ingesting and it increased the speed a lot.
>> >
>> > And you're right, when I created too many ingest threads, many of
>> > them sat in the thread pool's queue and the hold time increased.
>> > During one intense ingest, a tablet server was killed by the master
>> > because its hold time exceeded 5 min. In this situation, all tablets
>> > were stuck. Only after that server was dead did the ingest come back
>> > at a comparable speed. But the entries on the dead server were all
>> > gone and lost to the table.
>>
>> You're saying that you lost data? If a server dies, all of the tablets
>> that were hosted there are reassigned to other servers. This is done in
>> a manner that guarantees that there is no data lost in this transition.
>> If you actually lost data, this would be a critical bug, but I would
>> certainly hope you just didn't realize that the data was automatically
>> being hosted by another server.
>>
>> > I have had no idea how to repair this except for regulating the
>> > number of ingest threads and the ingest speed to make it friendlier
>> > to Accumulo itself.
>> >
>> > Another mystery to me: when I did a pre-split to, e.g., 8 tablets,
>> > the tablet count still grew along with the ingest (e.g. to 10, 14 or
>> > bigger). Any idea?
>>
>> Yep, Accumulo will naturally split tablets when they exceed a certain
>> size (1GB by default for normal tables). Unless you increase the
>> property table.split.threshold, as you ingest more data, you will
>> observe more tablets.
>>
>> Given enough time, Accumulo will naturally split your table enough.
>> Pre-splitting quickly gets you to a good level of performance right
>> away.
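As a concrete illustration of the pre-split and split-threshold advice above (this code is not from the thread), here is a minimal sketch using the Accumulo 1.x Java client API; the instance name, ZooKeeper quorum, credentials, split points, and the table name "testtable" are all placeholder assumptions:

// Sketch only: pre-split a table and raise its split threshold so ingest is
// spread across several tablets per tablet server. All names and values below
// are placeholders, not settings taken from this thread.
import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.hadoop.io.Text;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("accumulo", "zkhost:2181")
        .getConnector("root", new PasswordToken("secret"));
    TableOperations ops = conn.tableOperations();

    // Add enough split points that each of the 4 tservers hosts several tablets.
    SortedSet<Text> splits = new TreeSet<Text>();
    for (char c = 'a'; c <= 'z'; c++) {
      splits.add(new Text(String.valueOf(c)));
    }
    ops.addSplits("testtable", splits);

    // Let tablets grow larger before Accumulo splits them on its own
    // (default is 1G), so the pre-split layout stays stable during ingest.
    ops.setProperty("testtable", "table.split.threshold", "4G");

    // Keep the write-ahead log enabled; disabling it is what allowed the
    // data loss described earlier when a tablet server died.
    ops.setProperty("testtable", "table.walog.enabled", "true");
  }
}

The evenly spaced single-letter split points only help if row IDs are actually spread across that range; in practice you would pick split points from the real key distribution of the data being ingested.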
>>
>> > Hai
>> > ------------------------------------------------------------------------
>> > *From:* Keith Turner <[email protected]>
>> > *Sent:* Friday, July 31, 2015 8:39 AM
>> > *To:* [email protected]
>> > *Subject:* Re: How to control Minor Compaction by programming
>> >
>> > How many tablets do you have? Entire tablets are minor compacted at
>> > once. If you have 1 tablet per tablet server, then minor compactions
>> > will have a lot of work to do at once. While this work is being done,
>> > the tablet server's memory may fill up, leading to writes being held.
>> >
>> > If you have 10 tablets per tablet server, then tablets can be
>> > compacted in parallel w/ less work to do at any given point in time.
>> > This can avoid memory filling up and writes being held.
>> >
>> > In short, it's possible that adding good split points to the table
>> > (and therefore creating more tablets) may help w/ this issue.
>> >
>> > Also, are you seeing hold times?
>> >
>> > On Thu, Jul 30, 2015 at 11:24 PM, Hai Pham <[email protected]> wrote:
>> >
>> > Hey William, Josh and David,
>> >
>> > Thanks for explaining, I might not have been clear: I used the web
>> > interface on port 50095 to monitor the real-time charts (ingest,
>> > scan, load average, minor compaction, major compaction, ...).
>> >
>> > Nonetheless, as I witnessed, when I ingested about 100k entries ->
>> > then minor compaction happened -> ingest was stuck -> the level of
>> > minor compaction on the charts was only about 1.0, 2.0 and at most
>> > 3.0, while about >20k entries were forced out of memory (I knew this
>> > by looking at the number of entries in memory w.r.t. the table being
>> > ingested into) -> then when the minor compaction ended, ingest
>> > resumed, somewhat faster.
>> >
>> > Thus I presume the levels 1.0, 2.0, 3.0 are not representative of the
>> > number of files being minor-compacted from memory?
>> >
>> > Hai
>> > ________________________________________
>> > From: Josh Elser <[email protected]>
>> > Sent: Thursday, July 30, 2015 7:12 PM
>> > To: [email protected]
>> > Subject: Re: How to control Minor Compaction by programming
>> >
>> > > Also, can you please explain the numbers 0, 1.0, 2.0, ... in the
>> > > charts (web monitoring) denoting the level of Minor Compaction and
>> > > Major Compaction?
>> >
>> > On the monitor, the number of compactions is shown in the form:
>> >
>> > active (queued)
>> >
>> > e.g. 4 (2) would mean that 4 are running and 2 are queued.
>> >
>> > > Thank you!
>> > >
>> > > Hai Pham
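Along the same lines, a minimal sketch of regulating ingest pressure on the client side, as suggested in the thread, by bounding the BatchWriter's buffer, write threads, and flush latency; the connection details, limits, and table name are placeholders rather than values from this discussion:

// Sketch only: bound the client-side write buffer and thread count so a
// single ingester is less likely to fill the tablet servers' in-memory maps
// and trigger long hold times. All names and limits below are placeholders.
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.Text;

public class RegulatedIngest {
  public static void main(String[] args) throws Exception {
    Connector conn = new ZooKeeperInstance("accumulo", "zkhost:2181")
        .getConnector("root", new PasswordToken("secret"));

    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(32 * 1024 * 1024)        // buffer at most 32 MB client-side
        .setMaxWriteThreads(4)                 // limit concurrent writes to tservers
        .setMaxLatency(30, TimeUnit.SECONDS);  // flush buffered data at least every 30 s

    BatchWriter bw = conn.createBatchWriter("testtable", cfg);
    try {
      for (int i = 0; i < 100000; i++) {
        Mutation m = new Mutation(new Text(String.format("row_%08d", i)));
        m.put(new Text("cf"), new Text("cq"), new Value(("value" + i).getBytes()));
        bw.addMutation(m);
      }
    } finally {
      bw.close();  // flushes any remaining buffered mutations
    }
  }
}

Smaller buffers and fewer write threads slow a single client down, so this trades raw ingest speed for steadier memory pressure on the tservers; combined with more tablets per server, it is one way to keep hold times from building up.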
