This is the help I am getting from Symantec... hang tight, the next mail should arrive soon...
Hampus Lind
Rikspolisstyrelsen / National Police Board
Tel dir: +46 (0)8 - 401 99 43
Tel mob: +46 (0)70 - 217 92 66
E-mail: [EMAIL PROTECTED]

-----Original message-----

> You haven't really answered anything, just talked about how things should
> work when everything is OK.

That's just it; NetBackup is not doing anything abnormal here (i.e., it's operating as designed given the environment it's running in). There's nothing we can do at the software level to "fix" performance bottlenecks at the filesystem level; it will only operate as quickly as the system calls allow. Every problem you brought up can be traced back to this one core issue.

Think of it this way: if you fill your tank with the wrong type of petrol and the vehicle starts sputtering, the mechanic will conclude that the engine is working as well as it can under the circumstances. The problem is the petrol, not the engine.

> 1. I could have a problem with my db, but if the bpdbm -consistency 2 check
> won't finish, who can I tell? If the bpdbm -consistency 2 check hangs, again,
> how can I tell what's wrong?

Like I said previously, bpdbm -consistency is the tool... we don't have "alternate" tools or anything like that (what's the point in re-inventing the wheel?). Your only other option is to manually check each and every image for oddities. And even then there's no guarantee you'll spot the corruption if it exists, because over half of your image files are in binary format, which is impossible to examine by eye.

> 2. Maybe I haven't been clear about my problem. The bpdbm processes don't go
> away; they are always there and always working on something... So how can I
> move on?

I didn't find any evidence that bpdbm was caught in an infinite loop. All PIDs are making progress on their respective tasks, albeit slow progress. I even checked to make sure the bpdbm processes weren't stepping on each other's feet: they're all doing separate tasks independently of one another, and no process was repeating work that another bpdbm was already handling. All the evidence points to file-read operations taking a long time to complete, and that is not a problem an application can fix.

Let's say, for the sake of argument, that we could change NetBackup's behavior so it doesn't spawn so many processes at once (which isn't actually possible, but let's assume it for a second). Would that solve the problem? No. It would still have to perform the same number of operations, because it still has to walk the same data set as it does today. In fact, the process might get *worse*, not better, because the entire operation would take longer.

Disabling it entirely is not possible without shutting down bpdbm itself (a bad idea anyway, since the image cleanup process is vital for the application to function), at which point just about nothing in NetBackup would work: no backups, and definitely no restores.

So in summary, you have to wait until it finishes on its own. If the process takes more than 12 hours to complete, you're really stuck: absolutely nothing can be done at the software level until something is done with the images database, or the filesystem it resides on is fixed.
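If it helps to see it concretely, this is roughly how that check can be run so its output lands somewhere you can watch (the /usr/openv path is the default install location and the log name is just an example; adjust both for your system):

    # Run the consistency check with its output captured to a log,
    # so you can see it making progress rather than appearing hung.
    /usr/openv/netbackup/bin/bpdbm -consistency 2 > /tmp/bpdbm_check.log 2>&1 &

    # In another session, watch the log grow.
    tail -f /tmp/bpdbm_check.log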
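If you want to verify the read-speed diagnosis yourself, a crude test along these lines is enough (again assuming the default catalog location; point it at wherever your images database actually lives):

    # Sample the bpdbm PIDs a few minutes apart; growing TIME values
    # mean the processes are making progress, not hanging.
    ps -ef | grep '[b]pdbm'

    # Time a plain walk of the images database. If even this simple
    # read pass crawls, bpdbm is bound by the same filesystem latency.
    time find /usr/openv/netbackup/db/images -type f | wc -l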
> 4. Our db is about 60-65 GB; there are NetBackup customers with much bigger
> NBU databases. This is supposed to be an enterprise solution, and it should
> therefore be able to handle this payload.

Not many customers have as many individual images as you do. Keep in mind that there's more to this than "how much data am I backing up". If the bulk of your backups are Oracle RMAN, the number of inodes in your environment increases dramatically. Looking at an images database, I can almost always tell RMAN backups from regular backups just by the number of streams generated in one go; the difference is not insignificant.

Since images databases are unique to each customer (no two are the same in a production environment), I can't give you the cookie-cutter solution I'm sure you would like to have. These things have to be analyzed on a case-by-case basis, and even enterprise solutions are limited by the environment they run in. You could own the nicest, most expensive BMW in the world, but if you don't have a road to drive it on, it won't work as well as you'd like.

> 5. I have followed HP's suggestions:
> - I have patched the OS

Recently?

> - I have run defrag on that filesystem

That's not a bad idea, but it usually has minimal effect on modern UNIX operating systems, including HP-UX, because the filesystem driver defragments on the fly during normal operation anyway.

> - I have increased scsi_queue depth

That will prevent SCSI write failures, but it won't necessarily make things run faster. It's like standing in line at the bank: the line doesn't move any faster because it's longer; you're just putting more people in it. (Rough examples of both of these commands are at the end of this mail.)

I've done my best to find a problem we can address at the software level. I can't find anything to negate HP's recommendation, and if I could, I would have relayed that information. At this point I'm not sure what else to tell you; the logs aren't changing their story. The bpdbm process is doing what it was designed to do, but something external is throttling it. That's where the problem is, and that's why I'm suggesting you follow HP's solution. Beyond that, we would have to look at a consulting engagement to re-architect this environment and distribute the images database somewhat.
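For completeness, here is roughly what the defrag and queue-depth items above look like on HP-UX with VxFS. Treat these as sketches only: the mount point and disk device are placeholders, and exact options vary by OS release, so check the fsadm_vxfs(1M) and scsictl(1M) man pages before running anything.

    # Report extent and directory fragmentation on the images filesystem,
    # then reorganize only if the report looks bad.
    fsadm -F vxfs -D -E /images_fs
    fsadm -F vxfs -d -e /images_fs

    # Show the disk's current SCSI mode parameters, then set a deeper
    # command queue on it.
    scsictl -a /dev/rdsk/c2t0d0
    scsictl -m queue_depth=16 /dev/rdsk/c2t0d0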