Jack wrote:
> Hello,
> I've got 9 domainUs. Each is a RHEL 5.2 instance with 1G RAM, 1 CPU, and
> a 100G drive. They are paravirtualized. The drives are created like this:
> pfexec zfs create -s -V 100G datastore/virtMachine1
> The hardware is a Dell 2900 with 48G RAM and 3.06T of 15k rpm SAS drives.
> OpenSolaris seems to be fairly happy on this system.
This doesn't have anything to do with your problem...
But I'm curious, what do you have for CPUs?
Are you limiting dom0 memory on boot? What about
the number of CPUs dom0 can use? e.g.
kernel /boot/amd64/xen.gz com1=9600,8n1 console=com1 dom0_mem=2g dom0_max_vcpus=2 dom0_vcpus_pin=true
> When I ran everything in zones, it was all fine and fast. However, the
> vendor requires RHEL, and I refuse to give up ZFS, so I had to fire up xVM
> just so I could run MySQL inside an x86 container called RHEL 5.2.
> Anyway, these domainUs boot, run, and generally work fine (about 17%
> slower than zones, btw).
> Except that they crash pretty regularly, after anywhere between 6 and 10 days.
What version of OpenSolaris are you using? Are you using the stock Xen bits
(the ones that come with OpenSolaris)? The bug looks familiar, but I'll have
to do some searching...
MRJ
I've been searching forums, etc. Not sure what to do. Here's a log entry:
Jul 1 15:15:08 ecw-mysql1 unix: [ID 836849 kern.notice]
Jul 1 15:15:08 ecw-mysql1 ^Mpanic[cpu0]/thread=ffffff005b7e1c80:
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 683410 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff005b7e1120 addr=fffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 unix: [ID 839527 kern.notice] sched:
Jul 1 15:15:08 ecw-mysql1 unix: [ID 753105 kern.notice] #pf Page fault
Jul 1 15:15:08 ecw-mysql1 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0xfffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffffb8a0663, sp=0xffffff005b7e1218, eflags=0x10246
Jul 1 15:15:08 ecw-mysql1 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 2660<vmxe,xmme,fxsr,mce,pae>
Jul 1 15:15:08 ecw-mysql1 unix: [ID 624947 kern.notice] cr2: fffffe0a3e18ec20
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rdi: fffffe0a3e18ec20 rsi: 0 rdx: e0508673
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rcx: 3 r8: 0 r9: ffffff0cb9384000
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] rax: 0 rbx: e0508673 rbp: ffffff005b7e12b0
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] r10: 0 r11: ffffff0000002000 r12: 0
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] r13: 1 r14: fffffe0a3e18ec20 r15: e0508673
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] fsb: 0 gsb: fffffffffbc5ef70 ds: 4b
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] trp: e err: 3 rip: fffffffffb8a0663
Jul 1 15:15:08 ecw-mysql1 unix: [ID 592667 kern.notice] cs: e030 rfl: 10246 rsp: ffffff005b7e1218
Jul 1 15:15:08 ecw-mysql1 unix: [ID 266532 kern.notice] ss: e02b
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1000 unix:die+10f ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1110 unix:trap+1768 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1120 unix:_cmntrap+12f ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e12b0 unix:atomic_cas_ptr+3 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1350 unix:hati_pte_map+160 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e13d0 unix:hati_load_common+15d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1490 unix:hat_devload+15d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e14f0 rootnex:rootnex_map_regspec+151 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e15a0 rootnex:rootnex_map+141 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e15f0 genunix:ddi_map+51 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e16e0 npe:npe_bus_map+43d ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1720 pcie_pci:pepb_bus_map+31 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1760 pcie_pci:pepb_bus_map+31 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e17b0 genunix:ddi_map+51 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1870 genunix:ddi_regs_map_setup+d5 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e18c0 genunix:pci_config_setup+69 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1900 pcie:pcie_init_bus+41 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1a30 pcie_pci:pepb_initchild+bc ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1ab0 pcie_pci:pepb_ctlops+276 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1af0 genunix:init_node+78 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1b30 genunix:i_ndi_config_node+fa ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1b60 genunix:i_ndi_init_hw_children+48 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1bc0 genunix:config_immediate_children+83 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c10 genunix:devi_config_common+a6 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c60 genunix:mt_config_thread+53 ()
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 655072 kern.notice] ffffff005b7e1c70 unix:thread_start+8 ()
Jul 1 15:15:08 ecw-mysql1 unix: [ID 100000 kern.notice]
Jul 1 15:15:08 ecw-mysql1 genunix: [ID 672855 kern.notice] syncing file systems...
Jul 1 15:15:09 ecw-mysql1 genunix: [ID 904073 kern.notice] done
Jul 1 15:15:10 ecw-mysql1 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Jul 1 15:17:51 ecw-mysql1 genunix: [ID 409368 kern.notice] ^M100% done: 1588175 pages dumped, compression ratio 3.44,
Jul 1 15:17:51 ecw-mysql1 genunix: [ID 851671 kern.notice] dump succeeded
Anyway, sometimes the logs blame the xVM hypervisor for the crash, sometimes
not. I've got twin Dell 2900s and have moved the domainUs from one machine to
the other, with the same results.
Name          ID    Mem VCPUs   State  Time(s)
Def                1024     1              0.0
Domain-0       0  34154     8  r-----   3871.7
EDB_Bs         1   1024     1  -b----    803.8
EDB_Faare      8   2048     2  -b----   1099.3
EDB_Gral       7   1024     1  -b----    185.2
EDB_NC         6   1024     1  -b----    290.0
EDB_Tg         2   1024     1  -b----     45.0
EDB_Way        3   1024     1  -b----     62.3
EDB_Wel        5   1024     1  -b----    278.2
EDB_Wnd        9   1024     1  -b----    306.9
EHX_Dbase     10   4096     1  -b----     51.3
Iine           4   1024     1  -b----     76.0
Repair              512     1             13.1
Anyway, the crashes occur when the Time(s) for any one domainU gets up around 25000 or so. These are production databases, so they do get a lot of work.
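One way to check that correlation is to log the Time(s) column regularly and flag any domainU that is getting close to the value where crashes have been seen. The Python sketch below is only an illustration under stated assumptions: it expects `xm list` output with one domain per line, the function names and the 80% warning fraction are made up, the 25000 limit is just the rough figure quoted above, and the `EXAMPLE_busy` row is hypothetical data added so that something trips the check.

```python
def parse_xm_list(text):
    """Map each domain name to its Time(s) value from `xm list` output."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "Name":
            continue  # skip blank lines and the header row
        try:
            # Time(s) is always the last column, even when ID/State are empty.
            times[fields[0]] = float(fields[-1])
        except ValueError:
            continue  # last column is not a number; ignore the row
    return times


def near_limit(times, limit=25000.0, fraction=0.8):
    """Return the names of domainUs whose Time(s) is at least fraction * limit."""
    return sorted(name for name, secs in times.items()
                  if name != "Domain-0" and secs >= limit * fraction)


# Rows taken from the listing above, plus one HYPOTHETICAL row (EXAMPLE_busy).
SAMPLE = """\
Name ID Mem VCPUs State Time(s)
Def 1024 1 0.0
Domain-0 0 34154 8 r----- 3871.7
EDB_Bs 1 1024 1 -b---- 803.8
EDB_Faare 8 2048 2 -b---- 1099.3
EXAMPLE_busy 11 1024 1 -b---- 24100.0
"""

if __name__ == "__main__":
    times = parse_xm_list(SAMPLE)
    for name in near_limit(times):
        print(f"{name}: {times[name]:.1f}s of CPU time, close to the crash range")
```

If the domainUs really do die near the same Time(s) value every time, a record like this would make that easy to show in a bug report.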
Anyway, it's aggravating when the servers die like that, but ZFS is there
helping out, so that's nice.
No idea if any of this makes sense; it's late, and I'm not too concerned about
it anymore. However, any help would be great!
Thanks,
Jack
_______________________________________________
xen-discuss mailing list