Soren,

Thanks for the follow-up ... I suspected that you had just typo'd your
example scenario, but wanted to get it clarified for me and everyone else
following along.


> I hear you. All my servers are, in fact, remote. I'm however in the
> happy situation that if a machine fails to come online after a reboot, I
> can boot up a RAM-based rescue system from whence I can diagnose the
> system. I realise you might not be as fortunate.


I'd like to hear more about this ... is the rescue system your own,
something proprietary, open source, or ???

I've tried serial terminal servers in the past, but PC BIOS just isn't that
bright (unlike Sun OBP and the firmware from the other old skool Unix
vendors, which have handled this forever).  I've gotten some tty access, but
if the server is busted at too low a level, you are basically DOA.

For these new Hardy servers I'm building, I'm playing with the idea of
booting off of a USB stick, but USB is _slow_ and the read-write nature of a
running system (even just the Xen server host) would likely cause rapid
failure of the USB sticks.  My next wild idea was to build a 3-way RAID1 for
the root disk (USB + 2 internal HD partitions of 2.1GB), sync everything,
then pull the USB ... plug it in weekly to sync up and then yank it again.
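
To make the 3-way RAID1 idea concrete, here is a rough sketch of the mdadm
steps I have in mind, written as a small Python script that only builds and
prints the command lines (it runs nothing).  The array and device names are
made-up placeholders, and the write-intent bitmap is my own assumption for
keeping the weekly resync short; I have not tested this on Hardy.

```python
# Sketch only: builds mdadm command lines for a 3-way RAID1 root mirror where
# the USB member is detached between weekly syncs.  Nothing is executed.
# All device names below are hypothetical placeholders.

ARRAY = "/dev/md0"
INTERNAL = ["/dev/sda1", "/dev/sdb1"]   # the two internal HD partitions
USB = "/dev/sdc1"                       # the USB stick partition


def create_cmd():
    """Create the 3-way mirror; the internal bitmap is an assumption that
    should let a later re-add copy only the blocks that changed."""
    members = INTERNAL + [USB]
    return ["mdadm", "--create", ARRAY, "--level=1",
            "--raid-devices=%d" % len(members),
            "--bitmap=internal"] + members


def detach_usb_cmds():
    """Mark the USB member failed, then remove it, before pulling the stick."""
    return [["mdadm", ARRAY, "--fail", USB],
            ["mdadm", ARRAY, "--remove", USB]]


def reattach_usb_cmd():
    """Put the USB member back into the array for the weekly sync."""
    return ["mdadm", ARRAY, "--re-add", USB]


if __name__ == "__main__":
    for cmd in [create_cmd()] + detach_usb_cmds() + [reattach_usb_cmd()]:
        print(" ".join(cmd))
```

Note that between syncs the array would run with 2 of its 3 members, so it
would report itself as degraded the whole time the stick is out.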

I was hoping to avoid a repeat of a situation we had a few months ago, where
an apt-get (or some function in a post-install) "fixed" the grub menu.lst
and left the server unbootable.  That is what I was referring to when I
mentioned walking a not-really-technical user through booting a Live CD and
doing network config.  That _sucked_, but we got the box back eventually --
the error message from grub was pretty baffling and completely misleading,
of course.

>
> Notwithstanding, I'd still prefer the system not just boot without any
> sort of interaction with an admin of some sort. A simple dialog asking
> if you understand the risks involved and still want to continue booting
> would be perfectly acceptable. That would make the required guidance
> much simpler.
>

I can see the system not auto-booting, but it should at least offer the
option to select "yes, I know the system is broken -- boot with networking
so my admin can fix it".  That would be acceptable.
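
The policy I have in mind could be sketched roughly like this.  The function
name and the returned strings are purely illustrative; this is not how any
existing initramfs behaves, just the decision I would like it to make.

```python
# Sketch of the boot policy discussed in this thread.  All names here are
# hypothetical; nothing in this block corresponds to a real initramfs hook.

def boot_action(root_degraded, admin_answer=None):
    """Decide what the boot process should do.

    root_degraded: True if the root array is missing a member.
    admin_answer:  None if nobody has answered the prompt yet,
                   otherwise "boot" or "halt".
    """
    if not root_degraded:
        return "boot-normally"
    if admin_answer == "boot":
        # "yes, I know the system is broken": come up with networking
        # so a remote admin can log in and fix it.
        return "boot-degraded-with-networking"
    # No answer, or an explicit refusal: never auto-boot a degraded root.
    return "wait-at-prompt"


if __name__ == "__main__":
    print(boot_action(False))
    print(boot_action(True))
    print(boot_action(True, "boot"))
```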


> <snip>
>
> I understand. The perfect solution for me would be an ssh server in the
> initramfs so that I could ssh into the server and take a look around,
> reassure myself that the faulty disk has been properly identified, etc.,
> etc. and then take appropriate action.
>

Yeah, having network smarts and ssh (on an alternate port, since you might
not be able to read the on-disk password file?) would be great!

>
> <snip>
>
> This is all very valuable input. It's good to have some leverage when we
> engage in the discussion about perhaps getting this functionality pushed
> back into hardy.
>

My big push to my customers for Ubuntu *is* the LTS feature ... many of them
have been burned by unsupported and un-upgradable RH systems (I've got a
few 6.2 systems out there still ... ugh).  Getting what I would consider a
mission critical feature for a SERVER pushed back into the LTS server
release would do a lot for the argument that Ubuntu is "Enterprise Ready"
and willing to add the features needed to run "Mission Critical"
applications.

Thanks,
Sam
-- 
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
