Thanks a bunch, I think this is the root cause of most of my issues!
Elliot

Jayapal Reddy Uradi wrote:
Hi Elliot,

Reboot the router and see the management server for router startcommand. These values are passed in startcommand. If it has . at the end then check the database nics table for entry with guest ip. If nic table has entry with . then correct and restart the MS and restart VR.

Thanks,
Jayapal

On 16-Jul-2014, at 3:39 PM, Elliot Berg <elliot.b...@avcosystems.com> wrote:

Hi,

I've got

template=domP name=r-27-VM eth0ip=10.4.2.6 eth0mask=255.0.0.0. gateway=10.0.0.1 domain=cs1cloud.internal dhcprange=10.0.0.1 eth1ip=169.254.1.246 eth1mask=255.255.0.0 type=dhcpsrvr disable_rp_filter=true dns1=10.0.0.12 dns2= ip6dns1= ip6dns2=

In that file, which includes the incorrect netmask.

Elliot

Jayapal Reddy Uradi wrote:

Hi,

Check the /var/cache/cloud/cmdline for

eth0ip=10.1.1.1 eth0mask=255.255.255.0

If it is correct, then interfaces file is written wrongly. The /etc/network/interfaces updated from the cloud-early-config on router boot.

What you can do is put set -x in cloud-early-config and run /etc/init.d/cloud-early-config from the router. And observe the setup_interface for how /etc/network/interfaces is written.

Thanks,
Jayapal

On 16-Jul-2014, at 3:07 PM, Elliot Berg <elliot.b...@avcosystems.com> wrote:

Hi,

So that fails with the error

Error: an inet prefix is expected rather than "10.4.2.6/255.0.0.0.".
Failed to bring up eth0.

I went and looked at the router's /etc/network/interfaces file and spotted that the netmask has a "." on the end, as below. Removing that and then running ifup eth0 works, however when I reboot the router that file appears to be regenerated, as my change was undone. Does anyone know where the information to generate that file comes from?

iface eth0 inet static
    address 10.4.2.6
    netmask 255.0.0.0.

Thanks,
Elliot

Jayapal Reddy Uradi wrote:

Hi Elliot,

Can you please try 'ifup eth0' on the router. It seems there is delay in bringing up the eth0 interface.
Thanks,
Jayapal

On 16-Jul-2014, at 12:40 PM, Elliot Berg <elliot.b...@avcosystems.com> wrote:

I've already had to flatten and start again so I'd rather avoid it - but my suspicion is that all of this is related to the kvm host's networking somehow. I followed the instructions on the cloudstack install guide, and ended up with the below - does it look right to you guys?

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto cloudbr0
iface cloudbr0 inet static
    bridge_ports eth0
    bridge_fd 5
    bridge_stp off
    bridge_maxwait 1
    address 10.4.0.2
    netmask 255.0.0.0
    network 10.0.0.0
    broadcast 10.255.255.255
    gateway 10.0.0.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 10.0.0.12
    dns-search avco

auto cloudbr1
iface cloudbr1 inet manual
    bridge_ports eth0
    bridge_fd 5
    bridge_stp off
    bridge_maxwait 1

Many Thanks,
Elliot

Elliot Berg wrote:

Hi,

Cloud.log contains the following just after the machine's rebooted:

Mon Jul 14 16:01:06 UTC 2014 checking that eth0 has IP
Mon Jul 14 16:01:07 UTC 2014 waiting for eth0 interface setup with ip timer=0
Mon Jul 14 16:01:08 UTC 2014 waiting for eth0 interface setup with ip timer=1
Mon Jul 14 16:01:09 UTC 2014 waiting for eth0 interface setup with ip timer=2
Mon Jul 14 16:01:10 UTC 2014 waiting for eth0 interface setup with ip timer=3
Mon Jul 14 16:01:11 UTC 2014 waiting for eth0 interface setup with ip timer=4
Mon Jul 14 16:01:12 UTC 2014 waiting for eth0 interface setup with ip timer=5
Mon Jul 14 16:01:13 UTC 2014 waiting for eth0 interface setup with ip timer=6
Mon Jul 14 16:01:14 UTC 2014 waiting for eth0 interface setup with ip timer=7
Mon Jul 14 16:01:15 UTC 2014 waiting for eth0 interface setup with ip timer=8
Mon Jul 14 16:01:16 UTC 2014 waiting for eth0 interface setup with ip timer=9
Mon Jul 14 16:01:17 UTC 2014 waiting for eth0 interface setup with ip timer=10
Mon Jul 14 16:01:18 UTC 2014 waiting for eth0 interface setup with ip timer=11
Mon Jul 14 16:01:19 UTC 2014 waiting for eth0 interface setup with ip timer=12
Mon Jul 14 16:01:20 UTC 2014 waiting for eth0 interface setup with ip timer=13
Mon Jul 14 16:01:21 UTC 2014 waiting for eth0 interface setup with ip timer=14
Mon Jul 14 16:01:22 UTC 2014 waiting for eth0 interface setup with ip timer=15
Mon Jul 14 16:01:23 UTC 2014 waiting for eth0 interface setup with ip timer=16
Mon Jul 14 16:01:23 UTC 2014 interface eth0 is not set up with ip... exiting

As I say, I'm wondering whether this indicates a more general networking issue on the host, as I'd have expected the virtual router to sort its own networking assuming the host's is fine?

Thanks,
Elliot

Jayapal Reddy Uradi wrote:

Hi,

Check the logs while the router is booting. Also check /var/log/cloud.log

Thanks,
Jayapal

On 14-Jul-2014, at 2:39 PM, Elliot Berg <elliot.b...@avcosystems.com> wrote:

Hi,

I did that earlier as part of the troubleshooting when it was stuck - so I've just looked at the logs instead of recreating it again as that was only just done. When you say the router logs, do you mean general logs on the virtual router machine? If so, syslog/messages/kern.log/daemon.log are all empty?

Elliot

Jayapal Reddy Uradi wrote:

Hi Elliot,

Try recreating the router (destroy the router and deploy a new VM; the router gets recreated). After recreation, if the problem still exists, check the router logs to see why the interfaces are not brought up.

Thanks,
jayapal

On 11-Jul-2014, at 1:38 PM, Elliot Berg <elliot.b...@avcosystems.com> wrote:

So, I'm wondering whether the guest not having the interfaces configured correctly (i.e. not having an IP) is just a symptom of more generally broken networking - my interfaces file for the KVM host is below, does anyone spot any issues?

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual

auto cloudbr0
iface cloudbr0 inet static
    bridge_ports eth0
    bridge_fd 5
    bridge_stp off
    bridge_maxwait 1
    address 10.4.0.2
    netmask 255.0.0.0
    network 10.0.0.0
    broadcast 10.255.255.255
    gateway 10.0.0.1
    # dns-* options are implemented by the resolvconf package, if installed
    dns-nameservers 10.0.0.12
    dns-search avco

auto cloudbr1
iface cloudbr1 inet manual
    bridge_ports eth0
    bridge_fd 5
    bridge_stp off
    bridge_maxwait 1

Thanks,
Elliot

Elliot Berg wrote:

Doh! I did, but forgot about it being on a funny port. Now that I'm into the VM I can see that it's not running, and fails to start when it tries to bind to the address that it should have on the guest range. I notice that "ifconfig -a" shows two NICs, only one of which is up (the one with the link local IP). I'm guessing that indicates a more general networking issue?

I think how it's laid out is 10.4.0.0-255 for physical machines (1 is the management server, 2 is the first host), 10.4.1.0-255 is the management network and 10.4.2.0-255 is the guest network...but it's possible I've misunderstood the networking config during setup?

What I really wanted was hosts on 10.4.0.0-255 and guests on 10.4.1.0-255 (and beyond), as in the future I'd like it to co-exist with our existing infrastructure while we migrate things - but I kept being told about conflicts etc when I tried to set up cloudstack like that during the initial set up process?

Thanks,
Elliot

Jayapal Reddy Uradi wrote:

Hi Elliot,

Did you ssh to VR using the ssh key ?
Ex: ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.3.196

If it is failed to ssh, then there is issue with the ssh keys.

Thanks,
Jayapal

On 09-Jul-2014, at 4:43 PM, Harikrishna Patnala <harikrishna.patn...@citrix.com> wrote:

1) Log into your KVM host.
2) Use command “virsh list”. This gives the list of VMs on the host.
3) Use command “virsh console <VirtualRouterId>” to log into the VR.

-Harikrishna

On 09-Jul-2014, at 3:52 pm, Elliot Berg <elliot.b...@avcosystems.com> wrote:

I don't know - I can't seem to ssh to the link local IP. It pings, but ssh times out. If I try and use the "connect to console" button in the gui, that too times out :(

Elliot

Harikrishna Patnala wrote:

From the logs:

2014-07-08 12:08:56,218 DEBUG [agent.transport.Request] (AgentManager-Handler-1:null) Seq 1-277348416: Processing: { Ans: , MgmtId: 159320647860937, via: 1, Ver: v1, Flags: 110, [{"com.cloud.agent.api.Answer":{"result":false,"details":"grep: /var/lib/misc/dnsmasq.leases: No such file or directory","wait":0}}] }

Can you check whether dnsmasq service is running in the Virtual Router ? If not, start the service and check for “/var/lib/misc/dnsmasq.leases”

-Harikrishna

On 08-Jul-2014, at 3:47 pm, Elliot Berg <elliot.b...@avcosystems.com> wrote:

Hi,

I've done that, and now there's a new virtual router which says it's running, however a deployment still fails. My latest lot of logs are available at https://dl.dropboxusercontent.com/u/47728104/management-server.log.gz, and there's now one thing in the op_it_work table with a step != 'Done', which is a ConsoleProxy. Interestingly if I look at the console proxy vm in the cloudstack management gui it says it's running, though.

Thanks,
Elliot

Harikrishna Patnala wrote:

Yes, mark the VR to stopped, destroy VR, mark the VR entry in op_it_work to “Done” and try deploying VM.

-Harikrishna

On 08-Jul-2014, at 12:44 pm, Elliot Berg <elliot.b...@avcosystems.com> wrote:

Hi,

It appears to be stuck in the "starting" state - so I don't get the option to reboot it or anything. If I change the state to stopped in the database directly will the management server attempt to start it again or do I need to do something more?

Thanks!
Elliot

Harikrishna Patnala wrote:

Is your Virtual Router up and running ? If it is in running state you can mark it Done and deploy a VM. If it is in stopped state try restarting it.

You can try updating the field as well.

-Harikrishna

On 07-Jul-2014, at 7:10 pm, Elliot Berg <elliot.b...@avcosystems.com> wrote:

I can see two entries that have the "step" field set to something other than "Done", one of them is

ConsoleProxy | Starting

and the other is

DomainRouter | Prepare

Am I safe to just delete the rows, or should I just update the field?

Thanks,
Elliot

Harikrishna Patnala wrote:

Do you see any work item pending for Virtual Router r-4-VM in “op_it_work” table ? If there are any, remove those entries and try VM deployment again.

I see in the logs that VR has a task pending

2014-07-07 10:28:15,934 WARN [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-5:job-48 = [ 22369802-b5aa-4b5a-a26d-1fab11241551 ]) The task item for vm VM[DomainRouter|r-4-VM] has been inactive for 418531

-Harikrishna

On 07-Jul-2014, at 2:18 pm, Elliot Berg <elliot.b...@avcosystems.com> wrote:

I'm still not really spotting anything indicating why it's not using the host, but I suspect that's just because I don't really know what I'm looking for - so I've zipped the whole log for today and stuffed it on dropbox at https://dl.dropboxusercontent.com/u/47728104/management-server.log.gz. Hopefully someone who's used cloudstack a lot more will have more success!

Thanks,
Elliot

Elliot Berg wrote:

I'm going back over everything and I've noticed something else - everywhere I've looked for how to use local storage says I should change two global settings;

* system.vm.use.local.storage = true
* use.local.storage = true

However I'm looking at my global settings and only the first exists (which I have set to true).

Elliot

Elliot Berg wrote:

Ah, so when looking back a bit further before (I was kind of only looking for exceptions higher up before now), I've just spotted this...

2014-07-03 10:48:28,765 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-3:job-46 = [ 92fb959d-edc5-4fe2-84a0-56001226e4ac ] FirstFitRoutingAllocator) Looking for speed=1000Mhz, Ram=1024
2014-07-03 10:48:28,765 DEBUG [allocator.impl.FirstFitAllocator] (Job-Executor-3:job-46 = [ 92fb959d-edc5-4fe2-84a0-56001226e4ac ] FirstFitRoutingAllocator) Host name: cloudstack-host1, hostId: 1 is in avoid set, skipping this and trying other available hosts

That's the one and only host - so I'm guessing that has something to do with it!

Elliot

--
Elliot Berg | Analyst Programmer/Network Team
Email: elliot.b...@avcosystems.com | Tel: 01753 213700 | Web: www.avcosystems.com
Avco Systems Ltd, Registered in England & Wales, Registration Number 1976620
Registered Office: Avco Systems | 17 Bath Road | Slough | SL1 3UF

ilya musayev wrote:

Elliot,

When you see such an error, there is usually a predecessor message that says CloudStack checked for X, Y and Z and found no suitable resources based on your configuration.

Put the logs on pastebin or some other site (strip out any private info you don't want to share).

I would also recommend cloudstack 4.3.1 (which is not officially out yet) but should come thru in the next several weeks. It's the latest stable release of CloudStack 4.3.0, with the latest bug fixes.

I've put a build for folks who want to try it out until we complete the official release process for ACS 4.3.1. Unzip the tgz and it should have the required RPMs with both Open Source and Non-Open Source modules.

http://www.cloudsand.com/cloudstack-4.3.0-1.tgz

Regards
ilya

On 7/2/14, 1:06 AM, Elliot Berg wrote:

Hi,

I've been putting together a cloudstack set-up for experimentation purposes - right now we're just trying to compare different platforms for private cloud infrastructure before we start getting too in depth with any of them.
I've added the cloudstack 4.2 apt repository, and I'm running on Ubuntu 12.04 LTS, and I believe I've followed all the installation guides correctly at the various stages. We've set up a management server, which is also an NFS server, however we're interested in using local storage for the majority of things, and have also set up a single KVM host which I believe is all configured correctly to use local storage.

If I look at the dashboard, I'm told I have more than enough resource in every section to create an instance the size I want to - which is a small offering I've created with just 1.0GHz and 1GB of RAM, with local storage. The host's not very powerful, but according to the dashboard I am using 1.50GHz/5.87GHz, 1.38GB/7.80GB, 3.55GB/285.95GB Secondary Storage, 1.03GB/450.99GB Local Storage and 0.00KB/571.90GB Primary Storage (I'm assuming that's meant to be a combination of the NFS server's primary storage offering and the local storage on the host, though the numbers don't quite make sense at first glance).

However, when I try to add an instance, I receive an InsufficientServerCapacityException and I'm struggling to work out why. I can't add an instance using a small shared storage offering either, but if I'm not mistaken that's expected because the zone and host are configured to use local storage.

The only thing I can think of is that the local storage isn't properly configured, but when I've looked it seems to be. Any pointers for how I can further diagnose this would be great - thanks in advance!

Elliot
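
As a rough sanity check on the dashboard figures quoted in the message above, the free headroom can be worked out directly. This is only a sketch: the `Resource` class and its field names are made up for illustration and are not part of CloudStack, and the check ignores any other limits the allocator may apply when choosing a host.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """One dashboard row plus the amount the new instance asks for (hypothetical helper)."""
    name: str
    used: float
    total: float
    requested: float

    @property
    def free(self) -> float:
        # Headroom left according to the dashboard.
        return self.total - self.used

    def fits(self) -> bool:
        # True when the request is no larger than the remaining headroom.
        return self.requested <= self.free


# Figures taken from the message: 1.50GHz/5.87GHz and 1.38GB/7.80GB in use,
# and an offering of 1.0GHz CPU with 1GB of RAM.
resources = [
    Resource("CPU (GHz)", used=1.50, total=5.87, requested=1.0),
    Resource("RAM (GB)", used=1.38, total=7.80, requested=1.0),
]

for r in resources:
    print(f"{r.name}: free={r.free:.2f}, requested={r.requested:.2f}, fits={r.fits()}")
```

On these numbers both requests fit within the free capacity, which suggests the InsufficientServerCapacityException comes from something other than raw CPU or RAM headroom.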