Take a look at: http://www.fpaste.org/86434/51637081/
2014-03-18 14:15 GMT-03:00 Alena Prokharchyk <alena.prokharc...@citrix.com>: > Diego, > > Check for these messages in the CS log: > > > [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-3-VM","bytesSen > t":0,"bytesReceived":0,"result":true,"details":"success","wait":0}}] } > > > And see if you are seeing non-0 bytesReceived/bytesSent reported from > the backend for any day after 2013-11-08 > > -Alena. > > From: Diego Spinola Castro <spinolacas...@gmail.com> > Date: Tuesday, March 18, 2014 at 10:12 AM > > To: Alena Prokharchyk <alena.prokharc...@citrix.com> > Cc: "users@cloudstack.apache.org" <users@cloudstack.apache.org>, Kishan > Kavala <kishan.kav...@citrix.com> > Subject: Re: NetworkUsage > > Hi Alena, there's no records after 2013-11-08 (when i did the upgrade > to CS 4.2). > > > If there are no records for Feb2 in the DB, can you check if CS reported > any usage during this day? How do i do that? > > > 2014-03-18 13:45 GMT-03:00 Alena Prokharchyk <alena.prokharc...@citrix.com > >: > >> Ok, so having 0 for some router records is actually fine as either the >> Vrs are stopped, or there are no vms. Good you've confirmed that's not the >> problem. >> >> Now we have to debug something different - API not returning certain >> types of usage. For that, you have to: >> >> >> 1. go to cloud_usage DB >> 2. Select * from cloudusage where usage_type in (4,5) >> >> And see if there are any records for the Feb 2 are present. If there are, >> then its an API problem - when data is present in the DB, but API doesn't >> return it. >> >> If there are no records for Feb2 in the DB, can you check if CS >> reported any usage during this day? >> >> Thanks, >> Alena. >> >> From: Diego Spinola Castro <spinolacas...@gmail.com> >> Date: Tuesday, March 18, 2014 at 8:46 AM >> To: Alena Prokharchyk <alena.prokharc...@citrix.com> >> Cc: "users@cloudstack.apache.org" <users@cloudstack.apache.org>, Kishan >> Kavala <kishan.kav...@citrix.com> >> Subject: Re: NetworkUsage >> >> HI Alena, thanks for the answer. Let me explain better my case. >> Regarding those accounts , i have a lot of them without any instance >> running, there's also some vrouters of that list which are even stopped. >> I'm trying get the traffic count from usage api, i don't know why but all >> usage type 4 is zero, there's the commands i'm running: >> >> Trying get UsageType=4 >> curl " >> http://localhost:8096/client/api?command=listUsageRecords&startdate=2014-02-01&enddate=2014-02-01&type=4 >> " >> Neither Type 4 nor 5 are returning. >> >> This used to work before upgrade to 4.2. >> BTW i have only one management server. >> >> >> >> >> 2014-03-17 20:54 GMT-03:00 Alena Prokharchyk < >> alena.prokharc...@citrix.com>: >> >>> Diego, spent some more time on debugging your issue. Might not be related >>> to transaction refactoring as it went in after 4.2. And records for some >>> accounts actually got non-0 network stat: >>> >>> 0 network stats: >>> >>> mysql> select count(*) from user_statistics where net_bytes_received=0 or >>> net_bytes_sent=0 order by account_id; >>> +----------+ >>> | count(*) | >>> +----------+ >>> | 26 | >>> +----------+ >>> 1 row in set (0.00 sec) >>> >>> Non-0 network stats >>> >>> mysql> select count(*) from user_statistics where net_bytes_received!=0 >>> or >>> net_bytes_sent!=0 order by account_id; >>> +----------+ >>> | count(*) | >>> +----------+ >>> | 207 | >>> +----------+ >>> 1 row in set (0.00 sec) >>> >>> >>> >>> Adding Kishan to the thread as he has more expertise in Network usage >>> calculation and he might be able to help us. >>> >>> >>> I would divide problematic accounts (Networks) that are not getting usage >>> stats update, into 2 categories: >>> >>> CATEGORY #1 Networks having non-0 current_bytes* and 0 net_bytes received >>> >>> This situation is totally normal as net_bytes_received get updated only >>> when VR gets stopped. Yet on Diego's setup, those entries are not counted >>> when do network usage calculation. Here is the account having problems: >>> >>> mysql> select * from user_statistics where account_id=176; >>> >>> +-----+----------------+------------+-------------------+-----------+------ >>> >>> --------+------------+--------------------+----------------+--------------- >>> ---------+--------------------+--------------------+----------------+ >>> | id | data_center_id | account_id | public_ip_address | device_id | >>> device_type | network_id | net_bytes_received | net_bytes_sent | >>> current_bytes_received | current_bytes_sent | agg_bytes_received | >>> agg_bytes_sent | >>> >>> +-----+----------------+------------+-------------------+-----------+------ >>> >>> --------+------------+--------------------+----------------+--------------- >>> ---------+--------------------+--------------------+----------------+ >>> | 228 | 1 | 176 | NULL | 1193 | >>> DomainRouter | 366 | 0 | 0 | >>> 4271398427 | 2014478889 | 0 | >>> 0 | >>> >>> +-----+----------------+------------+-------------------+-----------+------ >>> >>> --------+------------+--------------------+----------------+--------------- >>> ---------+--------------------+--------------------+----------------+ >>> 1 row in set (0.00 sec) >>> >>> >>> >>> And here is the Usage log: >>> >>> >>> 1. 2014-03-17 03:15:02,779 DEBUG [usage.parser.NetworkUsageParser] >>> (Usage-Job-1:null) Parsing all Network usage events for account: 176 >>> >>> 2. 2014-03-17 03:15:02,779 DEBUG [usage.parser.NetworkUsageParser] >>> (Usage-Job-1:null) No usage record (0 bytes used) generated for account: >>> 176 >>> >>> >>> >>> >>> Kishan, is it by design not to count current_bytes in network usage? Can >>> you please double check the code, because it doesn't look right to me as >>> the VR may not go into Stopped state for a while, yet we have to >>> calculate >>> the usage >>> >>> >>> CATEGORY #2 >>> >>> Accounts (networks) having all 0s for >>> >>> net_bytes_received/net_bytes_sent/current_bytes_received/current_bytes_sent >>> >>> Diego, for those accounts we have to figure out whether the backend >>> reporting any usage for the Vrs. To debug that, you have to: >>> >>> A) get all the VR names >>> >>> mysql> select name from vm_instance where removed is null and >>> vm_type='DomainRouter' and account_id in (select account_id from >>> user_statistics where net_bytes_received = 0 and net_bytes_sent=0 and >>> current_bytes_received=0 and curre >>> +-----------+ >>> | name | >>> +-----------+ >>> | r-621-VM | >>> | r-651-VM | >>> | r-658-VM | >>> | r-671-VM | >>> | r-682-VM | >>> | r-886-VM | >>> | r-892-VM | >>> | r-915-VM | >>> | r-934-VM | >>> | r-935-VM | >>> | r-986-VM | >>> | r-1002-VM | >>> | r-1004-VM | >>> | r-1030-VM | >>> | r-1148-VM | >>> | r-1205-VM | >>> | r-1221-VM | >>> +-----------+ >>> >>> >>> B) Search for NetworkUsageAnswer and routerName in the log to locate >>> statements like that. Do it on your today's log and the log(s) from >>> couple >>> of days before. You might also want to run some traffic >>> >>> >>> 2014-03-17 14:41:12,661 DEBUG [agent.transport.Request] >>> (DirectAgent-15:null) Seq 1-281411626: Processing: { Ans: , MgmtId: >>> 4278190080, via: 1, Ver: v1, Flags: 10, >>> >>> [{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-3-VM","bytesSen >>> t":0,"bytesReceived":0,"result":true,"details":"success","wait":0}}] } >>> >>> >>> If you see that the backend reports non-0 value, it means that the >>> Management server fails to save it. If the backend reports 0, and yet you >>> are sure that there is a network traffic generated on the routers above, >>> there might be some error in a way we collect the traffic on the VR. >>> >>> >>> Also how many management servers do you have in your setup? >>> >>> >>> -Alena. >>> >>> >>> >>> On 3/17/14, 12:05 PM, "Alena Prokharchyk" <alena.prokharc...@citrix.com> >>> wrote: >>> >>> >Diego, please go ahead and file a bug with Priority Critical against CS. >>> >Looks like there is some bug in a way to commit db statement in >>> >transaction resulting in the DB not being updated: >>> > >>> >2014-03-17 13:14:15,431 TRACE [db.Transaction.Transaction] >>> >(RouterMonitor-1:null) txn: Not committing because transaction started >>> >elsewhere: >>> >>> >-GenericDaoBase.update:814-ComponentInstantiationPostProcessor$Interceptor >>> >>> >Dispatcher.intercept:125-GenericDaoBase.update:774-ComponentInstantiationP >>> >>> >ostProcessor$InterceptorDispatcher.intercept:125-GenericDaoBase.update:126 >>> >>> >2-ComponentInstantiationPostProcessor$InterceptorDispatcher.intercept:125- >>> >>> >VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:948-Executors$Runn >>> >>> >ableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runAn >>> >>> >dReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165- >>> >ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267 / >>> >>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run >>> >>> >nableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runA >>> >ndReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165 >>> >>> >-ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267-ThreadPoolExecuto >>> >r.runWorker:1146-ThreadPoolExecutor$Worker.run:615-Thread.run:701 : >>> >update, update, update, >>> >>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run >>> >>> >nableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runA >>> >ndReset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165 >>> >>> >-ScheduledThreadPoolExecutor$ScheduledFutureTask.run:267-ThreadPoolExecuto >>> >r.runWorker:1146-ThreadPoolExecutor$Worker.run:615-Thread.run:701, >>> > >>> > >>> > Attach the latest log from this email to the bug. >>> > >>> >Unfortunately, there is no way to workaround it on your side as its a >>> >code issue. >>> > >>> >-Alena. >>> > >>> >From: Diego Spinola Castro >>> ><spinolacas...@gmail.com<mailto:spinolacas...@gmail.com>> >>> >Date: Monday, March 17, 2014 at 11:25 AM >>> >To: Alena Prokharchyk >>> ><alena.prokharc...@citrix.com<mailto:alena.prokharc...@citrix.com>> >>> >Cc: "users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>" >>> ><users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>> >>> >Subject: Re: NetworkUsage >>> > >>> >Hi, sorry. i went lost with this huge log. >>> > >>> > >>> > >>> >2014-03-17 15:10 GMT-03:00 Alena Prokharchyk >>> ><alena.prokharc...@citrix.com<mailto:alena.prokharc...@citrix.com>>: >>> >Diego, the log you've provided, doesn't contain "Rolling back the >>> >transaction" statement. You should wait for this particular error to >>> >happen (while logging on TRACE), and then take a log snippet. >>> > >>> >From: Diego Spinola Castro >>> ><spinolacas...@gmail.com<mailto:spinolacas...@gmail.com>> >>> >Date: Monday, March 17, 2014 at 11:00 AM >>> >To: Alena Prokharchyk >>> ><alena.prokharc...@citrix.com<mailto:alena.prokharc...@citrix.com>> >>> >Cc: "users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>" >>> ><users@cloudstack.apache.org<mailto:users@cloudstack.apache.org>> >>> >Subject: Re: NetworkUsage >>> > >>> >Hi, Alena. thanks for the tips, i got some logs: >>> >http://www.fpaste.org/86141/13950791/ >>> > >>> > >>> >2014-03-17 14:27 GMT-03:00 Alena Prokharchyk >>> ><alena.prokharc...@citrix.com<mailto:alena.prokharc...@citrix.com>>: >>> >Diego, >>> > >>> >Unfortunately CS lock lack the needed info for debugging the problem. I >>> >see only transaction roll back statement: >>> > >>> >2014-03-17 06:01:22,363 DEBUG [db.Transaction.Transaction] >>> >(RouterMonitor-1:null) Rolling back the transaction: Time = 2 Name = >>> >>> >-VirtualNetworkApplianceManagerImpl$NetworkUsageTask.run:900-Executors$Run >>> >n >>> >>> >ableAdapter.call:471-FutureTask$Sync.innerRunAndReset:351-FutureTask.runAn >>> >d >>> >>> >Reset:178-ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201:165- >>> > >>> > >>> >To debug it further, you need to change the log level from DEBUG to >>> TRACE >>> >for a certain class by modifying log4 properties (no MS restart is >>> >required) >>> > >>> > >>> >Here is the link on how to do it: >>> > >>> > >>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Useful+Tips+and+Tri >>> >c >>> >ks, "How to enable trace logging for a particular class in CloudStack?² >>> >section. Replace category name with: >>> > >>> ><category name=³com.cloud.utils.db"> >>> > <priority value=³TRACE"/> >>> > </category> >>> > >>> > >>> > >>> > >>> > >>> >After the change is done, wait till you see TRACE statements in the log, >>> >and till the error "Rolling back the transaction" occurs again. Take the >>> >log snippet and send it out. After you are done with that, you will need >>> >to remove TRACE config from log4 properties file, otherwise your logs >>> will >>> >grow huge. >>> > >>> >-Alena. >>> > >>> > >>> >On 3/17/14, 9:48 AM, "Diego Spinola Castro" >>> ><spinolacas...@gmail.com<mailto:spinolacas...@gmail.com>> >>> >wrote: >>> > >>> >>Hi guys, i've been running into a issue when getting network traffic >>> from >>> >>usage server. All records are Zero, i have a CS 4.2.0 install. >>> >> >>> >>Here the usage.logs http://www.fpaste.org/86116/13950748/ >>> >>And management-server.log http://www.fpaste.org/86118/07486613/ >>> >> >>> >>Does anyone know what is wrong ? >>> > >>> > >>> > >>> >>> >> >