On 03/09/2016 06:18 AM, Amos Jeffries wrote:
> On 9/03/2016 4:59 a.m., Brendan Kearney wrote:
>> i have a roku4 device and it constantly has issues causing it to
>> buffer.  i want to try intercepting the traffic to see if i can smooth
>> out the rough spots.
> Squid is unlikely to help with this issue.
>
> "Buffering ..." issues are usually caused by:
>
> - broken algorithms on the device consuming data faster than it lets the
> remote endpoint be aware it can process, and/or
> - network level congestion, and/or
> - latency increase from excessive buffer sizes (on device, or network).
>
>
>> i can install squid on the router device i have
>> and intercept the port 80/443 traffic, but i want to push the traffic to
>> my load balanced VIP so the "real" proxies can do the fulfillment work.
> Each level of software you have processing this traffic increases the
> latency delays packets have. Setups like this also add extra bottlenecks
> which can get congested.
>
> Notice how both of those things are items on the problem list. So adding
> a proxy is one of the worst things you can do in this situation.
>
> On the other hand, it *might* help if the problem is lack of a cache
> near the client(s). You need to know that a cache will help though
> before starting.
>
>
> My advice is to read up on "buffer bloat". What the term means and how
> to remove it from your network. Check that you have ICMP and ICMPv6
> working on your network to handle device level issues and congestion
> handling activities.
>
> Then if the problem remains, check your traffic to see how much is
> cacheable. Squid intercepts can usually cache 5%-20% of any network
> traffic if there is no other caching already being done on that traffic
> (excluding browser caches). With attention and tuning it can reach
> somewhere around 50% under certain conditions.
>
> Amos

A bit about my router and network:

router - HP N36L MicroServer, 1.3 GHz Athlon II Neo CPU, 4 GB RAM, onboard gigabit NIC for the WAN, and an HP NC364T 4x1Gb NIC using the e1000e driver. The four ports on the NC364T card are bonded with 802.3ad (LACP), and 9 VLANs are trunked across the bond.

switch 1 - Cisco SG500-52
switch 2 - Cisco SG300-28

The router is connected to switch 1 with a 4-port bond, and switch 1 is connected to switch 2 with a 4-port bond. All network drops throughout the house are terminated at a patch panel and patched into the SG500. All servers are connected to the SG300, each with a 4-port bond for in-band connections and an IPMI card for out-of-band management.
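
For reference, the bond and VLAN trunking on the router amount to roughly the following (an iproute2 sketch; the VLAN ID and address shown are placeholders, not my exact config):

# create an 802.3ad (LACP) bond from the four NC364T ports
ip link add bond0 type bond mode 802.3ad miimon 100
ip link set p1p1 down && ip link set p1p1 master bond0 && ip link set p1p1 up
ip link set p1p2 down && ip link set p1p2 master bond0 && ip link set p1p2 up
ip link set p1p3 down && ip link set p1p3 master bond0 && ip link set p1p3 up
ip link set p1p4 down && ip link set p1p4 master bond0 && ip link set p1p4 up
ip link set bond0 up

# one tagged sub-interface per VLAN trunked across the bond (VLAN 10 shown as an example)
ip link add link bond0 name bond0.10 type vlan id 10
ip addr add 192.168.10.1/24 dev bond0.10
ip link set bond0.10 up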

The router does firewalling, internet gateway/NAT, load balancing, and routing (locally connected networks only; no dynamic routing such as OSPF via Quagga).
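
The NAT piece is nothing exotic, roughly the following (the WAN interface name and LAN range are placeholders):

# masquerade LAN traffic out the WAN interface
iptables -t nat -A POSTROUTING -o em1 -s 192.168.0.0/16 -j MASQUERADE
# let return traffic back in
iptables -A FORWARD -i em1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT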

Now, what I have done so far:

When I first got the Roku 4, I found issues with the Sling TV app. Hulu worked without issue, and continues to work without issue even now. I have looked into QoS, firewall performance tweaks, ring buffer increases, and kernel tuning for things like packets-per-second capacity. I also have Roku SE devices, which have no issues with Hulu or Sling TV at all. Having put up a VM for Munin monitoring, I am able to see some details about the network.

QoS will not be of any value because none of the links I control are saturated or congested. Everything is gigabit except for the Roku devices: the Roku 4 is 100 Mb Ethernet and the SEs are on WiFi. The only way for me to get QoS to kick in is to artificially congest my links, say by shrinking the ring buffers drastically. I don't see that as a reasonable option at this point.

I have tuned my firewall policy in several ways. First, I stopped logging the Roku HTTP/HTTPS traffic; very chatty sessions lead to lots of logs, each log event calls the "logger" binary, and I was paying the penalty of starting a new process thousands of times just to record the access events. I also reject all other traffic from the Rokus instead of dropping it, which helps with the Google DNS lookups the devices try, so I no longer pay the DNS timeout penalties for that. I have also stopped the systemd logging, so I am not paying the I/O penalty for writing those logs to disk. Since I use rsyslog with RELP (Reliable Event Logging Protocol), all logging still goes on; I just have reliable syslog over TCP with receipt acknowledgment, and a cascading FIFO queue to memory and then to disk if need be. I believe this has helped reclaim I/O, interrupts and context switches, leading to some (minor) performance gains.
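
The firewall changes amount to something like this (a sketch; the Roku subnet is a placeholder and the chain layout is simplified):

# let the Roku HTTP/HTTPS traffic through before any LOG rules are reached
iptables -A FORWARD -s 192.168.20.0/24 -p tcp -m multiport --dports 80,443 -j ACCEPT

# answer everything else from the Rokus immediately instead of silently dropping,
# so their hard-coded Google DNS lookups fail fast rather than timing out
iptables -A FORWARD -s 192.168.20.0/24 -j REJECT --reject-with icmp-port-unreachable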

The HP NC364T quad gigabit NICs are bonded, and I see RX and TX errors on the bond interface (not on the VLAN sub-interfaces and not on the physical p1pX interfaces). I increased the ring buffers on all four interfaces from the default of 256 to 512, then to 1024, then to 2048, testing each change along the way. 1024 seems to be the best so far, and I don't think there are any issues with buffer bloat. I have used http://www.dslreports.com/speedtest to run speed tests, since it has a buffer bloat calculation built in. I have tested through my load balanced Squid instances as well as direct outbound with no proxy, and it does not detect any buffer bloat at all. With the ring buffers set to 1024 there is a small but perceptible performance gain, and anecdotally I notice page loads are quicker. From what I have read about buffer bloat, it is only related to interface buffer queues and not to memory/disk caches such as Squid, and it seems that any delay is seen at the beginning of a stream and not during an in-progress conversation.
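
The ring buffer changes themselves are just ethtool against each slave port (p1p1 shown; the same applies to p1p2 through p1p4):

# show current and maximum ring sizes
ethtool -g p1p1

# bump RX and TX rings from the e1000e default of 256 to 1024
ethtool -G p1p1 rx 1024 tx 1024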

I am currently endeavoring to tune the kernel on the router box, and have found some info that seems to have helped in small ways. A coworker who is well versed in all things related to the network stack tells me the info in this blog, http://www.nateware.com/linux-network-tuning-for-2013.html, is aimed at a server that terminates connections rather than a router that just forwards traffic. I have added the suggested tweaks to my box anyway, since it is also a load balancer (TCP proxy) and does handle connections in addition to routing. There is also an article on Ars Technica about building your own Linux router. The teaser article was a decent read, but the next installment should have more meat and potatoes in it, and I am waiting on that to see what gaps I have in my setup; the second article should be dropping soon.
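
The tweaks are the usual socket buffer and backlog sysctls from that sort of writeup; roughly this kind of thing (the values here are illustrative, not a recommendation, and belong in /etc/sysctl.d/ to persist across reboots):

# larger socket buffer ceilings for the proxy/load balancer side
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

# deeper driver backlog and larger accept/SYN queues for bursts
sysctl -w net.core.netdev_max_backlog=5000
sysctl -w net.core.somaxconn=1024
sysctl -w net.ipv4.tcp_max_syn_backlog=4096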

I put together a monitoring VM and have Munin pulling SNMP stats from all my infrastructure. I can see the bandwidth usage and found some patterns. The Roku SEs only use about 3 Mb/s during playback; the Roku 4 uses between 5 and 6 Mb/s. Hulu and Sling TV both work fine on the SEs, and Hulu works fine on the Roku 4. Sling TV on the Roku 4 degrades: the video pixelates, the sound quality fails and goes from stereo to mono, and ultimately the spinning ring of buffering destroys the end user experience. Of course I can see in the Munin graphs when the buffering occurs, but I have not been able to correlate the buffering events with anything specific.

I set up a SPAN port on the SG500 switch and recorded a 23-minute session of me watching TV in a packet capture. The resulting 567 MB pcap has some interesting data. First, there is no ICMP traffic in it, so the PMTUD / ICMP type 3, code 4 / packet fragmentation suggestion likely won't pan out. I still added the rule to my firewall, though, as it is sound logic to allow it outbound.
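
What I added is along these lines (the chain and direction are simplified here; the intent is just to never block PMTUD):

# allow "fragmentation needed" (ICMP type 3, code 4) so path MTU discovery keeps working
iptables -A FORWARD -p icmp --icmp-type fragmentation-needed -j ACCEPT
# and the ICMPv6 equivalent, "packet too big"
ip6tables -A FORWARD -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT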

Second, the stream is in the clear and is downloaded in partial-content (HTTP 206) chunks. Below are the response headers for one of the streams in the capture:

HTTP/1.1 206 Partial Content
ETag: "482433a6a462d26fc994b06a1856547f"
Last-Modified: Thu, 25 Feb 2016 01:00:22 GMT
Server: Dynapack/0.1."152-4" Go/go1.5
X-Backend: pcg15dynpak12
XID-Fetch: 150850146
Grace: none
Linear-Cache-Host: cg5-lcache001
Linear-Cache: MISS
XID-Deliver: 150850145
Accept-Ranges: bytes
Cache-Control: public, max-age=604744
Expires: Thu, 03 Mar 2016 00:59:40 GMT
Date: Thu, 25 Feb 2016 01:00:36 GMT
Content-Range: bytes 143805-287607/318480
Content-Length: 143803
Connection: keep-alive
Content-Type: video/MP2T
access-control-max-age: 86400
access-control-allow-credentials: true
access-control-expose-headers: Server,range,hdntl,hdnts
access-control-allow-headers: origin,range,hdntl,hdnts
access-control-allow-methods: GET,POST,OPTIONS
access-control-allow-origin: *

The ETag, Last-Modified, Cache-Control and Expires headers indicate to me that I would be able to cache this content, so I believe there would be a benefit to getting Squid into the mix here.
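
One wrinkle I have read about: Squid does not cache 206 responses as such, so to get hits on these range-requested segments Squid has to be told to fetch whole objects. Something like this in squid.conf is what I have in mind (the sizes are guesses based on the ~318 KB total in the Content-Range above, not tested values):

# fetch the entire object even when the client only asks for a byte range,
# so the full (cacheable) segment lands in the cache instead of an uncacheable 206
# (this directive can also take an ACL to limit it to the video hosts)
range_offset_limit -1

# keep fetching the object even if the client gives up on the range
quick_abort_min -1 KB

# the .ts segments above are only a few hundred KB, but leave headroom
maximum_object_size 32 MB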

Looking at the IO Graph in Wireshark, I can see latency spikes during the buffering events, but the latency also modulates/undulates throughout the captured session. I am not sure I know enough about what I am looking at to make sense of it.
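
One way I am thinking of slicing the same pcap is tshark's per-second I/O statistics, with extra columns counting TCP retransmissions and duplicate ACKs to line up against the buffering events (the filename is a placeholder for the 567 MB capture):

# per-second frame/byte counts, plus columns for retransmissions and duplicate ACKs
tshark -q -r sling.pcap -z io,stat,1,tcp.analysis.retransmission,tcp.analysis.duplicate_ack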

With all of this info, I do believe proxying this traffic will improve the situation; just how much improvement is yet to be seen. Given what I think I need in terms of intercepting the traffic, are there any glaring holes or pitfalls?
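
To be concrete, what I have in mind on the router is roughly this, with the real proxies behind the VIP doing the fulfillment (a sketch only: the Roku subnet, VIP address and ports are placeholders, and I am leaving the 443 side alone until the port 80 interception behaves):

# push the Rokus' port 80 traffic to a local Squid listening in intercept mode
iptables -t nat -A PREROUTING -s 192.168.20.0/24 -p tcp --dport 80 -j REDIRECT --to-ports 3129

# squid.conf on the router instance: hand everything to the load balanced VIP
http_port 3129 intercept
cache_peer 192.168.1.50 parent 3128 0 no-query default
never_direct allow all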

Thanks for the assistance,

Brendan
_______________________________________________
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
