Implementing Transparent Caching using Squid
Central control – The user cannot change his/her browser to bypass the cache.
Not Robust – Because transparent caching relies on stable routed path between the client and the origin server which happens to pass through a “cached path,” it is susceptible to routing changes in the Internet. In other words, if a connection between a client and a cache is established and a routing change occurs which causes the client to take a path which no longer flows through the “diverting” network device, the session will break and the user will have to reload the page. If routes in the Internet are flapping, then results will be even more unpredictable.
User control – Transparent caching takes control away from the user. Many users have very strong biases about caching and will actually change ISPs to either avoid it or get it.
Browser dependency – For successful operation, many transparent caches rely on the browser supplying the host name of the origin server in the HTTP request header. This is required because these caches cannot access the destination IP address of the origin server from the IP address of the packet. Therefore, upon a cache miss, they cannot determine the origin server address to send the request to. Some early browsers do not provide this information and therefore will not work properly with these transparent caches, but 90% of today’s browsers satisfy the above. In the real world, Many network providers have observed that a significant amount of HTTP requests are for non-cacheable content (as much as 35-45%). The hit rate and performance of the cache is inversely proportional to the amount of non-cacheable content sent to the cache.
Policy based routing.
Using smart switching.
By setting Squid Box as a Gateway.
L4 SWITCH – An L4 switch operates at Layer 4 in the OSI model – the Transport layer. L4 switches base their switching decisions on information in the TCP header, and TCP is a protocol that resides at Layer 4 in the OSI seven-layer model. These switches determine where to pass the traffic based on the port number.
L7 SWITCH – At the time of this writing, more sophisticated switches are becoming available. These new switches operate at Layer 7 of the OSI model – the Application layer. Because these switches operate at Layer 7, they can understand URLs and can understand much more about the traffic than an L4 switch can. An L7 switch provides the same features that an L4 switch provides plus additional, more sophisticated features. |
Some L4 and L7 switches can switch more than a gigabyte of data.
For HTTP transparent caching, they partition traffic based on the requested Web server’s IP address.
For HTTP transparent caching, they can be configured to send traffic directly to the Internet if a Web cache fails.
How the L7 switch is different :
An L7 switch can partition HTTP client traffic based on the requested URL. | |
For HTTP requests, the L7 switch can look at the request and determine whether the object is cacheable. With an L7 switch, requests for obviously non-cacheable objects, such as URLs with cookies and CGI, will bypass the cache. Non-cacheable objects are then obtained directly from a Web server. |
Performance comparison between L4 and L7 switches :
The performance of L4 and L7 switches is similar. However, because the L7 switch looks more closely at TCP/IP packets for port 80 or port 119, its response time is slightly slower than that of an L4 switch. |
Packets headed for port 80 on some computer on the Internet must be redirected by the router or L4 switch (As explained before) to the computer where squid is running. This can be achieved by setting squid box as a Gateway also.
In Squid Box, packets which are redirected by a smart switch or router to the Squid box still need to be redirected to the port where Squid is listening on. Redirecting these packets cannot be done by Squid. Redirecting packets must be done by the Linux kernel, using the IP-chains program. The kernel then receives a packet on port 80, looks at the firewall configuration, and adjusts the packet appropriately i.e. by changing the destination port to 3128, or whatever port Squid is running on. If you need IP Filter redirection, then use the -enable-ipf-transparent configure option in Squid to support certain HTTP clients (HTTP/1.0 clients, NOT sending the Host header). However, normal browsing using the popular browsers will work even without it.
Positive :
Low-cost solution. |
Using smart switching
Positive :
Fail over : For HTTP transparent caching, if a Squid proxy server is down or is too busy, the switch passes the traffic to the Internet or, if there are multiple Squid Proxy Servers, to another Squid proxy server it is configured to recognize. |
Negative:
High Cost |
Comparison of using a router to using an L4 or L7 switch
For many routers, complex filters, such as a filter for intercepting HTTP (port 80) or NNTP (port 119) requests, can have a dramatic negative impact on the performance of the router. Conversely, L4 and L7 switches are designed to intercept packets of different types. With a policy-based router (non-Cisco router or a Cisco router not running WCCP), the system administrator must manually set up how requests will be distributed, which might result in less efficient partitioning of requests than if a switch were used. |
Squid box as a Gateway
Positive:
Low cost of implementation |
Negative :
It is beneficial only for small LAN and WAN users. |
Conclusion
This paper has outlined the various methods of implementing Transparent Caching using Squid. Each of these methods has its advantages, the choice is left to the implementation team which has to decide based on their network, data access pattern, volume of data, request rate, criticality and budget available. Web caching is a matured technology and Squid is very widely used web caching application, the choice and method of implementation as said may vary, although other features present in the implementation may continue or be enhanced, the underlying fundamentals will be the same as those discussed here. There are other tools available to supplement the system like reporting tools, configuration and management tools and load balancing for implementing multiple cache boxes. And finally the overall success largely depends on the configuration and fine-tuning of both Squid and Linux.