Yesterday I came across this idea of caching all the data that I browse on my hard disk so that the average load time of a website decreases. Actually the idea is I’ll cache all the static data that I browse like images, static html pages, CSS files and similar things which does not change frequently and can be served from the cache. But while setting up the proxy server on my machine, I faced the problem that my machine which is going to act as a proxy server is behind my institute’s proxy. So, a simple caching proxy server can’t serve my needs and I have to really figure out how to setup a hierarchical proxy server. Below we’ll see how to setup a hierarchical proxy server.
When I thought of setting up a caching proxy server, squid immediately struck my mind. Actually I don’t know about any other proxy servers. I never setup proxy server before this ( I tried a lot of time, but in vain). So, I started googling about squid setup. There were a lot of tutorials, but either they were too small to get things going or they were too verbose that I couldn’t manage to read them. So, I directly jump into squid configuration file squid.conf . And with references from here and there, I managed to setup the proxy server successfully.
Note: The configurations below worked on Fedora 7 with squid 2.6STABLE16. The same configurations may work with other squid versions and on other operating systems as well, but try them at your own risk.
Part 1 : Setting up simple proxy server with squid
Setting up a very simple and usable proxy server is really easy. You need to add/edit only 2-3 lines /etc/squid/squid.conf to get started.
Add your ip to the access list.
acl myip src 172.17.8.175 #<your_ip_which_will_use_the_proxy_server> (e.g. )
http_access allow myip
http_port 8080 #<http_proxy_port> (this is 3128 by default. you can set it to anything you like. e.g. 8080)
Save the squid.conf file. Then issue these commands.
[root@localhost squid]# squid -z [Enter] (as root) (This needs to be executed only once.)
[root@localhost squid]# service squid start [Enter] (as root)
If you want to start the squid server on boot, issue this command.
[root@localhost squid]# chkconfig --level 345 squid on [Enter] (as root)
Now, your machine is a proxy server. You can setup your browser to use the machine as a proxy server.
The proxy server will work only if your machine has a public IP and is directly connected to internet.
Part 2: Setting up a hierarchical caching proxy server with squid
The above setup works fine if a machine is directly connected to internet. But my machine itself is behind a proxy, so setting up a proxy on my machine is of no use unless the proxy on my machine uses the institute proxy for connecting to internet. So, here we jump into squid.conf again and this time we have to really do some brain storming. If you are a newbie to Linux and don’t know how to make a system work when nothing seems to help, you will probably be better off by using institute’s proxy.
Here is the scenario.
1. Your browser sends a content request to proxy on your machine.
2. Check: if a cache HIT from institute proxy cache (HIT means content was found in cache)
2a. Check: if content is older than the original upstream content
2aa. Fetch content from upstream and serve the client
2ba. Serve the content from the cache
3. Check: if cache HIT from proxy on your machine
3a. Check: if content is older than the original upstream content
3aa. Fetch content from upstream and serve the client
3ba. Serve the content from the cache
4. Cache MISS from both the proxies
4a. Fetch the content from upstream and serve the client
The above method of operation is very basic and is my understanding of squid. It may not be the exact squid behavior.
Now, lets see the configurations needed for setting up the hierarchical caching proxy server with squid.
I assume that we already have squid setup at institute’s proxy whether in caching mode or not. The best way to add/edit the following lines in your squid.conf is to search for particular parameter and then edit the value to set as given.
I also assume that you have simple proxy server setup on your machine and now we want to make it act as child proxy of the institute’s proxy.
# Your local machine will act as a sibling proxy
cache_peer 172.17.8.175 sibling 3128 3130 no-query weight=10
# The institute's proxy server will act as a parent proxy
# 'default' mean the last-resort
cache_peer 192.168.36.204 parent 8080 3130 no-query proxy-only no-digest default
# allow accessing peer cache for access list 'myip'
cache_peer_access 172.17.8.175 allow myip
# Don't cache dynamic content
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
# Size of main memory to be used for caching
cache_mem 200 MB
# max size of content to be stored in main memory
maximum_object_size_in_memory 7000 KB
# policy for cache replacement if memory is full
cache_replacement_policy heap LFUDA
# the directory to be used for storing cache on your hdd
cache_dir aufs /var/spool/squid 200 16 256
# max file descriptor open at a time .. 0(unlimited)
# min object size to cache on hdd
minimum_object_size 0 KB
# max object size to cache on hdd
maximum_object_size 16384 KB
# access log
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
store_avg_object_size 20 KB
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
refresh_stale_hit 5 seconds
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
# Allow AIM protocols
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 22.214.171.124/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow GTalk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow MSN
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 126.96.36.199/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts
# Turn this off if hierarchical behavior is needed
never_direct deny myip
That’s the minimal configuration you need for running squid in hierarchical way. Save the squid.conf file and start/restart/reload the squid service. Setup your browser to use your machine as proxy and while using it’ll cache all the static content. You should experience some reduction in average page load time.
I am currently using squid in above configuration. And its turning out to be nice for me. I am browsing websites faster and saving a chunk of bandwidth for my institute.
Introduction of another proxy server increases the latency for dynamic content.
The above configurations and views are a result of my understanding of squid. If you feel this may break your system or it may have adverse effects, don’t use them. At least don’t use these on a production system.