IntelligentMirror: GSOC Project Update

Brief Introduction

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from Internet, it is stored/cached on a local machine on your network and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitate the efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also do pre-fetching of RPM packages from fedora repositories spread all over the world and can also pre-populate the local repo with popular packages like mplayer, vlc, gstreamer which are normally accessed immediately after a fresh install.

Definition for a lay man

Think of Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Now, whenever your CPU needs to process something, it needs data from cache. If data is not there in cache, it’ll be fetched from RAM and/or hard disk. IntelligentMirror sits on your proxy server and keep caching packages in a browsable manner which can be served via http for subsequent requests.

For further details about IntelligentMirror, go here.

Update

After getting the hosting space on fedorahosted.org, I pushed the code I have written. You can check the source tree here.

We are buidling IntelligentMirror as a plugin to squid which taps requests from clients and checks them against a cache. Checkout how to write a custom redirector or how to tap requests to squid. And acts accordingly. We are working on live streaming the partially downloaded package to the end user while caching it.

If you have any suggestion, feel free to leave them as a comment here or edit the wiki page 🙂

 

How To: Configure Squid Proxy Server

Mission

To configure squid for simple proxying without caching anything.

Use Cases

  1. When you want to have control on what people browse on your lan.
  2. When number of machine is more than the number of IP addresses you can afford to buy.
  3. When you want to help this holy world in saving some IPV4 addresses 😛

Assumptions

  1. You have a machine connected directly to internet that you are going to use as a proxy server for other machines on your network.
  2. The machines on your network are using 192.168.0.0/16 as private address space. You can use anyone/multiple address spaces of the available but for this howto we assume 192.168.0.0/16 as the local network.
  3. The local IP address of the machine which will run squid proxy server is 192.168.36.204. You can have any IP, but for this howto we assume this.

How to proceed

First of all ensure that you have squid installed. After installing squid, you need to set access control in squid configuration file which resides in /etc/squid by default. Open /etc/squid/squid.conf and add/edit following lines according to your preferences. Few lines already exist in the configuration file, you can add the rest.

# The port on which squid will listen for requests
http_port 8080
# If 'cgi-bin' or '?' is in query, squid should not check with neighbours'/parents' cache
# and should go to target web-server.
hierarchy_stoplist cgi-bin ?
# If url contains 'cgi-bin' or '?', then it must not be cached
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Absolute path to squid access log.
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
# Access control list to control every IP address
acl all src 0.0.0.0/0.0.0.0
# Access control list for source machine in LAN
acl lan_src src 192.168.0.0/16
# Access control list for destination machine in LAN
acl lan_dst dst 192.168.0.0/16
# Access control list to manage squid cache
acl manager proto cache_object
# Access control list to define IP address allowed for source localhost
acl localhost src 127.0.0.1/255.255.255.255
# Access control list to define IP addresses allowed for localhost as destination
acl to_localhost dst 127.0.0.0/8
# Access control list to define Safe ports that should be allowed by default
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
# Allow cache management only from localhost
http_access allow manager localhost
# Deny cache management from remote hosts
http_access deny manager
# Deny http access via all the ports which are not listed as safe
http_access deny !Safe_ports
# Deny all connections via all ports which are not listed as safe
http_access deny CONNECT !SSL_ports
# Allow http access from localhost
http_access allow localhost
# Allow http access from machines on LAN
http_access allow lan_src
http_access deny all
http_reply_access allow all
icp_access allow all
# Deny caching for everyone so that there is not caching at all
cache deny all
coredump_dir /var/spool/squid
# Never allow direct connection to machines on the internet
prefer_direct off
never_direct allow all
# Allow direct connetion if the destination machine is on LAN
always_direct allow lan_dst
# Delete this line if you don't have /etc/hosts file
hosts_file /etc/hosts
# Allow AIM connections
# Delete the following 9 lines if you don't want people to connect to AIM
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 64.12.0.0/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow connections to Yahoo Messenger
# Delete the following 6 lines if you don't want people to connect to Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow connections to Google Talk
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow connections to MSN
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 207.46.111.0/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts

Now, start the squid proxy server as

service squid start

Also, if you want squid to be started every time you boot the machine, execute the following command

chkconfig --level 345 squid on

You have a squid proxy server running now. You can ask clients to configure there browsers to use 192.168.36.204 as a proxy server with 8080 as proxy port. Command line utilities like elinks, lynx, yum, wget etc. can be asked to use proxy by exporting http_proxy variable as below. Users can also add these lines to ~/.bashrc file to avoid exporting every-time.

export http_proxy='http://192.168.36.204:8080'
export ftp_proxy='http://192.168.36.204:8080'

I highly recommend the book “Squid Proxy Server 3.1: Beginner’s Guide (Paperback)” for further reading.

 

How To: Configure Hierarchicy of Proxy Servers (Squid)

Yesterday I came across this idea of caching all the data that I browse on my hard disk so that the average load time of a website decreases. Actually the idea is I’ll cache all the static data that I browse like images, static html pages, CSS files and similar things which does not change frequently and can be served from the cache. But while setting up the proxy server on my machine, I faced the problem that my machine which is going to act as a proxy server is behind my institute’s proxy. So, a simple caching proxy server can’t serve my needs and I have to really figure out how to setup a hierarchical proxy server. Below we’ll see how to setup a hierarchical proxy server.

Approach

When I thought of setting up a caching proxy server, squid immediately struck my mind. Actually I don’t know about any other proxy servers. I never setup proxy server before this ( I tried a lot of time, but in vain). So, I started googling about squid setup. There were a lot of tutorials, but either they were too small to get things going or they were too verbose that I couldn’t manage to read them. So, I directly jump into squid configuration file squid.conf . And with references from here and there, I managed to setup the proxy server successfully.

Note: The configurations below worked on Fedora 7 with squid 2.6STABLE16. The same configurations may work with other squid versions and on other operating systems as well, but try them at your own risk.

Part 1 : Setting up simple proxy server with squid

Setting up a very simple and usable proxy server is really easy. You need to add/edit only 2-3 lines /etc/squid/squid.conf to get started.

Add your ip to the access list.

1
2
3
acl myip src 172.17.8.175 #<your_ip_which_will_use_the_proxy_server> (e.g. )
http_access allow myip
http_port 8080 #<http_proxy_port> (this is 3128 by default. you can set it to anything you like. e.g. 8080)

Save the squid.conf file. Then issue these commands.

1
2
[root@localhost squid]# squid -z [Enter] (as root) (This needs to be executed only once.)
[root@localhost squid]# service squid start [Enter] (as root)

If you want to start the squid server on boot, issue this command.

[root@localhost squid]# chkconfig --level 345 squid on [Enter] (as root)

Now, your machine is a proxy server. You can setup your browser to use the machine as a proxy server.

Conditions

The proxy server will work only if your machine has a public IP and is directly connected to internet.

Part 2: Setting up a hierarchical caching proxy server with squid

The above setup works fine if a machine is directly connected to internet. But my machine itself is behind a proxy, so setting up a proxy on my machine is of no use unless the proxy on my machine uses the institute proxy for connecting to internet. So, here we jump into squid.conf again and this time we have to really do some brain storming. If you are a newbie to Linux and don’t know how to make a system work when nothing seems to help, you will probably be better off by using institute’s proxy.

Here is the scenario.

1
2
3
4
5
6
7
8
9
10
11
12
13
1. Your browser sends a content request to proxy on your machine.
2. Check: if a cache HIT from institute proxy cache (HIT means content was found in cache)
	2a. Check: if content is older than the original upstream content
		2aa. Fetch content from upstream and serve the client
	2b. else
		2ba. Serve the content from the cache
3. Check: if cache HIT from proxy on your machine
	3a. Check: if content is older than the original upstream content
		3aa. Fetch content from upstream and serve the client
	3b. else
		3ba. Serve the content from the cache
4. Cache MISS from both the proxies
	4a. Fetch the content from upstream and serve the client

The above method of operation is very basic and is my understanding of squid. It may not be the exact squid behavior.

Now, lets see the configurations needed for setting up the hierarchical caching proxy server with squid.

Assumptions

I assume that we already have squid setup at institute’s proxy whether in caching mode or not. The best way to add/edit the following lines in your squid.conf is to search for particular parameter and then edit the value to set as given.

I also assume that you have simple proxy server setup on your machine and now we want to make it act as child proxy of the institute’s proxy.

Configuration

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
# Your local machine will act as a sibling proxy
cache_peer 172.17.8.175 sibling 3128 3130 no-query weight=10
# The institute's proxy server will act as a parent proxy
# 'default' mean the last-resort
cache_peer 192.168.36.204 parent 8080 3130 no-query proxy-only no-digest default
# allow accessing peer cache for access list 'myip'
cache_peer_access 172.17.8.175 allow myip
# Don't cache dynamic content
hierarchy_stoplist cgi-bin ?
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
# Size of main memory to be used for caching
cache_mem 200 MB
# max size of content to be stored in main memory
maximum_object_size_in_memory 7000 KB
# policy for cache replacement if memory is full
cache_replacement_policy heap LFUDA
# the directory to be used for storing cache on your hdd
cache_dir aufs /var/spool/squid 200 16 256
# max file descriptor open at a time .. 0(unlimited)
max_open_disk_fds 0
# min object size to cache on hdd
minimum_object_size 0 KB
# max object size to cache on hdd
maximum_object_size 16384 KB
# access log
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
store_avg_object_size 20 KB
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
refresh_stale_hit 5 seconds
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
# Allow AIM protocols
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 64.12.0.0/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow GTalk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow MSN
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 207.46.111.0/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts
# Turn this off if hierarchical behavior is needed
nonhierarchical_direct off
never_direct deny myip
hosts_file /etc/hosts
coredump_dir /var/spool/squid

That’s the minimal configuration you need for running squid in hierarchical way. Save the squid.conf file and start/restart/reload the squid service. Setup your browser to use your machine as proxy and while using it’ll cache all the static content. You should experience some reduction in average page load time.

Advantages

I am currently using squid in above configuration. And its turning out to be nice for me. I am browsing websites faster and saving a chunk of bandwidth for my institute.

Disadvantages

Introduction of another proxy server increases the latency for dynamic content.

Notice

The above configurations and views are a result of my understanding of squid. If you feel this may break your system or it may have adverse effects, don’t use them. At least don’t use these on a production system.