My New Book on Squid Proxy Server (A Beginner’s Guide)

I have not blogged since a long time mainly because I was a bit busy authoring a book Squid Proxy Server 3.1: Beginner’s Guide for Packt Publications. The book is an introductory guide to Squid (especially the new features in Squid-3 series) covering both the basic aspects as well as the in dept details for advanced users. The book focuses on learning by doing and provides example scenarios for the concepts discussed throughout the book. Access control configuration, reverse proxying, interception proxying, authentication and other features have been discussed in details with examples.

Checkout the links below:

 

Tip: Multiproxy Switch : Easily use multiple proxies in Firefox

A lot of people (especially working people with mobile devices like notebook/netbooks) need to use different proxy servers at home and office. There are several Firefox extensions available to achieve the required functionality but IMHO Multiproxy Switch(Mozilla Addon Page) is the best because

  1. Its simple and easy to use. It does what it should. No fancy/extra terrestrial stuff. Just switch proxies :)
  2. Easy and Firefox like interface to specify different proxies. Many extensions add their own fancy interfaces for specifying proxies which eventually suck big time.
  3. I am a fan of this one. The No-Proxy list. I could never understand those regular expression based no-proxy lists in FoxyProxy. Multiproxy Switch has Firefox like No-Proxy list which rocks and understandable :)

If you happen to come across a better proxy switcher for Firefox, do let us know :)

 

Crack: Google Authentication Services are Vulnerable

There is a vulnerability in the way Google authentication service works. Whenever you login to any of the Google’s online services like GMail, Orkut, Groups, Docs, Youtube, Calendar etc., you are redirected to an authentication server which authenticates against the entered username and password and redirect back to the required service (GMail, Youtube etc.) setting the session variables.

Now, if you are able to grab the url used to set the session variables, you can login as the user to whom that url belongs from any machine on the Internet (need not be the machine belonging to the same subnet) without entering the username and password of the user.

The proxy servers in the organizations can be used to exploit this vulnerability. Squid is the most popular proxy server used. In the default configuration, squid strips the query terms of a url before logging. So, this vulnerability can’t be exploited. But if you turn off the stripping mechanism by adding the line shown below, then squid will log the complete url.

strip_query_terms off

So, after turning stripping mechanism off, the log will contain urls which will look like this

http://www.google.co.in/accounts/SetSID?ssdc=1&sidt=Q5UrfB0BAAA%3D.oHVGErODzffQ%2Bms%2FOKfk53g5naReDKehRNHOBsmJlBu3VTNXjF03SbgX%2FVEEhmImhR4mlu5IAAjM%2BdbuXvMMSIb0oU8IGCYpnLcSNkbCIrG%2BQnm81YmX5%2Brcrq7U6Qx65%2F1yaQ2NzgmKD94jg0Iw13iXDen3qD5qn6L%2FhmmYWwTrcOeuTzGbO%2BAehpjEU3mrWapRafaq3b4kxyigJ68s8QrGQqZTINNE%2Bs%2BoIkZWmGt5kNzoT8fkVAsWJeu3CKFkxj4oVMngeDvpwb1nyFpsJCltOzmAr46fTxVJSpvQdx0%3D.BMLtjUdIDCcuszktZSvYzA%3D%3D&continue=http%3A%2F%2Fwww.orkut.com%2FRedirLogin.aspx%3Fmsg%3D0%26ts%3D1226148773097%3A1226148773386%3A1226148774868%26auth%3DDQAAAIcAAAC1pPE1QT4chKgrU4B3oyKZrQRkEVPtYlclpESQoXV_d9x9gdoe75Z0hfJ_22Pn5tVMR7j-uV5YCps3NB48L0bFlDeX-4PGHVT6Loztp_ru3tAy_gxDa9_YAEbz4d9CO4wD2VTKtzax9zvpGgrnJVZQfoWPkkIomUmxDtVGoH7g3fA3UjS0vdBJ2PJtgFMElso

Replace .co.in with your tld specific to your country. If you paste this url in any browser, it’ll directly log you in and you can do whatever you want to that account. Remember that all such urls remains valid only for two minutes. So, if you use that url after two minutes, it’ll lead nowhere.

At the time of writing this post Orkut, Google Docs, Google Calendar, Google Books and Youtube are vulnerable.

So, make sure your squid has stripping mechanism turned on and your squid server is properly firewalled.

You can watch the Video proof for Orkut on Blip.tv, Youtube.

 

IntelligentMirror: RPM and DEB Caching Improved (0.5)

After spending a lot of time with youtube cache, now I am trying to devote some time to update intelligentmirror with required features and enhancements that youtube cache already enjoys. In the same direction here is version 0.5 of intelligentmirror.

Improvements

  • Added max_parallel_downloads options to controll the maximum threading fetching from upstream to cache the packages.
  • Fine grained control on logging via max_logfile_size and max_logfile_backups option.
  • Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run
 [root@localhost]# python setup.py install [ENTER]
  • Added update script (update-im). So in case you decide to change the locations for caching rpm/deb packages, just run
 [root@localhost]# update-im [ENTER]

OR

 [root@localhost]# /usr/sbin/update-im [ENTER]
  • Download scheduler similar to youtube cache is added to facilitate the download queing in case of large number of requests.
  • More informative logging.
  • cache.log is not flooding anymore with XMLRPC logs and python tracebacks.
  • Added extensive exception handling thoughout the program.

Availability

  1. RPMs for Fedora/Red Hat/Cent OS
  2. Source RPMs for Fedora/Red Hat/Cent OS
  3. Source Tar balls

Installation and Configuration

INSTALL and README files should help you throughout the installation and configuration process.

In case you have questions, ask them here in comments. Suggestions for improvement are welcome :)

 

IntelligentMirror Gets Even More Intelligent (1.0.1)

Warning : This version of IntelligentMirror is compatible with only squid-2.7 as of now. It is NOT compatible even with squid-3.0.

IntelligentMirror Version 1.0.1

I have been following squid development regularly (at least the part in which I am interested) and they have introduced a new directive in squid-2.7 known as StoreUrlRewrite (storeurl_rewrite_program). Using this directive you can instruct squid to cache url A (http://abc.com/foo/bar/version/crap.rpm) as url B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words you can direct squid to cache any url as any other url without any extra efforts.

So keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.

IntelligentMirror : Old method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if URL is not in (RPM, metadata file)
    • Then its none of our business.
    • Let proxy handle it the normal way.
    • Done and exit.
  3. Check: if RPM/metadata is available in cache
    • Stream the RPM/metadata from cache.
    • Done and exit.
  4. Check: if RPM/metadata is not available in cache
    • Download in parallel for caching in some dir and stream.
    • Done and exit.

IntelligentMirror : New method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if request for rpm
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm
  3. Check: if request for deb
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb
  4. Done and exit.

So your squid will see every request for an rpm package as a request http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. So, if you happen to request the same rpm from a different mirror, it’ll still be served from cache :)

Improvements

  1. No need to check if the url supplied by squid is for rpm or not because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror for urls ending in .rpm .
  2. No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.
  3. No need to worry if the target package has changed because of the resigning or whatever because squid will do that for you.
  4. No need to actually download the package in parallel for caching because squid is already doing that.
  5. No need to worry about the hashing algorithms and storage optimizations for the cached content.

Availability

  1. RPM for Fedora/Red Hat
  2. Source RPM for Fedora/Red Hat
  3. Source Tarball

Install and Configure

The install and configure files should be enough to guide you through the installation if you choose the tar ball way. Otherwise you can always install from rpm from the above link.

Note1: You have to configure your squid to use intelligentmirror as a plugin even if you install via rpm. Check the configure file at the above link.

Note2: StoreUrlRewrite will probably be available in squid-3.1.

 

IntelligentMirror: RPM and DEB Caching Improved (0.4)

IntelligentMirror version 0.4 is available now. There have been significant improvements in intelligent mirror since last release.

Improvements

  1. Fixed defunct process problem. You will not see defunct python processes hanging around anymore. Previously every forked daemon used to got defucnt because parent never waited for the forked child to finish.
  2. IntelligentMirror now supports caching of Debian packages just like rpms. So now IntelligentMirror is best suited shared environments where people have different tastes.
  3. Intelligent Mirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor as url_rewrite_program has an acl controller url_rewrite_access. And using url_rewrite_access only requests for rpm/deb packages will be passed to Intelligent Mirror. So, IM now need not process each and every incoming request. Also, it has redirector_bypass directive which will bypass IM in case all the instances of IM are busy serving requests. So, squid will not die with a fatal error in case of huge requests.
  4. Options to enable/disable caching for rpm and Debian packages have been added.
  5. Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.
  6. Proxy authentication is also supported now just the way it is supported in yum.
  7. Packages are not checked for last-modified time anymore. Because in principle two rpms A and B can only have same name iff they have the same contents. So, the delay in response time in case of hits has reduced.

Availability

  1. RPMs for Fedora/Red Hat
  2. Source RPMs for Fedora/Red Hat
  3. Source Tar balls

Installation and configuration is easy and the INSTALL and README files should serve the purpose.

In case you have any suggestions or problems, leave a comment here or file a ticket on project page.

 

IntelligentMirror: Available for Testing

Note : A newer version of intelligentmirror is available now. Please check this.

Intelligent Mirror is basically a tool or squid plugin (redirector) to cache rpm packages so that the subsequent requests for the same package can be served from the local cache which will eventually save a lot of bandwidth and downloading time.

Who needs Intelligent Mirror?

  1. If you are on a shared network where a lot of people use linux distros with RPM as their package manager, then you need this. Universities should come under this category.
  2. If you have a set of systems having red hat derivatives and almost identical OS versions, you need this. LAN setups at home should come under this category.
  3. If you can’t afford to or don’t want to mirror entire fedora repo for local access due to bandwidth limitations, you need this.

What it does?

As described above, Intelligent Mirror, just caches rpms which are requested by the clients in a shared network. And subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.

Why not use Squid in caching mode?

Squid caching is based on url hashing. Let me explain with an example how Intelligent Mirror is actually intelligent as compared to squid while caching rpms.

Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm . You executed “yum update yum“. And let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm which was fetched from one of the fedora mirrors http://abc.com/ (say). Now someone on the same network launched “yum update yum” and he got the same rpm yum-3.2.18-1.fc9.i386.rpm. But this time rpm was fetched from another mirror http://xyz.com/ (say).

Case I : Squid caching

Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm . And when http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm will be requested, it’ll result in a cache miss and squid will again download the same package and will cache this one as well. Now there are two problems

  1. Squid is not able to serve from the cache, though the package was the same.
  2. Additional storage space is being wasted in caching the same package. And this can really harm if unluckily a different mirror is picked in all the subsequent queries.

Case II : IntelligentMirror caching

Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. And even if yum picks up a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream. So, the obvious advantage of saving the bandwidth and downloading time.

Download

Intelligent Mirror source tarball, rpm, source rpm are available for download from here.

Installing and Configuring Intelligent Mirror

Install Guide

Configuration Guide

Issues and Suggestions

If you see any issue or you have any suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.

 

IntelligentMirror: GSOC Project Update

Brief Introduction

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from Internet, it is stored/cached on a local machine on your network and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitate the efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also do pre-fetching of RPM packages from fedora repositories spread all over the world and can also pre-populate the local repo with popular packages like mplayer, vlc, gstreamer which are normally accessed immediately after a fresh install.

Definition for a lay man

Think of Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Now, whenever your CPU needs to process something, it needs data from cache. If data is not there in cache, it’ll be fetched from RAM and/or hard disk. IntelligentMirror sits on your proxy server and keep caching packages in a browsable manner which can be served via http for subsequent requests.

For further details about IntelligentMirror, go here.

Update

After getting the hosting space on fedorahosted.org, I pushed the code I have written. You can check the source tree here.

We are buidling IntelligentMirror as a plugin to squid which taps requests from clients and checks them against a cache. Checkout how to write a custom redirector or how to tap requests to squid. And acts accordingly. We are working on live streaming the partially downloaded package to the end user while caching it.

If you have any suggestion, feel free to leave them as a comment here or edit the wiki page :)

 

How To: Configure Squid Proxy Server

Mission

To configure squid for simple proxying without caching anything.

Use Cases

  1. When you want to have control on what people browse on your lan.
  2. When number of machine is more than the number of IP addresses you can afford to buy.
  3. When you want to help this holy world in saving some IPV4 addresses 😛

Assumptions

  1. You have a machine connected directly to internet that you are going to use as a proxy server for other machines on your network.
  2. The machines on your network are using 192.168.0.0/16 as private address space. You can use anyone/multiple address spaces of the available but for this howto we assume 192.168.0.0/16 as the local network.
  3. The local IP address of the machine which will run squid proxy server is 192.168.36.204. You can have any IP, but for this howto we assume this.

How to proceed

First of all ensure that you have squid installed. After installing squid, you need to set access control in squid configuration file which resides in /etc/squid by default. Open /etc/squid/squid.conf and add/edit following lines according to your preferences. Few lines already exist in the configuration file, you can add the rest.

# The port on which squid will listen for requests
http_port 8080
# If 'cgi-bin' or '?' is in query, squid should not check with neighbours'/parents' cache
# and should go to target web-server.
hierarchy_stoplist cgi-bin ?
# If url contains 'cgi-bin' or '?', then it must not be cached
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Absolute path to squid access log.
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
# Access control list to control every IP address
acl all src 0.0.0.0/0.0.0.0
# Access control list for source machine in LAN
acl lan_src src 192.168.0.0/16
# Access control list for destination machine in LAN
acl lan_dst dst 192.168.0.0/16
# Access control list to manage squid cache
acl manager proto cache_object
# Access control list to define IP address allowed for source localhost
acl localhost src 127.0.0.1/255.255.255.255
# Access control list to define IP addresses allowed for localhost as destination
acl to_localhost dst 127.0.0.0/8
# Access control list to define Safe ports that should be allowed by default
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
# Allow cache management only from localhost
http_access allow manager localhost
# Deny cache management from remote hosts
http_access deny manager
# Deny http access via all the ports which are not listed as safe
http_access deny !Safe_ports
# Deny all connections via all ports which are not listed as safe
http_access deny CONNECT !SSL_ports
# Allow http access from localhost
http_access allow localhost
# Allow http access from machines on LAN
http_access allow lan_src
http_access deny all
http_reply_access allow all
icp_access allow all
# Deny caching for everyone so that there is not caching at all
cache deny all
coredump_dir /var/spool/squid
# Never allow direct connection to machines on the internet
prefer_direct off
never_direct allow all
# Allow direct connetion if the destination machine is on LAN
always_direct allow lan_dst
# Delete this line if you don't have /etc/hosts file
hosts_file /etc/hosts
# Allow AIM connections
# Delete the following 9 lines if you don't want people to connect to AIM
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 64.12.0.0/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow connections to Yahoo Messenger
# Delete the following 6 lines if you don't want people to connect to Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow connections to Google Talk
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow connections to MSN
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 207.46.111.0/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts

Now, start the squid proxy server as

service squid start

Also, if you want squid to be started every time you boot the machine, execute the following command

chkconfig --level 345 squid on

You have a squid proxy server running now. You can ask clients to configure there browsers to use 192.168.36.204 as a proxy server with 8080 as proxy port. Command line utilities like elinks, lynx, yum, wget etc. can be asked to use proxy by exporting http_proxy variable as below. Users can also add these lines to ~/.bashrc file to avoid exporting every-time.

export http_proxy='http://192.168.36.204:8080'
export ftp_proxy='http://192.168.36.204:8080'

I highly recommend the book “Squid Proxy Server 3.1: Beginner’s Guide (Paperback)” for further reading.

 

How To: Write Custom Redirector or Rewritor Plugin For Squid in Python

Mission

To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when already existing redirector plugins for Squid doesn’t suit your needs or you want everything of your own.

Use Cases

  1. When you want to redirect URLs using a database like mysql or postgresql.
  2. When you want to redirect based on mappings stored in simple text files.
  3. When you want to build a redirector which can learn by itself using AI techniques 😛

How to proceed

From Squid FAQ,

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision.

The format of the line read from the standard input by the program is as follows.

1
2
3
URL ip-address/fqdn ident method
# for example
http://saini.co.in 172.17.8.175/saini.co.in - GET -

The implementation sounds very simple and it is indeed very simple to implement. The only thing that should be taken care of is the unbuffered I/O. You should immediately flush the output to standard output once decision is taken.

For this howto, we assume we have a method called ‘modify_url()‘ which returns either a blank line or a modified URL to which the client should be redirected.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#!/usr/bin/env python
 
import sys
def modify_url(line):
    list = line.split(' ')
    # first element of the list is the URL
    old_url = list[0]
    new_url = '\n'
    # take the decision and modify the url if needed
    # do remember that the new_url should contain a '\n' at the end.
    if old_url.endswith('.avi'):
        new_url = 'http://fedora.co.in/errors/accessDenied.html' + new_url
    return new_url
 
while True:
    # the format of the line read from stdin is
    # URL ip-address/fqdn ident method
    # for example
    # http://saini.co.in 172.17.8.175/saini.co.in - GET -
    line = sys.stdin.readline().strip()
    # new_url is a simple URL only
    # for example
    # http://fedora.co.in
    new_url = modify_url(line)
    sys.stdout.write(new_url)
    sys.stdout.flush()

Save the above file somewhere. We save this example file in /etc/squid/custom_redirect.py. Now, we have the function for redirecting clients. We need to configure squid to use custom_redirect.py . Below is the squid configuration for telling squid to use the above program as redirector.

1
2
3
4
5
6
# Add these lines to /etc/squid/squid.conf file.
# /usr/bin/python should be replaced by the path to python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/custom_redirect.py
# Number of instances of the above program that should run concurrently.
# 5 is good enough but you should go for 10 at least. Anything below 5 would not work properly.
redirect_children 5

Now, start/reload/restart squid. That’s all we need to write and use a custom redirector plugin for squid.