IntelligentMirror: RPM and DEB Caching Improved (0.5)

After spending a lot of time with youtube cache, now I am trying to devote some time to update intelligentmirror with required features and enhancements that youtube cache already enjoys. In the same direction here is version 0.5 of intelligentmirror.

Improvements

  • Added max_parallel_downloads options to controll the maximum threading fetching from upstream to cache the packages.
  • Fine grained control on logging via max_logfile_size and max_logfile_backups option.
  • Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run
 [root@localhost]# python setup.py install [ENTER]
  • Added update script (update-im). So in case you decide to change the locations for caching rpm/deb packages, just run
 [root@localhost]# update-im [ENTER]

OR

 [root@localhost]# /usr/sbin/update-im [ENTER]
  • Download scheduler similar to youtube cache is added to facilitate the download queing in case of large number of requests.
  • More informative logging.
  • cache.log is not flooding anymore with XMLRPC logs and python tracebacks.
  • Added extensive exception handling thoughout the program.

Availability

  1. RPMs for Fedora/Red Hat/Cent OS
  2. Source RPMs for Fedora/Red Hat/Cent OS
  3. Source Tar balls

Installation and Configuration

INSTALL and README files should help you throughout the installation and configuration process.

In case you have questions, ask them here in comments. Suggestions for improvement are welcome 🙂

 

How To: Configure Caching Nameserver (named)

Mission

To configure a caching nameserver on a local machine which will cascade to another previously configured and functional nameserver (may or may not be caching. It’ll generally be your ISP nameserver or the one provided by your organization).

Advantage

  • Reduces the delay in domain name resolution drastically as the requests for frequently accessed websites are served from cache.

Working

  • named gets a request for domain resolution.
  • It checks whether the request can be satisfied from cache. If the answer is in cache and not stale, the request is satisfied from cache itself saving a lot of time 🙂
  • If request can’t be satisfied from cache, named queries the first parent. If it replies with the answer, then named will cache the response and subsequent requests for the same domain name will be satisfied from the cache.
  • In case first parent fails to reply, named will query the second parent and so on.

(The working is my understanding of caching-nameserver using wireshark as traffic analysis tool and caching-nameserver may not behave exactly as explained above.)

How to install

named is by default on most of the systems by the package name ‘caching-nameserver‘. If its not present on your system, install using

[root@localhost ~]# yum install caching-nameserver [ENTER]
# If that doesn't work try this
[root@localhost ~]# yum install bind [ENTER]

How to configure

The main configuration file for named resides in /var/named/chroot/etc/named.caching-nameserver.conf which is also soft linked from /etc/named.caching-nameserver.conf . named configuration file supports C/C++ style comments.

For a caching nameserver which will cascade to another nameserver, there is nothing much to be configured. You need to configure “options” block. Below is a configuration file for a machine with IP address 172.17.8.64 cascading to two nameserver 192.168.36.204 and 192.168.36.210. The comments inline explain what each option does.

options {
  // Set the port to 53 which is standard port for DNS.
  // Add the IP address on which named will listen separated by semi-colons.
  // It'll be your own IP address.
  listen-on port 53 {127.0.0.1; 172.17.8.64;};
  // These are default. Leave them as it is.
  directory   "/var/named";
  dump-file   "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  // The machines which are allowed to query this nameserver.
  // Normally you'll allow only your machine. But you can allow other machines also.
  // The address should be separated by semi-colons. To allow a network 172.16.31.0/24,
  // the line would be
  // allow-query {localhost; 172.16.31.0/24; };
  // Don't forget the semi-colons.
  allow-query     { localhost; 172.17.8.64; };
  recursion yes;
  // The parent nameservers. List all the nameserver which you can query.
  forwarders { 192.168.36.204; 192.168.36.210; };
  forward first;
};
logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};
zone "." IN {
  type hint;
  file "named.ca";
};
include "/etc/named.rfc1912.zones";

Start caching-nameserver

Now start the caching-nameserver using the following command

[root@localhost ~]# server named start [ENTER]

OR

[root@localhost ~]# /etc/init.d/named start [ENTER]

To make named start every time your reboot your machine use following command

[root@localhost ~]# chkconfig named on [ENTER]

Using caching-nameserver

To use your caching-nameserver, open /etc/resolv.conf file and add the following line

nameserver 127.0.0.1

Comment all other lines in the file, so that finally the file looks like

; generated by /sbin/dhclient-script
#search wlan.iiit.ac.in
#nameserver 192.168.36.204
#nameserver 192.168.36.210
nameserver 127.0.0.1

Now your system will use your own nameserver (in caching mode) for resolving all domain names. To test if your nameserver use the following command

[root@localhost ~]# dig fedora.co.in [ENTER]

Now if you use that command for the second time, the resolution time will be around 2-3 milli seconds while first time it would be around 400-700 milli seconds.

Example

Below is two subsequent runs of dig for fedora.co.in . Notice the Query time.

[root@bordeaux SPECS]# dig fedora.co.in
; <<>> DiG 9.4.2rc1 <<>> fedora.co.in
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7839
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUESTION SECTION:
;fedora.co.in.                  IN      A
;; ANSWER SECTION:
fedora.co.in.           83629   IN      A       72.249.126.241
;; AUTHORITY SECTION:
fedora.co.in.           79709   IN      NS      ns.fedora.co.in.
;; ADDITIONAL SECTION:
ns.fedora.co.in.        79709   IN      A       72.249.126.241
;; Query time: 531 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 19 18:04:47 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]# dig fedora.co.in
; <<>> DiG 9.4.2rc1 <<>> fedora.co.in
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64233
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUESTION SECTION:
;fedora.co.in.                  IN      A
;; ANSWER SECTION:
fedora.co.in.           83625   IN      A       72.249.126.241
;; AUTHORITY SECTION:
fedora.co.in.           79705   IN      NS      ns.fedora.co.in.
;; ADDITIONAL SECTION:
ns.fedora.co.in.        79705   IN      A       72.249.126.241
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 19 18:04:51 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]#
 

IntelligentMirror Gets Even More Intelligent (1.0.1)

Warning : This version of IntelligentMirror is compatible with only squid-2.7 as of now. It is NOT compatible even with squid-3.0.

IntelligentMirror Version 1.0.1

I have been following squid development regularly (at least the part in which I am interested) and they have introduced a new directive in squid-2.7 known as StoreUrlRewrite (storeurl_rewrite_program). Using this directive you can instruct squid to cache url A (http://abc.com/foo/bar/version/crap.rpm) as url B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words you can direct squid to cache any url as any other url without any extra efforts.

So keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.

IntelligentMirror : Old method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if URL is not in (RPM, metadata file)
    • Then its none of our business.
    • Let proxy handle it the normal way.
    • Done and exit.
  3. Check: if RPM/metadata is available in cache
    • Stream the RPM/metadata from cache.
    • Done and exit.
  4. Check: if RPM/metadata is not available in cache
    • Download in parallel for caching in some dir and stream.
    • Done and exit.

IntelligentMirror : New method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if request for rpm
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm
  3. Check: if request for deb
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb
  4. Done and exit.

So your squid will see every request for an rpm package as a request http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. So, if you happen to request the same rpm from a different mirror, it’ll still be served from cache 🙂

Improvements

  1. No need to check if the url supplied by squid is for rpm or not because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror for urls ending in .rpm .
  2. No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.
  3. No need to worry if the target package has changed because of the resigning or whatever because squid will do that for you.
  4. No need to actually download the package in parallel for caching because squid is already doing that.
  5. No need to worry about the hashing algorithms and storage optimizations for the cached content.

Availability

  1. RPM for Fedora/Red Hat
  2. Source RPM for Fedora/Red Hat
  3. Source Tarball

Install and Configure

The install and configure files should be enough to guide you through the installation if you choose the tar ball way. Otherwise you can always install from rpm from the above link.

Note1: You have to configure your squid to use intelligentmirror as a plugin even if you install via rpm. Check the configure file at the above link.

Note2: StoreUrlRewrite will probably be available in squid-3.1.

 

IntelligentMirror: RPM and DEB Caching Improved (0.4)

IntelligentMirror version 0.4 is available now. There have been significant improvements in intelligent mirror since last release.

Improvements

  1. Fixed defunct process problem. You will not see defunct python processes hanging around anymore. Previously every forked daemon used to got defucnt because parent never waited for the forked child to finish.
  2. IntelligentMirror now supports caching of Debian packages just like rpms. So now IntelligentMirror is best suited shared environments where people have different tastes.
  3. Intelligent Mirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor as url_rewrite_program has an acl controller url_rewrite_access. And using url_rewrite_access only requests for rpm/deb packages will be passed to Intelligent Mirror. So, IM now need not process each and every incoming request. Also, it has redirector_bypass directive which will bypass IM in case all the instances of IM are busy serving requests. So, squid will not die with a fatal error in case of huge requests.
  4. Options to enable/disable caching for rpm and Debian packages have been added.
  5. Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.
  6. Proxy authentication is also supported now just the way it is supported in yum.
  7. Packages are not checked for last-modified time anymore. Because in principle two rpms A and B can only have same name iff they have the same contents. So, the delay in response time in case of hits has reduced.

Availability

  1. RPMs for Fedora/Red Hat
  2. Source RPMs for Fedora/Red Hat
  3. Source Tar balls

Installation and configuration is easy and the INSTALL and README files should serve the purpose.

In case you have any suggestions or problems, leave a comment here or file a ticket on project page.

 

IntelligentMirror: Available for Testing

Note : A newer version of intelligentmirror is available now. Please check this.

Intelligent Mirror is basically a tool or squid plugin (redirector) to cache rpm packages so that the subsequent requests for the same package can be served from the local cache which will eventually save a lot of bandwidth and downloading time.

Who needs Intelligent Mirror?

  1. If you are on a shared network where a lot of people use linux distros with RPM as their package manager, then you need this. Universities should come under this category.
  2. If you have a set of systems having red hat derivatives and almost identical OS versions, you need this. LAN setups at home should come under this category.
  3. If you can’t afford to or don’t want to mirror entire fedora repo for local access due to bandwidth limitations, you need this.

What it does?

As described above, Intelligent Mirror, just caches rpms which are requested by the clients in a shared network. And subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.

Why not use Squid in caching mode?

Squid caching is based on url hashing. Let me explain with an example how Intelligent Mirror is actually intelligent as compared to squid while caching rpms.

Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm . You executed “yum update yum“. And let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm which was fetched from one of the fedora mirrors http://abc.com/ (say). Now someone on the same network launched “yum update yum” and he got the same rpm yum-3.2.18-1.fc9.i386.rpm. But this time rpm was fetched from another mirror http://xyz.com/ (say).

Case I : Squid caching

Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm . And when http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm will be requested, it’ll result in a cache miss and squid will again download the same package and will cache this one as well. Now there are two problems

  1. Squid is not able to serve from the cache, though the package was the same.
  2. Additional storage space is being wasted in caching the same package. And this can really harm if unluckily a different mirror is picked in all the subsequent queries.

Case II : IntelligentMirror caching

Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. And even if yum picks up a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream. So, the obvious advantage of saving the bandwidth and downloading time.

Download

Intelligent Mirror source tarball, rpm, source rpm are available for download from here.

Installing and Configuring Intelligent Mirror

Install Guide

Configuration Guide

Issues and Suggestions

If you see any issue or you have any suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.