IntelligentMirror: RPM and DEB Caching Improved (0.5)

After spending a lot of time with youtube cache, now I am trying to devote some time to update intelligentmirror with required features and enhancements that youtube cache already enjoys. In the same direction here is version 0.5 of intelligentmirror.

Improvements

  • Added max_parallel_downloads options to controll the maximum threading fetching from upstream to cache the packages.
  • Fine grained control on logging via max_logfile_size and max_logfile_backups option.
  • Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run
 [root@localhost]# python setup.py install [ENTER]
  • Added update script (update-im). So in case you decide to change the locations for caching rpm/deb packages, just run
 [root@localhost]# update-im [ENTER]

OR

 [root@localhost]# /usr/sbin/update-im [ENTER]
  • Download scheduler similar to youtube cache is added to facilitate the download queing in case of large number of requests.
  • More informative logging.
  • cache.log is not flooding anymore with XMLRPC logs and python tracebacks.
  • Added extensive exception handling thoughout the program.

Availability

  1. RPMs for Fedora/Red Hat/Cent OS
  2. Source RPMs for Fedora/Red Hat/Cent OS
  3. Source Tar balls

Installation and Configuration

INSTALL and README files should help you throughout the installation and configuration process.

In case you have questions, ask them here in comments. Suggestions for improvement are welcome 🙂

 

IntelligentMirror Gets Even More Intelligent (1.0.1)

Warning : This version of IntelligentMirror is compatible with only squid-2.7 as of now. It is NOT compatible even with squid-3.0.

IntelligentMirror Version 1.0.1

I have been following squid development regularly (at least the part in which I am interested) and they have introduced a new directive in squid-2.7 known as StoreUrlRewrite (storeurl_rewrite_program). Using this directive you can instruct squid to cache url A (http://abc.com/foo/bar/version/crap.rpm) as url B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words you can direct squid to cache any url as any other url without any extra efforts.

So keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.

IntelligentMirror : Old method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if URL is not in (RPM, metadata file)
    • Then its none of our business.
    • Let proxy handle it the normal way.
    • Done and exit.
  3. Check: if RPM/metadata is available in cache
    • Stream the RPM/metadata from cache.
    • Done and exit.
  4. Check: if RPM/metadata is not available in cache
    • Download in parallel for caching in some dir and stream.
    • Done and exit.

IntelligentMirror : New method of operation

  1. IntelligentMirror gets a client request for a URL.
  2. Check: if request for rpm
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm
  3. Check: if request for deb
    1. Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb
  4. Done and exit.

So your squid will see every request for an rpm package as a request http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. So, if you happen to request the same rpm from a different mirror, it’ll still be served from cache 🙂

Improvements

  1. No need to check if the url supplied by squid is for rpm or not because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror for urls ending in .rpm .
  2. No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.
  3. No need to worry if the target package has changed because of the resigning or whatever because squid will do that for you.
  4. No need to actually download the package in parallel for caching because squid is already doing that.
  5. No need to worry about the hashing algorithms and storage optimizations for the cached content.

Availability

  1. RPM for Fedora/Red Hat
  2. Source RPM for Fedora/Red Hat
  3. Source Tarball

Install and Configure

The install and configure files should be enough to guide you through the installation if you choose the tar ball way. Otherwise you can always install from rpm from the above link.

Note1: You have to configure your squid to use intelligentmirror as a plugin even if you install via rpm. Check the configure file at the above link.

Note2: StoreUrlRewrite will probably be available in squid-3.1.

 

IntelligentMirror: RPM and DEB Caching Improved (0.4)

IntelligentMirror version 0.4 is available now. There have been significant improvements in intelligent mirror since last release.

Improvements

  1. Fixed defunct process problem. You will not see defunct python processes hanging around anymore. Previously every forked daemon used to got defucnt because parent never waited for the forked child to finish.
  2. IntelligentMirror now supports caching of Debian packages just like rpms. So now IntelligentMirror is best suited shared environments where people have different tastes.
  3. Intelligent Mirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor as url_rewrite_program has an acl controller url_rewrite_access. And using url_rewrite_access only requests for rpm/deb packages will be passed to Intelligent Mirror. So, IM now need not process each and every incoming request. Also, it has redirector_bypass directive which will bypass IM in case all the instances of IM are busy serving requests. So, squid will not die with a fatal error in case of huge requests.
  4. Options to enable/disable caching for rpm and Debian packages have been added.
  5. Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.
  6. Proxy authentication is also supported now just the way it is supported in yum.
  7. Packages are not checked for last-modified time anymore. Because in principle two rpms A and B can only have same name iff they have the same contents. So, the delay in response time in case of hits has reduced.

Availability

  1. RPMs for Fedora/Red Hat
  2. Source RPMs for Fedora/Red Hat
  3. Source Tar balls

Installation and configuration is easy and the INSTALL and README files should serve the purpose.

In case you have any suggestions or problems, leave a comment here or file a ticket on project page.

 

IntelligentMirror: Available for Testing

Note : A newer version of intelligentmirror is available now. Please check this.

Intelligent Mirror is basically a tool or squid plugin (redirector) to cache rpm packages so that the subsequent requests for the same package can be served from the local cache which will eventually save a lot of bandwidth and downloading time.

Who needs Intelligent Mirror?

  1. If you are on a shared network where a lot of people use linux distros with RPM as their package manager, then you need this. Universities should come under this category.
  2. If you have a set of systems having red hat derivatives and almost identical OS versions, you need this. LAN setups at home should come under this category.
  3. If you can’t afford to or don’t want to mirror entire fedora repo for local access due to bandwidth limitations, you need this.

What it does?

As described above, Intelligent Mirror, just caches rpms which are requested by the clients in a shared network. And subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.

Why not use Squid in caching mode?

Squid caching is based on url hashing. Let me explain with an example how Intelligent Mirror is actually intelligent as compared to squid while caching rpms.

Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm . You executed “yum update yum“. And let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm which was fetched from one of the fedora mirrors http://abc.com/ (say). Now someone on the same network launched “yum update yum” and he got the same rpm yum-3.2.18-1.fc9.i386.rpm. But this time rpm was fetched from another mirror http://xyz.com/ (say).

Case I : Squid caching

Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm . And when http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm will be requested, it’ll result in a cache miss and squid will again download the same package and will cache this one as well. Now there are two problems

  1. Squid is not able to serve from the cache, though the package was the same.
  2. Additional storage space is being wasted in caching the same package. And this can really harm if unluckily a different mirror is picked in all the subsequent queries.

Case II : IntelligentMirror caching

Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. And even if yum picks up a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream. So, the obvious advantage of saving the bandwidth and downloading time.

Download

Intelligent Mirror source tarball, rpm, source rpm are available for download from here.

Installing and Configuring Intelligent Mirror

Install Guide

Configuration Guide

Issues and Suggestions

If you see any issue or you have any suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.

 

IntelligentMirror: GSOC Project Update

Brief Introduction

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from Internet, it is stored/cached on a local machine on your network and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitate the efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also do pre-fetching of RPM packages from fedora repositories spread all over the world and can also pre-populate the local repo with popular packages like mplayer, vlc, gstreamer which are normally accessed immediately after a fresh install.

Definition for a lay man

Think of Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Now, whenever your CPU needs to process something, it needs data from cache. If data is not there in cache, it’ll be fetched from RAM and/or hard disk. IntelligentMirror sits on your proxy server and keep caching packages in a browsable manner which can be served via http for subsequent requests.

For further details about IntelligentMirror, go here.

Update

After getting the hosting space on fedorahosted.org, I pushed the code I have written. You can check the source tree here.

We are buidling IntelligentMirror as a plugin to squid which taps requests from clients and checks them against a cache. Checkout how to write a custom redirector or how to tap requests to squid. And acts accordingly. We are working on live streaming the partially downloaded package to the end user while caching it.

If you have any suggestion, feel free to leave them as a comment here or edit the wiki page 🙂