After spending a lot of time on youtube cache, I am now devoting some time to bringing intelligentmirror up to date with the features and enhancements that youtube cache already enjoys. As a step in that direction, here is version 0.5 of intelligentmirror.
- Added max_parallel_downloads option to control the maximum number of threads fetching packages from upstream for caching.
- Fine-grained control over logging via the max_logfile_size and max_logfile_backups options.
- Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run
[root@localhost]# python setup.py install [ENTER]
- Added update script (update-im). So in case you decide to change the locations for caching rpm/deb packages, just run
[root@localhost]# update-im [ENTER]
[root@localhost]# /usr/sbin/update-im [ENTER]
- Added a download scheduler, similar to the one in youtube cache, to queue downloads when there is a large number of requests.
- More informative logging.
- cache.log is no longer flooded with XMLRPC logs and python tracebacks.
- Added extensive exception handling throughout the program.
- RPMs for Fedora/Red Hat/Cent OS
- Source RPMs for Fedora/Red Hat/Cent OS
- Source Tar balls
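For the curious, the two logging options map naturally onto Python's standard rotating log handler. The following is only a minimal sketch of that idea, not the actual intelligentmirror code: the function name build_logger, the logger name, the file name and the assumption that max_logfile_size is given in bytes are all mine.

```python
import logging
import logging.handlers
import os
import tempfile


def build_logger(logfile, max_logfile_size, max_logfile_backups):
    """Return a logger that rotates `logfile` before it grows past
    `max_logfile_size` bytes and keeps `max_logfile_backups` old copies.
    (Hypothetical helper; the option names come from the changelog,
    the byte unit is an assumption.)"""
    logger = logging.getLogger("intelligentmirror-sketch")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        logfile, maxBytes=max_logfile_size, backupCount=max_logfile_backups
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Write enough messages into a throwaway directory to force several rotations.
log_dir = tempfile.mkdtemp()
logfile = os.path.join(log_dir, "intelligentmirror.log")
logger = build_logger(logfile, max_logfile_size=1024, max_logfile_backups=2)
for i in range(200):
    logger.info("cached package number %d", i)

# Only the live log plus the two newest backups survive.
log_files = sorted(os.listdir(log_dir))
print(log_files)
```

With two backups configured, older rotated files are deleted automatically, so the log directory cannot grow without bound.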
Installation and Configuration
INSTALL and README files should help you throughout the installation and configuration process.
In case you have questions, ask them here in the comments. Suggestions for improvement are welcome.
Warning : This version of IntelligentMirror is compatible only with squid-2.7 as of now. It is NOT compatible even with squid-3.0.
IntelligentMirror Version 1.0.1
I have been following squid development regularly (at least the part I am interested in), and squid-2.7 introduces a new directive known as StoreUrlRewrite (storeurl_rewrite_program). Using this directive you can instruct squid to cache url A (http://abc.com/foo/bar/version/crap.rpm) as url B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words, you can direct squid to cache any url as any other url without any extra effort.
So, keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.
IntelligentMirror : Old method of operation
- IntelligentMirror gets a client request for a URL.
- Check: if URL is not for an RPM or a metadata file
- Then it's none of our business.
- Let proxy handle it the normal way.
- Done and exit.
- Check: if RPM/metadata is available in cache
- Stream the RPM/metadata from cache.
- Done and exit.
- Check: if RPM/metadata is not available in cache
- Download in parallel for caching in some dir and stream.
- Done and exit.
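The old flow above boils down to a three-way decision per request. Here is a rough sketch of just that decision, not the original code: the function name decide, the returned labels and the metadata file name repomd.xml are assumptions made for illustration, and the actual downloading and streaming are left out.

```python
import os
from urllib.parse import urlparse


def decide(url, cache_dir, metadata_names=("repomd.xml",)):
    """Classify a request the way the old method of operation describes.

    Returns one of three labels (all hypothetical names):
      "pass-through"        -> not an RPM/metadata URL, let the proxy handle it
      "serve-from-cache"    -> file already present in cache_dir
      "download-and-stream" -> fetch from upstream while caching
    """
    name = os.path.basename(urlparse(url).path)
    if not (name.endswith(".rpm") or name in metadata_names):
        return "pass-through"
    if os.path.isfile(os.path.join(cache_dir, name)):
        return "serve-from-cache"
    return "download-and-stream"
```

Note that the lookup uses only the file name, so the mirror a package came from plays no part in the decision.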
IntelligentMirror : New method of operation
- IntelligentMirror gets a client request for a URL.
- Check: if request for rpm
- Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm
- Check: if request for deb
- Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb
- Done and exit.
So your squid will see every request for an rpm package as a request for http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. If you happen to request the same rpm from a different mirror, it’ll still be served from the cache.
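To make the new flow concrete, here is a small sketch of the rewriting step. It is an illustration under assumptions, not the shipped plugin: the names store_url and rewrite_stream are mine, the canonical host is borrowed from the example url earlier in this post, and I am assuming the simple helper format where squid sends one request per line with the url as the first field and an empty reply means "leave the url alone".

```python
import os
from urllib.parse import urlparse

# Canonical prefix taken from the example url in this post; any fixed host works.
CANONICAL_PREFIX = "http://proxy.fedora.co.in/intelligentmirror/"


def store_url(url):
    """Map an rpm/deb url from any mirror to one canonical store url.
    Returns an empty string for anything else (assumed to mean 'no change')."""
    name = os.path.basename(urlparse(url).path)
    if name.endswith((".rpm", ".deb")):
        return CANONICAL_PREFIX + name
    return ""


def rewrite_stream(lines):
    """Yield one reply per request line; the url is assumed to be field one."""
    for line in lines:
        fields = line.split()
        yield store_url(fields[0]) if fields else ""


requests = [
    "http://abc.com/foo/bar/version/crap.rpm 10.0.0.1/- - GET",
    "http://xyz.com/other/path/crap.rpm 10.0.0.2/- - GET",
]
for reply in rewrite_stream(requests):
    print(reply)
```

Both sample requests produce the same store url, which is exactly why a second mirror still results in a cache hit. A real helper would feed sys.stdin into rewrite_stream and flush stdout after every reply.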
- No need to check whether the url supplied by squid is for an rpm, because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror only for urls ending in .rpm.
- No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.
- No need to worry if the target package has changed because of re-signing or anything else, because squid will handle that for you.
- No need to actually download the package in parallel for caching because squid is already doing that.
- No need to worry about the hashing algorithms and storage optimizations for the cached content.
- RPM for Fedora/Red Hat
- Source RPM for Fedora/Red Hat
- Source Tarball
Install and Configure
The install and configure files should be enough to guide you through the installation if you choose the tar ball way. Otherwise you can always install from rpm from the above link.
Note1: You have to configure your squid to use intelligentmirror as a plugin even if you install via rpm. Check the configure file at the above link.
Note2: StoreUrlRewrite will probably be available in squid-3.1.
IntelligentMirror version 0.4 is available now. There have been significant improvements in intelligent mirror since last release.
- Fixed the defunct process problem. You will not see defunct python processes hanging around anymore. Previously every forked daemon became defunct because the parent never waited for the forked child to finish.
- IntelligentMirror now supports caching of Debian packages just like rpms. So IntelligentMirror is now well suited to shared environments where people have different tastes.
- Intelligent Mirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor, as url_rewrite_program has an acl controller, url_rewrite_access. With url_rewrite_access, only requests for rpm/deb packages are passed to Intelligent Mirror, so IM no longer needs to process each and every incoming request. Squid also has a redirector_bypass directive, which bypasses IM when all instances of IM are busy serving requests, so squid will not die with a fatal error under a heavy request load.
- Options to enable/disable caching for rpm and Debian packages have been added.
- Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.
- Proxy authentication is also supported now just the way it is supported in yum.
- Packages are no longer checked for last-modified time, because in principle two rpms A and B can have the same name if and only if they have the same contents. As a result, the response delay for cache hits is lower.
- RPMs for Fedora/Red Hat
- Source RPMs for Fedora/Red Hat
- Source Tar balls
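A note on the defunct process fix from the list above: a forked child stays around as a defunct (zombie) entry until its parent collects its exit status. The sketch below shows one common way of doing that with os.waitpid. It is not the actual patch, the helper name run_in_child is made up, and it assumes a Unix-like system where os.fork is available.

```python
import os


def run_in_child(task):
    """Fork, run `task` in the child, and reap the child in the parent.

    Returns the child's exit code (0 on success, 1 if task raised),
    or -1 if the child did not exit normally. Hypothetical helper.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: never return into the parent's code path.
        exit_code = 1
        try:
            task()
            exit_code = 0
        finally:
            os._exit(exit_code)
    # Parent process: waiting here is what prevents a defunct entry.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


print(run_in_child(lambda: None))
```

A long-running daemon would more likely reap its children without blocking (for example with os.WNOHANG in a loop), but the principle is the same: somebody has to wait for the child.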
Installation and configuration is easy and the INSTALL and README files should serve the purpose.
In case you have any suggestions or problems, leave a comment here or file a ticket on project page.
Note : A newer version of intelligentmirror is available now. Please check this.
Intelligent Mirror is basically a tool or squid plugin (redirector) to cache rpm packages so that the subsequent requests for the same package can be served from the local cache which will eventually save a lot of bandwidth and downloading time.
Who needs Intelligent Mirror?
- If you are on a shared network where a lot of people use Linux distros with RPM as their package manager, then you need this. Universities should come under this category.
- If you have a set of systems running Red Hat derivatives with almost identical OS versions, you need this. LAN setups at home should come under this category.
- If you can’t afford to or don’t want to mirror the entire Fedora repo for local access due to bandwidth limitations, you need this.
What does it do?
As described above, Intelligent Mirror simply caches the rpms requested by clients on a shared network, and subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.
Why not use Squid in caching mode?
Squid caching is based on url hashing. Let me explain with an example how Intelligent Mirror is actually intelligent as compared to squid while caching rpms.
Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm . You executed “yum update yum“. And let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm which was fetched from one of the fedora mirrors http://abc.com/ (say). Now someone on the same network launched “yum update yum” and he got the same rpm yum-3.2.18-1.fc9.i386.rpm. But this time rpm was fetched from another mirror http://xyz.com/ (say).
Case I : Squid caching
Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm . When http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm is requested, it’ll result in a cache miss, and squid will download the same package again and cache this copy as well. Now there are two problems:
- Squid is not able to serve from the cache, though the package was the same.
- Additional storage space is wasted on caching the same package twice. This can really hurt if, unluckily, a different mirror is picked for every subsequent request.
Case II : IntelligentMirror caching
Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. Even if yum picks a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream. The obvious advantage is the saving in bandwidth and download time.
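The difference between the two cases can be shown in a few lines. This is only an illustration: url_key uses an MD5 of the whole url as a stand-in for url hashing (it is not squid's real key computation), and package_key is a hypothetical name for "use the package file name as the cache key".

```python
import hashlib
import os
from urllib.parse import urlparse


def url_key(url):
    """Stand-in for url-based caching: the key depends on the full url."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def package_key(url):
    """Name-based caching: the key is just the package file name."""
    return os.path.basename(urlparse(url).path)


mirror_a = "http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm"
mirror_b = "http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm"

# Case I: two different keys, so a miss and a duplicate copy.
print(url_key(mirror_a) == url_key(mirror_b))
# Case II: one key for both mirrors, so the second request is a hit.
print(package_key(mirror_a) == package_key(mirror_b))
```

Keying on the file name is only safe because, as with rpms in general, two packages with the same name are expected to have the same contents.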
Intelligent Mirror source tarball, rpm, source rpm are available for download from here.
Installing and Configuring Intelligent Mirror
Issues and Suggestions
If you see any issue or you have any suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.
IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from the Internet, it is stored/cached on a local machine on your network, and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitates efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also pre-fetch RPM packages from Fedora repositories spread all over the world and can pre-populate the local repo with popular packages like mplayer, vlc and gstreamer, which are normally accessed immediately after a fresh install.
Definition for a layman
Think of the Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Whenever your CPU needs to process something, it looks for the data in the cache. If the data is not there, it is fetched from RAM and/or the hard disk. IntelligentMirror sits on your proxy server and keeps caching packages in a browsable manner so that they can be served via http for subsequent requests.
For further details about IntelligentMirror, go here.
After getting the hosting space on fedorahosted.org, I pushed the code I have written. You can check the source tree here.
We are building IntelligentMirror as a plugin to squid which taps requests from clients, checks them against a cache and acts accordingly. Check out how to write a custom redirector or how to tap requests to squid. We are working on live streaming the partially downloaded package to the end user while caching it.
If you have any suggestions, feel free to leave them as a comment here or edit the wiki page.
I have decided to stick with Fedora 7 because of a bad experience with Fedora 8 last night, and because of the difficulty of moving the servers I run on Fedora 7 to a new OS. Since I want to work on Padma in my spare time, I need a good IDE that can handle a project nicely and help me import from online CVS repos. So, do I have any choice? There is one and only one: Eclipse. Some people call it a programming paradise. Some may disagree, and others may say that Vim is the best tool for programming. I also use Vim quite often, in fact about 90% of the time. But Vim gets confusing once a project grows beyond a certain size.
Anyway, here I am going to discuss how to install Eclipse on Fedora 7, because it’s not there by default. There are two approaches. One is extremely simple and the other is extremely difficult.
Use yum to install eclipse. Just issue ‘yum install eclipse-*’ and it’ll be done automatically. But this method takes a very long time, as yum downloads the packages and dependencies sequentially and it’s very slow.
If I have good bandwidth, I’ll download all the packages and resolve the dependencies myself. But resolving dependencies is frustrating enough that anyone would switch back to slow yum. However, for reasons which I suspect to be memory leaks in Firefox and other apps, my system was damn slow and yum could not do anything even after 10 minutes. It was not even able to download the package list.
So, I decided to download all the packages and install them. I downloaded all the eclipse packages and their dependencies manually and installed them successfully. Here is the list of packages and dependencies so that you need not run rpm -ivh a hundred times. All these dependencies are available on rpmfind.net and the packages can be fetched from any Fedora mirror. These are tested on Fedora 7.
So, be sure to fetch the dependencies first. Hope that helps.
After installing jdk-6u2 for Linux from Sun Microsystems’ site, I ran javaws and it aborted with a strange error saying that libstdc++.so.5 was not found. I searched for libstdc++.so.5 and it was not in /usr/lib/ as expected, which implied something was wrong. As I had installed jdk-6u2 from rpms, it should have given a dependency error for that library, but it didn’t. After searching for some time I found that libstdc++.so.5 is provided by the compat-libstdc++-33 package, which was not installed on my system. After installing that package, everything worked fine.
It worked fine for Fedora Core 6 because compat-libstdc++-33 is provided by default in Fedora Core 6.
But it’s kind of strange: if libstdc++.so.5, and hence compat-libstdc++-33, is required for jdk-6u2, why didn’t ‘rpm -ivh’ give a dependency error?