IntelligentMirror: Available for Testing

Note: A newer version of IntelligentMirror is now available. Please check this.

IntelligentMirror is a tool, or rather a Squid plugin (redirector), that caches RPM packages so that subsequent requests for the same package can be served from the local cache, which saves a lot of bandwidth and download time.

Who needs Intelligent Mirror?

  1. If you are on a shared network where a lot of people use Linux distros with RPM as their package manager, you need this. Universities fall in this category.
  2. If you have a set of systems running Red Hat derivatives with almost identical OS versions, you need this. Home LAN setups fall in this category.
  3. If you can’t afford to, or don’t want to, mirror an entire Fedora repo for local access due to bandwidth limitations, you need this.

What does it do?

As described above, IntelligentMirror simply caches the RPMs requested by clients on a shared network, and subsequent requests for those RPMs are served from the cache. For a detailed description, check the project page.

Why not use Squid in caching mode?

Squid’s caching is based on URL hashing. Let me explain with an example how IntelligentMirror is actually intelligent, compared to Squid, when caching RPMs.

Let us say you have the RPM yum-3.2.0-1.fc7.i386.rpm installed and you execute “yum update yum”. Let us also say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm, which was fetched from one of the Fedora mirrors, say http://abc.com/. Now someone on the same network launches “yum update yum” and gets the same RPM, yum-3.2.18-1.fc9.i386.rpm, but this time it is fetched from another mirror, say http://xyz.com/.

Case I : Squid caching

Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm. When http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm is requested, it will result in a cache miss and Squid will download the same package again and cache that copy as well. This has two problems:

  1. Squid cannot serve the request from its cache, even though the package is the same.
  2. Additional storage space is wasted caching the same package, and this can really hurt if, unluckily, a different mirror is picked for every subsequent request.

Case II : IntelligentMirror caching

IntelligentMirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. Even if yum picks a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream, with the obvious advantage of saving bandwidth and download time.
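
To make the difference concrete, here is a minimal Python sketch (not IntelligentMirror’s actual code) of mirror-independent caching: the cache is keyed on the package file name, so the same RPM fetched from abc.com or xyz.com maps to a single cache entry. The cache directory used here is a hypothetical placeholder.

#!/usr/bin/env python
# Minimal sketch, not IntelligentMirror's actual code: key the cache on the
# package file name so the same rpm from any mirror hits the same cache entry.
import os

CACHE_DIR = '/var/spool/intelligentmirror'   # hypothetical cache location

def cache_key(url):
    # yum-3.2.18-1.fc9.i386.rpm identifies the package regardless of the mirror
    return url.rstrip('/').split('/')[-1]

def cached_copy(url):
    # return the path of the cached package, or None on a cache miss
    path = os.path.join(CACHE_DIR, cache_key(url))
    return path if os.path.exists(path) else None

# Both mirror URLs map to the same cache entry:
print(cache_key('http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm'))
print(cache_key('http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm'))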

Download

The IntelligentMirror source tarball, RPM, and source RPM are available for download from here.

Installing and Configuring Intelligent Mirror

Install Guide

Configuration Guide

Issues and Suggestions

If you run into any issue or have suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.

 

How To: Test Fedora Pre-release

If you want your Fedora to be the best ever, just grab a copy of the Fedora DVD or live ISO from here and test the features you like. Check out the Fedora 10 feature list and test as many as possible.

If you don’t have access to optical media (CD/DVD-ROM) and want to try the live ISO, VirtualBox is your friend. Grab the latest version of VirtualBox for your distro and platform from the Sun download center and install it using rpm or yum. Using VirtualBox is quite easy. Here is a video tutorial for noobs on how to test pre-release live ISOs using VirtualBox. Have fun 🙂

 

Info: Eclipse DemoCamps 2008 – Ganymede Edition Hyderabad

About the event from Eclipsepedia,

Eclipse is releasing version 3.4 (Ganymede) on 25th June 2008. To mark the occasion, the Eclipse foundation is organizing a series of Democamps around the world. The Hyderabad edition is organized by hyd-eclipse.org (a Hysea initiative). We will discuss the new features and projects in the Ganymede release, and also present a few case studies of some interesting usecases of Eclipse. The event will conclude with a networking of the Eclipse developers’ community in Hyderabad.

I happened to attend the camp with a lot of my friends. The organisers were overwhelmed by the turnout: more than 100 people (mostly from industry) attended and the place was almost overcrowded. The idea was to educate newbies about the new features in version 3.4, showcase some demos for enterprise software development, and bring the Eclipse developers together on a common platform.

The camp started with a session about the new features in the Ganymede version. Kiran Kumar from Progress Software talked about new features like improved regular expression support, changes in the update manager, enhancements in the PDE, etc. Following this, Saurav from Pramati Technologies talked about the new WTP features in this version, mainly with respect to JavaScript and other Java improvements. Finally, Ravi Sankar from Progress Software concluded with an Xpand demo.

All in all, it was a good event which provided a bit of exposure to the developments going on in and around Hyderabad. Thanks to Hysea, Progress and Pramati for organizing the wonderful event 😀

 

Info: Padma Version 0.4.13 Released

From Padma’s official website,

Padma is a system for transforming Indic text between various public and proprietary formats. This extension applies the technology to Mozilla based applications. Padma is available as an extension for Firefox, Thunderbird, Netscape, Mozilla suite and SeaMonkey platforms. Padma currently supports Telugu, Malayalam, Tamil, Devanagari, Gujarati, Kannada, Bengali, and Gurmukhi scripts

This particular version includes support for the Marathi TrueType font Shree-Dev-0714, the mappings for which I wrote last summer. I am happy that they have finally made it into the Padma core. A big credit goes to Radhika Thammishetty and Harshita Vani for guiding and helping me in writing the mappings.

 

Bug: Knetstats Bandwidth Monitoring Problem

I use KNetStats for monitoring my wireless traffic. One fine day my wireless connection was not working as expected, so I reloaded the module and reactivated the wifi device. Just to check whether there was any traffic, I opened KNetStats and guess what happened: I was blown away by the huge traffic on my wifi interface. The upload speed was something around 5.33 exabytes/sec (or 5864061874995 MB/sec) for some time. I nearly had a heart attack. I wonder when we will have an Internet connection with that speed. Here is the proof 🙂

Knetstats Gone Mad

 

IntelligentMirror: GSOC Project Update

Brief Introduction

IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from the Internet, it is stored/cached on a local machine on your network, and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitates efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also pre-fetch RPM packages from Fedora repositories spread all over the world and pre-populate the local repo with popular packages like mplayer, vlc and gstreamer, which are normally accessed immediately after a fresh install.

Definition for a layman

Think of the Internet as a hard disk, your proxy server as a cache and your intranet as a CPU. Whenever your CPU needs to process something, it needs data from the cache; if the data is not there in the cache, it is fetched from RAM and/or the hard disk. IntelligentMirror sits on your proxy server and keeps caching packages in a browsable manner, so they can be served via HTTP for subsequent requests.

For further details about IntelligentMirror, go here.

Update

After getting hosting space on fedorahosted.org, I pushed the code I have written so far. You can check the source tree here.

We are building IntelligentMirror as a plugin for Squid which taps requests from clients, checks them against a cache, and acts accordingly. Check out how to write a custom redirector and how to tap requests to Squid. We are also working on live-streaming the partially downloaded package to the end user while it is being cached.
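
For the curious, here is a rough, simplified sketch of how such a redirector could check a local cache and rewrite RPM URLs. This is not the project’s actual code; the cache directory and the local repository URL are hypothetical placeholders.

#!/usr/bin/env python
# Rough sketch of the redirector flow, not IntelligentMirror's actual code.
import os
import sys

CACHE_DIR = '/var/spool/intelligentmirror'          # hypothetical cache directory
LOCAL_REPO = 'http://localhost/intelligentmirror/'  # hypothetical local repo URL

while True:
    line = sys.stdin.readline()
    if not line:
        break
    # squid sends: URL ip-address/fqdn ident method
    url = line.split(' ')[0]
    package = url.rstrip('/').split('/')[-1]
    if url.endswith('.rpm') and os.path.exists(os.path.join(CACHE_DIR, package)):
        # already cached: rewrite to the local repository
        sys.stdout.write(LOCAL_REPO + package + '\n')
    else:
        # not cached: a blank line tells squid to fetch the original URL
        sys.stdout.write('\n')
    sys.stdout.flush()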

If you have any suggestions, feel free to leave them as a comment here or edit the wiki page 🙂

 

Info: Yum Bug Day on May 30

Hey all yum lovers,

We (the yum people) are organising a Yum Bug Day on May 30, 2008, where we will try to kill as many bugs as possible. For a list of bugs, check Yum Bugs and Yum-Utils Bugs. If you think you can fix or provide info about one or more of them, or you have any suggestions, just drop into #yum on freenode.net or shoot a mail to yum-devel.

In Seth’s words,

Yum Bug Day!
It’s exciting! It’s Fun! You get to tell people things like: Already
fixed, WONTFIX, CANTFIX, NOTABUG, and ‘ooo yah, that’s pretty broken’.

So, what are you waiting for?? 😛

 

Review: Spicebird – A Collaboration Platform

Well, I happened to attend a workshop on “How to build business around open source tools” organized by the Twincling Society and IIIT Hyderabad. There I came to know about Spicebird. Spicebird is a single platform for many collaboration needs: it provides e-mail, calendaring and instant messaging with intuitive integration and unlimited extensibility. Spicebird is being developed by a Hyderabad-based Indian start-up named Synovel (all four founders are alumni of IIIT Hyderabad). Below we look at some of the features that Spicebird provides.

1. Tabbed Interface

The tabbed interface for the different utilities like mail, calendar, contacts, tasks, etc. looks pretty clean. The interface is not cluttered in any way and navigation between the utilities is straightforward. You don’t have to brainstorm before getting something done.

2. Familiar Interface & Crisp Icon Set

Spicebird has an interface similar to the many Mozilla-based applications out there. The settings, preferences and the way things are managed are familiar, so people switching from other open source email clients will not face any problems at all. Spicebird uses icons from the Tango Project, and they are really good looking.

3. Nice Home Tab

The way the Home tab has been organized is really appealing. You can add applets, including RSS feeds from your favourite blogs, mail folder views, a calendar, upcoming events, and date & time. Geeks love RSS feeds, and what can be better than having them on your home tab all the time along with your mail? The events applet comes in handy to remind you of upcoming meetings and deadlines, and it’s on the home tab all the time 🙂 Date & time is especially helpful when you collaborate with people in different timezones: add their timezone to the home tab and you know when it is the right time to ping them.

Spicebird Home Tab

4. Email

The email experience is more or less like any other open source email client, but Spicebird provides some intuitive features: for example, if it finds that the content of a mail is about a meeting, it gives you an option to create a calendar event for it. This is a really good feature, and this is just the beginning; Spicebird is still in beta.

Spicebird Intuitive Mail

5. Instant Messaging

This is a really cool feature from a collaboration point of view, and it sets Spicebird apart from the masses. Spicebird supports IM via any Jabber server, so if you are a startup, set up your own Jabber server on the intranet and use it for collaboration. Mind blowing!! This also includes Gmail/GTalk, so you can say bye-bye to your messenger and start using Spicebird with GTalk right away. Plus, it will import all your contacts into your local address book: another really good feature which is missing from a lot of other email clients.

SpiceBird Instant Message using Jabber, GTalk

6. Calendar & Task Management

Another good feature: integrated calendar and task management. You can quickly add tasks and events, and you need not keep checking your calendar for upcoming events; add the upcoming events applet to the home tab and you will have them in front of your eyes all the time 🙂

Spicebird Calendar and Task Manager

Conclusion

Whether you are a startup looking for tools to collaborate or a user who is excited about using open source tools, just go and download Spicebird from here and explore a new way of managing things in a single place 🙂

You can look at the Spicebird roadmap here and check out the video demo of Spicebird here.

 

How To: Configure Squid Proxy Server

Mission

To configure squid for simple proxying without caching anything.

Use Cases

  1. When you want to control what people browse on your LAN.
  2. When the number of machines is greater than the number of IP addresses you can afford to buy.
  3. When you want to help this holy world save some IPv4 addresses 😛

Assumptions

  1. You have a machine connected directly to the Internet that you are going to use as a proxy server for the other machines on your network.
  2. The machines on your network use 192.168.0.0/16 as the private address space. You can use any one or more of the available private address spaces, but for this howto we assume 192.168.0.0/16 as the local network.
  3. The local IP address of the machine which will run the Squid proxy server is 192.168.36.204. It can be any IP, but for this howto we assume this one.

How to proceed

First of all, ensure that you have Squid installed. After installing Squid, you need to set up access control in the Squid configuration file, which resides in /etc/squid by default. Open /etc/squid/squid.conf and add/edit the following lines according to your preferences. A few of the lines already exist in the configuration file; you can add the rest.

# The port on which squid will listen for requests
http_port 8080
# If 'cgi-bin' or '?' is in query, squid should not check with neighbours'/parents' cache
# and should go to target web-server.
hierarchy_stoplist cgi-bin ?
# If url contains 'cgi-bin' or '?', then it must not be cached
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Absolute path to squid access log.
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320
# Access control list to control every IP address
acl all src 0.0.0.0/0.0.0.0
# Access control list for source machine in LAN
acl lan_src src 192.168.0.0/16
# Access control list for destination machine in LAN
acl lan_dst dst 192.168.0.0/16
# Access control list to manage squid cache
acl manager proto cache_object
# Access control list to define IP address allowed for source localhost
acl localhost src 127.0.0.1/255.255.255.255
# Access control list to define IP addresses allowed for localhost as destination
acl to_localhost dst 127.0.0.0/8
# Access control list to define Safe ports that should be allowed by default
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
# Allow cache management only from localhost
http_access allow manager localhost
# Deny cache management from remote hosts
http_access deny manager
# Deny http access via all the ports which are not listed as safe
http_access deny !Safe_ports
# Deny all connections via all ports which are not listed as safe
http_access deny CONNECT !SSL_ports
# Allow http access from localhost
http_access allow localhost
# Allow http access from machines on LAN
http_access allow lan_src
http_access deny all
http_reply_access allow all
icp_access allow all
# Deny caching for everyone so that there is no caching at all
cache deny all
coredump_dir /var/spool/squid
# Never allow direct connection to machines on the internet
prefer_direct off
never_direct allow all
# Allow direct connection if the destination machine is on LAN
always_direct allow lan_dst
# Delete this line if you don't have /etc/hosts file
hosts_file /etc/hosts
# Allow AIM connections
# Delete the following 9 lines if you don't want people to connect to AIM
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 64.12.0.0/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow connections to Yahoo Messenger
# Delete the following 6 lines if you don't want people to connect to Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow connections to Google Talk
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow connections to MSN
# Delete the following 6 lines if you don't want people to connect to MSN
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 207.46.111.0/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts

Now, start the squid proxy server as

service squid start

Also, if you want squid to be started every time you boot the machine, execute the following command

chkconfig --level 345 squid on

You now have a Squid proxy server running. You can ask clients to configure their browsers to use 192.168.36.204 as the proxy server with 8080 as the proxy port. Command-line utilities like elinks, lynx, yum, wget, etc. can be told to use the proxy by exporting the http_proxy variable as shown below. Users can also add these lines to their ~/.bashrc file to avoid exporting them every time.

export http_proxy='http://192.168.36.204:8080'
export ftp_proxy='http://192.168.36.204:8080'
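
If you want a quick check that the proxy is actually answering, the following small Python snippet (an optional extra, assuming the proxy address used in this howto) fetches a page through it and prints the HTTP status code.

#!/usr/bin/env python
# Optional sanity check: fetch a page through the proxy at 192.168.36.204:8080
# (adjust the address to match your setup) and print the HTTP status code.
try:
    from urllib.request import ProxyHandler, build_opener  # Python 3
except ImportError:
    from urllib2 import ProxyHandler, build_opener  # Python 2

opener = build_opener(ProxyHandler({'http': 'http://192.168.36.204:8080'}))
response = opener.open('http://www.example.com/')
print(response.getcode())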

I highly recommend the book “Squid Proxy Server 3.1: Beginner’s Guide (Paperback)” for further reading.

 

How To: Write Custom Redirector or Rewritor Plugin For Squid in Python

Mission

To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when the existing redirector plugins for Squid don’t suit your needs or you want to build everything on your own.

Use Cases

  1. When you want to redirect URLs using a database like MySQL or PostgreSQL.
  2. When you want to redirect based on mappings stored in simple text files.
  3. When you want to build a redirector which can learn by itself using AI techniques 😛

How to proceed

From Squid FAQ,

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision.

The format of the line read from the standard input by the program is as follows.

URL ip-address/fqdn ident method
# for example
http://saini.co.in 172.17.8.175/saini.co.in - GET -

The implementation sounds very simple, and it is indeed very simple. The only thing that needs care is the unbuffered I/O: you should flush the output to standard output immediately once the decision is taken.

For this howto, we assume we have a method called ‘modify_url()’ which returns either a blank line or a modified URL to which the client should be redirected.

#!/usr/bin/env python

import sys

def modify_url(line):
    fields = line.split(' ')
    # the first field is the URL
    old_url = fields[0]
    new_url = '\n'
    # take the decision and modify the url if needed
    # do remember that the new_url should contain a '\n' at the end.
    if old_url.endswith('.avi'):
        new_url = 'http://fedora.co.in/errors/accessDenied.html' + new_url
    return new_url

while True:
    # the format of the line read from stdin is
    # URL ip-address/fqdn ident method
    # for example
    # http://saini.co.in 172.17.8.175/saini.co.in - GET -
    line = sys.stdin.readline()
    # an empty string means squid closed the pipe, so exit cleanly
    if not line:
        break
    # new_url is a simple URL only
    # for example
    # http://fedora.co.in
    new_url = modify_url(line.strip())
    sys.stdout.write(new_url)
    sys.stdout.flush()

Save the above file somewhere; for this example we save it as /etc/squid/custom_redirect.py. Now that we have the program for redirecting clients, we need to configure Squid to use custom_redirect.py. Below is the Squid configuration that tells Squid to use the above program as a redirector.

# Add these lines to /etc/squid/squid.conf file.
# /usr/bin/python should be replaced by the path to python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/custom_redirect.py
# Number of instances of the above program that should run concurrently.
# 5 is good enough for a lightly loaded proxy, but go for at least 10 on busier ones; anything below 5 may not work properly.
redirect_children 5

Now, start/reload/restart Squid. That’s all we need to do to write and use a custom redirector plugin for Squid.
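
If you want to sanity-check the helper on its own, independently of Squid, the following Python snippet (an optional extra, assuming the paths used in this howto) feeds one request line to the redirector and prints whatever it writes back; a blank line means no rewrite.

#!/usr/bin/env python
# Quick manual test of the redirector, run outside of squid: send it one
# request line on stdin and print its reply (a blank line means "no rewrite").
import subprocess

helper = subprocess.Popen(['/usr/bin/python', '/etc/squid/custom_redirect.py'],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
helper.stdin.write(b'http://saini.co.in/movie.avi 172.17.8.175/saini.co.in - GET -\n')
helper.stdin.flush()
print(repr(helper.stdout.readline()))
helper.stdin.close()
helper.wait()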