MSN Crawlers Pawned

After seeing the way MSN crawled my last post, I just realized why Microsoft could never do good in Search Engine Market :)

I wonder why MSN would crawler same page from two different machines. I wonder if a single page can be divided further for crawling. Checkout the screenshots below :)

Why Google Wins

A few minutes later. Three Microsoft machines were crawling the same page 😛

MSN Crawler Pawned

I have conformed using ARIN that all these IPs belong to Microsoft 😀


How To: Boot Fedora Faster

Note: These tricks apply to any Linux based OS. But I have tested them only on Fedora, so can’t say whether they’ll work on other Linux(s).

My current Fedora installation is now almost one and a half years old. Yes. I am still using Fedora 7 😀 I have Fedora 10 on my other machine. Coming to the agenda, my Fedora installation has grown beyond control and I have services from named, squid, drbl, privoxy, vsftpd, vbox*, smb and what not on a personal desktop. These services really force my system startup to slow down to more than two minutes. While shutting down, its very easy to just cut the power supply but while booting up I can’t help and it frustrates me. And what frustrates me further that I have 4GB DDR2 RAM and AMD64 X2 5600+ (2.8GHz x 2) and booting time is still more than two minutes.


  • Boot Fedora faster using whatever techniques possible.

Remove the services from normal order and delay their execution to a later stage. So, services like network, squid, privoxy, named, vsftpd, smb etc. doesn’t make sense unless I am not logged in and using them. Let us start them after we have login screen.

Turn off all the services by using the command

[root@bordeaux ~]# chkconfig service_name off

where service_name is the service you want to turn off.

Now create a file /etc/ Enter a line like this

[root@bordeaux ~]# service service_name start

for every service that you have turned off in the Step 1.1 and you want it to be running after your machine starts up. Now, your file should look like this

service network start &
service sshd start &
modprobe it87 &
modprobe k8temp &
/usr/bin/iptraf -s eth0 -B &
/usr/bin/iptraf -s lo -B &
service squid start &
service privoxy start &
service httpd start &
service mysqld start &
service named start &
service smb start &
service vboxdrv start &
service vboxnet start &
service vsftpd start &

Add the following line to /etc/rc.local file

/bin/bash /etc/ &

Done!!! Notice the &s in both files. They are for execution in background so that a process can block boot process. You’ll observe a drop of 10-20 seconds in system startup time.

Problem with Hack #1 : The execution is not really parallel. It executes like a process in the background. So we can’t get the real advantage of parallel execution.

Hack #2 solves this problem. Now we don’t put processes in background. We use daemon forking to fork a separate daemon process which will start all the services for us in parallel. Here we’ll get the real advantage and startup time will decrease further.

This step is totally similar to Step 1.1. So skipping it.

This step is also similar to Step 1.2. The /etc/ file should look like this.

service network start
service xinetd start
service crond start
service anacron start
service atd start
service sshd start
service rpcbind start
service rpcgssd start
service rpcimapd start
modprobe it87
modprobe k8temp
/usr/bin/iptraf -s eth0 -B
/usr/bin/iptraf -s lo -B
service nasd start
service squid start
service privoxy start
service httpd start
service iptables start
service lm_sensors start
service mysqld start
service named start
service nfs start
service nfslock start
service smb start
service vboxdrv start
service vboxnet start
service vsftpd start
service autofs start
service smartd start

Notice the absence of &s in the file.

Download the attached file attached at the end of this post or copy paste the following code to /etc/ file.

#!/usr/bin/env python
# (C) Copyright 2008 Kulbir Saini
# License : GPL
import os
import sys
def fork_daemon(f):
    """This function forks a daemon."""
    # Perform double fork
    r = ''
    if os.fork(): # Parent
        # Wait for the child so that it doesn't defunct
        # Return a function
        return  lambda *x, **kw: r
    # Otherwise, we are the child
    # Perform second fork
    if os.fork():
    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from generator.
        Executes the function bound to the generator and then
        exits the process"""
        f(*args, **kwargs)
    return wrapper
def start_services(startup_file):
    command = '/bin/bash ' + startup_file + ' > /dev/null 2> /dev/null '
if __name__ == '__main__':
    forkd = fork_daemon(start_services)
    print 'Executing ', sys.argv[1], '[  OK  ]'

Add the following line to your /etc/rc.local file.

/usr/bin/python /etc/ /etc/

Thats it. Done!!! Now you’ll experience a boost of about 25-30 seconds of decrease in boot time.

Stats of my machine

With all services started in normal order : 2minutes.
With Hack #1 : 1minute 42 seconds.
With Hack #2 : 1minute.

Warning : These hacks may break your system and can make it unusable. Use at your own risk.


Crack: Google Authentication Services are Vulnerable

There is a vulnerability in the way Google authentication service works. Whenever you login to any of the Google’s online services like GMail, Orkut, Groups, Docs, Youtube, Calendar etc., you are redirected to an authentication server which authenticates against the entered username and password and redirect back to the required service (GMail, Youtube etc.) setting the session variables.

Now, if you are able to grab the url used to set the session variables, you can login as the user to whom that url belongs from any machine on the Internet (need not be the machine belonging to the same subnet) without entering the username and password of the user.

The proxy servers in the organizations can be used to exploit this vulnerability. Squid is the most popular proxy server used. In the default configuration, squid strips the query terms of a url before logging. So, this vulnerability can’t be exploited. But if you turn off the stripping mechanism by adding the line shown below, then squid will log the complete url.

strip_query_terms off

So, after turning stripping mechanism off, the log will contain urls which will look like this

Replace with your tld specific to your country. If you paste this url in any browser, it’ll directly log you in and you can do whatever you want to that account. Remember that all such urls remains valid only for two minutes. So, if you use that url after two minutes, it’ll lead nowhere.

At the time of writing this post Orkut, Google Docs, Google Calendar, Google Books and Youtube are vulnerable.

So, make sure your squid has stripping mechanism turned on and your squid server is properly firewalled.

You can watch the Video proof for Orkut on, Youtube.


How To: Configure Caching Nameserver (named)


To configure a caching nameserver on a local machine which will cascade to another previously configured and functional nameserver (may or may not be caching. It’ll generally be your ISP nameserver or the one provided by your organization).


  • Reduces the delay in domain name resolution drastically as the requests for frequently accessed websites are served from cache.


  • named gets a request for domain resolution.
  • It checks whether the request can be satisfied from cache. If the answer is in cache and not stale, the request is satisfied from cache itself saving a lot of time :)
  • If request can’t be satisfied from cache, named queries the first parent. If it replies with the answer, then named will cache the response and subsequent requests for the same domain name will be satisfied from the cache.
  • In case first parent fails to reply, named will query the second parent and so on.

(The working is my understanding of caching-nameserver using wireshark as traffic analysis tool and caching-nameserver may not behave exactly as explained above.)

How to install

named is by default on most of the systems by the package name ‘caching-nameserver‘. If its not present on your system, install using

[root@localhost ~]# yum install caching-nameserver [ENTER]
# If that doesn't work try this
[root@localhost ~]# yum install bind [ENTER]

How to configure

The main configuration file for named resides in /var/named/chroot/etc/named.caching-nameserver.conf which is also soft linked from /etc/named.caching-nameserver.conf . named configuration file supports C/C++ style comments.

For a caching nameserver which will cascade to another nameserver, there is nothing much to be configured. You need to configure “options” block. Below is a configuration file for a machine with IP address cascading to two nameserver and The comments inline explain what each option does.

options {
  // Set the port to 53 which is standard port for DNS.
  // Add the IP address on which named will listen separated by semi-colons.
  // It'll be your own IP address.
  listen-on port 53 {;;};
  // These are default. Leave them as it is.
  directory   "/var/named";
  dump-file   "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  // The machines which are allowed to query this nameserver.
  // Normally you'll allow only your machine. But you can allow other machines also.
  // The address should be separated by semi-colons. To allow a network,
  // the line would be
  // allow-query {localhost;; };
  // Don't forget the semi-colons.
  allow-query     { localhost;; };
  recursion yes;
  // The parent nameservers. List all the nameserver which you can query.
  forwarders {;; };
  forward first;
logging {
        channel default_debug {
                file "data/";
                severity dynamic;
zone "." IN {
  type hint;
  file "";
include "/etc/named.rfc1912.zones";

Start caching-nameserver

Now start the caching-nameserver using the following command

[root@localhost ~]# server named start [ENTER]


[root@localhost ~]# /etc/init.d/named start [ENTER]

To make named start every time your reboot your machine use following command

[root@localhost ~]# chkconfig named on [ENTER]

Using caching-nameserver

To use your caching-nameserver, open /etc/resolv.conf file and add the following line


Comment all other lines in the file, so that finally the file looks like

; generated by /sbin/dhclient-script

Now your system will use your own nameserver (in caching mode) for resolving all domain names. To test if your nameserver use the following command

[root@localhost ~]# dig [ENTER]

Now if you use that command for the second time, the resolution time will be around 2-3 milli seconds while first time it would be around 400-700 milli seconds.


Below is two subsequent runs of dig for . Notice the Query time.

[root@bordeaux SPECS]# dig
; <<>> DiG 9.4.2rc1 <<>>
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7839
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;                  IN      A
;; ANSWER SECTION:           83629   IN      A
;; AUTHORITY SECTION:           79709   IN      NS
;; ADDITIONAL SECTION:        79709   IN      A
;; Query time: 531 msec
;; WHEN: Wed Nov 19 18:04:47 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]# dig
; <<>> DiG 9.4.2rc1 <<>>
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64233
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;                  IN      A
;; ANSWER SECTION:           83625   IN      A
;; AUTHORITY SECTION:           79705   IN      NS
;; ADDITIONAL SECTION:        79705   IN      A
;; Query time: 1 msec
;; WHEN: Wed Nov 19 18:04:51 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]#

Hack: A Fast Network Scanning Program

I was searching for a simple tool which can do a port scanning in a huge network quickly without making me wait for ages. I first thought of using nmap, but it was a bit too complex and it takes a lot of time to discover the machines even after optimizing the parameters. After searching a lot, I wrote to one of my seniors, Sandeep Kumar, asking the details of his program which maintains a list of active FTP servers in the network. He replied with a reference to his own findings about the network scanning tools. He is using an enhanced version of a program originally written by Troy Robinson. I tried the program out of curiosity and found out that its damn fast as compared to nmap (no literal comparison) :) The program can be downloaded from here.

How to use

Compile the program using gcc as

[root@localhost ~]# gcc NetworkScanner.c [ENTER]

Now create a file IPRange.txt containing the IP address ranges for your network. The contents of the file may be

172.16.*.* Meaning all the IP address with first two parts as 172.16 and rest of the address will be generated by permutations.

172.16.1-16.* Meaning the first two parts are fixed. Third part will vary from 1 to 16. And the fourth part will be permuted from 0 to 255.

So an IPRange.txt may look like


Now run the program as

[root@localhost ~]# ./a.out port_to_be_scanned Parallel_attempts IP_list_file output.txt [ENTER]

Parallel_attempts is the number of processes that’ll be forked for scanning the network port. It is safe to have its value as 255. A very high value may hog the network or may even slow down your machine. So an example run would be

[root@localhost ~]# ./a.out 21 255 IPRange.txt Output.txt [ENTER]


I carried out a lot of test on my network using the following setup and parameters

Machine : AMD X2 5600+ (2.6GHz Dual Core), 4GB 800MHz DDR2 RAM, Gigabit Ethernet Card (on 100mbps network).

Port : 21 (FTP)

IPRange.txt : Total 16896 IP Addresses

Machines on wired (100mbps) network
Machines on wireless (54mbps) network

Network Scanner Benchmarks

Parallel Attempts

Scanning Time (seconds)

Upload Bandwidth (kbps)

255 180 13
512 90 25
1024 47 55
2048 25 100
4096 14 205
6144 11 307
8192 9 374

The interval between two scans was almost 30-40 seconds. I think parallelism beyond 8192 will crash my machine, so I didn’t try. You can try it at your own risk :) I hope this program help you scan your network.


How To: Install Fedora without CD or DVD

Note: If you are new to Fedora/Linux, I highly recommend the book “Fedora Linux Toolbox

[amazon-product alink=”0000FF” bordercolor=”000000″ height=”240″]0470082917[/amazon-product]

Use Case

  1. When you don’t have CD / DVD drive on your system.
  2. You have Fedora DVD but your system has only a CD Drive.
  3. You don’t want to waste time and resources in burning iso on optical media.


  1. You have a Fedora DVD iso or rescue cd iso.
  2. You have a Linux installation on your system.
  3. You have a partition (FAT32, ext2, ext3) which you will not format while installing the new OS.

How to proceed

Let us assume you want to install Fedora 9 on your system and you have a Linux distro already installed on your system. You have downloaded the Fedora DVD iso (Fedora-9-DVD-i686.iso). And you have a FAT32/ext2/ext3 partition /stuff/ which you will not format during installation.

Step 1 : Move the Fedora DVD iso to /stuff/ directory.

[root@saini saini]# mv Fedora-9-DVD-i686.iso /stuff/ [Enter]

Step 2 : Mount Fedora DVD iso on /mnt/

[root@saini saini]# mount /stuff/Fedora-9-DVD-i686.iso /mnt/ -ro loop [Enter] (do as root)

Step 3 : Copy the initrd.img and vmlinuz to /boot/ partition

[root@saini saini]# cd /mnt/isolinux/ [Enter]
[root@saini isolinux]# cp initrd.img vmlinuz /boot/ [Enter] (do as root)

Step 4 : Create grub entry for booting into Fedora 9

Add these lines at the end of your /boot/grub/grub.conf file.

title Fedora 9 (New installation)
    kernel /vmlinuz
    initrd /initrd.img

Step 5 : Note the device having Fedora DVD iso

[root@saini saini]# df -h [Enter]
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              15G  9.5G  4.1G  70% /
/dev/sda8             135G  116G   13G  91% /stuff
/dev/sda5             4.8G  1.2G  3.4G  26% /home
/dev/sda1              99M   12M   82M  13% /boot

In this case /dev/sda8 contains Fedora DVD iso. Note this down as you need it later.

Step 6 : Reboot

Reboot your system and boot into the Fedora 9 (New installation) grub entry.

Step 7 : Install from hard disk

While in installation wizard, select “Hard drive” as installation method and choose /dev/sda8 as it contains the Fedora DVD iso. And rest is damn easy.


How To: Write Custom Redirector or Rewritor Plugin For Squid in Python


To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when already existing redirector plugins for Squid doesn’t suit your needs or you want everything of your own.

Use Cases

  1. When you want to redirect URLs using a database like mysql or postgresql.
  2. When you want to redirect based on mappings stored in simple text files.
  3. When you want to build a redirector which can learn by itself using AI techniques 😛

How to proceed

From Squid FAQ,

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision.

The format of the line read from the standard input by the program is as follows.

URL ip-address/fqdn ident method
# for example - GET -

The implementation sounds very simple and it is indeed very simple to implement. The only thing that should be taken care of is the unbuffered I/O. You should immediately flush the output to standard output once decision is taken.

For this howto, we assume we have a method called ‘modify_url()‘ which returns either a blank line or a modified URL to which the client should be redirected.

#!/usr/bin/env python
import sys
def modify_url(line):
    list = line.split(' ')
    # first element of the list is the URL
    old_url = list[0]
    new_url = '\n'
    # take the decision and modify the url if needed
    # do remember that the new_url should contain a '\n' at the end.
    if old_url.endswith('.avi'):
        new_url = '' + new_url
    return new_url
while True:
    # the format of the line read from stdin is
    # URL ip-address/fqdn ident method
    # for example
    # - GET -
    line = sys.stdin.readline().strip()
    # new_url is a simple URL only
    # for example
    new_url = modify_url(line)

Save the above file somewhere. We save this example file in /etc/squid/ Now, we have the function for redirecting clients. We need to configure squid to use . Below is the squid configuration for telling squid to use the above program as redirector.

# Add these lines to /etc/squid/squid.conf file.
# /usr/bin/python should be replaced by the path to python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/
# Number of instances of the above program that should run concurrently.
# 5 is good enough but you should go for 10 at least. Anything below 5 would not work properly.
redirect_children 5

Now, start/reload/restart squid. That’s all we need to write and use a custom redirector plugin for squid.


How To: Write Custom Basic Authentication Plugin for Squid in Python


To write a Python program which can be used to authenticate for Squid proxy server. This is useful when you don’t want to configure complex systems like LDAP, ntlm etc.

Use Cases

  1. When you want to authenticate clients using mysql database.
  2. When you want to authenticate clients using flat files or /etc/passwd file or some custom service on your network.

How to proceed

From auth_param section in squid.conf file:

Specify the command for the external authenticator. Such a program reads a line containing "username password" and replies "OK" or "ERR" in an endless loop. "ERR" responses may optionally be followed by a error description available as %m in the returned error page.

By default, the basic authentication scheme is not used unless a program is specified.

That clearly states that our python program should read a line from standard input (stdin) and write the appropriate response to the standard output (stdout). But there are some issues with I/O. The output should be unbuffered and should be flushed to standard output immediately after the response is known.

So, lets see a small program where we authenticate using a function ‘matchpassword()‘. This function returns True when username, password pair matches and returns False when they mismatch.

import sys
import socket
"""USAGE:The function returns True if the user and passwd match False otherwise"""
def matchpasswd(login,passwd):
    # Write your own function definition. 
    # Use mysql, files, /etc/passwd or some service or whatever you want
while True:
    # read a line from stdin
    line = sys.stdin.readline()
    # remove '\n' from line
    line = line.strip()
    # extract username and password from line
    username = line[:line.find(' ')]
    password = line[line.find(' ')+1:]
    if matchpasswd(username, password):
    # Flush the output to stdout.

Save the above file somewhere. We save this example file in /etc/squid/ .Now, we have the function for authenticating clients. We need to configure squid to use . Below is the squid configuration for telling squid to use the above program as basic authenticator.

# you need to specify /usr/bin/python if your file is not executable and needs an interpreter to be invoked.
# Replace /usr/bin/python with /usr/bin/php , if you write auth program in php.
auth_param basic program /usr/bin/python /etc/squid/
# how many instances of the above program should run concurrently
auth_param basic children 5
# display some message to clients when they are asked for username, password
auth_param basic realm Please enter your proxy server username and password
# for how much time the authentication should be valid
auth_param basic credentialsttl 2 hours
# whether username, password should be case sensitive or not
auth_param basic casesensitive on

Now, to force clients to authenticate, configure the acls as follow. Below we assume, you want to force all clients on your lan to authenticate for using proxy server.

# acl to force proxy authentication
acl authenticated proxy_auth REQUIRED
# acl to define IPs from your lan
acl lan src
# acl to force clients on your lan to authenticate
http_access allow lan authenticated

Now, reload/restart squid. That’s all we need to write and use a custom authentication plugin for squid.


Username can’t contain spaces. Otherwise program will not be able to parse/extract username, password from standard input.


How To: Install FFMPEG and FFMPEG-PHP

I was randomly browsing the internet and reading about making a website look better and I encountered ffmpeg-php. ffmpeg is a very powerful tool to record, convert and stream audio and video. Its a very rich tool almost supporting every format out there in the world. It can convert any format to any other format provided the codec. ffmpeg-php is an extension for PHP that provides a rich library to access info about audio and video files. The good thing about ffmpeg-php is that it can retrieve all info about any audio/video file subjected to the condition that the particular audio/video format is supported by your ffmpeg installation. So, now you have a clear idea that you can do wonders with audio/videos while showing them on your site :)

I tried some of the functionalities and they worked out of the box. Here’s is complete how to on installing ffmpeg and ffmpeg-php.


I tried installing ffmpeg from rpms provided by several Fedora repositories but after installation ffmpeg doesn’t seem to work. After several tries, I installed ffmpeg from source rpms and it worked. Below, I will describe how to install ffmpeg from source rpm.

Step 1:

Make sure that you have ‘rpmbuild’ installed by issuing

[root@bordeaux saini]# rpm -q rpmbuild [Enter]

command. If the above says that rpmbuild is not installed, then install it using yum as given below

[root@bordeaux saini]# yum install rpmbuild [Enter] (do as root)

Step 2:

Download the latest src rpm of ffmpeg from Issue the command given below

[root@bordeaux saini]# rpm -hiv ffmpeg-x.x.x.xx-xxx.src.rpm [Enter] (do as root)

Step 3:

Go to ‘/usr/src/redhat/SPECS/’ directory and issue the command given below

[root@bordeaux saini]# cd /usr/src/redhat/SPECS/ [Enter]
[root@bordeaux SPECS]# rpmbuild -ba ffmpeg.spec [Enter] (do as root)

If it gives an error like package ‘xyz’ is need by ffmpeg. Then install the package ‘xyz’ using yum as

[root@bordeaux SPECS]# yum install xyz [Enter] (do as root)

After installing the dependencies, issue the rpmbuild command ‘rpmbuild -ba ffmpeg.spec’. Now ffmpeg rpms will be build and they will be stored in ‘/usr/src/redhat/RPMS/i386/’.

Step 4:

Go the ‘/usr/src/redhat/RPMS/i386/’ (x86_64 instead of i386 if your OS is 64 bit). Install all the rpms that were built by rpmbuild.

[root@bordeaux saini]# rpm -hiv *.rpm [Enter] (do as root)

Thats it. ffmpeg is now successfully installed on your computer. Half the job is done. Now lets proceed with ffmpeg-php installation.


We will install ffmpeg-php from source bundle.

Step 1:

Make sure that ‘php-devel’ installed on your machine by issuing

[root@bordeaux saini]# rpm -q php-devel [Enter]

command. If the above command says the ‘php-devel’ is not installed, then install it using the following command.

[root@bordeaux saini]# yum install php-devel [Enter] (do as root)

Step 2:

Download the latest version of ffmpeg-php from here. Unpack the file you have downloaded.

[root@bordeaux saini]# bunzip2 -d ffmpeg-php-0.5.1.tbz2 [Enter]
[root@bordeaux saini]# tar -xvf ffmpeg-php-0.5.1.tar [Enter]

Step 3:

Issue the following command in sequence if everything goes fine.

[root@bordeaux saini]# cd ffmpeg-php-0.5.1 [Enter]
[root@bordeaux ffmpeg-php-0.5.1]# phpize [Enter]
[root@bordeaux ffmpeg-php-0.5.1]# ./configure [Enter]
[root@bordeaux ffmpeg-php-0.5.1]# make [Enter]
[root@bordeaux ffmpeg-php-0.5.1]# make install [Enter] (do as root)

Step 4:

Open ‘/etc/php.ini’ and add a line ‘’ in the category ‘Dynamic Extensions’. For help see the image below.

Step 5:

Restart apache web server aka ‘httpd’ service by issuing the command.

[root@bordeaux saini]# service httpd restart [Enter] (do as root)

Step 6:

Write a test php file and test your ffmpeg-php installation.


Save the above code in ‘info.php’ and save the file in ‘/var/www/html/’ and browse http://localhost/info.php . If you see something like this.
Then the ffmpeg-php is successfully installed on your machine. Now you can jump into the world of video manipulation via your website.