MSN Crawlers Pawned

After seeing the way MSN crawled my last post, I just realized why Microsoft could never do good in Search Engine Market 🙂

I wonder why MSN would crawler same page from two different machines. I wonder if a single page can be divided further for crawling. Checkout the screenshots below 🙂

Why Google Wins

A few minutes later. Three Microsoft machines were crawling the same page 😛

MSN Crawler Pawned

I have conformed using ARIN that all these IPs belong to Microsoft 😀

 

How To: Boot Fedora Faster

Note: These tricks apply to any Linux based OS. But I have tested them only on Fedora, so can’t say whether they’ll work on other Linux(s).

My current Fedora installation is now almost one and a half years old. Yes. I am still using Fedora 7 😀 I have Fedora 10 on my other machine. Coming to the agenda, my Fedora installation has grown beyond control and I have services from named, squid, drbl, privoxy, vsftpd, vbox*, smb and what not on a personal desktop. These services really force my system startup to slow down to more than two minutes. While shutting down, its very easy to just cut the power supply but while booting up I can’t help and it frustrates me. And what frustrates me further that I have 4GB DDR2 RAM and AMD64 X2 5600+ (2.8GHz x 2) and booting time is still more than two minutes.

Agenda

  • Boot Fedora faster using whatever techniques possible.

Remove the services from normal order and delay their execution to a later stage. So, services like network, squid, privoxy, named, vsftpd, smb etc. doesn’t make sense unless I am not logged in and using them. Let us start them after we have login screen.

Turn off all the services by using the command

[root@bordeaux ~]# chkconfig service_name off

where service_name is the service you want to turn off.

Now create a file /etc/startup.sh. Enter a line like this

[root@bordeaux ~]# service service_name start

for every service that you have turned off in the Step 1.1 and you want it to be running after your machine starts up. Now, your startup.sh file should look like this

service network start &
service sshd start &
modprobe it87 &
modprobe k8temp &
/usr/bin/iptraf -s eth0 -B &
/usr/bin/iptraf -s lo -B &
service squid start &
service privoxy start &
service httpd start &
service mysqld start &
service named start &
service smb start &
service vboxdrv start &
service vboxnet start &
service vsftpd start &

Add the following line to /etc/rc.local file

/bin/bash /etc/startup.sh &

Done!!! Notice the &s in both files. They are for execution in background so that a process can block boot process. You’ll observe a drop of 10-20 seconds in system startup time.

Problem with Hack #1 : The execution is not really parallel. It executes like a process in the background. So we can’t get the real advantage of parallel execution.

Hack #2 solves this problem. Now we don’t put processes in background. We use daemon forking to fork a separate daemon process which will start all the services for us in parallel. Here we’ll get the real advantage and startup time will decrease further.

This step is totally similar to Step 1.1. So skipping it.

This step is also similar to Step 1.2. The /etc/startup.sh file should look like this.

service network start
service xinetd start
service crond start
service anacron start
service atd start
service sshd start
service rpcbind start
service rpcgssd start
service rpcimapd start
modprobe it87
modprobe k8temp
/usr/bin/iptraf -s eth0 -B
/usr/bin/iptraf -s lo -B
service nasd start
service squid start
service privoxy start
service httpd start
service iptables start
service lm_sensors start
service mysqld start
service named start
service nfs start
service nfslock start
service smb start
service vboxdrv start
service vboxnet start
service vsftpd start
service autofs start
service smartd start

Notice the absence of &s in the file.

Download the attached startup.py file attached at the end of this post or copy paste the following code to /etc/startup.py file.

#!/usr/bin/env python
# (C) Copyright 2008 Kulbir Saini
# License : GPL
import os
import sys
def fork_daemon(f):
    """This function forks a daemon."""
    # Perform double fork
    r = ''
    if os.fork(): # Parent
        # Wait for the child so that it doesn't defunct
        os.wait()
        # Return a function
        return  lambda *x, **kw: r
    # Otherwise, we are the child
    # Perform second fork
    os.setsid()
    os.umask(077)
    os.chdir('/')
    if os.fork():
        os._exit(0)
    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from generator.
        Executes the function bound to the generator and then
        exits the process"""
        f(*args, **kwargs)
        os._exit(0)
    return wrapper
 
def start_services(startup_file):
    command = '/bin/bash ' + startup_file + ' > /dev/null 2> /dev/null '
    os.system(command)
    return
 
if __name__ == '__main__':
    forkd = fork_daemon(start_services)
    forkd(sys.argv[1])
    print 'Executing ', sys.argv[1], '[  OK  ]'

Add the following line to your /etc/rc.local file.

/usr/bin/python /etc/startup.py /etc/startup.sh

Thats it. Done!!! Now you’ll experience a boost of about 25-30 seconds of decrease in boot time.

Stats of my machine

With all services started in normal order : 2minutes.
With Hack #1 : 1minute 42 seconds.
With Hack #2 : 1minute.

Warning : These hacks may break your system and can make it unusable. Use at your own risk.

 

Crack: Google Authentication Services are Vulnerable

There is a vulnerability in the way Google authentication service works. Whenever you login to any of the Google’s online services like GMail, Orkut, Groups, Docs, Youtube, Calendar etc., you are redirected to an authentication server which authenticates against the entered username and password and redirect back to the required service (GMail, Youtube etc.) setting the session variables.

Now, if you are able to grab the url used to set the session variables, you can login as the user to whom that url belongs from any machine on the Internet (need not be the machine belonging to the same subnet) without entering the username and password of the user.

The proxy servers in the organizations can be used to exploit this vulnerability. Squid is the most popular proxy server used. In the default configuration, squid strips the query terms of a url before logging. So, this vulnerability can’t be exploited. But if you turn off the stripping mechanism by adding the line shown below, then squid will log the complete url.

strip_query_terms off

So, after turning stripping mechanism off, the log will contain urls which will look like this

http://www.google.co.in/accounts/SetSID?ssdc=1&sidt=Q5UrfB0BAAA%3D.oHVGErODzffQ%2Bms%2FOKfk53g5naReDKehRNHOBsmJlBu3VTNXjF03SbgX%2FVEEhmImhR4mlu5IAAjM%2BdbuXvMMSIb0oU8IGCYpnLcSNkbCIrG%2BQnm81YmX5%2Brcrq7U6Qx65%2F1yaQ2NzgmKD94jg0Iw13iXDen3qD5qn6L%2FhmmYWwTrcOeuTzGbO%2BAehpjEU3mrWapRafaq3b4kxyigJ68s8QrGQqZTINNE%2Bs%2BoIkZWmGt5kNzoT8fkVAsWJeu3CKFkxj4oVMngeDvpwb1nyFpsJCltOzmAr46fTxVJSpvQdx0%3D.BMLtjUdIDCcuszktZSvYzA%3D%3D&continue=http%3A%2F%2Fwww.orkut.com%2FRedirLogin.aspx%3Fmsg%3D0%26ts%3D1226148773097%3A1226148773386%3A1226148774868%26auth%3DDQAAAIcAAAC1pPE1QT4chKgrU4B3oyKZrQRkEVPtYlclpESQoXV_d9x9gdoe75Z0hfJ_22Pn5tVMR7j-uV5YCps3NB48L0bFlDeX-4PGHVT6Loztp_ru3tAy_gxDa9_YAEbz4d9CO4wD2VTKtzax9zvpGgrnJVZQfoWPkkIomUmxDtVGoH7g3fA3UjS0vdBJ2PJtgFMElso

Replace .co.in with your tld specific to your country. If you paste this url in any browser, it’ll directly log you in and you can do whatever you want to that account. Remember that all such urls remains valid only for two minutes. So, if you use that url after two minutes, it’ll lead nowhere.

At the time of writing this post Orkut, Google Docs, Google Calendar, Google Books and Youtube are vulnerable.

So, make sure your squid has stripping mechanism turned on and your squid server is properly firewalled.

You can watch the Video proof for Orkut on Blip.tv, Youtube.

 

How To: Configure Caching Nameserver (named)

Mission

To configure a caching nameserver on a local machine which will cascade to another previously configured and functional nameserver (may or may not be caching. It’ll generally be your ISP nameserver or the one provided by your organization).

Advantage

  • Reduces the delay in domain name resolution drastically as the requests for frequently accessed websites are served from cache.

Working

  • named gets a request for domain resolution.
  • It checks whether the request can be satisfied from cache. If the answer is in cache and not stale, the request is satisfied from cache itself saving a lot of time 🙂
  • If request can’t be satisfied from cache, named queries the first parent. If it replies with the answer, then named will cache the response and subsequent requests for the same domain name will be satisfied from the cache.
  • In case first parent fails to reply, named will query the second parent and so on.

(The working is my understanding of caching-nameserver using wireshark as traffic analysis tool and caching-nameserver may not behave exactly as explained above.)

How to install

named is by default on most of the systems by the package name ‘caching-nameserver‘. If its not present on your system, install using

[root@localhost ~]# yum install caching-nameserver [ENTER]
# If that doesn't work try this
[root@localhost ~]# yum install bind [ENTER]

How to configure

The main configuration file for named resides in /var/named/chroot/etc/named.caching-nameserver.conf which is also soft linked from /etc/named.caching-nameserver.conf . named configuration file supports C/C++ style comments.

For a caching nameserver which will cascade to another nameserver, there is nothing much to be configured. You need to configure “options” block. Below is a configuration file for a machine with IP address 172.17.8.64 cascading to two nameserver 192.168.36.204 and 192.168.36.210. The comments inline explain what each option does.

options {
  // Set the port to 53 which is standard port for DNS.
  // Add the IP address on which named will listen separated by semi-colons.
  // It'll be your own IP address.
  listen-on port 53 {127.0.0.1; 172.17.8.64;};
  // These are default. Leave them as it is.
  directory   "/var/named";
  dump-file   "/var/named/data/cache_dump.db";
  statistics-file "/var/named/data/named_stats.txt";
  memstatistics-file "/var/named/data/named_mem_stats.txt";
  // The machines which are allowed to query this nameserver.
  // Normally you'll allow only your machine. But you can allow other machines also.
  // The address should be separated by semi-colons. To allow a network 172.16.31.0/24,
  // the line would be
  // allow-query {localhost; 172.16.31.0/24; };
  // Don't forget the semi-colons.
  allow-query     { localhost; 172.17.8.64; };
  recursion yes;
  // The parent nameservers. List all the nameserver which you can query.
  forwarders { 192.168.36.204; 192.168.36.210; };
  forward first;
};
logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};
zone "." IN {
  type hint;
  file "named.ca";
};
include "/etc/named.rfc1912.zones";

Start caching-nameserver

Now start the caching-nameserver using the following command

[root@localhost ~]# server named start [ENTER]

OR

[root@localhost ~]# /etc/init.d/named start [ENTER]

To make named start every time your reboot your machine use following command

[root@localhost ~]# chkconfig named on [ENTER]

Using caching-nameserver

To use your caching-nameserver, open /etc/resolv.conf file and add the following line

nameserver 127.0.0.1

Comment all other lines in the file, so that finally the file looks like

; generated by /sbin/dhclient-script
#search wlan.iiit.ac.in
#nameserver 192.168.36.204
#nameserver 192.168.36.210
nameserver 127.0.0.1

Now your system will use your own nameserver (in caching mode) for resolving all domain names. To test if your nameserver use the following command

[root@localhost ~]# dig fedora.co.in [ENTER]

Now if you use that command for the second time, the resolution time will be around 2-3 milli seconds while first time it would be around 400-700 milli seconds.

Example

Below is two subsequent runs of dig for fedora.co.in . Notice the Query time.

[root@bordeaux SPECS]# dig fedora.co.in
; <<>> DiG 9.4.2rc1 <<>> fedora.co.in
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7839
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUESTION SECTION:
;fedora.co.in.                  IN      A
;; ANSWER SECTION:
fedora.co.in.           83629   IN      A       72.249.126.241
;; AUTHORITY SECTION:
fedora.co.in.           79709   IN      NS      ns.fedora.co.in.
;; ADDITIONAL SECTION:
ns.fedora.co.in.        79709   IN      A       72.249.126.241
;; Query time: 531 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 19 18:04:47 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]# dig fedora.co.in
; <<>> DiG 9.4.2rc1 <<>> fedora.co.in
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64233
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1
;; QUESTION SECTION:
;fedora.co.in.                  IN      A
;; ANSWER SECTION:
fedora.co.in.           83625   IN      A       72.249.126.241
;; AUTHORITY SECTION:
fedora.co.in.           79705   IN      NS      ns.fedora.co.in.
;; ADDITIONAL SECTION:
ns.fedora.co.in.        79705   IN      A       72.249.126.241
;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Nov 19 18:04:51 2008
;; MSG SIZE  rcvd: 79
[root@bordeaux SPECS]#