I Love Twitter, I Love Python = I Love Twython

I was really late to join Twitter. As soon as I joined Twitter, I started exploring various aspects of Twitter and hence the Twitter API. While searching for a wrapper for Twitter API in python, I came across python-twitter which was good and tweepy which was ok and I finally found twython. Twython is awesome and more importantly upto date with Twitter API.

While I was using twython, I found one fact very irritating. It didn’t have docstrings. And I had to go to Twitter API Wiki for description of every single function πŸ™ I talked to Ryan (Twython developer, who works all day long at his day job with webs.com and develops twython when everybody sleeps πŸ˜› ) about the same and offered to write docstrings for twython.

I spent a few hours and sent a patch with docstrings for all the functions. Now twython has complete documentation. Also Ryan is trying his best to implement wrappers for the newly introduced functions in Twitter API. So, currently Twython is THE best wrapper available in python for Twitter API.

PS : Next post will be “How to use twython”.

 

TIP: Simple Regular Expression for Email Validation

I was playing around with Ruby on Rails and needed an email field for some application. I thought of validating the email address before actually saving it. Googled for same and got tons of results, but none of them was perfect. Few were somewhat exhaustive but horribly difficult to understand and those which were simple wereΒ  not exhaustive enough for practical use. I thought I’ll just write the regex myself and came up with the following. I have tested it for few weird cases like an email hosted on a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.a.exmaple.com (a subdomain deep down the hierarchy) etc.

^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+([A-Za-z0-9]{2,4}|museum)$

I hope it’ll be of help. And don’t forget to pen down your suggestions via comments.

 

How To: Boot Fedora Faster

Note: These tricks apply to any Linux based OS. But I have tested them only on Fedora, so can’t say whether they’ll work on other Linux(s).

My current Fedora installation is now almost one and a half years old. Yes. I am still using Fedora 7 πŸ˜€ I have Fedora 10 on my other machine. Coming to the agenda, my Fedora installation has grown beyond control and I have services from named, squid, drbl, privoxy, vsftpd, vbox*, smb and what not on a personal desktop. These services really force my system startup to slow down to more than two minutes. While shutting down, its very easy to just cut the power supply but while booting up I can’t help and it frustrates me. And what frustrates me further that I have 4GB DDR2 RAM and AMD64 X2 5600+ (2.8GHz x 2) and booting time is still more than two minutes.

Agenda

  • Boot Fedora faster using whatever techniques possible.

Remove the services from normal order and delay their execution to a later stage. So, services like network, squid, privoxy, named, vsftpd, smb etc. doesn’t make sense unless I am not logged in and using them. Let us start them after we have login screen.

Turn off all the services by using the command

[root@bordeaux ~]# chkconfig service_name off

where service_name is the service you want to turn off.

Now create a file /etc/startup.sh. Enter a line like this

[root@bordeaux ~]# service service_name start

for every service that you have turned off in the Step 1.1 and you want it to be running after your machine starts up. Now, your startup.sh file should look like this

service network start &
service sshd start &
modprobe it87 &
modprobe k8temp &
/usr/bin/iptraf -s eth0 -B &
/usr/bin/iptraf -s lo -B &
service squid start &
service privoxy start &
service httpd start &
service mysqld start &
service named start &
service smb start &
service vboxdrv start &
service vboxnet start &
service vsftpd start &

Add the following line to /etc/rc.local file

/bin/bash /etc/startup.sh &

Done!!! Notice the &s in both files. They are for execution in background so that a process can block boot process. You’ll observe a drop of 10-20 seconds in system startup time.

Problem with Hack #1 : The execution is not really parallel. It executes like a process in the background. So we can’t get the real advantage of parallel execution.

Hack #2 solves this problem. Now we don’t put processes in background. We use daemon forking to fork a separate daemon process which will start all the services for us in parallel. Here we’ll get the real advantage and startup time will decrease further.

This step is totally similar to Step 1.1. So skipping it.

This step is also similar to Step 1.2. The /etc/startup.sh file should look like this.

service network start
service xinetd start
service crond start
service anacron start
service atd start
service sshd start
service rpcbind start
service rpcgssd start
service rpcimapd start
modprobe it87
modprobe k8temp
/usr/bin/iptraf -s eth0 -B
/usr/bin/iptraf -s lo -B
service nasd start
service squid start
service privoxy start
service httpd start
service iptables start
service lm_sensors start
service mysqld start
service named start
service nfs start
service nfslock start
service smb start
service vboxdrv start
service vboxnet start
service vsftpd start
service autofs start
service smartd start

Notice the absence of &s in the file.

Download the attached startup.py file attached at the end of this post or copy paste the following code to /etc/startup.py file.

#!/usr/bin/env python
# (C) Copyright 2008 Kulbir Saini
# License : GPL
import os
import sys
def fork_daemon(f):
    """This function forks a daemon."""
    # Perform double fork
    r = ''
    if os.fork(): # Parent
        # Wait for the child so that it doesn't defunct
        os.wait()
        # Return a function
        return  lambda *x, **kw: r
    # Otherwise, we are the child
    # Perform second fork
    os.setsid()
    os.umask(077)
    os.chdir('/')
    if os.fork():
        os._exit(0)
    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from generator.
        Executes the function bound to the generator and then
        exits the process"""
        f(*args, **kwargs)
        os._exit(0)
    return wrapper
 
def start_services(startup_file):
    command = '/bin/bash ' + startup_file + ' > /dev/null 2> /dev/null '
    os.system(command)
    return
 
if __name__ == '__main__':
    forkd = fork_daemon(start_services)
    forkd(sys.argv[1])
    print 'Executing ', sys.argv[1], '[  OK  ]'

Add the following line to your /etc/rc.local file.

/usr/bin/python /etc/startup.py /etc/startup.sh

Thats it. Done!!! Now you’ll experience a boost of about 25-30 seconds of decrease in boot time.

Stats of my machine

With all services started in normal order : 2minutes.
With Hack #1 : 1minute 42 seconds.
With Hack #2 : 1minute.

Warning : These hacks may break your system and can make it unusable. Use at your own risk.

 

Hack: A Fast Network Scanning Program

I was searching for a simple tool which can do a port scanning in a huge network quickly without making me wait for ages. I first thought of using nmap, but it was a bit too complex and it takes a lot of time to discover the machines even after optimizing the parameters. After searching a lot, I wrote to one of my seniors, Sandeep Kumar, asking the details of his program which maintains a list of active FTP servers in the network. He replied with a reference to his own findings about the network scanning tools. He is using an enhanced version of a program originally written by Troy Robinson. I tried the program out of curiosity and found out that its damn fast as compared to nmap (no literal comparison) πŸ™‚ The program can be downloaded from here.

How to use

Compile the program using gcc as

[root@localhost ~]# gcc NetworkScanner.c [ENTER]

Now create a file IPRange.txt containing the IP address ranges for your network. The contents of the file may be

172.16.*.* Meaning all the IP address with first two parts as 172.16 and rest of the address will be generated by permutations.

172.16.1-16.* Meaning the first two parts are fixed. Third part will vary from 1 to 16. And the fourth part will be permuted from 0 to 255.

So an IPRange.txt may look like

1
2
172.16.1-16.*
192.168.36.*

Now run the program as

[root@localhost ~]# ./a.out port_to_be_scanned Parallel_attempts IP_list_file output.txt [ENTER]

Parallel_attempts is the number of processes that’ll be forked for scanning the network port. It is safe to have its value as 255. A very high value may hog the network or may even slow down your machine. So an example run would be

[root@localhost ~]# ./a.out 21 255 IPRange.txt Output.txt [ENTER]

Benchmarks

I carried out a lot of test on my network using the following setup and parameters

Machine : AMD X2 5600+ (2.6GHz Dual Core), 4GB 800MHz DDR2 RAM, Gigabit Ethernet Card (on 100mbps network).

Port : 21 (FTP)

IPRange.txt : Total 16896 IP Addresses

1
2
3
4
5
Machines on wired (100mbps) network
172.16.1-48.* 
192.168.36.*
Machines on wireless (54mbps) network
172.17.0-16.*

Network Scanner Benchmarks

Parallel Attempts

Scanning Time (seconds)

Upload Bandwidth (kbps)

255 180 13
512 90 25
1024 47 55
2048 25 100
4096 14 205
6144 11 307
8192 9 374

The interval between two scans was almost 30-40 seconds. I think parallelism beyond 8192 will crash my machine, so I didn’t try. You can try it at your own risk πŸ™‚ I hope this program help you scan your network.

 

How To: Write Custom Redirector or Rewritor Plugin For Squid in Python

Mission

To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when already existing redirector plugins for Squid doesn’t suit your needs or you want everything of your own.

Use Cases

  1. When you want to redirect URLs using a database like mysql or postgresql.
  2. When you want to redirect based on mappings stored in simple text files.
  3. When you want to build a redirector which can learn by itself using AI techniques πŸ˜›

How to proceed

From Squid FAQ,

The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision.

The format of the line read from the standard input by the program is as follows.

1
2
3
URL ip-address/fqdn ident method
# for example
http://saini.co.in 172.17.8.175/saini.co.in - GET -

The implementation sounds very simple and it is indeed very simple to implement. The only thing that should be taken care of is the unbuffered I/O. You should immediately flush the output to standard output once decision is taken.

For this howto, we assume we have a method called ‘modify_url()‘ which returns either a blank line or a modified URL to which the client should be redirected.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#!/usr/bin/env python
 
import sys
def modify_url(line):
    list = line.split(' ')
    # first element of the list is the URL
    old_url = list[0]
    new_url = '\n'
    # take the decision and modify the url if needed
    # do remember that the new_url should contain a '\n' at the end.
    if old_url.endswith('.avi'):
        new_url = 'http://fedora.co.in/errors/accessDenied.html' + new_url
    return new_url
 
while True:
    # the format of the line read from stdin is
    # URL ip-address/fqdn ident method
    # for example
    # http://saini.co.in 172.17.8.175/saini.co.in - GET -
    line = sys.stdin.readline().strip()
    # new_url is a simple URL only
    # for example
    # http://fedora.co.in
    new_url = modify_url(line)
    sys.stdout.write(new_url)
    sys.stdout.flush()

Save the above file somewhere. We save this example file in /etc/squid/custom_redirect.py. Now, we have the function for redirecting clients. We need to configure squid to use custom_redirect.py . Below is the squid configuration for telling squid to use the above program as redirector.

1
2
3
4
5
6
# Add these lines to /etc/squid/squid.conf file.
# /usr/bin/python should be replaced by the path to python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/custom_redirect.py
# Number of instances of the above program that should run concurrently.
# 5 is good enough but you should go for 10 at least. Anything below 5 would not work properly.
redirect_children 5

Now, start/reload/restart squid. That’s all we need to write and use a custom redirector plugin for squid.