A lot has changed since the last blog post (more than three years). I was happily running a successful business around Videocache till Google decided to push HTTPS really hard and enforced SSL even for video content. That rendered Videocache completely useless as YouTube video caching was the unique selling point. Though people are still using it for other websites (whatever supported and not HTTPS yet), I personally didn’t find it good enough for selling. To add to the trouble, Mozilla and friends announced that there will be free certs for everyone. Now, that took away whatever motivation was left to keep working on Videocache. I decided to open source Videocache and the source is now available on GitHub. If you have better ideas or you are looking forward to make things work by forging certs etc, fork it and give it a shot.
In 2015, if you are web developer, you must know how APIs work and you should be able to consume them. So, to learn to expose APIs and version them properly, I fired a small project Pixomatix. Being a Rails developer, you really get obsessed with it and try to implement everything using Rails. Even when you want an API with 2-3 endpoints, you tend to make the horrible mistake of doing it in Rails. This kept bugging me and a few weeks later, I decided to freshen up my Sinatra memories. But working with Sinatra is not all that easy especially if you are used to all the niceties of Rails. Dug up my attempt of implementing Videocache in Ruby, and extracted few tasks and configurations I had automated long time ago. Ended up working a lot more on it and packages into a template app with almost all the essential stuff. Though I need to document it a little more, the app has got everything needed to expose a versioned API via Sinatra.
On the other hand, I tried to use devise gem to authentication for Pixomatix. It was all good for integration with standard web apps and APIs but it sort of failed me when I tried to make the API versioned. Devise turned out to be black-hole when I tried to dig deeper to make things work. I tried a few other gems which supported token authentication but they were also no good for versioning. Generally, you may not need to version the authentication part of your API, but what if you do! Since, this was just a learning exercise, I was hell bent on implementing this. So, I just reinvented the wheel and coded basic authentication (including token authentication) for the API.
That’s it for this post. I am looking forward to post regularly on the new stuff I learn.
I have not blogged since a long time mainly because I was a bit busy authoring a book Squid Proxy Server 3.1: Beginner’s Guide for Packt Publications. The book is an introductory guide to Squid (especially the new features in Squid-3 series) covering both the basic aspects as well as the in dept details for advanced users. The book focuses on learning by doing and provides example scenarios for the concepts discussed throughout the book. Access control configuration, reverse proxying, interception proxying, authentication and other features have been discussed in details with examples.
Checkout the links below:
There is a vulnerability in the way Google authentication service works. Whenever you login to any of the Google’s online services like GMail, Orkut, Groups, Docs, Youtube, Calendar etc., you are redirected to an authentication server which authenticates against the entered username and password and redirect back to the required service (GMail, Youtube etc.) setting the session variables.
Now, if you are able to grab the url used to set the session variables, you can login as the user to whom that url belongs from any machine on the Internet (need not be the machine belonging to the same subnet) without entering the username and password of the user.
The proxy servers in the organizations can be used to exploit this vulnerability. Squid is the most popular proxy server used. In the default configuration, squid strips the query terms of a url before logging. So, this vulnerability can’t be exploited. But if you turn off the stripping mechanism by adding the line shown below, then squid will log the complete url.
So, after turning stripping mechanism off, the log will contain urls which will look like this
Replace .co.in with your tld specific to your country. If you paste this url in any browser, it’ll directly log you in and you can do whatever you want to that account. Remember that all such urls remains valid only for two minutes. So, if you use that url after two minutes, it’ll lead nowhere.
At the time of writing this post Orkut, Google Docs, Google Calendar, Google Books and Youtube are vulnerable.
So, make sure your squid has stripping mechanism turned on and your squid server is properly firewalled.
You can watch the Video proof for Orkut on Blip.tv, Youtube.
After spending a lot of time with youtube cache, now I am trying to devote some time to update intelligentmirror with required features and enhancements that youtube cache already enjoys. In the same direction here is version 0.5 of intelligentmirror.
- Added max_parallel_downloads options to controll the maximum threading fetching from upstream to cache the packages.
- Fine grained control on logging via max_logfile_size and max_logfile_backups option.
- Added setup script to help you install intelligentmirror. No need to execute commands one by one for installation. Just run
[root@localhost]# python setup.py install [ENTER]
- Added update script (update-im). So in case you decide to change the locations for caching rpm/deb packages, just run
[root@localhost]# update-im [ENTER]
[root@localhost]# /usr/sbin/update-im [ENTER]
- Download scheduler similar to youtube cache is added to facilitate the download queing in case of large number of requests.
- More informative logging.
- cache.log is not flooding anymore with XMLRPC logs and python tracebacks.
- Added extensive exception handling thoughout the program.
- RPMs for Fedora/Red Hat/Cent OS
- Source RPMs for Fedora/Red Hat/Cent OS
- Source Tar balls
Installation and Configuration
INSTALL and README files should help you throughout the installation and configuration process.
In case you have questions, ask them here in comments. Suggestions for improvement are welcome 🙂
Warning : This version of IntelligentMirror is compatible with only squid-2.7 as of now. It is NOT compatible even with squid-3.0.
IntelligentMirror Version 1.0.1
I have been following squid development regularly (at least the part in which I am interested) and they have introduced a new directive in squid-2.7 known as StoreUrlRewrite (storeurl_rewrite_program). Using this directive you can instruct squid to cache url A (http://abc.com/foo/bar/version/crap.rpm) as url B (http://proxy.fedora.co.in/intelligentmirror/crap.rpm). In simple words you can direct squid to cache any url as any other url without any extra efforts.
So keeping the above directive in mind, I have worked out a different version of intelligentmirror especially for squid-2.7.
IntelligentMirror : Old method of operation
- IntelligentMirror gets a client request for a URL.
- Check: if URL is not in (RPM, metadata file)
- Then its none of our business.
- Let proxy handle it the normal way.
- Done and exit.
- Check: if RPM/metadata is available in cache
- Stream the RPM/metadata from cache.
- Done and exit.
- Check: if RPM/metadata is not available in cache
- Download in parallel for caching in some dir and stream.
- Done and exit.
IntelligentMirror : New method of operation
- IntelligentMirror gets a client request for a URL.
- Check: if request for rpm
- Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm
- Check: if request for deb
- Direct squid to cache the request as http://<same_host_all_the_time>/intelligentmirror/<debname>.deb
- Done and exit.
So your squid will see every request for an rpm package as a request http://<same_host_all_the_time>/intelligentmirror/<rpmname>.rpm. So, if you happen to request the same rpm from a different mirror, it’ll still be served from cache 🙂
- No need to check if the url supplied by squid is for rpm or not because storeurl_rewrite_program has an acl controller attached which will invoke intelligentmirror for urls ending in .rpm .
- No need to check if the url is already cached or not. No need to worry about the directory where you are going to store the packages. No human intervention is needed in maintaining the cache. Almighty squid is doing everything for us.
- No need to worry if the target package has changed because of the resigning or whatever because squid will do that for you.
- No need to actually download the package in parallel for caching because squid is already doing that.
- No need to worry about the hashing algorithms and storage optimizations for the cached content.
- RPM for Fedora/Red Hat
- Source RPM for Fedora/Red Hat
- Source Tarball
Install and Configure
The install and configure files should be enough to guide you through the installation if you choose the tar ball way. Otherwise you can always install from rpm from the above link.
Note1: You have to configure your squid to use intelligentmirror as a plugin even if you install via rpm. Check the configure file at the above link.
Note2: StoreUrlRewrite will probably be available in squid-3.1.
IntelligentMirror version 0.4 is available now. There have been significant improvements in intelligent mirror since last release.
- Fixed defunct process problem. You will not see defunct python processes hanging around anymore. Previously every forked daemon used to got defucnt because parent never waited for the forked child to finish.
- IntelligentMirror now supports caching of Debian packages just like rpms. So now IntelligentMirror is best suited shared environments where people have different tastes.
- Intelligent Mirror now uses url_rewrite_program instead of redirect_program. This boosts the efficiency of IntelligentMirror by a significant factor as url_rewrite_program has an acl controller url_rewrite_access. And using url_rewrite_access only requests for rpm/deb packages will be passed to Intelligent Mirror. So, IM now need not process each and every incoming request. Also, it has redirector_bypass directive which will bypass IM in case all the instances of IM are busy serving requests. So, squid will not die with a fatal error in case of huge requests.
- Options to enable/disable caching for rpm and Debian packages have been added.
- Options to control the total size of caching directories and the size of individual package to be cached have also been introduced.
- Proxy authentication is also supported now just the way it is supported in yum.
- Packages are not checked for last-modified time anymore. Because in principle two rpms A and B can only have same name iff they have the same contents. So, the delay in response time in case of hits has reduced.
- RPMs for Fedora/Red Hat
- Source RPMs for Fedora/Red Hat
- Source Tar balls
Installation and configuration is easy and the INSTALL and README files should serve the purpose.
In case you have any suggestions or problems, leave a comment here or file a ticket on project page.
Note : A newer version of intelligentmirror is available now. Please check this.
Intelligent Mirror is basically a tool or squid plugin (redirector) to cache rpm packages so that the subsequent requests for the same package can be served from the local cache which will eventually save a lot of bandwidth and downloading time.
Who needs Intelligent Mirror?
- If you are on a shared network where a lot of people use linux distros with RPM as their package manager, then you need this. Universities should come under this category.
- If you have a set of systems having red hat derivatives and almost identical OS versions, you need this. LAN setups at home should come under this category.
- If you can’t afford to or don’t want to mirror entire fedora repo for local access due to bandwidth limitations, you need this.
What it does?
As described above, Intelligent Mirror, just caches rpms which are requested by the clients in a shared network. And subsequent requests for those rpms are served from the cache. For a detailed description, check the project page.
Why not use Squid in caching mode?
Squid caching is based on url hashing. Let me explain with an example how Intelligent Mirror is actually intelligent as compared to squid while caching rpms.
Let us say there is an rpm yum-3.2.0-1.fc7.i386.rpm . You executed “yum update yum“. And let us say the newer version of yum is yum-3.2.18-1.fc9.i386.rpm which was fetched from one of the fedora mirrors http://abc.com/ (say). Now someone on the same network launched “yum update yum” and he got the same rpm yum-3.2.18-1.fc9.i386.rpm. But this time rpm was fetched from another mirror http://xyz.com/ (say).
Case I : Squid caching
Squid will cache http://abc.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm . And when http://xyz.com/linux/fc9/updates/i386/yum-3.2.18-1.fc9.i386.rpm will be requested, it’ll result in a cache miss and squid will again download the same package and will cache this one as well. Now there are two problems
- Squid is not able to serve from the cache, though the package was the same.
- Additional storage space is being wasted in caching the same package. And this can really harm if unluckily a different mirror is picked in all the subsequent queries.
Case II : IntelligentMirror caching
Intelligent Mirror will cache the package yum-3.2.18-1.fc9.i386.rpm without bothering about its origin. And even if yum picks up a different mirror for the subsequent request, the package will be served from the cache and will not be fetched from upstream. So, the obvious advantage of saving the bandwidth and downloading time.
Intelligent Mirror source tarball, rpm, source rpm are available for download from here.
Installing and Configuring Intelligent Mirror
Issues and Suggestions
If you see any issue or you have any suggestions for improving the functionality, either mail me at kulbirsaini25 AT GMAIL DoT COM or file a ticket on the project page.
IntelligentMirror can be used to create a mirror of static HTTP content on your local network. When you download something (say a software package) from Internet, it is stored/cached on a local machine on your network and subsequent downloads of that particular software package are supplied from the storage/cache of the local machine. This facilitate the efficient usage of bandwidth and also reduces the average download time. IntelligentMirror can also do pre-fetching of RPM packages from fedora repositories spread all over the world and can also pre-populate the local repo with popular packages like mplayer, vlc, gstreamer which are normally accessed immediately after a fresh install.
Definition for a lay man
Think of Internet as a hard disk, your proxy server as a cache and your Intranet as a CPU. Now, whenever your CPU needs to process something, it needs data from cache. If data is not there in cache, it’ll be fetched from RAM and/or hard disk. IntelligentMirror sits on your proxy server and keep caching packages in a browsable manner which can be served via http for subsequent requests.
For further details about IntelligentMirror, go here.
After getting the hosting space on fedorahosted.org, I pushed the code I have written. You can check the source tree here.
We are buidling IntelligentMirror as a plugin to squid which taps requests from clients and checks them against a cache. Checkout how to write a custom redirector or how to tap requests to squid. And acts accordingly. We are working on live streaming the partially downloaded package to the end user while caching it.
If you have any suggestion, feel free to leave them as a comment here or edit the wiki page 🙂
To configure squid for simple proxying without caching anything.
- When you want to have control on what people browse on your lan.
- When number of machine is more than the number of IP addresses you can afford to buy.
- When you want to help this holy world in saving some IPV4 addresses 😛
- You have a machine connected directly to internet that you are going to use as a proxy server for other machines on your network.
- The machines on your network are using 192.168.0.0/16 as private address space. You can use anyone/multiple address spaces of the available but for this howto we assume 192.168.0.0/16 as the local network.
- The local IP address of the machine which will run squid proxy server is 192.168.36.204. You can have any IP, but for this howto we assume this.
How to proceed
First of all ensure that you have squid installed. After installing squid, you need to set access control in squid configuration file which resides in /etc/squid by default. Open /etc/squid/squid.conf and add/edit following lines according to your preferences. Few lines already exist in the configuration file, you can add the rest.
# The port on which squid will listen for requests
# If 'cgi-bin' or '?' is in query, squid should not check with neighbours'/parents' cache
# and should go to target web-server.
hierarchy_stoplist cgi-bin ?
# If url contains 'cgi-bin' or '?', then it must not be cached
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
# Absolute path to squid access log.
access_log /var/log/squid/access.log squid
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320
# Access control list to control every IP address
acl all src 0.0.0.0/0.0.0.0
# Access control list for source machine in LAN
acl lan_src src 192.168.0.0/16
# Access control list for destination machine in LAN
acl lan_dst dst 192.168.0.0/16
# Access control list to manage squid cache
acl manager proto cache_object
# Access control list to define IP address allowed for source localhost
acl localhost src 127.0.0.1/255.255.255.255
# Access control list to define IP addresses allowed for localhost as destination
acl to_localhost dst 127.0.0.0/8
# Access control list to define Safe ports that should be allowed by default
acl SSL_ports port 443 563 1863 5190 5222 5050 6667
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
# Allow cache management only from localhost
http_access allow manager localhost
# Deny cache management from remote hosts
http_access deny manager
# Deny http access via all the ports which are not listed as safe
http_access deny !Safe_ports
# Deny all connections via all ports which are not listed as safe
http_access deny CONNECT !SSL_ports
# Allow http access from localhost
http_access allow localhost
# Allow http access from machines on LAN
http_access allow lan_src
http_access deny all
http_reply_access allow all
icp_access allow all
# Deny caching for everyone so that there is not caching at all
cache deny all
# Never allow direct connection to machines on the internet
never_direct allow all
# Allow direct connetion if the destination machine is on LAN
always_direct allow lan_dst
# Delete this line if you don't have /etc/hosts file
# Allow AIM connections
# Delete the following 9 lines if you don't want people to connect to AIM
acl AIM_ports port 5190 9898 6667
acl AIM_domains dstdomain .oscar.aol.com .blue.aol.com .freenode.net
acl AIM_domains dstdomain .messaging.aol.com .aim.com
acl AIM_hosts dstdomain login.oscar.aol.com login.glogin.messaging.aol.com toc.oscar.aol.com irc.freenode.net
acl AIM_nets dst 184.108.40.206/255.255.0.0
acl AIM_methods method CONNECT
http_access allow AIM_methods AIM_ports AIM_nets
http_access allow AIM_methods AIM_ports AIM_hosts
http_access allow AIM_methods AIM_ports AIM_domains
# Allow connections to Yahoo Messenger
# Delete the following 6 lines if you don't want people to connect to Yahoo Messenger
acl YIM_ports port 5050
acl YIM_domains dstdomain .yahoo.com .yahoo.co.jp
acl YIM_hosts dstdomain scs.msg.yahoo.com cs.yahoo.co.jp
acl YIM_methods method CONNECT
http_access allow YIM_methods YIM_ports YIM_hosts
http_access allow YIM_methods YIM_ports YIM_domains
# Allow connections to Google Talk
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl GTALK_ports port 5222 5050
acl GTALK_domains dstdomain .google.com
acl GTALK_hosts dstdomain talk.google.com
acl GTALK_methods method CONNECT
http_access allow GTALK_methods GTALK_ports GTALK_hosts
http_access allow GTALK_methods GTALK_ports GTALK_domains
# Allow connections to MSN
# Delete the following 6 lines if you don't want people to connect to Google Talk
acl MSN_ports port 1863 443 1503
acl MSN_domains dstdomain .microsoft.com .hotmail.com .live.com .msft.net .msn.com .passport.com
acl MSN_hosts dstdomain messenger.hotmail.com
acl MSN_nets dst 220.127.116.11/255.255.255.0
acl MSN_methods method CONNECT
http_access allow MSN_methods MSN_ports MSN_hosts
Now, start the squid proxy server as
Also, if you want squid to be started every time you boot the machine, execute the following command
chkconfig --level 345 squid on
You have a squid proxy server running now. You can ask clients to configure there browsers to use 192.168.36.204 as a proxy server with 8080 as proxy port. Command line utilities like elinks, lynx, yum, wget etc. can be asked to use proxy by exporting http_proxy variable as below. Users can also add these lines to ~/.bashrc file to avoid exporting every-time.
I highly recommend the book “Squid Proxy Server 3.1: Beginner’s Guide (Paperback)” for further reading.
You can get our complete MB5-294 exam pass resources including our latest MB6-288 and MB7-227 training courses. Our 70-272 and 70-162 are also playing vital role in IT world.
To write a custom Python program which can act as a plugin for Squid to redirect a given URL to another URL. This is useful when already existing redirector plugins for Squid doesn’t suit your needs or you want everything of your own.
- When you want to redirect URLs using a database like mysql or postgresql.
- When you want to redirect based on mappings stored in simple text files.
- When you want to build a redirector which can learn by itself using AI techniques 😛
How to proceed
From Squid FAQ,
The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program can not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision.
The format of the line read from the standard input by the program is as follows.
URL ip-address/fqdn ident method
# for example
http://saini.co.in 172.17.8.175/saini.co.in - GET -
The implementation sounds very simple and it is indeed very simple to implement. The only thing that should be taken care of is the unbuffered I/O. You should immediately flush the output to standard output once decision is taken.
For this howto, we assume we have a method called ‘modify_url()‘ which returns either a blank line or a modified URL to which the client should be redirected.
list = line.split(' ')
# first element of the list is the URL
old_url = list
new_url = '\n'
# take the decision and modify the url if needed
# do remember that the new_url should contain a '\n' at the end.
new_url = 'http://fedora.co.in/errors/accessDenied.html' + new_url
# the format of the line read from stdin is
# URL ip-address/fqdn ident method
# for example
# http://saini.co.in 172.17.8.175/saini.co.in - GET -
line = sys.stdin.readline().strip()
# new_url is a simple URL only
# for example
new_url = modify_url(line)
Save the above file somewhere. We save this example file in /etc/squid/custom_redirect.py. Now, we have the function for redirecting clients. We need to configure squid to use custom_redirect.py . Below is the squid configuration for telling squid to use the above program as redirector.
# Add these lines to /etc/squid/squid.conf file.
# /usr/bin/python should be replaced by the path to python executable if you installed it somewhere else.
redirect_program /usr/bin/python /etc/squid/custom_redirect.py
# Number of instances of the above program that should run concurrently.
# 5 is good enough but you should go for 10 at least. Anything below 5 would not work properly.
Now, start/reload/restart squid. That’s all we need to write and use a custom redirector plugin for squid.