Note: A newer version of youtube cache is available now. Please check this.
Mission
To cache YouTube videos using the Squid proxy server so that subsequent requests for the same video are served from the cache, saving loading time and some bandwidth.
Use Cases
- To reduce the loading time for subsequent requests on a shared network.
- To save the bandwidth that is wasted in serving the same video again and again.
- When the clients on your proxy number in the thousands and they love browsing YouTube all the time (universities are typical examples), the second factor (bandwidth) becomes an issue.
- When you don't want to hack the refresh_pattern in Squid and don't want to disturb the dynamic content.
How to proceed
I tried a lot of the Squid hacks listed around the web, but none of them seemed to work for me. And even when they do work, the odd load-balancing system of the YouTube servers results in weird behavior. I was working on my GSoC project, which is also about caching (but of RPMs), when I got the idea of caching YouTube videos using custom Squid redirectors. Some time back, I posted a tutorial on how to write custom Squid redirectors in Python. I extended that to write a Squid plugin to cache YouTube videos, which I call youtube_cache.
Method of operation for caching
- Squid gets a request for a URL from the client.
- youtube_cache taps the URL.
- CHECK: if the URL is from .youtube.com and is a get_video request
  - Extract the 11-character unique video id (this is different for every video on YouTube)
  - CHECK: if the video is already in the cache
    - CHECK: if the cached video is newer than or the same as the remote video (last-modified timestamp)
      - Serve from cache and exit
    - ELSE: Download and cache the new video in the background (using daemon forking)
  - ELSE: Download and cache the new video in the background (using daemon forking)
- ELSE: The URL is not from any of the YouTube servers. Let Squid handle it.
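The decision flow above can be sketched roughly as follows. This is not the youtube_cache source, only an illustration written for Python 3. The `video_id` query parameter, the `.flv` file extension and the function names are assumptions made for the example, and fetching the remote last-modified timestamp is left out (it is passed in as a number).

```python
import os
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the 11-character video id of a YouTube get_video URL, else None."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if not (host == "youtube.com" or host.endswith(".youtube.com")):
        return None
    if "get_video" not in parts.path:
        return None
    # Assumption: the id is carried in a "video_id" query parameter.
    ids = parse_qs(parts.query).get("video_id", [])
    if ids and len(ids[0]) == 11:
        return ids[0]
    return None


def decide(url, cache_dir, remote_mtime):
    """Return (action, video_id), where action is 'pass', 'serve' or 'download'.

    remote_mtime is the remote video's last-modified time as a Unix timestamp.
    """
    video_id = extract_video_id(url)
    if video_id is None:
        # Not a YouTube video request: let Squid handle it.
        return ("pass", None)
    # Assumption: cached videos are stored as <video_id>.flv in cache_dir.
    path = os.path.join(cache_dir, video_id + ".flv")
    if os.path.exists(path) and os.path.getmtime(path) >= remote_mtime:
        # Cached copy is newer than or the same as the remote one.
        return ("serve", video_id)
    # Missing or stale: the caller would download in the background.
    return ("download", video_id)
```

In a real redirector, `decide` would be called for every URL that Squid hands to the helper, and a 'download' result would trigger the forked background download while the current request goes to YouTube as usual.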
Getting youtube_cache
The module which I am currently using to cache YouTube videos is available here. A README is included that covers all the necessary steps for the Apache and Squid configuration. You need to play around with the caching directory permissions to make it work properly.
I have tested the script with squid-2.6.STABLE16 on Fedora 7. It caches videos properly and serves subsequent requests successfully.
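Since the caching directory permissions are the part most likely to need fiddling, a small check like the one below may help. It is only a sketch, assuming a Unix system. The function name is made up for this example, and it only tells you something useful when run as the same user that Squid runs its helpers as.

```python
import os


def check_cache_dir(path):
    """Return a list of problems that would stop this process writing to path."""
    problems = []
    if not os.path.isdir(path):
        problems.append("not a directory: " + path)
    elif not os.access(path, os.W_OK | os.X_OK):
        # Creating files needs write and search (execute) permission on the directory.
        problems.append("not writable by uid %d: %s" % (os.getuid(), path))
    return problems
```

An empty list means the current user can create files in the directory. Anything else is a hint about what to fix with chown or chmod.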
Todo
There are a few things which I am looking forward to implementing.
- On-the-fly caching and serving of the videos.
- Caching videos from other sites like Google, Metacafe, etc.
- An aging feature which will delete videos from the cache after a certain time.
- Any other thing you can suggest.
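The aging item is not implemented yet. One possible shape for it, purely as a sketch, is a function run periodically that removes files whose modification time is older than a chosen limit. The function name and parameters here are hypothetical.

```python
import os
import time


def purge_old_videos(cache_dir, max_age_seconds, now=None):
    """Delete regular files in cache_dir older than max_age_seconds.

    Age is measured from each file's modification time. Returns the
    names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Using the modification time means a video that is re-downloaded gets a fresh lease, while one that is only served from the cache does not. Whether that is the right policy is an open design choice.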













Comments
Another alternative is to use the store_url rewrite stuff in Squid 2.7-HEAD (written by Adrian Chadd) — here.
I would be more than happy if you put this thing on our proxy server. It will save a lot of time and bandwidth.
Awesome, I liked it.
Really, fast YouTube or video streaming is the future.
Can you do it for other video sites as well, like
video.google.com or metacafe.com?
Nice work.
Cheers!!
Hi, I have checked it but got a crash.
logging.basicConfig(
    format='%(asctime)s %(levelname)s %(message)s',
    level=logging.DEBUG,
    filename=logfile,
    filemode='a'
)
It can't write the log file. Squid crashes frequently with this code.
(Among many other things,) we had tried doing something similar for a week on 204. It was a pointless exercise.
The basic assumption was that on a LAN (such as a college's), a video that has been requested once is likely to be requested again soon by someone else. But the crucial question was: what percentage of viewed videos are actually requested again?
Turns out _very_ few.
On a given day, predicting which of the several videos just seen will be the ones to be requested again is almost impossible. So, the only choice is to have a liberal caching policy (i.e., cache almost every video requested in the last few hours). But of course we didn't do this, because of the cost of storing such large files (a single video ~= 20-30 webpages).
In general, for large files (on the order of 500 KB) it is best not to implement caching. The other option is to implement client-feedback-based caching, i.e., a client explicitly requests the server to cache that file (assuming it's a sensible client).
I agree with all your points. But keeping in mind our poor internet connection, don't you think caching YouTube is actually required? And apart from everything, there is a golden rule: "Storage is way cheaper than bandwidth".
I am not sure about client-feedback-based caching because no one really cares. Everybody just wants the service without active participation.
[...] from above, I have been working on IntelligentMirror, my GSOC project and its sister project Youtube Caching using squid. I have achieved 100% youtube caching without altering the refresh patterns in squid. That means [...]
I have not been able to make this work. Squid doesn't start unless I comment out the new configs in squid.conf. It could be that I haven't configured Apache properly before this.
Anyway, I will wait for Ubuntu 8.10, in which Squid 2.7 replaces Squid 2.6.
Can I view a private video by any chance?