The other day I was a bit bored so I wanted to try something new. I wanted to build a proxy server which required me to authenticate, but I was not satisfied with the basic HTTP authentication options Squid gave me. I wanted a nice looking webpage, with a form. I also wanted something which would allow “visitor self service”, like an “I forgot my password” page, or a page where a user could pay and sign up.
This is the page I get when typing www.google.com in the address bar of my browser. Only after authenticating do I get access to the internet. It isn’t just web filtering: every application that requires internet access is denied until the authentication process has been completed.
What I was looking for is called a captive portal. According to Wikipedia:
The captive portal technique forces an HTTP client on a network to see a special web page (usually for authentication purposes) before using the Internet normally. A captive portal turns a Web browser into an authentication device. This is done by intercepting all packets, regardless of address or port, until the user opens a browser and tries to access the Internet. At that time the browser is redirected to a web page which may require authentication and/or payment, or simply display an acceptable use policy and require the user to agree. Captive portals are used at most Wi-Fi hotspots, and it can be used to control wired access (e.g. apartment houses, hotel rooms, business centers, “open” Ethernet jacks) as well.
I got inspired to do this after reading about Kapcheng, a project with exactly the same goal as mine. I asked for his sources and I got them, but after looking through them I saw it was not exactly what I was looking for. So, I decided to “roll my own”.
This is exactly what I was looking for, so I built it. In my implementation, I’m using Squid as a proxy server, with all clients behind a router. Because Squid only sees the router’s address, all clients within the same network (machines 1 and 2) are authenticated together, not per client.
This is a downside of my implementation, caused by the router I mentioned earlier, but the techniques I used are still usable within a single network. In that situation, clients connect directly from their machines to the machine running Squid (the gateway), and Squid catches the traffic and either redirects it to the authentication page or allows it.
Technical implementation
I have a virtual private server (VPS) hosted at Strato, which isn’t expensive but gives me all the benefits of having a dedicated server. It’s much cheaper because it uses virtual machines. The VPS I have is running on CentOS 5.3, which is a full-blown Linux distribution optimized for servers.
On top of the CentOS-machine, I run Apache, PHP and MySQL – it’s a pretty typical LAMP setup. I install and update components using yum, because this doesn’t involve much installation work for me.
Squid is a free, open source web proxy server. It caches often-requested pages and images, so delivery to the end user is somewhat faster. It also has a nice feature for redirecting traffic, called a redirector, and this is what I am using for the authentication. Squid has built-in authentication mechanisms, but these show a pop-up box, which does not fit my requirements, so I can’t use them.
Because each and every request passes through the redirector, I needed a quick way of looking up whether the requestor still has the required permissions for that request. If he has, the request should be allowed. If there is no permission, the user is redirected to the proxy authentication web page by means of an HTTP 302 redirection header.
Because file system queries are intensive, and using a database for recording active sessions is overkill as well, I needed a different technique. A quick, easy way to store some data and retrieve it without the overhead of querying a file system.
There is a program which does exactly that – it’s called memcached:
memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.
Memcached has a small downside, which makes perfect sense given memcached’s origin: after restarting memcached, all data stored in it is gone! This is no problem however, as active sessions aren’t that important to store: a user can quickly reauthenticate and then continue his work.
The Squid part
In my squid.conf, I added this line, which starts a few processes (called “children”) that in turn take all requests and process them:
url_rewrite_program /etc/squid/scripts/redirect.php
The redirect.php file, mentioned in the line above, takes the client IP and looks it up in memcached. In my implementation, I take the sha1 hash of each IP and store it in memcached, along with the username and session start time. This makes it possible to have a session expire after a specific amount of time, for example, 8 hours.
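The session record described above could look roughly like this. It is a minimal sketch in Python rather than the PHP of redirect.php, and a plain dict stands in for memcached so it runs without a server. The names `store`, `session_key`, `start_session` and `session_valid` are made up for illustration.

```python
import hashlib
import time

SESSION_TTL = 8 * 60 * 60  # 8 hours, the example lifetime from the text

# Stand-in for memcached (assumption): a plain dict keeps the sketch runnable.
store = {}


def session_key(client_ip):
    """Return the SHA-1 hex digest of the client IP, used as the cache key."""
    return hashlib.sha1(client_ip.encode("ascii")).hexdigest()


def start_session(client_ip, username, now=None):
    """Record who logged in from this IP and when the session started."""
    started = time.time() if now is None else now
    store[session_key(client_ip)] = {"user": username, "started": started}


def session_valid(client_ip, now=None):
    """True if a session exists for this IP and is younger than SESSION_TTL."""
    current = time.time() if now is None else now
    entry = store.get(session_key(client_ip))
    return entry is not None and current - entry["started"] < SESSION_TTL
```

With a real memcached client the expiry could probably be left to the cache itself by passing a lifetime when storing the entry, but keeping the start time in the record matches what the text describes.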
After receiving a request, redirect.php (running as a shell script) checks in memcached whether the hash exists and whether the session is still valid. If it is, the request is permitted and the originally requested URL is written back to Squid. If authentication is required, the original URL is base64-encoded and the client is redirected to an authentication page by sending an HTTP 302 header with the base64-encoded URL appended to it.
Caution: you do need to check that the requested URL isn’t part of your authentication server or payment service provider, because it will cause an infinite loop if it is. Because you receive the entire URL, you can perform some matching rules against it… but keep it modest: too many regexps will slow things down, and that’s exactly what you don’t want.
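The decision above can be sketched as follows. This is Python, not the author’s PHP, and it assumes the classic Squid redirector line format (URL, then client IP/FQDN, then ident and method) and the classic reply format, where a `302:` prefix asks Squid to redirect. Newer Squid versions use a different reply syntax. `AUTH_PAGE` is a hypothetical address, the session lookup is passed in as a function, and the URL-safe base64 variant is my choice so the value survives in a query string.

```python
import base64
import sys

# Hypothetical address of the authentication page (assumption).
AUTH_PAGE = "http://portal.example/login?return="


def handle_line(line, is_authenticated):
    """Turn one request line from Squid into one reply line (without newline)."""
    fields = line.split()
    url = fields[0] if fields else ""
    # The second field looks like "192.0.2.10/-": the IP, a slash, the FQDN.
    client_ip = fields[1].split("/", 1)[0] if len(fields) > 1 else ""
    if client_ip and is_authenticated(client_ip):
        return url  # echo the URL back unchanged: request allowed
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return "302:" + AUTH_PAGE + encoded


def main(is_authenticated):
    """Answer requests from Squid on stdin until it closes the pipe."""
    for line in sys.stdin:
        sys.stdout.write(handle_line(line, is_authenticated) + "\n")
        sys.stdout.flush()  # Squid waits for each reply, so do not buffer
```

A helper like this would be started by calling `main` with whatever session check is in use. Flushing after every line matters because Squid keeps the child process alive and reads one answer per request.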
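One cheap way to avoid the loop without regular expressions is a set lookup on the host name. A sketch in Python, with made-up host names standing in for the authentication server and the payment provider:

```python
from urllib.parse import urlsplit

# Hypothetical hosts that must stay reachable without a session (assumption).
ALWAYS_ALLOWED_HOSTS = {"portal.example", "pay.example"}


def bypasses_auth(url):
    """True if the URL points at a host that must never be redirected."""
    host = urlsplit(url).hostname  # already lowercased, None if missing
    return host is not None and host in ALWAYS_ALLOWED_HOSTS
```

Matching the exact host name, rather than searching the URL for a substring, also means a URL such as `http://portal.example.evil.test/` does not slip through.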
The client part
Because the user will authenticate from a web page, you’ll need a web server. Apache is the most commonly used web server, but Lighttpd will do just fine, or IIS, or any other web server. I chose Apache because I already had it running. I set up an additional port for the web server to listen on, but this isn’t required; it can also run on a default port.
I use a MySQL database for storing user information – user name, password, e-mail address, etc. This information will be used upon initial authentication or registration, after which the user gets authenticated or not.
When a user lands on the web page, a form is displayed asking for credentials. After submitting the form, his credentials are verified. If they’re invalid, the user is redirected back to the authentication page and gets asked for his credentials again.
If his credentials are valid, an entry is added to memcached and the user is redirected back to the URL he requested (which had been supplied to the authentication page by using the base64-encoded version).
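On the authentication page, the base64-encoded URL has to be decoded again before redirecting. A hedged Python sketch of that step: `DEFAULT_LANDING` is a hypothetical fallback page, the scheme check is my own precaution rather than something the original setup is known to do, and the URL-safe base64 variant is assumed.

```python
import base64
import binascii
from urllib.parse import urlsplit

# Hypothetical page to show when the return value is unusable (assumption).
DEFAULT_LANDING = "http://portal.example/welcome"


def return_url(encoded):
    """Decode the URL the user originally asked for, or fall back safely."""
    try:
        raw = base64.urlsafe_b64decode(encoded.encode("ascii"))
        url = raw.decode("utf-8")
    except (binascii.Error, UnicodeError):
        return DEFAULT_LANDING  # damaged or tampered value
    if urlsplit(url).scheme not in ("http", "https"):
        return DEFAULT_LANDING  # only send the browser to web addresses
    return url
```

Since the encoded value travels through the user’s browser, it is worth treating as untrusted input, which is why anything that is not a plain http or https address falls back to the landing page.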
Of course, just displaying a form isn’t the only thing possible – you could also add a payment page, or just make it a general Terms and Agreements-page which a user has to agree to before granting him access to the network. The sky is the limit.
In my own implementation, I used the Kohana PHP framework, which is a model-view-controller based framework and is easy to start with. The framework provides all kinds of nifty features such as clean URLs, a database layer (you don’t even have to write your own queries anymore), and much more.
Caution: if you’re using Squid as a web proxy server behind a router (proxy server configured from within a browser), you can’t use HTTPS on your web server, because Squid can’t add the X-Forwarded-For header to HTTPS streams. This means that the authentication page will receive the proxy server’s IP address and not that of the client that wants access!
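To illustrate why the header matters, here is a small Python sketch of how an authentication page might pick the address to store the session under. The function name and parameters are made up, and it assumes Squid appends the address it saw to the end of any existing X-Forwarded-For list.

```python
def client_ip(remote_addr, forwarded_for, proxy_ip):
    """Pick the address to key the session on.

    remote_addr   -- address of the peer that connected to the web server
    forwarded_for -- raw X-Forwarded-For header value ("" if absent)
    proxy_ip      -- address of our own Squid server (assumption: known)
    """
    if remote_addr == proxy_ip and forwarded_for:
        # Only trust the header when the request really came through our
        # proxy, and take the last entry: the one our Squid appended.
        return forwarded_for.split(",")[-1].strip()
    return remote_addr
```

When the header is missing, as with HTTPS through the proxy, the function can only return the proxy’s own address, which is exactly the problem described above.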
When you’re using Squid as a transparent proxy (i.e. no configuration for the connecting clients, and the Squid server is in the same network) you can use HTTPS. This will be the case in most hotels, bars, restaurants, and offices.
Wrapping it up
Of course, this is just a start, and you can do much more than this. You could add a nice “pay first, then get access” form; it’s entirely up to you. I’m not releasing my own code yet, because I want to clean it up a bit first. However, eventually I will put up a nice tarball which should be good to go.
The technologies I used are Linux, Squid, Apache, MySQL, PHP and Memcached. They’re all free, fast, well-supported and do the job exactly as I want them to. The time it took me to create this? A few hours. I hope this can be of some value to somebody. Feedback is always welcome, so feel free to leave a message or send me an e-mail.
This post was originally posted on a different weblog. Because a migration did not succeed, some comments got lost. My apologies for that.