Created a browser agent API with CodeIgniter

December 7th, 2009

I’ve created my first ever API. I often develop applications around APIs from other providers, such as the Twitter API or one of the many APIs provided by Google, but this is my first time creating an API that others can use.

Built using CodeIgniter, the API has a simple purpose: to take in a browser agent string and return whether it thinks the agent is a bot or a regular web browser, such as Internet Explorer, being used by a person browsing the web.

Using CodeIgniter made creating the API pretty easy. CodeIgniter creates friendly URLs and PHP can encode JSON output out of the box. The output is cached for quick performance and usage of the API is logged. It took only a couch, a laptop and an afternoon to get it all going.

It’s one of several APIs I’m planning, and I’ll be using it for some of my own software products in the future.  I’m hoping to make each one accessible to other developers too, as one or more of them might be useful in other people’s applications. Maybe someone will make some cool mashups with them.

As for the information that feeds the API, this comes from a simple PHP script that runs on this blog and collects the names of all browser agents that visit the site. One of the other participants on the Genesis Programme I’m on, Garry Bennett, who owns and operates www.mytown.ie, has allowed me to put the script on his site too.

Mytown gets a lot of traffic, way more than this blog. His site is the main site collecting the browser agents, with over 330,000 browsers seen so far.  In the few days that the script has been on his site, more than 7,000 unique browser agents have been recorded, compared to the 400 or 500 that were recorded from my own site in a similar length of time.

I have a page which shows the number of browser agents seen and the number of distinct agents recorded. If anyone has a lot of traffic to their site and would like to help collect browser agent information, please let me know.  The script is a line or two of code for the footer of a page and doesn’t slow down the loading time of a page or collect any other information, just the browser agent visiting the site.

Having a list of browser agents on its own doesn’t do much though. I needed a way to be able to see each agent one at a time and label it as a web robot (an automated programme such as the Google Robot which visits sites to check for new content) or a regular browser agent a person would use to browse the web.

I put up a basic page called ‘Bot or Not’. This page shows a random browser agent, one at a time, and asks the user if the agent they see is a bot or not. Sometimes it is easy enough to spot a web robot, but not always. A techie person looking at the string would be able to tell easily enough.

Each time a person votes on whether the agent is a bot or not, the vote is recorded. The system doesn’t assume the person answering is absolutely correct; it will ask users to vote on that browser agent again in time and record all votes. The system labels the agent according to whichever option has the most votes. When using the Bot or Not API, the result you get back contains the browser agent you are testing, its decision on whether the agent is a bot or not, the ‘bot’ vote count and the ‘not’ vote count.
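The majority rule described above can be sketched in a few lines of PHP. This is not the code running behind the API, only an illustration: the function names decide_label() and build_result() are hypothetical, and the handling of a tie (falling back to ‘not’) is an assumption, since the post only says the label follows whichever side has the most votes.

```php
<?php
// Hypothetical helper, not the API's real code.
// Labels an agent by simple majority of recorded votes.
function decide_label(int $botVotes, int $notVotes): string
{
    // Assumption: a tie (including 0-0) falls back to 'not'.
    return $botVotes > $notVotes ? 'bot' : 'not';
}

// Hypothetical helper: builds a result in the same shape as the API output.
// The vote counts are cast to strings because the sample output quotes them.
function build_result(string $agentHash, int $botVotes, int $notVotes): array
{
    return [
        'agent'     => $agentHash,
        'bot_votes' => (string) $botVotes,
        'not_votes' => (string) $notVotes,
        'decision'  => decide_label($botVotes, $notVotes),
    ];
}

echo json_encode(build_result('8feef41ca25f9763304ac81247b22cfd', 0, 1)), PHP_EOL;
// prints {"agent":"8feef41ca25f9763304ac81247b22cfd","bot_votes":"0","not_votes":"1","decision":"not"}
```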

Here’s a sample output from the API:

{"agent":"8feef41ca25f9763304ac81247b22cfd","bot_votes":"0","not_votes":"1","decision":"not"}

The browser agent is hashed to make it shorter and easier to pass to the API. Browser agent strings can often be very long and can contain various symbols that could confuse the system. In the API output above you can see the bot vote count is 0 and the not vote count is 1, so the overall decision is that this is not a web robot.
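For anyone reading that output from PHP, here is a minimal sketch of decoding it. The helper name parse_agent_response() and the ‘is_bot’ key are my own illustrative choices, not part of the API. It uses the sample response from the post, written with plain double quotes, and converts the quoted vote counts to integers.

```php
<?php
// Sample response from the post, written with plain double quotes.
$json = '{"agent":"8feef41ca25f9763304ac81247b22cfd","bot_votes":"0","not_votes":"1","decision":"not"}';

// Hypothetical helper: decodes a response and normalises the types.
// Returns null if the JSON is malformed or a field is missing.
function parse_agent_response(string $json): ?array
{
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['agent'], $data['bot_votes'], $data['not_votes'], $data['decision'])) {
        return null;
    }
    return [
        'agent'     => $data['agent'],
        'bot_votes' => (int) $data['bot_votes'], // counts arrive as strings
        'not_votes' => (int) $data['not_votes'],
        'is_bot'    => $data['decision'] === 'bot',
    ];
}

var_dump(parse_agent_response($json));
```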

Developers could have many uses for this API. They could use it to test incoming traffic to their site and block or redirect bots, in case bots are slowing the system down with too many page requests, or perhaps because a bot is copying content from the site.
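As a rough sketch of that use case, the snippet below decides whether to block a request based on a decoded API result. Everything here is an assumption for illustration: the should_block() helper, the minimum-vote threshold, the choice to let traffic through when no result is available, and the made-up example result (it is not real data from the API). It needs PHP 7.1 or later.

```php
<?php
// Hypothetical policy helper, not part of the API.
// $result is a decoded API response, or null if the lookup failed.
function should_block(?array $result, int $minVotes = 1): bool
{
    if ($result === null) {
        return false; // Assumption: fail open when the API gives no answer.
    }
    $bot = (int) ($result['bot_votes'] ?? 0);
    $not = (int) ($result['not_votes'] ?? 0);
    // Only act on a 'bot' decision that is backed by enough votes.
    return ($result['decision'] ?? '') === 'bot' && ($bot + $not) >= $minVotes;
}

// Made-up example result in the same shape as the API output.
$example = [
    'agent'     => md5('ExampleBot/1.0'),
    'bot_votes' => '3',
    'not_votes' => '0',
    'decision'  => 'bot',
];

if (should_block($example)) {
    // On a real page this is where you might send a 403 or redirect.
    echo "blocked", PHP_EOL;
} else {
    echo "allowed", PHP_EOL;
}
```

Raising the threshold (for example should_block($result, 5)) would avoid acting on agents that have only been rated once.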

I’ve been spending some time rating each browser agent myself using the Bot or Not page. Of the 7,000 or so unique browser agents there, nearly 700 of them are obvious bots, such as the MSN, Google and Yahoo bots. If you have a minute, rate a few of them if you can.

Ironically, something I forgot about when creating the Bot or Not page was that bots such as Googlebot will be visiting that page too and clicking on the ‘bot’ and ‘not’ links.  I’ll be my own first customer to use the API to examine if the votes were made by bots or people.

Using the API

If you want to use the API, please do. To access it, use the following URL:

http://api.murrion.com/agent/[MD5 of Agent to Test]

Example:

Testing the browser agent:

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)

Use the PHP md5 function:

md5("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)");

Call the API with the md5 output:

http://api.murrion.com/agent/1b08a1420f959565a86c4554cc16f81f

JSON output

{"agent":"1b08a1420f959565a86c4554cc16f81f","bot_votes":"0","not_votes":"1","decision":"not"}

That browser is not a bot.
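The steps above can be put together in a small PHP client. This is a minimal sketch: the base URL is the one given in this post, but the helper names agent_api_url() and lookup_agent() are hypothetical, and the fetch assumes allow_url_fopen is enabled and that the API is reachable. The script only prints the URL; call lookup_agent() yourself to make the actual request.

```php
<?php
// Base URL as given in the post.
const AGENT_API_BASE = 'http://api.murrion.com/agent/';

// Hypothetical helper: builds the lookup URL from a raw browser agent string.
function agent_api_url(string $userAgent): string
{
    return AGENT_API_BASE . md5($userAgent);
}

// Hypothetical helper: fetches and decodes the JSON response.
// Returns null if the request fails or the body is not valid JSON.
function lookup_agent(string $userAgent): ?array
{
    // Assumes allow_url_fopen is on; @ hides the warning on network errors.
    $body = @file_get_contents(agent_api_url($userAgent));
    if ($body === false) {
        return null;
    }
    $data = json_decode($body, true);
    return is_array($data) ? $data : null;
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)';
echo agent_api_url($ua), PHP_EOL;
```

On a live page you would normally pass $_SERVER['HTTP_USER_AGENT'] instead of a hard-coded string.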