ZeptoBlog

My tech playground

January 14th, 2007

Writing a PHP Framework (Authentication and Authorization)

In my previous post about writing a PHP Framework I mentioned a script called Tower.php. By using Tower as the single point of entry I can handle authentication and authorization all in one place for any page rendered.

Authentication is deciding whether a user has access to the system. This is most often accomplished by requiring the user to enter a username and password. Tower.php checks the user’s credentials and sends them to the login page if they are not authenticated. This works whether the user is trying to go to a bookmarked page or their session has timed out while they still have the application up in their browser.

Authorization, on the other hand, determines what parts of the system the user has access to - whether they have been authenticated or not. In my framework, I have chosen to implement authorization by requiring each page to define the permission required to render it. I then group the permissions into roles and assign a role to each user. If the page doesn’t require any permissions, then it is a public page and anyone can view it. This is, of course, the case with the login page.
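Here is a minimal sketch of that permission check. It is not the actual Tower.php code: the role names, the permission names, and the canRender() helper are all made up for illustration.

```php
<?php
// Hypothetical role-to-permission map; the names are illustrative only.
$rolePermissions = [
    'admin'  => ['view_reports', 'edit_users'],
    'member' => ['view_reports'],
];

/**
 * Decide whether a page may be rendered for a user.
 *
 * $requiredPermission is null for a public page (such as the login page).
 * $userRole is null when nobody is logged in.
 */
function canRender(?string $requiredPermission, ?string $userRole, array $rolePermissions): bool
{
    if ($requiredPermission === null) {
        return true; // public page: anyone can view it
    }
    if ($userRole === null || !isset($rolePermissions[$userRole])) {
        return false; // not logged in, or the role is unknown
    }
    return in_array($requiredPermission, $rolePermissions[$userRole], true);
}

var_dump(canRender(null, null, $rolePermissions));             // public page, anonymous visitor
var_dump(canRender('edit_users', 'member', $rolePermissions)); // member without the permission
var_dump(canRender('edit_users', 'admin', $rolePermissions));  // admin with the permission
```

With a single entry point, a check like this only has to be called once per request, before any page code runs.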

December 17th, 2006

Performance tuning my MySQL database

I doubt that many of you have ever tried to write your own web tracking software. I mostly did it to see if I could and also to have complete control over my tracker. My wife says I’m a bit of a control freak. Anyway, I posted on it earlier this year. I have called it Silentracker.

I wrote it in PHP. Any good software architect out there is probably shaking their head and laughing at me right about now. I know because I’m a pretty good software architect and I still wonder about my decision. It comes down to the fact that I never expect to make any money from the project and I don’t feel like spending money to get a server with more capabilities. So my choices were Perl or PHP. I didn’t feel like learning Perl… so there you have it. I originally wrote it using flat files to store the data. I changed that to MySQL and there we are.

Everything was fine when I started and was only tracking my mom’s web site, which gets a few hits a day. Now my brother’s company has started using it too, and it didn’t take very long before I had 25,000 rows in the hit table of the database. It won’t be long before there are 100,000 or 1,000,000. You get the point. As often happens, this “prototype” was in full production and suffering from the lack of fine tuning that you would expect from a prototype.

This week I took some time to clean up the queries and see if I could speed things up a bit. There are a number of things that I have considered and some of them I have implemented. Things are moving along nicely now and I figured I would throw these out there for those who might be having similar problems (I am going to assume that you are administering your database using phpMyAdmin):

  • Take a look at the queries that you are running most often and the tables that they hit and see if you can add an index or two to help speed things up.
    If you are not familiar with indexes, you can create an index on any table and base it on one or more columns in that table. Then when a query is run against the table the database will use an index where available to speed up the query. This adds overhead to inserts, updates, and deletes on the table, so don’t overdo it.

  • Denormalize the data if necessary.
    I found that I was performing the same expensive operation in most of my queries: a complex text comparison between two columns. I was able to do that same comparison as part of the insert, store the result in a separate column as a boolean value, and save time on every query. One problem with this is that it denormalizes your data to a certain degree, but to me it was so worth it. Then I created an index on that new column to speed things up even more.

  • Use explain to analyze your queries.
    When you run any query in phpMyAdmin you can then click on the explain link. This will analyze the query and tell you how it executes. It tells you what kind of search it does on the table and which, if any, index is being used. You may need to go to the MySQL documentation to see exactly what the results mean, but this can be invaluable when you have nested queries and complex tables with complex joins.
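To make the denormalization tip a little more concrete, here is a rough sketch of computing a flag once, at insert time. The comparison itself is an assumption: the post does not say which two columns were compared, so this example invents one (is the referrer on the same host as the page?). The column names and both helper functions are hypothetical as well.

```php
<?php
// Hypothetical comparison: is the referrer on the same host as the page?
// The post does not say which two columns were actually compared.
function isInternalReferrer(string $pageUrl, string $referrerUrl): bool
{
    $pageHost = parse_url($pageUrl, PHP_URL_HOST);
    $refHost  = parse_url($referrerUrl, PHP_URL_HOST);
    if (!is_string($pageHost) || !is_string($refHost)) {
        return false; // missing or unparsable host
    }
    return strcasecmp($pageHost, $refHost) === 0;
}

// Build the row once, at insert time, so later queries can filter on a cheap,
// indexable boolean column instead of repeating the text comparison.
function buildHitRow(string $pageUrl, string $referrerUrl): array
{
    return [
        'page_url'     => $pageUrl,
        'referrer_url' => $referrerUrl,
        'is_internal'  => isInternalReferrer($pageUrl, $referrerUrl) ? 1 : 0,
    ];
}

print_r(buildHitRow('http://example.com/about', 'http://EXAMPLE.com/'));
```

The trade-off is the one mentioned above: the stored flag duplicates information already in the other two columns, so it has to be recomputed if either of them is ever updated.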

These optimizations were enough to get things back up to a good speed for now on Silentracker. When the data gets larger I will probably start archiving old data from the table. I may move each site into its own table or even create multiple databases to limit the size of the data. I also may divide types of hits into separate tables. If necessary there are other denormalizations that I can do also. A little analysis can go a long way.

One word of caution. Avoid premature optimization. Optimizations can degrade the maintainability of your software. Only optimize when there is a problem to solve.

Performance tuning is not limited to the database. With a language like PHP, you sometimes have to get performance enhancements wherever you can. For instance, I have graphs that only need to be created once each day. Why not cache those on disk and check the file modification time to decide when to recreate them? If there is information that is guaranteed to be constant for the length of the session, store it in the session. Feel free to add other performance tuning tips in the comments below.
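The graph caching idea can be sketched in a few lines. This is not Silentracker’s code: cachedFile() and the $render callback are hypothetical names, and the “graph” here is just a string standing in for image data.

```php
<?php
/**
 * Return the contents of a cached file, regenerating it when it is missing
 * or older than $maxAgeSeconds. $render is a hypothetical callback that
 * produces the file contents (for example, the bytes of a graph image).
 */
function cachedFile(string $path, int $maxAgeSeconds, callable $render): string
{
    clearstatcache(true, $path); // avoid a stale mtime from PHP's stat cache
    $isFresh = is_file($path) && (time() - filemtime($path)) < $maxAgeSeconds;
    if (!$isFresh) {
        file_put_contents($path, $render(), LOCK_EX);
    }
    return file_get_contents($path);
}

// Demo: count how often the expensive render step actually runs.
$path = sys_get_temp_dir() . '/daily-graph-demo.cache';
if (is_file($path)) {
    unlink($path);
}
$calls = 0;
$render = function () use (&$calls) {
    $calls++;
    return "graph rendered on call {$calls}";
};

echo cachedFile($path, 86400, $render), "\n"; // renders and writes the file
echo cachedFile($path, 86400, $render), "\n"; // served from the cached file
```

A maximum age of 86400 seconds matches the “once each day” case; anything that must reset exactly at midnight would need to compare dates instead of ages.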

November 15th, 2006

Writing a PHP Framework

I’m not even sure where to start with this topic. I’m thinking that a good place to start is how the users get to the framework: .htaccess.

.htaccess is somewhat of a new thing to me. I have extensive experience in writing a Java framework using Tomcat and JBoss. The way that I choose to write a framework is to have every page request filter through a single controlling entry point. This can really limit the chance of security holes. In servlet containers this is done using servlet and servlet mapping definitions. You can easily send all requests through a single servlet that takes care of authentication and authorization before a page ever gets displayed. It turns out that the servlet mapping is just a watered down version of Apache’s mod_rewrite and, of course, mod_rewrite rules can live in a .htaccess file. What I discovered is that you can achieve the same effect by using .htaccess and a well written PHP script.

This is what my .htaccess looks like:

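RewriteEngine On

# Serve stylesheets and images from one place, however deep the requested URL is
RewriteRule .*style/(.*) style/$1 [L]
RewriteRule .*images/(.*) images/$1 [L]

# Send everything else to Tower.php, unless the request is for a real file
# or is already for Tower.php itself
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !framework/Tower\.php
RewriteRule . framework/Tower.php [L]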

First of all, I used the .htaccess that comes with WordPress as a starting point - give credit where credit is due. I think there is only one line left from the borrowed file though. If this code looks totally foreign to you then do a Google search for mod_rewrite. I looked at this Apache Doc and this nice reference sheet when getting familiar with mod_rewrite.

The gist of that code is this. Every request that comes to my URL will be sent to a script called Tower.php - tower is meant to be like an airport tower that directs traffic - unless the request points to an actual file on the server. This caveat is there because I am building this framework in the middle of a working site and I don’t want to convert the entire site before using the new features. Also, I added the style/… and images/… rules so that my scripts don’t have to worry about how many ../../../ are needed to find the CSS and image files. Of course if your system is always running at the root - not in a subdirectory - you don’t need this little bit.
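On the PHP side, the entry script then has to work out which page was originally asked for. The following is only a sketch of one way to do that, not the real Tower.php: the page table, the /blog base path, and the resolvePage() function are assumptions made for the example. In a live request the first argument would come from $_SERVER['REQUEST_URI'], which still holds the original URL after the rewrite.

```php
<?php
// Hypothetical page table; the real framework's layout is not shown in the post.
$pages = [
    ''              => 'home',
    'login'         => 'login',
    'reports/daily' => 'daily_report',
];

/**
 * Turn the original request URI into a page name, or null when no page matches.
 * $basePath is the subdirectory the site is installed in ('' at the web root).
 */
function resolvePage(string $requestUri, string $basePath, array $pages): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no usable path component
    }
    // Strip the install directory, but only on a whole path segment.
    if ($basePath !== '' && ($path === $basePath || strpos($path, $basePath . '/') === 0)) {
        $path = (string) substr($path, strlen($basePath));
    }
    $key = trim($path, '/');
    return $pages[$key] ?? null;
}

var_dump(resolvePage('/blog/login?next=%2Freports', '/blog', $pages));
var_dump(resolvePage('/blog/', '/blog', $pages));
var_dump(resolvePage('/blog/no-such-page', '/blog', $pages));
```

Looking pages up in a fixed table, rather than building a file name from the URL, keeps a visitor from steering the script toward arbitrary files.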

Now, I’m not saying that this is the best way to write a .htaccess file to do what I am doing, but it works. If you experts out there have suggestions, please let me know.

October 7th, 2006

Ajax with xajax

Early this year I was talking to a friend of mine and he told me that he was working on an open source project called xajax. At the time I hadn’t really started to play with Ajax yet but the project sounded interesting so I filed it away in the back of my mind for future retrieval. Then in May my company sent me to The Ajax Experience in San Francisco. That got me all excited about what could be done with Ajax. It also scared me because of the difficulties that people experience when using Ajax. So now one of my projects has got me using xajax.

As far as tools go, xajax seems to be a good one so far. It has many of the things that I want (unobtrusive, lightweight, extensible) without being overbearing.

So far I have stuck to a basic - edit table rows in place - implementation and even that has given me headaches. What I find is that most of the headaches are related to the poor error logging in the browsers or the incompatibility of the browsers. I started by using a link to call the JavaScript function like this: <a href="javascript:xajax_myFunction()">click here</a>. That worked fine in IE but was giving me problems in Firefox. I chose to use a button instead of tracking down the problem (this is only a prototype) and got that working in Firefox but could not get it to work in IE. It turns out that you can set innerHTML in a <tr> tag in Firefox, but not in IE. So I moved things around and started setting the innerHTML in the <td> tags instead and then it stopped working in Firefox but started working in IE. Eventually I found that I had to move my <button> outside of the form in order to get it to work in Firefox. This whole time I was working with a new tool - xajax - and wondering if it was the tool or the browser. Well, I decided it was the browser and the tool worked as advertised.

If you are thinking about using Ajax in one of your projects, remember that it takes a lot of patience. If your project is in PHP then take a look at xajax. It doesn’t do everything Ajax for you, but it gives a nice abstraction layer for the usual stuff.

September 28th, 2006

Blogroll Z version .80 Released

Blogroll Z is an RSS/Atom feed aggregator. See documentation Getting Started, Configuration, and Add Blogroll Z to your page. Get downloads here. See implementations here and here.

September 20th, 2006

Building an Aggregator

About a year and a half ago my brother came to me with an idea for an RSS feed aggregator. I had no idea what RSS was, but I love the challenge of learning new technologies so of course I was interested. It turned out that there was a group of bloggers that had formed a loose community but really had no gateway to that community other than one large blog that people tended to congregate toward. My brother was part of a group of 7 blogs that had banded together to funnel more traffic to each other’s blogs. The idea was to create this aggregator for the whole community, which would act as a portal.

There was no money being put up for the project and the server that was available did not have Java Servlet capabilities so it was going to be done in PHP. I had done one minor project using PHP before so I had some understanding of what to expect. As I searched the web I ran across an RSS feed reader written in PHP called MagpieRSS. It is a nice piece of code and open-source so I decided to base my aggregator on it. I like MagpieRSS because it is concise and specific in what it does, but that also meant there was a good deal of work to do.

Basic parameters

First, the aggregator needed to take an arbitrary number of blog feeds, order the posts by date, and be able to display the most recent posts in descending chronological order. Secondly, it needed to be configurable using text (XML) files so someone who doesn’t know PHP can maintain the blogroll.

Lessons Learned

So I started with a list of blog RSS feeds and wrote some PHP to read each of the feeds and sort the posts by date. Not too hard. I quickly realized that each type of feed has a different date field to read. Once I got the date fields normalized, I found that they were all in different time zones and would not sort correctly. I added a time offset to the config file for each feed and voilà. A feed aggregator. Not so bad, huh?
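A stripped-down sketch of that normalize, offset, and sort step might look like the following. It does not use MagpieRSS and is not the aggregator’s actual code: the field names (pubdate, dc_date, published, updated), the offset_hours setting, and both functions are assumptions for illustration.

```php
<?php
date_default_timezone_set('UTC'); // dates without a zone are read as UTC in this demo

/**
 * Return a Unix timestamp for a feed item, or null if no usable date exists.
 * The field names are illustrative; different feed formats (and parsers)
 * put the date under different keys.
 */
function itemTimestamp(array $item, int $offsetHours = 0): ?int
{
    foreach (['pubdate', 'dc_date', 'published', 'updated'] as $field) {
        if (!empty($item[$field])) {
            $ts = strtotime($item[$field]);
            if ($ts !== false) {
                return $ts + $offsetHours * 3600; // per-feed correction from the config
            }
        }
    }
    return null;
}

/** Merge all feeds into one list of posts, newest first. */
function mergeFeeds(array $feeds): array
{
    $posts = [];
    foreach ($feeds as $feed) {
        foreach ($feed['items'] as $item) {
            $ts = itemTimestamp($item, $feed['offset_hours']);
            if ($ts !== null) {
                $posts[] = ['title' => $item['title'], 'timestamp' => $ts];
            }
        }
    }
    usort($posts, function ($a, $b) {
        return $b['timestamp'] <=> $a['timestamp']; // descending
    });
    return $posts;
}

$feeds = [
    ['offset_hours' => 0, 'items' => [
        ['title' => 'A', 'pubdate' => 'Wed, 20 Sep 2006 10:00:00 +0000'],
        ['title' => 'C', 'dc_date' => '2006-09-19T12:00:00Z'],
    ]],
    // This feed publishes local times with no zone, five hours behind UTC.
    ['offset_hours' => 5, 'items' => [
        ['title' => 'B', 'updated' => '2006-09-20 06:00:00'],
    ]],
];

foreach (mergeFeeds($feeds) as $post) {
    echo gmdate('Y-m-d H:i', $post['timestamp']), '  ', $post['title'], "\n";
}
```

Converting everything to a single integer timestamp first is what makes the sort trivial; the per-feed offset only matters for feeds whose dates carry no time zone of their own.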

The next thing that we had to deal with was how often to fetch updated feeds. Of course you want to give your users realtime data, but if you start fetching feeds every couple of minutes, you will get your site banned from the server where the feed is. We eventually settled on 20 minute intervals for fetching the feeds or something like that.

The site currently has approximately 70 feeds that it aggregates so the next problem is how to trigger the feed update. The default method is to fetch the updated feed when the cached feed expires. The problem with that is that every twenty minutes all of the caches will expire and someone is going to get a really long load time. Long load times irritate me so I set up a cron job that would force a refresh of the cache before its expiration. This avoids the long load times.
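The cron approach could be sketched roughly like this. The 20 minute lifetime comes from the post; everything else is an assumption for the example: the five minute margin, the shape of the cache array, and the $fetch callback that stands in for the real download.

```php
<?php
const CACHE_TTL      = 1200; // 20 minutes, the interval mentioned above
const REFRESH_MARGIN = 300;  // assumed: refresh anything that expires within 5 minutes

/** True when a cached feed is close enough to expiring that cron should refetch it. */
function needsRefresh(int $fetchedAt, int $now): bool
{
    return ($now - $fetchedAt) >= (CACHE_TTL - REFRESH_MARGIN);
}

/**
 * Intended to be run from cron, more often than the cache lifetime.
 * $cache maps feed URL => ['fetched_at' => int, 'body' => string].
 * $fetch is a hypothetical callback that downloads one feed and returns its body.
 */
function refreshStaleFeeds(array $cache, int $now, callable $fetch): array
{
    foreach ($cache as $url => $entry) {
        if (needsRefresh($entry['fetched_at'], $now)) {
            $cache[$url] = ['fetched_at' => $now, 'body' => $fetch($url)];
        }
    }
    return $cache;
}

// Demo with fixed timestamps and a fake fetcher.
$now = 10000;
$cache = [
    'http://example.com/a.xml' => ['fetched_at' => 9900, 'body' => 'old a'], // 100 s old
    'http://example.com/b.xml' => ['fetched_at' => 9000, 'body' => 'old b'], // 1000 s old
];
$cache = refreshStaleFeeds($cache, $now, function (string $url): string {
    return 'fresh copy of ' . $url;
});
print_r($cache);
```

Because cron refreshes each feed a little before its cache would expire, a visitor’s page load never has to wait on a remote fetch.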

Conclusion

There were a lot of other things that were added to the aggregator over time, but the core didn’t change much. This was a fun project and one that I think others could benefit from. I am thinking about packaging a version of it up and providing it under a free license. If you are interested in using it, please email me at zeptoblog at zeptoworld dot com and I will see what I can do.

September 18th, 2006

Silentracker

So I was talking to my brother about web tracking software. There are a couple of things that are always bothersome when dealing with tracking traffic to your web site. First, none of the trackers that are out there seem to provide all of the information that you want. All too often, you have to put more than one tracker on your page just to get the reports you need. Secondly, if the tracker is free then the terms of use ensure that you put their stupid image on your page or they will shut you down.

Well, I got tired of this and wrote my own tracker. I am calling it Silentracker (not to be confused with Silent Tracker, of course). As of the writing of this post it is in beta and only being used by a couple of sites. If you would like to be a beta tester for it then send me an email at zeptoblog at zeptoworld dot com and I will see what we can work out. The goal of this tracker is to only give my users what they need.

One of these days I am going to post on the lessons learned from writing my tracker. I would have preferred to write it in Java, but I ended up doing it in PHP because of the limitations of the server that I currently have available. I will have to get a new server one of these days that supports Java Servlets.