I doubt that many of you have ever tried to write your own web tracking software. I mostly did it to see if I could and also to have complete control over my tracker. My wife says I’m a bit of a control freak. Anyway, I posted on it earlier this year. I have called it Silentracker.

I wrote it in PHP. Any good software architect out there is probably shaking their head and laughing at me right about now. I know because I’m a pretty good software architect and I still wonder about my decision. It comes down to the fact that I never expect to make any money from the project and I don’t feel like spending money to get a server with more capabilities. So my choices where Perl or PHP. I didn’t feel like learning Perl… so there you have it. I originally wrote it using flat files to store the data. I changed that to MySQL and there we are.

Everything was fine when I started and was only tracking my mom’s web site which gets a few hits a day. Now, my brother’s company has started using it also and it didn’t take very long before I had 25,000 row in the hit table of the database. It won’t be long before there are 100,000 or 1,000,000. You get the point. As often happens, this “prototype” was in full production and suffering from the lack of fine tuning that you would expect from a prototype.

This week I took some time to clean up the queries and see if I could speed things up a bit. There are a number of things that I have considered and some of them I have implemented. Things are moving along nicely now and I figured I would throw these out there for those who might be having similar problems (I am going to assume that you are administering your database using phpMyAdmin):

  • Take a look at the queries that you are running most often and the tables that they hit and see if you can add an index or two to help speed things up.
    If you are not familiar with indexes, you can create an index on any table and base it on one or more columns in that table. Then when a query is run against the table the database will use an index where available to speed up the query. This ads overhead to inserts, updates, and deletes on the table so don’t over do it.

  • Denormalize the data if necessary.
    I found that I was performing the same expensive operation in most of my queries. I was doing a complex text comparison between two columns. I was able to do that same comparison as part of the insert and store the result in a separate column as a boolean value and save time on every query. One problem with this is that it is denormalizing your data to a certain degree, but to me it was so worth it. Then I create an index on that new column to speed things up even more.

  • Use explain to analyze your queries.
    When you run any query in phpMyAdmin you can then click on the explain link. This will analyze the query and tell you how it executes. It tells you what kind of search it does on the table and which, if any, index is being used. You may need to go to the MySQL documentation to see exactly what the results mean, but this can be invaluable when you have nested queries and complex table with complex joins.

These optimizations were enough to get thing back up to a good speed for now on Silentracker. When the data gets larger I will probably start archiving old data from the table. I may move each site into it’s own table or even create multiple databases to limit the size of the data. I also may divide types of hits into separate tables. If necessary there are other denormalizations that I can do also. A little analysis can go a long way.

One word of caution. Avoid premature optimization. Optimizations can degrade the maintainability of your software. Only optimize when there is a problem to solve.

Performance tuning is not limited to the database. With a language like PHP, you sometimes have to get performance enhancements wherever you can. For instance, I have graphs that only need to be created once each day. Why not cache those on the disk and check the file modification time to decide when to recreate them. If there is information that is guaranteed to be constant for the length of the session, store it in the session. Feel free to add other performance tuning in the comments below.