About a year and a half ago my brother came to me with an idea for an RSS feed aggregator. I had no idea what RSS was, but I love the challenge of learning new technologies, so of course I was interested. It turned out that a group of bloggers had formed a loose community but had no gateway to it other than one large blog where people tended to congregate. My brother was part of a group of seven blogs that had banded together to funnel more traffic to each other’s blogs. The idea was to create an aggregator for the whole community that would act as its portal.
There was no money being put into the project, and the available server did not support Java Servlets, so it was going to be done in PHP. I had done one minor project in PHP before, so I had some idea of what to expect. While searching the web I ran across MagpieRSS, an open-source RSS feed reader written in PHP. It is a nice piece of code, so I decided to base my aggregator on it. I like MagpieRSS because it is concise and focused on what it does, but that also meant there was a good deal of work left for me to do.
Basic parameters
First, the aggregator needed to take an arbitrary number of blog feeds, order the posts by date, and display the most recent posts in descending chronological order. Second, it needed to be configurable through text (XML) files so that someone who doesn’t know PHP could maintain the blogroll.
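A blogroll file of that kind might look like the sample below, along with the few lines needed to read it. This is a sketch in Python rather than the aggregator's PHP, and the element and attribute names (`blogroll`, `feed`, `name`, `url`, `offset`) are hypothetical, not the format the original project used. The `offset` attribute holds the per-feed time offset discussed under Lessons Learned.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Hypothetical blogroll file: one <feed> element per blog, with a display
# name, the feed URL, and a time offset in hours for that feed's dates.
SAMPLE_CONFIG = """<blogroll>
  <feed name="Example Blog" url="http://example.com/rss.xml" offset="-5"/>
  <feed name="Another Blog" url="http://example.org/atom.xml" offset="0"/>
</blogroll>"""


@dataclass
class FeedConfig:
    name: str
    url: str
    offset_hours: float


def load_blogroll(xml_text: str) -> List[FeedConfig]:
    """Parse the blogroll XML into one FeedConfig per <feed> element."""
    root = ET.fromstring(xml_text)
    return [
        FeedConfig(
            name=feed.get("name", ""),
            url=feed.attrib["url"],  # a feed without a URL is a config error
            offset_hours=float(feed.get("offset", "0")),
        )
        for feed in root.findall("feed")
    ]
```

With a file like this, adding a blog to the aggregator means adding one `<feed>` line, with no PHP involved.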
Lessons Learned
So I started with a list of blog RSS feeds and wrote some PHP to read each of the feeds and sort the posts by date. Not too hard. I quickly realized that each type of feed keeps its date in a different field. Once I had normalized the date fields, I found that the feeds were in different time zones and would not sort correctly. I added a time offset for each feed to the config file and voilà: a feed aggregator. Not so bad, huh?
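A minimal sketch of that normalization follows. It is written in Python rather than the aggregator's PHP, and it assumes each item has already been parsed into a dictionary keyed by element name (a hypothetical shape, not MagpieRSS's actual output). It picks whichever date field is present, converts every date to UTC, and applies the configured per-feed offset only when the date carries no zone of its own. That last rule is one plausible way to use the offset, not necessarily how the original code did it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup order: the element that holds the date differs by format.
# RSS 2.0 uses <pubDate>, RSS 1.0 uses <dc:date>, Atom uses <updated>/<published>.
DATE_FIELDS = ("pubDate", "dc:date", "updated", "published")


def item_date_string(item: Dict[str, str]) -> Optional[str]:
    """Return the raw date text from whichever date field the item has."""
    for field in DATE_FIELDS:
        if item.get(field):
            return item[field]
    return None


def parse_feed_date(raw: str, offset_hours: float) -> datetime:
    """Normalize an RSS or Atom date string to an aware UTC datetime."""
    raw = raw.strip()
    try:
        # RFC 822 style, e.g. "Tue, 10 Jun 2003 04:00:00 GMT" (RSS 2.0).
        dt = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # ISO 8601 style, e.g. "2003-06-10T04:00:00Z" (Atom, dc:date).
        if raw.endswith("Z"):
            raw = raw[:-1] + "+00:00"
        dt = datetime.fromisoformat(raw)
    if dt.tzinfo is None:
        # The feed gave no zone: assume the per-feed offset from the config file.
        dt = dt.replace(tzinfo=timezone(timedelta(hours=offset_hours)))
    return dt.astimezone(timezone.utc)


def newest_first(posts: List[Tuple[datetime, str]]) -> List[Tuple[datetime, str]]:
    """Sort (published_utc, title) pairs in descending chronological order."""
    return sorted(posts, key=lambda post: post[0], reverse=True)
```

Converting everything to UTC up front means the sort key is a single comparable value, so posts from feeds in different time zones interleave correctly.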
The next thing we had to deal with was how often to fetch updated feeds. Of course you want to give your users real-time data, but if you fetch a feed every couple of minutes, you risk getting your site banned by the server that hosts it. We eventually settled on fetching the feeds at intervals of roughly 20 minutes.
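The fetch-on-expiry policy can be sketched as a small cache with a 20-minute time-to-live. This is a Python illustration, not MagpieRSS's own caching code. The `FeedCache` class, the injected `fetch` function, and the in-memory storage are all hypothetical; a real deployment would persist the cache between requests.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 20 * 60  # roughly the 20-minute interval settled on


class FeedCache:
    """Serve cached feed bodies; refetch a feed only after its TTL has passed."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch   # hypothetical function that downloads a feed
        self._ttl = ttl
        self._clock = clock   # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def age(self, url: str) -> float:
        """Seconds since url was last fetched (infinity if never fetched)."""
        entry = self._entries.get(url)
        return float("inf") if entry is None else self._clock() - entry[0]

    def refresh(self, url: str) -> str:
        """Fetch url now and store the result with the current time."""
        body = self._fetch(url)
        self._entries[url] = (self._clock(), body)
        return body

    def get(self, url: str) -> str:
        """Return the cached body, refetching first if it is missing or expired."""
        if self.age(url) >= self._ttl:
            # Expired or missing: this request pays for the network fetch.
            return self.refresh(url)
        return self._entries[url][1]
```

However many visitors arrive, each feed's server is contacted at most once per TTL. The cost is that whichever call to `get` lands after expiry has to wait on the network.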
The site currently aggregates approximately 70 feeds, so the next problem was how to trigger the feed updates. The default method is to fetch an updated feed when its cached copy expires. The problem is that every twenty minutes all of the caches expire, and whichever visitor happens to arrive next gets a really long load time while the feeds are refetched. Long load times irritate me, so I set up a cron job that forces a refresh of each cache before it expires. This avoids the long load times.
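The cron-driven refresh can be sketched the same way: refetch any feed that is within a safety margin of expiring, so page requests almost always find a fresh cache. The 5-minute margin, the function names, and the crontab line in the comment (including its placeholder path) are assumptions for illustration, again in Python rather than the original PHP.

```python
import time
from typing import Callable, Dict, List

# Hypothetical crontab entry running this refresh every 5 minutes:
#   */5 * * * * python3 /path/to/refresh_feeds.py
TTL_SECONDS = 20 * 60      # cache lifetime seen by page requests
REFRESH_MARGIN = 5 * 60    # assumed margin; keep it >= the cron interval


def feeds_due_for_refresh(fetched_at: Dict[str, float], now: float,
                          ttl: float = TTL_SECONDS,
                          margin: float = REFRESH_MARGIN) -> List[str]:
    """Return, sorted, the feeds whose cache expires within `margin` seconds."""
    return sorted(url for url, t in fetched_at.items() if now - t >= ttl - margin)


def run_refresh(fetched_at: Dict[str, float],
                fetch: Callable[[str], None],
                clock: Callable[[], float] = time.time) -> List[str]:
    """Refetch soon-to-expire feeds and record their new fetch times."""
    due = feeds_due_for_refresh(fetched_at, clock())
    for url in due:
        fetch(url)
        fetched_at[url] = clock()
    return due
```

Because the cron interval is no longer than the margin, a feed is picked up by a run when its cache is between 15 and 20 minutes old, before it expires. This assumes each fetch finishes quickly.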
Conclusion
A lot of other features were added to the aggregator over time, but the core didn’t change much. This was a fun project and one that I think others could benefit from. I am thinking about packaging up a version of it and releasing it under a free license. If you are interested in using it, please email me at zeptoblog at zeptoworld dot com and I will see what I can do.
