Archive for March, 2005

The tyranny of broken HTML in RSS

Thursday, March 31st, 2005

One of the problems with rendering RSS content nicely is broken HTML tag pairs. Certain RSS generators seem to be very careless when preparing item summaries, often chopping through the middle of a link tag when snatching the first few lines of an article. This isn’t such a big deal if you’re just displaying one item, but if you’ve got a whole bunch of items displayed one after another, a single broken anchor (link) tag or stray blockquote (indentation) tag can really mess up everything that follows it. I really don’t want to have to get into HTML parsing, but it looks like I’m not going to have much choice at this rate.

A couple of offenders I spotted today are BoingBoing and AppleMatters. There are many more, though; it’s just down to luck which ones get away with it and which ones don’t.
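
For illustration only, here is a rough sketch of the kind of repair a reader could apply to a truncated summary before display. It is written in Python rather than Cocoa, it is not NewsMac's code, and the function name `repair_summary` and the short `VOID_TAGS` list are made up for the example. It assumes reasonably simple markup (no `>` inside attribute values), drops a tag that was cut in half, and closes whatever tags are still open. It does not remove a stray closing tag that never had an opener.

```python
import re

# Tags that never take a closing tag (a deliberately short, assumed list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

# Matches an opening or closing tag; group 3 catches a self-closing slash.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>")


def repair_summary(fragment):
    """Return fragment with half-cut tags dropped and open tags closed."""
    # A '<' with no '>' after it means the summary was chopped mid-tag,
    # e.g. '<a href="http://exam'. Throw the partial tag away.
    last_open = fragment.rfind("<")
    if last_open != -1 and ">" not in fragment[last_open:]:
        fragment = fragment[:last_open]

    open_tags = []  # stack of tag names still waiting for a close
    for match in TAG_RE.finditer(fragment):
        closing = match.group(1)
        name = match.group(2).lower()
        self_closing = match.group(3)
        if name in VOID_TAGS or self_closing:
            continue
        if closing:
            if name in open_tags:
                # Pop everything up to and including the matching opener.
                while open_tags and open_tags.pop() != name:
                    pass
        else:
            open_tags.append(name)

    # Close the remaining tags, innermost first.
    return fragment + "".join("</%s>" % tag for tag in reversed(open_tags))
```

Run on each item summary before concatenating them, something like this would at least stop one item's unclosed link or blockquote from leaking into the next.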

NSURLConnection woes

Wednesday, March 2nd, 2005

I’ve been trying to improve the speed at which things download in NewsMac Pro, and to add support for things like feeds that require authentication. The logical choice seemed to be moving from NSURLHandle and friends to NSURLConnection, which was introduced with WebKit back in OS X 10.2.7.

The first thing that struck me about NSURLConnection was that it was very light on methods. Still, I figured that would just make it a bit easier to use. Initially I tried using it synchronously (meaning the thread doing the download basically hangs until the connection either finishes downloading or fails). However, the performance wasn’t great, and I read on CocoaDev that this approach also leaked memory. So the other day I decided to do a pretty major overhaul of the download system to use the event-driven delegate methods. That wasn’t too hard, and it only took a few hours to have it up and running, but then I discovered a huge caveat that seems to affect a lot of WebKit-related classes: it can’t cope at all well with threads. In a networked application, threads are essential unless you want the entire app to lock up for the duration of each burst of network activity. NSURLConnection does its own threading behind the scenes, but makes it very hard to run the connection itself from a secondary thread, which is more or less necessary if you want to have multiple concurrent downloads happening.
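
To make the blocking versus delegate-driven distinction concrete, here is a minimal sketch in Python. It is not the NSURLConnection API and not NewsMac's code: `DownloadDelegate`, `download_all` and the `fetch` parameter are hypothetical names invented for the example. The downloads run concurrently on a pool of worker threads, and each result or failure is handed to the delegate on the calling thread as it completes, rather than one blocking call per feed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class DownloadDelegate:
    """Hypothetical delegate: receives events instead of a return value."""

    def __init__(self):
        self.finished = {}  # url -> downloaded data
        self.failed = {}    # url -> exception raised by fetch

    def did_finish(self, url, data):
        self.finished[url] = data

    def did_fail(self, url, error):
        self.failed[url] = error


def download_all(urls, fetch, delegate, max_workers=4):
    """Run fetch(url) for every URL concurrently and notify the delegate.

    fetch is any callable that takes a URL and returns its data or raises;
    it is passed in so the sketch does not assume a particular network API.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        # as_completed yields each future as soon as its download is done,
        # so a slow feed does not hold up reporting of the fast ones.
        for future in as_completed(futures):
            url = futures[future]
            try:
                delegate.did_finish(url, future.result())
            except Exception as error:
                delegate.did_fail(url, error)
```

In real use, `fetch` might be something like `lambda url: urllib.request.urlopen(url).read()`. Note that `download_all` itself still blocks until every download has been reported, so an application would call it from a background thread, not the main one.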

Anyway, I thought I’d solved this, and performance was indeed better. Then I clicked on a freshly downloaded channel while others were still downloading, the new headlines popped up, I clicked one to see it displayed in the headline browser (a WebView) and boom: NewsMac crashed inside one of NSURLConnection’s threads. WTF? Clearly the WebView was creating its own NSURLConnection and that was conflicting with the ones I’d created, but I don’t see why it should. I really hope Apple fixes this ASAP because this is just shabby. I’m now left with the choice of going back to the old way of doing things or rewriting around something like CURLHandle, which I’ve just downloaded to estimate how much work it would take to integrate into NewsMac. That essentially broken classes like this make their way into the API of a shipping operating system and then remain unfixed for over a year strikes me as unacceptable, and NSURLConnection isn’t alone. At the very least Apple could provide a warning in the API documentation that the class is still ‘experimental’.

While some of you might be horrified that NewsMac Pro seems this broken, let me reassure you that with an object-oriented language and a modular program like NewsMac, ripping out the engine and sticking a new one in isn’t that big of a deal. It’s just an annoyance, because this is time I’d rather be spending on finishing other features.