trackback module

Though this is not a techno-blog, I do a fair amount of work behind the scenes myself in PHP, CSS and HTML to get it looking like it does (not a winner, I notice!). But one thing has bugged me for some time...

The observant amongst you may have noticed that it is developed using the Drupal content management system. This has lots of add-on modules contributed by nice people all over the world. One of these is supposed to process "trackbacks". You'll have seen these on others' blogs - where if someone on another site posts something about a story you've written, you get a magical little line saying "Such and such a site or blog refers to this post".

Well, with the Drupal trackback module this only appears to work if people have explicitly "pinged" the trackback URL for the post of mine they are referring to. And nowadays it seems that very few people actually do this. When I post on my blog, I think it pings each site I've linked to in the posting (it certainly pings at least the first site I refer to) without me having to explicitly tell it to.

So others see when I have referred to them and can put a link back in to my blog if they want. But I don't see automatically when they do, so they don't get a link back to their post underneath mine. But I know I am being linked to - I've been mentioned in the Lib Dem Voice "Golden Dozen" a few times, and in the Brit Blog Round-up and so on.

The trackback module claims to be able to discover a site that is referring to mine just from someone clicking on the link in another blog's article that refers to a post of mine. But it doesn't appear to work. So, while you've all been to conference, I've been trying my hand at rewriting that module to enable my blog to discover when someone refers to my blog just from click-throughs. And that's why I've been "blogging lite" this week.

I've not actually got very far yet. Just setting up a debugging environment took long enough. And now that I sit down to think about how to do it, I find it much harder than I first thought. I can see why the module authors have not implemented it yet! For example - how do I distinguish between a referrer that is a search engine results page, a "real" article that refers to me, or just a "blogroll" type link in a sidebar? Incidentally - I notice that Lib Dem Voice picks up these sidebar links, as I've seen my pages listed as referring to some of their articles when actually they seem to have picked up the feed from the Lib Dem Blogs aggregator I put in my sidebar.
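As a rough illustration of the first of those cases, a search results page can often be spotted from the referrer URL alone, before fetching anything. The sketch below is in Python rather than Drupal PHP, purely to keep it short. The host hints and query parameter names are assumptions for illustration, not a complete list, and the function name is hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Assumed, non-exhaustive hints; a real list would need maintaining.
SEARCH_HOST_HINTS = ("google.", "bing.", "yahoo.", "duckduckgo.")
# Query parameters commonly used to carry the search terms (assumption).
SEARCH_QUERY_PARAMS = ("q", "query", "p")


def looks_like_search_referrer(referrer):
    """Heuristic: True if the referrer URL resembles a search results page."""
    parts = urlparse(referrer)
    host = (parts.hostname or "").lower()
    if not any(hint in host for hint in SEARCH_HOST_HINTS):
        return False
    # A search results page normally carries the search terms in its query.
    params = parse_qs(parts.query)
    return any(name in params for name in SEARCH_QUERY_PARAMS)
```

This only weeds out the obvious search engine hits. It says nothing about whether a remaining referrer is a real article or a sidebar link, which is the harder part.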

However, if you've stumbled on this because you are also looking for this feature in the Drupal Trackback module, you might be interested in the logic I think I have finally settled on, set out in the following specification:

Note: I am going to do this as a Drupal cron job in the trackback module, using entries in the Drupal access_log database table. Checking whether each referrer is a real "referral", a search engine page, or a link outside an article (like a blogroll) takes some processing time, and I don't want to add that to every page load by doing it when a page is requested by a user agent. So I guess I'm doing something similar to what Technorati does when it indexes your site after receiving a ping, except that it won't be triggered by a ping but by a referrer record in the access_log table. So here's the logic in crude "pseudo code".
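To make the cron idea concrete, here is a minimal sketch of picking out the log rows worth checking. It uses Python with an in-memory SQLite table standing in for the Drupal database. The column names (path, referrer, timestamp) and both function names are assumptions for illustration and may not match the real Drupal schema.

```python
import sqlite3


def referred_node_hits(conn, last_cron_run):
    """Return distinct (path, referrer) pairs for node pages hit since last_cron_run."""
    rows = conn.execute(
        "SELECT DISTINCT path, referrer FROM access_log "
        "WHERE timestamp > ? AND referrer <> '' AND path LIKE 'node/%' "
        "ORDER BY path, referrer",
        (last_cron_run,),
    )
    return rows.fetchall()


def demo_connection():
    """Build a throwaway table with a few made-up log rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log (path TEXT, referrer TEXT, timestamp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO access_log VALUES (?, ?, ?)",
        [
            ("node/12", "http://blog.example.org/post-1", 1000),
            ("node/12", "", 1010),                          # no referrer
            ("admin/settings", "http://example.org/x", 1020),  # not a node
            ("node/7", "http://example.net/old", 500),      # before last cron run
        ],
    )
    return conn
```

Calling `referred_node_hits(demo_connection(), 900)` leaves just the one node hit that has a referrer and arrived after the previous run.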

  1. START processing:
  2. When cron runs,
  3. For each row in the access_log table that accesses a Drupal node and has a referrer URL since the last time cron ran:
  4. Try to fetch an RSS type feed url from the referrer site.
  5. If the site doesn't provide a feed url, then it's likely it's not a blog or news type site, so stop processing this row and move on to the next.
  6. If it does provide a feed url, fetch the feed and parse it.
  7. For each article in that feed, check whether it contains a reference to the Drupal node referred to in the access_log record.
  8. If no article contains a reference to said Drupal node, stop processing this row.
  9. If one does, extract an excerpt and title from that article in the feed and save the whole lot into the trackbacks_received table. It should then appear under the node on my site when it is viewed.
  10. STOP processing.
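
The steps above, for a single access_log row, might be sketched roughly as follows. This is Python rather than the PHP a Drupal module would need, and it is an illustration under assumptions, not the module's code. It only looks for RSS feeds declared with a `<link rel="alternate">` tag, only searches each item's description, and takes a caller-supplied `fetch` function (hypothetical) instead of doing real HTTP requests. All function and class names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> RSS declarations."""

    def __init__(self):
        super().__init__()
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        is_rss = (a.get("type") or "").lower() == "application/rss+xml"
        if tag == "link" and "alternate" in rel and is_rss and a.get("href"):
            self.feed_urls.append(a["href"])


def discover_feed_url(page_url, html):
    """Steps 4-5: return the first declared RSS feed URL, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    if not finder.feed_urls:
        return None
    return urljoin(page_url, finder.feed_urls[0])


def find_trackback(node_url, feed_xml):
    """Steps 6-9: return title, url and excerpt of the first item citing node_url."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError:
        return None  # unparseable feed: give up on this row
    for item in root.iter("item"):
        body = item.findtext("description") or ""
        if node_url in body:
            return {
                "title": item.findtext("title") or "",
                "url": item.findtext("link") or "",
                "excerpt": body[:200],  # crude excerpt; any HTML is left in
            }
    return None


def process_referral(node_url, referrer_url, fetch):
    """Run the steps for one log row; fetch maps a URL to its text."""
    feed_url = discover_feed_url(referrer_url, fetch(referrer_url))
    if feed_url is None:
        return None  # no feed, so probably not a blog or news site
    return find_trackback(node_url, fetch(feed_url))
```

Passing `fetch` in keeps the logic testable with canned pages. The dictionary that comes back is what would be saved into the trackbacks_received table. Atom feeds and full-content elements are not handled here.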

As a slight aside, I'm wondering how to check comments on others' posts as well. I'm not quite clear whether all types of feed have a way of discovering the comment feed, if one is available, for each article in the main feed. If so it can probably be done.
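
For what it's worth, some RSS 2.0 feeds do declare a comment feed for each article, using a `commentRss` element in the `wfw` (Well-Formed Web CommentAPI) namespace, and Atom has a `rel="replies"` link for the same purpose. Not every feed includes either. A minimal Python sketch for the RSS case, with a hypothetical function name, might look like this:

```python
import xml.etree.ElementTree as ET

WFW_NS = "http://wellformedweb.org/CommentAPI/"


def comment_feed_urls(feed_xml):
    """Map each RSS item's link to its comment feed URL, where one is declared."""
    root = ET.fromstring(feed_xml)
    found = {}
    for item in root.iter("item"):
        # The element name turns up with both capitalisations.
        for name in ("commentRss", "commentRSS"):
            url = item.findtext("{%s}%s" % (WFW_NS, name))
            if url:
                found[item.findtext("link") or ""] = url.strip()
                break
    return found
```

Items without a declared comment feed are simply left out of the result, so the caller can tell which articles have comments that can be checked.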
