Profound statement ahead. The web is made of links.

Except when they don’t work.

This trail all began with a throwaway tweet:

Yes, I have that old blogger habit of looking through old posts or using them as my (cue the operatic swelling music)… Knowledge Management System. So often, “or mansy days” as the typo-prone tweeter does, I can’t find what I once linked to.

Thanks to the Wayback Machine Chrome Extension I am at least not stuck in room 404; I can find copies, sometimes just remnants, in the Internet Archive.

Page not available? View a saved version courtesy of the Internet archive Wayback Machine. Click here to see an archived version.

Bless. This. Machine.

In a Twitter response my friend Ken Bauer suggested that Jim Groom had some kind of solution. And that did remind me that yes, Jim, who occasionally blogs (;-) did have something I remember on his blog where dead links appeared crossed out.

It did not take long to find a sample post where that happens:

A paragraph of text with hyperlinked portion that is crossed out.
Signs of a dead link on Jim Groom’s blog

I can see on inspecting the HTML that the hyperlink has a class="broken_link" in it; my guess is that some kind of plugin that tests links slips in the class name. It at least gives me as a reader an indication that the link has left the web.

With a little bit of guessing I am pretty sure it is the Broken Link Checker plugin, which does what it says and also “Makes broken links display differently in posts”.

My curiosity itch got me thinking: if this plugin does all the leg work to find all the links in a blog, test them, and change the class name of a hyperlink with a foul URL, maybe I could modify it to make the dead link point to its Wayback Machine copy instead.

I know from experience that if you take the link in Jim’s post, http://rusc.uoc.edu/index.php/rusc/article/view/27, and precede it with https://web.archive.org/web/*/ you get a link that shows all of its entries in the Wayback Machine: https://web.archive.org/web/*/http://rusc.uoc.edu/index.php/rusc/article/view/27
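That prefix trick is simple enough to sketch. This is a minimal illustration in Python, not anything from the plugin; the names `WAYBACK_PREFIX` and `wayback_url` are made up for the example, and the only real thing in it is the URL pattern described above.

```python
# Prefix described in the post: it lists all Wayback Machine captures of a URL.
WAYBACK_PREFIX = "https://web.archive.org/web/*/"


def wayback_url(original_url: str) -> str:
    """Return a Wayback Machine listing URL for the given address.

    Hypothetical helper for illustration only. URLs that already start
    with the prefix are returned unchanged so they are not wrapped twice.
    """
    if original_url.startswith(WAYBACK_PREFIX):
        return original_url
    return WAYBACK_PREFIX + original_url


if __name__ == "__main__":
    print(wayback_url("http://rusc.uoc.edu/index.php/rusc/article/view/27"))
```

The guard against double prefixing matters if the same link could pass through the function more than once.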

My optimism dimmed a little when I saw how many files there were in the plugin!

many files and folders of code!

My strategy was to find where in all this mess the class name of “broken_link” is inserted. This is where my love of BBEdit comes in, because I can search for “broken_link” across the entire directory of the plugin. Very quickly I found a function highlight_broken_link inside includes/any-post.php that looks like where the action happens.

I have nowhere near an understanding of how this plugin works, but conceptually:

  • Some routine parses all posts and extracts all URLs inside link tags
  • Each unique URL is checked externally to see whether it returns an HTTP status code indicating the link works (or not)
  • The plugin thus keeps its own database table of all links and their status
  • When a post is displayed, a hook from this plugin is called when the content is requested by WordPress
  • All links in the post are looked up, and if one has been reported dead, the class="broken_link" is added to the markup on output

This is clever because it never changes the markup of the original post; all the changes happen when it is displayed.
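To make the display-time idea concrete, here is a rough sketch in Python. It is not the plugin's PHP and makes no claim about how the plugin does it; `BROKEN_URLS` stands in for the plugin's database table, and `mark_broken_links` is a name invented for the example.

```python
import re

# Hypothetical stand-in for the plugin's table of links already found dead.
BROKEN_URLS = {"http://rusc.uoc.edu/index.php/rusc/article/view/27"}

# Matches an opening <a ...> tag with a double-quoted href attribute.
ANCHOR_RE = re.compile(r'<a\b([^>]*?)\bhref="([^"]*)"([^>]*)>', re.IGNORECASE)


def mark_broken_links(html: str, broken_urls=BROKEN_URLS) -> str:
    """Return a copy of html with class="broken_link" added to dead links.

    Simplification: a link that already has a class attribute would end
    up with two, which a real implementation would need to merge.
    """
    def add_class(match):
        before, url, after = match.groups()
        if url not in broken_urls:
            return match.group(0)  # live link, leave the tag untouched
        return f'<a class="broken_link"{before}href="{url}"{after}>'

    return ANCHOR_RE.sub(add_class, html)


if __name__ == "__main__":
    stored = '<p><a href="http://rusc.uoc.edu/index.php/rusc/article/view/27">paper</a></p>'
    print(mark_broken_links(stored))  # marked copy for display
    print(stored)                     # the stored post is unchanged
```

The function returns a new string for output, so whatever is saved as the post content stays exactly as written.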

Here is that function in the Broken Link Checker plugin that modifies the CSS.

I thought about changing the class name to something different for my purposes for creating a redirect tag, but it matters not. I can just change the CSS to indicate it is a Wayback Machine redirect.

I have to figure out how to change the link. With some more digging, I figure out to put this after the part of the highlight_broken_link function that modifies the CSS:

The $link variable contains all parts of the hyperlink, so I am just changing its value at display time, inserting ahead of the URL the part that calls in the Wayback Machine.

There is one tweak I need to make this work: $link comes into the function the typical way:

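This is pass by value. I can use it in my function, but any change I make stays inside the function and never reaches the caller. A small change (in PHP, an ampersand in front of the parameter name) makes it come in as pass by reference, which allows my function to change it: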

As a sidebar, I have for a while been cleverly marking every link on my blog that goes to the Wayback Machine with an icon inserted before the link.

I decide to use it to indicate even more clearly that a link is not only dead, but being redirected to the Wayback Machine. Interestingly I did not find any CSS in the code, but there is an option of the plugin to change the style of the broken_link class.

The original plugin's style is simple, just a change of the text to have a line through it.

With some testing I make it different: change the color to red, and use my method of inserting an icon in front:

This was just a change in the plugin options (core/init.php) from:

to

There is a little bit of trickery to reference an image file stored in the plugin directory.

So in about 2 hours I have it working on a test WordPress. But no one can see it. So the next test is trying on a demo site with just 1 post in it, made specifically to have dead links.

More success!

I am rather cautious to try it here on the Big Ship CogDogBlog; it has over 5000 posts. What if it breaks the boat? But no gain without an adventure. The Broken Link Checker found about 35,000 unique URLs in blog posts!

I let it go all night and maybe it got half done.

And now, pushing 48 hours, it is down to 5000 URLs left to test. It seems to do a nice job of running in the background.

Even so, the work has just begun. I have been checking in and finding plenty of dead links are my own typos or malformed links.

And then it looks like a lot of photo sites (Pixabay, Pexels, Unsplash) block the remote checking I do, so I am getting returns of 403 Forbidden. These can all be marked as OK in the Broken Link Checker interface, so they will not appear with the redirect link.
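One way to think about those 403s, sketched in Python. This is purely an illustration and not how the plugin decides anything; the function name, the three labels, and the exact status groupings are all assumptions made for the example.

```python
def classify_status(status_code: int) -> str:
    """Sort an HTTP status code into a rough bucket for link triage.

    Hypothetical rules for illustration:
      "ok"      - 2xx and 3xx, the link answered or redirected
      "blocked" - 403, the site may just be refusing automated checks
      "broken"  - 404 and 410, the page is reported missing or gone
      "review"  - anything else, worth a manual look before deciding
    """
    if 200 <= status_code < 400:
        return "ok"
    if status_code == 403:
        return "blocked"
    if status_code in (404, 410):
        return "broken"
    return "review"


if __name__ == "__main__":
    for code in (200, 403, 404, 500):
        print(code, classify_status(code))
```

Keeping 403 in its own bucket reflects the point above: a refusal from a photo site is not the same as a page that has left the web, so it should not get the redirect treatment.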

But all in all it is working.

I have made a fork of the original plugin that is now on GitHub at https://github.com/cogdog/broken-link-checker. It’s hardly an update or new version. I cannot claim this plugin as mine since really I changed maybe 5 lines of code. I am not sure if the original developers would want to fold in my changes.

This has been fun to work through, thanks Ken for nudging me down the road. And as Jim blogged in response (hey this kid blogger has some promise…), archiving and such is a Sisyphean Labor of Link Love.

I got 5000+ links to comb through. But my blog is gonna be cleaner and I will have accomplished myself the wishful dream I tossed out as a tweet in the wind.

I can do my part to tend my own garden of links; whether others let their stuff rot is on them.


Featured Image:

Who Broke the Internet? flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

An early 90s builder of web stuff and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person. And he is 100% into the Fediverse (or tells himself so) Tooting as @cogdog@cosocial.ca

Comments

  1. I too use Broken Link Checker, but must be honest, combing through hundreds of links is a strange labour of love. Like Jim, I too dabbled with Amber a few years ago, but found it chewed up all my memory, so I scrapped it. My current approach is to link to my own bookmarks where possible; this means that if a link disappears, I still regain some of the context. In addition to this, I found that I was sort of spamming some sites with pingbacks, although that seems to have been fixed by the fact that pingbacks seem to be broken on my sites.
