A Duct Tape WordPress Plugin for Redirecting Broken Links

Profound statement ahead. The web is made of links.

Except when they don’t work.

This trail all began with a throwaway tweet:

I might need a WordPress plugin than converts all links to run them through the Wayback Machine. Mansy days it feels like 90% of my blogged links are dead.
— Alan Levine (@cogdog) September 15, 2021

Yes, I have that old blogger habit of looking through old posts or using them as my (cue the operatic swelling music)… Knowledge Management System. So often, or “mansy days” as the typo-prone tweeter puts it, I can’t find what I once linked to.

Thanks to the Wayback Machine Chrome Extension I am at least not stuck in room 404; I can find copies, sometimes just remnants, in the Internet Archive.

Page not available? View a saved version courtesy of the Internet archive Wayback Machine. Click here to see an archived version.

Bless. This. Machine.

In a Twitter response my friend Ken Bauer suggested that Jim Groom had some kind of solution. That reminded me that yes, Jim, who occasionally blogs ;-), did have something on his blog where dead links appeared crossed out.

It did not take long to find a sample post where that happens:

A paragraph of text with hyperlinked portion that is crossed out. — Signs of a dead link on Jim Groom’s blog

I can see on inspecting the HTML that the hyperlink has a class="broken_link" in it. My guess is that some kind of plugin that tests links slips in the class name. It at least gives me as a reader an indication that the link has left the web.

With a little bit of guessing I am pretty sure it is the Broken Link Checker plugin, which does what it says and also “Makes broken links display differently in posts”.

My curiosity itch got me thinking: if this plugin does all the legwork to find all links in a blog, test them, and change the class name of a hyperlink with a foul URL, maybe I could modify it to make the hyperlink point to the Wayback Machine instead.

I know from experience that if you take the link in Jim’s post, http://rusc.uoc.edu/index.php/rusc/article/view/27, and precede it with https://web.archive.org/web/*/ you get a link that shows all of its entries in the Wayback Machine: https://web.archive.org/web/*/http://rusc.uoc.edu/index.php/rusc/article/view/27
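As a minimal sketch of that URL trick, here is a small PHP helper. The function name wayback_url is made up for illustration and is not part of the plugin; the check for links that already point at the archive is my own assumption about what would be sensible.

```php
<?php
// Hypothetical helper (not part of Broken Link Checker): build the
// Wayback Machine "all captures" URL for an original URL.
function wayback_url(string $url): string
{
    $prefix = 'https://web.archive.org/web/*/';

    // Assumption: a link that already goes to the archive is left alone.
    if (strpos($url, 'web.archive.org/web') !== false) {
        return $url;
    }

    return $prefix . $url;
}

echo wayback_url('http://rusc.uoc.edu/index.php/rusc/article/view/27'), "\n";
// prints https://web.archive.org/web/*/http://rusc.uoc.edu/index.php/rusc/article/view/27
```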

My optimism dimmed a little when I saw how many files there were in the plugin!

My strategy was to find where in all this mess the class name “broken_link” is inserted. This is where my love of BBEdit comes in, because I can search for “broken_link” across the entire directory of the plugin. Very quickly I found a function highlight_broken_link inside includes/any-post.php that looks like where the action happens.

I have nowhere near an understanding of how this plugin works, but conceptually:

Some routine parses all posts and extracts all URLs inside link tags
Each unique URL is checked externally to see if it returns an HTTP status code indicating the link works (or not)
The plugin thus keeps its own database table of all links and their status
When a post is displayed, a hook from this plugin is called when the content is requested by WordPress
All links in the post are looked up, and if one has been reported dead, the class="broken_link" is added to the markup on output

This is clever because it never changes the markup of the original post; all the changes happen when it is displayed.
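The display-time idea can be sketched in plain PHP. This is not how the plugin does it (it has its own parser, database table, and WordPress hook); the function name mark_dead_links and the simple regular expression, which only handles anchors whose first attribute is href, are assumptions made to keep the example short.

```php
<?php
// Hypothetical sketch: rewrite dead links when content is displayed,
// leaving the stored post content untouched.
function mark_dead_links(string $html, array $dead_urls): string
{
    return preg_replace_callback(
        '/<a\s+href="([^"]+)"/i',
        function (array $m) use ($dead_urls): string {
            if (!in_array($m[1], $dead_urls, true)) {
                return $m[0]; // link not reported dead: keep the tag as-is
            }
            // Dead link: add the class and point at the Wayback Machine.
            return '<a class="broken_link" href="https://web.archive.org/web/*/' . $m[1] . '"';
        },
        $html
    );
}

$stored = '<p><a href="http://dead.example/page">gone</a> and <a href="http://ok.example/">fine</a></p>';
$shown  = mark_dead_links($stored, ['http://dead.example/page']);

echo $shown, "\n";
```

The $stored string is never modified; only the copy returned for display carries the extra class and the archive link.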

Here is that function in the Broken Link Checker plugin that adds the CSS class.

function highlight_broken_link( $link, $broken_link_urls ) {
	if ( ! in_array( $link['href'], $broken_link_urls ) ) {
		//Link not broken = return the original link tag
		return $link['#raw'];
	}

	//Add 'broken_link' to the 'class' attribute (unless already present).
	if ( $this->plugin_conf->options['mark_broken_links'] ) {
		if ( isset( $link['class'] ) ) {
			$classes = explode( ' ', $link['class'] );
			if ( ! in_array( 'broken_link', $classes ) ) {
				$classes[]     = 'broken_link';
				$link['class'] = implode( ' ', $classes );
			}
		} else {
			$link['class'] = 'broken_link';
		}
	}

	//Nofollow the link (unless it's already nofollow'ed)
	if ( $this->plugin_conf->options['nofollow_broken_links'] ) {
		if ( isset( $link['rel'] ) ) {
			$relations = explode( ' ', $link['rel'] );
			if ( ! in_array( 'nofollow', $relations ) ) {
				$relations[] = 'nofollow';
				$link['rel'] = implode( ' ', $relations );
			}
		} else {
			$link['rel'] = 'nofollow';
		}
	}

	return $link;
}

I thought about changing the class name to something different for my purposes for creating a redirect tag, but it matters not. I can just change the CSS to indicate it is a Wayback Machine redirect.

I have to figure out how to change the link. With some more digging, I figure out to put this after the part of the highlight_broken_link function that adds the class name:

$link['href'] = 'https://web.archive.org/web/*/' . $link['href'];

The $link variable contains all parts of the hyperlink, so I am just changing its href value at display time, inserting ahead of it the prefix that calls in the Wayback Machine.

There is one tweak I need to make this work. The $link variable comes into the function the typical way:

function highlight_broken_link( $link, $broken_link_urls )  {

This is pass by value. The function gets its own copy of $link, so changes made inside the function do not reach the caller's variable. A small change makes it come in as pass by reference, which allows my function to change it:

function highlight_broken_link( &$link, $broken_link_urls )  {
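To see the difference in isolation, here is a small sketch with two made-up functions (neither is in the plugin) that do the same thing to a link array, one taking it by value and one by reference.

```php
<?php
// Illustrative functions only; neither is part of the plugin.
function prefix_by_value(array $link): void
{
    // Works on a local copy; the caller's array is unchanged.
    $link['href'] = 'https://web.archive.org/web/*/' . $link['href'];
}

function prefix_by_reference(array &$link): void
{
    // The & makes this the caller's array, so the change sticks.
    $link['href'] = 'https://web.archive.org/web/*/' . $link['href'];
}

$a = ['href' => 'http://example.com/'];
$b = ['href' => 'http://example.com/'];

prefix_by_value($a);
prefix_by_reference($b);

echo $a['href'], "\n"; // http://example.com/
echo $b['href'], "\n"; // https://web.archive.org/web/*/http://example.com/
```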

As a sidebar, I have for a while been marking all links on my blog that go to the Wayback Machine with an icon inserted before the link.

a[href*="web.archive.org/web"] {
    padding: 0px 0 0px 22px;
    background: transparent url(https://cogdogblog.com/images/social/archive.png) no-repeat center left;
}

I decide to use it to indicate even more that a link is not only dead, but being redirected to the Wayback Machine. Interestingly I did not find any CSS in the code, but there is an option of the plugin to change the style of the broken_link class.

The original plugin style is simple, just a change of the text to have a line through it.

With some testing I make it different: the color changes to red, and I use my method of inserting an icon in front.

This was just a change in the plugin options (core/init.php) from:

'broken_link_css'=> ".broken_link, a.broken_link {\n\ttext-decoration: line-through;\n}",

to:

'broken_link_css'=> ".broken_link, a.broken_link {\n\tcolor:red;\n\ttext-decoration: line-through;\n\tpadding: 0px 0 0px 22px;\n\tbackground:transparent url(" . plugin_dir_url( __DIR__ ) . "images/archive.png) no-repeat center left;\n}",

There is a little bit of trickery to reference an image file stored in the plugin directory.

So in about 2 hours I have it working on a test WordPress site. But no one can see it. So the next test is trying it on a demo site with just 1 post in it, made specifically to have dead links.
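Outside of WordPress, the same string building can be sketched with the base URL passed in as a plain argument. The function name broken_link_css and the example.com URL are made up for illustration; in the plugin the base comes from plugin_dir_url( __DIR__ ) as shown in the option above.

```php
<?php
// Hypothetical helper: assemble the broken-link CSS around an icon URL.
// $plugin_base_url stands in for what plugin_dir_url( __DIR__ ) returns.
function broken_link_css(string $plugin_base_url): string
{
    // Normalize the trailing slash so the path is joined exactly once.
    $icon_url = rtrim($plugin_base_url, '/') . '/images/archive.png';

    return ".broken_link, a.broken_link {\n"
        . "\tcolor:red;\n"
        . "\ttext-decoration: line-through;\n"
        . "\tpadding: 0px 0 0px 22px;\n"
        . "\tbackground:transparent url(" . $icon_url . ") no-repeat center left;\n"
        . "}";
}

echo broken_link_css('https://example.com/wp-content/plugins/broken-link-checker/'), "\n";
```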

More success!

http://secretrevolution.us/demo/2021/09/16/oh-my-the-links/

I am rather cautious to try it here on the Big Ship CogDogBlog; it has over 5000 posts. What if it breaks the boat? But no gain without an adventure. The Broken Link Checker found about 35,000 unique URLs in blog posts!

Okay, letting it chomp through 30,000 cogdogblog links. It does the job seemingly well.

Yet to be blogged, but if you want to try the alpha https://t.co/lbGbkLogYU pic.twitter.com/t4xozQ4a9G
— Alan Levine (@cogdog) September 16, 2021

I let it go all night and maybe it got half done.

After chugging all night, link checker only half done. The happy price of piling up 5400 posts. pic.twitter.com/LUpRd3yjTz
— Alan Levine (@cogdog) September 17, 2021

And now, pushing 48 hours, it is down to 5000 URLs left to test. It seems to do a nice job of running in the background.

Even so, the work has just begun. I have been checking in and finding that plenty of dead links are my own typos or malformed links.

And then it looks like a lot of photo sites (Pixabay, Pexels, Unsplash) block the remote checking I do, so I am getting returns of 403 Forbidden. These can all be marked as OK in the Broken Link Checker interface, so they will not appear with the redirect link.
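The distinction between a dead link and one that merely refuses automated checks could be sketched like this. It is a hypothetical classifier, not how the plugin decides things internally; treating 429 the same as 403 is my own assumption.

```php
<?php
// Hypothetical classifier (not the plugin's logic): sort HTTP status
// codes into working, blocked-to-checkers, or broken.
function classify_status(int $code): string
{
    if ($code >= 200 && $code < 400) {
        return 'ok';      // success or redirect
    }
    // Assumption: 403 (and 429) usually mean the site refused the
    // automated check, not that the page is gone.
    if ($code === 403 || $code === 429) {
        return 'blocked';
    }
    return 'broken';      // 404, 410, 5xx, or no response (0)
}

foreach ([200, 301, 403, 404] as $code) {
    echo $code, ' => ', classify_status($code), "\n";
}
```

Links in the blocked group are the ones worth a manual look before they get sent off to the archive.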

But all in all it is working.

I have made a fork of the original plugin that is now on GitHub (https://github.com/cogdog/broken-link-checker). It’s hardly an update or new version. I cannot claim this plugin as mine since really I changed maybe 5 lines of code. I am not sure if the original developers would want to fold in my changes.

This has been fun to work through, thanks Ken for nudging me down the road. And as Jim blogged in response (hey this kid blogger has some promise…), archiving and such is a Sisyphean Labor of Link Love.

I got 5000+ links to comb through. But my blog is gonna be cleaner and I will have accomplished myself the wishful dream I tossed out as a tweet in the wind.

I can do my part to tend my own garden of links. Whether others let their stuff rot is on them.

Featured Image:

Who Broke the Internet? flickr photo by cogdogblog shared into the public domain using Creative Commons Public Domain Dedication (CC0)

Comments

Aaron Davis says:

September 20, 2021 at 5:55 am

I too use Broken Link Checker, but must be honest, combing through hundreds of links is a strange labour of love. Like Jim, I too dabbled with Amber a few years ago, but found it chewed up all my memory, so I scrapped it. My current approach is to link to my own bookmarks where possible, this means that if a link disappears, I still regain some of the context. In addition to this, I found that I was sort of spamming some sites with pingbacks. Although that seems to have been fixed by the fact that pingbacks seem to be broken on my sites.
