You gotta celebrate the small things, right?

What I will detail may likely benefit no one else beyond me should I ever lose my notes. But, writing about the process is… well what I like to do.

So, for the lesser informed, MySQL is the “world’s most popular open source database” (thought more proficient and hip programmers might sneer at it). It’s behind some large chunk of web sites you see these days, at least 25% of the web because MySQL is what runs under WordPress.

I cannot say I have a vast knowledge of MySQL; I know enough to poke round, a few things to repair/optimize a database, how to write my own queries and code to do it outside of WordPress, or for sometimes in a WordPress site when the only way to make it do what you want is a custom query.

If it were a spoken language, you might say I am fluent enough to order meals, conduct transactions, and find my way around cities, but hardly versatile enough to read books or engage in fast talk conversations.

With technology languages, you chip away at small bits of it until your accent and intuition seem to click.

So here is my situation, which again, I doubt anyone else deals with exactly the same thing. The DS106 site has been around since 2010 and has a lot of “stuff” in it; over 50,000 posts the majority syndicated into it via the Feed WordPress plugin.

My strategy when setting up Feed WordPress sites is to convert all categories AND tags in remote blogs TO WordPress tags on the aggregating site. This way, I can use the regular post categories for the site organization, and all the syndicated stuff can use tag clouds, etc to organize internally.

DS106 was the first Feed WordPress site I dabbled in, and it was already humming as a syndication bus before I got inside of it. So whomever had done the set up, actually did it the other way — for all blog posts syndicated in, all tags and categories are converted on the main site to WordPress Categories (this is why we use tags to separate out different course versions – e.g. for all my students blogs from Spring 2013 http://ds106.us/tag/umwsp13/ vs the Noir 106 one form last year http://ds106.us/tag/noir106/.

Actually the only time the tags/categories on external sites are used, is when DS10 syndicated posts are re-syndicated to the Assignment Bank as responses to assignments. Once it uses external tags (converted on DS106 to categories) like VideoAssignments VideoAssignments1874 they are never used.

And people use a lot of tags and categories on their site, and they all get sucked into DS106 and converted to categories. So much that the taxonomy tables (wp_terms and wp_term_taxonomy) get really bloated, so much that the site, especially the dashboard gets really slow on screens that need to read in categories.

That said, we do end up using Categories for some organizational stuff on the DS106 site, like the assignment categories for a 2012 class or a collection of videos of ds106 greatness.

So I found a way to organize categories we need to keep and all the ones that come in via syndication. I have a parent category for the former, called ###The Site (hash tags so it always comes in at the top of the category list), and all ones we need to keep as site organizers come under it.

DS106 organizing scheme

DS106 organizing scheme

When I was doing this a few years ago, I could not find anything that would help me mass delete categories, so I brute forced it.

  • In the screen options, I set the number of items to view in the category listing to be high, like 100 or 200).
  • I’d go to the last page of categories, click to select all, and delete.
  • Repeat until all categories outside of the ###The Site were gone.

It was tedious, it might take an hour or two, but it did keep the site purring. I’d have to do this once or twice a year.

Well, on going back to looking at the DS106 site in more than maybe 2 years… well no one is dumb enough to do this task. There were something like 16,000 categories in there and the dashboard was slower than (fill in your metaphor).

It was time to try a different approach, especially since the manual mass deletion seemed to generate a lot of PHP execution time out errors (deleting 200 categories is a lot of work in MySQL done one at a time in a loop).

I began by doing some select tests in my own blog’s database. The thing I do know is that both tags and categories share the same database; to the code they are just taxonomies. So the wp_terms table is where we see the names of categories and tags, each has a unique term_id value. I do this work via PHPMyAdmin, a web interface for working with databases that comes with most hosting cpanels.

The wp_terms database table for this blog

The wp_terms database table for this blog

An this works in conjunction with wp_term_taxonomy table, which tells use fo a particular term_id, what kind of taxonomy it is (category or post_tag), if it has a parent assigned to it. as well as how many posts it is connected to:

wp_term_taxonomy table for this blog

wp_term_taxonomy table for this blog

Now I sort of know the queries to do this involve a JOIN where I search both tables, and use some kind of relationship that binds them (in this case its the term_id column).

So my first thought is to do a query where I try a SELECT to find stuff that has the same term_id in both tables, are categories, and also are ones that do not have a parent (parent=0). I try this direct query in PHPMyAdmin:

And I get pretty lucky, it works on the first try

select

I get all my blog categories (22 of them) that do not have a parent. I then try it in DS106, and it works as well.. except that my special category ###The Site is included because… doh… it has no parent.

All I need to do is add another bit to my WHERE to exclude that one category (whose term_id is 11527)

I am close, because now I can SELECT all of the rows that ought to be deleted. But how to delete them? A simple database delete is pretty easy, but how to do it with a JOIN query? Stackoverflow to the rescue!

Here is the winning query, that deleted some 15,000 un-needed categories, and made the site run more smoothly

HIGH FIVE! It did this delete in one swoop.

I should say in an act of caution, I made copies of both tables before I ran the query. Just in case.

(I still might have to find a way to clean the wp_term_relationships table, but thats for after this victory party).

And, as usual t took about twice as long to write this up as to figure it out… but there is a value in this, somewhere in the future when I forget how I did this.

Someday I need to dig more into the MySQL query methods, because all I seem to know are LEFT Joins. I did come across a great guide — Jeff Atwood’s A Visual Explanation of SQL Joins — which I should read closely. One day.

But now, it’s my own sick little high five party. It’s these things that build confidence and intuition that nudges one up the learning cliff.


Top / Featured Image Credit: Creative Commons Licensed Deviant Art Image by Pandark http://pandark.deviantart.com/art/High-Five-350078391

If this kind of stuff has value, please support me by tossing a one time PayPal kibble or monthly on Patreon
Become a patron at Patreon!
Profile Picture for CogDog The Blog
An early 90s builder of web stuff and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person. And he is 100% into the Fediverse (or tells himself so) Tooting as @cogdog@cosocial.ca

Leave a Reply

Your email address will not be published. Required fields are marked *