You gotta celebrate the small things, right?
Self high-fiving for figuring out MySQL query to delete 16,000 un-needed categories from the ds106 site.
— Alan Levine (@cogdog) January 6, 2016
What I will detail may likely benefit no one else beyond me should I ever lose my notes. But, writing about the process is… well what I like to do.
So, for the lesser informed, MySQL is the “world’s most popular open source database” (thought more proficient and hip programmers might sneer at it). It’s behind some large chunk of web sites you see these days, at least 25% of the web because MySQL is what runs under WordPress.
I cannot say I have a vast knowledge of MySQL; I know enough to poke round, a few things to repair/optimize a database, how to write my own queries and code to do it outside of WordPress, or for sometimes in a WordPress site when the only way to make it do what you want is a custom query.
If it were a spoken language, you might say I am fluent enough to order meals, conduct transactions, and find my way around cities, but hardly versatile enough to read books or engage in fast talk conversations.
With technology languages, you chip away at small bits of it until your accent and intuition seem to click.
So here is my situation, which again, I doubt anyone else deals with exactly the same thing. The DS106 site has been around since 2010 and has a lot of “stuff” in it; over 50,000 posts the majority syndicated into it via the Feed WordPress plugin.
My strategy when setting up Feed WordPress sites is to convert all categories AND tags in remote blogs TO WordPress tags on the aggregating site. This way, I can use the regular post categories for the site organization, and all the syndicated stuff can use tag clouds, etc to organize internally.
DS106 was the first Feed WordPress site I dabbled in, and it was already humming as a syndication bus before I got inside of it. So whomever had done the set up, actually did it the other way — for all blog posts syndicated in, all tags and categories are converted on the main site to WordPress Categories (this is why we use tags to separate out different course versions – e.g. for all my students blogs from Spring 2013 http://ds106.us/tag/umwsp13/ vs the Noir 106 one form last year http://ds106.us/tag/noir106/.
Actually the only time the tags/categories on external sites are used, is when DS10 syndicated posts are re-syndicated to the Assignment Bank as responses to assignments. Once it uses external tags (converted on DS106 to categories) like
VideoAssignments1874 they are never used.
And people use a lot of tags and categories on their site, and they all get sucked into DS106 and converted to categories. So much that the taxonomy tables (wp_terms and wp_term_taxonomy) get really bloated, so much that the site, especially the dashboard gets really slow on screens that need to read in categories.
So I found a way to organize categories we need to keep and all the ones that come in via syndication. I have a parent category for the former, called
###The Site (hash tags so it always comes in at the top of the category list), and all ones we need to keep as site organizers come under it.
When I was doing this a few years ago, I could not find anything that would help me mass delete categories, so I brute forced it.
- In the screen options, I set the number of items to view in the category listing to be high, like 100 or 200).
- I’d go to the last page of categories, click to select all, and delete.
- Repeat until all categories outside of the
###The Sitewere gone.
It was tedious, it might take an hour or two, but it did keep the site purring. I’d have to do this once or twice a year.
Well, on going back to looking at the DS106 site in more than maybe 2 years… well no one is dumb enough to do this task. There were something like 16,000 categories in there and the dashboard was slower than (fill in your metaphor).
It was time to try a different approach, especially since the manual mass deletion seemed to generate a lot of PHP execution time out errors (deleting 200 categories is a lot of work in MySQL done one at a time in a loop).
I began by doing some select tests in my own blog’s database. The thing I do know is that both tags and categories share the same database; to the code they are just taxonomies. So the
wp_terms table is where we see the names of categories and tags, each has a unique
term_id value. I do this work via PHPMyAdmin, a web interface for working with databases that comes with most hosting cpanels.
An this works in conjunction with
wp_term_taxonomy table, which tells use fo a particular
term_id, what kind of taxonomy it is (category or post_tag), if it has a parent assigned to it. as well as how many posts it is connected to:
Now I sort of know the queries to do this involve a JOIN where I search both tables, and use some kind of relationship that binds them (in this case its the term_id column).
So my first thought is to do a query where I try a SELECT to find stuff that has the same term_id in both tables, are categories, and also are ones that do not have a parent (parent=0). I try this direct query in PHPMyAdmin:
SELECT * FROM wp_terms AS t LEFT JOIN wp_term_taxonomy AS tax ON tax.term_id = t.term_id WHERE tax.taxonomy = 'category' AND tax.parent = 0
And I get pretty lucky, it works on the first try
I get all my blog categories (22 of them) that do not have a parent. I then try it in DS106, and it works as well.. except that my special category
###The Site is included because… doh… it has no parent.
All I need to do is add another bit to my WHERE to exclude that one category (whose term_id is 11527)
SELECT * FROM wp_terms AS t LEFT JOIN wp_term_taxonomy AS tax ON tax.term_id = t.term_id WHERE tax.taxonomy = 'category' AND tax.parent = 0 AND t.term_id != 11527
I am close, because now I can SELECT all of the rows that ought to be deleted. But how to delete them? A simple database delete is pretty easy, but how to do it with a JOIN query? Stackoverflow to the rescue!
Here is the winning query, that deleted some 15,000 un-needed categories, and made the site run more smoothly
DELETE t,tax FROM wp_terms AS t LEFT JOIN wp_term_taxonomy AS tax ON tax.term_id = t.term_id WHERE tax.taxonomy = 'category' AND tax.parent = 0 AND t.term_id != 11527
HIGH FIVE! It did this delete in one swoop.
I should say in an act of caution, I made copies of both tables before I ran the query. Just in case.
(I still might have to find a way to clean the
wp_term_relationships table, but thats for after this victory party).
And, as usual t took about twice as long to write this up as to figure it out… but there is a value in this, somewhere in the future when I forget how I did this.
Someday I need to dig more into the MySQL query methods, because all I seem to know are LEFT Joins. I did come across a great guide — Jeff Atwood’s A Visual Explanation of SQL Joins — which I should read closely. One day.
But now, it’s my own sick little high five party. It’s these things that build confidence and intuition that nudges one up the learning cliff.
Top / Featured Image Credit: Creative Commons Licensed Deviant Art Image by Pandark http://pandark.deviantart.com/art/High-Five-350078391
The post "Brief Moment of MySQL Self High Fiving" was originally zapped with 10,000 volts and declared "It's ALIVE" by Dr. Frankenstein at CogDogBlog (http://cogdogblog.com/2016/01/mysql-self-high-fiving/) on January 6, 2016.