My motivation for adding features to stuff I code is increased by all of the following: praise, a coin in the tip jar, and a feature that also fits my own needs.

I’ve got a little WordPress plugin (my first ever and one of my favorite found feature images) I wrote a while ago to export data on posts that could be useful in data sifting.

I was driven to do this by thinking about the amount of information that sits in huge collections of posts, especially in a FeedWordPress syndication hub.

Think about it. DS106 has been syndicating participant posts since 2011 over many iterations of its incarnations. It is currently pulling in posts from 1400+ blogs and in that database are almost 80,000 syndicated posts.

You’d think some researcher might nibble on that.

Well, since I have done a number of hubs, I made Export Posts to CSV, which exports information on posts in your favorite grid-based data structure. It returns:

  • post ID
  • source (either ‘local’ or ‘syndicated’)
  • post title
  • publication date and time
  • author name (first and last name from profile)
  • author username
  • blogname (host blog or remote if syndicated post)
  • post character count (string character count after HTML stripped out)
  • post word count (after HTML stripped out)
  • number of links in post (count of </a> tags, is that lame?)
  • list of hyperlink urls (from all <a href="http[s]://*****.****.***">...</a> tags)
  • number of tags
  • list of tags
  • number of comments (if it is a locally published post, syndicated ones are messy)
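
To make the counting items in that list concrete, here is a minimal sketch of how the character count, word count, link count, and URL list could be pulled out of a post's HTML. It is written in Python purely for illustration; it is not the plugin's PHP code, and the function name `post_metrics` and the regular expressions are my own assumptions about one simple way to do it.

```python
import re
from html import unescape

# Hypothetical helpers for illustration only; not taken from the plugin.
TAG = re.compile(r"<[^>]+>")                      # any HTML tag
LINK_CLOSE = re.compile(r"</a>", re.IGNORECASE)   # closing anchor tags
HREF = re.compile(
    r'<a\s[^>]*?href=["\'](https?://[^"\']+)["\']', re.IGNORECASE
)  # http(s) URLs inside <a href="..."> tags


def post_metrics(html):
    """Return simple text and link statistics for one post's HTML content."""
    # Strip tags, then decode entities such as &amp;.
    # Note: tags are removed without inserting spaces, so words in adjacent
    # block elements with no whitespace between them will run together.
    text = unescape(TAG.sub("", html))
    return {
        "char_count": len(text),
        "word_count": len(text.split()),
        "link_count": len(LINK_CLOSE.findall(html)),  # the "lame" </a> count
        "link_urls": HREF.findall(html),
    }


sample = '<p>Hello <a href="https://example.com/a">big</a> world</p>'
print(post_metrics(sample))
```

Counting `</a>` tags is crude but cheap. A real HTML parser would be sturdier against odd markup, at the cost of more code.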

I recently used it to provide some data to the eCampus Ontario folks on activity in the Ontario Extend syndication hub. The feature that made it useful was a way to limit the export to categories, letting me extract data just for syndicated posts (another benefit of setting up a syndication hub to use categories smartly).

Out of the blue (or green) I got a comment entered as an issue in GitHub.

Hey, First off – this is the best plugin ever created.

Secondly, would it be hard for me to add a date range to the export?
What I’m trying to do is to run a report of the articles published every two weeks for that period.

“Best plugin ever created” fits one of my criteria.

It seemed like a reasonable idea and one I can see using, so that was enough motivation. It took a bit of code knocking with my PHP hammer to get the right date query in place. So there are now more options; you can restrict exports not only by category, but also to posts published after a selected date, before one, or both to get a slice.
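
The slicing logic itself is simple enough to sketch. What follows is a Python illustration of the after / before / both behavior, not the plugin's actual PHP date query. The function names, the post dictionaries, and the choice to treat both bounds as inclusive are all assumptions made for the example.

```python
from datetime import date


def in_date_slice(published, after=None, before=None):
    """True if `published` falls within the optional bounds.

    Assumption: both bounds are inclusive. Either bound may be None,
    which leaves that side of the range open.
    """
    if after is not None and published < after:
        return False
    if before is not None and published > before:
        return False
    return True


def filter_posts(posts, category=None, after=None, before=None):
    """Keep posts matching the optional category and date bounds."""
    return [
        p for p in posts
        if (category is None or category in p["categories"])
        and in_date_slice(p["date"], after, before)
    ]


# Hypothetical sample data, invented for this example.
posts = [
    {"id": 1, "date": date(2020, 5, 20), "categories": ["WordPress"]},
    {"id": 2, "date": date(2020, 6, 10), "categories": ["WordPress"]},
    {"id": 3, "date": date(2020, 6, 15), "categories": ["Photography"]},
    {"id": 4, "date": date(2020, 7, 9), "categories": ["WordPress"]},
]

june = filter_posts(
    posts, category="WordPress",
    after=date(2020, 6, 1), before=date(2020, 7, 1),
)
print([p["id"] for p in june])
```

Leaving `after` or `before` as `None` gives the open-ended versions: everything since a date, or everything up to one.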

That’s me testing it on this here blog, finding all posts categorized as WordPress and published between June 1 and July 1 of this year. Or in screenshot mode:

There it is, free for anyone to use.

I’d love input on other data I should try to extract (data science is not my bag of tricks). In the process I am grabbing the full content, so maybe there are more things that might be done with that text.


Featured Image: Pixabay image by Alexas_Fotos shared into the public domain using Creative Commons CC0

An early 90s builder of the web and blogging Alan Levine barks at CogDogBlog.com on web storytelling (#ds106 #4life), photography, bending WordPress, and serendipity in the infinite internet river. He thinks it's weird to write about himself in the third person.
