How to clean up the WordPress wp_postmeta database table

Each table in the WordPress database has its own little eccentricities that clog up websites, and as part of my MySQL work in the last few weeks, I’ve tried to break down each one’s plumbing problems. The wp_postmeta table (you may have a different prefix in your database) is a particularly tricky case: it doesn’t have obvious rows to clear out, like wp_posts (revisions) or wp_options (transients). But under the right (wrong) circumstances, it can still get fatter than a 40-pound cat.

The Problem with wp_postmeta

The two dangers of the wp_postmeta table are:

  • Import leftovers from another CMS like Blogger, Movable Type, etc. I don’t know if import technology has improved since, but on imports from several years ago, these services transferred everything correctly to wp_posts—and then dumped everything, or a lot of things, as a duplicate into wp_postmeta, sometimes multiple times. I just removed 137 megabytes of this from a client’s site and I’m not done digging. If you have ever transferred your site, this is a clean-up you should do immediately.
  • Leftovers from deleted or deactivated plugins that didn’t clean up after themselves. This is pretty common. WP-Optimize and other plugins claim they can catch these, but in my experience, clean-up plugins aren’t smart enough to do this properly. No disrespect—it’s a complicated task that needs human attention. But we can make it easier.

Good database habits: prevention first

Every WordPress site should do this sort of scan at least once, but both of these are problems facing older sites. Newer plugins should do a better job of cleaning up after themselves. However, the default of many plugins is to leave everything when you delete the plugin unless you check a settings box confirming you want it to take all its data with it. (WordFence is one of these.) So make sure you go through every settings screen to take care of this.

If you have the time and expertise, I would even test every plugin you want use on a sample site to see if it’s leave-no-trace or not. Or look at the code to check.

It’s going to be a lot faster to do this upfront than spend hours a few years from now trying to figure out if aiosp_description is something you’re still using on 3,000 posts.

What’s in the wp_postmeta table?

wp_postmeta is a standard WordPress meta table, so it comes with a unique ID for the row, the ID of the post the row is attached to, and meta_key and meta_value pairs that actually add the metadata to posts, pages, and attachments.

meta_key rows come with a leading underscore (for secret/default settings) and no underscore (for user-accessible settings in the admin dashboard), so we need to search for both.

WordPress and plugins might add meta_key rows with multiple different values attached to the prefix. i.e. the All-In-One SEO plugin adds (or used to add) individual metadata key-value pairs like this:

  • aiosp_description
  • aiosp_edit
  • aiosp_keywords
  • aiosp_title

(Yoast used to do this and now adds its own table to mirror the wp_posts table instead).

So our first step is to isolate the prefixes and then we can do individual searches to further break them down. We want to know how many rows they occupy and how much hard drive space, too.

We’ll be doing this from the command line after logging into mysql, or you can also perform these operations from your database with phpMyAdmin.

We’ll do this with a pair of queries—if there’s a more efficient way, email me!

The first query uses a SUBSTRING_INDEX with an underscore delimiter and a position of 1 to return the string of the meta_key value that appears before the first underscore. In other words, it will return all prefixes with no leading underscore. But it also returns all rows with a leading underscore as a blank, so we can filter those out with the WHERE clause.

The second query returns the prefix from only rows with a leading underscore, ensured by another WHERE clause.

Let’s build the first query:

SELECT SUBSTRING_INDEX(meta_key, '_', 1) AS `Meta`
     FROM wp_postmeta
     WHERE meta_key NOT LIKE '\_%';

We probably want to do this with a row count and in order.

SELECT SUBSTRING_INDEX(meta_key, '_', 1) AS `Meta`, COUNT(*) AS `Count`
     FROM wp_postmeta
     WHERE meta_key NOT LIKE '\_%'
     GROUP BY `Meta`
     ORDER BY `Count` DESC;

And let’s add the data size of each group of prefixes, too. We’ll get this by summing the length of each column in the table and dividing by 1048567 (1024 x 1024) to get megabyte values from bytes.

SELECT SUBSTRING_INDEX(meta_key, '_', 1) AS `Meta`,
 (SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Size` DESC;

And here’s the second query for prefixes without leading underscores.

-- list and count for prefixes with a leading underscore
SELECT SUBSTRING_INDEX(meta_key, '_', 2) AS `Meta`, '' COUNT(*) AS `Count` 
    FROM wp_postmeta
    WHERE meta_key LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Count`;
-- list, data size, and count
SELECT SUBSTRING_INDEX(meta_key, '_', 2) AS `Meta`, 
(SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta
    WHERE meta_key LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Size` DESC;

You might want to add a LIMIT 5 or so just to get the main prefixes.

Investigating the meta keys

Now we know what’s taking up space. It’s up to you to figure out if this is metadata you’re actually using or if it is plugin or import leftovers. This is the part that plugins can struggle to figure out programmatically.

This is also a good point to back up your database or at least the wp_postmeta table in case you delete something it turns out you needed.

How to investigate:

  • Check the prefixes against your current plugins to find obvious orphans
  • Do deeper searches on prefixes with many rows to get the full list of keys
  • Look for outdated meta keys with a date search

I am no longer using the commenting plugin Disqus on Rawkblog, for instance, so anything here with “disqus” in it is going to be trash. But not everything is going to be this obvious, which is where checking by date, to see which meta keys haven’t been used in a while, can give us some quick clues.

Go over to wp_posts to check your latest post ID. On Rawkblog, I’m up to 15896. If I have a prefix that hasn’t been used since post 4330—and I do, “aktt_notify_twitter”—then it could well be junk. I know for sure this one is, but it could also just be a field that is infrequently used and is still adding info to the post—when in doubt, check the actual posts before deleting anything.

I’m going to walk through ways I tried to do this and get to the best solution at the end.

You can check prefixes one by one to see when they were most recently used. Use your own prefix:

SELECT post_id, meta_key 
    FROM wp_postmeta 
    WHERE meta_key LIKE 'aktt\_%'
    ORDER BY post_id DESC
    LIMIT 1;

This will give you the most recent post_id for the meta key. If it’s 500 posts ago, you may have junk. If you are only checking two or three, this is easy enough. It’s fastest to do these one at a time.

The better way: we can do this even more easily with the SQL MAX function to grab the highest ID number that pairs with the prefix. SQL is amazing.

SELECT MAX(post_id) AS `post_id`, SUBSTRING_INDEX(meta_key, '_', 1) AS `Meta`
FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `post_id` DESC;

Looking at Rawkblog, I have a ton of meta data that stops at post No. 15,158.

Looking this up… this post is a NextGen slideshow. I’ve deleted this plugin and am no longer using it. But all the slideshow junk is still in my wp_postmeta! Wow.

See, this is why you look.

Let’s add data and a row count to this to see how much space this is all taking up:

SELECT MAX(post_id) AS `post_id`, SUBSTRING_INDEX(meta_key, '_', 1) AS `Meta`, (SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `post_id` DESC;

SELECT MAX(post_id) AS `post_id`, SUBSTRING_INDEX(meta_key, '_', 2) AS `Meta`,
 (SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta
    WHERE meta_key LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Size` DESC;

This is really good. But here’s the best way, which adds a JOIN so we can get the dates and not have to look at the other table. Don’t forget to add MAX to the date.

SELECT MAX(t1.post_id) AS `post_id`, MAX(t2.post_date) AS `Date`, SUBSTRING_INDEX(t1.meta_key, '_', 1) AS `Meta`, (SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta AS t1
    JOIN wp_posts AS t2 
        ON t1.post_id = t2.ID
    WHERE meta_key NOT LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Date` DESC;

The main offender here for Rawkblog: 2,760 rows for a Disqus plugin I’m no longer using.

Here’s the second query:

SELECT MAX(t1.post_id) AS `post_id`, MAX(t2.post_date) AS `Date`, SUBSTRING_INDEX(t1.meta_key, '_', 2) AS `Meta`, (SUM(LENGTH(meta_id)+LENGTH(post_id)+LENGTH(meta_key)+LENGTH(meta_value)))/1048567 AS `Size`, COUNT(*) AS `Count`
    FROM wp_postmeta AS t1
    JOIN wp_posts AS t2 
        ON t1.post_id = t2.ID
    WHERE meta_key LIKE '\_%'
    GROUP BY `Meta`
    ORDER BY `Date` DESC;

Once you have your list of meta keys that haven’t been used since, say, 2011, check some of those old IDs (or just login to WordPress and look at your posts) and see if this is junk or not. Looking at the meta of a post page should make this pretty clear.

Cleaning the clutter

Here’s the fun part:

DELETE FROM wp_postmeta 
    WHERE meta_key LIKE 'disqus%';

Insert your own meta key.

Delete by the individual meta key or by prefix—do a search first to make sure you don’t delete anything accidentally. Always do a SELECT double-check before a DELETE. If you’ve never done an SQL DELETE, don’t—hire a professional. (I am a professional!)

As always, OPTIMIZE your tables after any deletions to finish the clean-up process.

Wrap-up

In this article, we’ve covered how the wp_postmeta database table gets clogged and a range of ways to explore it to find out how and delete the leftovers for a faster, cleaner WordPress site.

Learn more about WordPress optimization for other tables on my database deep-cleaning GitHub project, and let me know if you need a hand with your site.