Extracting unique words from all my blog post titles

I had an idea to extract all the unique words from my blog post titles and rank them by frequency. I used MySQL, sed, tr, grep, cat, and a small hacked-together bash script to do it.

Here are the top 10 words in my blog post titles.

Occurrences  Word
150          Windows
46           Server
32           Cisco
26           Command
25           Microsoft
22           SQL
20           Explorer
19           Linux
18           Internet
18           Error

Here’s how I got to this…

SQL Query

select id,post_title from wp_posts where post_type='post' and post_status='publish'

Bash Script

The script puts each word on its own line and removes any non-alphanumeric characters. Run it and redirect the output to a file:

sh split.sh > single-words.txt

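#!/bin/bash

# Read the exported titles one line at a time; -r keeps backslashes literal
cat post-titles.csv | while read -r line
do
    # $line is left unquoted on purpose so the shell splits it into words
    for word in $line
    do
        # Keep only alphanumeric characters and the trailing newline
        printf '%s\n' "$word" | tr -cd '[:alnum:]\n'
    done
done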

Cleanup and Sorting

Remove empty lines

sed -i '/^$/d' single-words.txt

Prepare stopwords

wget https://gist.githubusercontent.com/sebleier/554280/raw/ -O stopwords.txt

Remove the stopwords from the word list, then count each remaining word and rank the results by frequency.

cat single-words.txt | grep -v -Fix -f stopwords.txt | sort | uniq -c | sort -rn | head -15
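For comparison, here is a rough Python sketch of the same split, clean, filter, and count steps. It is not what I actually ran; the function name rank_words and the sample titles are made up for illustration, and it assumes the titles and stopwords are already loaded into lists.

```python
import re
from collections import Counter


def rank_words(titles, stopwords, top_n=10):
    """Return the top_n (word, count) pairs across all titles."""
    # Match stopwords case-insensitively, like grep -i
    stop = {w.lower() for w in stopwords}
    counts = Counter()
    for title in titles:
        for raw in title.split():
            # Drop every non-alphanumeric character, like tr -cd '[:alnum:]'
            word = re.sub(r"[^A-Za-z0-9]", "", raw)
            # Skip empty results (like the sed step) and stopwords
            if word and word.lower() not in stop:
                counts[word] += 1
    return counts.most_common(top_n)


# Hypothetical sample titles, not taken from the real blog
sample_titles = [
    "Windows Server: reset the password",
    "Cisco and Windows command reference",
    "Windows error in Internet Explorer",
]
print(rank_words(sample_titles, ["the", "and", "in"], top_n=3))
```

Like uniq -c, the counting is case-sensitive, so "Windows" and "windows" would be tallied separately.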

And that’s a wrap.

Login to WordPress from Python

I’ve been trying to learn some Python and have been tinkering with the requests module. Here is how I log in to a site such as WordPress.

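import requests

url = "https://techish.net/wp-login.php"
redirect_to = "https://techish.net/wp-admin/"

# A Session keeps cookies between requests, so the login carries over
with requests.Session() as session:
    # 'log' and 'pwd' are the field names of the WordPress login form
    post = session.post(url, data={
        'log': 'admin',
        'pwd': 'password',
        'redirect_to': redirect_to
        }, allow_redirects=True)
    # The session already holds the login cookies, so none need to be passed
    get = session.get(redirect_to)
    print(get.text)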
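The script prints whatever page comes back, which could just be the login form again if the credentials are wrong. One way to check is to look for the login cookie. This is a minimal sketch: the helper name is_logged_in is made up, and it assumes the site uses WordPress's default cookie naming, where the login cookie starts with wordpress_logged_in.

```python
def is_logged_in(cookie_names):
    """Return True if any cookie name looks like a WordPress login cookie."""
    # Assumption: the site keeps the default "wordpress_logged_in" prefix
    return any(name.startswith("wordpress_logged_in") for name in cookie_names)


# Hypothetical cookie names for illustration
print(is_logged_in(["wordpress_test_cookie"]))
print(is_logged_in(["wordpress_test_cookie", "wordpress_logged_in_0123abcd"]))
```

Inside the with block it could be called as is_logged_in(session.cookies.keys()) after the POST.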

WordPress TwentyTwenty Theme – Inter font Apache2 error

I’m testing out the development version of the TwentyTwenty theme from WordPress on this site.

I noticed that requests for /assets/fonts/inter/Inter-upright.var.woff2 made Apache2 return a 500 error:

AH00681: Syntax error in type map, no ':' in /var/www/clients/client0/web1/web/wp-content/themes/twentytwenty/assets/fonts/inter/Inter-upright.var.woff2 for header wof2

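A cursory search indicates that Apache2 treats any file with .var among its extensions as a type map, because the type-map handler from mod_negotiation is commonly mapped to the .var extension. It then tries to parse the font file as a type map and fails.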

To work around this, I’ve set the following in my .htaccess:

RemoveHandler .var
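To see which other files might trip the same handler, a small Python sketch like the one below can list every file that has .var among its extensions. The function name find_type_map_candidates is made up, and the demo builds a throwaway directory with two made-up font files rather than touching a real theme.

```python
import tempfile
from pathlib import Path


def find_type_map_candidates(root):
    """List files under root that have .var among their extensions."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and ".var" in (s.lower() for s in p.suffixes)
    )


# Demo on a temporary directory with hypothetical file names
with tempfile.TemporaryDirectory() as demo_dir:
    fonts = Path(demo_dir) / "fonts"
    fonts.mkdir()
    (fonts / "Inter-upright.var.woff2").write_bytes(b"wOF2")
    (fonts / "Inter-regular.woff2").write_bytes(b"wOF2")
    for path in find_type_map_candidates(demo_dir):
        print(path.name)
```

Pointing it at the theme directory instead of the temporary one would show which files are affected before adding the RemoveHandler line.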

WP Preserve Backslashes

I created a WordPress plugin after running into a problem on my own site: backslashes were being stripped from my posts.

When a post is saved, it converts each backslash to the HTML entity &#92;, which is what gets stored in the database.

The plugin is available on GitHub at https://github.com/rjkreider/wp-preserve-backslashes
Here’s the function if you want to just drop it in your functions.php file instead of installing it as a plugin.

function wppb_keepbackslash($PostID) {
    $thePost = get_post($PostID);
    // Replace each literal backslash with its HTML entity
    $Content = str_replace('\\', '&#92;', $thePost->post_content);
    // unhook this function so it doesn't loop infinitely
    remove_action( 'save_post', 'wppb_keepbackslash' );
    $UpdatedPost = array (
          'ID'           => $PostID,
          'post_title'   => $thePost->post_title,
          'post_content' => $Content
        );
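    wp_update_post( $UpdatedPost );
    /** Optional debugging, to use in place of the call above:
        passing true as the second argument returns a WP_Error on failure
    $post_id = wp_update_post( $UpdatedPost, true );
    if (is_wp_error($post_id)) {
        $errors = $post_id->get_error_messages();
        foreach ($errors as $error) {
            echo $error;
        }
    }
    **/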
    // re-hook this function
    add_action( 'save_post', 'wppb_keepbackslash' );
}
add_action('save_post', 'wppb_keepbackslash' ); // Update Content when saving content
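
To illustrate why storing the entity works, here is the same substitution as a small Python sketch. It is not part of the plugin; the function name preserve_backslashes and the sample path are made up. The stored text contains no backslashes, and decoding the entity (which a browser does when rendering) gives the original text back.

```python
import html


def preserve_backslashes(content):
    """Replace each backslash with the HTML entity &#92;."""
    return content.replace("\\", "&#92;")


# Hypothetical post content
original = r"C:\Windows\System32"
stored = preserve_backslashes(original)
print(stored)
# Decoding the entity restores the original text
print(html.unescape(stored) == original)
```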