Migrating to Hugo from WordPress

I’ve migrated this site from WordPress to Hugo. Here are some notes on it.

First, building the site with Hugo and indexing it with Pagefind for search takes less than 30 seconds in total (about 21 seconds in the run below).

site@kreinix:~/techish# time ./build.sh
Start building sites …
hugo v0.129.0-e85be29867d71e09ce48d293ad9d1f715bc09bb9+extended linux/amd64 BuildDate=2024-07-17T13:29:16Z VendorInfo=gohugoio


                   | EN-US
-------------------+--------
  Pages            |  1035
  Paginator pages  |     0
  Non-page files   |     0
  Static files     | 12458
  Processed images |     0
  Aliases          |     1
  Cleaned          |     0

Total in 18692 ms

Running Pagefind v1.1.0
Running from: "/site/techish"
Source:       "/var/www/html"
Output:       "/var/www/html/pagefind"

[Walking source directory]
Found 906 files matching **/*.{html}

[Parsing files]
Did not find a data-pagefind-body element on the site.
↳ Indexing all <body> elements on the site.

[Reading languages]
Discovered 2 languages: en-us, unknown

[Building search indexes]
Total:
  Indexed 1 language
  Indexed 904 pages
  Indexed 18763 words
  Indexed 0 filters
  Indexed 0 sorts

Finished in 2.681 seconds

real    0m21.474s
user    0m7.717s
sys     0m5.473s

Ok, pretty cool.
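The contents of build.sh aren’t shown here, so the following is a hypothetical Python sketch of the two steps the output implies: Hugo renders the site into /var/www/html, then Pagefind indexes that same directory. The paths come from the log above; the exact flags the real script passes are an assumption.

```python
import subprocess
import time

# Hypothetical reconstruction of build.sh (the real script isn't shown).
# SITE_DIR matches Pagefind's "Running from" path in the log above and
# PUBLIC_DIR matches its "Source" path.
SITE_DIR = "/site/techish"
PUBLIC_DIR = "/var/www/html"


def build_commands(public_dir: str = PUBLIC_DIR) -> list[list[str]]:
    """Return the two build steps: render the site, then index it for search."""
    return [
        ["hugo", "--destination", public_dir],
        ["pagefind", "--site", public_dir],
    ]


def main() -> None:
    start = time.monotonic()
    for cmd in build_commands():
        # check=True raises CalledProcessError and stops the build if a step fails.
        subprocess.run(cmd, cwd=SITE_DIR, check=True)
    print(f"Built and indexed in {time.monotonic() - start:.1f} s")
```

Keeping the argument lists in their own function makes them easy to check without invoking either tool; a wrapper would just call main().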

To get here, I did a handful of things:

  • Convert the WordPress MySQL database to an SQLite3 database file (done long ago)
  • Write a Python 3 script to convert the SQLite3 database file into HTML files to be processed as Markdown

The Python 3 script is in a repo on my GitHub.
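For the SQLite-to-files step, here’s a minimal sketch (not the script from the repo). It assumes the SQLite copy kept WordPress’s default wp_posts table and column names, and it writes each published post’s HTML body into a .md file with YAML front matter; export_posts is a hypothetical name.

```python
import sqlite3
from pathlib import Path

# Assumes WordPress's default wp_posts table and column names survived the
# MySQL-to-SQLite conversion.
QUERY = """
SELECT post_name, post_title, post_date, post_content
FROM wp_posts
WHERE post_type = 'post' AND post_status = 'publish'
"""


def export_posts(db_path: str, out_dir: str) -> int:
    """Write each published post to <slug>.md with front matter; return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(QUERY).fetchall()
    finally:
        con.close()
    for slug, title, date, content in rows:
        # Double-quote the title (escaping \ and ") so YAML parses it as a string.
        safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
        front_matter = f'---\ntitle: "{safe_title}"\ndate: {date}\n---\n\n'
        # The body stays as WordPress stored it: HTML inside a Markdown file.
        (out / f"{slug}.md").write_text(front_matter + content, encoding="utf-8")
    return len(rows)
```

One thing to watch with this approach: Hugo’s default Markdown renderer (Goldmark) omits raw HTML unless markup.goldmark.renderer.unsafe is set to true in the site config.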

What’s really cool is that a page request went from about 500KB of data to less than 100KB, now that I’m serving static files and have cleaned out a lot of the bloat that came with the PHP/WordPress combination.

Problems

Being purely lazy right now, I’ll fix the links across all the posts soon. Some things I’ve had to take a hammer to so they’d work right away. For one, my old permalinks had the form /category/post/, and the Python script I wrote didn’t take each post’s category and put the post in a matching category folder. Here’s a really crude hammer approach to get nginx to work around this by redirecting /category/post/ to /post/.

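# Redirect legacy /category/post/ URLs to /post/, skipping Hugo's own
# sections and asset directories, and any path that is a real file.
location ~* ^/(?!posts/|search/|pagefind/|icons/|js/|fonts/|categories/|tags/)[^/]+/[^/]+/?$ {
    if (-f $request_filename) {
        break;
    }
    # 302 redirect that drops the first path segment (the old category).
    rewrite ^/[^/]+/(.+)$ /$1 redirect;
}
# Everything else is served as static files generated by Hugo.
location /posts/ {
    try_files $uri $uri/ =404;
}
location / {
    try_files $uri $uri/ =404;
}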
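Since the links inside the posts still need fixing, a rough Python sketch of that cleanup could reuse the same path rule as the nginx location block: two-segment paths outside the listed Hugo directories are treated as /category/post/ and mapped to /post/. It only handles site-relative targets in inline Markdown links such as [text](/linux/some-post/) (an example path), and the exists callback is a hypothetical hook standing in for nginx’s -f check so real files such as images aren’t rewritten.

```python
import re
from typing import Callable

# Same rule as the nginx location: exactly two path segments, not under one
# of the Hugo sections or asset directories. Case-insensitive like nginx's ~*.
LEGACY_PATH = re.compile(
    r"^/(?!posts/|search/|pagefind/|icons/|js/|fonts/|categories/|tags/)"
    r"[^/]+/([^/]+/?)$",
    re.IGNORECASE,
)

# Inline Markdown link with a site-relative target: [text](/a/b/)
MD_LINK = re.compile(r"\]\((/[^)\s]*)\)")


def legacy_to_new(path: str, exists: Callable[[str], bool] = lambda p: False) -> str:
    """Map /category/post/ to /post/; real files and other paths are unchanged."""
    if exists(path):
        return path
    m = LEGACY_PATH.match(path)
    return "/" + m.group(1) if m else path


def fix_links(markdown: str, exists: Callable[[str], bool] = lambda p: False) -> str:
    """Rewrite legacy targets in inline links, leaving everything else alone."""
    return MD_LINK.sub(
        lambda m: "](" + legacy_to_new(m.group(1), exists) + ")", markdown
    )
```

Absolute URLs (https://…/category/post/) and links with titles aren’t matched by this pattern, so they’d still need a separate pass.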

Shortcodes

I used a plugin on my WordPress blog that took table data from a shortcode and converted it nicely into a <table></table>. It was convenient because I could create tabular data quickly using a structure such as:

columnA columnB
row1A row1B
row2A row2B
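Only the row structure is shown above (not the shortcode wrapper itself), so this sketch handles just that: one row per line, cells separated by whitespace, first row as the header. It turns the rows into a Markdown pipe table, which Hugo’s default Markdown renderer outputs as a <table>. Splitting on whitespace is an assumption and would break cells that contain spaces.

```python
def rows_to_markdown_table(block: str) -> str:
    """Convert whitespace-separated rows (first row is the header) to a pipe table."""
    rows = [line.split() for line in block.strip().splitlines() if line.strip()]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fmt(cells: list[str]) -> str:
        # Pad short rows so they have as many cells as the header.
        cells = cells + [""] * (width - len(cells))
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```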

Beyond that, I’m certain I used other shortcode-based plugins in the past. For now, I need to parse the converted files to fix these. Eventually I need to handle this in the SQL-to-Markdown step to avoid any post-conversion processing.
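Before fixing leftover shortcodes, it helps to know which ones remain. Here is a rough inventory sketch, assuming the converted posts are .md files under one content directory and the shortcodes use WordPress’s [name attr="value"] form; it counts opening tags by name and skips closing [/name] tags and Markdown links. Reference-style links like [foo][bar] can still show up as false positives.

```python
import re
from collections import Counter
from pathlib import Path

# WordPress-style opening shortcode: [name] or [name attr="value"].
# Closing tags ([/name]) don't match because the name must start with a
# letter, and the (?!\() lookahead skips Markdown links like [text](/url/).
SHORTCODE = re.compile(r"\[([A-Za-z][\w-]*)(?:\s[^\]]*)?\](?!\()")


def find_shortcodes(text: str) -> Counter:
    """Count opening shortcode tags in one document by name."""
    return Counter(SHORTCODE.findall(text))


def scan(content_dir: str) -> Counter:
    """Count shortcodes across every .md file under content_dir."""
    total: Counter = Counter()
    for path in Path(content_dir).rglob("*.md"):
        total.update(find_shortcodes(path.read_text(encoding="utf-8")))
    return total
```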

Titles

It seems the wp2hugopy code (GitHub) that extracts posts doesn’t handle HTML entities in post titles, so I’ll need to add a unit test for that case to the wp2hugopy project. For now, I’ll write something to find front matter whose title: value isn’t quoted and quote it, HTML entities and all. That also means I need to slugify the Markdown filenames manually. An update to wp2hugopy should handle this automatically on any fresh extraction in the future.
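The stopgap described above could look something like this Python sketch, assuming YAML front matter with one title: line per file. quote_titles wraps an unquoted title value in double quotes and leaves its HTML entities untouched, and slugify decodes the entities before building an ASCII filename slug. Both helpers are hypothetical and not part of wp2hugopy.

```python
import html
import re
import unicodedata

# An unquoted YAML title: the value's first character is not a quote or space.
TITLE_LINE = re.compile(r"^title:[ \t]*([^\"'\s].*?)[ \t]*$", re.MULTILINE)


def quote_titles(front_matter: str) -> str:
    """Double-quote unquoted title: values, keeping any HTML entities as-is."""
    def _quote(m: re.Match) -> str:
        # Escape backslashes and double quotes for a YAML double-quoted string.
        value = m.group(1).replace("\\", "\\\\").replace('"', '\\"')
        return f'title: "{value}"'
    return TITLE_LINE.sub(_quote, front_matter)


def slugify(title: str) -> str:
    """Lowercase ASCII slug with hyphens; HTML entities are decoded first."""
    text = html.unescape(title)
    # Split accented letters into base letter + accent, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return text.strip("-")
```

For example, slugify("Tom &amp; Jerry&#8217;s Caf&eacute;") returns "tom-jerrys-cafe", which could then be used to rename the Markdown file.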

Published by

Rich

Just another IT guy.
