All my backslashes are gone in WordPress. Yikes.

Discovered that my most recent conversion from SQLite to MySQL seems to have stripped the backslashes from every post that contained them.

This is bad because my code snippets can no longer be copied & pasted and run at face value; verify the code first! It could seriously break shit.

Ugh. This is going to be a PITA to go and fix 500 posts. It might be quicker to try to fix the SQLite DB file and attempt another conversion. This isn't the first time I've noticed this problem: I see the issue when I restore from XML files as well, and even when just copying a database using something like mysqldump to dump and then importing with the mysql command. I'm probably just missing a simple flag to not strip slashes or something.

My next step is to confirm whether the backslash is actually in the SQL data and just being stripped by the_content() or something, or whether the backslash is REALLY not there. *sad face*
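A quick way to check on the MySQL side might be something like this (just a rough sketch, assuming the standard wp_posts table with the wp_ prefix; CHAR(92) is a literal backslash, which sidesteps LIKE-escaping headaches):

-- Count posts whose content still contains a literal backslash (MySQL)
SELECT COUNT(*) FROM wp_posts WHERE INSTR(post_content, CHAR(92)) > 0;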

Update 1: the slashes are not in the SQL data. Looks like I need to look at my export DB to see if they are in that. *crosses fingers*

Update 2: found this article, which adds a function to convert backslashes into HTML entities as posts are saved. https://www.tweaking4all.com/web-development/wordpress/preserve-backslash-in-posts/#comment-268277

WordPress SQLite to MySQL Migration Complete

Just finished migrating my website from SQLite to MySQL. What a rush. (lol)

It was actually not as bad as I thought. A lot of sed, grep and other sorcery was involved, especially in transforming the SQLite statements into MySQL-compatible ones.

Some quick commands I used:

sqlite3 techish.db .dump > production_2018-08-23.dump.sql

I found that the dump used double quotes around table and column names, so I had to remove those first and foremost.

sed -i '/INSERT INTO/,/VALUES (/s/"//g' production_2018-08-23.dump.sql

Next I found that importing with mysql -ufoo -p mynewdatabase < production_2018-08-23.dump.sql threw errors because the table creation statements were still failing. So I did a quick fresh install of vanilla WordPress, dumped its database, and just grabbed the table creation parts out:

Dump fresh database:

mysqldump -ufoo -p wordpress > wordpress.sql

Next, I just want the table creation statements…

awk '/CREATE TABLE/, /) ENGINE/' wordpress.sql > create_tables.sql

Next, run create_tables.sql on my new database and then import data.

mysql -ufoo -p mynewdatabase < create_tables.sql

Sweet, that worked and I have a baseline of tables now.

Now importing the data…

Development Log: Duplicate File Finder

I have thousands of files stored on an external USB-attached 1TB drive. My drive is currently 95% full. I know I have duplicate files throughout the drive because over time I have been lazy and made backups of backups (or copies of copies) of images and other documents.
Time to clean house.
I've searched online for a tool that does the following things, relatively easily and with a decently designed user interface:

  • Find duplicates based on hash (SHA-256)
  • List duplicates at end of scan
  • Give me an option to delete duplicates, or move them somewhere
  • Be somewhat fast

Every tool I’ve used fell short somewhere.  So I decided to write my own application to do what I want.
What will my application do?
Hash each file recursively given a starting path and store the following information into an SQLite database for reporting and/or cleanup purposes.

  • SHA-256 Hash
  • File full path
  • File name
  • File extension
  • File mimetype
  • File size
  • File last modified time

With this information, I could run a report such as the following pseudo report:
Show me a list of all duplicate files with an extension of JPG over a file size of 1MB modified in the past 180 days.

That’s just a simple query, something like:

SELECT fileHash, fileName, filePath, fileSize, COUNT(fileHash) FROM indexed_files WHERE fileExtension='JPG' AND fileSize > 1048576 GROUP BY fileHash HAVING COUNT(fileHash) > 1

My application can show me a list of these and let me decide whether to move or delete the duplicates after the query runs.

One problem comes to mind in automating the removal or moving of duplicates… What if there is more than one duplicate of a file; how do I handle that?

So on to the bits and pieces…

The hashing function is pretty straightforward in VB.NET (did I mention I was writing this in .NET?).

Imports System.IO
Imports System.Security
Imports System.Security.Cryptography

Function hashFile(ByVal fileName As String) As String
  ' Compute the SHA-256 hash of a file and return it as a lowercase hex string.
  Dim hash As SHA256 = SHA256.Create()
  Dim hashValue() As Byte
  Dim fileStream As FileStream = File.OpenRead(fileName)
  fileStream.Position = 0
  hashValue = hash.ComputeHash(fileStream)
  Dim hashHex As String = PrintByteArray(hashValue)
  fileStream.Close()
  Return hashHex
End Function

Public Function PrintByteArray(ByVal array() As Byte) As String
  ' Convert a byte array into a lowercase hex string.
  Dim hexValue As String = ""
  Dim i As Integer
  For i = 0 To array.Length - 1
    hexValue += array(i).ToString("X2")
  Next i
  Return hexValue.ToLower()
End Function
Dim path As String = "Z:\"
' Recursion sketch: walk every file under the starting path and gather the fields to store in the database.
For Each filePath As String In Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories)
  Dim info As New FileInfo(filePath)
  Dim fHash = hashFile(filePath)     ' The SHA-256 hash of the file
  Dim fPath = info.FullName          ' The full path to the file
  Dim fName = info.Name              ' The filename
  Dim fExt = info.Extension          ' The file's extension
  Dim fSize = info.Length            ' The file's size in bytes
  Dim fLastMod = info.LastWriteTime  ' The timestamp the file was last modified
  Dim fMimeType = Nothing            ' The mimetype of the file (still needs a lookup, e.g. by extension)
  ' Insert these values into the SQLite database here.
Next

Ok cool, so I have a somewhat workable code idea here. I'm not sure how long this is going to take to process, so I want to sample a few hundred files first. I'm also thinking about some options I could pass to my application, such as only hashing specific extensions or specific file names like *IMG_*, or even being able to exclude something.
But first… a proof of concept.

Update: 11/28/2016

Spent some time working on the application.  Here’s a GUI rendition;  not much since it is being used as a testing application.

I have also implemented some code for SQLite use to store this to a database.  Here’s a screenshot of the database.

Continuing on with some brainstorming, I've been thinking about how to handle multiple duplicates.
I think what I want to do is:

  • Add new table “duplicates”
  • Link “duplicates” to “files” table by “id” based on duplicate hashes
  • Store all duplicates found in this table for later management (deleting, archiving, etc.)
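As a rough sketch of that idea (names here are just guesses for now; I'm assuming the main table ends up being called file with id and hash columns, like the schema further down):

-- Possible duplicates table (sketch only)
CREATE TABLE duplicates (
  id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  file_id INTEGER NOT NULL, -- id of the duplicate row in the file table
  hash TEXT NULL            -- the duplicated hash, kept for convenience
);

-- Populate it with every row whose hash appears more than once
-- (this still includes the "original" copy; filtering that out comes later).
INSERT INTO duplicates (file_id, hash)
SELECT id, hash FROM file
WHERE hash IN (SELECT hash FROM file GROUP BY hash HAVING COUNT(*) > 1);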

After testing some SQL queries and using some test data, I came up with this query:

SELECT * FROM file
WHERE hash IN ( SELECT hash FROM file GROUP BY hash HAVING COUNT(*) > 1 )

This gives me the correct results as illustrated in the screenshot below.

So, being able to pick out the duplicate files and display them via a query, I can then use the lowest "id" (or even the earliest last-modified date) as the original and move the rest to a table to be removed or archived.
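A sketch of that selection, keeping the lowest id per hash as the original and returning only the rows that would count as duplicates (just one way to do it):

SELECT * FROM file
WHERE hash IN (SELECT hash FROM file GROUP BY hash HAVING COUNT(*) > 1)
  AND id NOT IN (SELECT MIN(id) FROM file GROUP BY hash);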
Running my first test on a local NAS with thousands of files. It's been running about 3 hours and the database file is at 1.44MB.

Update 12/1/2016

I've worked on the application off and on over the past few days, trying to optimize the file recursion method. I ended up implementing a faster method than the one above, and I wrote about it here.

Here's a piece of the code within the recursion function. I'm running the first test on my user directory, C:\Users\rkreider. The recursive count took about 1.5 seconds to count all the files (27k). I will need to add logic because the file count doesn't actually attempt to open each file and create a hash like my hash function does; so 27k files may actually end up only being 22k or whatever.

Just a file count of C:\users\rkreider (SSD) took about 1.5 seconds for 26k files.

File count of my user directory (SSD disk), no file hashing or other processing done.


Hashing Test Run 1
On this pass, I decided to run the hash on the files.  It took considerably longer, just under 5 minutes.

File hashing recursively of my user directory (SSD).


Something important to note: not all 26,683 of the originally scanned files were actually hashed, for various reasons such as access permissions, files already opened by something else, etc.
For comparison, the database (SQLite) created 26,505 records and is 5.4MB in size.
Hashing Test Run 2
I moved the file counter further into the hash loop so it only increments when a file is successfully hashed. Here are my results now.

Recursive hash of my user directory (SSD) with a found/processed indicator now.


As you can see, it found 26,684 files and could only process (hash) 26,510.

Comparing the result in GUI to the database with SELECT COUNT(*) FROM file, it matches properly.  The database size remains about the same, 5.39MB.

One thing that I’m trying to decide is whether or not to put some type of progress identifier on the interface.
The thing is, this adds overhead because I have to get a count of the files first, and that will take x seconds. In the case of the NAS scan, the count alone took 500+ seconds (over 8 minutes). So I'd be waiting that long JUST for a count, and only then would the file hashing, which takes its own time, even start. I just don't know if it is worth it, but it sure would be nice, I think.

Database Schema

CREATE TABLE [file] (
  [id] INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
  [hash] TEXT NULL,
  [fullname] TEXT NULL,
  [shortname] TEXT NULL,
  [extension] TEXT NULL,
  [mimetype] TEXT NULL,
  [size] INTEGER NULL,
  [modified] TIMESTAMP NULL
);

MAC Address Lookup

Spent a little time at lunch today creating a MAC lookup tool for my site. There is now a new menu, Tools, which will have some of the online tools I set up over time.

I use a Perl script to parse the IEEE oui.txt file and dump it into an SQLite3 database. From there, I wrote some PHP to query that database.

You can visit https://techish.net/mac/ and start searching.

The following are all valid formats for supplying a MAC. You can supply the whole MAC if you want; I try to be smart about the filtering.

  • https://techish.net/mac/00-00-00
  • https://techish.net/mac/00:00:00
  • https://techish.net/mac/0000.00
  • https://techish.net/mac/000000
  • https://techish.net/mac/000000afsdf3efds8afasd0f
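Under the hood, the lookup basically boils down to stripping the separators and matching the first six hex digits against the parsed OUI data. Roughly something like this (just a sketch; the actual table and column names in my database may differ):

-- Normalize the input, then match the OUI prefix (oui(prefix, vendor) is an assumed layout)
SELECT prefix, vendor
FROM oui
WHERE prefix = UPPER(SUBSTR(REPLACE(REPLACE(REPLACE('00:00:00:11:22:33', ':', ''), '-', ''), '.', ''), 1, 6));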

If you find any bugs, or want any features, drop me a line.

Performance Tuning OwnCloud 6.02

I am using OwnCloud on my Linux server for some personal file storage and for synchronizing contacts. The web interface is horribly slow with a default install. Here are some of the things I did to adjust performance and make it a bit faster.
PHP Specific

  • Increased memory_limit to 512MB
  • Installed php-apc

OwnCloud Specific

  • Installed using MySQL instead of SQLite3
  • Disabled addons that I did not need
  • Changed from AJAX Cron to Cron

Linux Server Specific

  • Nothing

I still see request times on scan.php of 1 second+; however, overall performance is much improved.
My System Setup

  • OwnCloud 6.02
  • MySQL 5.5.35
  • PHP FastCGI
  • Debian Linux 7.4
  • Memory – 4GB
  • CPU – 2x2GHz