400 Million Records in MySQL

So I’m trying to figure out a way to make searching a VARCHAR in MySQL fast when there are 400 million rows.

I tried the UNIQUE approach using alter table names add unique(name(15)); in the table but I have some duplicates apparently, so now I’m trying a different method.

I’m going to create multiple tables;  a-z, 0-9.

Based on the input query such as SELECT * FROM name WHERE name='rich' I’ll split that out and re-write the query such as SELECT * FROM r_name WHERE name='rich'.  r_name table is considerably smaller than the entire name table of 400 million rows.  I’ll just say 30 million.

Here’s my bash script to process the main plain text file that I’ll then create LOAD for in MySQL for each split file.

#!/bin/bash
# Split names.txt into a.txt, b.txt, c.txt etc.
for x in {a..z}
do
 echo Scanning $x
 grep -i ^$x names.txt >$x.txt
done
for x in {0..9}
do
 echo Scanning $x
 grep -i ^$x names.txt >$x.txt
done
echo.
echo Done

This is currently processing, so I’ll update when this is finished and see how much more optimized this will be.

Update

Well, crap.  I think I screwed up when I initially parsed out the main names.txt and tried to only grab unique lines.  I should have passed a case insensitive uniq command using uniq -i.

So now I’ll try doing that again and reloading the database.

Update 2

I reloaded the names database from text file.  Doing a create index idx_name on names(name(767)); changed a query execution time of select * from names where name='rich' from 12 minutes to 0.03 seconds.  I’m happy now. And really happy I don’t need to create a bunch of tables and add logic to some PHP/MySQL. Just goes to show I’m green behind the ears when it comes to database technology.

Note: Not quite 400 million after removing duplicates and invalid names from the file 😉

MariaDB [data001]> select max(id) from names;
+-----------+
| max(id)   |
+-----------+
| 321995408 |
+-----------+
1 row in set (0.00 sec)
MariaDB [data001]> select * from names where name='rich';
+----------+-------------+
| id       | name        |
+----------+-------------+
| 86382207 | rich        |
+----------+-------------+
1 row in set (0.00 sec)

Leave a Reply

Your email address will not be published. Required fields are marked *