400 Million Records in MySQL

So I’m trying to figure out a way to make searching a VARCHAR in MySQL fast when there are 400 million rows.
I tried the UNIQUE approach using alter table names add unique(name(15)); in the table but I have some duplicates apparently, so now I’m trying a different method.

I’m going to create multiple tables;  a-z, 0-9.

Based on the input query such as SELECT * FROM name WHERE name='rich' I’ll split that out and re-write the query such as SELECT * FROM r_name WHERE name='rich'.  r_name table is considerably smaller than the entire name table of 400 million rows.  I’ll just say 30 million.

Here’s my bash script to process the main plain text file that I’ll then create LOAD for in MySQL for each split file.

#!/bin/bash

# Split names.txt into a.txt, b.txt, c.txt etc.

for x in {a..z}
do
 echo Scanning $x
 grep -i ^$x names.txt >$x.txt
done
for x in {0..9}
do
 echo Scanning $x
 grep -i ^$x names.txt >$x.txt
done

echo.
echo Done

This is currently processing, so I’ll update when this is finished and see how much more optimized this will be.

Update

Well, crap.  I think I screwed up when I initially parsed out the main names.txt and tried to only grab unique lines.  I should have passed a case insensitive uniq command using uniq -i.

So now I’ll try doing that again and reloading the database.

Update 2

I reloaded the names database from text file.  Doing a create index idx_name on names(name(767)); changed a query execution time of select * from names where name='rich' from 12 minutes to 0.03 seconds.  I’m happy now. And really happy I don’t need to create a bunch of tables and add logic to some PHP/MySQL. Just goes to show I’m green behind the ears when it comes to database technology.

Note: Not quite 400 million after removing duplicates and invalid names from the file 😉

MariaDB [data001]> select max(id) from names;
+-----------+
| max(id)   |
+-----------+
| 321995408 |
+-----------+
1 row in set (0.00 sec)
MariaDB [data001]> select * from names where name='rich';
+----------+-------------+
| id       | name        |
+----------+-------------+
| 86382207 | rich        |
+----------+-------------+
1 row in set (0.00 sec)

Windows 10 with Ubuntu: Bash + Conky + Firefox

2016-07-12_165719

Install the Windows Subsystem for Linux (Beta)

2016-07-12_170013

Set Windows 10 Developer Mode

2016-07-12_170058

Install X Server in Windows

I prefer Xming;  get it and install it from here:  http://sourceforge.net/projects/xming/files/latest/download

The defaults should work just fine.

Install stuff in Bash

2016-07-12_174934Open a Bash prompt;  you can hit the Windows Key and start typing bash (without quotes).  It’ll go through some installation stuff the first time you run it… give it a minute or two.

When it’s all done, time to install things…

Install some things in Bash.  I’m just going to install Firefox and Conky.

$ sudo apt-get install firefox conky

My conkyrc

The own_window_transparent yes causes weird issues for me; so I commented that out.

~/.conkyrc

own_window yes
#own_window_transparent yes
own_window_type desktop
own_window_hints undecorated,below,sticky,skip_taskbar,skip_pager
own_window_argb_visual true
own_window_argb_value 0
out_to_console no
use_xft yes
xftfont cure:size=10
update_interval 2
cpu_avg_samples 2
net_avg_samples 2
double_buffer yes
maximum_width 320
draw_shades no
draw_outline no
draw_borders no
stippled_borders 1
border_width 20
default_color white
default_shade_color white
default_outline_color white
alignment top_right
gap_x 15
gap_y 0
use_spacer left
no_buffers yes
uppercase no

TEXT


${color}${alignc}${time %A %B %d %Y | %H:%M:%S}

${alignc}$color Linux $kernel on $machine

${color}${alignc}${color lightgrey}Uptime: ${color}$uptime | ${color lightgrey}Load: $color$loadavg${color lightgrey}

${color lightgrey}${alignc}Battery :$color ${battery} | ${color lightgrey}Time: ${color}$battery_time
${alignc}${color #FFEF00}${battery_bar 8,278}
${color #656565}$stippled_hr$color
${alignc}${color lightgrey}${execi 1000 cat /proc/cpuinfo | grep 'model name' | sed -e 's/model name.*: //'| uniq}

${alignc}${color lightgrey}Total CPU Usage: ${color}${cpu cpu0}%
${alignc}${color #FFEF00}${cpubar cpu0 6,150}$color

${color lightgrey}Core: ${color}1 ${color #FFEF00}${cpubar cpu1 6,270}$color $alignc
${color lightgrey}Core: ${color}2 ${color #FFEF00}${cpubar cpu2 6,270}$color $alignc
${color lightgrey}Core: ${color}3 ${color #FFEF00}${cpubar cpu3 6,270}$color $alignc
${color lightgrey}Core: ${color}4 ${color #FFEF00}${cpubar cpu4 6,270}$color $alignc

${alignc}${color lightgrey}CPU Temperature: ${color}${hwmon 1 temp 1}C
${color #656565}$stippled_hr$color
${alignc}${color lightgrey}Resources

${color lightgrey}Ram ${alignc} ${color}$mem / $memmax ${alignr}${memperc}% Used
${color #FFEF00}${membar 6,318}
${color lightgrey}Swap ${alignc} ${color}${swap} / ${swapmax} ${alignr}${swapperc}% Used
${color #FFEF00}${swapbar 6,318}
${color lightgrey}Disk ${alignc} ${color}${fs_used} / ${fs_size} ${alignr}${fs_used_perc /}% Used
${color #FFEF00}${fs_bar 6,318 /}

${color lightgrey}Disk IO: $color ${diskio /dev/sda} ${alignr}${color lightgrey}Filesystem: ${color}${fs_type}
${color #656565}$stippled_hr$color
${alignc}${color lightgrey}Processes

${color lightgrey} PID Process${alignr}Memory CPU
${color}${top pid 1} ${top name 1}${alignr}${top mem_res 1} ${top cpu 1}%
${color}${top pid 2} ${top name 2}${alignr}${top mem_res 2} ${top cpu 2}%
${color}${top pid 3} ${top name 3}${alignr}${top mem_res 3} ${top cpu 3}%
${color}${top pid 4} ${top name 4}${alignr}${top mem_res 4} ${top cpu 4}%
${color}${top pid 5} ${top name 5}${alignr}${top mem_res 5} ${top cpu 5}%
${color}${top pid 6} ${top name 6}${alignr}${top mem_res 6} ${top cpu 6}%
${color}${top pid 7} ${top name 7}${alignr}${top mem_res 7} ${top cpu 7}%
${color}${top pid 8} ${top name 8}${alignr}${top mem_res 8} ${top cpu 8}%
${color}${top pid 9} ${top name 9}${alignr}${top mem_res 9} ${top cpu 9}%
${color}${top pid 10} ${top name 10}${alignr}${top mem_res 10} ${top cpu 10}%
${color #656565}$stippled_hr$color

E233: cannot open display

Trying to launch Firefox or Conky results in the error:  E233: cannot open display.

This is because we need to set the DISPLAY variable.  I prefer adding to my ~/.bashrc file.

So add the following line to the end of your ~/.bashrc:

export DISPLAY=:0

Save and close the Windows Bash prompt and re-open it.