MySQL-INDEX - how long to create an index?

Can anyone tell me how adding a key scales in MySQL? I have 500,000,000 rows in a table, trans, with columns i (INT UNSIGNED), j (INT UNSIGNED), nu (DOUBLE), A (DOUBLE). I try to index a column, e.g.
ALTER TABLE trans ADD KEY idx_A (A);
and I wait. For a table of 14,000,000 rows it took about 2 minutes to execute on my MacBook Pro, but for the whole half a billion it has been running for 15 hours and counting. Am I doing something wrong, or am I just being naive about how indexing a table scales with the number of rows?

asked Sep 22, 2013 in MYSQL by rajesh
0 votes

2 Answers

0 votes

There are a couple of factors to consider:

  • Sorting is an O(N log N) operation.
  • The sort for 14M rows might well fit in main memory; the sort for 500M rows probably doesn't, so it spills to disk, which slows things down enormously.

The big table has about 36 times as many rows, so with N log N scaling the nominal sort time for the big data set would be of the order of 40 to 50 times as long, i.e. under two hours. However, you need 8 bytes per data value and about another 8 bytes of overhead (that's a guess; adjust it if you know more about what MySQL stores in an index). So, 14M × 16 bytes ≈ 220 MB of main memory, but 500M × 16 bytes ≈ 8 GB of main memory. Unless your machine has that much memory to spare (and MySQL is configured to use it), the big sort is spilling to disk, and that accounts for a lot of the rest of the time.
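
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes pure N log N scaling from the 2-minute measurement in the question and the guessed 16 bytes per index entry; the variable names are just for illustration, and none of this models what MySQL actually does internally.

```python
import math

small_rows = 14_000_000       # table size that took about 2 minutes
large_rows = 500_000_000      # full table size from the question
small_minutes = 2.0           # measured time reported in the question
bytes_per_entry = 16          # 8-byte DOUBLE + ~8 bytes overhead (a guess)


def nlogn(n):
    """Relative cost of a comparison sort over n items."""
    return n * math.log2(n)


# How many times more rows, and how many times more sort work.
row_ratio = large_rows / small_rows
time_ratio = nlogn(large_rows) / nlogn(small_rows)

# Nominal in-memory sort time if nothing spilled to disk.
estimated_minutes = small_minutes * time_ratio

# Approximate memory needed to hold all index entries at once.
small_mb = small_rows * bytes_per_entry / 1e6
large_gb = large_rows * bytes_per_entry / 1e9

print(f"row ratio:        {row_ratio:.1f}x")
print(f"N log N ratio:    {time_ratio:.1f}x")
print(f"nominal estimate: {estimated_minutes:.0f} minutes")
print(f"sort memory:      {small_mb:.0f} MB vs {large_gb:.0f} GB")
```

Under these assumptions the nominal estimate comes out at well under two hours, so anything far beyond that points to the sort not fitting in memory rather than to the N log N growth itself.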

answered Sep 22, 2013 by rajesh
0 votes
Firstly, your table definition could make a big difference here. If you don't need NULL values in your columns, define them NOT NULL. This will save space in the index, and presumably time while creating it.
As for the time taken to create the index, this requires a full table scan and will show up in the process list as REPAIR BY SORTING. In your case (i.e. a massive data set) it should be quicker to create a new table with the required indexes and insert the data into it, as this avoids the REPAIR BY SORTING operation: the indexes are built incrementally as rows are inserted. There is a similar concept explained in this article.
CREATE DATABASE trans_clone;
CREATE TABLE trans_clone.trans LIKE originalDB.trans;
ALTER TABLE trans_clone.trans ADD KEY idx_A (A);
Then script the insert in chunks (as per the article), or dump the data using mysqldump and load it into the new database:
mysqldump originalDB trans --extended-insert --skip-add-drop-table --no-create-db --no-create-info > originalDB.trans.sql
mysql trans_clone < originalDB.trans.sql
This will insert the data, but will not require an index rebuild (the index is built as each row is inserted) and should complete much faster.
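
If you go the chunked route instead of mysqldump, one possible way to script it is to generate one INSERT ... SELECT per range of rows. The sketch below is only an illustration, not the method from the article: the helper name chunk_statements is hypothetical, and it assumes that column i is a reasonable key to split on (ideally indexed, otherwise every chunk scans the whole source table) and that you know its maximum value.

```python
def chunk_statements(max_i, chunk_size,
                     src="originalDB.trans", dst="trans_clone.trans"):
    """Yield one INSERT ... SELECT per half-open range [lo, hi) of column i.

    Hypothetical helper: splitting on column i is an assumption,
    not something stated in the question or the article.
    """
    for lo in range(0, max_i + 1, chunk_size):
        hi = lo + chunk_size
        yield (
            f"INSERT INTO {dst} (i, j, nu, A) "
            f"SELECT i, j, nu, A FROM {src} "
            f"WHERE i >= {lo} AND i < {hi};"
        )


# Example: values of i up to 2,500,000, copied one million at a time.
statements = list(chunk_statements(max_i=2_500_000, chunk_size=1_000_000))
for stmt in statements:
    print(stmt)
```

The printed statements can be saved to a file and fed to the mysql client. Keeping each chunk fairly small means a failure partway through only costs one chunk, not the whole copy.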
answered Sep 22, 2013 by rajesh