Tuesday, October 23, 2007

putting it all together

Last post was about Rainbow testing. So that's old news, and I lost count of the number of bugs it found, and have been fixed. Probably 30+ crashes.

Now, I have to put together all these odds & ends of code I wrote disparately over the last 18 months into a single collection to enable complete end-to-end testing.

I have now got roughly the following:

o) random table maker
o) query maker based on any tables, (using predefined rules)
o) data generator for any tables
o) complete charset collection for each valid character
o) database of mysql functions, data types
o) multi-threaded testing environment for any queries
o) query results comparer for any queries
o) rainbow, which will provide help in making good coverage of functions, etc.

What's needed still? Well, I would like 56 hours in a day ... More posts later when v0.001 of the integrated is semi-functional :)

Wednesday, September 19, 2007

mysql crash log analyzer ?

So with over 1000 crashes in my logs from just one night of simple queries, I have a painful time to find new crashes in the 90MB file. So I'm taking a detour on the rainbow query generator for 2 days. Will be writing an error log analyzer instead :)


step 1: upload the mysqld and the mysqld.sym file for the version you're working with.
step 2: import the binary and symbols into a mysql table
step 3: upload the error log
step 4: parse the error log into seperate crashes
step 5: find the stack traces for each crash, and resolve them using the symbols
step 6: determine if this crash is matching any existing crashes.

With thousands of crashes on various builds of mysqld, the above system can be useful to me. Also, I'd import all the crashes from reports on bugs.mysql.com for searching purposes. This toy will allow me to identify whether a crash is new, or it's been seen before.

some results of rainbow

ok folks. here's some results:



mysql> select last_errno,count(*) from
queryqueue group by last_errno;
+------------+----------+
| last_errno | count(*) |
+------------+----------+
| 0 | 1600796 |
| 1048 | 1971 |
| 1053 | 1 |
| 1139 | 35 |
| 1267 | 19722 |
| 1270 | 4243 |
| 1271 | 8944 |
| 1416 | 2284 |
| 1580 | 23225 |
| 2003 | 28 |
| 2013 | 1606 |
+------------+----------+
11 rows in set (0.00 sec)


error 2013 means lost connection to server (read: server crashed).
so there are many bugs found already. 1606 crashes out of 1.6 million
executed queries, is great.

check my rss feed for the exact bugs ...

Wednesday, September 12, 2007

good news!

my girlfriend's grandfather wants a blog!

Friday, September 7, 2007

sql rainbow tables

I thought of a brilliant QA plan. Create sql rainbow tables! Just trust me, this system of QA will be very powerful in bug finding. Especially important will be the comparison of old/new versions of mysql. Writing a prototype now, on my day off.

Just during prototype development, I've discovered a handful of bugs. I'm testing all functions that are documented in the manual. This is alot that I've written up:


mysql> select count(*),category from func group by category;
+----------+--------------+
| count(*) | category |
+----------+--------------+
| 6 | arithmetic |
| 7 | bit |
| 60 | casting |
| 47 | comparison |
| 52 | datetime |
| 18 | encryption |
| 127 | geometry |
| 17 | information |
| 7 | logical |
| 32 | mathematical |
| 9 | misc |
| 47 | string |
| 2 | xml |
+----------+--------------+
13 rows in set (0.00 sec)


Let me explain this rainbow tables concept with a simple example.
Suppose we have a function, like "GREATEST()".

I run a query like this:

SELECT GREATEST(col1,col2,col3) FROM table0;

The columns col1,col2,col3 will form any combination of column type. So we have int, date, string, blob, spatial! Given a combination of all datatypes and all function parameters, it leaves us with a good few million queries. Of course, a few million queries is not so much for a machine to execute quickly..

Sunday, August 26, 2007

cross-check results using triggers and views

I've been struggling to come up with a decent prototype for the self-verifying random testcases I wrote about earlier. In the mean time, I wrote a set of triggers and a view which should yield the same information, no matter what you do to the underlying table. If they don't, there's a bug in the server.

The test table, t1 is simple. It has an int and a varchar column. There are triggers after insert/update/delete. The triggers update an evolving summary table.

The evolving summary table maintains the average+sum value of the int, and the average+sum length of the varchar field. The view selects these values from the table too.

To test it, I ran 50 threads of random insert/update/delete queries to table t1 for a few minutes. Then after the test had stopped I confirmed the results matched. (i.e. the running totals had been correctly maintained - if they match the view).

Here's the setup of table t1, the running summary table, and the view

Next, I hit the server with random update/insert/delete for a few minutes:


- completed spawning new database worker threads
- 15 threads running, 0003961 successful queries. (330.083333 QPS).
- 15 threads running, 0005334 successful queries. (114.416667 QPS).
- 15 threads running, 0006206 successful queries. (72.666667 QPS).
- 15 threads running, 0006783 successful queries. (48.083333 QPS).
- 15 threads running, 0007779 successful queries. (83.000000 QPS).
- 15 threads running, 0008981 successful queries. (100.166667 QPS).
- 15 threads running, 0010393 successful queries. (117.666667 QPS).
- 15 threads running, 0010979 successful queries. (48.833333 QPS).
- 15 threads running, 0011373 successful queries. (32.833333 QPS).
- 15 threads running, 0011875 successful queries. (41.833333 QPS).
- 15 threads running, 0012399 successful queries. (43.666667 QPS).
- 15 threads running, 0012870 successful queries. (39.250000 QPS).
- 15 threads running, 0013566 successful queries. (58.000000 QPS).
- 15 threads running, 0013775 successful queries. (17.416667 QPS).
- 15 threads running, 0014001 successful queries. (18.833333 QPS).
- 15 threads running, 0014265 successful queries. (22.000000 QPS).
- 15 threads running, 0014869 successful queries. (50.333333 QPS).
- 15 threads running, 0015325 successful queries. (38.000000 QPS).
- 15 threads running, 0016425 successful queries. (91.666667 QPS).
- 15 threads running, 0016743 successful queries. (26.500000 QPS).
- 15 threads running, 0016919 successful queries. (14.666667 QPS).
- 15 threads running, 0017180 successful queries. (21.750000 QPS).
- 15 threads running, 0017448 successful queries. (22.333333 QPS).
- 15 threads running, 0017745 successful queries. (24.750000 QPS).
- 15 threads running, 0017997 successful queries. (21.000000 QPS).
- 15 threads running, 0018095 successful queries. (8.166667 QPS).
- waiting for threads to finish



And - let's see if the results match up.


mysql> select * from t1_summary;
+--------------+-------------+-----------------+--------------------+-----------------+
|num_elements | sum_id | sum_length_name | avg_id | avg_length_name |
| 10237 | 76231668630 | 1198066 | 7446680.5343377098 | 117.0329195294 |
+--------------+-------------+-----------------+--------------------+-----------------+
1 row in set (0.00 sec)

mysql> select * from t1_view;
+-------------+-----------------+--------------+-----------------+
|sum_id | sum_length_name | avg_id | avg_length_name |
+-------------+-----------------+--------------+-----------------+
|76231668630 | 1198066 | 7446680.5343 | 117.0329 |
+-------------+-----------------+--------------+-----------------+
1 row in set (0.05 sec)



Yup, they do :) What have I actually tested here ?? Hm. At least one thing. The trigger's execution didn't miss anything. Why is this first step important ?? Because my program doesn't know the correct answer to the above 2 queries. It sends random garbage to server, hence it doesn't know how to verify a result. Having the server keep a running total, and comparing that to a final total from the view, is some very small assurance I think.

Next step will be to use more detailed functions instead of SUM and AVG. Something whereby the order of execution will matter... More on this later.

Saturday, August 25, 2007

how to undelete rows from a fixed length myisam table

You have inadvertently deleted some rows from your table and want them back. This is semi-doable with fixed row format of MyISAM. I put together a few steps, which I'll compliment with a program that does it for you, later.

  • save output of SHOW TABLE STATUS LIKE 't1' and SHOW CREATE TABLE `t1`.
  • shutdown mysql server asap!
  • backup t1.frm, t1.MYI, t1.MYD immediately
  • create a new table t1_recover, which has no auto-inc and no unique/pk.
  • remove the files t1_recover.MYD and t1_recover.MYI
  • write a C program that scans through t1.MYD reading blocks of length Avg_row_length and checking if first byte indicates a row is marked as deleted.
  • if it's deleted, dump it as is into the file t1_recover.MYD
  • goto mysql prompt. issue REPAIR TABLE `t1_recover` USE_FRM
Some notes.

You cannot recover entire record. You'll lose first 6 bytes of each record (or how every long data pointer length is). Typically, this is the first 2 int columns trashed.

You cannot determine which columns contained NULL in the recovered records (assuming some columns are nullable).

If you anticipate to have to use this recovery method often, then you can create a fake char(9) column at the beginning of the table, so that when a row is deleted, this column will be overridden instead of your real data..

Here is the code! myisam undeleter

Example usage below:
1) create a table and delete some records:

mysql> create table mydel(id int, name char(15), address char(15), bday datetime, daysleft decimal(10,5));
Query OK, 0 rows affected (0.06 sec)

mysql> insert into mydel values (1,'shane','down south','2005-09-01',10.5);
Query OK, 1 row affected (0.01 sec)

mysql> insert into mydel values (2,'susan car','up north','2001-02-01',14.1);
Query OK, 1 row affected (0.00 sec)

mysql> insert into mydel values (3,'bobby scott','chinatown','1976-12-21',55.5);
Query OK, 1 row affected (0.00 sec)

mysql> delete from mydel where id=2 or id=3;
Query OK, 2 rows affected (0.00 sec)

mysql> flush tables;
Query OK, 0 rows affected (0.00 sec)

mysql> show table status;
+--------+--------+------------+------+----------------+-------------+
| Name | Engine | Row_format | Rows | Avg_row_length | Data_length |
+--------+--------+------------+------+----------------+-------------+
| mydel | MyISAM | Fixed | 1 | 49 | 147 |
+--------+--------+------------+------+----------------+-------------+

2) next, build and run the app:


> gcc -o undeleter -lm ./undeleter.c
>
> ./undeleter /5.1/data/test/mydel.MYD /5.1/data/test/mydel_recover.MYD 49
allocating a block of memory for records, 49 bytes
starting to read records
found deleted record at offset 49
found deleted record at offset 98
-----------------
completed
-----------------
records copied into '/5.1/data/test/mydel_recover.MYD': 2
records present in '/5.1/data/test/mydel.MYD': 1

next steps:
1. remove the MYI for '/5.1/data/test/mydel_recover.MYD'
2. make sure '/5.1/data/test/mydel_recover.MYD' definition .frm is created without any unique or PK keys
3. repair table use_frm


3) next, sort out the recovered table


sbester@www:/5.1/data/test> cp ./mydel.frm ./mydel_recover.frm

mysql> repair table mydel_recover use_frm;
+--------------------+--------+----------+------------------------------------+
| Table | Op | Msg_type | Msg_text |
+--------------------+--------+----------+------------------------------------+
| test.mydel_recover | repair | warning | Number of rows changed from 0 to 2 |
| test.mydel_recover | repair | status | OK |
+--------------------+--------+----------+------------------------------------+
2 rows in set (0.00 sec)

mysql> select * from mydel_recover;
+------+-------------+-----------+---------------------+----------+
| id | name | address | bday | daysleft |
+------+-------------+-----------+---------------------+----------+
| -1 | san car | up north | 2001-02-01 00:00:00 | 14.10000 |
| 0 | ☺bby scott | chinatown | 1976-12-21 00:00:00 | 55.50000 |
+------+-------------+-----------+---------------------+----------+
2 rows in set (0.00 sec)


Notice! I just cp the .frm. That's only because I didn't need to strip out the
PK or auto-increment column. As you seen, `id` is trashed. `name`
is partially trash. the `id` would have caused alot of duplicate key
errors if you still had a PK.

More information here: MySQL_Internals_MyISAM

Tuesday, August 21, 2007

new direction for test development

I've been thinking about this today. My attempts thus far to write random queries which stress various parts of the server have worked, and served a purpose - but have been rather huge and clumsy.

For example a 5 union select, each comprising of 7 joins did have it's value.. The problem with huge random queries is simply that the results cannot be verified as easily as they were constructed (unless you run the same on various DBMS). These huge tests are mostly only good for checking if the server crashes.

Now, I wish to try write testcases that have these properties:
  • small
  • results are self-verified
  • each test is random, but reproducible given a seed value
Tonight I'll throw together a code-generator to do simple insert/select and post any interesting results here later.

Monday, August 13, 2007

a moment in time

I pulled the latest 5.1BK sources today and built mysqld. I must admit, it fared pretty well against my little charset tester. 0 failures and over 120 tests done which took >5 hours.

To give a clue to the score last time I run the same tests, here are the bugs I opened!

Sunday, August 12, 2007

sad news

my girlfriend hates blogs :(

Tuesday, July 24, 2007

That error 127 MyISAM bug finally...

Gotcha: Bug #29838

The bug itself seems so simple to repeat. I'm boggling how it was not seen before. :-0 I really tried too many overly-complex tests, IMHO. Sometimes, it's still best to keep it simple!

Thursday, May 24, 2007

that key_cache bug, 17332

This massive fix has been pushed into 5.1.19 and so far I cannot repeat any crashes!! Well done Ingo and reviewers ! My testing continues!

changing key_buffer_size on a running server can crash under load

Sunday, May 20, 2007

partition maintenance?

It seems doing any partition maintenance in combination with some concurrent workload will fail with many different errors in 5.1. For more or less the same underlying lack of locking, I've filed a few bugs.

To cause a myisam table corruption or server crash (or innodb assert) all you need is 1 thread inserting, and 1 thread altering/optimizing partitions: Of course, the more threads the easier the problems will happen..

Bug #28477
Bug #28487
Bug #28488
Bug #28489
Bug #28490