doug dobies
about | projects | linkedin | github | twitter |



If you have MySQL replication running in your environment, you very likely have encountered this error before.
Why does this happen?
Before I explain that, we have to understand exactly how MySQL Master<>Slave replication works:
Two threads are created on the SLAVE to the MASTER.

  • The IO thread downloads events from the master binary logs, appending those events to the relay log.
  • The SQL thread reads the events from the relay logs, and executes them on the local DB.

  • These threads need to store their state.

    The IO thread keeps track of the next event to DOWNLOAD from the MASTER binary logs using 2 variables
    (check Master_Log_File and Read_Master_Log_Pos) from a SHOW SLAVE STATUS command.

    The SQL thread keeps track of the next event to EXECUTE on the local DB using 4 variables:
    position (in the SLAVE relay log)
    relay log file (from the MASTER binary log)
    relay log position (from the MASTER binary log)

    The SLAVE reads the relay-info.log file, and applies the events stored there. If a crash occurs on the SLAVE between the commit, and the update of the file, replication will break.

    If the IO thread has appended the data to the relay log, but has not yet updated it's state file when MySQL crashes, the same event is re-appended to the relay log.

    If the SQL thread has committed the event but has not yet updated its state file when MySQL crashes, the same event is re-run.

    You end up with a SLAVE DB that has applied an event/transaction, but has not updated the log file that is used to determine which event/transaction to apply next.
    In simple terms successful replication looks like this:

    relay-info.log says to apply event/transaction 100
    event/transaction 100 = INSERT INTO TABLE VALUES(100,some value);
    The SLAVE DB applies that event/transaction, and updates the relay-info.log file.
    The SLAVE reads the file, gets the next transaction writes it to the file
    The SLAVE reads the relay.log. file, which says apply event/transaction 101
    rinse and repeat..

    But if this happens:

    relay-info.log says to apply event/transaction 100
    event/transaction 100 = INSERT INTO TABLE VALUES( 100, some value );
    The SLAVE DB applies that event/transaction and immediately crashes.
    The SLAVE DB restarts
    The SLAVE DB reads which says to apply event/transaction 100
    The SLAVE DB errors attempting to commit the event/transaction
    The DB TABLE cannot commit the event as that INSERT has already been applied.
    If the table includes a primary key, or a unique key, replication will fail with Error 1062 DUPLICATE KEY ERROR
    If the table doesn't have either key, then the slave will successfully commit the transaction, and you get inconsistent data.
    ie: 2 rows with the exact same values on the SLAVE, where the MASTER has only a single row with that data.

    Armed with this information, you now know WHY this error happens. But thats just half the battle. You have to check the relevant logs to see which event/transaction is attempting to be executed. The logs will tell you if the next transaction is one that has already been commited to the Slave. If that is the case, it's safe to SKIP the next transaction.

    Now lets look at the contents of the relevant files:

    The file looks like the following:

    FILENAME /var/lib/mysql/
    18 - number of lines in the file
    mysql-bin.000001 - name of the current master binary log being read from the master
    589 - current position within the master binary log that has been read by the master.

    The file looks like this:
    FILENAME /var/lib/mysql/
    /var/log/mariadb/mysql-relay-bin.000002 - filename of current relay log
    873 - current position within the relay log, events up to this position have been applied to the slave DB
    mysql-bin.000001 - name of the master binary log, which is read by relay log
    589 - the position within the master binary log that have already been applied on the slave.

    [root@server1 mysql]# cat

    Using the above example, we see that mysql-relay-bin.000004 is the file containing the events/tranactions. So next, we view the contents of that file:

    [root@server1 mysql]# mysqlbinlog /var/log/mysql/mysql-relay-bin.000004
    SET TIMESTAMP=1486576801/*!*/;
    INSERT INTO datav2 VALUES('2017-02-08 12:00:01', '66')
    # at 2772
    #170208 13:00:01 server id 1 end_log_pos 2515 Xid = 88952

    We can see that there is an INSERT statement executed on the datav2 table. The next logical step would be to login to the slave DB, and verify if this statement has already been exectuted.

    mysql> DESC datav2; (to find the structure of the table, so we have something to query.)

    Lets assume the table consistes of a timestamp, and and an integer variable.
    Next, we'd query the DB/Table to see if a duplicate entry exists.

    mysql> SELECT * from datav2 where timesteamp like '2017-02-08 12:00:01';

    If this query returns a result, and the integer following the timestamp is '66' then you have your reason for the duplicate key error.
    The mysql-relay-bin.000004 log is trying to INSERT duplicate data into the Slave.
    You can safely SKIP this duplicate entry with the following command:


    Now try to restart the slave with the START SLAVE; command.
    If there are more than one events/transactions that have been duplicated, the error will re-occur when trying start the slave.

    There are those that say 'NEVER SKIP! YOU'LL LOSE DATA!' and you may, but if you follow this guide, you will know if that data has already been commited to the DB. Skipping a transaction, that would simply write a record that already exists in the DB will not cause any data loss.

    If have any any questions, or comments, feel free to ping me!

    <this is a work in progress>