
2011/08/08

Replication Agent Failure

I recently spent several days investigating a replication failure. Finding an answer specific to my problem was difficult, so I am posting my findings in the hope that someone else finds them useful.

The day after installing a few security patches, replication to the subscriber database began to fail. The Distribution Agent had the following error message:

Executed as user: <UserAccount>. Replication-Replication Distribution Subsystem: agent <AgentName> failed. Executed as user: <UserAccount>. A required privilege is not held by the client. The step failed. (Error 14151). The step failed.

This message is usually caused by changing the SQL Server service account through the Windows Service Control Manager, which does not grant the account the permissions it needs to start the service properly or to run SQL Agent jobs. SQL Server Configuration Manager should be used instead. The correct fix is to set the service account to the Local System account and then back to the domain account, both times using SQL Server Configuration Manager. (http://support.microsoft.com/kb/911305/)

Since I had not changed the service account recently, I did not think this could be the cause. But since I was getting the same error message, I thought it would not hurt to reset the service account with SQL Server Configuration Manager anyway. Of course, this did not work, but the error message did tell me the problem was security related.

Then I changed all the replication agents to have the maximum privileges possible. This did not work either. That led me to think the problem was not related to the articles themselves, and maybe not even to Replication or SQL Server. I then found the following error message in Event Viewer:

Log Name: System
Source: Security-Kerberos
Event ID: 4

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server <servername>$. The target name used was <servername>$. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (<domainname.com>) is different from the client domain (<domainname.com>), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.

This error message occurs when two or more computer accounts have the same SPN registered.
(http://support.microsoft.com/kb/321044)
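
To make the duplicate SPN condition concrete, here is a minimal Python sketch of the check. It assumes you have already exported (account, SPN) pairs from Active Directory by some means; the function name, the account names, and the sample SPNs are all hypothetical and are not taken from the environment described in this post. It only illustrates the idea that the same SPN must not appear on more than one account, and it compares SPNs case-insensitively.

```python
from collections import defaultdict


def find_duplicate_spns(registrations):
    """Return {spn: sorted account list} for SPNs registered on more than one account.

    `registrations` is an iterable of (account_name, spn) pairs.
    Both values are lower-cased so the comparison is case-insensitive.
    """
    owners = defaultdict(set)
    for account, spn in registrations:
        owners[spn.strip().lower()].add(account.strip().lower())
    # Keep only SPNs that are held by two or more distinct accounts.
    return {spn: sorted(accts) for spn, accts in owners.items() if len(accts) > 1}


# Hypothetical sample data: a stale computer account still holds the HOST SPN.
sample = [
    ("SQLNODE1$", "HOST/sqlnode1.example.com"),
    ("SQLNODE1-OLD$", "host/SQLNODE1.example.com"),
    ("SQLNODE1$", "MSSQLSvc/sqlnode1.example.com:1433"),
]

duplicates = find_duplicate_spns(sample)
for spn, accounts in duplicates.items():
    print(spn, "is registered on:", ", ".join(accounts))
```

In practice you may not need a script at all: on Windows Server 2008 and later, the built-in setspn tool can search for duplicates directly with "setspn -X".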

To fix this, we deleted the computer account entries in AD, then removed the server from the domain and rejoined it. Except for a few expired articles, all replication articles synchronized on their own without intervention. I reinitialized the expired articles.

This has fixed our problem so far, but if anyone has experience with this, your feedback is welcome.

 

Regards,

Jon