Trigger in Managed Package in error state, can't be cleared

I've seen several discussions around this issue but no clear recommendation, and certainly none that is well suited to ISV development.

Post from Chuck Liddell in the Official: Platform Events success group:

For a normal Platform Event trigger that moves into the Error State (stops listening) because of errors, you can edit the trigger code, save, and the trigger will subscribe again.

However, a subscriber to a managed package cannot edit the trigger code for a managed trigger. If a managed trigger moves into Error State, how can the package subscriber restore it?

Compiling all Apex, reinstalling the package, and upgrading the package all seem not to work, ...

And a more recent post related specifically to the Winter '20 release.

I've got multiple customers in a panic because their previously stable Apex triggers on Platform Events are repeatedly tripping into error mode because they exceed the maximum number of replay attempts. None of these triggers use the EventBus.RetryableException mechanics.

The following was posted by Chuck on the GDS Slack:

Any Apex trigger on a Platform Event will unsubscribe from that event (move into an error state) if it hits too many retries. I'm a little fuzzy on the exact behavior, as I've seen it happen several times in the last few weeks, and not just on triggers that do something with EventBus.RetryableException. The error message you get is: "The Apex trigger YourTrigger on platform event YourEvent__e has exceeded the maximum number of retry attempts. The trigger is now in the error state and has stopped processing new events."

Normally (for non-ISVs), to resolve this you figure out why you had errors, fix them in your trigger code, and save it. Obviously, this is a little tricky for managed packages.

I've done some testing, and simply installing again, or upgrading to a later version, does not turn the trigger back on. Neither does compiling all Apex in the subscriber org. A full uninstall and reinstall will, of course, work, but that's pretty much a non-option in most circumstances.

The only way to turn the trigger back on is to make some edit to your trigger, even a whitespace-only change, and include that change in your next packaged version. And so we arrive at my advice: I recommend that any managed package with Apex triggers on Platform Events add a script to its deployment process that touches these triggers (for example, by adding whitespace) on every version release.

A little ham-handed, perhaps, but the benefit is that every upgrade to a later version of your package will let a customer turn triggers back on if they have been turned off for some reason.
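The "touch every Platform Event trigger on release" step could be automated along these lines. This is a minimal Python sketch under assumptions that are not from the post: the package source is in SFDX source format with triggers stored as `.trigger` files, Platform Event triggers can be recognized by an object name ending in `__e` in the trigger declaration, and (extending the post's point about whitespace) a changed comment counts as an edit to the trigger. `stamp_triggers` and the `// package-build-stamp:` marker are hypothetical names.

```python
"""Hypothetical build step: stamp every platform event trigger with the release
version so that each packaged version ships a changed trigger body.

Assumptions (not from the original post): SFDX source format, triggers stored
as *.trigger files, and platform event triggers recognisable by an object
name ending in __e in the trigger declaration.
"""
import re
import sys
from pathlib import Path
from typing import List

# Matches declarations such as "trigger OrderEventTrigger on Order_Event__e (after insert) {"
PE_TRIGGER_RE = re.compile(
    r"^\s*trigger\s+\w+\s+on\s+\w+__e\s*\(", re.IGNORECASE | re.MULTILINE
)
# Trailing marker comment owned by this script; an older stamp is replaced.
STAMP_RE = re.compile(r"\n?// package-build-stamp: .*\n?\Z")


def stamp_source(source: str, version: str) -> str:
    """Return the source with exactly one trailing build-stamp comment."""
    body = STAMP_RE.sub("", source).rstrip("\n")
    return f"{body}\n// package-build-stamp: {version}\n"


def stamp_triggers(root: Path, version: str) -> List[Path]:
    """Stamp every platform event trigger under root; return the files changed."""
    changed = []
    for path in sorted(root.rglob("*.trigger")):
        source = path.read_text(encoding="utf-8")
        if not PE_TRIGGER_RE.search(source):
            continue  # ordinary sObject trigger: leave it alone
        updated = stamp_source(source, version)
        if updated != source:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Example: python stamp_pe_triggers.py force-app 1.4.0
    if len(sys.argv) != 3:
        print("usage: stamp_pe_triggers.py <source-dir> <version>")
    else:
        for p in stamp_triggers(Path(sys.argv[1]), sys.argv[2]):
            print(f"stamped {p}")
```

Replacing a single stamp comment, rather than appending more whitespace each release, keeps the diff to one line and makes the step idempotent: rerunning it for the same version changes nothing, while each new version changes every Platform Event trigger.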

It is probably worth revisiting the option of creating a new package version that includes a modification to the trigger in question. By all accounts, that should reactivate it. It's still not an ideal long-term solution.


Salesforce Docs on Email Notifications for Triggers in Error State


FIXED! According to tech support...

In the Winter '20 release, functionality was added to retry Platform Event triggers even when they don't use the EventBus.RetryableException implementation.

The change: if any limit exception occurs while a Platform Event trigger is executing, the trigger context is retried up to 10 times, similar to the EventBus.RetryableException implementation.

The issue was caused by that Platform Events feature addition, and the change has been reverted from the release.
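To see why previously stable triggers that never throw EventBus.RetryableException started landing in the error state, the (since-reverted) Winter '20 behavior that tech support described can be sketched as a toy Python model. It is not Salesforce code: `Subscriber`, `deliver`, and the stand-in `LimitException` are hypothetical, the cap of 10 retries comes from the support reply, and counting the first attempt separately from the retries is an assumption.

```python
"""Toy model (not Salesforce code) of the retry behaviour described by support:
a limit exception in a platform event trigger causes the same batch to be
redelivered, and once the retry cap is used up the subscriber moves to an
error state and stops processing new events.
"""
from dataclasses import dataclass, field
from typing import Callable, List


class LimitException(Exception):
    """Stand-in for an Apex System.LimitException (assumption for the model)."""


@dataclass
class Subscriber:
    handler: Callable[[List[str], int], None]  # (events, retry number) -> None
    max_retries: int = 10  # cap quoted by tech support
    state: str = "RUNNING"
    processed: List[str] = field(default_factory=list)

    def deliver(self, batch: List[str]) -> None:
        if self.state == "ERROR":
            return  # error state: new events are no longer processed
        for retry in range(self.max_retries + 1):  # first attempt + retries
            try:
                self.handler(batch, retry)
                self.processed.extend(batch)
                return
            except LimitException:
                continue  # the same batch is redelivered
        self.state = "ERROR"  # retries exhausted: trigger stops listening
```

In this model a handler that hits a limit on every attempt exhausts its retries and leaves the subscriber in `ERROR`, after which later batches are ignored, even though the handler never raised a retryable exception itself; a handler that succeeds on a later attempt processes the batch normally.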