The modules integrated with the Alert Management (AM) module (including THM, EQM, etc.), as well as DOC, currently have limited controls to prevent large volumes of text/email notifications from being sent. This puts PwC platform performance, stability (our sender reputation scores in Twilio/SendGrid), and cost at risk, and degrades the client user experience:
AM:
Client roles can set up alerts without supervision or approval from a PwC or Partner Super Admin.
Alert rules allow impractical configurations (e.g., a 0-degree threshold on a temperature sensor placed in a location that is always above 0 degrees).
An alert is retriggered every time a device message meets the alert criteria, regardless of whether an active alert already exists for the same device and alert rule.
Devices may send messages frequently (sometimes several per hour or more), and each of these messages can trigger an alert.
There is no limit on the number of devices or users in Alert Groups and On-Call Groups, and email addresses and phone numbers are not required to be unique.
TBD: Can alerts, Alert Groups, or On-Call Groups be duplicated (i.e., the same device belongs to multiple Alert Groups, each of which sends a notification for the same device message)?
TBD: If an email address or phone number is invalid, can no longer receive messages, or has blocked our number, we still send the notification and absorb the cost, and the client is not made aware of this.
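One way to address the uniqueness and duplication concerns above is to collapse every recipient from all groups matched by a single device message into a unique set of targets before sending. The sketch below is a minimal illustration, assuming recipients are simple email/phone records; `Recipient`, `normalize_email`, `normalize_phone`, and `unique_targets` are hypothetical names, not existing AM code, and the phone normalization is deliberately naive (a production version would likely use a proper E.164 parser).

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Recipient:
    # Hypothetical recipient record; either field may be missing.
    email: Optional[str] = None
    phone: Optional[str] = None


def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase (assumes mailboxes are treated case-insensitively).
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    # Strip formatting characters, keeping a leading '+' if one was present.
    digits = re.sub(r"\D", "", phone)
    return "+" + digits if phone.strip().startswith("+") else digits


def unique_targets(groups: Iterable[Iterable[Recipient]]) -> Tuple[Set[str], Set[str]]:
    """Merge recipients from every group matched by one device message into unique email/phone sets."""
    emails: Set[str] = set()
    phones: Set[str] = set()
    for group in groups:
        for recipient in group:
            if recipient.email:
                emails.add(normalize_email(recipient.email))
            if recipient.phone:
                phones.add(normalize_phone(recipient.phone))
    return emails, phones
```

For example, a user listed as `Ops@Example.com` / `+1 (555) 010-0001` in one Alert Group and as `ops@example.com` / `+15550100001` in another would receive one email and one text for the message instead of two of each.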
DOC:
Same as AM, though clients can adjust fewer alert rule criteria (open, close, tampered).
RRB:
RRB does not alert On-Call users if there is already an active notification from the same device and location in sequence (a different location for the same device should "reset" this). However, like the other apps, RRB has no limit on On-Call users or on total text/email usage, and it does not enforce unique email addresses or phone numbers in the On-Call list, which risks duplicate messages.
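The RRB suppression rule described above can be modeled as a small piece of per-device state. This is a minimal sketch of our reading of the rule, assuming "in sequence" means consecutive events from the same device and that acknowledging a notification clears it; `SequenceSuppressor`, `should_notify`, and `acknowledge` are hypothetical names, not RRB's actual implementation.

```python
from typing import Dict


class SequenceSuppressor:
    """Hypothetical model of the RRB rule: suppress a notification when the same device
    reports again from the same location while its previous notification is still active."""

    def __init__(self) -> None:
        # device_id -> location of that device's currently active notification
        self._active: Dict[str, str] = {}

    def should_notify(self, device_id: str, location: str) -> bool:
        if self._active.get(device_id) == location:
            return False  # same device and location in sequence: suppress
        # No active notification, or a different location "resets" the sequence.
        self._active[device_id] = location
        return True

    def acknowledge(self, device_id: str) -> None:
        # Clearing the active notification lets the next event notify again.
        self._active.pop(device_id, None)
```

Note that if `acknowledge` is never called, events from the same device and location stay suppressed indefinitely, and this model applies no cap on how many On-Call users are then notified.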
Other considerations:
SendGrid emails and Twilio texts cost roughly $0.01 per message (to be verified, along with any message limits).
RRB's approach (not triggering a new alert while an active alert from the same device/location in sequence exists) may not be the best solution: an alert that goes unacknowledged for days, weeks, or longer could "block" new alerts from triggering notifications. A "cooldown" period or other alerting criteria may work better.
Alerting may also need to account for conditions that are getting worse (e.g., an escalating water leak, or a temperature rising past more critical/severe thresholds). These may warrant a new notification that overrides any cooldown criteria (or we could allow separate alerts on the same device groups for this purpose).
Is there a maximum notification volume we want to enforce? If so, should it be a function of the number of devices, or of other criteria?
We currently do not pass notification costs on to clients, show clients the potential cost of an alert rule (Azure alerts are a good example of this), or tell clients when there is a problem with an email address or phone number. Do we want to do any of these?
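A cooldown with an escalation override, as suggested above, could be keyed on device and alert rule. The sketch below is illustrative only, assuming each evaluation carries an integer severity (higher means worse) and a timestamp in seconds; `CooldownGate`, `LastNotification`, and the severity scale are hypothetical, not an existing AM/DOC/RRB design.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class LastNotification:
    sent_at: float  # seconds since epoch
    severity: int   # higher value = more severe (hypothetical scale)


class CooldownGate:
    """Hypothetical per-(device, rule) cooldown with an escalation override."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last: Dict[Tuple[str, str], LastNotification] = {}

    def should_notify(self, device_id: str, rule_id: str, severity: int, now: float) -> bool:
        key = (device_id, rule_id)
        last = self._last.get(key)
        if last is not None:
            in_cooldown = now - last.sent_at < self.cooldown_seconds
            # Suppress only while still cooling down AND the condition has not worsened.
            if in_cooldown and severity <= last.severity:
                return False
        self._last[key] = LastNotification(sent_at=now, severity=severity)
        return True
```

Unlike acknowledgement-based suppression, the cooldown expires on its own, so an unacknowledged alert cannot block notifications indefinitely, while a more severe reading bypasses the cooldown immediately.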
Suggested short-term and medium-term fixes to evaluate:
Short term: Can we set up abnormal-usage alerts in Twilio/SendGrid so PwC can respond quickly when a situation arises?
Medium term: Are there practical changes that would prevent the issues above while still providing a great customer experience? Can these be applied across all modules (including DOC and RRB)?
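As a complement to any usage alerting the providers themselves offer (to be verified for Twilio and SendGrid), the platform could track its own outbound volume. This is a minimal sketch, assuming a rolling time window, a hypothetical per-window message threshold, and the unverified ~$0.01 per-message cost noted above; `UsageMonitor` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque


class UsageMonitor:
    """Hypothetical rolling-window counter of outbound text/email notifications."""

    def __init__(self, window_seconds: float, max_messages: int, cost_per_message: float = 0.01) -> None:
        self.window_seconds = window_seconds
        self.max_messages = max_messages
        self.cost_per_message = cost_per_message  # assumed ~$0.01, still to be verified
        self._sent: Deque[float] = deque()  # send timestamps, assumed non-decreasing

    def _evict(self, now: float) -> None:
        # Drop sends that have fallen out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def record(self, now: float) -> None:
        self._sent.append(now)
        self._evict(now)

    def is_abnormal(self, now: float) -> bool:
        self._evict(now)
        return len(self._sent) > self.max_messages

    def estimated_cost(self, now: float) -> float:
        self._evict(now)
        return len(self._sent) * self.cost_per_message
```

For example, with a one-hour window and a threshold of 100 messages, 150 sends within that hour would flag the window as abnormal, at an estimated cost of about $1.50.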