We’ve been fixing our email notifications for the last few days. The problem was that PHP was timing out in the background for groups with 1000+ followers, so the answer was to enable the crontab to process the emails. To prevent the pending notifications in the database from being sent out, we enabled all the notifications in the database. Unfortunately that wasn’t enough, and it looks like a lot of pending notifications that you were supposed to receive over the last year got sent out all at once.
We take complete responsibility for our mistake. We really have nothing to say other than to apologize and to do our best to prevent such an incident from happening again.
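For anyone moving notification delivery to a cron job, here is a rough sketch of the kind of guard that helps in this situation: expire stale pending rows before the job is ever switched on, then send only a small batch per run. This is not Anahita's code (Anahita is written in PHP). The `notifications` table, its columns, the `expired` status, the two-day cutoff and the dates are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def expire_stale(conn, now, max_age_days=2):
    """Mark old pending rows as 'expired' so a cron run never mails them.

    Table, columns and the two-day cutoff are hypothetical.
    """
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        "UPDATE notifications SET status = 'expired' "
        "WHERE status = 'pending' AND created_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount


def send_batch(conn, send, limit=100):
    """Send at most `limit` pending notifications in one cron run."""
    rows = conn.execute(
        "SELECT id, recipient FROM notifications "
        "WHERE status = 'pending' ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    for notification_id, recipient in rows:
        send(recipient)
        conn.execute(
            "UPDATE notifications SET status = 'sent' WHERE id = ?",
            (notification_id,),
        )
    conn.commit()
    return len(rows)


def demo():
    """Build a tiny in-memory queue and run both steps against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notifications ("
        "id INTEGER PRIMARY KEY, recipient TEXT, status TEXT, created_at TEXT)"
    )
    now = datetime(2020, 1, 10, 12, 0, 0)  # arbitrary date for the demo
    seed = [
        ("old@example.com", now - timedelta(days=300)),
        ("older@example.com", now - timedelta(days=30)),
        ("fresh@example.com", now - timedelta(hours=1)),
    ]
    conn.executemany(
        "INSERT INTO notifications (recipient, status, created_at) "
        "VALUES (?, 'pending', ?)",
        [(r, t.isoformat(timespec="seconds")) for r, t in seed],
    )
    outbox = []  # stands in for a real mailer
    expired = expire_stale(conn, now)
    sent = send_batch(conn, outbox.append)
    return expired, sent, outbox


if __name__ == "__main__":
    print(demo())
```

In the demo, the two old rows are expired and only the fresh one reaches the stand-in mailer. Capping each run with `limit` also means a mistake in the cleanup step can only leak a bounded number of emails before someone notices.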
Update: Nov 11th – 1:13 AM
Still working with the Rackspace engineers to fix an issue. We rebooted the server to prevent more emails from being sent out. Unfortunately an update in CentOS is conflicting with Rackspace systems and we can’t even log in to bring the site back up. Stay tuned for more updates.
Update: Nov 11th – 3:30 AM
It is 2.5 hours later and we are still working with the Rackspace team to bring back our server. This has been one of the toughest nights we’ve had in the history of the Anahita project: a notificationmageddon and a server reboot issue right afterwards. We are getting there though. Stay tuned for more updates.
Update: Nov 11th – 4:00 AM
We’re back online. The Rackspace support team and engineers were amazing and very helpful. We really appreciate the time they put into debugging the issue. Thankfully it is a long weekend, which gives us time to fix the notification bug. The silver lining is that it’s better for this to happen to us first instead of to your projects.
This was one of the most challenging nights in Anahita’s history. Knowing that, we’ll get over this too.
Eating your own dog food can sometimes make you really sick!
Update: Nov 11th – 2:20 PM
The rogue notifications have stopped now. Thank goodness for the long weekend, which allows us to focus on fixing this issue, which was the result of a code logic bug and server configuration. We’re going to simulate and fix the issue on our development machines and post a report for you. That way you’ll be prepared if you ever have to deal with a similar scenario in your own projects.
In other words, we sometimes break things here so you don’t have to.
Update: Nov 11th – 11:50 PM
There was no bug in Anahita. It was our server configuration, along with an error in our migration script, that led to the rogue notification issue. All issues are solved now and the server is performing as expected. The situation is back to green now.