What happened?

At 4:17pm WIB, we detected an unusual spike in activity in our messaging queue system. Our team began investigating the issue and worked with our service provider on a resolution.
The investigation showed that our messaging queue system had hit its capacity threshold for the number of connections to a single host.
At 6:20pm, we re-triggered redeployments to close the existing connections. Four minutes after the deployment, temporary memory usage spiked because of the way the vendor designed the messaging system to handle closing connections.
This spike degraded one of our instances, resulting in bad gateway errors and delayed callbacks.
At 6:40pm, we re-triggered redeployments to recover our systems.
At 7:23pm, our systems fully recovered and all delayed callbacks were sent.

What measures have we taken to prevent this issue in the future?

We are taking the following actions to prevent a recurrence:

Improving our monitoring systems: We will lower the thresholds for identifying anomalous messaging connections so that our on-call teams are alerted earlier to rectify spikes.
Improving our development processes: We are adding reconnection logic to our queue clients so that they re-establish connections to healthy instance nodes, mitigating errors when any single instance goes down.
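
To illustrate the reconnection approach in general terms, here is a minimal sketch, not our production code. It assumes a client that is given a list of queue node addresses and a `connect` function that raises `ConnectionError` when a node is unreachable. The function name `connect_with_failover`, its parameters, and the node names are hypothetical and chosen only for this example.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
    max_rounds: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Return a connection to the first reachable host.

    Hypothetical helper: tries every host once per round and backs off
    between rounds so that clients do not hammer a degraded node.
    """
    last_error: Optional[Exception] = None
    for round_no in range(max_rounds):
        # Shuffle so that clients spread across nodes instead of all
        # reconnecting to the same one.
        candidates = list(hosts)
        random.shuffle(candidates)
        for host in candidates:
            try:
                return connect(host)
            except ConnectionError as exc:
                last_error = exc  # remember the failure, try the next node
        # Every node failed this round: wait with exponential backoff
        # and jitter before the next round (no wait after the last one).
        if round_no < max_rounds - 1:
            delay = min(max_delay, base_delay * (2 ** round_no))
            sleep(delay * random.uniform(0.5, 1.0))
    raise ConnectionError("no healthy queue node available") from last_error
```

A caller would pass its own connection function, for example `connect_with_failover(["node-a", "node-b"], open_queue_connection)`, where `open_queue_connection` stands in for whatever the queue client library provides. The jittered backoff is a common design choice to avoid many clients reconnecting at the same instant.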

We understand that this was an upsetting situation for you and your customers, and we know that you count on our reliability for the smooth operation of your business. We sincerely apologize for the negative impact of this incident on your customers and business. We are committed to learning from this event and to improving our services to serve you better.

If you require any assistance or have further questions, please contact us at help@xendit.co or through live chat at https://www.xendit.co/.

We strive to continue improving our services every day and do our best to prevent a repeat of this incident. Thank you for your trust in using Xendit to power your business.

Posted Jul 13, 2020 - 12:06 WIB

Resolved

Dear Customer,

We would like to inform you that all issues with Xendit payments and Dashboard have now been resolved.

Our team will be running a post-mortem to ensure this issue does not occur in the future.

We apologize for any inconvenience caused, and we thank you for your patience.

Let us know if you are still experiencing issues.
Regards,

Posted Jul 09, 2020 - 19:21 WIB

Identified

We would like to give an update regarding the issues below:

You might receive errors when accessing the Xendit dashboard homepage
You might be unable to activate payment methods

Other Xendit products and payment callbacks have now recovered, and we are monitoring the health to ensure this issue is resolved.

Posted Jul 09, 2020 - 19:12 WIB

Investigating

Dear Customers,
We would like to inform you regarding an issue that might be affecting your transactions and might be delaying callbacks.

Xendit dashboard is also experiencing errors intermittently.

Our team is currently investigating the issue, and we will post an update as soon as possible.

We apologize for any inconvenience this may cause.

Regards
Xendit

Posted Jul 09, 2020 - 18:55 WIB

This incident affected: Dashboard (Dashboard), API (Cards, eWallets, Cardless Credit, Virtual Accounts, Retail Outlet, Subscriptions, Payouts, xenDisburse, xenPlatform Accounts), and Callback (eWallets, Virtual Accounts, Retail Outlets, Invoices, Disbursements).