Performance Disruption of Phrase Portal (EU) between January 15, 2026, 04:15 PM UTC and 06:03 PM UTC

Incident Report for Phrase

Postmortem

Introduction

We would like to share details about an incident that made the RawMT UI portal unavailable between 4:15 PM and 6:03 PM UTC on January 15, 2026. During this time, users were unable to access the portal. The disruption was caused by our routing infrastructure exceeding a capacity limit. This postmortem outlines what occurred and the steps we are taking to prevent similar issues in the future.

Timeline

Jan 15, 2026 @ 4:15 PM UTC – A high-severity alert was triggered as the RawMT UI portal became inaccessible to users.

Jan 15, 2026 @ 4:20–5:00 PM UTC – Engineering teams traced the issue to the routing layer reaching a fixed platform limit, which prevented service traffic from being directed correctly.

Jan 15, 2026 @ 5:03 PM UTC – A mitigation was applied, re-routing affected services through alternate infrastructure.

Jan 15, 2026 @ 6:03 PM UTC – Service was confirmed fully restored and the incident was marked as stable.

Root Cause

The issue occurred when the platform’s traffic routing infrastructure reached a fixed capacity limit on the number of active service associations it could manage. Once this limit was exceeded, configuration updates failed, preventing traffic from being correctly routed to certain services.

Although application services remained operational, they became unreachable to users because there were no valid routing paths available. This resulted in a complete outage of the RawMT UI portal.
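
To illustrate this failure mode, here is a minimal sketch in Python. It is a simplified model under stated assumptions, not a description of our actual routing platform: the `Router` class, the limit of 2, and the service names and addresses are all hypothetical. It assumes the router rejects any configuration update that would exceed its association limit, which leaves a healthy service without a route.

```python
class RoutingLimitExceeded(Exception):
    """Raised when an update would exceed the router's fixed association limit."""


class Router:
    """Hypothetical router with a hard cap on service associations."""

    def __init__(self, max_associations):
        self.max_associations = max_associations
        self.routes = {}  # service name -> backend address

    def apply_update(self, desired_routes):
        """Replace the routing table, or reject the whole update if over the cap."""
        if len(desired_routes) > self.max_associations:
            raise RoutingLimitExceeded(
                f"{len(desired_routes)} associations requested, "
                f"limit is {self.max_associations}"
            )
        self.routes = dict(desired_routes)

    def resolve(self, service):
        """Return the backend for a service, or None if no route exists."""
        return self.routes.get(service)


# Hypothetical limit and services, chosen only to keep the example small.
router = Router(max_associations=2)
router.apply_update({"svc-a": "10.0.0.1", "svc-b": "10.0.0.2"})

# A third association pushes the update over the limit, so it is rejected.
try:
    router.apply_update(
        {"svc-a": "10.0.0.1", "svc-b": "10.0.0.2", "portal-ui": "10.0.0.3"}
    )
    update_applied = True
except RoutingLimitExceeded:
    update_applied = False

# The portal backend may be healthy, but the router has no path to it.
portal_route = router.resolve("portal-ui")
```

In this model the existing routes keep working while the new service stays unreachable, which matches the observation that application services were running but had no valid routing path.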

Actions to Prevent Recurrence

  1. Improved Alerting
    Alert thresholds for routing capacity have been adjusted so that alerts fire earlier and escalate in urgency before critical limits are reached.
  2. Service Distribution Across Infrastructure
    Services have been actively rebalanced across multiple routing layers to reduce pressure on any single entry point.
  3. Planned Architecture Improvements
    Work is underway to evaluate longer-term changes to the routing architecture to better support scale and reduce the risk of reaching fixed limits in the future.
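
As a rough illustration of the alerting change described above, the following Python sketch maps routing capacity usage to an alert severity. The function name and the 70% and 85% thresholds are hypothetical placeholders. This report does not state the actual values or tooling in use.

```python
def routing_capacity_severity(active, limit, warn_at=0.70, critical_at=0.85):
    """Classify routing capacity usage as "ok", "warning", or "critical".

    active: current number of service associations
    limit: fixed maximum the routing layer can manage
    warn_at, critical_at: hypothetical usage fractions that trigger alerts
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = active / limit
    if usage >= critical_at:
        return "critical"
    if usage >= warn_at:
        return "warning"
    return "ok"


# Example readings against a hypothetical limit of 100 associations.
readings = [50, 75, 90]
severities = [routing_capacity_severity(n, 100) for n in readings]
```

Alerting on a fraction of the limit rather than on the limit itself leaves time to rebalance services before configuration updates start to fail.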
Posted Jan 23, 2026 - 17:19 CET

Resolved

This incident has been resolved.
Posted Jan 15, 2026 - 19:50 CET

Update

We are continuing to monitor for any further issues.
Posted Jan 15, 2026 - 19:22 CET

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jan 15, 2026 - 19:15 CET

Update

A fix has been deployed and Phrase Portal is now accessible. We continue to monitor performance.
Posted Jan 15, 2026 - 19:15 CET

Identified

Engineers identified the cause of the issue and will deploy a fix.
Posted Jan 15, 2026 - 19:04 CET

Investigating

Phrase Portal is currently not accessible. Our engineering team is investigating the issue.
Posted Jan 15, 2026 - 18:38 CET
This incident affected: Phrase Portal (EU).