Performance Disruption of Phrase TMS (EU) Project Management component between January 27, 2026, 10 AM UTC and 5:38 PM UTC

Incident Report for Phrase

Postmortem

Introduction

We would like to share more details about the events that occurred on January 27, 2026, between approximately 10:00 AM UTC and 5:30 PM UTC (6:30 PM CET), which led to a performance disruption of the Phrase TMS (EU) Project Management component.

During this time, some users experienced slow or unresponsive behavior when creating or editing projects. The issue was caused by exhaustion of the database connections available to the LQA service, which the Project Management component uses internally.

Timeline (UTC)

Jan 27, 2026 @ ~10:00 AM
First customer reports indicate slow or unresponsive project creation and editing in the EU region.

10:54 AM
Automated alert triggered due to a high number of active backend sessions in the EU production environment.

11:00–11:30 AM
Engineering investigation begins. Logs indicate that the LQA service is unable to obtain database connections, with repeated timeouts when requesting connections from the application’s connection pool.

11:40 AM
Application logs show “Too many connections” errors from the underlying database. Some service instances restart as a result.

12:00–3:00 PM
Initial mitigation steps taken:

  • Increase of database connection limits.
  • Increase of application-side connection pool limits.
  • Additional monitoring and logging enabled.

The issue improves but intermittent errors persist.

3:30 PM
Decision made to scale the underlying production database instance to a larger size and adjust connection limits accordingly.

~5:30 PM
Database scaling completed. Error rates drop and no further “Too many connections” errors are observed. System performance stabilizes.
Incident marked as resolved after continued monitoring confirmed stable behavior.

Root Cause

The disruption was caused by exhaustion of available database connections in the LQA service’s underlying production database.

The LQA service uses a connection pool to communicate with the database. Under increased load, the configured connection limits on both the application side and the database side were insufficient. As more requests were processed, all available database connections were consumed. Once the limit was reached:

  • New requests could not obtain a database connection.
  • Requests timed out after waiting for a free connection.
  • Some service instances restarted due to repeated failures.
  • Project creation and editing operations became slow or temporarily unresponsive.
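
For readers who want to see the failure mode in isolation, the following is a minimal sketch of a bounded connection pool whose callers time out once every slot is in use. It is an illustration only, not the LQA service's code: the names `BoundedPool`, `PoolTimeout`, and `simulate`, and the pool sizes and timeout used, are all hypothetical.

```python
import queue


class PoolTimeout(Exception):
    """Raised when no connection becomes free within the wait limit."""


class BoundedPool:
    """Toy connection pool: a fixed number of slots, handed out with a timeout."""

    def __init__(self, max_size: int) -> None:
        self._free: queue.Queue = queue.Queue(maxsize=max_size)
        for conn_id in range(max_size):
            self._free.put(conn_id)  # integers stand in for real connections

    def acquire(self, timeout: float) -> int:
        """Wait up to `timeout` seconds for a free connection."""
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise PoolTimeout(f"no free connection after {timeout}s") from None

    def release(self, conn_id: int) -> None:
        """Return a connection to the pool."""
        self._free.put(conn_id)


def simulate(pool_size: int, concurrent_requests: int, timeout: float = 0.05) -> dict:
    """Hold every acquired connection (as slow requests would) and count outcomes."""
    pool = BoundedPool(pool_size)
    served, timed_out = 0, 0
    for _ in range(concurrent_requests):
        try:
            pool.acquire(timeout)  # never released: the request is still running
            served += 1
        except PoolTimeout:
            timed_out += 1
    return {"served": served, "timed_out": timed_out}
```

With a hypothetical pool of 3 connections and 5 in-flight requests, `simulate(3, 5)` serves three requests and lets two time out, which mirrors the sequence described in the list above.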

Although the database itself was operational, the maximum number of allowed concurrent connections was too low for the actual usage patterns in production. Additionally, the size of the database instance limited how many connections could be supported safely.
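
The relationship between the application-side and database-side limits can be sketched as simple arithmetic. The function names and all numbers below are illustrative assumptions, not the actual production values: if every service instance may open up to its full pool size, the combined demand has to fit within the database's connection limit.

```python
def total_pool_demand(instances: int, pool_size_per_instance: int) -> int:
    """Worst-case number of connections the service can request at once."""
    return instances * pool_size_per_instance


def headroom(db_max_connections: int, instances: int,
             pool_size_per_instance: int, reserved: int = 0) -> int:
    """Connections left over at the database after worst-case pool demand.

    `reserved` is a hypothetical allowance for admin or monitoring sessions.
    A negative result means the database can refuse connections even though
    each application pool is still within its own configured limit.
    """
    return db_max_connections - reserved - total_pool_demand(
        instances, pool_size_per_instance
    )
```

For example, under these assumed figures, `headroom(100, 8, 20)` is negative (8 instances with 20 connections each exceed a limit of 100), so raising the application pool size alone would not help without also raising the database limit or the instance size.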

This combination led to a bottleneck in the LQA service, which in turn affected the Project Management component in the EU region.

Actions to Prevent Recurrence

To reduce the likelihood of similar incidents in the future, we are implementing the following measures:

  1. Database Capacity Increase
    The production database instance for the affected service has been scaled to a larger size to support higher load and more concurrent connections.
  2. Improved Monitoring and Visibility
    Additional database performance monitoring has been enabled to provide better insight into:
      • Active connections
      • Slow queries
      • Resource utilization
  3. Resilience Improvements in Project Management
    We have initiated follow-up work to improve system resilience so that if the LQA service becomes slow or temporarily unavailable, it does not fully block project creation or editing operations.
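
One common way to achieve this kind of isolation is to call the dependency with a deadline and continue in a degraded mode if it does not answer in time. The sketch below shows that general pattern only; it is not the planned implementation, and `call_with_deadline`, `create_project`, `fetch_lqa_settings`, and the `lqa_pending` flag are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

# Shared worker pool so a slow dependency call never runs on the caller's thread.
_executor = ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn: Callable[[], dict], deadline_s: float) -> Optional[dict]:
    """Run `fn` in the background; return None if it is too slow or fails."""
    future = _executor.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        future.cancel()  # best effort: an already-running call is not interrupted
        return None
    except Exception:
        return None  # treat dependency errors the same as a timeout


def create_project(name: str, fetch_lqa_settings: Callable[[], dict],
                   deadline_s: float = 0.1) -> dict:
    """Create a project even when the (hypothetical) LQA lookup is unavailable."""
    lqa = call_with_deadline(fetch_lqa_settings, deadline_s)
    # Degraded mode: the project is created and the LQA data is marked as pending.
    return {"name": name, "lqa": lqa, "lqa_pending": lqa is None}
```

In this sketch, a slow or failing `fetch_lqa_settings` delays project creation by at most `deadline_s` seconds, and the caller can fill in the missing LQA data later.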
Posted Feb 23, 2026 - 08:10 CET

Resolved

The issue has been fixed. Project management is stable.
Posted Jan 27, 2026 - 21:11 CET

Monitoring

A fix has been implemented. Project management is stable and we are monitoring performance.
Posted Jan 27, 2026 - 18:57 CET

Update

A fix is still under active development by our engineering team. The Project Management component is now more stable.
Posted Jan 27, 2026 - 15:34 CET

Update

A fix is still under active development by our engineering team.
Posted Jan 27, 2026 - 14:57 CET

Identified

Our engineering team has identified the root cause as an LQA-related issue and is actively working to resolve it. The project creation and configuration pages may be unavailable during this time.
Posted Jan 27, 2026 - 13:54 CET
This incident affected: Phrase TMS (EU) (Project management).