kagi - Notice history

Search - 100% uptime

[Search] us-east4 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.57%

[Search] us-west2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.56%

[Search] europe-west2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.57%

[Search] asia-east2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.57%

[Search] us-central1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.53%

[Search] europe-west4 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.56%

[Search] asia-southeast1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.49%

[Search] australia-southeast1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.53%

[Search] southamerica-east1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.57%
Assistant - 100% uptime

[Assistant] us-east4 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.61%

[Assistant] us-west2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.73%

[Assistant] europe-west2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 99.65%, Jun 2025 · 99.52%

[Assistant] asia-east2 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.91%

[Assistant] us-central1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.71%

[Assistant] europe-west4 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.71%

[Assistant] asia-southeast1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 99.84%, Jun 2025 · 99.47%

[Assistant] australia-southeast1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.81%

[Assistant] southamerica-east1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.85%
Translate - 100% uptime

[Translate] us-central1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 99.85%, Jun 2025 · 99.72%

[Translate] europe-west4 - Operational

100% - uptime
Apr 2025 · 99.91%, May 2025 · 99.41%, Jun 2025 · 99.69%

[Translate] asia-southeast1 - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 99.83%, Jun 2025 · 99.63%
99% - uptime

https://orionfeedback.org - Operational

99% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 95.77%

https://kagifeedback.org - Operational

99% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 95.69%

https://kagi.com/smallweb - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 100.0%, Jun 2025 · 99.91%

https://kagi.com/fastgpt - Operational

100% - uptime
Apr 2025 · 100.0%, May 2025 · 99.89%, Jun 2025 · 99.94%

https://kagi.com/summarizer - Operational

100% - uptime
Apr 2025 · 99.93%, May 2025 · 99.96%, Jun 2025 · 99.54%

Notice history

Sep 2025

Features release for Kagifeedback & Orionfeedback
  • Completed
    September 10, 2025 at 9:00 AM
    Maintenance has completed successfully
  • In progress
    September 10, 2025 at 8:00 AM
    Maintenance is now in progress
  • Planned
    September 10, 2025 at 8:00 AM
    We're planning to release new features for Kagifeedback.org and will be performing scheduled maintenance. During this maintenance window, https://kagifeedback.org and https://orionfeedback.org will be temporarily unavailable, with expected downtime of up to 1 hour.

https://kagi.com/summarizer outage
  • Resolved
    https://kagi.com/summarizer is now operational! This update was created by an automated monitoring service.
  • Investigating
    https://kagi.com/summarizer cannot be accessed at the moment. This incident was created by an automated monitoring service.

Aug 2025

Major outage across all regions
  • Postmortem

    Hi,

    I'm Daniel, the SRE (Site Reliability Engineer) at Kagi. Working with me on this incident were Zac and Seth, both Senior Engineers.


    Summary

    Search, Assistant, and the API were completely offline, returning 500 or “stream timeout” errors to everyone who tried to access them. This was caused by a database migration holding a transaction lock for the duration of the outage. Once the transaction was aborted, every affected service completely recovered.


    Timeline

    3:09 AM UTC - The process to deploy new code to production was started

    3:09 AM UTC - The DB migration was started in a transaction

    3:13 AM UTC - The us-east Assistant alert fired. This was followed shortly by alerts for every region, including Search and the API.

    3:15 AM UTC - Investigation was started

    3:18 AM UTC - The engineers involved jumped into a shared channel to diagnose the issue

    3:18 AM UTC - First user reports of Kagi.com being down (Thanks to Pasithea (Pas) from Discord!)

    3:24 AM UTC - Transaction locking was identified as a probable cause of the failed requests

    3:30 AM UTC - The transaction was stopped by an automated process and Kagi.com recovered

    3:32 AM UTC - The hung migration was identified as the source of the transaction lock


    Root Cause

    Why was Kagi.com offline?

    Any request that needed to fetch information from the database was timing out, causing Search, Assistant, and the API to go offline.


    Here are the 5 whys:

    Q: Why was the information unable to be fetched from the database?

    A: The connection pool was full: it had slowly filled up with connections waiting on a specific transaction to finish. Because we cap the number of database connections allowed per VM, no new connections were available to handle user information queries.

    Q: Why were the connections being blocked by a transaction?

    A: A running migration had locked an entire table in a long-running transaction. Any query against this table had to wait for the transaction to finish before it could fetch data.

    Q: Why was the migration taking so long?

    A: The migration was building an index to let users run a full-text search over their Assistant interactions. This involved reading the messages from disk, indexing them, and then writing the index back to disk (see the sketch after this list).

    Q: Why did we not expect it to take long?

    A: We followed the recommended and documented way to set up this index, and when the migration ran in our staging environment, it finished without timing out or blocking connections.

    Q: Why was running it in staging different from production?

    A: Our staging database is not a good representation of production, as it stores far less data.


    Resolution and Recovery

    Once the migration was stopped, the blocked connections in the connection pool were able to finish. This freed up the connections to handle other things like serving searches, the API, and the Assistant.

    After verifying that everything was working as expected and nothing else needed to be done, we flipped a switch in our CI/CD system to prevent any future production deployments from happening until the migration was removed.

    Once we have verified that the bad migration has been removed, we will turn the switch back off and allow production deployments to continue.


    Corrective and Preventative Measures

    The root cause of the issue was that our staging environment stores far less data than production. That difference prevents staging from surfacing certain kinds of issues before they reach production.

    We will be filling our staging database to a size comparable to production, so that future migrations and changes to our SQL queries are exercised against a realistic amount of data.

    As the transaction was already cancelled, we did not need to make any changes in production to delete a partially changed table.

    Thanks for being patient with us during this outage. As always, we are blown away by the positive community response when we have these kinds of issues.

    Thank you!

    Daniel

  • Resolved

    We know the cause of the issue, and the issue is now resolved.

    We will do a post-mortem of the incident sometime soon.

  • Monitoring

    The service is coming back, but we are still monitoring.

  • Investigating
    We are currently investigating this incident.

Jul 2025
