Intermittent issues with SharePoint/OneDrive/MS Teams
10:00 AM Update (12/9/19): Microsoft continues to monitor performance. While it appears that the services have stabilized, they are not yet ready to declare victory. The next update will be sent either when there is a significant status change or when Microsoft confirms that the issues are resolved.
8:20 AM Update (12/7/19): Microsoft has rolled back a recent configuration change within the SharePoint Online caching infrastructure that they believe to have been causing increased failure rates and latency for customers accessing SharePoint, OneDrive and MS Teams.
10:00 AM Update (12/6/19): Microsoft has determined that the issues started on Sunday evening (12/1/19), rather than Tuesday as first reported. They have also broadened the scope of the issue. Below is the latest update from Microsoft.
User Impact: Users may see latency or intermittent page load failures for SharePoint Online sites and accessing OneDrive files.
More info: Users will primarily see impact to SharePoint Online and OneDrive for Business. We’ve confirmed that some users will also see additional impact to PowerApps, PowerAutomate, OneDrive Sync, Word, Excel, OneNote, and background applications. During periods of impact, users may experience latency when accessing OneDrive files or receive a “503 Service Unavailable” or “Server busy” message from SharePoint. While our throttles are in place, users may experience problems when performing migrations, OneDrive for Business syncs, OneNote syncs and large file uploads. As part of our on-going investigation to identify the cause of the issue, we’ve confirmed that impact began earlier than what we originally identified, and have updated the impact start time accordingly.
Current status: We’re continuing to work on multiple lines of investigation and mitigation. Some testing of the mitigation options has yielded positive results. We’re validating the results of these mitigation tests to confirm whether they can be applied to the rest of the affected infrastructure. During this time, some users may intermittently experience impact. However, we are continuing to monitor traffic across the EMEA region to ensure optimum service availability and to take appropriate mitigation actions as needed.
3:45 PM Update (12/5/19): While Microsoft hasn’t identified a definitive root cause at this time, they are focusing on a recent code change that may be causing the impact.
9:00 AM Update (12/5/19): Microsoft has not yet determined the root cause. Based on the small number of issues the Harvard community has reported to date, however, they seem to be doing a reasonably good job of optimizing service load through targeted mitigations to ensure service availability while they continue to investigate.
HUIT has been communicating with our Account Executive, has expressed concerns, and has asked for an ETA on resolution.
8:30 PM Update (12/4/19): Microsoft continues to monitor the situation and is seeing indications that their throttling actions have been successful in stabilizing the service. They continue to investigate the cause of the problem.
2:00 PM Update (12/4/19): We have no additional updates. Microsoft continues to investigate the issue, monitoring impacts and remediating as necessary.
10:00 AM Update (12/4/19): Microsoft continues to investigate the root cause while, in parallel, monitoring performance and applying throttling as needed. The next update will be sent by 2 PM.
5:20 AM Update (12/4/19): At 5:00 AM, Microsoft reported that their monitoring tools once again indicate that users may experience slowness and timeouts when accessing SharePoint sites, OneDrive content and MS Teams Files.
They are restarting the infrastructure components they believe to be impacted and applying targeted throttles to optimize service performance as they continue work to isolate the source of the underlying issue.
7:00 PM Update (12/3/19): Microsoft has identified the root cause and shared the following details to date: “An influx in traffic through the affected infrastructure that routes requests to the user’s SQL content databases has resulted in delays and intermittent access issues.” They continue to monitor infrastructure health and report that the dynamic throttling implemented earlier today is alleviating impact while they develop a long-term fix for the problem.
4:00 PM Update (12/3/19): Microsoft continues to work on formulating a long-term fix for the problem. In the interim, they believe the configuration changes they implemented earlier today will help mitigate the issues in the short term.
We are experiencing intermittent issues accessing SharePoint sites, OneDrive for Business Content and MS Teams file repositories.
According to Microsoft, the issues began at approximately 11:30 AM this morning. As of the time of this email, however, only a handful of users in our environment have reported any impact.
Start time: December 3, 2019 11:33 AM
Status: Service degradation
User impact: Users may intermittently be unable to access OneDrive for Business content.
Title: Can’t access content
Current status: We’re applying a dynamic throttle that is expected to throttle non-critical tasks and systemically lower utilization. We’re also implementing a configuration change that will allow the affected infrastructure to process additional service load while keeping resources balanced.
Scope of impact: Your organization is affected by this event, and all users attempting to access OneDrive for Business content may be intermittently impacted.