Azure ATP Backup
While on assignment at a client, managing their email system, I was alerted by the helpdesk that mail from SMTP domains on the “Allow” list was being trapped in the company spam filter.
Logging into the Microsoft ATP console, I got the shock of a lifetime: expecting to see over 400 allowed domains, I found a total of 3.
A look in the Office 365 Audit log showed that a specific operations team member had deleted the allowed domains from the policy. Case closed, right? Wrong.
I called the operations team member and asked what he had done and why he would delete over 400 entries from the Allow list. He said he hadn’t, and I believed him: manually deleting over 400 entries in the ATP console would take hours. After talking with the CIO and assuring him that I would get to the bottom of this, I had my work cut out for me at 3:30 on a Friday afternoon.
Get to work!
Step 1: Recover the allow list and get it back into ATP. Luckily, I was able to export the allow list from the previous spam filter. Combined with the helpdesk tickets that were being generated, that let me recover close to 80% of the lost list.
Step 2: How did this happen? After carefully listening to the operations tech, I retraced his steps with my own account and was able to reproduce the disappearance of the allow list, which later resulted in a bug being filed with Microsoft. To reproduce the bug, the operator would open a policy, edit the allow list, navigate to a different policy, edit its allow list, and then return to the original list. This sequence of steps would wipe out the contents of the allow list in the editor. If the operator was inattentive and clicked save, the elements then in the allow array became the new entries in the allow list for the policy.
Step 3: How to prevent this from happening again? Stop operations from making any changes to the ATP policies for now. Not a long-term fix by any means.
Step 4: Develop a process for adding and removing domains from the allow list programmatically. PowerShell to the rescue. I developed a PowerShell script that would take a ‘master’ domain list, connect to Office 365 ATP, and enumerate the entries in the policy to work out which domains to add and which to remove. Basically, I would copy the list of allowed domains into a temp file and then use the PowerShell Compare-Object cmdlet: if an entry was flagged with a ‘<=’ side indicator the domain was added to the ATP policy, and if it was flagged ‘=>’ the domain was removed from it.
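The PowerShell script itself is not reproduced here. As a rough illustration of the comparison step only, here is a minimal sketch in Python (not the original code) that works out which domains would have to be added or removed to bring a policy in line with a master list. The names `normalize` and `plan_changes` and the sample domains are hypothetical, and nothing in it talks to Office 365.

```python
def normalize(domains):
    """Lower-case and strip entries, dropping blanks, so comparison ignores case."""
    return {d.strip().lower() for d in domains if d.strip()}


def plan_changes(master, current):
    """Return (to_add, to_remove) needed to make `current` match `master`.

    to_add    -> present only in the master list (what Compare-Object flags
                 '<=' when the master list is the reference object)
    to_remove -> present only in the policy (flagged '=>')
    """
    master_set = normalize(master)
    current_set = normalize(current)
    to_add = sorted(master_set - current_set)
    to_remove = sorted(current_set - master_set)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    master = ["contoso.com", "Fabrikam.com", "example.org"]
    current = ["contoso.com", "oldpartner.example"]
    to_add, to_remove = plan_changes(master, current)
    print("add:", to_add)
    print("remove:", to_remove)
```

Treating the master list as the single source of truth means the policy is never edited by hand; every change starts as an edit to the master file, and the script applies the difference.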
Step 5: Problem solved. The operations team could now bypass the buggy ATP interface and carry on with their daily management of the domain allow list, with a master copy of the list and verbose logging each time the script executed.
Step 6: Log a call with Microsoft and spend the next two weeks demonstrating this newly discovered bug to every level of support within Office 365 Development. Once Microsoft confirmed this was a bug, I was happy to inform the CIO and, in turn, restore some credibility to the operations team.
The lesson here is that even a company as large as Microsoft can lose your configurations. Always keep a backup!
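One simple way to keep such a backup is to write a timestamped snapshot of the allow list to disk before every change. The sketch below assumes the list has already been exported by some other means; `backup_allow_list` and `restore_allow_list` are hypothetical helper names, and the file naming scheme is just one possibility.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_allow_list(domains, backup_dir):
    """Write the given domains to a timestamped JSON file and return its path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp with microseconds, so snapshots sort by name.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = backup_dir / f"allow-list-{stamp}.json"
    path.write_text(json.dumps(sorted(domains), indent=2), encoding="utf-8")
    return path


def restore_allow_list(path):
    """Read a snapshot written by backup_allow_list back into a list."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical sample data written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        saved = backup_allow_list(["contoso.com", "example.org"], tmp)
        print(saved.name, restore_allow_list(saved))
```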
-Joseph Noga, CTO of Komodo Cloud