So the other day I moved a web role containing WCF services over to an Azure website. It seemed like a breeze: after deployment I called up the .svc file in the browser and all seemed fine. However, when I tested with an actual client of the service, it received only 502 Bad Gateway responses.
Now there are lots of reasons 502 responses happen, especially in cloud environments where load balancers and whatnot sit between you and the site/service. However, after some research a pattern started to emerge that made infrastructure problems seem an unlikely cause, and a few seemingly random questions on Stack Overflow made me consider: might the problem be caused by my own code or configuration?
You see, a regular website or service should usually not respond with a 502 Bad Gateway itself; this is mostly something proxies, load balancers and the like do (as far as I know). In this case too, the error is returned by some intermediate device and not by the web server itself. The intermediate device does this because the website severed the TCP connection abruptly, for instance because the application pool for the website was shut down unexpectedly. And in a .NET WCF service, what causes the application pool to shut down unexpectedly is usually something that brings the .NET application domain down: stuff like OutOfMemoryException, StackOverflowException and the like.
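To make that mechanism a bit more concrete, here is a minimal sketch in Python (not .NET, and certainly not how Azure's front end is actually built). It assumes a toy "backend" that reads the request and then hangs up without answering, much like a worker process that just died, and a toy "gateway" in front of it. All names here (start_crashing_backend, make_gateway_handler, status_seen_by_client, the /Service.svc path) are made up for the illustration.

```python
import http.client
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_crashing_backend():
    """Listen on a free local port and drop every connection without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)

    def serve():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return  # listener was closed
            try:
                conn.recv(4096)  # read the request...
            except OSError:
                pass
            conn.close()  # ...then hang up, like a dying worker process

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


def make_gateway_handler(backend_port):
    """Build a handler that forwards GET requests to the backend."""

    class GatewayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                upstream = http.client.HTTPConnection(
                    "127.0.0.1", backend_port, timeout=5)
                upstream.request("GET", self.path)
                response = upstream.getresponse()
                status, body = response.status, response.read()
            except (http.client.HTTPException, OSError):
                # The backend severed the connection: all the gateway can
                # honestly tell the client is "Bad Gateway".
                status, body = 502, b"Bad Gateway"
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the demo quiet

    return GatewayHandler


def status_seen_by_client():
    """Call the backend through the gateway and return the HTTP status."""
    backend_port = start_crashing_backend()
    gateway = HTTPServer(("127.0.0.1", 0), make_gateway_handler(backend_port))
    threading.Thread(target=gateway.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/Service.svc" % gateway.server_address[1]
    # Ignore any system-wide proxy settings for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=10) as reply:
            return reply.status
    except urllib.error.HTTPError as err:
        return err.code
    finally:
        gateway.shutdown()
        gateway.server_close()


if __name__ == "__main__":
    print(status_seen_by_client())
```

The point of the sketch is only that the client never sees the backend's failure directly: the backend says nothing at all, and the 502 is produced entirely by the thing sitting in front of it.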
If you don’t catch these kinds of exceptions yourself (and indeed you usually should not, but that is another discussion entirely) and they bring down the application domain, no logging is done whatsoever (not as far as I could find, and I’ve searched for quite a while). So the best way to find out what is really going on is to remote debug the Azure website. A good tutorial on that can be found here. Be sure to deploy a debug build of your website for the easiest debugging.
So now that you have the debugger connected, hit the offending service with your client, and presto… you get a nice unhandled exception pop-up, which will make you google some more, find a solution for that problem, and then you have rid yourself of that pesky 502 error. Except… in my case no unhandled exception popped up. I double checked my exception handling settings (twice) to make sure I had them set correctly. So this means… it’s not my code…
Back to the debugger. This time I turned off the ‘Just My Code’ feature in the debugger settings, hit the service again, and was presented with an actual unhandled exception. My particular problem was related to the one described in this Stack Overflow post.
I hope writing these steps down lets me (and maybe someone else) fix it considerably faster next time I hit this error. This was quite a long afternoon of headaches I’d love to get back.