Friday, May 2, 2025

Understanding Relative URLs in .NET with HttpClient

I've been doing this a long time. More than 20 years as a web developer in various roles. Nearly all of it in the .NET Framework. And today I realized that I am still ignorant of some of the basics of how the internet is supposed to work.

This week I ran into a nasty problem where one service was attempting to call another web service, but instead of getting a good result, I would get a 500 response from the called service. The strange thing was that the targeted service worked fine when called by Postman or curl. Even stranger, the targeted service did not log anything for the failed response. It's as if the call never happened.

I got deep into the bowels of our service libraries trying to figure out why the first service couldn't reach the second service. I dug into our middleware layers looking for problems with exception handling, with logging, and came up empty. No matter what I tried, my calling service always got a 500 response back. It made no sense at all.

Another oddity was that I could get the service call to succeed in my local sandbox, but it wouldn't work deployed to an external environment. That is, it would work for things like "localhost:80/api/something" but not "http://test.mydomain.com/api/something". 

In .NET, you make a request by giving the HttpClient a base URL (its BaseAddress) and giving the HttpRequestMessage the relative URL of the endpoint you're trying to reach. HttpClient combines the two to build the final URL being targeted. Like most web developers, I wrongly assumed that this was just some formal way of joining the strings together.

After a lot of googling, I discovered my error. HttpClient follows the spec for Relative Uniform Resource Locators: RFC 1808, Section 4 (since superseded by RFC 3986, Section 5). This means that the rules for joining a base URL with a relative URL are not as simple as joining two strings. As Walter Sobchak famously opined, "Mark it zero...there are rules!" You need to understand how relative paths are applied when creating a request.

The gist of it is this: your base URL needs to have a trailing slash if you expect the relative URL to be appended to the end without any additional changes. If you don't include a trailing slash, the last segment of the base URL is treated as a resource (like a file name) rather than a directory, and it gets replaced by the relative URL.

base: http://a/b/c

relative: x/y

result: http://a/b/x/y

base: http://a/b/c/

relative: x/y

result: http://a/b/c/x/y
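
You can see the same two results without making any network calls by resolving the URLs with System.Uri directly. This is only a minimal sketch: the class name is made up for illustration, and the third case (a relative URL with a leading slash) is not one of the examples above; I added it because it follows from the same rules.

```csharp
using System;

class RelativeUrlDemo
{
    static void Main()
    {
        // No trailing slash on the base: the last segment "c" is replaced.
        var withoutSlash = new Uri(new Uri("http://a/b/c"), "x/y");
        Console.WriteLine(withoutSlash); // http://a/b/x/y

        // Trailing slash on the base: the relative URL is appended.
        var withSlash = new Uri(new Uri("http://a/b/c/"), "x/y");
        Console.WriteLine(withSlash); // http://a/b/c/x/y

        // Extra case (not from the examples above): a leading slash on the
        // relative URL replaces the entire path of the base.
        var rooted = new Uri(new Uri("http://a/b/c/"), "/x/y");
        Console.WriteLine(rooted); // http://a/x/y
    }
}
```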

In my case my base URL was "http://test.mydomain.com/target-service" and my relative URL was "api/resource". What actually got called was "http://test.mydomain.com/api/resource". Instead of going to the web service I expected, the call went to the default service, which in this case was the web UI service. The web UI was expecting headers for authentication and such, which the request did not have, so it threw a 500 error (it probably should have been a 401, which added a layer to this onion). This explains why I didn't get a 404 and why I didn't see errors in the targeted service's logs. I did indeed find error log entries on the web UI service for the failed call with the bad URL. (Bonus: the reason it worked for localhost:80 is that my local base URL had no path at all after the port number, and a base URL with an empty path is treated as if it ended in a trailing slash!)
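
Here is a rough sketch of that failure and one possible fix, using the URLs from my case. EnsureTrailingSlash is a hypothetical helper I made up for illustration, not part of .NET, and it assumes the base URL has no query string or fragment. I resolve the URLs with System.Uri rather than sending a real request, on the assumption that this mirrors how HttpClient combines BaseAddress with a relative request URL.

```csharp
using System;
using System.Net.Http;

class BaseAddressDemo
{
    // Hypothetical helper: adds a trailing slash if the base URL lacks one.
    // Assumes the URL has no query string or fragment.
    static Uri EnsureTrailingSlash(string baseUrl) =>
        new Uri(baseUrl.EndsWith("/", StringComparison.Ordinal) ? baseUrl : baseUrl + "/");

    static void Main()
    {
        // The bug: "target-service" is dropped during resolution.
        var broken = new Uri(new Uri("http://test.mydomain.com/target-service"), "api/resource");
        Console.WriteLine(broken); // http://test.mydomain.com/api/resource

        // The fix: make sure BaseAddress ends with a slash.
        using (var client = new HttpClient())
        {
            client.BaseAddress = EnsureTrailingSlash("http://test.mydomain.com/target-service");
            var resolved = new Uri(client.BaseAddress, "api/resource");
            Console.WriteLine(resolved); // http://test.mydomain.com/target-service/api/resource
        }

        // Why localhost:80 worked: a base URL with no path gets the path "/".
        Console.WriteLine(new Uri("http://localhost:80").AbsolutePath); // /
    }
}
```

Normalizing the base URL once, wherever the HttpClient is configured, keeps the relative URLs in the rest of the code free of leading slashes, which would cause a different surprise under the same rules.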

There are quite a few Stack Overflow questions about this, with numerous comments complaining about this implementation, some people calling it a bug or wrong. Sorry, but Microsoft got it right, and sadly, other than the spec, I could not find any tutorials that explain how this is supposed to work. It's something you're bound to learn the hard way, I guess.