Question for Sys admins about trailing slash
With that said, how much of a difference in load times is it going to make if you use example.com/blah versus example.com/blah/? The server does have to do an internal redirect of some sort without the slash, correct? Or no? There are a lot of blogs saying it slows down the load times, but I want to get to the bottom of this for sure.
Thank you, gals/guys!
15 Replies
@sweh:
Depending on how you configured your web server it may be an external redirect (historically that's what it was) so without a trailing / the client needs to make 2 queries to get the required page.
Could you explain the basics of how it would be an external or internal redirect? And how you could make a non-trailing slash only require one query instead?
I realize everyone's time on this forum in particular is valuable; thank you for the help.
I generally redirect, say,
In my humble opinion, there should be one and only one canonical URL for any public-facing resource, and any alternate forms of that URL should 301 to the canonical form. Also, that canonical form (unless it is actually dynamically generated based on user input, e.g. a search, or isn't seen by humans) should be a good-looking URL that doesn't have server-specificity (an individual server's hostname), file extensions, ? query stuff, ugly hashes, or upper-case letters. In other words, URLs like
http://webserver43.cluster.example.com/WankApp/index.php?disContent=d4017948438805dd3371c02706bcc36864a836c920c99100262a47502ac41859923c0c343f86412b0cdb5eb9115bc68708f6d2055ffd19d591edaa1d309bb064
ought to be a criminal offense. The resources required to redirect URLs that do|don't end with / to the "proper" form are comparatively minor. Just choose one way to do things and make it so.
There are many blogs that suggest that not using a slash on the end increases the workload of your server internally, thereby slowing load times. I see what you both mean by external now.
Thanks again.
@Akuta:
@sweh: Depending on how you configured your web server it may be an external redirect (historically that's what it was) so without a trailing / the client needs to make 2 queries to get the required page.
Could you explain the basics of how it would be an external or internal redirect? And how you could make a non-trailing slash only require one query instead?
I realize everyone's time on this forum in particular is valuable; thank you for the help.
Here's an example of a generic Apache install with no tuning or optimisation. The "tmp" resource is really a directory. So:
% telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /tmp HTTP/1.0
Host: localhost
HTTP/1.1 301 Moved Permanently
Date: Sat, 14 Jul 2012 12:27:17 GMT
Server: Apache/2.2.3 (CentOS)
Location: http://localhost/tmp/
Content-Length: 303
Connection: close
Content-Type: text/html; charset=iso-8859-1
We can see the server is telling the client to request a different page (this time with the trailing /). So now the client has to make a second query of the webserver so that's two sets of http requests and two sets of processing.
It's recommended to always put a / at the end of URLs that refer to directories.
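For anyone who would rather script that check than type it into telnet, here is a minimal Python sketch of the same probe. It relies on the fact that `http.client` does not follow redirects by itself, so a 301 is reported rather than silently chased. The function name `probe` is made up for this example, and whatever host and path you pass in are your own.

```python
import http.client


def probe(host, path, port=80, timeout=5.0):
    """Send a single GET for `path` and return (status, Location header or None).

    http.client does not follow redirects, so a directory requested
    without its trailing slash shows up here as a 301 plus a Location.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `probe("localhost", "/tmp")` against a server like the one above should hand back the 301 and the slash-terminated Location, while `probe("localhost", "/tmp/")` should return the page directly. What you actually get depends on how your server is configured.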
@hoopycat:
There is absolutely no HTTP-level difference between
http://example.com/blah and http://example.com/blah/ ,
There is a difference; they are two different URIs and can be processed differently. In Apache, if you use mod_rewrite (or other modules that use the requested URI), this difference can catch you out if you're not careful. The default (and common, and expected) behaviour on many servers is that if the target of the URI is a directory and the URI doesn't end in a /, the server sends a 301 redirect telling the client to access it as a directory.
> In my humble opinion, there should be one and only one canonical URL for any public-facing resource
Typically, on web servers, if you request a directory then it will search for some form of index file (eg index.html) in the directory to display; so "/tmp/" and "/tmp/index.html" might be two URLs for the same resource.
All of this has happened since 1995, when I started playing with web servers
@Akuta:
There are many blogs that suggest that not using a slash on the end increases the workload of your server internally, thereby slowing load times. I see what you both mean by external now.
They are correct; missing the trailing / will cause additional unnecessary traffic and work.
@sweh:
@hoopycat: There is absolutely no HTTP-level difference between
http://example.com/blah and http://example.com/blah/ ,
There is a difference; they are two different URIs and can be processed differently. In Apache, if you use mod_rewrite (or other modules that use the requested URI), this difference can catch you out if you're not careful. The default (and common, and expected) behaviour on many servers is that if the target of the URI is a directory and the URI doesn't end in a /, the server sends a 301 redirect telling the client to access it as a directory.
> In my humble opinion, there should be one and only one canonical URL for any public-facing resource
Typically, on web servers, if you request a directory then it will search for some form of index file (eg index.html) in the directory to display; so "/tmp/" and "/tmp/index.html" might be two URLs for the same resource. All of this has happened since 1995, when I started playing with web servers
:-)
I understand what you're saying about /tmp/ if it's a real directory, but if we're talking about a virtual directory set up as /virtual-directory, does the server still internally have to 301 to /virtual-directory/ to check for an "index" file, even though there's no difference client-side? (I hope that made sense, hah)
@sweh:
There is a difference; they are two different URIs and can be processed differently.
You are indeed correct… I meant to say that there's no difference, at the specification level, with regard to how they are processed by a server. They are two entirely different URLs.
Also, you're right about how stuff is handled when serving out static files, etc. It's been a while since I've just had a pile of HTML files to serve out, and I forget that using the filesystem as a database is a perfectly acceptable choice.
@Akuta:
if we're talking about a virtual directory set up as /virtual-directory, does the server still internally have to 301 to /virtual-directory/ to check for an "index" file, even though there's no difference client-side? (I hope that made sense, hah)
I don't know what you mean by "virtual directory".
If you're implementing a URI using mod_rewrite (or similar) then these rewrites will occur before the filesystem is accessed. So if your rewrite is for /virtual-directory/(.*) then the / needs to be in the URL to match the rule. No / means no match, and 'cos there's no real directory you won't get a redirect either; you'll likely get a 404 "page not found" error.
If you're using the Apache "Alias" directive (eg Alias /icons/ "/var/www/icons/") then you see a similar thing; the / is in the alias and so it needs to be in the request; no trailing / means no match and no page returned (404).
If you do "Alias /icons /var/www/icons" then /icons will be 301 redirected to /icons/ and /iconsxxx will likely fail.
If you're implementing a URI via some rewrite rule (eg /xyz -> myphpmodule) then it will depend on how the module implements its pattern matching.
So it's all dependent on how you implement "Virtual directory".
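To make the pattern-matching point concrete, here is a small sketch using Python's `re` module as a stand-in. This is not mod_rewrite itself, and the two patterns are hypothetical examples of how such a rule might be written. The point is only that a pattern with the / baked in does not match the slashless URL, while a pattern with an optional slash matches both forms and still rejects a longer name that merely shares the prefix.

```python
import re

# Hypothetical patterns, in the spirit of a rewrite rule's regex.
STRICT = re.compile(r"^/virtual-directory/(.*)$")        # trailing slash required
LENIENT = re.compile(r"^/virtual-directory(?:/(.*))?$")  # trailing slash optional


def matches(pattern, path):
    """Return True if the whole request path matches the pattern."""
    return pattern.match(path) is not None


for path in (
    "/virtual-directory",
    "/virtual-directory/",
    "/virtual-directory/a.html",
    "/virtual-directoryxxx",
):
    print(path, "strict:", matches(STRICT, path), "lenient:", matches(LENIENT, path))
```

With the strict pattern the slashless request falls through to whatever the server does next, which is where the 404 comes from when there is no real directory behind it.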
Why don't you just test? Request a URL (as in my example) without the / and see if you get a 301 response?
@hoopycat:
Also, you're right about how stuff is handled when serving out static files, etc. It's been a while since I've just had a pile of HTML files to serve out, and I forget that using the filesystem as a database is a perfectly acceptable choice.
:-)
I'm an old school guy; if it's not a text file on the filesystem then it's too complicated a solution :-)
@Akuta:
…please god don't talk about SEO.
As of yet I haven't seen Him directly reply on the forums, though I certainly found it interesting that you would pray in a forum posting.
James
@Akuta:
I understand what you're saying about /tmp/ if it's a real directory, but if we're talking about a virtual directory set up as /virtual-directory, does the server still internally have to 301 to /virtual-directory/ to check for an "index" file, even though there's no difference client-side? (I hope that made sense, hah)
Since most of this thread has referred to static files, if by "virtual directory" you mean a location being handled purely in code as a representation of a resource on your site, then no, there's no requirement for a redirect, as you can return whatever data you want for whatever URI you receive. If you are using a framework for your code you may find differences in how you receive a request (or it is routed by the framework) with or without a trailing slash that you'll need to be aware of. But in the end it's up to you how you define your URI structure (and whether or not a trailing slash changes the interpretation of a resource).
Of course, when writing your application, you may find yourself implementing redirects even though they aren't technically required. For example, let's say you decide to return a resource under "/virtual-directory". Do you now preclude using "/virtual-directory/", do you return something different, or do you return the same thing?
If you're going to return the same thing, doing an internal redirect hides the fact that the two URIs are the same resource, which can affect search engine indexing, for example. It's actually better to do the external redirect to ensure that regardless of the initial request, the final resource being delivered has a canonical name. Of course, getting folks to use the canonical name in the first place then avoids the redirect. Any internal links within your site should always use the canonical versions.
– David
PS: If at any point your code hands back control to the web server (say to more quickly satisfy a static file component), you'll be subject to that server's rules, which may be another reason to do some of your own redirects to align yourself with your local server behavior, at least for any URIs being satisfied directly by the server.
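As a rough illustration of the "pick one canonical form and externally redirect to it" approach described above, here is a minimal, framework-free Python sketch. The function names `canonical_redirect` and `respond` and the `want_trailing_slash` switch are invented for this example, and it assumes it is handed only the path part of the URL (no query string). A real application would hook this into whatever routing its framework provides.

```python
def canonical_redirect(path, want_trailing_slash=True):
    """Return the canonical path to 301 to, or None if `path` is already canonical.

    Assumption: `path` is just the URL path, with no query string.
    """
    if path == "/":
        return None  # the root has only one form
    has_slash = path.endswith("/")
    if want_trailing_slash and not has_slash:
        return path + "/"
    if not want_trailing_slash and has_slash:
        return path.rstrip("/") or "/"
    return None


def respond(path):
    """Return (status, headers): a 301 for non-canonical paths, else a 200."""
    target = canonical_redirect(path)
    if target is not None:
        return 301, {"Location": target}
    return 200, {}
```

Either choice of canonical form works. What matters for the single-request case is that links inside the site already use the form that `canonical_redirect` leaves alone.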