Comprehensive guide to .htaccess
Tutorial written and contributed by Feyd, moderator of the JK Forum. Please see tutorial footnote for additional/bio info on author.
I am sure that most of you have heard of htaccess, if just vaguely, and that you may think you have a fair idea of what can be done with an htaccess file. You are more than likely mistaken about that, however. Regardless, even if you have never heard of htaccess and what it can do for you, the intention of this tutorial is to get you two moving along nicely together.
If you have heard of htaccess, chances are that it has been in relation to implementing custom error pages or password protected directories. But there is much more available to you through the marvelously simple .htaccess file.
A Few General Ideas
An htaccess file is a simple ASCII file, such as you would create through a text editor like NotePad or SimpleText. Many people seem to have some confusion over the naming convention for the file, so let me get that out of the way.
.htaccess is the file extension. It is not file.htaccess or somepage.htaccess, it is simply named .htaccess
In order to create the file, open up a text editor and save an empty page as .htaccess (or type in one character, as some editors will not let you save an empty page). Chances are that your editor will append its default file extension to the name (ex: for Notepad it would call the file .htaccess.txt). You need to remove the .txt (or other) file extension in order to get yourself htaccessing--yes, I know that isn't a word, but it sounds keen, don't it? You can do this by right clicking on the file and renaming it by removing anything that doesn't say .htaccess. You can also rename it via telnet or your ftp program, and you should be familiar enough with one of those so as not to need explaining.
htaccess files must be uploaded as ASCII mode, not BINARY. You may need to CHMOD the htaccess file to 644 or (RW-R--R--). This makes the file usable by the server, but prevents it from being read by a browser, which can seriously compromise your security. (For example, if you have password protected directories, if a browser can read the htaccess file, then they can get the location of the authentication file and then reverse engineer the list to get full access to any portion that you previously had protected. There are different ways to prevent this, one being to place all your authentication files above the root directory so that they are not www accessible, and the other is through an htaccess series of commands that prevents itself from being accessed by a browser, more on that later)
Most commands in htaccess are meant to be placed on one line only, so if you use a text editor that uses word-wrap, make sure it is disabled or it might throw in a few characters that annoy Apache to no end, although Apache is typically very forgiving of malformed content in an htaccess file.
htaccess is an Apache thing, not an NT thing. There are similar capabilities for NT servers, though in my professional experience and personal opinion, NT's ability in these areas is severely handicapped. But that's not what we're here for.
htaccess files affect the directory they are placed in and all sub-directories, that is an htaccess file located in your root directory (yoursite.com) would affect yoursite.com/content, yoursite.com/content/contents, etc. It is important to note that this can be prevented (if, for example, you did not want certain htaccess commands to affect a specific directory) by placing a new htaccess file within the directory you don't want affected with certain changes, and removing the specific command(s) from the new htaccess file that you do not want affecting this directory. In short, the nearest htaccess file to the current directory is treated as the htaccess file. If the nearest htaccess file is your global htaccess located in your root, then it affects every single directory in your entire site.
Before you go off and plant htaccess everywhere, read through this and make sure you don't do anything redundant, since it is possible to cause an infinite loop of redirects or errors if you place something weird in the htaccess.
Also... some sites do not allow use of htaccess files, since depending on what they are doing, they can slow down a server overloaded with domains if they are all using htaccess files. I can't stress this enough: You need to make sure you are allowed to use htaccess before you actually use it. Some things that htaccess can do can compromise a server configuration that has been specifically setup by the admin, so don't get in trouble.
Successful Client Requests
|
200 |
OK |
201 |
Created |
202 |
Accepted |
203 |
Non-Authorative Information |
204 |
No Content |
205 |
Reset Content |
206 |
Partial Content |
Client Request Redirected
|
300 |
Multiple Choices |
301 |
Moved Permanently |
302 |
Moved Temporarily |
303 |
See Other |
304 |
Not Modified |
305 |
Use Proxy |
Client Request Errors
|
400 |
Bad Request |
401 |
Authorization Required |
402 |
Payment Required (not used yet) |
403 |
Forbidden |
404 |
Not Found |
405 |
Method Not Allowed |
406 |
Not Acceptable (encoding) |
407 |
Proxy Authentication Required
|
408 |
Request Timed Out |
409 |
Conflicting Request |
410 |
Gone |
411 |
Content Length Required |
412 |
Precondition Failed |
413 |
Request Entity Too Long |
414 |
Request URI Too Long |
415 |
Unsupported Media Type |
Server Errors
|
500 |
Internal Server Error |
501 |
Not Implemented |
502 |
Bad Gateway
|
503 |
Service Unavailable
|
504 |
Gateway Timeout
|
505 |
HTTP Version Not Supported
|
|
Error Documents
In order to specify your own ErrorDocuments, you need to be slightly familiar with the server returned error codes. (List to the right). You do not need to specify error pages for all of these, in fact you shouldn't. An ErrorDocument for code 200 would cause an infinite loop, whenever a page was found...this would not be good.
You will probably want to create an error document for codes 404 and 500, at the least 404 since this would give you a chance to handle requests for pages not found. 500 would help you out with internal server errors in any scripts you have running. You may also want to consider ErrorDocuments for 401 - Authorization Required (as in when somebody tries to enter a protected area of your site without the proper credentials), 403 - Forbidden (as in when a file with permissions not allowing it to be accessed by the user is requested) and 400 - Bad Request, which is one of those generic kind of errors that people get to by doing some weird stuff with your URL or scripts.
In order to specify your own customized error documents, you simply need to add the following command, on one line, within your htaccess file:
You can name the pages anything you want (I'd recommend something that would prevent you from forgetting what the page is being used for), and you can place the error pages anywhere you want within your site, so long as they are web-accessible (through a URL). The initial slash in the directory location represents the root directory of your site, that being where your default page for your first-level domain is located. I typically prefer to keep them in a separate directory for maintenance purposes and in order to better control spiders indexing them through a ROBOTS.TXT file, but it is entirely up to you.
If you were to use an error document handler for each of the error codes I mentioned, the htaccess file would look like the following (note each command is on its own line):
ErrorDocument 400 /errors/badrequest.html
ErrorDocument 401 /errors/authreqd.html
ErrorDocument 403 /errors/forbid.html
ErrorDocument 404 /errors/notfound.html
ErrorDocument 500 /errors/serverr.html
You can specify a full URL rather than a virtual URL in the ErrorDocument string (http://yoursite.com/errors/notfound.html vs. /errors/notfound.html). But this is not the preferred method by the server's happiness standards.
You can also specify HTML, believe it or not!
ErrorDocument 401 "<body bgcolor=#ffffff><h1>You have to actually <b>BE</b> a <a href="#">member</A> to view this page, Colonel!
The only time I use that HTML option is if I am feeling particularly saucy, since you can have so much more control over the error pages when used in conjunction with xSSI or CGI or both. Also note that the ErrorDocument starts with a " just before the HTML starts, but does not end with one...it shouldn't end with one and if you do use that option, keep it that way. And again, that should all be on one line, no naughty word wrapping!
Now you should have yourself some brand-spanking new error documents...go off and destroy your site to see some of those beautiful ErrorDocuments get pulled up.
(Note: that last part is optional)
Next, we are moving on to password protection, that last frontier before I dunk you into the true capabilities of htaccess. If you are familiar with setting up your own password protected directories via htaccess, you may feel like skipping ahead.
Password protection
Ever wanted a specific directory in your site to be available only to people who you want it to be available to? Ever got frustrated with the seeming holes in client-side options for this that allowed virtually anyone with enough skill to mess around in your source to get in? htaccess is the answer!
There are numerous methods to password protecting areas of your site, some server language based (such as ASP, PHP or PERL) and client side based, such as JavaScript. JavaScript is not as secure or foolproof as a server-side option, a server side challenge/response is always more secure than a client dependant challenge/response. htaccess is about as secure as you can or need to get in everyday life, though there are ways above and beyond even that of htaccess. If you aren't comfortable enough with htaccess, you can password protect your pages any number of ways, and JavaScript Kit has plenty of password protection scripts for your use.
The first thing you will need to do is create a file called .htpasswd. I know, you might have problems with the naming convention, but it is the same idea behind naming the htaccess file itself, and you should be able to do that by this point. In the htpasswd file, you place the username and password (which is encrypted) for those whom you want to have access.
For example, a username and password of wsabstract (and I do not recommend having the username being the same as the password), the htpasswd file would look like this:
wsabstract:y4E7Ep8e7EYV
Notice that it is UserName first, followed by the Password. There is a handy-dandy tool available for you to easily encrypt the password into the proper encoding for use in the httpasswd file.
For security, you should not upload the htpasswd file to a directory that is web accessible (yoursite.com/.htpasswd), it should be placed above your www root directory. You'll be specifying the location to it later on, so be sure you know where you put it. Also, this file, as with htaccess, should be uploaded as ASCII and not BINARY.
Create a new htaccess file and place the following code in it:
AuthUserFile /usr/local/you/safedir/.htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
require user wsabstract
The first line is the full server path to your htpasswd file. If you have installed scripts on your server, you should be familiar with this. Please note that this is not a URL, this is a server path. Also note that if you place this htaccess file in your root directory, it will password protect your entire site, which probably isn't your exact goal.
The second to last line require user is where you enter the username of those who you want to have access to that portion of your site. Note that using this will allow only that specific user to be able to access that directory. This applies if you had an htpasswd file that had multiple users setup in it and you wanted each one to have access to an individual directory. If you wanted the entire list of users to have access to that directory, you would replace Require user xxx with require valid-user.
The AuthName is the name of the area you want to access. It could anything, such as "EnterPassword". You can change the name of this 'realm' to whatever you want, within reason.
We are using AuthType Basic because we are using basic HTTP authentication.
Enabling SSI Via htaccess
Many people want to use SSI, but don't seem to have the ability to do so with their current web host. You can change that with htaccess. A note of caution first...definitely ask permission from your host before you do this, it can be considered 'hacking' or violation of your host's TOS, so be safe rather than sorry:
AddType text/html .shtml
AddHandler server-parsed .shtml
Options Indexes FollowSymLinks Includes
The first line tells the server that pages with a .shtml extension (for Server parsed HTML) are valid. The second line adds a handler, the actual SSI bit, in all files named .shtml. This tells the server that any file named .shtml should be parsed for server side commands. The last line is just techno-junk that you should throw in there.
And that's it, you should have SSI enabled. But wait...don't feel like renaming all of your pages to .shtml in order to take advantage of this neat little toy? Me either! Just add this line to the fragment above, between the first and second lines:
AddHandler server-parsed .html
A note of caution on that one too, however. This will force the server to parse every page named .html for SSI commands, even if they have no SSI commands within them. If you are using SSI sparingly on your site, this is going to give you more server drain than you can justify. SSI does slow down a server because it does extra stuff before serving up a page, although in human terms of speed, it is virtually transparent. Some people also prefer to allow SSI in html pages so as to avoid letting anyone who looks at the page extension to know that they are using SSI in order to prevent the server being compromised through SSI hacks, which is possible. Either way, you now have the knowledge to use it either way.
If, however, you are going to keep SSI pages with the extension of .shtml, and you want to use SSI on your Index pages, you need to add the following line to your htaccess:
DirectoryIndex index.shtml index.html
This allows a page named index.shtml to be your default page, and if that is not found, index.html is loaded. More on DirectoryIndex later.
Deny users by IP
Is there a pesky person perpetrating pain upon you? Stalking your site from the vastness of the electron void? Blockem!
In your htaccess file, add the following code--changing the IPs to suit your needs--each command on one line each:
order allow,deny
deny from 123.45.6.7
deny from 012.34.5.
allow from all
You can deny access based upon IP address or an IP block. The above blocks access to the site from 123.45.6.7, and from any sub domain under the IP block 012.34.5. (012.34.5.1, 012.34.5.2, 012.34.5.3, etc.) I have yet to find a useful application of this, maybe if there is a site scraping your content you can block them, who knows.
You can also set an option for deny from all, which would of course deny everyone. You can also allow or deny by domain name rather than IP address (allow from .javascriptkit.com works for www.javascriptkit.com or virtual.javascriptkit.com, etc.)
Change your default directory page
Some of you may be wondering, just what in the world is a DirectoryIndex? Well, grasshopper, this is a command which allows you to specify a file that is to be loaded as your default page whenever a directory or url request comes in, that does not specify a specific page. Tired of having yoursite.com/index.html come up when you go to yoursite.com? Want to change it to be yoursite.com/ILikePizzaSteve.html that comes up instead? No problem!
DirectoryIndex filename.html
This would cause filename.html to be treated as your default page, or default directory page. You can also append other filenames to it. You may want to have certain directories use a script as a default page. That's no problem too!
DirectoryIndex filename.html index.cgi index.pl default.htm
Placing the above command in your htaccess file will cause this to happen: When a user types in yoursite.com, your site will look for filename.html in your root directory (or any directory if you specify this in the global htaccess), and if it finds it, it will load that page as the default page. If it does not find filename.html, it will then look for index.cgi; if it finds that one, it will load it, if not, it will look for index.pl and the whole process repeats until it finds a file it can use. Basically, the list of files is read from left to right.
Every once in a while, I use this method for the following needs: Say I keep all my include files in a directory called include, and that I keep all my image files in a directory called images, I don't want people to be able to directory browse through them (even though we can prevent that through another htaccess trick, more later) I would specify a DirectoryIndex entry, in a specific htaccess file for those two directories, for /redirect/index.pl that is a redirect page that redirects a request for those directories to be sent to the homepage. Or I could just specify a directory index of index.pl and upload an index.pl file to each of those directories. Or I could just stick in an htaccess redirect page, which is our next subject!
Redirects
Ever go through the nightmare of changing significantly portions of your site, then having to deal with the problem of people finding their way from the old pages to the new? It can be nasty. There are different ways of redirecting pages, through http-equiv, javascript or any of the server-side languages. And then you can do it through htaccess, which is probably the most effective, considering the minimal amount of work required to do it.
htaccess uses redirect to look for any request for a specific page (or a non-specific location, though this can cause infinite loops) and if it finds that request, it forwards it to a new page you have specified:
Redirect /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html
Note that there are 3 parts to that, which should all be on one line : the Redirect command, the location of the file/directory you want redirected relative to the root of your site (/olddirectory/oldfile.html = yoursite.com/olddirectory/oldfile.html) and the full URL of the location you want that request sent to. Each of the 3 is separated by a single space, but all on one line. You can also redirect an entire directory by simple using Redirect /olddirectory http://yoursite.com/newdirectory/
Using this method, you can redirect any number of pages no matter what you do to your directory structure. It is the fastest method that is a global affect.
Prevent viewing of .htaccess file
If you use htaccess for password protection, then the location containing all of your password information is plainly available through the htaccess file. If you have set incorrect permissions or if your server is not as secure as it could be, a browser has the potential to view an htaccess file through a standard web interface and thus compromise your site/server. This, of course, would be a bad thing. However, it is possible to prevent an htaccess file from being viewed in this manner:
<Files .htaccess>
order allow,deny
deny from all
</Files>
The first line specifies that the file named .htaccess is having this rule applied to it. You could use this for other purposes as well if you get creative enough.
If you use this in your htaccess file, a person trying to see that file would get returned (under most server configurations) a 403 error code. You can also set permissions for your htaccess file via CHMOD, which would also prevent this from happening, as an added measure of security: 644 or RW-R--R--
Adding MIME Types
What if your server wasn't set up to deliver certain file types properly? A common occurrence with MP3 or even SWF files. Simple enough to fix:
AddType application/x-shockwave-flash swf
AddType is specifying that you are adding a MIME type. The application string is the actual parameter of the MIME you are adding, and the final little bit is the default extension for the MIME type you just added, in our example this is swf for ShockWave File.
By the way, here's a neat little trick that few know about, but you get to be part of the club since JavaScript Kit loves you: To force a file to be downloaded, via the Save As browser feature, you can simply set a MIME type to application/octet-stream and that immediately prompts you for the download. I have no idea how that would be useful, but that question has come up in our Forums from time to time, so there ya' go.
Preventing hot linking of images
In the webmaster community, "hot linking" is a curse phrase. Also known as "bandwidth stealing" by the angry site owner, it refers to linking directly to non-html objects not on one own's server, such as images, .js files etc. The victim's server in this case is robbed of bandwidth (and in turn money) as the violator enjoys showing content without having to pay for its deliverance. The most common practice of hot linking pertains to another site's images.
Using .htaccess, you can disallow hot linking on your server, so those attempting to link to an image on your site, for example, is shown either the door (a broken image), or the lion's mouth (another image of your choice, such as a "Barbara Streisand" picture- no emails please). There is just one small catch- unlike the rest of the .htaccess functionalities we saw earlier, disabling hot linking also requires that your server supports mod_rewrite. Inquire your web host regarding this.
With all the pieces in place, here's how to disable hot linking of images on your site. Simply add the below code to your .htaccess file, and upload the file either to your root directory, or a particular subdirectory to localize the effect to just one section of your site:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www.)?mydomain.com/.*$ [NC]
RewriteRule .(gif|jpg)$ - [F]
Be sure to replace "mydomain.com" with your own. The above code causes a broken image to be displayed when its hot linked.
If you're feeling bitter, you can set things up so an alternate image is displayed in place of the hot linked one. The code for this is:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www.)?mydomain.com/.*$ [NC]
RewriteRule .(gif|jpg)$ http://www.mydomain.com/nasty.gif [R,L]
Same deal- replace mydomain.com with your own, plus nasty.gif.
Time to pour a bucket of cold water on hot linking!
Preventing Directory Listing
Do you have a directory full of images or zips that you do not want people to be able to browse through? Typically a server is setup to prevent directory listing, but sometimes they are not. If not, become self-sufficient and fix it yourself:
The * is a wildcard that matches all files, so if you stick that line into an htaccess file in your images directory, nothing in that directory will be allowed to be listed.
On the other hand, what if you did want the directory contents to be listed, but only if they were HTML pages and not images? Simple says I:
This would return a list of all files not ending in .jpg or .gif, but would still list .txt, .html, etc.
And conversely, if your server is setup to prevent directory listing, but you want to list the directories by default, you could simply throw this into an htaccess file the directory you want displayed:
If you do use this option, be very careful that you do not put any unintentional or compromising files in this directory. And if you guessed it by the plus sign before Indexes, you can throw in a minus sign (Options -Indexes) to prevent directory listing entirely--this is typical of most server setups and is usually configured elsewhere in the apache server, but can be overridden through htaccess.
If you really want to be tricky, using the +Indexes option, you can include a default description for the directory listing that is displayed when you use it by placing a file called HEADER in the same directory. The contents of this file will be printed out before the list of directory contents is listed. You can also specify a footer, though it is called README, by placing it in the same directory as the HEADER. The README file is printed out after the directory listing is printed.
Conclusion & More Information
Of course, I can't list every possible use of htaccess here, just the more notable and useful ones (read: for fun and profit). There is a list of Apache Directives you can use for your htaccess files, though not all of them are designed to be used by htaccess. Consult the documentation for the directive you are looking to use and make sure that you can actually use it as an htaccess string.
You should also go through the Apache User's Guide for more detailed information if you are really serious about making your life easier as a webmaster. You don't need to update all 4,000 of the pages on your site individually, by hand, in order to change one file reference...honestly!
In any event, I hope you got a better idea of the power available to you through this relatively simple little Clark Kent-ish file. You really do have the ability to save yourself a lot of time and grief by using htaccess, especially when you add to that the power of SSI and xSSI.
|