Redirect, Change URLs or Redirect HTTP to HTTPS in Apache

asked May 20, 2015 in htaccess by rahulgii
0 votes

3 Answers

0 votes
mod_rewrite syntax order

mod_rewrite has some specific ordering rules that affect processing. Before anything gets done, the RewriteEngine On directive needs to be given as this turns on mod_rewrite processing. This should be before any other rewrite directives.

RewriteCond preceding RewriteRule makes that ONE rule subject to the conditional. Any following RewriteRules will be processed as if they were not subject to conditionals.

RewriteEngine On
RewriteCond %{HTTP_REFERER}          ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html        /blog/$1.sf.html
In this simple case, if the HTTP referrer is from serverfault.com, blog requests are redirected to special serverfault pages (we're just that special). However, if the above block had an extra RewriteRule line:

RewriteEngine On
RewriteCond %{HTTP_REFERER}          ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html        /blog/$1.sf.html
RewriteRule ^/blog/(.*)\.jpg         /blog/$1.sf.jpg
All .jpg files would go to the special serverfault pages, not just the ones with a referrer indicating the request came from serverfault.com. This is clearly not the intent of how these rules are written. It could be done with multiple RewriteCond rules:

RewriteEngine On
RewriteCond %{HTTP_REFERER}          ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html        /blog/$1.sf.html
RewriteCond %{HTTP_REFERER}          ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.jpg         /blog/$1.sf.jpg
But it should probably be done with some trickier replacement syntax:

RewriteEngine On
RewriteCond %{HTTP_REFERER}                ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.(html|jpg)        /blog/$1.sf.$2
The more complex RewriteRule folds both alternatives into the pattern itself. The last parenthetical, (html|jpg), tells RewriteRule to match either html or jpg and to make the matched string available as $2 in the rewritten string. This is logically identical to the previous block with its two RewriteCond/RewriteRule pairs; it just does it in two lines instead of four.

Multiple RewriteCond lines are implicitly ANDed, and can be explicitly ORed. To handle referrers from both ServerFault and Super User (explicit OR):

RewriteEngine On
RewriteCond %{HTTP_REFERER}                ^https?://serverfault\.com(/|$)    [OR]
RewriteCond %{HTTP_REFERER}                ^https?://superuser\.com(/|$)
RewriteRule ^/blog/(.*)\.(html|jpg)        /blog/$1.sf.$2
To serve the ServerFault-referred pages only to Chrome browsers (implicit AND):

RewriteEngine On
RewriteCond %{HTTP_REFERER}                ^https?://serverfault\.com(/|$)
RewriteCond %{HTTP_USER_AGENT}             ^Mozilla.*Chrome.*$
RewriteRule ^/blog/(.*)\.(html|jpg)        /blog/$1.sf.$2
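The two forms can be mixed. As a rough sketch (reusing the referrer and user-agent conditions from above, with a simplified Chrome pattern that is only an assumption for illustration), conditions joined by [OR] form a group, and that group is then ANDed with the conditions that follow it:

```apache
RewriteEngine On
# (referrer is ServerFault OR referrer is Super User) AND browser looks like Chrome
RewriteCond %{HTTP_REFERER}                ^https?://serverfault\.com(/|$)    [OR]
RewriteCond %{HTTP_REFERER}                ^https?://superuser\.com(/|$)
RewriteCond %{HTTP_USER_AGENT}             Chrome
RewriteRule ^/blog/(.*)\.(html|jpg)        /blog/$1.sf.$2
```

Only the line carrying [OR] is ORed with the line after it; everything else stays an implicit AND.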
RewriteBase is also order specific as it specifies how following RewriteRule directives handle their processing. It is very useful in .htaccess files. If used, it should be the first directive under "RewriteEngine on" in an .htaccess file. Take this example:

RewriteEngine On
RewriteBase /blog
RewriteCond %{HTTP_REFERER}           ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg)         $1.sf.$2
This tells mod_rewrite that the directory holding this .htaccess file was reached by way of the /blog URL path rather than the physical directory path (/home/$Username/public_html/blog), and to treat relative substitutions accordingly: the result of $1.sf.$2 is turned back into a URL under /blog. In an .htaccess file the RewriteRule pattern is matched against the path relative to that directory, so the rule considers its string-start to be after the "/blog/" in the URL. Here is the same thing written two different ways, one without RewriteBase and one with it:

RewriteEngine On

##Example 1: No RewriteBase, URL prefix spelled out in the substitution##
RewriteCond %{HTTP_REFERER}           ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg)         /blog/$1.sf.$2

##Example 2: With RewriteBase##
RewriteBase /blog
RewriteCond %{HTTP_REFERER}           ^https?://serverfault\.com(/|$)
RewriteRule ^(.*)\.(html|jpg)         $1.sf.$2
As you can see, RewriteBase lets rewrite rules use relative substitutions that are resolved against the web-site path to the content rather than the web-server's filesystem path, which can make them more intelligible to those who edit such files. It can also make the directives shorter, which has an aesthetic appeal.

RewriteRule matching syntax

RewriteRule itself has a complex syntax for matching strings. I'll cover the flags (things like [PT]) in another section. Because sysadmins learn by example more often than by reading a man page, I'll give examples and explain what they do.

RewriteRule ^/blog/(.*)$    /newblog/$1
The .* construct matches any single character (.) zero or more times (*). Enclosing it in parentheses tells it to provide the string that was matched as the $1 variable.

RewriteRule ^/blog/.*/(.*)$  /newblog/$1
In this case, the first .* was NOT enclosed in parens so isn't provided to the rewritten string. This rule removes a directory level on the new blog-site. (/blog/2009/sample.html becomes /newblog/sample.html).

RewriteRule ^/blog/(2008|2009)/(.*)$   /newblog/$2
In this case, the first parenthesis expression sets up a matching group. This becomes $1, which is not needed and therefore not used in the rewritten string.

RewriteRule ^/blog/(2008|2009)/(.*)$   /newblog/$1/$2
In this case, we use $1 in the rewritten string.

RewriteRule ^/blog/(20[0-9][0-9])/(.*)$   /newblog/$1/$2
This rule uses a special bracket syntax that specifies a character range. [0-9] matches the numerals 0 through 9. This specific rule will handle years from 2000 to 2099.

RewriteRule ^/blog/(20[0-9]{2})/(.*)$  /newblog/$1/$2
This does the same thing as the previous rule, but the {2} portion tells it to match the previous element (a bracket expression in this case) exactly two times.

RewriteRule ^/blog/([0-9]{4})/([a-z]*)\.html   /newblog/$1/$2.shtml
This case will match any lower-case letter in the second matching expression, and do so for as many characters as it can. The \. construct tells it to treat the period as an actual period, not the special character it is in previous examples. It will break if the file-name has dashes in it, though.

RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html  /newblog/$1/$2.shtml
This traps file-names with dashes in them. However, as - is a special character in bracket expressions (it marks a range), it has to be the first or the last character in the expression to be taken literally.

RewriteRule ^/blog/([0-9]{4})/([-0-9a-zA-Z]*)\.html   /newblog/$1/$2.shtml
This version traps any file name with letters, numbers or the - character in the file-name. This is how you specify multiple character sets in a bracket expression.
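
All of the patterns above start with ^/blog/, which is the form that matches in the main server or virtual host configuration. As a hedged sketch, assuming the rule is instead placed in an .htaccess file in the document root, the leading slash is stripped before matching, so the same rule would look roughly like this (the trailing $ and the + quantifier are additions for illustration):

```apache
# In an .htaccess file at the document root (assumed location), the pattern
# sees "blog/2009/some-post.html" rather than "/blog/2009/some-post.html".
RewriteEngine On
RewriteRule ^blog/([0-9]{4})/([-0-9a-zA-Z]+)\.html$   /newblog/$1/$2.shtml
```

The $ anchors the match at the end of the path, and + requires at least one character in the file name, so a request for /blog/2009/.html is left alone.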
answered May 20, 2015 by rahulgii
0 votes
RewriteRule flags

The flags on rewrite rules have a host of special meanings and use cases.

RewriteRule ^/blog/([0-9]{4})/([-a-z]*)\.html  /newblog/$1/$2.shtml  [L]
The flag is the [L] at the end of the above expression. Multiple flags can be used, separated by a comma. The mod_rewrite documentation describes each one, but here they are anyway:

L = Last. Stop processing RewriteRules once this one matches. Order counts!
C = Chain. Continue processing the next RewriteRule. If this rule doesn't match, then the next rule won't be executed. More on this later.
E = Set environmental variable. Apache has various environmental variables that can affect web-server behavior.
F = Forbidden. Returns a 403-Forbidden error if this rule matches.
G = Gone. Returns a 410-Gone error if this rule matches.
H = Handler. Forces the request to be processed by the specified handler.
N = Next. Restarts the rewriting process from the first rule, using the URL as rewritten so far. BE CAREFUL! Loops can result.
NC = No case. Allows jpg to match both jpg and JPG.
NE = No escape. Prevents the rewriting of special characters (? # & % etc) in the result into their hex-code equivalents.
NS = No subrequests. If you're using server-side-includes, this will prevent matches to the included files.
P = Proxy. Forces the rule to be handled by mod_proxy. Transparently provide content from other servers, because your web-server fetches it and re-serves it. This is a dangerous flag, as a poorly written one will turn your web-server into an open-proxy and That is Bad.
PT = Pass Through. Hands the rewritten URL back to URL mapping so that Alias, ScriptAlias and Redirect statements are applied to it.
QSA = QSAppend. When the substitution contains a query string of its own, append the original request's query string to it. Normally the original would be discarded in that case. Important for dynamic content.
R = Redirect. Provide an HTTP redirect to the specified URL (a 302 by default). Can also provide exact redirect code [R=303]. Very similar to RedirectMatch, which is faster and should be used when possible.
S = Skip. Skip the next N rules ([S=N]) when this rule matches.
T = Type. Specify the mime-type of the returned content. Very similar to the AddType directive.
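
To show how flags combine in practice, here is a minimal sketch of the HTTP-to-HTTPS redirect from the question title, followed by a QSA rule. The index.php script and its year parameter are hypothetical names used only for illustration:

```apache
RewriteEngine On

# Redirect plain-HTTP requests to the same host and path over HTTPS.
# R=301 makes the redirect permanent; L stops further rule processing.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# The substitution has its own query string (year=...), so QSA is needed
# to keep whatever query string the visitor originally sent.
# index.php and "year" are assumed names, not part of the original answer.
RewriteRule ^/blog/([0-9]{4})/?$ /newblog/index.php?year=$1 [QSA,L]
```

With the second rule, a request for /blog/2009/?page=2 would end up with both year=2009 and page=2 in its query string.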

You know how I said that RewriteCond applies to one and only one rule? Well, you can get around that by chaining.

RewriteEngine On
RewriteCond %{HTTP_REFERER}          ^https?://serverfault\.com(/|$)
RewriteRule ^/blog/(.*)\.html        /blog/$1.sf.html     [C]
RewriteRule ^/blog/(.*)\.jpg         /blog/$1.sf.jpg
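Because the first RewriteRule has the Chain flag, the second RewriteRule is only evaluated when the first one matches, which in turn requires the preceding RewriteCond to match. Be aware that a chained rule is tested against the output of the rule before it: here the second rule would see the already-rewritten .sf.html URL, so a .jpg request never reaches it. Chaining therefore suits rules that build on each other's output, and the all-in-one-line method I point to in the first section remains the reliable way to cover both extensions. It is also faster from an optimization point of view.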
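
Another way to put several independent rules under one condition is the Skip flag. The following is only a sketch of that approach, not something from the original answer: the condition is inverted, and when the referrer is NOT serverfault.com the next two rules are skipped.

```apache
RewriteEngine On
# If the referrer is anything other than serverfault.com, skip the two
# rules below. The "-" substitution means "leave the URL unchanged".
RewriteCond %{HTTP_REFERER}          !^https?://serverfault\.com(/|$)
RewriteRule ^                        -                    [S=2]
RewriteRule ^/blog/(.*)\.html        /blog/$1.sf.html
RewriteRule ^/blog/(.*)\.jpg         /blog/$1.sf.jpg
```

The number in [S=2] has to be kept in step with the number of rules being guarded, which is the main maintenance cost of this pattern.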

RewriteRule ^/blog/([0-9]{4})/([-0-9a-zA-Z]*)\.html   /newblog/$1/$2.shtml
This can be made simpler through flags:

RewriteRule ^/blog/([0-9]{4})/([-0-9a-z]*)\.html   /newblog/$1/$2.shtml   [NC]
Some flags also apply to RewriteCond, notably NoCase:

RewriteCond %{HTTP_REFERER}        ^https?://serverfault\.com(/|$)     [NC]
With [NC], this also matches referrers whose host name is written in mixed case, such as ServerFault.com.
answered May 20, 2015 by rahulgii
0 votes
Using rewritemap

There are lots of things you can do with rewrite maps. Rewrite maps are declared using the RewriteMap directive, and can then be used both in RewriteCond evaluations and in RewriteRule substitutions.

The general syntax for RewriteMap is:

RewriteMap MapName MapType:MapSource
For example:

RewriteMap examplemap txt:/path/to/file/map.txt
You can then use the map name in constructs like this:

${examplemap:key}

The map contains key/value pairs. If the key is found, the value is substituted. Simple maps are just plain text files, but you can also use hashed (DBM) files, and even SQL queries. More details are in the RewriteMap documentation.
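
As an illustration, here is a rough sketch of a text map in use. The map entries, the /newblog/ target and the not-found fallback are made-up values; only the map name and path come from the example above. Note that RewriteMap has to be declared in the server or virtual host configuration, not in an .htaccess file.

```apache
# Assumed contents of /path/to/file/map.txt, one "key value" pair per line:
#   old-post    new-post
#   about-us    about

RewriteEngine On
RewriteMap examplemap txt:/path/to/file/map.txt

# Look up the captured file name in the map. The part after "|" is the
# default used when the key is not present in the map.
RewriteRule ^/blog/(.*)\.html$ /newblog/${examplemap:$1|not-found}.html [L]
```

With these assumed entries, /blog/old-post.html is rewritten to /newblog/new-post.html, and any name missing from the map goes to /newblog/not-found.html.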

Unescaping strings.

There are four internal maps (toupper, tolower, escape and unescape) you can use to do some manipulations. Unescaping strings in particular can come in handy.

For example: I want to test for the string "café" in the query string. However, the browser will escape this before sending it to my server, so I'll need to either figure out what the URL-escaped version is for every string I wish to match, or I can just unescape it...

RewriteMap unescape int:unescape

RewriteCond %{QUERY_STRING}  (location|place)=(.*)
RewriteCond ${unescape:%2}   café
RewriteRule ^/find/$         /find/1234? [L,R]
Note how I use one RewriteCond just to capture the argument to the query string parameter, and then use the map in the second RewriteCond to unescape it. The unescaped value is then compared. Also note how I need to use %2 as the key in the rewrite map, as %1 will contain either "location" or "place". When you use parentheses to group patterns they will also be captured, whether you plan to use the result of the capture or not...
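
The other internal maps are used the same way. As a small sketch (the map name lowercase is an arbitrary choice, and like any RewriteMap it is assumed to be declared in the server or virtual host configuration), tolower can redirect mixed-case URLs to their lower-case form:

```apache
RewriteEngine On
RewriteMap lowercase int:tolower

# Only act when the requested path contains at least one upper-case letter,
# otherwise the redirect would loop on already lower-case URLs.
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule ^/(.*)$ /${lowercase:$1} [R=301,L]
```

Here $1 is used as the key because the capture comes from the RewriteRule pattern rather than from a RewriteCond.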
answered May 20, 2015 by rahulgii