Searching many folders for arbitrary subfolders/files

Post by jon23d » Fri Sep 25, 2009 11:36 pm

Sorry for the length of this, I just wanted to make sure it made sense :)

Code:
######
# Urls are converted to parameters in dispatcher.php, it takes care of all routing other than this
# I don't want to have to screw around with mime-types and headers serving resources when Apache
# Does it so well.  I think I just don't quite understand RewriteCond...
#
# But I want arbitrary files and folders to be accessible through a few folders.
# I know what the container folders are, but I won't know in the future what
# subfolders they may contain. The accessible folders should be: webroot, resources, layouts, and admin/layouts.
#
# Sometimes this will be deployed in a subfolder within a domain (mydomain.com/my-app/), and other times it won't.
# This particular test is located in /rewrite.
#
# If the browser requests http://domain/js/my-js-file.js, I would like for the file to be served out of the first
# folder it finds it in (webroot/js/my-js-file.js, or resources/js/my-js-file.js, etc.)
#
# If the file doesn't exist, then the application will deal with the 404.
#
# Other than these files, everything else goes to dispatcher.
#
# The apache manual said to do this:
#    RewriteCond         /your/docroot/dir1/%{REQUEST_FILENAME}  -f
#    RewriteRule  ^(.+)  /your/docroot/dir1/$1  [L]
#
# But yea....  I've played with RewriteBase every which way, but...
#
# Every process in this application will take place in the same folder as the .htaccess.
# All resources and links will have fully-qualified links.
#
# Wanna test?  This is my current directory structure:
#
#  /rewrite/
#  /rewrite/.htaccess
#  /rewrite/dispatcher.php
#  /rewrite/resources/
#  /rewrite/resources/js/
#  /rewrite/resources/js/test.js
#  /rewrite/webroot/
#  /rewrite/webroot/js/
#  /rewrite/webroot/js/test.js
#
####### BEGIN HTACCESS


Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On

# resources go here
RewriteCond rewrite/webroot%{REQUEST_URI}  -f
RewriteRule ^(.+)$ rewrite/webroot%{REQUEST_URI} [L]

#  everything else goes here
RewriteRule ^(.*)$ dispatcher.php [QSA,L,NC]

Update

Post by jon23d » Sat Sep 26, 2009 12:02 pm

Okay, so I'm a little closer. I've made a few changes to the requirements. The layouts folders will no longer be accessible, but I did add an admin/resources folder that should be accessible at something like admin/logo.jpg.

So far this is working for files in resources and files in webroot, but not for admin. I am also trying to get it to not allow someone to type in a URL that points to a valid file with the webroot or resources folder in it.

I'm confused about something: I thought that the L flag caused all processing to halt and the file to be sent. This doesn't seem to be the case... I think I am missing some vital concept here.

Current file
-------------

Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On

# admin resources - not working
RewriteCond %{REQUEST_URI} !/admin/resources/
RewriteCond %{DOCUMENT_ROOT}/rewrite/admin/resources/$1 -f
RewriteRule ^admin/(.*)$ admin/resources/$1 [QSA,L,NC]

# publicly accessible webroot folder
RewriteCond %{REQUEST_URI} !/webroot/
RewriteCond %{DOCUMENT_ROOT}/rewrite/webroot/$1 -f
RewriteRule ^(.*)$ webroot/$1 [QSA,L,NC]

# publicly accessible resources folder
RewriteCond %{REQUEST_URI} !/resources/
RewriteCond %{DOCUMENT_ROOT}/rewrite/resources/$1 -f
RewriteRule ^(.*)$ resources/$1 [QSA,L,NC]

# everything else goes here
RewriteCond %{REQUEST_URI} !/webroot/
RewriteCond %{REQUEST_URI} !/resources/
RewriteCond %{REQUEST_URI} !/admin/resources/
RewriteRule ^(.*)$ dispatcher.php [QSA,L,NC]

Post by richardk » Sat Sep 26, 2009 1:39 pm

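"I'm confused about something: I thought that the L flag caused all processing to halt and the file to be sent. This doesn't seem to be the case... I think I am missing some vital concept here."

In .htaccess files and <Directory> sections, the L flag only ends the current pass through the rules. Because per-directory rules are processed late, the rewritten URL (for example /rewrite/webroot/file) is issued as an internal sub request, which is treated like a new request, so the .htaccess file is processed again. %{ENV:REDIRECT_STATUS} is empty on the original request and set after an internal redirect, so you can stop rules from matching the second time around by adding the following condition:
Code: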
# Don't match sub requests.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
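
On newer servers there is also another way to get the "really stop" behaviour. The sketch below assumes Apache 2.3.9 or later, where the END flag exists; it did not exist in the 2.2 series. It reuses the /rewrite layout from the first post and only handles the webroot folder, so treat it as an illustration rather than a drop-in replacement.

```apache
# Sketch only: needs Apache 2.3.9+ (the END flag is not available in 2.2).
RewriteEngine On

# Serve the file from webroot/ if it exists there.
# END behaves like L, but also prevents the per-directory rules from
# being run again on the rewritten URL.
RewriteCond %{DOCUMENT_ROOT}/rewrite/webroot/$0 -f
RewriteRule ^.+$ webroot/$0 [END]

# Everything else goes to the dispatcher.
RewriteRule ^.+$ dispatcher.php [QSA,END]
```

With END, no REDIRECT_STATUS condition is needed, because the second pass through the .htaccess file never happens.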


Try:
Code:
Options +FollowSymLinks -MultiViews -Indexes

RewriteEngine On

# admin resources
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/rewrite/admin/resources/$1 -f
RewriteRule ^admin/(.+)$ admin/resources/$1 [NC,QSA,L]

# publicly accessible webroot folder
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/rewrite/webroot/$0 -f
RewriteRule ^.+$ webroot/$0 [NC,QSA,L]

# publicly accessible resources folder
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/rewrite/resources/$0 -f
RewriteRule ^.+$ resources/$0 [NC,QSA,L]

# everything else goes here
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^.+$ dispatcher.php [NC,QSA,L]

second update

Post by jon23d » Sat Sep 26, 2009 2:21 pm

I was just about to post the new stuff, it seems to be working perfectly! After changing the logging options, I was easily able to walk through this. I did something very similar to what you posted:

Code:
Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On

# Only run this once
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]

# admin resources
RewriteCond %{DOCUMENT_ROOT}/rewrite/admin/webroot/$1 -f
RewriteRule ^admin/(.*)$ admin/webroot/$1 [QSA,L,NC]

# publicly accessible webroot folder
RewriteCond %{DOCUMENT_ROOT}/rewrite/webroot/$1 -f
RewriteRule ^(.*)$ webroot/$1 [QSA,L,NC]

# publicly accessible resources folder
RewriteCond %{DOCUMENT_ROOT}/rewrite/resources/$1 -f
RewriteRule ^(.*)$ resources/$1 [QSA,L,NC]

# everything else goes here
RewriteRule ^(.*)$ dispatcher.php [QSA,L,NC]


Now I am left with only two things to do in here.

1) I need to lock off the admin resources for users not logged in, but I'm not sure how exactly to tie PHP authentication in to Apache authentication... I guess I could use a cookie, but that seems awfully insecure. Perhaps session variables can be made available?

2) In a perfect world, the subdirectory of the webroot (if any) in which the app resides shouldn't have to be hard-coded in here. Short of creating a generator for .htaccess, is there any way to pull that in?

Thanks!

By the way Richard, I spent some time reading on here last night, you sure do get around, it seems you've been working with this for a while - mod_rewrite has the unique ability to make me feel deficient as a programmer...

Post by richardk » Sat Sep 26, 2009 2:49 pm

"1) I need to lock off the admin resources for users not logged in, but I'm not sure how exactly to tie PHP authentication in to Apache authentication... I guess I could use a cookie, but that seems awfully insecure. Perhaps session variables can be made available?"

What are you trying to protect? The whole directory?
$_SESSION variables are not available to mod_rewrite.
The only ways I can think of doing it are to use a RewriteMap PHP program (you need access to the server configuration) or to run all requests through a PHP script that checks the auth.
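
As a rough sketch of the second option, the admin rule could hand existing files to dispatcher.php instead of serving them directly. ADMIN_FILE is a made-up variable name for illustration, and the paths assume the /rewrite layout with the admin/webroot folder from the previous post. After the internal redirect, PHP will most likely see the variable with a REDIRECT_ prefix (REDIRECT_ADMIN_FILE in $_SERVER), but check that on your own setup.

```apache
# Sketch only. ADMIN_FILE is a hypothetical variable name.

# Skip requests that have already been rewritten once.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Only intercept files that actually exist in the protected folder.
RewriteCond %{DOCUMENT_ROOT}/rewrite/admin/webroot/$1 -f
# Hand the request to PHP, telling it which file was asked for.
RewriteRule ^admin/(.+)$ dispatcher.php [QSA,L,E=ADMIN_FILE:admin/webroot/$1]
```

The trade-off is that the PHP script then has to check the session and send the file itself, including the Content-Type header, which is the work the first post wanted to leave to Apache.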

"2) In a perfect world, the subdirectory of the webroot (if any) in which the app resides shouldn't have to be hard-coded in here. Short of creating a generator for .htaccess, is there any way to pull that in?"

Try adding
Code:
RewriteCond %{REQUEST_URI}?$0 ^((.+)/)?([^?]+)\?(.+)$
RewriteRule ^.*$ - [E=BASE%2]

after
Code:
RewriteRule .* - [L]

and using
Code:
%{ENV:BASE}

instead of "/rewrite".

Post by jon23d » Sat Sep 26, 2009 3:27 pm

I've never seen $0 before, what is that exactly?

It didn't quite work though; here's what I got from the log at those portions:

Code:
(3) [per-dir /Applications/MAMP/htdocs/rewrite/] strip per-dir prefix: /Applications/MAMP/htdocs/rewrite/admin/test.php -> admin/test.php

(3) [per-dir /Applications/MAMP/htdocs/rewrite/] applying pattern '^.*$' to uri 'admin/test.php'

(4) RewriteCond: input='/rewrite/admin/test.php?admin/test.php' pattern='^((.+)/)?([^?]+)\?(.+)$' => matched


(3) [per-dir /Applications/MAMP/htdocs/rewrite/] strip per-dir prefix: /Applications/MAMP/htdocs/rewrite/admin/test.php -> admin/test.php

(3) [per-dir /Applications/MAMP/htdocs/rewrite/] applying pattern '^admin/(.*)$' to uri 'admin/test.php'

(4) RewriteCond: input='/Applications/MAMP/htdocs/admin/webroot/test.php' pattern='-f' => not-matched


I'm not 100% sure what I'm looking at, but it looks like you are saying I can extract the subfolder based on the request? So, add a / in front of whatever is in $0, strip that from REQUEST_URI, and voila? I like it, but what is $0...

In response to your question about the folder: yes, I do want to protect the entire thing. Is it likely that somebody is going to be trying to steal logos from the admin area? Probably not, but I don't want to reveal any information that I don't have to. I'll probably do the same thing in the front end at some point as well, for sites that use subscription-type content that is stored on disk and can't be included in an HTML page.

I've just started looking, but I wouldn't be surprised if I could just add users/passwords to an auth file and fake Apache authentication with PHP.

Post by richardk » Sun Sep 27, 2009 10:42 am

The error is (hopefully) that
Code:
RewriteCond %{REQUEST_URI}?$0 ^((.+)/)?([^?]+)\?(.+)$
RewriteRule ^.*$ - [E=BASE%2]

should be
Code:
RewriteCond %{REQUEST_URI}?$0 ^(/.+)?/([^?]+)\?\2$
RewriteRule ^.*$ - [E=BASE:%1]

Shown here:
Code:
(4) RewriteCond: input='/Applications/MAMP/htdocs/admin/webroot/test.php' pattern='-f' => not-matched

The path does not contain /rewrite so %{ENV:BASE} is not being set correctly.

"I've never seen $0 before, what is that exactly?"

It contains everything the RewriteRule pattern matched. It is like $1 with ^(.*)$, but without needing the parentheses.
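
To make the difference concrete, here is a small annotated sketch built from the rules earlier in the thread. The example request (/rewrite/admin/logo.jpg) is assumed for illustration.

```apache
# Request: /rewrite/admin/logo.jpg, with the .htaccess in /rewrite/
# The per-directory prefix is stripped, so the pattern sees: admin/logo.jpg
#
#   pattern ^admin/(.+)$
#     $0 = admin/logo.jpg   (the whole match)
#     $1 = logo.jpg         (the first parenthesised group)
RewriteCond %{DOCUMENT_ROOT}/rewrite/admin/resources/$1 -f
RewriteRule ^admin/(.+)$ admin/resources/$1 [NC,QSA,L]

# When the whole path is wanted, $0 saves writing ^(.+)$ just to get $1:
RewriteCond %{DOCUMENT_ROOT}/rewrite/webroot/$0 -f
RewriteRule ^.+$ webroot/$0 [NC,QSA,L]
```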

"I'm not 100% sure what I'm looking at, but it looks like you are saying I can extract the subfolder based on the request?"

Yes, and put it in %{ENV:BASE}. The test string %{REQUEST_URI}?$0 produces /rewrite/admin/test.php?admin/test.php, and then the regular expression
Code:
^(/.+)?/([^?]+)\?\2$

uses a backreference (\2), which is like $2 or %2 but inside the regular expression itself. It means that what is matched after the ? must also be what the second () matched before the ?, so only the leading /rewrite is left over for the first (). See "Using Backreferences in The Regular Expression".
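
Worked through by hand for the request in the log above, the match breaks down roughly as follows. The second trace assumes the same files deployed directly in the document root.

```apache
# Request /rewrite/admin/test.php, .htaccess in /rewrite/:
#   %{REQUEST_URI} = /rewrite/admin/test.php
#   $0             = admin/test.php
#   test string    = /rewrite/admin/test.php?admin/test.php
#
#   (/.+)?   matches  /rewrite          -> becomes %1
#   /        matches  /
#   ([^?]+)  matches  admin/test.php    -> group 2
#   \?       matches  ?
#   \2       matches  admin/test.php    (must repeat group 2 exactly)
#
# Same request with the .htaccess in the document root:
#   test string    = /admin/test.php?admin/test.php
#   (/.+)? matches nothing, so %1 is empty and BASE is set to an empty value.
RewriteCond %{REQUEST_URI}?$0 ^(/.+)?/([^?]+)\?\2$
RewriteRule ^.*$ - [E=BASE:%1]
```

Any shorter or longer first group fails, because the text after the ? would no longer equal group 2.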

Post by jon23d » Sun Sep 27, 2009 12:34 pm

Excellent, that did the trick. I had to change some of the expressions in a few of the conditions for some reason, but I ran my tests in / and in /rewrite, both worked perfectly, thanks!

The last thing I need in here now is to protect the admin exports. I think what I'll do is just issue a cookie that contains the name of a temporary file created on login and destroyed on logout or session expiration. If the user has the cookie and the file exists, then the request will go through. I can't think of any other way without using Apache authentication.
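
A rough sketch of how that cookie-plus-file check might look in mod_rewrite terms. The cookie name admin_token and the tokens/ directory are hypothetical names picked for illustration, and the file name would need to be long and random for this to offer any real protection.

```apache
# Sketch only. "admin_token" (cookie name) and "tokens/" (directory of
# per-login marker files) are hypothetical names.

# 1. The cookie must be present; capture its value as %1.
#    Allowing only letters and digits keeps "../" out of the path below.
RewriteCond %{HTTP_COOKIE} (?:^|;\s*)admin_token=([A-Za-z0-9]+)
# 2. A marker file with that name must exist (created on login,
#    deleted on logout or session expiration).
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/tokens/%1 -f
# 3. The requested export must exist.
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/admin/exports/$1 -f
RewriteRule ^admin/(.+)$ admin/exports/$1 [QSA,L,NC]
```

In the second condition, %1 refers to the group captured by the cookie condition just above it, while $1 in the third condition still refers to the group in the RewriteRule pattern.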

Thanks again for your help, here is my completed code in case you are interested:

Code:
Options +FollowSymLinks -MultiViews -Indexes
RewriteEngine On

# PHP settings
php_value session.use_cookies 1
php_value session.use_only_cookies 1
php_value session.cookie_httponly 1
php_value session.save_path "./sessions/"

# Only run this file once
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule .* - [L]

# Get the subfolder (if any) that this app is in
RewriteCond %{REQUEST_URI}?$0 ^(/.+)?/([^?]+)\?\2$
RewriteRule ^.*$ - [E=BASE:%1]

# Pass CSS and JS files through to the parser, NEW_REQUEST will replace REQUEST_URI in dispatcher
# --- admin /webroot
RewriteCond %{REQUEST_URI} /admin/(.*\.(css|js))$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/admin/webroot/$1 -f
RewriteRule ^admin/(.*\.(css|js))$ dispatcher.php [QSA,L,NC,E=NEW_REQUEST:admin/default/parseResource/admin/webroot/$1]
# --- front end /resources
RewriteCond %{REQUEST_URI} (.*\.(css|js))$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/resources/$1 -f
RewriteRule ^((.*\.)(css|js))$ dispatcher.php [QSA,L,NC,E=NEW_REQUEST:default/parseResource/resources/$1]
# --- front end /webroot
RewriteCond %{REQUEST_URI} (.*\.(css|js))$
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/webroot/$1 -f
RewriteRule ^((.*\.)(css|js))$ dispatcher.php [QSA,L,NC,E=NEW_REQUEST:default/parseResource/webroot/$1]
 
# admin resources - requires protection still
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/admin/webroot/$1 -f
RewriteRule ^admin/(.*)$ admin/webroot/$1 [QSA,L,NC]
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/admin/exports/$1 -f
RewriteRule ^admin/(.*)$ admin/exports/$1 [QSA,L,NC]

# publicly accessible webroot folder
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/webroot/$1 -f
RewriteRule ^(.*)$ webroot/$1 [QSA,L,NC]

# publicly accessible resources folder
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/resources/$1 -f
RewriteRule ^(.*)$ resources/$1 [QSA,L,NC]

# everything else goes here
RewriteRule ^(.*)$ dispatcher.php [QSA,L,NC]

Post by richardk » Sun Sep 27, 2009 2:32 pm

You can remove all lines like
Code:
RewriteCond %{REQUEST_URI} /admin/(.*\.(css|js))$

as they're doing the same thing as the RewriteRule.

When you have something like
Code:
^(.*\.(css|js))$

you should probably use .+ or it will match a request to /.css.

In rules like
Code:
RewriteRule ^(.*)$ resources/$1 [QSA,L,NC]

.+ should be used because there has to be something for the -f file test.
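
Putting the three suggestions together, the front-end /resources rules from the completed file might end up looking something like this. It is a sketch of one section only; the admin and webroot sections would change in the same way.

```apache
# CSS and JS in /resources go through the parser.
# The RewriteCond on %{REQUEST_URI} is gone because the RewriteRule pattern
# already tests the extension, and .+ requires at least one character
# before ".css" or ".js".
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/resources/$1 -f
RewriteRule ^(.+\.(css|js))$ dispatcher.php [QSA,L,NC,E=NEW_REQUEST:default/parseResource/resources/$1]

# Any other file in /resources is served directly.
# .+ ensures there is a file name for the -f test to check.
RewriteCond %{DOCUMENT_ROOT}%{ENV:BASE}/resources/$1 -f
RewriteRule ^(.+)$ resources/$1 [QSA,L,NC]
```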

Post by jon23d » Sun Sep 27, 2009 2:59 pm

Thank you, those worked perfectly.