[Pro] robots.txt

On my website I have a development folder and a folder for websites I’m hosting. I want to keep search engine robots OUT of my development folder, but what about my hosted domain folder? Should they be kept out of there as well?

I think this is the code I’m supposed to use in the robots.txt file, but where do I put it?

User-agent: *
Disallow: /dev/

offtopic mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

This file goes at the root of your site, i.e. yoursite.com/robots.txt

Your example excludes robots only from yoursite.com/dev/, which is fine if it is the only folder you wish to protect.

You can also add other folders:

Disallow: /test/
Disallow: /anotherfolder/
Disallow: /yetanotherfolder/

Ad infinitum
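
If you want to check what a set of Disallow lines actually blocks before uploading the file, Python's standard urllib.robotparser can evaluate them locally. This is only a rough sketch: the agent name "ExampleBot" and the helper is_allowed are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules as discussed above: every robot is kept out of /dev/ and /test/.
RULES = """\
User-agent: *
Disallow: /dev/
Disallow: /test/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under RULES.

    "ExampleBot" and yoursite.com are placeholders, not real values.
    """
    return parser.can_fetch(agent, "http://yoursite.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/dev/index.html", "/test/page.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Keep in mind that robots.txt is only a request to well-behaved robots; it does not password-protect the folder.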

D


Thanks Dave. What about the hosted domain folders?

I currently have Google crawling them at the top level domain. Not sure what happens if I disallow the folder that the shared domain exists in.


Also, should User-agent: * be at the top of the robots.txt file and what does that command do?


should User-agent: * be at the top of the robots.txt file and what does that command do?

It means that the rules that follow apply to ALL user agents, i.e. ALL robots.

You can get specific and only target certain ones.
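
As a rough illustration of targeting one robot differently from the rest, here is a sketch using Python's urllib.robotparser. The robot names "BadBot" and "ExampleBot" and the helper may_fetch are invented for the example; the idea is simply that a robot follows the group that names it and otherwise falls back to the * group.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a hypothetical robot name used only for this example.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /dev/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def may_fetch(agent, path):
    """Return True if `agent` may fetch `path` under RULES."""
    return parser.can_fetch(agent, "http://yoursite.com" + path)


if __name__ == "__main__":
    # BadBot matches its own group and is shut out of the whole site.
    print("BadBot     /index.html:", may_fetch("BadBot", "/index.html"))
    # Any other robot falls back to the * group: only /dev/ is off limits.
    print("ExampleBot /index.html:", may_fetch("ExampleBot", "/index.html"))
    print("ExampleBot /dev/a.html:", may_fetch("ExampleBot", "/dev/a.html"))
```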

Without knowing the specifics of your server setup I can't give you a definitive answer to the shared domain question.

Are you saying that you host client sites in subfolders/subdomains of your site?

D


On 14 Oct 2012, 10:48 pm, DeltaDave wrote:

Are you saying that you host client sites in subfolders/sub domains of your site?

D

Yes, that’s how GoDaddy works. The actual domain is hosted in a subdirectory, but to the browser it looks like a top level domain, sort of like a subdomain.


Aaahhh! GoDaddy!

Should have known

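I would have thought that they would be treated as standalone sites and not be affected by the robots.txt in your main site's root.

If they appear as top-level domains, robots will look for www.theirsites.com/robots.txt, and at the moment there will be no robots file there.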

D


The robots.txt should be wherever the main index.html is for the particular site you are trying to control. So if in the GoDaddy Bizarro-world that is public_html/www.somedomain.com/index.html, then you would put the robots.txt file at public_html/www.somedomain.com/robots.txt
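
To make the "one robots.txt per site" idea concrete: a crawler derives the robots.txt address from the host name it is visiting, not from the folder layout on the server. A minimal sketch with Python's standard urllib.parse follows; the helper name robots_url is made up, and the domains are the placeholder ones used in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a crawler would request for `page_url`.

    Only the scheme and host are kept; the path is always /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The same files reached through two host names answer to two
    # different robots.txt files.
    print(robots_url("http://www.client1.com/about/index.html"))
    print(robots_url("http://www.mycompany.com/client1/about/index.html"))
```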

Walter

On Oct 14, 2012, at 7:47 PM, RavenManiac wrote:

On 14 Oct 2012, 10:48 pm, DeltaDave wrote:

Are you saying that you host client sites in subfolders/sub domains of your site?

D

Yes, that’s how GoDaddy works. The actual domain is hosted in a subdirectory, but to the browser it looks like a top level domain, sort of like a subdomain.


Thanks Walter. That makes sense. But, what about hosted websites?

Since I’m using a CMS with some of my customers’ websites, I would prefer that all searches go through www.client1.com instead of the subfolder where the website files actually exist, which in this case would be www.mycompany.com/client1.

If the user went directly to the subfolder, which is possible, that would result in the user seeing CMS pages with no data.


On Oct 15, 2012, at 11:52 AM, RavenManiac wrote:

Thanks Walter. That makes sense. But, what about hosted websites?

Since I’m using a CMS with some of my customers’ websites, I would prefer that all searches go through www.client1.com instead of the subfolder where the website files actually exist, which in this case would be www.mycompany.com/client1.

If the user went directly to the subfolder, which is possible, that would result in the user seeing CMS pages with no data.

That’s a job for mod_rewrite in Apache. In your .htaccess file, very near the top, add something like this:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule .? http://www.example.com%{REQUEST_URI} [R=301,L]

Make sure that the preferred domain name is present in both the second and third lines. Also, make sure that if you already have a RewriteEngine On line, you do not duplicate it. Just add the next two lines after it. Note the backslashes used to escape the dots in the URL in the second line. Those are critical.
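
The point about escaping the dots can be checked outside Apache. The sketch below uses Python's re module as a stand-in for the regular expression engine behind RewriteCond, which is an approximation, and the helper needs_redirect is invented for the example. It mimics the negated host test: a redirect happens when the host does NOT match the preferred name.

```python
import re

# Same pattern with and without the escaping backslashes. re.IGNORECASE
# plays the role of the [NC] flag.
UNESCAPED = re.compile(r"^www.example.com$", re.IGNORECASE)
ESCAPED = re.compile(r"^www\.example\.com$", re.IGNORECASE)


def needs_redirect(host, pattern=ESCAPED):
    """Return True if `host` fails the pattern and would be redirected."""
    return pattern.match(host) is None


if __name__ == "__main__":
    # The preferred host is left alone; any other host is redirected.
    print(needs_redirect("www.example.com"))
    print(needs_redirect("www.mycompany.com"))
    # An unescaped dot matches any character, so this look-alike host
    # slips through the unescaped pattern but not the escaped one.
    print(needs_redirect("wwwXexample.com", UNESCAPED))
    print(needs_redirect("wwwXexample.com", ESCAPED))
```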

Also, always use a proper programmer’s text editor to create or alter your .htaccess file, but you already knew that…

Walter


Thanks Walter. I’m using TextWrangler for all my coding edits.

