Like Paul said, robots.txt stops the pages from being fetched. It does not stop the URLs from being listed if they are found, so you end up with a lot of URL-only listings in search engines. Oddly enough, the recommended way of removing a URL-only listing is to ALLOW the pages to be crawled in your robots.txt and then add an on-page "meta robots noindex". Also, Google has a form to request URL removals; I don't know about Yahoo. Either way, it would be a lot of work.
morton82 wrote:
And I made a stupid mistake when I started the site: I linked my categories using the domain without the www, such as http://mysite.com/...., while my main page is http://www.mysite.com with the www prefix.
You can change your .htaccess to do a permanent redirect from domain.com to www.domain.com. IMO, this is a good idea on any site because people sometimes link without the www. It's off topic, so PM me if you need help adding it.
Comments on Different Indexing from Yahoo and Google
Forum Regular
Usergroup: Customer
Joined: Jan 21, 2005
Total Topics: 32
Total Comments: 102
Hi,
I just noticed that referrals from Yahoo point to URLs on my site like this:
www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29
and referrals from Google point to URLs like this:
http://mysite.com/wsnlinks/CAT1
I am using the mod_rewrite feature of WSN Links. So is Yahoo indexing the site dynamically and Google statically?
And I made a stupid mistake when I started the site: I linked my categories using the domain without the www, such as http://mysite.com/...., while my main page is http://www.mysite.com with the www prefix.
Just wanted to share this with others. If you are also using mod_rewrite, do you see the same referral URLs?
Added: I just found out that Yahoo is indexing my details and report pages. I already included a Disallow rule in robots.txt for all report and details pages. Why is Yahoo still indexing those pages? Should I disallow index.php?
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
Search engines simply follow links; they can't reverse engineer a server to figure out what Apache is thinking, so I'd think it has to be just an age issue. Search engines are stubborn: if a URL was linked the first day the SE indexed the site, even if that was ten years ago, it'll keep reindexing that URL so long as it's not a 404.
For example, this forum is still constantly getting old phpBB and Invision URLs indexed because it once used phpBB and Invision, and those URLs aren't 404 errors, even though they don't do what they were intended to do. As far as I can see, there's nothing that can really be done about this.
morton82 wrote: I already include a disallow command in the robots.txt disallowing all report and details pages.
One of the really annoying things about robots.txt is that you can only disallow URLs, not files, apart from some wildcard methods that are largely unsupported, though Google supports one, I believe. So report.php may be excluded, but the SE will still index report.php?id=2.
As for details and report pages and the like, 3.2 has an option to throw 404s to recognized search engines on non-essential pages to make them get lost.
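For comparison, one way to see how a given rule set treats query-string URLs is to run it through a parser rather than guess. Below is a minimal sketch using Python's standard-library urllib.robotparser, which matches Disallow values as path prefixes. The rule lines and the /wsnlinks/ paths are assumptions for illustration only, since the actual robots.txt from this thread is not shown. Under this parser, a rule for report.php also covers report.php?id=2, so it is worth testing your own file before assuming the query-string version slips through.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real robots.txt from the thread is not shown.
RULES = """\
User-agent: *
Disallow: /wsnlinks/report.php
Disallow: /wsnlinks/details.php
""".splitlines()


def is_crawlable(url, agent="*"):
    """Return True if RULES allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/wsnlinks/report.php",
        "http://www.mysite.com/wsnlinks/report.php?id=2",
        "http://www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Note that this only answers "may the crawler fetch it?". It says nothing about whether a search engine will still list the bare URL if it finds links to it.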
Forum Regular
Usergroup: Customer
Joined: Feb 19, 2004
Location: Michigan
Total Topics: 57
Total Comments: 185
morton82 wrote:
I just noticed that referrals from Yahoo point to URLs on my site like this:
www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29
and referrals from Google point to URLs like this:
http://mysite.com/wsnlinks/CAT1
morton82 wrote: And I made a stupid mistake when I started the site: I linked my categories using the domain without the www, such as http://mysite.com/...., while my main page is http://www.mysite.com with the www prefix.
Paul, it sounds like a problem like this might fix itself if the default .htaccess config used a 301 redirect to category names. Currently, on any site that switches to mod_rewrite, the dynamic show-links URL will continue to serve pages if it is requested. Would there be any problems with doing a 301 redirect from the dynamic URL to the new one?
Off the top of my head, it seems like it would just be a matter of adding " [R=301,L]" to the end of the RewriteRule lines. In theory this would clean up any dynamic listing that a search engine continues to index. I can't think of any side effects. Correct me if I'm wrong.
Forum Regular
Usergroup: Customer
Joined: Jan 21, 2005
Total Topics: 32
Total Comments: 102
I have already done an .htaccess redirect from the domain http://mydomain.com to http://www.domain.com. Whenever I type in mysite.com, it redirects to www.mysite.com.
BUT it will NOT do the same for WSN Links category pages. For example, if I type in the URL mysite.com/wsnlinks/cat1, it will not redirect to http://www.mysite.com/wsnlinks/cat1.
There is another .htaccess file in my wsnlinks folder and I do not dare to touch it.
mrowton wrote: thus you end up having a lot of URL only listings in search engines.
Nope. Yahoo is still indexing the latest pages. I checked the caches and they always show my latest version. But when I search for mysite.com/wsnlinks/cat1, nothing is found. So I guess Yahoo is not indexing the mod_rewrite pages but the .php pages instead.
So should I upgrade to 3.20? I'm using 3.15 now. Google is indexing the pages fine. Google indexed both the .php pages and the mod_rewrite pages but only lists the mod_rewrite pages in the SERPs, and the cached copies of the .php pages date from some time in February.
However, Yahoo is still picking up those details and report pages, which is useless. I want to get the Yahoo bot to index my site deeper.
Member
Usergroup: Customer
Joined: Aug 18, 2004
Total Topics: 4
Total Comments: 17
Weird that I just read this, considering just an hour ago I added this to one of my websites for the first time. Here's the .htaccess mod that'll do the trick. Yep, I tested it.
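RewriteEngine On
# Match the bare domain only (dot escaped, case-insensitive), then redirect permanently to www
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=permanent,L]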
Enjoy!
- Steve
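For anyone who finds the rewrite syntax hard to read, here is roughly what that rule pair decides, written out as a small Python sketch. This only models the logic (host test, then a permanent redirect to the same path on www); it is not how Apache evaluates rules internally, and the function name is made up for illustration.

```python
import re

# Models the host test: the Host header starts with the bare domain.
HOST_PATTERN = re.compile(r"^domain\.com", re.IGNORECASE)


def www_redirect(host, request_uri):
    """Return (301, target URL) for a bare-domain request, else None.

    `request_uri` is the path with its leading slash, e.g. "/wsnlinks/cat1".
    """
    if HOST_PATTERN.match(host):
        return 301, "http://www.domain.com" + request_uri
    return None  # already on www (or another host): no redirect


if __name__ == "__main__":
    print(www_redirect("domain.com", "/wsnlinks/cat1"))
    print(www_redirect("www.domain.com", "/wsnlinks/cat1"))
```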
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
mrowton wrote: Oddly enough the recommended way of removing the URL only listing is to ALLOW them to be followed in your robots.txt, and then add an on-page "meta robots noindex".
Interesting. That's something I might be able to automate for recognized spiders. It sounds a little risky when the page it's being told not to index has the same content as a page that it should spider (index.php?catid=... vs. /cat-name/). But I guess SEs wouldn't be smart/dumb enough to drop all duplicates of the page.
On the .htaccess matter, mrowton and SteveKarr: I don't really know anything (except what I've read in this thread) about setting redirect numbers for rewrite lines, but I'm not sure this will really make search engines give up on the old URLs. After all, the search engine is coming in at index.php?action=displaycat&catid=x, and that URL is not affected by .htaccess. How could something tacked onto a rewrite rule affect a page that the rewrite rule isn't rewriting?
Actually, doesn't the 301 flag make it a 301 redirect from the /category-name/ URL to index.php?action=displaycat&catname=category-name, since that's the direction the .htaccess is handling? Thus it would actually have about the exact opposite effect from the intended one, I would think. I'll try this out later, but I'm not sure how I'd detect what the 301 is doing even when testing.
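On detecting what the 301 is doing: one option is to request the URL with a client that does not follow redirects and look at the status code and the Location header. A minimal sketch with Python's standard-library http.client follows; the function name is made up, and it assumes plain HTTP and a server that answers HEAD requests the same way it answers GET.

```python
import http.client


def check_redirect(host, path, port=80):
    """Request `path` on `host` without following redirects.

    Returns (status_code, Location header or None).
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, check_redirect("www.mysite.com", "/wsnlinks/cat1") returning a 301 with a Location pointing at index.php?... would suggest the redirect runs from the rewritten URL to the dynamic one, which is the opposite of the intended direction. A 200 with no Location would mean the rule is still an internal rewrite.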
developer
Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California
Total Topics: 61
Total Comments: 7868
morton82 wrote: But when i type in mysite.com/wsnlinks/cat1, nothing was found. So i guess Yahoo is not indexing the mod rewrite pages but instead the .php pages.
Actually, it's likely spidering both, but like any search engine it checks for duplicate pages and only shows one version. Whatever algorithm it uses to break such ties has it favoring the non-rewritten URLs (maybe just because they're older, who knows). The only way to make it prefer the rewritten URLs would be the possible meta robots noindex feature discussed above.
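To make the meta robots noindex idea concrete, here is a rough sketch of the decision such a feature would have to make: emit a noindex tag on the dynamic category URL and nothing on the rewritten one. This is not WSN Links code. The function name and the URL test are assumptions based only on the example URLs in this thread, and the crawler has to be allowed to fetch the page in robots.txt for the tag to be seen at all.

```python
from urllib.parse import parse_qs, urlsplit

# "follow" lets the crawler keep following links on the page it won't index.
NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


def robots_meta_for(url):
    """Return a noindex tag for dynamic category URLs, '' otherwise."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    is_dynamic_category = (
        parts.path.endswith("/index.php")
        and query.get("action") == ["displaycat"]
    )
    return NOINDEX_TAG if is_dynamic_category else ""


if __name__ == "__main__":
    print(robots_meta_for("http://www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29"))
    print(repr(robots_meta_for("http://www.mysite.com/wsnlinks/cat1")))
```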
morton82 wrote: So should i upgrade to 3.20?
That's up to you. If you have a lot of template customizations, you may want to be sure you have some time available.