Webmastersite.net

Different Indexing from Yahoo and Google


morton82
Forum Regular

Usergroup: Customer
Joined: Jan 21, 2005

Total Topics: 32
Total Comments: 102
Posted Jun 19, 2005 - 2:19 AM:

Hi,

I just noticed that referrals from Yahoo are landing on my site at URLs like this:
www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29

and referrals from Google at URLs like this:
http://mysite.com/wsnlinks/CAT1

I am using the mod_rewrite feature of WSN Links. So is Yahoo indexing the dynamic URLs and Google the static (rewritten) ones?

And I made a stupid mistake when I started the site: I linked my categories using the domain without the www, such as http://mysite.com/...., while my main page is http://www.mysite.com, with the www prefix.

Just wanted to share this with others. If you guys are also using mod_rewrite, do you see the same referral URLs?

Added: I just found out that Yahoo is indexing my details and report pages. I already included a Disallow rule in robots.txt covering all report and details pages. Why is Yahoo still indexing those pages? Should I disallow index.php?
Paul
developer

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Jun 19, 2005 - 3:31 AM:

Search engines simply follow links; they can't reverse engineer a server to figure out what Apache is thinking, so I'd think it's just an age issue. Search engines are stubborn: if a URL was linked the first day the SE indexed the site, even if that was ten years ago, it'll keep reindexing that URL so long as it doesn't return a 404.

For example, this forum is still constantly getting old phpBB and Invision URLs indexed because it once used phpBB and Invision, and those URLs don't return 404 errors, even though they no longer do what they were intended to do. As far as I can see, nothing can really be done about this.

I already include a disallow command in the robots.txt disallowing all report and details pages.

One of the really annoying things about robots.txt is that you can only disallow URLs, not files, with the exception of some wildcard methods that are largely unsupported (though I believe Google supports one). So report.php may be excluded, but the SE will still index report.php?id=2.
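
Worth checking against this: the original robots.txt convention treats each Disallow value as a URL prefix, so a rule for a script path would normally also cover that path with a query string, while a page reached through index.php?action=... would not be covered by it. Whether a particular crawler behaved that way in practice is a separate question. A minimal sketch using Python's standard urllib.robotparser; the robots.txt content, the paths, the "ExampleBot" agent name and the is_allowed helper are all assumptions made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; these paths are illustrative, not taken from the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wsnlinks/report.php
"""


def is_allowed(url_path: str, agent: str = "ExampleBot") -> bool:
    """Return True if ROBOTS_TXT permits `agent` to fetch `url_path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url_path)
```

Under this parser, is_allowed("/wsnlinks/report.php?id=2") is refused by the prefix rule, but is_allowed("/wsnlinks/index.php?action=report&id=2") is permitted, because that URL starts with a different path.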

Though as far as details and reports pages and the like, 3.2 has an option to throw 404s to recognized search engines on non-essential pages to make them get lost.
mrowton
Forum Regular

Usergroup: Customer
Joined: Feb 19, 2004
Location: Michigan

Total Topics: 57
Total Comments: 185
Posted Jun 19, 2005 - 8:51 AM:

morton82 wrote:

I just noticed that referrals from Yahoo are landing on my site at URLs like this:
www.mysite.com/wsnlinks/index.php?action=displaycat&catid=29

and referrals from Google at URLs like this:
http://mysite.com/wsnlinks/CAT1

Like Paul said, robots.txt stops the pages from being fetched. It does not stop the URLs from being listed if they are found, so you end up with a lot of URL-only listings in search engines. Oddly enough, the recommended way of removing a URL-only listing is to ALLOW the pages to be fetched in your robots.txt and then add an on-page "meta robots noindex". Also, Google has a form to request URL removals; I don't know about Yahoo. Either way, it would be a lot of work.

morton82 wrote:

And I made a stupid mistake when I started the site: I linked my categories using the domain without the www, such as http://mysite.com/...., while my main page is http://www.mysite.com, with the www prefix.

You can change your .htaccess to do a permanent redirect from domain.com to www.domain.com. IMO this is a good idea on any site, because people sometimes link without the www. It's off topic, so PM me if you need help adding it.

Paul, it sounds like a problem like this might fix itself if the default .htaccess config used a 301 redirect to category names. Currently, on any site that switches to mod_rewrite, the dynamic show-links URL will continue to show pages if that URL is requested. Would there be any problems with doing a 301 redirect from the dynamic URL to the new one?

Off the top of my head, it seems like it would just be a matter of adding " [R=301,L]" to the end of the RewriteRule lines. This would in theory clean up any dynamic listing that a search engine continues to index. I can't think of any side effects. Correct me if I'm wrong.
morton82
Forum Regular

Usergroup: Customer
Joined: Jan 21, 2005

Total Topics: 32
Total Comments: 102
Posted Jun 19, 2005 - 10:24 AM:

You can change your .htaccess to do a permanent redirect from domain.com to www.domain.com. IMO this is a good idea on any site, because people sometimes link without the www. It's off topic, so PM me if you need help adding it.

I have already set up an .htaccess redirect from http://mysite.com to http://www.mysite.com. Whenever I type in mysite.com, it redirects to www.mysite.com.

BUT it does NOT do the same for WSN Links category pages. For example, if I type in the URL mysite.com/wsnlinks/cat1, it does not redirect to http://www.mysite.com/wsnlinks/cat1

There is another .htaccess file in my wsnlinks folder and I do not dare to touch it.

thus you end up having a lot of URL only listings in search engines.

Nope. Yahoo is still indexing the latest pages. I checked the caches and found my latest version cached every time. But when I searched for mysite.com/wsnlinks/cat1, nothing was found. So I guess Yahoo is not indexing the mod_rewrite pages but the .php pages instead.

Though as far as details and reports pages and the like, 3.2 has an option to throw 404s to recognized search engines on non-essential pages to make them get lost.

So should I upgrade to 3.20? I'm using 3.15 now. Google is indexing the pages fine. Google indexed both the .php pages and the mod_rewrite pages but only lists the mod_rewrite pages in the SERPs. And I found the cache of the .php pages to be from some time in February.

However, Yahoo is still picking up those details and report pages, which are useless. I want to optimize things so the Yahoo bot indexes my site deeper.
SteveKarr
Member

Usergroup: Customer
Joined: Aug 18, 2004

Total Topics: 4
Total Comments: 17
Posted Jun 20, 2005 - 2:20 AM:

Weird that I just read this, considering just an hour ago I added this to one of my websites for the first time. Here's the .htaccess mod that'll do the trick. Yep, I tested it.

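RewriteEngine On
# Match requests for the bare domain (dot escaped, case-insensitive)
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
# Send them to the same path on www with a permanent (301) redirect
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=permanent,L]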

Enjoy!

- Steve
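
To make it concrete what a rule like this is meant to do, here is a rough Python model of it. This is only a simulation for reasoning about the logic, not how Apache evaluates anything; the function name canonical_redirect is invented for illustration, and the escaped dot and case-insensitive match are assumptions that go slightly beyond a bare ^domain.com pattern:

```python
import re
from typing import Optional

# Host test: starts with "domain.com", dot escaped, case-insensitive.
BARE_HOST = re.compile(r"^domain\.com", re.IGNORECASE)


def canonical_redirect(host: str, path: str) -> Optional[str]:
    """Return the permanent-redirect target for a bare-domain request.

    Returns None when the host already carries the www prefix, meaning
    the request would be served without a redirect.
    """
    if BARE_HOST.match(host):
        return "http://www.domain.com/" + path.lstrip("/")
    return None
```

In this model a request for domain.com/wsnlinks/cat1 maps to http://www.domain.com/wsnlinks/cat1. On a real server that only holds if the rule is actually consulted for that directory: as far as I know, an .htaccess in a subfolder that has its own rewrite rules does not inherit the parent folder's rules by default, which may be why the category pages above were not redirected.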
Paul
developer

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Jun 20, 2005 - 4:32 AM:

mrowton wrote:
Oddly enough the recommended way of removing the URL only listing is to ALLOW them to be followed in your robots.txt, and then add an on-page "meta robots noindex".


Interesting. That's something I might be able to automate for recognized spiders. It sounds a little risky when the page it's being told not to index has the same content as a page that it should spider (index.php?catid=... vs /cat-name/). But I guess SEs wouldn't be smart/dumb enough to drop all duplicates of the page.
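
A minimal sketch of how that automation might look, assuming the rule is simply "dynamic index.php?action=... URLs get noindex, rewritten URLs get nothing". The function name robots_meta and that rule are assumptions for illustration only, not how WSN Links actually works:

```python
from urllib.parse import parse_qs, urlsplit

NOINDEX_TAG = '<meta name="robots" content="noindex,follow">'


def robots_meta(url: str) -> str:
    """Return a meta robots tag for dynamic URLs, or '' for rewritten ones.

    A URL counts as dynamic here when its path ends in index.php and its
    query string carries an "action" parameter (an assumed convention).
    """
    parts = urlsplit(url)
    is_dynamic = parts.path.endswith("index.php") and "action" in parse_qs(parts.query)
    return NOINDEX_TAG if is_dynamic else ""
```

With this rule, index.php?action=displaycat&catid=29 would carry the noindex tag while /wsnlinks/CAT1 would not, so only the rewritten copy of each category stays eligible for indexing. "noindex,follow" is used so links on the page can still be followed.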


On the .htaccess matter, mrowton and SteveKarr -- I don't really know anything (except what I've read in this thread) about setting redirect numbers for rewrite lines, but I'm not sure this will really make search engines give up on the old urls. After all, the search engine is coming in at index.php?action=displaycat&catid=x and that URL is not affected by .htaccess. How could something tacked onto a rewrite rule affect a page that the rewrite rule isn't rewriting?

Actually, doesn't the 301 make it a redirect from the /category-name/ URL to index.php?action=displaycat&catname=category-name, since that's the direction the .htaccess is handling? So it would have about the exact opposite effect from the intended one, I would think... I will try this out later, but I'm not sure how I'd detect what the 301 is doing even when testing.
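
One way to see what a redirect rule is doing is to request a URL without following redirects and look at the status code and the Location header. A minimal sketch using Python's standard http.client, which does not follow redirects on its own; the function name fetch_status is made up for illustration:

```python
import http.client
from typing import Optional, Tuple


def fetch_status(host: str, path: str, port: int = 80) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status code, Location header).

    The Location value is None when the response carries no such header,
    which is the normal case for a plain 200 response.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling fetch_status("www.mysite.com", "/wsnlinks/CAT1") and then the same with "/wsnlinks/index.php?action=displaycat&catid=29" (placeholder host and paths from this thread) should show which of the two, if either, answers with a 301 and where it points.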
Paul
developer

Usergroup: Administrator
Joined: Dec 20, 2001
Location: Diamond Springs, California

Total Topics: 61
Total Comments: 7868
Posted Jun 20, 2005 - 4:40 AM:

morton82 wrote:
But when i type in mysite.com/wsnlinks/cat1, nothing was found. So i guess Yahoo is not indexing the mod rewrite pages but instead the .php pages.


Actually, it's likely spidering both, but like any search engine it checks for duplicate pages and only shows one version. Whatever algorithm it uses to break such ties has it favoring the non-rewritten URLs (maybe just because they're older, who knows). The only way to make it prefer the rewritten URLs would be the possible meta robots noindex feature discussed above.

So should i upgrade to 3.20?

That's up to you. If you have a lot of template customizations, you may want to be sure you have some time available.