VBulletin and Mod_Rewrite
Darren said: "I attempted to follow this [url="http://www.daniweb.com/techtalkforums/post78755.html"]VBulletin Mod_Rewrite Tutorial[/url] and the one here: [url="http://www.seo-guy.com/forum/archive/index.php/t-5112.html"]SEOGuy Version[/url] with little luck.
Both of these miss an obvious point: what happens when a thread runs past one page? Or two, or three, or even 600, which is common on sites like this one? Paginated threads need additional rewrite rules.
I also need to keep the various display modes out of the index. I think that excluding the dynamic pages (anything with a '?') should solve that problem easily.
[b]The Problem with Dynamic Pages[/b]
The problem with the various display modes is that they'll show up as duplicate content. Here's the same page, referred to by different names:
showthread152.html <--- rewritten - the one we want!
showthread.php?p=152#post152 <---- last post in thread view
showthread.php?p=152#post152gotonewpost <---- new post mode
etc.
We only want one version of these in Google. Here's how we solve it.
[code]
User-agent: Googlebot
Disallow: /*?
[/code]
This means Googlebot will not crawl any URL with a ? in it, so those pages should drop out of the index. So far so good.
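To see which URLs a rule like /*? would catch, here is a minimal Python sketch of Googlebot-style wildcard matching. It assumes the two pattern extensions Google documents ('*' for any run of characters, a trailing '$' for end of URL); the function name rule_matches and the sample paths are made up for illustration, and this is not Google's actual matcher.

```python
import re

def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed semantics: the pattern is matched from the start of the path,
    '*' matches any run of characters, a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None

# Hypothetical sample paths: one rewritten page, two dynamic ones.
for path in ("/showthread152.html", "/showthread.php?p=152", "/forumdisplay.php?f=2"):
    print(path, "blocked" if rule_matches("/*?", path) else "allowed")
```

Under those assumptions the rewritten page stays crawlable and both dynamic URLs are blocked. Note that the #post152 part of a link never reaches the server, so it plays no role in the match.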
[b]Steps to Solve:[/b]
1) Disallow dynamic spidering for GoogleBot
2) Rewrite the Forums and Threads only, removing all the chaff that makes up a message board.
3) Script the rewrite so that it takes into account pagination: thread21-page3.html should suffice.
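Step 3 can be sketched in Python before committing it to rewrite rules. This is only an illustration of the mapping: the naming scheme thread21.html / thread21-page3.html follows the example above, while the query parameter names t and page and the helper name to_dynamic are assumptions, so check them against your own board's URLs.

```python
import re

# Static names: thread<ID>.html, or thread<ID>-page<N>.html for later pages.
STATIC_THREAD = re.compile(r"^thread(\d+)(?:-page(\d+))?\.html$")

def to_dynamic(static_name: str):
    """Map a static thread filename to the dynamic URL it stands for.

    Returns None when the name does not follow the static scheme.
    """
    m = STATIC_THREAD.match(static_name)
    if m is None:
        return None
    thread_id, page = m.group(1), m.group(2)
    # 't' and 'page' are assumed parameter names, not confirmed by this thread.
    url = f"showthread.php?t={thread_id}"
    if page is not None:
        url += f"&page={page}"
    return url

print(to_dynamic("thread21.html"))
print(to_dynamic("thread21-page3.html"))
```

The same pattern, with the optional -page group, is what a pair of rewrite rules would have to cover: one for the bare thread name and one for the paginated form.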
I'll post my progress here for those who come later."
edwin said: "you can't use the navigation to the first or last menu, or that will cause dupe content too."
Darksat said: "also make sure threads such as
showthread.php?p=152#post152
end up as
showthread152.html#post152
not as a new thread."
edwin said: "all you really need is the main rewritten page indexed.
all the paginated pages and user-variable pages are crap."
Darren said: "Here's what I decided: just leave 2 types of pages:
forums.html - main forum pages - paginated page names just like the old forums, so PR will be preserved
thread123.html - numbered threads, rewritten as static html, first page only.
We already have the robots.txt to exclude GoogleBot, so no crawling of dynamic pages will happen.
Recall:
[code]
User-Agent: Googlebot
Disallow: /*?
Disallow: /cgi-bin
[/code]
Once the results of Google's site: command clean up a bit, we should start seeing just those pages.
This way we eliminate any chance of duplicate content, and have nice clean html pages that should rank high in Google. Since we moved the old site to its own sub-domain, and linked to it from the footer, we should pass enough PR for the 21,000 pages of actual content to get indexed on that sub-domain. We'll continue to run AdSense ads on there, and tell the visitors to visit the main page.
Over time, the 21,000 pages of clean content on the archive should come in very handy, since it's all true unique content."
edwin said: "you need to show us the logs when google re-crawls!"