URL Rewriting and the Google and Yahoo Spiders (1)

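We all know that a website has to do something for the search engines; at the very least, its URLs should be rewritten. But then a problem turned up:
the Google and Yahoo spiders could no longer fetch our pages, although Baidu's could.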

Every request came back as a 500 or a 302 error, which left me baffled.

The only thing left to do was to log the errors.

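void Application_Error(object sender, EventArgs e)
{
    // The last unhandled exception raised while processing this request.
    Exception error = Server.GetLastError();
    if (error == null) return;

    // One log file per day. A fixed date format avoids the '/' characters
    // that ToShortDateString() produces in some cultures.
    string path = Server.MapPath(DateTime.Now.ToString("yyyy-MM-dd") + "error.log");

    // 'true' appends to the file; 'using' closes the writer even if a write fails.
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(path, true))
    {
        sw.WriteLine("DateTime:" + DateTime.Now + "\t" + error.ToString());

        // Dump the query string, form fields, cookies and server variables.
        for (int i = 0; i < Request.Params.Count; i++)
        {
            sw.WriteLine(Request.Params.Keys[i] + ": " + Request.Params[i]);
        }
        sw.WriteLine("End\r\n");
    }
}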

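The log showed that the error was:
System.Web.HttpUnhandledException: Exception of type 'System.Web.HttpUnhandledException' was thrown. ---> System.Web.HttpException: Cannot use a leading .. to exit above the top directory.
A quick search showed that other people had already run into this problem.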

Cause

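URLRewriter is in use, and Google's crawler does not support cookies, so ASP.NET automatically inserts the session identifier into the URLs it returns. When the crawler then follows a link that uses .. to go back up to the parent directory, the request fails.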

Solutions

1. Set cookieless = UseCookies, so that cookies are used whether or not the client supports them.

2. Because the default is cookieless = UseDeviceProfile, you can create a device file (.browser) for the search engines and fake the capability. The article "Get GoogleBot to crash your .NET 2.0 site" shows how to do this.

3. Change the program so that the relative paths in it (~/) are written as absolute paths (the Resolve method can be used for this).
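Options 1 and 2 above are both configuration changes, so a rough sketch may help. The first fragment assumes the cookieless setting in question is the one on the sessionState element in web.config, since the cause described above involves the session identifier:

```xml
<!-- web.config: always use cookies for the session ID, never the URL -->
<configuration>
  <system.web>
    <sessionState cookieless="UseCookies" />
  </system.web>
</configuration>
```

For option 2, a .browser file placed in the site's App_Browsers folder might look roughly like the following. The file name, the id value and the match pattern are illustrative assumptions, not taken from the cited article; only the element layout follows the standard browser definition schema:

```xml
<!-- App_Browsers/googlebot.browser (hypothetical file name) -->
<browsers>
  <browser id="GoogleBotSketch" parentID="Mozilla">
    <identification>
      <!-- Assumed pattern: matches any user agent containing "Googlebot" -->
      <userAgent match="Googlebot" />
    </identification>
    <capabilities>
      <!-- Tell ASP.NET this client supports cookies, so that
           UseDeviceProfile does not fall back to cookieless URLs -->
      <capability name="cookies" value="true" />
    </capabilities>
  </browser>
</browsers>
```

Option 1 is the smaller change, but it means clients that really do reject cookies lose their session. Option 2 keeps the default behavior for everyone else and needs one definition per crawler.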

http://www.cnblogs.com/hjf1223/archive/2006/10/14/529227.html has a more detailed description.