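Yesterday I deliberately cleared the logs for www.blogguy.cn so that I could analyze blogguy.cn's access log and error log in detail.
First thing this morning I downloaded blogguy.cn.error, my error log. It was already 18.5k, which is not small.
I opened it right away.
Here is an excerpt: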
- [Thu Jul 08 13:25:19 2010] [error] [client 116.24.173.198] File does not exist: /var/www/blogguy.cn/showtb.asp
- [Thu Jul 08 13:29:29 2010] [error] [client 65.55.51.112] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:30:27 2010] [error] [client 115.194.112.90] File does not exist: /var/www/blogguy.cn/mofei_login.asp
- [Thu Jul 08 13:30:27 2010] [error] [client 115.194.112.90] File does not exist: /var/www/blogguy.cn/mofei_login.asp
- [Thu Jul 08 13:34:17 2010] [error] [client 207.46.199.179] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:38:44 2010] [error] [client 60.240.249.211] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:43:51 2010] [error] [client 67.218.116.169] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:44:01 2010] [error] [client 216.129.119.40] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:44:32 2010] [error] [client 216.129.119.49] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:45:11 2010] [error] [client 67.218.116.170] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:45:44 2010] [error] [client 216.129.119.44] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:46:08 2010] [error] [client 67.218.116.166] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:46:32 2010] [error] [client 67.218.116.162] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:47:28 2010] [error] [client 67.218.116.164] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:48:16 2010] [error] [client 216.129.119.43] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:50:40 2010] [error] [client 216.129.119.46] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:51:37 2010] [error] [client 67.218.116.165] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:51:52 2010] [error] [client 216.129.119.45] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:52:39 2010] [error] [client 67.218.116.171] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:53:03 2010] [error] [client 216.129.119.48] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:54:54 2010] [error] [client 67.218.116.168] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 13:55:44 2010] [error] [client 110.75.169.72] File does not exist: /var/www/blogguy.cn/cgi-sys
- [Thu Jul 08 13:55:51 2010] [error] [client 216.129.119.47] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 14:04:22 2010] [error] [client 110.75.169.213] File does not exist: /var/www/blogguy.cn/cgi-sys
- [Thu Jul 08 14:05:11 2010] [error] [client 202.160.178.91] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 14:08:26 2010] [error] [client 207.46.195.233] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 14:10:41 2010] [error] [client 67.218.116.163] File does not exist: /var/www/blogguy.cn/robots.txt
- [Thu Jul 08 14:16:12 2010] [error] [client 124.126.179.137] File does not exist: /var/www/blogguy.cn/showtb.asp
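To see at a glance which missing files account for most of the entries, the excerpt above can be tallied with a few lines of Python. This is only a minimal sketch: the regular expression assumes the exact "File does not exist" line format shown in the excerpt, and the function name `count_missing_files` and the `sample` list are made up for illustration.

```python
import re
from collections import Counter

# Assumes the "File does not exist" line format seen in the excerpt above.
LINE_RE = re.compile(
    r"\[client (?P<ip>[\d.]+)\] File does not exist: (?P<path>\S+)"
)


def count_missing_files(lines):
    """Return a Counter mapping each missing path to how often it was requested."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines in any other format are skipped
            counts[match.group("path")] += 1
    return counts


# Three lines copied from the excerpt; in practice, pass the open log file.
sample = [
    "[Thu Jul 08 13:25:19 2010] [error] [client 116.24.173.198] "
    "File does not exist: /var/www/blogguy.cn/showtb.asp",
    "[Thu Jul 08 13:29:29 2010] [error] [client 65.55.51.112] "
    "File does not exist: /var/www/blogguy.cn/robots.txt",
    "[Thu Jul 08 13:34:17 2010] [error] [client 207.46.199.179] "
    "File does not exist: /var/www/blogguy.cn/robots.txt",
]

# Print the most frequently requested missing paths first.
for path, hits in count_missing_files(sample).most_common():
    print(hits, path)
```

Since a file object yields its lines when iterated, `count_missing_files(open("blogguy.cn.error"))` should work the same way on the real log.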
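The .asp requests can be ignored. They usually come from scanners or automated posting bots, the work of so-called hackers with nothing better to do.
Now look at robots.txt. It accounts for most of these errors, and I have not configured any redirect for 404 errors either. I am not sure whether that counts as friendly to search engine crawlers.
One thing is certain, though: the log growth caused by all these errors, and the overhead of processing the log, can be avoided. So I uploaded a robots.txt.
Its content is simple: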
User-agent: *
Disallow: /admin
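Before uploading, rules like these can be checked locally with Python's standard `urllib.robotparser` module. The snippet below is a rough sketch rather than part of the original setup: the helper name `is_allowed` is an assumption, and the rules are parsed from a string instead of being fetched from the site.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
"""


def is_allowed(path, user_agent="*"):
    """Return True if the rules let user_agent fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# /admin and everything beneath it should be refused; the rest stays open.
print("/admin  ->", is_allowed("/admin"))
print("/admin/ ->", is_allowed("/admin/"))
print("/       ->", is_allowed("/"))
```

Because `Disallow` matches by prefix, any other path that merely starts with `/admin` is excluded as well.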
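Almost everything on www.blogguy.cn is public. Only admin, blogguy.cn's management directory, should be kept away from crawlers, and even if a crawler went there anyway, it could not get in.
With that in place, I will check the effect tomorrow.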
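While I was at it, I also pointed 404 errors at the site home page in .htaccess. Because the target is a full URL, Apache answers with a redirect to that address rather than a 404 status: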
ErrorDocument 404 http://www.blogguy.cn/
Tomorrow I will look at the errors again.


