<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>My Memo &#187; log</title>
	<atom:link href="http://www.nimab.org/tag/log/feed" rel="self" type="application/rss+xml" />
	<link>http://www.nimab.org</link>
	<description>Down and out</description>
	<lastBuildDate>Mon, 29 Aug 2011 02:53:07 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>A few lines of bash to analyze logs and raise alerts on aggressive spiders</title>
		<link>http://www.nimab.org/2008/08/25/51.html</link>
		<comments>http://www.nimab.org/2008/08/25/51.html#comments</comments>
		<pubDate>Mon, 25 Aug 2008 10:45:57 +0000</pubDate>
		<dc:creator>xdanger</dc:creator>
				<category><![CDATA[bash]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[log]]></category>

		<guid isPermaLink="false">http://www.nimab.org/?p=51</guid>
		<description><![CDATA[I'm fed up with being crawled by dumb spiders. When I checked this morning, two IPs had pulled 80 GB of pages from me in a single hour, all of them dynamic pages. First, keep what Apache logs to a minimum; the benefits are obvious. SetEnvIfNoCase Request_URI \.gif$ dontlog SetEnvIfNoCase Request_URI \.jpg$ dontlog SetEnvIfNoCase Request_URI \.png$ dontlog SetEnvIfNoCase Request_URI \.swf$ dontlog SetEnvIfNoCase Request_URI \.css$ dontlog SetEnvIfNoCase Request_URI \.js$ dontlog SetEnvIfNoCase Request_URI \.ico$ dontlog CustomLog "&#124;/usr/local/sbin/cronolog /var/log/httpd/%Y-%m/%d-%H.ip" "%{X-Forwarded-For}i" env=!dontlog Because my Apache sits behind n layers of proxies, it can only log %{X-Forwarded-For}, which contains the real IP but needs a further parsing step to extract it. cd /var/log/httpd f=`date -d '1 hours ago' +%Y-%m/%d-%H.ip` ip=`sed 's#^\([0-9\.]\{1,\}\)[0-9 \.,\s]\{1,\}#\1#' [...]]]></description>
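			<content:encoded><![CDATA[<p>I'm fed up with being crawled by dumb spiders. When I checked this morning, two IPs had pulled 80 GB of pages from me in a single hour, all of them dynamic pages.</p>
<p>First, keep what Apache logs to a minimum: skip requests for static assets and record only the X-Forwarded-For header. The benefits are obvious.</p>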
<p><code><br />
    SetEnvIfNoCase Request_URI \.gif$ dontlog<br />
    SetEnvIfNoCase Request_URI \.jpg$ dontlog<br />
    SetEnvIfNoCase Request_URI \.png$ dontlog<br />
    SetEnvIfNoCase Request_URI \.swf$ dontlog<br />
    SetEnvIfNoCase Request_URI \.css$ dontlog<br />
    SetEnvIfNoCase Request_URI \.js$  dontlog<br />
    SetEnvIfNoCase Request_URI \.ico$ dontlog<br />
    CustomLog "|/usr/local/sbin/cronolog /var/log/httpd/%Y-%m/%d-%H.ip" "%{X-Forwarded-For}i" env=!dontlog<br />
</code></p>
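<p>Because my Apache sits behind n layers of proxies, it can only log %{X-Forwarded-For}. That header contains the real client IP, but a further parsing step is needed to extract it.</p>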
<p><code><br />
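cd /var/log/httpd || exit 1<br />
# Log file that cronolog wrote during the previous hour<br />
f=$(date -d '1 hour ago' +%Y-%m/%d-%H.ip)<br />
# Keep only the first address of each X-Forwarded-For chain, then list the ten busiest IPs<br />
ip=$(sed 's#^\([0-9.]\{1,\}\)[0-9 .,]*#\1#' "$f" | awk '{a[$1]++ } END{for(i in a){print a[i] " " i}}' | sort -rn | head)<br />
# Post the result as a status update from the bot's Fanfou account<br />
curl -u fanfou_bot_username:password --data-urlencode status="$ip" http://api.fanfou.com/statuses/update.xml<br />
rm -- "$f"<br />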
</code></p>
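<p>In testing, one hour of logs comes to about 10 MB and takes only about 3 seconds to analyze, and 1 second of that goes to sending the alert to Fanfou. If the log recorded the real IP directly, the sed step could be dropped and the analysis should be much faster still (the log file would also be much smaller).</p>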
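<p>A minimal sketch of the counting step on its own, assuming each log line holds a comma-separated X-Forwarded-For chain whose first entry is the client address. The function name <code>top_clients</code> is hypothetical and not part of the original script; it lets awk split the chain instead of sed.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): read X-Forwarded-For
# log lines on stdin and print the ten most frequent first addresses
# as "count ip", busiest first.
top_clients() {
    # Fields are separated by commas and/or spaces, so $1 is the first
    # address in the chain. Empty lines (NF == 0) are skipped.
    awk -F'[ ,]+' 'NF { count[$1]++ } END { for (ip in count) print count[ip], ip }' |
        sort -rn |
        head
}
```

<p>It could be fed the hourly file directly, for example <code>top_clients &lt; "$f"</code>, in place of the sed and awk stages.</p>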
<p>If an IP shows outrageous numbers and is not a regular search engine spider, just cut it off.<br />
<code><br />
iptables -A INPUT -s xxx.xxx.xxx.xxx/29 -j DROP<br />
</code><br />
1          202.106.186.*        163 spider<br />
2          202.108.36.*        163 spider<br />
3          202.108.44.*        163 spider<br />
4          202.108.45.*        163 spider<br />
5          202.108.5.*        163 spider<br />
6          202.108.9.*        163 spider<br />
7          220.181.12.*        163 spider<br />
8          220.181.13.*        163 spider<br />
9          220.181.14.*        163 spider<br />
10        220.181.15.*        163 spider<br />
11        220.181.28.*        163 spider<br />
12        220.181.31.*        163 spider<br />
13        222.185.245.*        163 spider<br />
14        202.165.100.*        3721 spider<br />
15        220.181.19.*        Baidu spider<br />
16        159.226.50.*        Baidu spider<br />
17        202.108.11.*        Baidu spider<br />
18        202.108.22.*        Baidu spider<br />
19        202.108.23.*        Baidu spider<br />
20        202.108.249.*        Baidu spider<br />
21        202.108.250.*        Baidu spider<br />
22        61.135.145.*        Baidu spider<br />
23        61.135.146.*        Baidu spider<br />
24        64.124.85.*        become.com<br />
25        61.151.243.*        china spider<br />
26        202.165.96.*        gais.cs.ccu.edu.tw<br />
27        216.239.33.*        google spider<br />
28        216.239.35.*        google spider<br />
29        216.239.37.*        google spider<br />
30        216.239.39.*        google spider<br />
31        216.239.51.*        google spider<br />
32        216.239.53.*        google spider<br />
33        216.239.55.*        google spider<br />
34        216.239.57.*        google spider<br />
35        216.239.59.*        google spider<br />
36        64.233.161.*        google spider<br />
37        64.233.189.*        google spider<br />
38        66.102.11.*        google spider<br />
39        66.102.7.*        google spider<br />
40        66.102.9.*        google spider<br />
41        66.249.64.*        google spider<br />
42        66.249.65.*        google spider<br />
43        66.249.66.*        google spider<br />
44        66.249.71.*        google spider<br />
45        66.249.72.*        google spider<br />
46        72.14.207.*        google spider<br />
47        61.135.152.*        iask spider<br />
48        65.54.188.*        msn spider<br />
49        65.54.225.*        msn spider<br />
50        65.54.226.*        msn spider<br />
51        65.54.228.*        msn spider<br />
52        65.54.229.*        msn spider<br />
53        207.46.98.*        msn spider<br />
54        207.68.157.*        msn spider<br />
55        194.224.199.*        noxtrumbot<br />
56        220.181.8.*        Outfox<br />
57        221.239.209.*        Outfox<br />
58        217.212.224.*        psbot<br />
59        219.133.40.*        QQ spider<br />
60        202.96.170.*        QQ spider<br />
61        202.104.129.*        QQ spider<br />
62        61.135.157.*        QQ spider<br />
63        219.142.118.*        sina spider<br />
64        219.142.78.*        sina spider<br />
65        61.135.132.*        sohu spider<br />
66        220.181.26.*        sohu spider<br />
          220.181.19.*<br />
67        61.135.158.*        tom spider<br />
68        66.196.90.*        yahoo spider<br />
69        66.196.91.*        yahoo spider<br />
70        68.142.249.*        yahoo spider<br />
71        68.142.250.*        yahoo spider<br />
72        68.142.251.*        yahoo spider<br />
73        202.165.102.*        yahoo China spider<br />
74        202.160.178.*        yahoo China spider<br />
75        202.160.179.*        yahoo China spider<br />
76        202.160.180.*        yahoo China spider<br />
77        202.160.181.*        yahoo China spider<br />
78        202.160.183.*        yahoo China spider<br />
79        72.30.101.*        yahoo spider<br />
80        72.30.102.*        yahoo spider<br />
81        72.30.103.*        yahoo spider<br />
82        72.30.104.*        yahoo spider<br />
83        72.30.107.*        yahoo spider<br />
84        72.30.110.*        yahoo spider<br />
85        72.30.111.*        yahoo spider<br />
86        72.30.128.*        yahoo spider<br />
87        72.30.129.*        yahoo spider<br />
88        72.30.131.*        yahoo spider<br />
89        72.30.133.*        yahoo spider<br />
90        72.30.134.*        yahoo spider<br />
91        72.30.135.*        yahoo spider<br />
92        72.30.216.*        yahoo spider<br />
93        72.30.226.*        yahoo spider<br />
94        72.30.252.*        yahoo spider<br />
95        72.30.97.*        yahoo spider<br />
96        72.30.98.*        yahoo spider<br />
97        72.30.99.*        yahoo spider<br />
98        74.6.74.*        yahoo spider<br />
99        202.108.4.*        Zhongsou spider<br />
100      202.108.4.*        Zhongsou spider<br />
101      202.108.33.*      Zhongsou spider<br />
102      202.96.51.*        Zhongsou spider<br />
103      219.142.53.*        Zhongsou spider </p>
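<p>Before dropping an address it may help to check it against the list above. A rough sketch, assuming the wildcard prefixes are passed as arguments; the function name <code>is_known_spider</code> is made up for illustration, and the list itself dates from 2008, so it may well be stale.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): succeed if the IP
# in $1 matches any of the wildcard prefixes given as the remaining
# arguments, for example 66.249.64.*
is_known_spider() {
    local ip="$1" pattern
    shift
    for pattern in "$@"; do
        # An unquoted $pattern in a case label is treated as a glob,
        # so the trailing * matches the last octet.
        case $ip in
            $pattern) return 0 ;;
        esac
    done
    return 1
}

# Example: prints "known spider"
if is_known_spider 66.249.64.12 '66.249.64.*' '220.181.19.*'; then
    echo "known spider"
fi
```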
]]></content:encoded>
			<wfw:commentRss>http://www.nimab.org/2008/08/25/51.html/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

