IPv4 and IPv6 Address Databases

Commonly Used Databases

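I looked into IP geolocation databases. The ones in common use today are:

  • Chunzhen (纯真, cz88.net) database: completely free but not very accurate; the installer can be downloaded from www.cz88.net/soft/setup.zip;
  • IPIP database: the best-made IP database in China; the free edition is barely adequate;
  • GeoIP: the free edition has mediocre city-level accuracy inside China, while the paid edition is more accurate; its data is distinctive in that it also provides latitude and longitude;
  • Ip2Location: I tried it and it works well, but place names are in pinyin or English, so the data needs post-processing if you want Chinese characters;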

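About IPv6

For why IPv6 is needed, see "Protocol Forest 04: The Address Exhaustion Crisis (IPv4 and IPv6 Addresses)" (协议森林04 地址耗尽危机).

An IPv6 address is 128 bits long, four times the length of an IPv4 address. IPv4's dotted-decimal format therefore no longer fits, and hexadecimal is used instead. An IPv6 address can be written in three ways:

  1. Colon-hexadecimal notation: the format is X:X:X:X:X:X:X:X, where each X is 16 bits of the address written in hexadecimal, for example ABCD:EF01:2345:6789:ABCD:EF01:2345:6789. Leading zeros in each group may be omitted, for example: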

    2001:0DB8:0000:0023:0008:0800:200C:417A→ 2001:DB8:0:23:8:800:200C:417A

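  2. Zero-compression notation: when an address contains a long run of zero groups, the run can be compressed to "::". To keep the address unambiguous, "::" may appear only once, for example: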

    FF01:0:0:0:0:0:0:1101 → FF01::1101

    0:0:0:0:0:0:0:1 → ::1

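  3. Embedded-IPv4 notation: for IPv4/IPv6 interoperability, an IPv4 address can be embedded in an IPv6 address, which is then usually written X:X:X:X:X:X:d.d.d.d. The first 96 bits use colon-hexadecimal notation and the last 32 bits use IPv4 dotted decimal, for example ::192.168.0.1 and ::FFFF:192.168.0.1. Zero compression still applies within the first 96 bits.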
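
The three notations can be checked with Python's standard ipaddress module. This is only a minimal sketch: the variable names are illustrative, and note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress

# 1. Colon-hexadecimal notation: leading zeros in each group are dropped.
full = ipaddress.IPv6Address("2001:0DB8:0000:0023:0008:0800:200C:417A")
short = full.compressed            # '2001:db8:0:23:8:800:200c:417a'

# 2. Zero compression: the longest run of zero groups collapses to "::".
multicast = ipaddress.IPv6Address("FF01:0:0:0:0:0:0:1101").compressed  # 'ff01::1101'
loopback_full = ipaddress.IPv6Address("::1").exploded

# 3. Embedded IPv4: the last 32 bits are written in dotted decimal.
mapped = ipaddress.IPv6Address("::FFFF:192.168.0.1")
embedded_v4 = mapped.ipv4_mapped   # IPv4Address('192.168.0.1')

print(short, multicast, loopback_full, embedded_v4)
```

An address with two "::" runs, such as 1::2::3, is rejected by the module with AddressValueError, which matches the rule that "::" may appear only once.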


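For an IPv6 database, I tried the free data from Ip2Location (a provider outside China), whose accuracy is mediocre, and eventually found ZX Inc.'s IPDB, whose coverage is largely sufficient for everyday study and research.

Data Analysis

Chunzhen database (IPv4)

The Chunzhen installer ships with an extraction tool that converts qqwry.dat to txt. The converted data looks like this: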

0.0.0.0         0.255.255.255   IANA 保留地址
1.0.0.0         1.0.0.0         美国 亚太互联网络信息中心(CloudFlare节点)
1.0.0.1         1.0.0.1         美国 APNIC&CloudFlare公共DNS服务器
1.0.0.2         1.0.0.255       美国 亚太互联网络信息中心(CloudFlare节点)
1.0.1.0         1.0.3.255       福建省 电信
1.0.4.0         1.0.7.255       澳大利亚 墨尔本Goldenit有限公司

The first column is the start IP, the second the end IP, the third the region, and the fourth the carrier (ISP) information.
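
As a rough illustration of that four-column layout, one exported line could be split as below. The helper name parse_cz88_line is hypothetical, and the sketch assumes columns are separated by runs of whitespace as in the sample above.

```python
import ipaddress

def parse_cz88_line(line):
    """Split one exported line into (start, end, region, isp).

    Hypothetical helper: assumes whitespace-separated columns, with
    everything after the third column treated as the carrier field.
    """
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None  # blank or malformed line
    start = int(ipaddress.IPv4Address(parts[0]))
    end = int(ipaddress.IPv4Address(parts[1]))
    region = parts[2]
    isp = parts[3].strip() if len(parts) > 3 else ""
    return start, end, region, isp

sample = "1.0.1.0         1.0.3.255       福建省 电信"
print(parse_cz88_line(sample))  # (16777472, 16778239, '福建省', '电信')
```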

IPDB(IPv6)

ZX Inc.'s IPDB takes more work: it ships no extraction tool, so the data format has to be worked out by hand. I found Rhilip's project on GitHub and modified it:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Copyright (c) 2017-2020 Rhilip <rhilipruan@gmail.com>
import re
import os

dir = os.path.dirname(__file__)

v4db_path = os.path.join(dir, 'db/qqwry.dat')
v6db_path = os.path.join(dir, 'db/ipv6wry.db')

v6ptn = re.compile(r'^[0-9a-f:.]{3,51}$')
v4ptn = re.compile(r'.*((25[0-5]|2[0-4]\d|[0-1]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[0-1]?\d\d?)$')


def parseIpv4(ip):
    sep = ip.rfind(':')
    if sep >= 0:
        ip = ip[sep + 1:]
    if v4ptn.match(ip) is None:
        return -1
    v4 = 0
    for sub in ip.split('.'):
        v4 = v4 * 0x100 + int(sub)
    return v4


def parseIpv6(ip):
    if v6ptn.match(ip) is None:
        return -1
    count = ip.count(':')
    if count >= 8 or count < 2:
        return -1
    ip = ip.replace('::', '::::::::'[0:8 - count + 1], 1)
    if ip.count(':') < 6:
        return -1
    v6 = 0
    for sub in ip.split(':')[0:4]:
        if len(sub) > 4:
            return -1
        if len(sub) == 0:
            v6 = v6 * 0x10000
        else:
            v6 = v6 * 0x10000 + int(sub, 16)
    return v6


def parseIp(ip):
    ip = ip.strip()
    ip = ip.replace('*', '0')
    v4 = parseIpv4(ip)
    v6 = parseIpv6(ip)
    v2002 = v6 >> (3 * 16)
    if v2002 == 0x2002:
        v4 = (v6 >> 16) & 0xffffffff
    v2001 = v6 >> (2 * 16)
    if v2001 == 0x20010000:
        v4 = ~int(''.join(ip.split(':')[-2:]), 16)
        v4 = int(bin(((1 << 32) - 1) & v4)[2:], 2)
    return v4, v6


class IpDb(object):
    except_raw = 0x19
    osLen = ipLen = dLen = dbAddr = size = None

    def __init__(self, db_path):
        with open(db_path, 'rb') as f:
            db = f.read()
            self.db = db

        if db[0:4] != 'IPDB'.encode():
            self.type = 4
            self._init_v4db()
        else:
            self.type = 6
            self._init_v6db()

    def _init_v4db(self):
        self.osLen = 3
        self.ipLen = 4
        self.dLen = self.osLen + self.ipLen
        self.dbAddr = int.from_bytes(self.db[0:4], byteorder='little')
        endAddr = int.from_bytes(self.db[4:8], byteorder='little')
        self.size = (endAddr - self.dbAddr) // self.dLen

    def _init_v6db(self):
        self.osLen = self.db[6]  # 3
        self.ipLen = self.db[7]  # 8
        self.dLen = self.osLen + self.ipLen
        self.dbAddr = int.from_bytes(self.db[0x10: 0x18], byteorder='little')  # 50434
        self.size = int.from_bytes(self.db[8:0x10], byteorder='little')  # 140045
        

    def getSize(self):
        return self.size

    def getData(self, index):
        self.checkIndex(index)
        addr = self.dbAddr + index * self.dLen
        ip = int.from_bytes(self.db[addr: addr + self.ipLen], byteorder='little')
        return ip

    def checkIndex(self, index):
        if index < 0 or index >= self.getSize():
            raise IndexError('record index out of range: %d' % index)

    def getLoc(self, index):
        self.checkIndex(index)
        addr = self.dbAddr + index * self.dLen
        # ip = int.from_bytes(self.db[addr: addr + self.ipLen],
        # byteorder='little')
        lAddr = int.from_bytes(self.db[addr + self.ipLen: addr + self.dLen], byteorder='little')
        # print('ip_addr: %d ip: %d lAddr:%d' % (addr, ip, lAddr))
        if self.type == 4:
            lAddr += 4
        loc = self.readLoc(lAddr, True)
        if self.type == 4:
            loc = loc.decode('cp936')
            loc = loc.replace('CZ88.NET', '')
        if self.type == 6:
            loc = loc.decode('utf-8')
        return loc

    def readRawText(self, start):
        # Read a NUL-terminated byte string; always return bytes so callers can decode it.
        if self.type == 4 and start == self.except_raw:
            return b''
        end = self.db.index(b'\x00', start)
        return self.db[start:end]

    def readLoc(self, start, isTwoPart=False):
        jType = self.db[start]
        if jType == 1 or jType == 2:
            start += 1
            offAddr = int.from_bytes(self.db[start:start + self.osLen], byteorder='little')
            if offAddr == 0:
                return b'Unknown address'  # bytes, so getLoc can decode it
            loc = self.readLoc(offAddr, jType == 1)
            nAddr = start + self.osLen
        else:
            loc = self.readRawText(start)
            nAddr = start + len(loc) + 1
        if isTwoPart and jType != 1:
            partTwo = self.readLoc(nAddr)
            if loc and partTwo:
                loc += b' ' + partTwo
        return loc

    def searchIp(self, val):
        index = self.binarySearch(val)
        if index < 0:
            return "Unknown address"
        if index > self.getSize() - 2:
            index = self.getSize() - 2
        return self.getLoc(index)

    def binarySearch(self, key, lo=0, hi=None):
        if hi is None:
            hi = self.getSize() - 1
        while lo <= hi:
            if hi - lo <= 1:
                if self.getData(lo) > key:
                    return -1
                elif self.getData(hi) <= key:
                    return hi
                else:
                    return lo
            mid = (lo + hi) // 2
            data = self.getData(mid)
            if data is not None and data > key:
                hi = mid - 1
            elif data is not None and data < key:
                lo = mid
            else:
                return mid
        return -1


class IpQuery(object):
    def __init__(self):
        self.v6db = IpDb(v6db_path)
        self.v4db = IpDb(v4db_path)

    def searchIp(self, ip):
        ret = ''
        err = None
        try:
            v4, v6 = parseIp(ip)
            # print('v4: %d v6: %d' % (v4, v6))
            if v6 >= 0:
                ret += self.v6db.searchIp(v6)
            if v4 >= 0:
                if ret != '':
                    ret += ' > '
                ret += self.v4db.searchIp(v4)
        except Exception as e:
            err = "Internal server error"
        return {
            "ip": ip,
            "loc": ret if ret else None,
            "stats": err or ("Can't Format IP address." if ret == '' else "Success")
        }


if __name__ == '__main__':
    # ipquery = IpQuery()
    # ip = '2001:250:230::'
    # ip = '42.156.139.1'
    # ip = '182.117.109.0'
    # ip = '114.242.248.*'
    # ip = None
    # result = ipquery.searchIp(ip)
    v6db = IpDb(v6db_path)
    # Export every range as "start,end,location"; a range ends one below the next range's start.
    with open('ipv6.csv', 'w', encoding='utf-8') as fs:
        for i in range(v6db.size - 1):
            start = v6db.getData(i)
            end = v6db.getData(i + 1) - 1
            fs.write('%d,%d,%s\n' % (start, end, v6db.getLoc(i)))
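
The header fields that _init_v6db reads can be summarised in a small standalone sketch. The function name parse_ipdb_header is hypothetical, the layout is only what the script above relies on (bytes 4 and 5 are not interpreted there and are left alone here), and the header below is synthetic, built from the values noted in the script's comments.

```python
def parse_ipdb_header(data):
    """Read the IPDB header fields used by the script above (hypothetical helper)."""
    if data[0:4] != b"IPDB":
        raise ValueError("not an IPDB file")
    return {
        "offset_len": data[6],                                      # bytes per location offset
        "ip_len": data[7],                                          # bytes per stored IP prefix
        "record_count": int.from_bytes(data[8:0x10], "little"),     # number of index records
        "index_start": int.from_bytes(data[0x10:0x18], "little"),   # file offset of the index
    }

# Synthetic header for illustration only; a real file is read with open(path, "rb").
header = (b"IPDB" + bytes([0, 0, 3, 8])
          + (140045).to_bytes(8, "little")
          + (50434).to_bytes(8, "little"))
info = parse_ipdb_header(header)
print(info)
```

Each index record is then ip_len + offset_len bytes long, which is the dLen value the class uses to step through the index.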

The exported data looks like this:

0,28428538856079359,IANA保留地址
28428538856079360,28428538856079360,IANA特殊地址 包含v4地址的v6地址
28428538856079361,28428538856144895,IANA保留地址
28428538856144896,28428538856210431,IANA特殊地址 包含v4地址的v6地址
28428538856210432,72057594037927935,IANA保留地址
72057594037927936,72057594037927936,IANA特殊地址 仅用于丢弃的地址
72057594037927937,2306124484190404607,IANA保留地址

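The first and second fields are the range start and end, each computed from the first four groups (the high 64 bits) of the IPv6 address; the third field is the location.

Lookup Code

To speed up loading and keep the code uniform, the IPv4 database is converted into the same format as the IPv6 database. The conversion script: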
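
To make the numbers in the export concrete, the index value of an address can be reproduced by keeping only its high 64 bits. This is a minimal sketch using the standard ipaddress module instead of the hand-written parser; the function name ipv6_prefix_index is illustrative.

```python
import ipaddress

def ipv6_prefix_index(ip):
    """Return the first four groups (high 64 bits) of an IPv6 address as an integer."""
    return int(ipaddress.IPv6Address(ip)) >> 64

print(ipv6_prefix_index("64:ff9b::"))  # 28428538856079360
print(ipv6_prefix_index("100::"))      # 72057594037927936
```

These two values are the range starts that appear in the second and sixth rows of the exported sample.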

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import ipaddress

with open('ip.txt', 'r') as fr, open('ipv4.txt', 'w', encoding='utf-8') as fw:
    for line in fr:
        # Drop the CZ88.NET placeholder and split on runs of whitespace.
        larr = line.replace('CZ88.NET', '').split()
        if len(larr) < 3:
            continue  # skip blank or malformed lines
        start = int(ipaddress.IPv4Address(larr[0]))
        end = int(ipaddress.IPv4Address(larr[1]))
        # Region and carrier are concatenated without a separator.
        address = ''.join(larr[2:])
        fw.write('%d,%d,%s\n' % (start, end, address))
print('over')

Lookups use binary search:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Numerics;
using System.Text;

namespace Trail.Common
{
    /// <summary>
    /// IP address database lookup tool.
    /// </summary>
    public class IPLocationTool
    {
        private const string IPv4Path = "ipv4.txt";
        // The export script above writes ipv6.csv; rename that file or adjust this path.
        private const string IPv6Path = "ipv6.txt";
        private const string UnKnowIP = "Unknown address";
        private static IPv4LocInfo[] _IPv4Infos = null;
        private static IPv6LocInfo[] _IPv6Infos = null;

        /// <summary>
        /// Loads the address database files.
        /// </summary>
        public static void Load()
        {
            // IPv4
            using (var sr = new StreamReader(IPv4Path, Encoding.UTF8))
            {
                string line;
                var ipv4LocInfos = new List<IPv4LocInfo>();
                while (!string.IsNullOrEmpty(line = sr.ReadLine()))
                {
                    var lineArr = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                    IPv4LocInfo ipv4LocInfo = new IPv4LocInfo()
                    {
                        Start = Convert.ToUInt32(lineArr[0]),
                        End = Convert.ToUInt32(lineArr[1]),
                        Address = lineArr[2]
                    };
                    ipv4LocInfos.Add(ipv4LocInfo);
                }
                _IPv4Infos = ipv4LocInfos.ToArray();
            }

            using (var sr = new StreamReader(IPv6Path, Encoding.UTF8))
            {
                string line;
                var ipv6LocInfos = new List<IPv6LocInfo>();
                while (!string.IsNullOrEmpty(line = sr.ReadLine()))
                {
                    var lineArr = line.Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries);
                    IPv6LocInfo ipv6LocInfo = new IPv6LocInfo()
                    {
                        Start = BigInteger.Parse(lineArr[0]),
                        End = BigInteger.Parse(lineArr[1]),
                        Address = lineArr[2]
                    };
                    ipv6LocInfos.Add(ipv6LocInfo);
                }
                _IPv6Infos = ipv6LocInfos.ToArray();
            }
        }

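        /// <summary>
        /// Looks up an IP address by binary search.
        /// </summary>
        /// <param name="ip">IP address.</param>
        /// <returns>Location.</returns>
        public static string BinSearch(string ip)
        {
            // IPv6
            if (ip.Contains(":"))
            {
                var ipNum = IPv6ToIndex(ip);
                // Last valid index; starting at Length can read past the end of the array.
                int high = _IPv6Infos.Length - 1;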
                for (int low = 0; low <= high;)
                {
                    var point_index = (high + low) / 2;
                    if (ipNum < _IPv6Infos[point_index].Start)
                    {
                        high = point_index - 1;
                        continue;
                    }
                    else if (ipNum > _IPv6Infos[point_index].End)
                    {
                        low = point_index + 1;
                        continue;
                    }
                    return _IPv6Infos[point_index].Address;
                }
            }
            // IPv4
            else
            {
                // Convert to a number.
                var ipNum = IPv4ToNumber(ip);
                // Last valid index; starting at Length can read past the end of the array.
                int high = _IPv4Infos.Length - 1;
                for (int low = 0; low <= high;)
                {
                    var point_index = (high + low) / 2;
                    if (ipNum < _IPv4Infos[point_index].Start)
                    {
                        high = point_index - 1;
                        continue;
                    }
                    else if (ipNum > _IPv4Infos[point_index].End)
                    {
                        low = point_index + 1;
                        continue;
                    }
                    return _IPv4Infos[point_index].Address;
                }
            }
            return UnKnowIP;
        }

        /// <summary>
        /// Converts an IPv4 address to a number.
        /// </summary>
        /// <param name="ip">IPv4 address.</param>
        /// <returns>Numeric value.</returns>
        public static long IPv4ToNumber(string ip)
        {
            var ipArr = ip.Split(new char[] { '.' });
            return long.Parse(ipArr[0]) * 16777216 + long.Parse(ipArr[1]) * 65536 + long.Parse(ipArr[2]) * 256 + long.Parse(ipArr[3]);
        }

        /// <summary>
        /// Converts an IPv6 address to a number.
        /// </summary>
        /// <param name="ip">IPv6 address.</param>
        /// <returns>Numeric value.</returns>
        private static BigInteger IPv6ToNumber(string ip)
        {
            IPAddress address;
            BigInteger ipnum;
            if (IPAddress.TryParse(ip, out address))
            {
                byte[] addrBytes = address.GetAddressBytes();

                if (BitConverter.IsLittleEndian)
                {
                    List<byte> byteList = new List<byte>(addrBytes);
                    byteList.Reverse();
                    addrBytes = byteList.ToArray();
                }

                if (addrBytes.Length > 8)
                {
                    //IPv6
                    ipnum = BitConverter.ToUInt64(addrBytes, 8);
                    ipnum <<= 64;
                    ipnum += BitConverter.ToUInt64(addrBytes, 0);
                }
                else
                {
                    //IPv4
                    ipnum = BitConverter.ToUInt32(addrBytes, 0);
                }
                return ipnum;
            }
            return 0;
        }

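        /// <summary>
        /// Converts an IPv6 address to its index value (the database is indexed by the first four groups, i.e. the high 64 bits).
        /// </summary>
        /// <param name="ip">IPv6 address.</param>
        /// <returns>Numeric value.</returns>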
        private static BigInteger IPv6ToIndex(string ip)
        {
            // Expand "::" into the colons it stands for.
            int count = ip.ToCharArray().Count(p => p.Equals(':'));
            ip = ip.Replace("::", ":::::::".Substring(0, 8 - count + 1));
            if (ip.ToCharArray().Count(p => p.Equals(':')) < 6)
                return -1;
            BigInteger v6 = 0;
            var ipArr = ip.Split(new string[] { ":" }, StringSplitOptions.None);

            for (int i = 0; i < 4; i++)
            {
                if (string.IsNullOrEmpty(ipArr[i]))
                    v6 = v6 * 0x10000;
                else
                {
                    v6 = v6 * 0x10000 + Int64.Parse(ipArr[i], System.Globalization.NumberStyles.HexNumber);
                }
            }
            return v6;
        }
    }

    /// <summary>
    /// IPv4 address range information.
    /// </summary>
    public class IPv4LocInfo
    {
        /// <summary>
        /// Start of the range.
        /// </summary>
        public uint Start { get; set; }

        /// <summary>
        /// End of the range.
        /// </summary>
        public uint End { get; set; }

        /// <summary>
        /// Location.
        /// </summary>
        public string Address { get; set; }
    }

    /// <summary>
    /// IPv6 address range information.
    /// </summary>
    public class IPv6LocInfo
    {
        /// <summary>
        /// Start of the range.
        /// </summary>
        public BigInteger Start { get; set; }

        /// <summary>
        /// End of the range.
        /// </summary>
        public BigInteger End { get; set; }

        /// <summary>
        /// Location.
        /// </summary>
        public string Address { get; set; }
    }
}
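
For comparison, the same range lookup can be sketched in Python with the standard bisect module. This is not the author's implementation: the table below holds three rows taken from the Chunzhen sample earlier, the English labels are illustrative, and the names ROWS, RANGES, STARTS and lookup are hypothetical.

```python
import bisect
import ipaddress

# Illustrative rows (start IP, end IP, label), sorted by start and non-overlapping.
ROWS = [
    ("0.0.0.0", "0.255.255.255", "IANA reserved"),
    ("1.0.1.0", "1.0.3.255", "Fujian Telecom"),
    ("1.0.4.0", "1.0.7.255", "Australia"),
]
RANGES = [(int(ipaddress.IPv4Address(s)), int(ipaddress.IPv4Address(e)), loc)
          for s, e, loc in ROWS]
STARTS = [r[0] for r in RANGES]

def lookup(ip, default="Unknown address"):
    """Find the range containing ip using binary search over the range starts."""
    key = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(STARTS, key) - 1   # last range whose start <= key
    if i < 0:
        return default
    start, end, loc = RANGES[i]
    return loc if start <= key <= end else default

print(lookup("1.0.2.3"))   # Fujian Telecom
print(lookup("1.0.0.5"))   # Unknown address (falls in a gap between ranges)
```

Searching only the start values and then checking the end bound keeps the loop free of index arithmetic, which is where off-by-one errors in hand-written binary searches tend to appear.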

Test code:

    /// <summary>
    /// Test entry point.
    /// </summary>
    /// <param name="args">Command-line arguments.</param>
    static void Main(string[] args)
    {
        try
        {
            IPLocationTool.Load();
            var beginTime = DateTime.Now;
            // IPv4 tests; trailing comments show the location strings returned by the database.
            Console.WriteLine(IPLocationTool.BinSearch("61.152.197.155"));  //上海市网友
            Console.WriteLine(IPLocationTool.BinSearch("211.143.205.140")); //福建省漳州市
            Console.WriteLine(IPLocationTool.BinSearch("218.57.116.146")); //山东省青岛市
            Console.WriteLine(IPLocationTool.BinSearch("121.35.180.254")); //广东省深圳市
            Console.WriteLine(IPLocationTool.BinSearch("112.13.166.125")); //浙江省丽水市
            Console.WriteLine(IPLocationTool.BinSearch("61.181.236.137")); //天津市宝坻区
            Console.WriteLine(IPLocationTool.BinSearch("1.65.212.143")); //香港
            // IPv6 tests
            Console.WriteLine(IPLocationTool.BinSearch("2409:8a00::"));  //中国北京市东城区
            Console.WriteLine(IPLocationTool.BinSearch("2408:8410:47ff:ffff:1155:658:1254:632"));  //中国天津市红桥区
            Console.WriteLine(IPLocationTool.BinSearch("2409:8a0c:1200::"));  //中国山西省太原市娄烦县
            Console.WriteLine(IPLocationTool.BinSearch("2409:8a15:9400::"));  //中国辽宁省辽阳市灯塔市
            Console.WriteLine("Done, elapsed {0} ms", (DateTime.Now - beginTime).TotalMilliseconds);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message + "\n" + ex.StackTrace);
        }

        Console.WriteLine("Over");
        Console.Read();
    }
Original article: https://www.cnblogs.com/krockey/p/10983437.html