Using headless Chrome to fetch puzzle boards from the puzzle team club games

0. Which tool to use for scraping the site

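  For the previous two games I read each board off the screen and typed it in by hand, which made solving inconvenient. Large boards such as the special daily battle are especially slow to copy cell by cell. Is there a way to grab the board quickly and write it straight to a file? Yes: scrape the web page.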

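  However, puzzle team club's anti-scraping measures are quite good: the raw page HTML contains no board data, and I could not work out the board from the captured network traffic (admittedly, there were simply too many packets). That leaves a headless browser.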

  I started with PhantomJS. The Python code that fetches the page source is:

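from selenium import webdriver

def getChessByPhantomJS():
    # webdriver.PhantomJS exists only in older Selenium releases
    driver = webdriver.PhantomJS()
    driver.get('https://www.puzzle-dominosa.com/?size=8')
    source = driver.page_source
    driver.quit()
    return source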

  The result was disappointing: all I got back was the bare page template, with no board in it.

  How does Chrome do? (If you do not know how to set up headless Chrome, a quick web search will help.)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two arguments are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
    try:
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        print(e)
    source = driver.page_source
    driver.quit()
    return source

  The output (or rather the output so far, because the program never exits):

DevTools listening on ws://127.0.0.1:62344/devtools/browser/8c9f8f4a-407a-4045-b41c-b9f898d4d37b
[1203/174652.884:INFO:CONSOLE(1)] "Uncaught TypeError: window.googletag.pubads is not a function", source: https://www.puzzle-dominosa.com/build/js/public/new/dominosa-95ac3646ef.js (1)

  Setting a page-load timeout makes the program stop waiting and exit:

def getChessByChrome():
    path = r'D:chromedriver.exe'
    chrome_options = Options()
    # The next two arguments are the usual boilerplate for headless mode
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path=path,chrome_options=chrome_options)
    try:
        driver.set_page_load_timeout(30)
        driver.get('https://www.puzzle-dominosa.com/?size=8')
    except Exception as e:
        print(e)
    source = driver.page_source
    driver.quit()
    return source

  The page source can now be handed to a parsing function that writes out the board.
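
  For readers on a newer Selenium release, the same idea might be sketched as follows. This is not the code used for this post: it assumes Selenium 4, where `executable_path` and `chrome_options` gave way to `Service` and `options`, and the names `get_page_source` and `HEADLESS_ARGS` are made up for illustration.

```python
# Sketch for Selenium 4 (an assumption; the post itself uses the older keyword arguments).
HEADLESS_ARGS = ['--headless', '--disable-gpu']


def get_page_source(url, timeout=30, driver_path=None):
    """Load url in headless Chrome and return the HTML present after at most `timeout` seconds."""
    # Imported here so this module can be loaded even where selenium is not installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    # With driver_path=None, Selenium is left to locate chromedriver by itself.
    service = Service(executable_path=driver_path) if driver_path else Service()
    driver = webdriver.Chrome(service=service, options=options)
    try:
        driver.set_page_load_timeout(timeout)
        try:
            driver.get(url)
        except TimeoutException as e:
            # The page never finishes loading; the board may already be rendered.
            print(e)
        return driver.page_source
    finally:
        # Always close the browser, even if something above fails.
        driver.quit()
```

  Calling `get_page_source('https://www.puzzle-dominosa.com/?size=8')` would then play the role of `getChessByChrome()`.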

1. How to get the dominosa board

  We are not done yet, though: what we want is the board itself. For that we need to look at the page markup:

 Figure 1. Markup of a dominosa board

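  See it? The board is encoded directly in the class names: cell3 stands for the number 3. The cells are sibling elements, and the board wraps to a new row once a full row of them has been laid out.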
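
  As a small illustration of that mapping, the sketch below turns a flat list of class strings into rows. The sample class strings (`cell cell4` and so on) and the name `classes_to_rows` are assumptions made for the example; the real page may carry extra class tokens.

```python
def classes_to_rows(class_names, row_length):
    """Extract the number from each 'cell cellN' class string and group the values into rows."""
    values = []
    for name in class_names:
        token = name.split(' ')[1]      # second token, e.g. 'cell3'
        values.append(token[4:])        # drop the 'cell' prefix, leaving '3'
    return [values[i:i + row_length] for i in range(0, len(values), row_length)]


# Hypothetical class strings for a 2 x 3 corner of a board.
sample = ['cell cell4', 'cell cell5', 'cell cell2',
          'cell cell2', 'cell cell7', 'cell cell5']
rows = classes_to_rows(sample, 3)
print('\n'.join(' '.join(row) for row in rows))
```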

  The code can be written like this:

from lxml import etree

def solve():
    source = getChessByChrome()
    htree = etree.HTML(source)
    # One element per cell of the board
    cells = htree.xpath('//div[@id="game"]/div/div/div/..')
    chessSize = len(cells)
    # Regular puzzles show their ID inside a <span>; otherwise fall back to the <p> text
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    # The board has x rows and x+1 columns, so chessSize == x * (x + 1)
    x = (round((4 * chessSize + 1)**0.5) - 1) // 2
    print(x)
    print(x+1)
    chess = ''
    for i,className in enumerate(cells):
        # The second class token is e.g. 'cell3'; keep what follows 'cell'
        value = className.xpath('./@class')[0].split(' ')[1][4:]
        if i % (x+1) == x:
            chess += value + '\n'
        else:
            chess += value + ' '
    with open('dominosaChess' + puzzleId + '.txt','w') as f:f.write(chess[:-1])

  This yields the board file expected by the post "Solving the dominosa game with Dancing Links X".
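
  The row length used in `solve()` follows from the board shape. A board with x rows and x+1 columns has N = x(x+1) cells, so 4N + 1 = (2x+1)^2 and x = (sqrt(4N+1) - 1) / 2. The sketch below does the same computation with integer arithmetic and rejects cell counts that do not fit; `grid_shape` is a name chosen for illustration.

```python
import math


def grid_shape(cell_count):
    """Return (rows, columns) for a board with rows * (rows + 1) == cell_count."""
    root = math.isqrt(4 * cell_count + 1)   # equals 2 * rows + 1 for a valid board
    rows = (root - 1) // 2
    if rows * (rows + 1) != cell_count:
        raise ValueError('cell count %d is not of the form x * (x + 1)' % cell_count)
    return rows, rows + 1


print(grid_shape(72))   # 72 cells, as in an 8 x 9 board
```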

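  Incidentally, to make boards easy to look up, the output file name includes the puzzle ID; for a special puzzle it includes the special puzzle's title instead.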

  Here is a sample result to compare with the board (file name dominosaChess7,092,762.txt):

4 5 2 2 7 3 3 0 6
2 7 5 6 2 6 4 1 5
4 4 5 6 0 2 6 0 2
7 3 3 5 0 0 3 4 4
0 1 3 3 4 1 3 2 1
5 7 0 5 3 2 1 1 6
1 6 6 7 5 2 6 7 1
7 4 0 0 4 5 1 7 7

  The corresponding board screenshot:

 Figure 2. The board with ID 7,092,762

2. How to get the star battle board

  Producing the board file expected by the post "Solving the star battle game with depth-first search (DFS)" takes a bit more work.

  Take a look at the figure below:

 Figure 3. Markup of a star battle board

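  The class names in this markup each carry a meaning; for example, bl means there is a dividing line on the left and br means there is one on the right.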
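
  A minimal sketch of reading those flags from one cell's class attribute is shown here. The sample class string and the name `border_flags` are assumptions for illustration; the four tokens are the ones the code in this post looks for. Splitting on whitespace compares whole tokens, which avoids accidental substring matches.

```python
BORDER_TOKENS = ('bt', 'bb', 'bl', 'br')   # top, bottom, left, right


def border_flags(class_attr):
    """Return the set of border tokens found in a cell's class attribute."""
    return {token for token in class_attr.split() if token in BORDER_TOKENS}


# Hypothetical class attribute of a cell with lines on its left and top sides.
print(sorted(border_flags('cell selectable bl bt')))
```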

  The page only gives us the dividing lines, whereas we need a layout that tells which block each cell belongs to. To get that we use BFS, breadth-first search.
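
  The flood fill on its own can be sketched like this, independent of the page parsing. It is a simplified illustration rather than the code used for this post: `label_blocks` and the 2 x 2 sample are made up, the input is assumed to be a square grid of border-token sets, and as a precaution the sketch treats a line on either side of a shared edge as a separator.

```python
from collections import deque

# (row delta, column delta, border on this cell, matching border on the neighbour)
MOVES = [(-1, 0, 'bt', 'bb'), (1, 0, 'bb', 'bt'), (0, -1, 'bl', 'br'), (0, 1, 'br', 'bl')]


def label_blocks(borders):
    """Flood-fill a square grid of border-token sets into block numbers."""
    n = len(borders)
    labels = [[-1] * n for _ in range(n)]
    next_label = 0
    for i in range(n):
        for j in range(n):
            if labels[i][j] != -1:
                continue                      # already assigned to a block
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                x, y = queue.popleft()
                for dx, dy, here, there in MOVES:
                    nx, ny = x + dx, y + dy
                    if not (0 <= nx < n and 0 <= ny < n) or labels[nx][ny] != -1:
                        continue
                    # A line on either side of the shared edge separates the cells.
                    if here in borders[x][y] or there in borders[nx][ny]:
                        continue
                    labels[nx][ny] = next_label
                    queue.append((nx, ny))
            next_label += 1
    return labels


# Hypothetical 2 x 2 board: a vertical line splits the left column from the right one.
sample = [[{'br'}, {'bl'}],
          [{'br'}, {'bl'}]]
print(label_blocks(sample))
```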

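from copy import deepcopy
from lxml import etree

def solve():
    # url (the puzzle page address) is a variable defined outside this snippet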
    if url.find('size=') == -1:
        limit = 1
    else:
        size = url.split('size=')[1]
        size = int(size)
        if size >= 1 and size <= 4:
            limit = 1
        elif size <= 6:
            limit = 2
        elif size <= 8:
            limit = 3
        else:
            limit = size - 5
    # getChessByFile() is not shown in this post; it has to return the page HTML as a string
    source = getChessByFile()
    htree = etree.HTML(source)
    chessSize = len(htree.xpath('//div[@id="game"]/div/div'))
    puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/span/text()')
    if len(puzzleId) != 0:
        puzzleId = puzzleId[0]
    else:
        puzzleId = htree.xpath('//div[@class="puzzleInfo"]/p/text()')[0]
    chessSize = round(chessSize**0.5)
    chess = [[-1 for _ in range(chessSize)] for __ in range(chessSize)]
    borderss = [['' for _ in range(chessSize)] for __ in range(chessSize)]
    chessStr = ''
    maxBlockNumber = 0
    # br: line on the right; bl: on the left; bb: on the bottom; bt: on the top
    for i,className in enumerate(htree.xpath('//div[@id="game"]/div/div[contains(@class,"cell")]')):
        x = i // chessSize
        y = i % chessSize
        value = className.xpath('./@class')[0]
        if value[:4] != 'cell':
            continue
        value = value.replace('cell selectable','')
        value = value.replace('cell-off','')
        borderss[x][y] = value
    for i in range(chessSize):
        for j in range(chessSize):
            if chess[i][j] != -1:
                continue
            # Breadth-first flood fill of the block that contains (i, j)
            queue = [(i, j)]
            chess[i][j] = str(maxBlockNumber)
            while len(queue) > 0:
                oldQueue = deepcopy(queue)
                queue = []
                for pos in oldQueue:
                    x, y = pos[0], pos[1]
                    # up: no line on top of this cell and the neighbour is unvisited
                    if x > 0 and borderss[x][y].find('bt') == -1 and chess[x-1][y] == -1:
                        queue.append((x-1, y))
                        chess[x-1][y] = chess[i][j]
                    # down
                    if x < chessSize - 1 and borderss[x][y].find('bb') == -1 and chess[x+1][y] == -1:
                        queue.append((x+1, y))
                        chess[x+1][y] = chess[i][j]
                    # left
                    if y > 0 and borderss[x][y].find('bl') == -1 and chess[x][y-1] == -1:
                        queue.append((x, y-1))
                        chess[x][y-1] = chess[i][j]
                    # right
                    if y < chessSize - 1 and borderss[x][y].find('br') == -1 and chess[x][y+1] == -1:
                        queue.append((x, y+1))
                        chess[x][y+1] = chess[i][j]
            maxBlockNumber += 1
    chessStr = '\n'.join(' '.join(chessRow) for chessRow in chess)
    with open('starBattleChess' + puzzleId + '.txt','w') as f:f.write(str(limit)+'\n'+chessStr)

  Here is a sample result to compare with the board (file name starBattleChess3,876,706.txt):

1
0 0 1 1 2
0 0 3 1 2
0 0 3 4 4
0 3 3 4 4
0 3 3 4 4

  The corresponding board screenshot:

 Figure 4. The board with ID 3,876,706
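
  For a quick sanity check of a generated star battle file, a sketch like the following reads the format back: the first line is the limit and the remaining lines are rows of block numbers. `parse_star_battle` is an illustrative name, and the check that an n x n board has exactly n blocks is an assumption based on the usual star battle rules.

```python
def parse_star_battle(text):
    """Split the file content into (limit, grid of block numbers)."""
    lines = text.strip().split('\n')
    limit = int(lines[0])
    grid = [[int(v) for v in line.split()] for line in lines[1:]]
    return limit, grid


# Content of the sample file shown in this post.
content = """1
0 0 1 1 2
0 0 3 1 2
0 0 3 4 4
0 3 3 4 4
0 3 3 4 4"""
limit, grid = parse_star_battle(content)
blocks = {v for row in grid for v in row}
# Assumed invariant: an n x n board is divided into n blocks.
assert len(blocks) == len(grid) == len(grid[0])
print(limit, len(grid), sorted(blocks))
```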

Original post: https://www.cnblogs.com/dgutfly/p/11978537.html