第十四篇 Mongodb数据库

Mongodb

 SQL
    - 结构化查询语言
    - 关系数据库全都同SQL来操作
MongoDB
    - MongoDB是一个NoSQL的数据库
    - MongoDB是一款文档型数据库
    - 数据库指的就是一个存储数据的仓库
        数据库可以使我们完成对数据的持久化的操作
    - MongoDB数据库中存储的数据的基本单位就是文档,
        MongoDB中存储的就是文档,所谓文档其实就是一个“JSON”
    - MongoDB中的“JSON”我们称为BSON,比普通的JSON的功能要更加的强大
    - MongoDB数据库使用的是JavaScript进行操作的,在MongoDB含有一个对ES标准实现的引擎,
        在MongoDB中所有ES中的语法中都可以使用
    - 安装
                 - 配置环境变量
            C:Program FilesMongoDBServer3.2in
                 - 在c盘根目录
           - 创建一个文件夹 data
           - 在data中创建一个文件夹db
        
    - MongoDB的基本的指令
        - 启动服务器
            mongod --dbpath 路径 --port 端口号
            
        - 启动客户端
            mongo
    - 基本概念
        数据库(database)
        集合(collection)
        文档(document)
            - 在MongoDB中,数据库和集合都不需要手动创建,
                当我们创建文档时,如果文档所在的集合或数据库不存在会自动创建数据库和集合
        
        
    - MongoDB的CRUD的操作            
        - 基本操作
            use 数据库
                - 进入指定的数据库
            show dbs
                - 显示所有的数据库
            show collections
                - 显示数据库中所有的集合
            db
                - 显示当前所在的数据库
    
        - 向数据库中插入文档
            - db.collection.insert()
                - insert()可以向集合中插入一个或多个文档
                - 例子:向test数据库中的,stus集合中插入一个新的学生对象
                            {name:"孙悟空",age:18,gender:""}
                           db.stus.insert({name:"孙悟空",age:18,gender:""}
            - db.collection.insertOne()
                - 向集合中插入一个文档
            - db.collection.insertMany()
                - 向集合中插入多个文档
                
        - 查询数据库中的文档
            - db.collection.find()
                - 可以根据指定条件从集合中查询所有符合条件的文档
                - 返回的是一个数组
                 示例: db.stu.find({age:28})
            - db.collection.findOne()
                - 查询第一个符合条件的文档
                - 返回的是一个对象
            - db.collection.find().count()
                - 查询符合条件的文档的数量
                
        - 修改数据库中的文档
            - db.collection.update()
                - 可以修改、替换集合中的一个或多个文档
            - db.collection.updateOne()
                - 修改集合中的一个文档
            - db.collection.updateMany()
                - 修改集合中的多个文档
            - db.collection.replaceOne()
                - 替换集合中的一个文档
                
        - 删除集合中的文档
            - db.collection.remove()
                - 删除集合中的一个或多个文档(默认删除多个)
            - db.collection.deleteOne()
                - 删除集合中的一个文档
            - db.collection.deleteMany()
                - 删除集合中的多个文档
            - 清空一个集合
                db.collection.remove({})
            - 删除一个集合
                db.collection.drop()
            - 删除一个数据库
                db.dropDatabase()
                
示例:

1查询文档
/*
    查询
        db.collection.find()
        - find()用来查询集合中所有符合条件的文档
        - find()可以接收一个对象作为条件参数
            {} 表示查询集合中所有的文档
            {属性:值} 查询属性是指定值的文档
        - find()返回的是一个数组
            
        db.collection.findOne()
        - 用来查询集合中符合条件的第一个文档  
        - findOne()返回的是一个文档对象 
       
       db.collection.find({}).count() 
        - 查询所有结果的数量

       pretty() 方法以格式化的方式来显示所有文档。
               ---db.col.find().pretty()
*/
db.stus.find({_id:"hello"});
db.stus.find({age:16 , name:"白骨精"});
db.stus.find({age:28});
db.stus.findOne({age:28});

db.stus.find({}).count();

2插入文档                
/*
    向数据库插入文档
        db.<collection>.insert()
        - 向集合中插入一个或多个文档
        - 当我们向集合中插入文档时,如果没有给文档指定_id属性,则数据库会自动为文档添加_id
            该属性用来作为文档的唯一标识
        - _id我们可以自己指定,如果我们指定了数据库就不会在添加了,如果自己指定_id 也必须确保它的唯一性
        
    db.collection.insertOne()
        - 插入一个文档对象
    db.collection.insertMany() 
        - 插入多个文档对象
*/
db.stus.insert({name:"猪八戒",age:28,gender:""});

db.stus.insert([
    {name:"沙和尚",age:38,gender:""},
    {name:"白骨精",age:16,gender:""},
    {name:"蜘蛛精",age:14,gender:""}
]);

db.stus.insert({_id:"hello",name:"猪八戒",age:28,gender:""});

db.stus.find();
ObjectId()                


3 修改文档
/*
    修改
     db.collection.update(查询条件,新对象)
        - update()默认情况下会使用新对象来替换旧的对象
        - 如果需要修改指定的属性,而不是替换需要使用“修改操作符”来完成修改
            $set 可以用来修改文档中的指定属性
            $unset 可以用来删除文档的指定属性
        - update()默认只会修改一个
            
        db.collection.updateMany()
        - 同时修改多个符合条件的文档
   
        db.collection.updateOne()
        - 修改一个符合条件的文档    
        
        db.collection.replaceOne()
        - 替换一个文档
*/
db.stus.find({});

//替换
db.stus.update({name:"沙和尚"},{age:28});

db.stus.update(
    {"_id" : ObjectId("59c219689410bc1dbecc0709")},
    {$set:{
        gender:"",
        address:"流沙河"
    }}    
)

db.stus.update(
    {"_id" : ObjectId("59c219689410bc1dbecc0709")},
    {$unset:{
        address:1
    }}    
)

db.stus.updateMany(
    {"name" : "猪八戒"},
    {
        $set:{
            address:"猪老庄"
        }
    }    
);
    
db.stus.update(
    {"name" : "猪八戒"},
    
    {
        $set:{
        address:"呵呵呵"
        }
    }  ,
    {
        multi:true
    }    
)

db.stus.find()


4删除文档
/*
    db.collection.remove()
        - 删除一个或多个,可以第二个参数传递一个true,则只会删除一个
        - 如果传递一个空对象作为参数,则会删除所有的
    db.collection.deleteOne()
    db.collection.deleteMany()
    db.collection.drop() 删除集合
    db.dropDatabase() 删除数据库
    
        - 一般数据库中的数据都不会删除,所以删除的方法很少调用
            一般会在数据中添加一个字段,来表示数据是否被删除

*/ db.stu.deleteOne({name:"齐天大圣"})
db.stu.remove({name:"孙悟空"}, {justOne: false})    
db.stus.insert([
    
    {
        name:"zbj",
        isDel:0
        },
        {
        name:"shs",
        isDel:0
        },
    {
    name:"ts",
        isDel:0
    }

]);

db.stus.updateOne({name:"ts"},{$set:{isDel:1}});
    
db.stus.find({isDel:0})    
                
                
        
        
mongodb.txt
/*
  文档之间的关系
    一对一(one to one)
        - 夫妻 (一个丈夫 对应 一个妻子)
        - 在MongoDB,可以通过内嵌文档的形式来体现出一对一的关系
    
    一对多(one to many)/多对一(many to one)
        - 父母 - 孩子
          用户 - 订单
          文章 - 评论
          - 也可以通过内嵌文档来映射一对多的关系
          
    
    多对多(many to many)
       - 分类 - 商品
         老师 - 学生 
    
*/
db.wifeAndHusband.insert([
    {
        name:"黄蓉",
        husband:{
            name:"郭靖"
        }
    },{
        name:"潘金莲",
        husband:{
            name:"武大郎"
        }
    }

]);

db.wifeAndHusband.find();


//一对多 用户(users) 和 订单(orders)
db.users.insert([{
    username:"swk"
    },{
    username:"zbj"
}]);

db.order.insert({
    
    list:["牛肉","漫画"],
    user_id: ObjectId("59c47e35241d8d36a1d50de0")
    
});

db.users.find()
db.order.find()

//查找用户swk的订单
var user_id = db.users.findOne({username:"zbj"})._id;
db.order.find({user_id:user_id});

//多对多
db.teachers.insert([
    {name:"洪七公"},
    {name:"黄药师"},
    {name:"龟仙人"}
]);

db.stus.insert([
    {
        name:"郭靖",
        tech_ids:[
            ObjectId("59c4806d241d8d36a1d50de4"),
            ObjectId("59c4806d241d8d36a1d50de5")
        ]
    },{
        name:"孙悟空",
        tech_ids:[
            ObjectId("59c4806d241d8d36a1d50de4"),
            ObjectId("59c4806d241d8d36a1d50de5"),
            ObjectId("59c4806d241d8d36a1d50de6")
        ]
    }
])

db.teachers.find()

db.stus.find()
文档间关系

示例:

//1.进入my_test数据库
use my_test

//2.向数据库的user集合中插入一个文档  
db.users.insert({
    username:"sunwukong"
});

//3.查询user集合中的文档
db.users.find();

//4.向数据库的user集合中插入一个文档   
db.users.insert({
    username:"zhubajie"
});
   
//5.查询数据库user集合中的文档
db.users.find();

//6.统计数据库user集合中的文档数量
db.users.find().count();

//7.查询数据库user集合中username为sunwukong的文档
db.users.find({username:"sunwukong"});

//8.向数据库user集合中的username为sunwukong的文档,添加一个address属性,属性值为huaguoshan
db.users.update({username:"sunwukong"},{$set:{address:"huaguoshan"}});


//9.使用{username:"tangseng"} 替换 username 为 zhubajie的文档
db.users.replaceOne({username:"zhubajie"},{username:"tangseng"});    
    
//10.删除username为sunwukong的文档的address属性
db.users.update({username:"sunwukong"},{$unset:{address:1}});


//11.向username为sunwukong的文档中,添加一个hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]}
//MongoDB的文档的属性值也可以是一个文档,当一个文档的属性值是一个文档时,我们称这个文档叫做 内嵌文档
db.users.update({username:"sunwukong"},{$set:{hobby:{cities:["beijing","shanghai","shenzhen"] , movies:["sanguo","hero"]}}});
db.users.find();

//12.向username为tangseng的文档中,添加一个hobby:{movies:["A Chinese Odyssey","King of comedy"]}
db.users.update({username:"tangseng"},{$set:{hobby:{movies:["A Chinese Odyssey","King of comedy"]}}})

//13.查询喜欢电影hero的文档
//MongoDB支持直接通过内嵌文档的属性进行查询,如果要查询内嵌文档则可以通过.的形式来匹配
//如果要通过内嵌文档来对文档进行查询,此时属性名必须使用引号 
db.users.find({'hobby.movies':"hero"});

//14.向tangseng中添加一个新的电影Interstellar
//$push 用于向数组中添加一个新的元素
//$addToSet 向数组中添加一个新元素 , 如果数组中已经存在了该元素,则不会添加
db.users.update({username:"tangseng"},{$push:{"hobby.movies":"Interstellar"}});
db.users.update({username:"tangseng"},{$addToSet:{"hobby.movies":"Interstellar"}});
db.users.find();

//15.删除喜欢beijing的用户
db.users.remove({"hobby.cities":"beijing"});

//16.删除user集合
db.users.remove({});
db.users.drop();

show dbs;

//17.向numbers中插入20000条数据 7.2s
for(var i=1 ; i<=20000 ; i++){
    db.numbers.insert({num:i});
}

db.numbers.find()

db.numbers.remove({});


//0.4s
var arr = [];

for(var i=1 ; i<=20000 ; i++){
    arr.push({num:i});
}

db.numbers.insert(arr);
增删改查
//查询文档时,默认情况是按照_id的值进行排列(升序)
//sort()可以用来指定文档的排序的规则,sort()需要传递一个对象来指定排序规则 1表示升序 -1表示降序
//limit skip sort 可以以任意的顺序进行调用
db.emp.find({}).sort({sal:1,empno:-1});

//在查询时,可以在第二个参数的位置来设置查询结果的 投影(展示所要展示的字段))
db.emp.find({},{ename:1 , _id:0 , sal:1});
sort-投影
//18.查询numbers中num为500的文档
db.numbers.find({num:500})

//19.查询numbers中num大于5000的文档
db.numbers.find({num:{$gt:500}});
db.numbers.find({num:{$eq:500}});

//20.查询numbers中num小于30的文档
db.numbers.find({num:{$lt:30}});

//21.查询numbers中num大于40小于50的文档
db.numbers.find({num:{$gt:40 , $lt:50}});

//22.查询numbers中num大于19996的文档
db.numbers.find({num:{$gt:19996}});

//23.查看numbers集合中的前10条数据
db.numbers.find({num:{$lte:10}});

//limit()设置显示数据的上限
db.numbers.find().limit(10);
//在开发时,我们绝对不会执行不带条件的查询
db.numbers.find();

//24.查看numbers集合中的第11条到20条数据
/*
    分页 每页显示10条
        1-10     0
        11-20    10
        21-30    20
        。。。
        
        skip((页码-1) * 每页显示的条数).limit(每页显示的条数);
        
    skip()用于跳过指定数量的数据    
    
    MongoDB会自动调整skip和limit的位置
*/
db.numbers.find().skip(10).limit(10);

//25.查看numbers集合中的第21条到30条数据
db.numbers.find().skip(20).limit(10);

db.numbers.find().limit(10).skip(10);
条件查询
//26.将dept和emp集合导入到数据库中
db.dept.find()
db.emp.find()

//27.查询工资小于2000的员工
db.emp.find({sal:{$lt:2000}});

//28.查询工资在1000-2000之间的员工
db.emp.find({sal:{$lt:2000 , $gt:1000}});

//29.查询工资小于1000或大于2500的员工
db.emp.find({$or:[{sal:{$lt:1000}} , {sal:{$gt:2500}}]});

//30.查询财务部的所有员工
//(depno)
var depno = db.dept.findOne({dname:"财务部"}).deptno;
db.emp.find({depno:depno});

//31.查询销售部的所有员工
var depno = db.dept.findOne({dname:"销售部"}).deptno;
db.emp.find({depno:depno});

//32.查询所有mgr为7698的所有员工
db.emp.find({mgr:7698})

//33.为所有薪资低于1000的员工增加工资400元
db.emp.updateMany({sal:{$lte:1000}} , {$inc:{sal:400}});
db.emp.find()

pymongo模块

1.连接mongo

# -*- coding:utf-8 -*-
import pymongo  # 导入pymongo模块

def mongodb_init01():
    """连接mongodb"""
    mongo = pymongo.MongoClient(host='127.0.0.1', port=27017, tz_aware=True)


def mongodb_init02():
    """连接mongodb"""
    uri = "mongodb://{}:{}".format('127.0.0.1', 27017)
    mongo = pymongo.MongoClient(uri, tz_aware=True)

2.操作文档

emp1 = {"_id":1,"name":"武大郎","sex":"male","age":18,"hire_date":"20170301","post":"烧饼检察官","salary":7300.33}
emp2 = {"_id":2,"name":"武松","sex":"male","age":78,"hire_date":"20150302","post":"公务员","salary":71000000.31}
emp3 = {"_id":3,"name":"宋江","sex":"male","age":81,"hire_date":"20130305","post":"公务员","salary":78300}
emp4 = {"_id":4,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500}
emp5 = {"_id":5,"name":"林冲","sex":"male","age":73,"hire_date":"20140701","post":"公务员","salary":73500}
emp6 = {"_id":6,"name":"柴进","sex":"male","age":28,"hire_date":"20121101","post":"公务员","salary":72100}
emp7 = {"_id":7,"name":"卢俊义","sex":"female","age":18,"hire_date":"20110211","post":"公务员","salary":79000}
emp8 = {"_id":8,"name":"高俅","sex":"male","age":18,"hire_date":"19000301","post":"公务员","salary":730000}
emp9 = {"_id":9,"name":"鲁智深","sex":"male","age":48,"hire_date":"20101111","post":"公务员","salary":710000}
emp10 = {"_id":10,"name":"史进","sex":"female","age":48,"hire_date":"20150311","post":"打手","salary":73000.13}
emp11 = {"_id":11,"name":"李逵","sex":"female","age":38,"hire_date":"20101101","post":"打手","salary":72000.35}
emp12 = {"_id":12,"name":"周通","sex":"female","age":18,"hire_date":"20110312","post":"打手","salary":71000.37}
emp13 = {"_id":13,"name":"石秀","sex":"female","age":18,"hire_date":"20160513","post":"打手","salary":73000.29}
emp14 = {"_id":14,"name":"李忠","sex":"female","age":28,"hire_date":"20170127","post":"打手","salary":74000.33}
emp15 = {"_id":15,"name":"吴用","sex":"male","age":28,"hire_date":"20160311","post":"文人","salary":710000.13}
emp16 = {"_id":16,"name":"萧让","sex":"male","age":18,"hire_date":"19970312","post":"文人","salary":720000}
emp17 = {"_id":17,"name":"安道全","sex":"female","age":18,"hire_date":"20130311","post":"文人","salary":719000}
emp18 = {"_id":18,"name":"公孙胜","sex":"male","age":18,"hire_date":"20150411","post":"文人","salary":718000}
emp19 = {"_id":19,"name":"朱贵","sex":"female","age":18,"hire_date":"20140512","post":"文人","salary":717000}

db.emp.insertMany([emp1, emp2, emp3, emp4, emp5, emp6, emp7, emp8, emp9, emp10, emp11, emp12, emp13, emp14, emp15, emp16, emp17, emp18, emp19])
emp.txt(测试数据)
# -*- coding:utf-8 -*-
import pymongo  # 导入pymongo模块
from bson import ObjectId
from pymongo import IndexModel
from pymongo.database import Database

mongo = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True)


def handler_db():
    """操作数据库"""
    # 创建数据库
    # db = Database(name='abc', client=mongo)
    db = mongo.abc
    print(db)

    # 删除数据库
    print(mongo.drop_database('abc'))

    # 获取数据库
    for db_name in mongo.list_database_names():
        print(db_name)


def handler_collection():
    """操作集合"""
    # db = mongo.get_database('abc') # 获取数据库,如果没有会自动创建
    db = mongo.abc
    # 创建一个集合
    col = db.create_collection('col')  # 创建集合,如果存在就会报错
    print('创建的集合:', col)
    # 获取一个集合
    # col = db.get_collection('col') # 获取集合,如果没有会自动创建集合
    col = db.col  # 获取集合,如果没有会自动创建集合
    print('获取的集合:', col)
    # 获取所有集合
    collection_names = db.list_collection_names()  # 获取集合名称,返回list
    for name in collection_names:
        print('集合名称:', name)

    collections = db.list_collections()  # 返回所有集合对象
    for collection in collections:
        print('获取集合:', collection)
    # 删除集合
    print(db.drop_collection('col'))  # 删除集合,返回字典


def handler_index():
    """操作索引"""
    # 创建升序索引
    users = mongo.sxt.users
    r = users.create_index([('name', pymongo.ASCENDING)])
    print(r)
    # 创建降序索引
    r = users.create_index([('age', pymongo.DESCENDING)])
    print(r)
    # 创建混合索引
    r = users.create_index([('name', pymongo.ASCENDING), ('status', pymongo.DESCENDING)])
    print(r)
    # 创建唯一索引
    r = users.create_index([('user_id', pymongo.ASCENDING)], unique=True)
    print(r)

    # 删除索引
    users.drop_index('name_1')  # 如果索引不存在,就会报错
    users.drop_indexes()  # 删除所有索引

    # 创建多个索引
    user_id_index = IndexModel([('user_id', pymongo.ASCENDING)], unique=True)
    name_index = IndexModel([('name', pymongo.DESCENDING)])
    r = users.create_indexes([user_id_index, name_index])  # 返回索引名称的list
    print(r)


def handler_document():
    """操作文档"""
    shsxt = Database(name='sxt', client=mongo)  # 获取数据库
    col = shsxt.get_collection(name='col')  # 获取集合
    # 添加
    doc = {'name': 'xiaohuang', 'age': 18, 'sex': '男'}
    r = col.insert_one(doc)
    print(r.acknowledged, r.inserted_id)  # acknowledged是否成功,inserted_id主键id

    doc1 = {'name': '老王', 'age': 20, 'sex': '男'}
    doc2 = {'name': '老李', 'age': 15, 'sex': '女'}
    rs = col.insert_many([doc1, doc2], ordered=True)
    print(rs.acknowledged, rs.inserted_ids)  # acknowledged是否成功,inserted_id主键id
    # 删除
    d = col.delete_one({'name': '老王'})  # 删除一条
    print(d.acknowledged, d.deleted_count)  # acknowledged是否成功,deleted_count删除记录数

    d = col.delete_many({'name': '老李'})  # 删除多条
    print(d.acknowledged, d.deleted_count)  # acknowledged是否成功,deleted_count删除记录数

    # 修改
    upt = col.update_one(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True)
    print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)

    upt = col.update_many(filter={'name': '老李'}, update={'$set': {'name': '小李', 'age': 19}}, upsert=True)
    print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)

    # upt = col.update_one(filter={'_id': ObjectId("5b2d041fa3dbd1ad597db57e")}, update={'$set': {'name': '小李'}})
    # print(upt.acknowledged, upt.matched_count, upt.modified_count, upt.raw_result, upt.upserted_id)


if __name__ == '__main__':
    handler_document()
操作文档
# -*- coding:utf-8 -*-
from collections import OrderedDict

import pymongo

client = pymongo.MongoClient(host='192.168.10.100', port=27017, tz_aware=True)

db = client.get_database('shsxt')
emp = db.get_collection('emp')
user = db.get_collection('user')


def show(document):
    for d in document:
        print(d)


def test_and():
    """and操作"""
    '''
        db.user.find({"$and":[
            {"_id":{"$gte":3,"$lte":4}},
            {"age":{"$gte":4}}
        ]})
    '''
    r = user.find({"$and": [
        {"_id": {"$gte": 3, "$lte": 4}},
        {"age": {"$gte": 4}}
    ]})  # 返回Cursor
    print(r)
    show(r)


def test_or():
    """or操作"""
    '''
        db.user.find({"$or":[
            {"_id":{"$lte":1,"$gte":0}},
            {"_id":{"$gte":4}},
            {"name":"tianqi"}
        ]})
    '''
    r = user.find({"$or": [
        {"_id": {"$lte": 1, "$gte": 0}},
        {"_id": {"$gte": 4}},
        {"name": "tianqi"}
    ]})
    show(r)


def test_reg():
    """正则操作"""
    # 匹配规则:z开头、n或u结尾,不区分大小写
    # db.user.find({'name':/^z.*(n|u)$/i})
    r = user.find({"name": {"$regex": '^z.*(n|u)$'}})
    show(r)

    # db.users.find({name: /a/})
    r = user.find({'name': {'$regex': 'a'}})
    print()
    print('like %a%')
    show(r)
    # db.users.find({name: /^zh/}) // like 'pa%'
    r = user.find({'name': {'$regex': '^zh'}})
    print()
    print('like zh%')
    show(r)
    # db.users.find({name: /qi$/}) // like'%ro'
    r = user.find({'name': {'$regex': 'qi$'}})
    print()
    print('like %qi')
    show(r)


def test_array():
    """数组操作"""
    '''
        #查看所有人的第1个到第3个爱好,第一个{}表示查询条件为所有,第二个是显示条件
        db.user.find(
            {},
            {
                "_id":0,
                "name":0,
                "age":0,
                "addr":0,
                "hobbies":{"$slice":[0,2]},
            }
        )
    '''
    r = user.find({}, {
        "_id": 0,
        "name": 0,
        "age": 0,
        "addr": 0,
        "hobbies": {"$slice": [0, 2]},
    })
    show(r)


def test_page(page=1, page_size=2):
    """分页操作"""
    # 获取结果集
    r = user.find().skip((page - 1) * page_size).limit(page_size)
    show(r)
    # 获取总记录数
    # count = db.user.find().count()
    count = user.count()
    print('总记录数:', count)


def test_project():
    """投影操作"""
    '''
        db.user.find({
                "name":/^z.*(n|u)$/i
            },
            {
                "_id":0,
                "name":1,
                "age":1
            }
        )
    '''
    r = user.find({"name": {"$regex": "^z.*(n|u)$"}}, {
        "_id": 0,
        "name": 1,
        "age": 1
    })
    show(r)


def test_sort():
    """排序操作"""
    '''
        # 按姓名正序
        db.user.find().sort({"name":1})
        # 按年龄倒序 按id正序
        db.user.find().sort({"age":-1,'_id':1})
    '''
    r = user.find({}, {'age': 1}).sort('age', pymongo.ASCENDING)
    show(r)
    print()

    r = user.find({}, {'age': -1, '_id': 1}).sort([('age', pymongo.DESCENDING),
                                                   ('_id', pymongo.ASCENDING)])
    show(r)


def test_group():
    """分组查询"""
    # {"$match": {"字段": "条件"}}, 可以使用任何常用查询操作符$gt,$lt,$in等
    # select * from db1.emp where post='公务员';
    r = db.emp.aggregate([{"$match": {"post": "公务员"}}])

    # select * from db1.emp where id > 3 group by post;
    r = db.emp.aggregate([
        {"$match": {"_id": {"$gt": 3}}},
        {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}}
    ])

    # select * from db1.emp where id > 3 group by post having avg(salary) > 10000;
    r = db.emp.aggregate([
        {"$match": {"_id": {"$gt": 3}}},
        {"$group": {"_id": "$post", 'avg_salary': {"$avg": "$salary"}}},
        {"$match": {"avg_salary": {"$gt": 10000}}}
    ])

    # {"$group": {"_id": 分组字段, "新的字段名": 聚合操作符}}

    # 1、将分组字段传给$group函数的_id字段即可
    # {"$group": {"_id": "$sex"}}  # 按照性别分组
    # {"$group": {"_id": "$post"}}  # 按照职位分组
    # {"$group": {"_id": {"state": "$state", "city": "$city"}}}  # 按照多个字段分组,比如按照州市分组

    # 2、分组后聚合得结果,类似于sql中聚合函数的聚合操作符:$sum、$avg、$max、$min、$first、$last
    # 例1:select post,max(salary) from db1.emp group by post;
    r = db.emp.aggregate([{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}}}])

    # 例2:取每个部门最大薪资与最低薪资
    r = db.emp.aggregate(
        [{"$group": {"_id": "$post", "max_salary": {"$max": "$salary"}, "min_salary": {"$min": "$salary"}}}])

    # 例3:如果字段是排序后的,那么$first,$last会很有用,比用$max和$min效率高
    r = db.emp.aggregate([{"$group": {"_id": "$post", "first_id": {"$first": "$_id"}}}])

    # 例4:求每个部门的总工资
    r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": "$salary"}}},
                          {"$sort": {"count": 1}}])

    # 例5:求每个部门的人数
    r = db.emp.aggregate([{"$group": {"_id": "$post", "count": {"$sum": 1}}},
                          {"$sort": {"count": 1}}])

    # 3、数组操作符
    # {"$addToSet": expr}  # 不重复
    # {"$push": expr}  # 重复

    # 例:查询岗位名以及各岗位内的员工姓名:select post,group_concat(name) from db1.emp group by post;
    r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$push": "$name"}}}])
    r = db.emp.aggregate([{"$group": {"_id": "$post", "names": {"$addToSet": "$name"}}}])
    show(r)


def test_project02():
    """投影操作"""
    # {"$project": {"要保留的字段名": 1, "要去掉的字段名": 0, "新增的字段名": "表达式"}}

    # select name,post,(age+1) as new_age from db1.emp;
    r = db.emp.aggregate([
        {
            "$project": {
                "name": 1,
                "post": 1,
                "new_age": {"$add": ["$age", 1]}
            }
        }
    ])
    show(r)


def test_sort_limit_skip():
    """排序限制跳过"""
    # 例1、取平均工资最高的前两个部门
    r = db.emp.aggregate([
        {
            "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}}
        },
        {
            "$sort": {"平均工资": -1}
        },
        {
            "$limit": 2
        }
    ])
    # 例2、取平均工资最高的第二个部门
    r = db.emp.aggregate([
        {
            "$group": {"_id": "$post", "平均工资": {"$avg": "$salary"}}
        },
        {
            "$sort": {"平均工资": -1}
        },
        {
            "$limit": 2
        },
        {
            "$skip": 1
        }
    ])
    show(r)


def test_sample():
    """随机获取"""
    # 随机获取3个文档
    r = db.emp.aggregate([
        {"$sample": {"size": 3}}
    ])
    show(r)


def test_str():
    """字符串操作"""
    # 截取字符串
    r = db.emp.aggregate([
        {
            "$project": {
                "_id": 0,
                "str": {"$substr": ["$sex", 0, 2]}
            }
        }
    ])

    # 拼接
    r = db.emp.aggregate([
        {
            "$project": {
                "name": 1,
                "post": 1,
                "name_sex": {"$concat": ["$name", "测试拼接", "$sex"]}
            }
        }
    ])

    # 将性别的英文转为大写
    r = db.emp.aggregate([{"$project": {"sex": {"$toUpper": "$sex"}}}])
    show(r)


if __name__ == '__main__':
    test_str()
mongodb_query.py

3 示例:scrapy+mongo(爬取bibi排行榜相关数据)

 bibi_ranke.py

import scrapy
from bibi.items import BibiItem

class BibiRankeSpider(scrapy.Spider):
    name = 'bibi_ranke'
    # allowed_domains = ['https://www.bilibili.com']
    start_urls = ['https://www.bilibili.com/v/popular/rank/all']

    def parse(self, response):

        titles =response.xpath('//div[@class="info"]//a[@class="title"]/text()').extract()
        play_nums =response.xpath('//div[@class="detail"]/span[@class="data-box"][1]/text()').extract()
        up_names =response.xpath("//div[@class='detail']/a//text()").extract()
        scores =response.xpath('//div[@class="info"]//div[@class="pts"]/div/text()').extract()

        play_nums=[i.strip() for i in play_nums]
        up_names=[i.strip() for i in up_names]
        # print(titles,play_nums,up_names,scores)
        # 打包
        for title,play_num,up_name,score in zip(titles,play_nums,up_names,scores):
            item=BibiItem()
            item['play_num']=play_num
            item['title']=title
            item['up_name']=up_name
            item['score']=score
            yield item

item.py

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class BibiItem(scrapy.Item):
    _id =scrapy.Field()
    # define the fields for your item here like:
    title = scrapy.Field()
    play_num = scrapy.Field()
    up_name = scrapy.Field()
    score = scrapy.Field()

pipelines.py

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


# useful for handling different item types with a single interface
from itemadapter import ItemAdapter
import pymongo


class BibiPipeline:
    # 爬虫开始时连接数据库
    def open_spider(self,spider):
        self.client =pymongo.MongoClient()

    def process_item(self, item, spider):
        # 存入mongodb数据库中
        self.client.bilibli.rank.insert_one(item)
        return item

    # 爬虫结束关闭数据库
    def close_spider(self,spider):
        self.client.close()
# Scrapy settings for bibi project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'bibi'

SPIDER_MODULES = ['bibi.spiders']
NEWSPIDER_MODULE = 'bibi.spiders'

from fake_useragent import UserAgent
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'bibi (+http://www.yourdomain.com)'

USER_AGENT = UserAgent().random # 随机生成UA伪装

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 1
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'bibi.middlewares.BibiSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'bibi.middlewares.BibiDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
   'bibi.pipelines.BibiPipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
settings.py
作者:华王 博客:https://www.cnblogs.com/huahuawang/
原文地址:https://www.cnblogs.com/huahuawang/p/14814176.html