Python内置数据结构之字符串

1、字符串定义和特点

  定义:

    一个个字符组成的有序的序列,是字符的集合。如: 'sadasd121da'

    使用单引号、双引号、三引号引住的字符序列。

  特点:

    1、和数字一样都是字面常量。

    2、不可变的,有序,连续空间,可迭代

    3、Python3起,字符串就是Unicode类型

2、字符串的定义,初始化:

  举例:

s = 'strint'
s = "string"
s = '''string'''
s = 'hello 
world'
s = r'hello 
world' # 此时的
 就是普通字符
s = 'C:win
t' 
    ---->      
C:win
t  

s = R"C:win
t" 
    ---->
C:win
t

s = 'C:win\nt'
    ---->
C:win
t

s = """ select * from user """

3、字符串元素的访问--下标

  —字符串支持使用索引访问

    sql = "select * from user where name = 'tom'"

    sql[4]  ------>  'c'

    sql[4] = 'o'  因为字符串不可变,所以此操作抛异常   

  —有序的字符集合,字符序列,所以可以用for循环迭代。

    for c in sql:

      print(c) 

      print(type(c))   #  <class 'str'>

  —可迭代

    lst = list(sql) #  ['s', 'd', 'a']

4、字符串join链接****** 

  'string'.join(iterable)  ------>  返回 str

 1 sql = 'welcome to  beijing'
 2 s = sql.join(['1','1','1']) # s = sql.join(('#','1','1'))
 3 print(s)
 4 
 5 ------->
 6 '1welcome to  beijing1welcome to  beijing1'
 7 
 8 
 9 s = ''.join(sql)
10 print(s)
11 
12 ------->
13 'welcome to  beijing'
14 
15 s = '-'.join(sql)
16 print(s)
17 
18 ------->
19 'w-e-l-c-o-m-e- -t-o- - -b-e-i-j-i-n-g'
20 
21 
22 lst = ['1','2','3']
23 print('""'.join(lst)) # 1""2""3
24 print(' '.join(lst)) # 1 2 3
25 lst2 = ['1',[1,2],'3'] # [1,2]是列表,不是字符串, '[1,2]'
26 # print(' '.join(lst2))
27 lst3 = ['1','[1,2]','3']
28 print(' '.join(lst3)) # 1 [1,2] 3

5、字符串 + 链接

  + -----> 返回新的str(类似列表,元组)

6、比较 +  和 join 连接效率

 1 # 结论:join的效率都低于 +
 2 import datetime
 3 
 4 start = datetime.datetime.now()
 5 for i in range(1000000):
 6     s = ''.join(['pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab','pythontab'])
 7 delat = (datetime.datetime.now()- start).total_seconds()
 8 print('--',delat)
 9 
10 start = datetime.datetime.now()
11 for i in range(1000000):
12     s = 'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'+'pythontab'
13 delat = (datetime.datetime.now()- start).total_seconds()
14 print(delat)
15 
16 website= '%s%s%s' % ('python', 'tab', '.com')


-- 0.159004
0.0624
 1 # 链接次数少也是一样的结果!
 2 
 3 import datetime
 4 
 5 start = datetime.datetime.now()
 6 for i in range(1000000):
 7     s = ''.join(['pythontab','pythontab','pythontab'])
 8 delat = (datetime.datetime.now()- start).total_seconds()
 9 print('--',delat)
10 
11 start = datetime.datetime.now()
12 for i in range(1000000):
13     s = 'pythontab'+'pythontab'+'pythontab'
14 delat = (datetime.datetime.now()- start).total_seconds()
15 print(delat)
16 
17 website= '%s%s%s' % ('python', 'tab', '.com')
18 
19 
20 
21 
22 -- 0.228405
23 0.078

7、字符串分隔

  分隔字符串的方法分为2类:

    split系:

      将字符串按照分隔符分割成若干个字符串,并返回列表。

    partition 系

      将字符串按照分隔符分割成两段,返回这2段和分隔符的元组

  -——split

    帮助文档:str.split(sep=None, maxsplit=-1)  ------> 返回一个字符串列表

      从左往右依次分割

      sep 指定分隔符,缺省情况下空白字符串作为分隔符

      maxsplit 指定分割次数,-1 表示遍历整个字符串

 1                                       split 
 2 
 3 s1 = "Tom is a super student."
 4 #s1.split() # ['Tom', 'is', 'a', 'super', 'student.'] 默认分隔符为 空白字符串
 5 s2 = "Tom      is a super student."
 6 s2.split() # ['Tom', 'is', 'a', 'super', 'student.']
 7 s2.split('s') # ['Tom      i', ' a ', 'uper ', 'tudent.']
 8 s2.split('super') # ['Tom      is a ', ' student.']
 9 s2.split('super ')# ['Tom      is a ', 'student.']
10 s2.split(' ') # ['Tom', '', '', '', '', '', 'is', 'a', 'super', 'student.']
11 s3 = "Tom is 	a super student."
12 s3.split(' ')# ['Tom', 'is', 'a', 'super', 'student.']s3
13 s3.split(' ',maxsplit=2) # ['Tom', 'is', 'a super student.']
14 s3.split('	',maxsplit=2) # ['Tom is ', 'a super student.']
15 
16 s4 = "Tom is 		a   super student."
17 s4.split() # ['Tom', 'is', 'a', 'super', 'student.'] 注意空白字符和空格的区别
18 s4.split('	',maxsplit=-1) # ['Tom is ', '', 'a   super student.']
19 s4.split('	',maxsplit=1) # ['Tom is ', '	a   super student.']
 1                       rsplit
 2 
 3 s1 = "Tom  is a 			super 	student."
 4  
 5 s1.rsplit() # ['Tom', 'is', 'a', 'super', 'student.']
 6 s1.rsplit('s') # ['Tom i', ' a ', 'uper ', 'tudent.']
 7 s1.rsplit('super') # ['Tom is a ', ' student.']
 8 s1.rsplit(' ') # ['Tom', 'is', 'a', 'super', 'student.']
 9 s1.rsplit(' ',maxsplit=2) # ['Tom is a', 'super', 'student.']
10 s1.rsplit('	',maxsplit=2) # ['Tom is a 		', 'super ', 'student.']

  ——splitlines([keepends]) ----> 字符串列表

      按照行来切分字符串

      keepends 指定的是保留分隔符

      行分隔符包括 , ,  等(默认的),可以自定义的行尾分隔符,当然使用此方法就不能默认了,必须指明    

1                       splitlines
2 
3 s1 = "I'm a student.
You're a teacher"
4 s1.splitlines() # ["I'm a student.", "You're a teacher"]
5 s1.splitlines(True) # ["I'm a student.
", "You're a teacher"]
6 
7 s1 = "I'm a student.



You're a teacher" 
8 s1.splitlines() # ["I'm a student.", '', '', '', "You're a teacher"]
9 s1.splitlines(True) # ["I'm a student.
", '
', '
', '
', "You're a teacher"]

 

  ——partition(sep) ---->  元组 (head,sep,tail)

      从左往右,遇到分隔符就把字符串分割成两部分,返回头,分隔符,尾三部分的三元组,如果没有找到分隔符,就返回头,2个空元素的三元组。

      sep 分割字符串,必须指定

 1                     partition  rpartition
 2 
 3 s1 = "I'm a student.s
You're a teacher"
 4 s1.partition('s') # ("I'm a ", 's', "tudent.s
You're a teacher")
 5 s1.rpartition('s')# ("I'm a student.", 's', "
You're a teacher")
 6         # rpartition 从右往左 
 7 s1.partition('stu') # ("I'm a ", 'stu', "dent.s
You're a teacher")
 8 s1.partition(' ') #("I'm", ' ', "a student.s
You're a teacher")
 9 s1.partition('_') # ("I'm a student.s
You're a teacher", '', '')
10 
11 s = '[1,2],[3],[4,5]'
12 s.partition('3') # ('[1,2],[', '3', '],[4,5]')

 8、字符串大小写

  upper():全大写 ,返回新字符串

  lower():全小写,返回新字符串

  swapcase():交换大小写,返回新字符串

 1 In [18]: s = 'dsadasd'
 2 
 3 In [19]: s.upper()
 4 Out[19]: 'DSADASD'
 5 
 6 In [20]: s
 7 Out[20]: 'dsadasd'
 8 
 9 In [21]: s = 'dsaD'
10 
11 In [22]: s.lower()
12 Out[22]: 'dsad'
13 
14 In [23]: s.swapcase()
15 Out[23]: 'DSAd'
测试

9、字符串排版:

  title() ----> 返回str,标题的每个单词都大写

  capitalize() -----> 返回 str ,首个单词大写

  center(width [, fillchar] ) ----> 返回str  width 打印宽度, fillchar 填充的字符

  zfill(width) -----> str width 是宽度

  ljust(width [,fillchar])  --->  str   左对齐

  rjust

10、字符串修改*

  —replace(old,new[,count]) -----> str 

    字符串中找到匹配替换为新子串,返回新字符串

    count 表示替换几次,不指定就是全部替换

 1 IIn [1]: a ='www magedu com'
 2 
 3 In [2]: a.replace('ww','p')
 4 Out[2]: 'pw magedu com'
 5 
 6 In [3]: a.replace('www','p')
 7 Out[3]: 'p magedu com'
 8 
 9 In [4]: a.replace('w','p')
10 Out[4]: 'ppp magedu com'
11 
12 In [5]: a.replace('w','p',2)
13 Out[5]: 'ppw magedu com'
14 
15 In [6]: a.replace('w','python',2)
16 Out[6]: 'pythonpythonw magedu com'
17 
18 In [7]: a.replace('www','python',2)
19 Out[7]: 'python magedu com'
20 
21 In [8]: a.replace('www','python',5)
22 Out[8]: 'python magedu com'
test

  —strip([char]) -----> str

    从字符串两端 去除指定得到字符集chars中的所有字符

    如果插头上没有志宁,去除两端的空白字符。

In [10]: b = '  

very very very 	'

In [11]: b
Out[11]: '  

very very very 	'

In [12]: b.strip()# 默认空白字符
Out[12]: 'very very very'

In [13]: b.strip(' 	')
Out[13]: '

very very very'

In [14]: b.strip(' 	
')
Out[14]: '

very very very'

In [15]: b.strip(' 	
')
Out[15]: '

very very very'

In [16]: b.strip(' 	
y')
Out[16]: '

very very ver'

In [17]: b.strip(' 	
very')
Out[17]: '

'

In [18]: b.strip('	
very')
Out[18]: '  

very very very '



In [20]: c = 'abc abc ab bc'

In [21]: c.strip('abc')
Out[21]: ' abc ab '
test-strip
1 In [22]: a
2 Out[22]: 'www magedu com'
3 
4 In [23]: a.lstrip('w')
5 Out[23]: ' magedu com'
6 
7 In [24]: a.lstrip('m')
8 Out[24]: 'www magedu com'
lstrip-test

11、字符串查找*

  —find(sub [,start [, end]]) -----> int

    在指定的区间,从左到右,查找子串sub,找到返回索引,找不到返回 -1

  — rfind (sub [,start [, end]]) -----> int

    在指定能够的区间,从右往左,查找子串sub,找到返回正索引,找不到返回-1

 1 In [30]: a
 2 Out[30]: 'www magedu com'
 3 
 4 In [31]: s
 5 Out[31]: 'I am very very very sorry '
 6 
 7 In [32]: a.find('ww')
 8 Out[32]: 0
 9 
10 In [33]: a.find('w')
11 Out[33]: 0
12 
13 In [34]: a.find('m')
14 Out[34]: 4
15 
16 In [35]: a.find('m',5,-1)
17 Out[35]: -1
18 
19 In [36]: a.find('m',5)
20 Out[36]: 13
21 
22 In [38]: a.rfind('w')
23 Out[38]: 2
24 
25 In [39]: a.rfind('ww')
26 Out[39]: 1
27 
28 In [40]: a.find('ss')
29 Out[40]: -1
30 
31 In [41]: a.find('m',4,10000) # end可以无限写,但是没有意义
32 Out[41]: 4
find - test
 1 In [54]: s
 2 Out[54]: 'I am very very very sorry'
 3 
 4 In [55]: s.find('very',1,-1)
 5 Out[55]: 5
 6 
 7 In [56]: s.find('very',-5,-1)
 8 Out[56]: -1
 9 
10 In [57]: s.find('very',-10,-1)
11 Out[57]: 15
12 
13 In [58]: s.find('very',-1,-10)
14 Out[58]: -1
15 
16 In [59]: s.rfind('very',-1,-10)
17 Out[59]: -1
18 
19 总结:不论 rfind 还是 find,start都应该在end 的左侧,比如:5就在-1 的左侧,-1不在 -10 的左侧。rfind只是在区间内,找最右侧先匹配到的项。
关于start和end的大小关系

  —index(sub [,start[,end]]) ----> int  ,但是找不到会抛异常,得写异常处理,所以一般很少用,选用find 

  **** 时间复杂度:

      find 和 index 都是翻找,随着数据量增大,效率下降,都是O(n)

  —len(string)跟列表,元组一样,会提供

  —count(str [,start [,end]]) ----> int   时间复杂度是O(n)

      在指定区间,从左往右,统计子串sub出现的次数

 1 In [61]: s
 2 Out[61]: 'I am very very very sorry'
 3 
 4 In [62]: s.count('very')
 5 Out[62]: 3
 6 
 7 In [63]: s.count('very',10,-1)
 8 Out[63]: 2
 9 
10 In [64]: s.count('very',-10,-1)
11 Out[64]: 1
12 
13 In [65]: s.count('very',-1,-10)
14 Out[65]: 0
count - test

12、字符串判断*

   —startswith (suffix [,start [,end]]) -----> bool    ------> 可以用find 替换的

      a.find('www', 0, len('www')) == 0  ------> True or False  ------>好处是不抛异常

   —endwith ([prefix [ ,start [, end]])----> bool

 1 In [68]: s
 2 Out[68]: 'I am very very very sorry'
 3 
 4 In [69]: s.startswith('s')
 5 Out[69]: False
 6 
 7 In [70]: s.startswith('w')
 8 Out[70]: False
 9 
10 In [71]: s.startswith('I')
11 Out[71]: True
test

  — is 系列

    isalnum() ----> bool 

    isalpha()

    isdecimal();是否只包含十进制数字

    isdigit(): 是否全部是数字

    isidentifiter() :是不是字母和下划线开头,其他都是字母,数字,下划线

    islower() # 遍历,还不如都转大写或小写

    isupper()

    isspace()

13、字符串格式化:

    —字符串的格式化是一种能够凭借字符串输出样式的手段,更灵活。

      join 拼接,被拼接对象是可迭代对象

      + 拼接,但是需要将非字符串转化为字符串

    — printf - style formatting 来源于c语言的ptinf函数

        格式要求:

          占位符: 使用 % 和格式字符组成,例如 %s %d等。

          fornat % valuse ,格式字符串和被格式的值之间使用%分隔

          values 只能是一个对象,或是一个和格式字符串占位符数目相等的元组,或一个字典。

1 https://www.cnblogs.com/post/readauth?url=/JerryZao/p/9417465.html
 1 In [90]: '%.2f'% 1.23333
 2 Out[90]: '1.23'
 3 
 4 In [82]: 'i an %d' % 9
 5 Out[82]: 'i an 9'
 6 
 7 In [83]: 'i an %20d' % 9
 8 Out[83]: 'i an                    9'
 9 
10 In [84]: 'i an %-20d' % 9
11 Out[84]: 'i an 9                   '
12 
13 In [85]: '%3.2f%%, 0x%x, 0X%02X' %(89.32234324,10,12)
14 Out[85]: '89.32%, 0xa, 0X0C'
test 
1 IIn [91]: '%#x' % 10
2 Out[91]: '0xa'
3 
4 In [92]: '%#X' % 10
5 Out[92]: '0XA'
6 
7 In [93]: 'ox%X' % 10
8 Out[93]: 'oxA'
test

 

   

  — 使用format 的格式化******

       format 函数格式字符串语法----Python鼓励使用

         '{} {xxx}'.format(*args, **kwargs) ------> str

          args 是位置参数 ,是一个元组

          kwargs 是关键字参数,是一个字典

          花括号表示占位符

          {} 表示按照顺序匹配位置参数,{n} 表示取位置参数索引为n 的值

          {xxx} 表示在关键字参数中搜索名称一致的

          {{}},表示打印花括号

对齐:

In [97]: '{0}*{1}={2:<2}'.format(3,2,2*3)
Out[97]: '3*2=6 '

In [98]: '{0}*{1}={2:<02}'.format(3,2,2*3)
Out[98]: '3*2=60'

In [99]: '{0}*{1}={2:>2}'.format(3,2,2*3)
Out[99]: '3*2= 6'

In [100]: '{0}*{1}={2:>02}'.format(3,2,2*3)
Out[100]: '3*2=06'

In [101]: '{0}*{1}={2:0>2}'.format(3,2,2*3)
Out[101]: '3*2=06'

In [102]: '{0}*{1}={2:0<2}'.format(3,2,2*3)
Out[102]: '3*2=60'


In [132]: '{:*^30}'.format('*')
Out[132]: '******************************'
In [124]: '{:<#3}'.format(6)
Out[124]: '6  '

In [125]: '{:#<3}'.format(6)
Out[125]: '6##'

In [126]: '{:^3}'.format(6)
Out[126]: ' 6 '

进制:

In [104]: 'int:{0:d}'.format(42)
Out[104]: 'int:42'

In [105]: 'int:{0:d},hex{0:x},oct{0:o}'.format(42)
Out[105]: 'int:42,hex2a,oct52'

In [106]: 'int:{0:d},hex{0:#x},oct{0:#o}'.format(42)
Out[106]: 'int:42,hex0x2a,oct0o52'

In [107]: 'int:{0:d},hex{0:#x},oct{0:#o},bin{0:b}'.format(42)
Out[107]: 'int:42,hex0x2a,oct0o52,bin101010'

In [108]: 'int:{0:d},hex{0:#x},oct{0:#o},bin{0:#b}'.format(42)
Out[108]: 'int:42,hex0x2a,oct0o52,bin0b101010'

In [134]: '{:02X}'.format(123)
Out[134]: '7B'


浮点数:

In [111]: print('{:g}'.format(3**0.5)) #通用精度
1.73205

In [112]: print('{:f}'.format(3**0.5))
1.732051

In [113]: print('{:}'.format(3**0.5))
1.7320508075688772

In [114]: print('{:10f}'.format(3**0.5)) # 右对齐,10个位置
  1.732051

In [115]: print('{:2}'.format(102.231)) # 加小数位的宽度
102.231

In [116]: print('{:.2}'.format(102.231))
1e+02

In [117]: print('{:.2}'.format(3**0.5)) # 小数点后两位
1.7

In [118]: print('{:.2f}'.format(3**0.5))
1.73

In [119]: print('{:3.2f}'.format(3**0.5))
1.73

In [120]: print('{:3.1f}'.format(3**0.5))
1.7

In [121]: print('{:3.1f}'.format(0.2745))
0.3

In [122]: print('{:3.3f}'.format(0.2745))
0.275

In [123]: print('{:3.3%}'.format(0.2745))
27.450%

字符串format格式化test
test

  其他使用:https://www.cnblogs.com/post/readauth?url=/JerryZao/p/9417465.html

In [22]: aOut[22]: 'www magedu com'
In [23]: a.lstrip('w')Out[23]: ' magedu com'
In [24]: a.lstrip('m')Out[24]: 'www magedu com'

为什么要坚持,想一想当初!
原文地址:https://www.cnblogs.com/JerryZao/p/9445746.html