Pandas Operations Summary

3.1 Pandas

3.1.1 Series

class pandas.Series(data = None, index = None, dtype = None, name = None, copy = False, fastpath = False)

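data: the data passed in
index: the index labels
dtype: the data type; inferred automatically by default
name: the name of the Series
copy: whether to copy the input data; defaults to False

# Create a Series object from a list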
import pandas as pd
a = ["Google", "Runoob", "Wiki"]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)

# Create a Series from a dict (key-value pairs)
import pandas as pd
sites = {1: "Google", 2: "Runoob", 3: "Wiki"}
myvar = pd.Series(sites)
print(myvar)
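The parameters listed above can be combined. The following is a minimal sketch, assuming the same `sites` dict as in the example; the names `subset` and `prices` are made up for illustration. It shows that an explicit `index` selects keys from a dict (missing keys become NaN) and that `dtype` and `name` can be set at construction time.

```python
import pandas as pd

# Same dict as in the example above
sites = {1: "Google", 2: "Runoob", 3: "Wiki"}

# When data is a dict and index is given, only the listed keys are used;
# a key that is missing from the dict (4 here) produces NaN
subset = pd.Series(sites, index=[1, 2, 4], name="site")
print(subset)

# dtype can be set explicitly instead of being inferred
prices = pd.Series([1, 2, 3], dtype="float64", name="price")
print(prices.dtype)
```

Passing `index` together with a dict is a convenient way to reorder or filter the entries without building a second Series.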

Note: all of the examples below are code cells from Jupyter.

In [1]:

import pandas as pd                       # import the pandas library
ser_obj = pd.Series([1, 2, 3, 4, 5])      # create a Series object
ser_obj

Out[1]:

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [2]:

# Create a Series object and specify its index
ser_obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
ser_obj

Out[2]:

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [3]:

year_data = {2001: 17.8, 2002: 20.1, 2003: 16.5}
ser_obj2 = pd.Series(year_data)   # create a Series object
ser_obj2

Out[3]:

2001    17.8
2002    20.1
2003    16.5
dtype: float64

In [4]:

ser_obj.index         # get the index of ser_obj

Out[4]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [5]:

ser_obj.values       # get the data of ser_obj

Out[5]:

array([1, 2, 3, 4, 5], dtype=int64)

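In [5]:

ser_obj.iloc[3]       # get the value at position 3 (plain ser_obj[3] relies on a positional fallback that recent pandas versions deprecate)

Out[5]:

4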

In [6]:

ser_obj * 2

Out[6]:

a     2
b     4
c     6
d     8
e    10
dtype: int64
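Positional and label-based access can be made explicit with `iloc` and `loc`. The sketch below reuses the `ser_obj` from the cells above; the variable names `by_position`, `by_label`, and `doubled` are illustrative only.

```python
import pandas as pd

ser_obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_position = ser_obj.iloc[3]   # fourth element, regardless of the labels
by_label = ser_obj.loc['d']     # element whose label is 'd'
print(by_position, by_label)

# Arithmetic is vectorized and returns a new Series; ser_obj is unchanged
doubled = ser_obj * 2
print(doubled.loc['e'])
```

Using `iloc` and `loc` avoids any ambiguity about whether an integer is meant as a position or as a label.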

3.1.2 DataFrame

pandas.DataFrame(data, index, columns, dtype, copy)

data: the input data (ndarray, Series, map, lists, dict, and similar types)
index: the index values, also called row labels
columns: the column labels; defaults to RangeIndex (0, 1, 2, …, n)
dtype: the data type
copy: whether to copy the data; defaults to False

In [7]:

import numpy as np
import pandas as pd
demo_arr = np.array([['a', 'b', 'c'], ['d', 'e', 'f']])  # create the array
df_obj = pd.DataFrame(demo_arr)    # create a DataFrame object from the array
df_obj

Out[7]:

	0	1	2
0	a	b	c
1	d	e	f

In [8]:

# Create a DataFrame object and specify the column labels
df_obj1 = pd.DataFrame(demo_arr, columns=['No1', 'No2', 'No3'])
df_obj1

Out[8]:

	No1	No2	No3
0	a	b	c
1	d	e	f

In [10]:

element = df_obj1['No2']  # get one column of data by its column label
element

Out[10]:

0    b
1    e
Name: No2, dtype: object

In [11]:

type(element)                # check the type of the returned result

Out[11]:

pandas.core.series.Series

In [11]:

element = df_obj1.No2  # get the column data through attribute access
element

Out[11]:

0    b
1    e
Name: No2, dtype: object

In [12]:

type(element)           # check the type of the returned result

Out[12]:

pandas.core.series.Series

In [13]:

df_obj1['No4'] = ['g', 'h']
df_obj1

Out[13]:

	No1	No2	No3	No4
0	a	b	c	g
1	d	e	f	h

In [14]:

del df_obj1['No4']
df_obj1

Out[14]:

	No1	No2	No3
0	a	b	c
1	d	e	f
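Adding a column by assignment and removing it with `del` both modify the DataFrame in place. As a hedged alternative, the sketch below uses `assign()` and `drop()`, which return new objects and leave the original untouched. The DataFrame is rebuilt from a dict here for self-containment; the names `with_no4` and `without_no4` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'No1': ['a', 'd'], 'No2': ['b', 'e'], 'No3': ['c', 'f']})

# assign() returns a new DataFrame with the extra column; df is unchanged
with_no4 = df.assign(No4=['g', 'h'])

# drop() returns a new DataFrame without the column; with_no4 is unchanged
without_no4 = with_no4.drop(columns='No4')

print(list(with_no4.columns))
print(list(without_no4.columns))
```

The non-mutating style is handy when the same DataFrame is reused across several notebook cells.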

3.2 Index operations and advanced indexing

3.2.1 Index objects

In [15]:

import pandas as pd
ser_obj = pd.Series(range(5), index=['a','b','c','d','e'])
ser_index = ser_obj.index
ser_index

Out[15]:

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [16]:

ser_obj

Out[16]:

a    0
b    1
c    2
d    3
e    4
dtype: int64

In [17]:

ser_index['2'] = 'cc'  # (uncomment this line and run it to see the error message)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-3d779dc501cd> in <module>
----> 1 ser_index['2'] = 'cc'  # (uncomment this line and run it to see the error message)
c:\users\dell\anaconda3\envs\pyg\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   4082 
   4083     def __setitem__(self, key, value):
-> 4084         raise TypeError("Index does not support mutable operations")
   4085 
   4086     def __getitem__(self, key):
TypeError: Index does not support mutable operations

In [22]:

ser_obj1 = pd.Series(range(3), index=['a','b','c'])
ser_obj2 = pd.Series(['a','b','c'], index=ser_obj1.index)
ser_obj2.index is ser_obj1.index

Out[22]:

True

In [19]:

ser_obj1

Out[19]:

a    0
b    1
c    2
dtype: int64

In [20]:

ser_obj2

Out[20]:

a    a
b    b
c    c
dtype: object
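Because an Index object is immutable, labels cannot be changed by item assignment. A minimal sketch of the failure and of one common workaround, `rename()`, follows; the variable names `error_message` and `renamed` are illustrative.

```python
import pandas as pd

ser_obj = pd.Series(range(5), index=['a', 'b', 'c', 'd', 'e'])

error_message = None
try:
    ser_obj.index[2] = 'cc'          # item assignment on an Index is not allowed
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# rename() builds a new Series with a new Index; the original is untouched
renamed = ser_obj.rename(index={'c': 'cc'})
print(list(renamed.index))
```

Immutability is what makes it safe for several Series or DataFrame objects to share one Index, as the `is` comparison above demonstrates.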

3.2.2 Reindexing

In [23]:

import pandas as pd
ser_obj = pd.Series([1, 2, 3, 4, 5], index=['c', 'd', 'a', 'b', 'e'])
ser_obj

Out[23]:

c    1
d    2
a    3
b    4
e    5
dtype: int64

In [24]:

# Reindex
ser_obj2 = ser_obj.reindex(['a', 'b', 'c', 'd', 'e', 'f']) 
ser_obj2

Out[24]:

a    3.0
b    4.0
c    1.0
d    2.0
e    5.0
f    NaN
dtype: float64

In [21]:

# Specify the value used to fill missing entries when reindexing
ser_obj2 = ser_obj.reindex(['a', 'b', 'c', 'd', 'e', 'f'], fill_value = 6)
ser_obj2

Out[21]:

a    3
b    4
c    1
d    2
e    5
f    6
dtype: int64

In [25]:

# Create a Series object and specify its index
ser_obj3 = pd.Series([1, 3, 5, 7], index=[0, 2, 4, 6])
ser_obj3

Out[25]:

0    1
2    3
4    5
6    7
dtype: int64

In [27]:

ser_obj3.reindex(range(6), method = 'ffill') # reindex and fill forward

Out[27]:

0    1
1    1
2    3
3    3
4    5
5    5
dtype: int64

In [28]:

ser_obj3.reindex(range(6), method = 'bfill') # reindex and fill backward

Out[28]:

0    1
1    3
2    3
3    5
4    5
5    7
dtype: int64
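`reindex` works on DataFrame objects as well, for rows and columns at the same time. The sketch below is an illustration on made-up data (the frame `df` and its labels are hypothetical, not from the original notebook).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2),
                  index=['a', 'c', 'd'], columns=['x', 'y'])

# Row label 'b' and column label 'z' do not exist in df, so they come back as NaN
reindexed = df.reindex(index=['a', 'b', 'c', 'd'], columns=['x', 'y', 'z'])
print(reindexed)

# fill_value replaces the NaN that would otherwise be introduced
filled = df.reindex(index=['a', 'b', 'c', 'd'], fill_value=0)
print(filled)
```

Note that introducing NaN turns integer columns into floating point, which is why the earlier Series output changed from int64 to float64.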

3.2.3 Indexing operations

In [25]:

import pandas as pd
ser_obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
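ser_obj.iloc[2]  # get data by position (plain ser_obj[2] relies on a positional fallback that recent pandas versions deprecate)

Out[25]:

3

In [26]:

ser_obj['c']    # get data by index label

Out[26]:

3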

In [27]:

ser_obj[2: 4]           # slice by position (the interval is half-open, including the start and excluding the stop, as is usual in Python)

Out[27]:

c    3
d    4
dtype: int64

In [28]:

ser_obj['c': 'e']      # slice by index label (both endpoints are included)

Out[28]:

c    3
d    4
e    5
dtype: int64

In [29]:

ser_obj.iloc[[0, 2, 4]]     # get data at non-contiguous positions

Out[29]:

a    1
c    3
e    5
dtype: int64

In [30]:

ser_obj[['a', 'c', 'd']]   # get data for non-contiguous index labels

Out[30]:

a    1
c    3
d    4
dtype: int64

In [30]:

ser_bool = ser_obj > 2         # create a boolean Series object
ser_bool

Out[30]:

a    False
b    False
c     True
d     True
e     True
dtype: bool

In [32]:

ser_obj[ser_bool]               # get the data where the mask is True

Out[32]:

c    3
d    4
e    5
dtype: int64
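Boolean masks can be combined before indexing. A minimal sketch, using the same `ser_obj` as above (the names `middle`, `edges`, and `not_large` are illustrative):

```python
import pandas as pd

ser_obj = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Each condition needs its own parentheses; use & (and), | (or), ~ (not)
middle = ser_obj[(ser_obj > 1) & (ser_obj < 5)]
edges = ser_obj[(ser_obj == 1) | (ser_obj == 5)]
not_large = ser_obj[~(ser_obj > 2)]

print(middle)
print(edges)
print(not_large)
```

The Python keywords `and`, `or`, and `not` do not work element-wise on a Series, which is why the bitwise operators are used here.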

In [31]:

arr = np.arange(12).reshape(3, 4)
df_obj = pd.DataFrame(arr, columns=['a', 'b', 'c', 'd'])
df_obj

Out[31]:

	a	b	c	d
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

In [34]:

df_obj['b']

Out[34]:

0    1
1    5
2    9
Name: b, dtype: int32

In [35]:

type(df_obj['b'])

Out[35]:

pandas.core.series.Series

In [36]:

df_obj[['b', 'd']]        # get several non-adjacent columns

Out[36]:

	b	d
0	1	3
1	5	7
2	9	11

In [37]:

df_obj[: 2]               # use a slice to get rows 0 to 1

Out[37]:

	a	b	c	d
0	0	1	2	3
1	4	5	6	7

In [38]:

# Chain two selections: first slice rows 0 to 2 by row position, then pick the non-adjacent columns b and d
df_obj[: 3][['b', 'd']]

Out[38]:

	b	d
0	1	3
1	5	7
2	9	11

Fancy indexing with loc and iloc

In [32]:

arr = np.arange(16).reshape(4, 4)
dataframe_obj = pd.DataFrame(arr, columns=['a', 'b', 'c', 'd'])
dataframe_obj

Out[32]:

	a	b	c	d
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
3	12	13	14	15

In [33]:

dataframe_obj.loc[:, ["c", "a"]]

Out[33]:

	c	a
0	2	0
1	6	4
2	10	8
3	14	12

In [34]:

dataframe_obj.iloc[:, [2, 0]]

Out[34]:

	c	a
0	2	0
1	6	4
2	10	8
3	14	12

In [35]:

dataframe_obj.loc[1:2, ['b','c']]  # note the difference from iloc: loc row slices include both endpoints, iloc slices are half-open

Out[35]:

	b	c
1	5	6
2	9	10

In [36]:

dataframe_obj.iloc[1:3, [1, 2]]  # iloc row slices are half-open: the stop position is excluded

Out[36]:

	b	c
1	5	6
2	9	10
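`loc` also accepts a boolean mask for the rows, which combines filtering and column selection in one step. The sketch below rebuilds the same 4x4 `dataframe_obj`; the names `selected`, `by_label`, and `by_position` are illustrative.

```python
import numpy as np
import pandas as pd

dataframe_obj = pd.DataFrame(np.arange(16).reshape(4, 4),
                             columns=['a', 'b', 'c', 'd'])

# Rows where column 'a' is greater than 4, restricted to columns 'b' and 'd'
selected = dataframe_obj.loc[dataframe_obj['a'] > 4, ['b', 'd']]
print(selected)

# at / iat read a single scalar by label or by position
by_label = dataframe_obj.at[2, 'c']
by_position = dataframe_obj.iat[2, 2]
print(by_label, by_position)
```

For single values, `at` and `iat` are the scalar counterparts of `loc` and `iloc`.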

3.3 Arithmetic operations and data alignment

In [37]:

obj_one = pd.Series(range(10, 13), index=range(3))
obj_one

Out[37]:

0    10
1    11
2    12
dtype: int64

In [38]:

obj_two = pd.Series(range(20, 25), index=range(5))
obj_two

Out[38]:

0    20
1    21
2    22
3    23
4    24
dtype: int64

In [39]:

obj_one + obj_two

Out[39]:

0    30.0
1    32.0
2    34.0
3     NaN
4     NaN
dtype: float64
Note: labels that appear in only one of the two Series are filled with NaN by default.

In [40]:

obj_one.add(obj_two, fill_value = 0)   # perform the addition, filling in missing values

Out[40]:

0    30.0
1    32.0
2    34.0
3    23.0
4    24.0
dtype: float64
Note: fill_value fills in the entries missing from obj_one before the addition; it is not applied to the result after adding.
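The same alignment rules apply to DataFrame objects, on both rows and columns. The following sketch uses two small made-up frames (`df_one` and `df_two` are hypothetical) to show where NaN appears and what `fill_value` changes.

```python
import pandas as pd

df_one = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[0, 1])
df_two = pd.DataFrame({'b': [10, 20, 30], 'c': [40, 50, 60]}, index=[0, 1, 2])

# Plain addition: any cell missing on either side becomes NaN
plain = df_one + df_two
print(plain)

# add() with fill_value: a cell missing on one side is treated as 0;
# a cell missing on both sides (row 2, column 'a') is still NaN
filled = df_one.add(df_two, fill_value=0)
print(filled)
```

A cell that is absent from both operands stays NaN even with `fill_value`, because there is nothing to add the fill value to.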

3.4 Sorting data

3.4.1 Sorting by index

In [41]:

import pandas as pd
ser_obj = pd.Series(range(10, 15), index=[5, 3, 1, 3, 2])
ser_obj

Out[41]:

5    10
3    11
1    12
3    13
2    14
dtype: int64

In [42]:

ser_obj.sort_index()        # sort by index in ascending order

Out[42]:

1    12
2    14
3    11
3    13
5    10
dtype: int64

In [43]:

ser_obj.sort_index(ascending = False)  # sort by index in descending order

Out[43]:

5    10
3    11
3    13
2    14
1    12
dtype: int64

In [44]:

import pandas as pd
import numpy as np
df_obj = pd.DataFrame(np.arange(9).reshape(3, 3), index=[4, 3, 5])
df_obj

Out[44]:

	0	1	2
4	0	1	2
3	3	4	5
5	6	7	8

In [45]:

df_obj.sort_index()                      # sort by index in ascending order

Out[45]:

	0	1	2
3	3	4	5
4	0	1	2
5	6	7	8

In [46]:

df_obj.sort_index(ascending = False)     # sort by index in descending order

Out[46]:

	0	1	2
5	6	7	8
4	0	1	2
3	3	4	5

3.4.2 Sorting by values

In [49]:

ser_obj = pd.Series([4, np.nan, 6, np.nan, -3, 2])
ser_obj

Out[49]:

0    4.0
1    NaN
2    6.0
3    NaN
4   -3.0
5    2.0
dtype: float64

In [50]:

ser_obj.sort_values()   # sort by value in ascending order (NaN values are placed last)

Out[50]:

4   -3.0
5    2.0
0    4.0
2    6.0
1    NaN
3    NaN
dtype: float64

In [51]:

df_obj = pd.DataFrame([[0.4, -0.1, -0.3, 0.0],
                       [0.2, 0.6, -0.1, -0.7],
                       [0.8, 0.6, -0.5, 0.1]])
df_obj

Out[51]:

	0	1	2	3
0	0.4	-0.1	-0.3	0.0
1	0.2	0.6	-0.1	-0.7
2	0.8	0.6	-0.5	0.1

In [53]:

df_obj.sort_values(by = 2)  # sort the rows by the values in the column labeled 2

Out[53]:

	0	1	2	3
2	0.8	0.6	-0.5	0.1
0	0.4	-0.1	-0.3	0.0
1	0.2	0.6	-0.1	-0.7
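`sort_values` can also sort by several columns with a separate direction for each, and the placement of NaN is configurable. A minimal sketch on made-up data (the frame `df` and its column names are hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ['x', 'y', 'x', 'y'],
                   'score': [3.0, np.nan, 1.0, 2.0]})

# Sort by group ascending, then by score descending within each group
ordered = df.sort_values(by=['group', 'score'], ascending=[True, False])
print(ordered)

# NaN goes last by default; na_position='first' moves it to the front
nan_first = df['score'].sort_values(na_position='first')
print(nan_first)
```

The original row labels travel with the rows, so the index of the result records where each row came from.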

3.5 Statistical computation and description

3.5.1 Common statistical computations

In [54]:

df_obj = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])
df_obj

Out[54]:

	a	b	c	d
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

In [59]:

df_obj.sum()          # compute the sum of each column

Out[59]:

a    12
b    15
c    18
d    21
dtype: int64

In [60]:

df_obj.max()         # get the maximum of each column

Out[60]:

a     8
b     9
c    10
d    11
dtype: int32

In [61]:

df_obj.min(axis=1)   # work across the columns to get the minimum of each row; axis defaults to 0

Out[61]:

0    0
1    4
2    8
dtype: int32

3.5.2 Descriptive statistics (describe)

In [62]:

df_obj = pd.DataFrame([[12, 6, -11, 19],
                       [-1, 7, 50, 36],
                       [5, 9, 23, 28]])
df_obj

Out[62]:

	0	1	2	3
0	12	6	-11	19
1	-1	7	50	36
2	5	9	23	28

In [55]:

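df_obj.describe()  # summary statistics for each column

Out[55]:

	a	b	c	d
count	3.0	3.0	3.0	3.0
mean	4.0	5.0	6.0	7.0
std	4.0	4.0	4.0	4.0
min	0.0	1.0	2.0	3.0
25%	2.0	3.0	4.0	5.0
50%	4.0	5.0	6.0	7.0
75%	6.0	7.0	8.0	9.0
max	8.0	9.0	10.0	11.0

Note: this output describes the 3x4 DataFrame with columns a to d from In [54], not the DataFrame created in In [62]; the cell numbers suggest the notebook cells were run out of order.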
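Individual statistics can be read off the `describe()` result, or computed selectively with `agg()`. The sketch below uses the 3x4 frame with columns a to d; the names `summary`, `mean_a`, `std_a`, and `selected` are illustrative.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

summary = df_obj.describe()

# Individual statistics can be read off the summary by row label
mean_a = summary.loc['mean', 'a']
std_a = summary.loc['std', 'a']      # sample standard deviation (ddof=1)
print(mean_a, std_a)

# agg() computes only the statistics that are asked for
selected = df_obj.agg(['min', 'max', 'mean'])
print(selected)
```

The standard deviation reported by pandas divides by n - 1, so for the values 0, 4, 8 it is 4.0 rather than the population value of about 3.27.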

3.6 Hierarchical indexing

3.6.1 Understanding hierarchical indexes

In [56]:

import numpy as np
import pandas as pd
mulitindex_series = pd.Series([15848,13472,12073.8,7813,7446,6444,15230,8269],
                              index=[['河北省','河北省','河北省','河北省',
                                      '河南省','河南省','河南省','河南省'],
                                     ['石家庄市','唐山市','邯郸市','秦皇岛市',
                                      '郑州市','开封市','洛阳市','新乡市']])
mulitindex_series

Out[56]:

河北省  石家庄市    15848.0
     唐山市     13472.0
     邯郸市     12073.8
     秦皇岛市     7813.0
河南省  郑州市      7446.0
     开封市      6444.0
     洛阳市     15230.0
     新乡市      8269.0
dtype: float64

In [65]:

import pandas as pd
from pandas import DataFrame,Series
# '占地面积' (land area) is the column label being added
mulitindex_df = DataFrame({'占地面积':[15848,13472,12073.8,7813,
                                   7446,6444,15230,8269]},
                          index=[['河北省','河北省','河北省','河北省',
                                  '河南省','河南省','河南省','河南省'],
                                 ['石家庄市','唐山市','邯郸市','秦皇岛市',
                                  '郑州市','开封市','洛阳市','新乡市']])
mulitindex_df

Out[65]:

		占地面积
河北省	石家庄市	15848.0
	唐山市	13472.0
	邯郸市	12073.8
	秦皇岛市	7813.0
河南省	郑州市	7446.0
	开封市	6444.0
	洛阳市	15230.0
	新乡市	8269.0

In [57]:

from pandas import MultiIndex
# Create a list containing several tuples
list_tuples = [('A','A1'), ('A','A2'), ('B','B1'),
               ('B','B2'), ('B','B3')]
# Create a MultiIndex object from the list of tuples
multi_index = MultiIndex.from_tuples(tuples=list_tuples, 
                                     names=[ '外层索引', '内层索引'])
multi_index

Out[57]:

MultiIndex([('A', 'A1'),
            ('A', 'A2'),
            ('B', 'B1'),
            ('B', 'B2'),
            ('B', 'B3')],
           names=['外层索引', '内层索引'])

In [58]:

# Import the required package
import pandas as pd
values = [[1, 2, 3], [8, 5, 7], [4, 7, 7], [5, 5, 4], [4, 9, 9]]
df_indexs = pd.DataFrame(data=values, index=multi_index)
df_indexs

Out[58]:

		0	1	2
外层索引	内层索引
A	A1	1	2	3
	A2	8	5	7
B	B1	4	7	7
	B2	5	5	4
	B3	4	9	9

In [59]:

from pandas import MultiIndex
# Create a MultiIndex object from a list of arrays
multi_array = MultiIndex.from_arrays(arrays =[['A', 'B', 'A', 'B', 'B'], 
                                              ['A1', 'A2', 'B1', 'B2', 'B3']],
                                     names=['外层索引','内层索引'])
multi_array

Out[59]:

MultiIndex([('A', 'A1'),
            ('B', 'A2'),
            ('A', 'B1'),
            ('B', 'B2'),
            ('B', 'B3')],
           names=['外层索引', '内层索引'])

In [60]:

# Import the required packages
import pandas as pd
import numpy as np
values = np.array([[1, 2, 3], [8, 5, 7], [4, 7, 7],
                   [5, 5, 4], [4, 9, 9]])
df_array = pd.DataFrame(data=values, index=multi_array)
df_array

Out[60]:

		0	1	2
外层索引	内层索引
A	A1	1	2	3
B	A2	8	5	7
A	B1	4	7	7
B	B2	5	5	4
	B3	4	9	9

In [61]:

from pandas import MultiIndex
import pandas as pd
numbers = [0, 1, 2]
colors = ['green', 'purple']
multi_product = pd.MultiIndex.from_product([numbers, colors], 
                                           names=['number', 'color'])
multi_product

Out[61]:

MultiIndex([(0,  'green'),
            (0, 'purple'),
            (1,  'green'),
            (1, 'purple'),
            (2,  'green'),
            (2, 'purple')],
           names=['number', 'color'])

In [62]:

# Import the required package
import pandas as pd
# Use the variable values to hold the data for the DataFrame object
values = np.array([[7, 5], [6, 6], [3, 1], [5, 5], [4, 5], [5, 3]])
df_product = pd.DataFrame(data=values, index=multi_product)
df_product

Out[62]:

		0	1
number	color
0	green	7	5
	purple	6	6
1	green	3	1
	purple	5	5
2	green	4	5
	purple	5	3
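Once a MultiIndex is in place, rows can be selected by outer label, by a full tuple, or by inner label. A minimal sketch on the same `from_product` index and values (the names `outer`, `row`, and `greens` are illustrative):

```python
import numpy as np
import pandas as pd

multi_product = pd.MultiIndex.from_product([[0, 1, 2], ['green', 'purple']],
                                           names=['number', 'color'])
values = np.array([[7, 5], [6, 6], [3, 1], [5, 5], [4, 5], [5, 3]])
df_product = pd.DataFrame(values, index=multi_product)

# All rows whose outer label is 1
outer = df_product.loc[1]
print(outer)

# One exact row, addressed by a (number, color) tuple
row = df_product.loc[(2, 'purple')]
print(row)

# All rows whose inner label is 'green', across every outer label
greens = df_product.xs('green', level='color')
print(greens)
```

`xs` is used for the inner level because `loc` with a single label always matches against the outermost level first.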

3.6.2 Operating on hierarchical indexes

In [63]:

from pandas import Series, DataFrame
ser_obj = Series([50, 60, 40, 94, 63, 101, 200, 56, 45],
                 index=[['小说', '小说', '小说',
                         '散文随笔', '散文随笔', '散文随笔',
                         '传记', '传记', '传记'],
                        ['高山上的小邮局', '失踪的总统', '绿毛水怪',
                         '皮囊', '浮生六记', '自在独行',
                         '梅西', '老舍自传', '库里传']])
ser_obj

Out[63]:

小说    高山上的小邮局     50
      失踪的总统       60
      绿毛水怪        40
散文随笔  皮囊          94
      浮生六记        63
      自在独行       101
传记    梅西         200
      老舍自传        56
      库里传         45
dtype: int64

In [64]:

ser_obj['小说']     # get all data whose outer index label is '小说' (novels)

Out[64]:

高山上的小邮局    50
失踪的总统      60
绿毛水怪       40
dtype: int64

In [65]:

ser_obj[:,'自在独行']       # get the data for an inner index label

Out[65]:

散文随笔    101
dtype: int64

In [66]:

ser_obj.swaplevel()               # swap the positions of the outer and inner index levels

Out[66]:

高山上的小邮局  小说       50
失踪的总统    小说       60
绿毛水怪     小说       40
皮囊       散文随笔     94
浮生六记     散文随笔     63
自在独行     散文随笔    101
梅西       传记      200
老舍自传     传记       56
库里传      传记       45
dtype: int64

In [67]:

from pandas import DataFrame,Series
df_obj = DataFrame({'str':['a','b','d','e','f','k','d','s','l'],
                    'num':[1, 2, 4, 5, 3, 2, 6, 2, 3]},
                   index=[['A', 'A', 'A', 'C', 'C', 'C', 'B', 'B', 'B'],
                          [1, 3, 2, 3, 1, 2, 4, 5, 8]])
df_obj

Out[67]:

		str	num
A	1	a	1
	3	b	2
	2	d	4
C	3	e	5
	1	f	3
	2	k	2
B	4	d	6
	5	s	2
	8	l	3

In [68]:

df_obj.sort_index()         # sort by index

Out[68]:

		str	num
A	1	a	1
	2	d	4
	3	b	2
B	4	d	6
	5	s	2
	8	l	3
C	1	f	3
	2	k	2
	3	e	5
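By default `sort_index()` sorts a MultiIndex by the outer level first. A hedged sketch of two variations follows, using a shortened version of the frame above (only the `num` column and the A and C groups are kept for brevity; the names `by_inner` and `swapped` are illustrative).

```python
import pandas as pd

df_obj = pd.DataFrame({'num': [1, 2, 4, 5, 3, 2]},
                      index=[['A', 'A', 'A', 'C', 'C', 'C'],
                             [1, 3, 2, 3, 1, 2]])

# Sort by the inner level first; the outer level only breaks ties
by_inner = df_obj.sort_index(level=1)
print(by_inner)

# Swap the two levels, then sort so the former inner level leads
swapped = df_obj.swaplevel().sort_index()
print(swapped)
```

Both results list the rows in the same order; the difference is only in which level is displayed on the outside.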

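3.7 Reading and writing data

3.7.1 Reading and writing text files

to_csv(path_or_buf=None,sep=',',na_rep='',float_format=None,columns=None,header=True,index=True,index_label=None,mode='w', ...)  # write to a CSV file

path_or_buf: the file path
sep: the separator; defaults to a comma

index: defaults to True; if False, the index is not written

read_csv(filepath_or_buffer,sep=',',delimiter=None,header='infer',index_col=None,usecols=None,prefix=None, ...)  # read a CSV file

header: the row number(s) to use as the column names
names: a list of column names to use for the result

Note: TXT files can also be read with read_csv() or with pandas' read_table(). The main difference between the two is the default separator: the former uses a comma, the latter uses \t.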

In [70]:

import pandas as pd
df = pd.DataFrame({'one_name':[1,2,3], 'two_name':[4,5,6]})
df

Out[70]:

	one_name	two_name
0	1	4
1	2	5
2	3	6

In [71]:

# Write the df object to a CSV file
df.to_csv(r'itc.csv',index=False)
'Write complete'

Out[71]:

'Write complete'

In [73]:

# import pandas as pd
# Read the CSV file at the given path (passing the path lets pandas open and close the file itself)
file_data = pd.read_csv(r"itc.csv")
file_data

Out[73]:

	one_name	two_name
0	1	4
1	2	5
2	3	6

In [74]:

import pandas as pd
data = pd.read_table(r'itcast.txt', encoding='utf-8')
data

Out[74]:

	AAA
0	不不不
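The separator difference between `read_csv` and `read_table` can be checked without touching the file system. The sketch below is an illustration that writes to an in-memory `io.StringIO` buffer instead of a file; the names `buffer`, `text`, `from_csv`, and `from_table` are made up for the example.

```python
import io
import pandas as pd

df = pd.DataFrame({'one_name': [1, 2, 3], 'two_name': [4, 5, 6]})

# Write tab-separated text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, sep='\t', index=False)
text = buffer.getvalue()

# read_csv needs sep='\t' for this text; read_table uses a tab by default
from_csv = pd.read_csv(io.StringIO(text), sep='\t')
from_table = pd.read_table(io.StringIO(text))

print(from_csv.equals(from_table))
```

With `index=False` the row labels are not written, so reading the text back yields the default 0, 1, 2 index again.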

3.7.2 Reading and writing Excel files

In [75]:

import pandas as pd
df1 = pd.DataFrame({'col1': ['传', '智'], 'col2': ['播', '客']})
df1.to_excel(r'itcast.xlsx', sheet_name='python基础班')  # pass the sheet name by keyword
'Write complete'

Out[75]:

'Write complete'

3.7.3 Reading HTML table data

In [81]:

import pandas as pd
import requests
html_data = requests.get('http://kaoshi.edu.sina.com.cn/college/majorlist/')
# html_data = requests.get('http://kaoshi.edu.sina.com.cn/college/m/')  # uses the requests library
html_table_data = pd.read_html(html_data.content,encoding='utf-8')
html_table_data[1]

Out[81]:

	0	1	2	3	4
0	专业名称	专业代码	专业大类	专业小类	操作
1	哲学类	0101	哲学	哲学类	开设院校加入对比
2	哲学	010101	哲学	哲学类	开设院校加入对比
3	逻辑学	010102	哲学	哲学类	开设院校加入对比
4	宗教学	010103	哲学	哲学类	开设院校加入对比
5	伦理学	010104	哲学	哲学类	开设院校加入对比
6	经济学类	0201	经济学	经济学类	开设院校加入对比
7	经济学	020101	经济学	经济学类	开设院校加入对比
8	经济统计学	020102	经济学	经济学类	开设院校加入对比
9	国民经济管理	020103	经济学	经济学类	开设院校加入对比
10	资源与环境经济学	020104	经济学	经济学类	开设院校加入对比
11	商务经济学	020105	经济学	经济学类	开设院校加入对比
12	能源经济	020106	经济学	经济学类	开设院校加入对比
13	劳动经济学	020107	经济学	经济学类	开设院校加入对比
14	经济工程	020108	经济学	经济学类	开设院校加入对比
15	数字经济	020109	经济学	经济学类	开设院校加入对比
16	财政学类	0202	经济学	财政学类	开设院校加入对比
17	财政学	020201	经济学	财政学类	开设院校加入对比
18	税收学	020202	经济学	财政学类	开设院校加入对比
19	金融学类	0203	经济学	金融学类	开设院校加入对比
20	金融学	020301	经济学	金融学类	开设院校加入对比

3.7.4 Reading from and writing to databases

In [83]:

import pandas as pd
from pandas import DataFrame
from sqlalchemy import create_engine
'''  MySQL account: root  password: 123456  database name: info
table name: person_info
engine = create_engine('mysql+mysqlconnector://root:123456@127.0.0.1/info')
pd.read_sql('person_info',engine)
'''

MySQL account: root

Password: 123456

Database name: info

Table name: person_info

engine = create_engine('mysql+mysqlconnector://root:123456@127.0.0.1/info')

pd.read_sql('person_info',engine)

In [84]:

engine = create_engine('mssql+pymssql://teboho:teboho@127.0.0.1:1433/Teboho')
connection = engine.raw_connection()

In [85]:

sql = 'select * from C_test'
pd.read_sql(sql,con = connection )

Out[85]:

	id	name	sex
0	1002	si
1	1001	si

In [86]:

engine = create_engine('mssql+pymssql://db2018:db2018@210.44.125.12:1433/DB2018')
connection = engine.raw_connection()

In [88]:

sql = 'select * from course'
pd.read_sql(sql,con = connection )

Out[88]:

	course_id	title	dept_name	credits
0	BIO-101	Intro. to Biology	Biology	4.0
1	BIO-301	Genetics	Biology	4.0
2	BIO-399	Computational Biology	Biology	3.0
3	CS-101	Intro. to Computer Science	Comp. Sci.	4.0
4	CS-190	Game Design	Comp. Sci.	4.0
5	CS-315	Robotics	Comp. Sci.	3.0
6	CS-319	Image Processing	Comp. Sci.	3.0
7	CS-347	Database System Concepts	Comp. Sci.	3.0
8	EE-181	Intro. to Digital Systems	Elec. Eng.	3.0
9	FIN-201	Investment Banking	Finance	3.0
10	HIS-351	World History	History	3.0
11	MU-199	Music Video Production	Music	3.0
12	PHY-101	Physical Principles	Physics	4.0

In [89]:

import pymssql

In [90]:

connect = pymssql.connect('localhost', 'teboho', 'teboho', 'Teboho',charset='utf8') # server name, account, password, database name
if connect:
    print('success')
success

In [38]:

cursor = connect.cursor()   # create a cursor object; SQL statements in Python are executed through a cursor
cursor.execute("create table C_test(id varchar(20), name varchar(20), sex varchar(4))")   # execute the SQL statement
connect.commit()  # commit
cursor.close()   # close the cursor

In [51]:

cursor = connect.cursor()   # create a cursor object; SQL statements in Python are executed through a cursor
sql = "insert into C_test (id, name, sex)values(1001, '张si', '女')".encode('utf-8')
cursor.execute(sql)   # execute the SQL statement
connect.commit()  # commit
cursor.close()

In [52]:

cursor = connect.cursor()   # create a cursor object; SQL statements in Python are executed through a cursor
sql = "select name, sex from C_test".encode('utf-8')
cursor.execute(sql)   # execute the SQL statement
row = cursor.fetchone()  # read the first row of the query result
while row:              # loop until all rows have been read
    print("Name=%s, Sex=%s" % (row[0],row[1]))   # print the row
    row = cursor.fetchone()
cursor.close()   
connect.close()
Name=si, Sex=
Name=si, Sex=

In [86]:

import pandas as pd
from pandas import DataFrame,Series
from sqlalchemy import create_engine
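# MySQL account: root  password: 123456  database name: info
# Table name: person_info
# Create the database engine
# mysql+mysqlconnector means the MySQL database is accessed through the mysql-connector driver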
engine = create_engine('mysql+mysqlconnector://root:123456@127.0.0.1/info')
sql = 'select * from person_info where id >3;'
pd.read_sql(sql,engine)

Out[86]:

	id	name	age	height	gender
0	4	刘华	59	175	男
1	5	王贤	18	172	女
2	6	周平	36	None	男
3	7	程坤	27	181	男
4	8	李平	38	160	女

In [87]:

from pandas import DataFrame,Series
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import *
df = DataFrame({"班级":["一年级","二年级","三年级","四年级"],
                              "男生人数":[25,23,27,30],
                              "女生人数":[19,17,20,20]})
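# Create the database engine
# mysql+mysqlconnector means the MySQL database is accessed through the mysql-connector driver
# Account: root  password: 123456  database name: students_info
# Table name: students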
engine=create_engine('mysql+mysqlconnector://root:123456@127.0.0.1/students_info')
df.to_sql('students',engine)
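The MySQL examples above need a running server. As a self-contained illustration of the same `to_sql` / `read_sql` round trip, the sketch below swaps in an in-memory SQLite database from Python's standard `sqlite3` module, which pandas accepts directly as a connection. This is a substitute for demonstration, not the setup used in the original notebook, and the column names are English stand-ins.

```python
import sqlite3
import pandas as pd

df = pd.DataFrame({'grade': ['first', 'second', 'third'],
                   'boys': [25, 23, 27],
                   'girls': [19, 17, 20]})

# An in-memory SQLite database; nothing is written to disk
conn = sqlite3.connect(':memory:')

# Write the DataFrame to a table named 'students' without the row index
df.to_sql('students', conn, index=False, if_exists='replace')

# Read back only the rows that match a SQL condition
result = pd.read_sql('select grade, boys from students where boys > 24', conn)
conn.close()

print(result)
```

`if_exists='replace'` makes the cell safe to re-run; the default, `'fail'`, raises an error when the table already exists.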

Case study: reading and analyzing the Beijing gaokao (college entrance exam) score line table for 2006~2018

In [91]:

import pandas as pd
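# Specify the path of the file
file_path = 'scores.xlsx'
# Use the first two rows as a two-level column header
# Column labels in the file: 一本分数线 (first-tier score line), 二本分数线 (second-tier score line), 文科 (liberal arts), 理科 (science)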
df_obj = pd.read_excel(file_path, header=[0, 1],engine = 'openpyxl')
df_obj

Out[91]:

	Unnamed: 0_level_0	一本分数线	二本分数线
	Unnamed: 0_level_1	文科	理科	文科	理科
0	2018	576	532	488	432
1	2017	555	537	468	439
2	2016	583	548	532	494
3	2015	579	548	527	495
4	2014	565	543	507	495
5	2013	549	550	494	505
6	2012	495	477	446	433
7	2011	524	484	481	435
8	2010	524	494	474	441
9	2009	532	501	489	459
10	2008	515	502	472	455
11	2007	528	531	489	478
12	2006	516	528	476	476

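engine='openpyxl' is passed above because xlrd was recently updated to version 2.0.1, which only supports .xls files, so pandas.read_excel('xxx.xlsx') raises an error when it relies on xlrd.

One option is to install an older version of xlrd by running the following in cmd:

pip uninstall xlrd
pip install xlrd==1.2.0

Alternatively, use openpyxl instead of xlrd to open .xlsx files:

df=pandas.read_excel('data.xlsx',engine='openpyxl')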

In [58]:

sorted_obj = df_obj.sort_index(ascending = False)
sorted_obj

Out[58]:

	Unnamed: 0_level_0	一本分数线	二本分数线
	Unnamed: 0_level_1	文科	理科	文科	理科
12	2006	516	528	476	476
11	2007	528	531	489	478
10	2008	515	502	472	455
9	2009	532	501	489	459
8	2010	524	494	474	441
7	2011	524	484	481	435
6	2012	495	477	446	433
5	2013	549	550	494	505
4	2014	565	543	507	495
3	2015	579	548	527	495
2	2016	583	548	532	494
1	2017	555	537	468	439
0	2018	576	532	488	432

In [90]:

sorted_obj.max()

Out[90]:

一本分数线  文科    583
       理科    550
二本分数线  文科    532
       理科    505
dtype: int64

In [91]:

sorted_obj.min()

Out[91]:

一本分数线  文科    495
       理科    477
二本分数线  文科    446
       理科    432
dtype: int64

In [92]:

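# Score range for first-tier liberal arts; Series.ptp() was removed in pandas 1.0, so use max() - min()
result1 = sorted_obj["一本分数线", "文科"].max() - sorted_obj["一本分数线", "文科"].min()
result1

Out[92]:

88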

In [93]:

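# Score range for first-tier science; Series.ptp() was removed in pandas 1.0, so use max() - min()
result2 = sorted_obj["一本分数线", "理科"].max() - sorted_obj["一本分数线", "理科"].min()
result2

Out[93]:

73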

In [94]:

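# Score range for second-tier liberal arts; Series.ptp() was removed in pandas 1.0, so use max() - min()
result3 = sorted_obj["二本分数线", "文科"].max() - sorted_obj["二本分数线", "文科"].min()
result3

Out[94]:

86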

In [95]:

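# Score range for second-tier science; Series.ptp() was removed in pandas 1.0, so use max() - min()
result4 = sorted_obj["二本分数线", "理科"].max() - sorted_obj["二本分数线", "理科"].min()
result4

Out[95]:

73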

In [96]:

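# Looking up by year label requires the year to be the row index (for example, by reading the file with index_col=0)
ser_obj1 = sorted_obj['一本分数线','文科']
ser_obj1[2018] - ser_obj1[2017]

Out[96]:

21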

In [97]:

ser_obj2 = sorted_obj['一本分数线','理科']
ser_obj2[2018] - ser_obj2[2017]

Out[97]:

-5

In [98]:

ser_obj3 = sorted_obj['二本分数线','文科']
ser_obj3[2018] - ser_obj3[2017]

Out[98]:

20

In [99]:

ser_obj4 = sorted_obj['二本分数线','理科']
ser_obj4[2018] - ser_obj4[2017]

Out[99]:

-7

In [100]:

sorted_obj.describe()

Out[100]:

	一本分数线	二本分数线
	文科	理科	文科	理科
count	13.000000	13.000000	13.000000	13.000000
mean	541.615385	521.153846	487.923077	464.384615
std	28.150010	25.986683	23.567144	27.274953
min	495.000000	477.000000	446.000000	432.000000
25%	524.000000	501.000000	474.000000	439.000000
50%	532.000000	531.000000	488.000000	459.000000
75%	565.000000	543.000000	494.000000	494.000000
max	583.000000	550.000000	532.000000	505.000000
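The year-over-year differences and the score ranges computed cell by cell above can also be obtained for all four columns at once. The sketch below is an illustration on two rows copied from the score table (2017 and 2018), indexed by year; the names `scores`, `yearly_change`, `change_2018`, and `score_range` are made up for the example.

```python
import pandas as pd

# Two rows copied from the score table above, indexed by year;
# tuple keys produce a two-level column header
scores = pd.DataFrame(
    {('一本分数线', '文科'): [555, 576], ('一本分数线', '理科'): [537, 532],
     ('二本分数线', '文科'): [468, 488], ('二本分数线', '理科'): [439, 432]},
    index=[2017, 2018])

# Row-wise difference: each year minus the previous year
yearly_change = scores.diff()
change_2018 = yearly_change.loc[2018]
print(change_2018)

# Range (max minus min) per column, a replacement for the removed Series.ptp()
score_range = scores.max() - scores.min()
print(score_range)
```

With only two rows the range equals the absolute yearly change; on the full 13-year table the two quantities differ.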

Original article: https://blog.csdn.net/weixin_45506188/article/details/121481443
