Scipy库的简介与插值

漫谈大数据与数据分析 2020-04-28

765

本文包括：

SCIentific PYthon 简介插值插值示例线性插值多项式插值径向基函数径向基函数插值高维 `RBF` 插值

SCIentific PYthon 简介

Ipython
提供了一个很好的解释器界面。

Matplotlib
提供了一个类似 Matlab
的画图工具。

Numpy
提供了 ndarray
对象，可以进行快速的向量化计算。

Scipy
是 Python
中进行科学计算的一个第三方库，以 Numpy
为基础。

Pandas
是处理时间序列数据的第三方库，提供一个类似 R
语言的环境。

StatsModels
是一个统计库，着重于统计模型。

Scikits
以 Scipy
为基础，提供如 scikits-learn
机器学习和scikits-image
图像处理等高级用法。

Scipy
由不同科学计算领域的子模块组成：

子模块	描述
`cluster`	聚类算法
`constants`	物理数学常数
`fftpack`	快速傅里叶变换
`integrate`	积分和常微分方程求解
`interpolate`	插值
`io`	输入输出
`linalg`	线性代数
`odr`	正交距离回归
`optimize`	优化和求根
`signal`	信号处理
`sparse`	稀疏矩阵
`spatial`	空间数据结构和算法
`special`	特殊方程
`stats`	统计分布和函数
`weave`	C/C++ 积分

在使用 Scipy
之前，为了方便，假定这些基础的模块已经被导入：

1import numpy as np
2import scipy as sp
3import matplotlib as mpl
4import matplotlib.pyplot as plt

使用 Scipy 中的子模块时，需要分别导入：

1from scipy import linalg, optimize

对于一些常用的函数，这些在子模块中的函数可以在 scipy
命名空间中调用。另一方面，由于 Scipy
以 Numpy
为基础，因此很多基础的 Numpy
函数可以在scipy
命名空间中直接调用。

我们可以使用 numpy
中的 info
函数来查看函数的文档：

1np.info(optimize.fmin)

  1 fmin(func, x0, args=(), xtol=0.0001, ftol=0.0001, maxiter=None, maxfun=None,
  2      full_output=0, disp=1, retall=0, callback=None, initial_simplex=None)
  3
  4Minimize a function using the downhill simplex algorithm.
  5
  6This algorithm only uses function values, not derivatives or second
  7derivatives.
  8
  9Parameters
 10----------
 11func : callable func(x,*args)
 12    The objective function to be minimized.
 13x0 : ndarray
 14    Initial guess.
 15args : tuple, optional
 16    Extra arguments passed to func, i.e. ``f(x,*args)``.
 17xtol : float, optional
 18    Absolute error in xopt between iterations that is acceptable for
 19    convergence.
 20ftol : number, optional
 21    Absolute error in func(xopt) between iterations that is acceptable for
 22    convergence.
 23maxiter : int, optional
 24    Maximum number of iterations to perform.
 25maxfun : number, optional
 26    Maximum number of function evaluations to make.
 27full_output : bool, optional
 28    Set to True if fopt and warnflag outputs are desired.
 29disp : bool, optional
 30    Set to True to print convergence messages.
 31retall : bool, optional
 32    Set to True to return list of solutions at each iteration.
 33callback : callable, optional
 34    Called after each iteration, as callback(xk), where xk is the
 35    current parameter vector.
 36initial_simplex : array_like of shape (N + 1, N), optional
 37    Initial simplex. If given, overrides `x0`.
 38    ``initial_simplex[j,:]`` should contain the coordinates of
 39    the j-th vertex of the ``N+1`` vertices in the simplex, where
 40    ``N`` is the dimension.
 41
 42Returns
 43-------
 44xopt : ndarray
 45    Parameter that minimizes function.
 46fopt : float
 47    Value of function at minimum: ``fopt = func(xopt)``.
 48iter : int
 49    Number of iterations performed.
 50funcalls : int
 51    Number of function calls made.
 52warnflag : int
 53    1 : Maximum number of function evaluations made.
 54    2 : Maximum number of iterations reached.
 55allvecs : list
 56    Solution at each iteration.
 57
 58See also
 59--------
 60minimize: Interface to minimization algorithms for multivariate
 61    functions. See the 'Nelder-Mead' `method` in particular.
 62
 63Notes
 64-----
 65Uses a Nelder-Mead simplex algorithm to find the minimum of function of
 66one or more variables.
 67
 68This algorithm has a long history of successful use in applications.
 69But it will usually be slower than an algorithm that uses first or
 70second derivative information. In practice it can have poor
 71performance in high-dimensional problems and is not robust to
 72minimizing complicated functions. Additionally, there currently is no
 73complete theory describing when the algorithm will successfully
 74converge to the minimum, or how fast it will if it does. Both the ftol and
 75xtol criteria must be met for convergence.
 76
 77Examples
 78--------
 79>>> def f(x):
 80...     return x**2
 81
 82>>> from scipy import optimize
 83
 84>>> minimum = optimize.fmin(f, 1)
 85Optimization terminated successfully.
 86         Current function value: 0.000000
 87         Iterations: 17
 88         Function evaluations: 34
 89>>> minimum[0]
 90-8.8817841970012523e-16
 91
 92References
 93----------
 94.. [1] Nelder, J.A. and Mead, R. (1965), "A simplex method for function
 95       minimization", The Computer Journal, 7, pp. 308-313
 96
 97.. [2] Wright, M.H. (1996), "Direct Search Methods: Once Scorned, Now
 98       Respectable", in Numerical Analysis 1995, Proceedings of the
 99       1995 Dundee Biennial Conference in Numerical Analysis, D.F.
100       Griffiths and G.A. Watson (Eds.), Addison Wesley Longman,
101       Harlow, UK, pp. 191-208.

可以用 lookfor
来查询特定关键词相关的函数：

1np.lookfor("resize array")

 1Search results for 'resize array'
 2---------------------------------
 3numpy.chararray.resize
 4    Change shape and size of array in-place.
 5numpy.ma.resize
 6    Return a new masked array with the specified size and shape.
 7numpy.resize
 8    Return a new array with the specified shape.
 9numpy.chararray
10    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
11numpy.memmap
12    Create a memory-map to an array stored in a *binary* file on disk.
13numpy.squeeze
14    Remove single-dimensional entries from the shape of an array.
15numpy.expand_dims
16    Expand the shape of an array.
17numpy.ma.MaskedArray.resize
18    .. warning::

还可以指定查找的模块：

1np.lookfor("remove path", module="os")

1Search results for 'remove path'
2--------------------------------
3os.removedirs
4    removedirs(name)
5os.walk
6    Directory tree generator.

插值

1import numpy as np
2import matplotlib.pyplot as plt
3%matplotlib inline

设置 Numpy
浮点数显示格式：

1np.set_printoptions(precision=2, suppress=True)

从文本中读入数据，数据来自 http://kinetics.nist.gov/janaf/html/C-067.txt ，保存为结构体数组：

1data = np.genfromtxt("JANAF_CH4.txt", 
2                  delimiter="\t", # TAB 分隔
3                  skip_header=1,     # 忽略首行
4                  names=True,     # 读入属性
5                  missing_values="INFINITE",  # 缺失值
6                  filling_values=np.inf)      # 填充缺失值

显示部分数据：

1for row in data[:7]:
2    print(f"{row['TK']}\t{row['Cp']}")
3print("...\t...")

10.0    0.0
2100.0    33.258
3200.0    33.473
4250.0    34.216
5298.15    35.639
6300.0    35.708
7350.0    37.874
8...    ...

绘图：

1p = plt.plot(data['TK'], data['Cp'], 'kx')
2t = plt.title("JANAF data for Methane $CH_4$")
3a = plt.axis([0, 6000, 30, 120])
4x = plt.xlabel("Temperature (K)")
5y = plt.ylabel(r"$C_p$ ($\frac{kJ}{kg K}$)")

插值示例

假设我们要对这组数据进行插值。

先导入一维插值函数 interp1d
：

1interp1d(x, y)

1from scipy.interpolate import interp1d

1ch4_cp = interp1d(data['TK'], data['Cp'])

interp1d
的返回值可以像函数一样接受输入，并返回插值的结果。

单个输入值，注意返回的是数组：

1ch4_cp(382.2)

1array(39.57)

输入数组，返回的是对应的数组：

1ch4_cp([32.2,323.2])

1array([10.71, 36.71])

默认情况下，输入值要在插值允许的范围内，否则插值会报错：

1ch4_cp(8752)

 1---------------------------------------------------------------------------
 2
 3ValueError                                Traceback (most recent call last)
 4
 5<ipython-input-10-737d7cfff1c9> in <module>()
 6----> 1 ch4_cp(8752)
 7
 8
 9D:\Anaconda3\lib\site-packages\scipy\interpolate\polyint.py in __call__(self, x)
10     77         """
11     78         x, x_shape = self._prepare_x(x)
12---> 79         y = self._evaluate(x)
13     80         return self._finish_y(y, x_shape)
14     81 
15
16
17D:\Anaconda3\lib\site-packages\scipy\interpolate\interpolate.py in _evaluate(self, x_new)
18    662         y_new = self._call(self, x_new)
19    663         if not self._extrapolate:
20--> 664             below_bounds, above_bounds = self._check_bounds(x_new)
21    665             if len(y_new) > 0:
22    666                 # Note fill_value must be broadcast up to the proper size
23
24
25D:\Anaconda3\lib\site-packages\scipy\interpolate\interpolate.py in _check_bounds(self, x_new)
26    694                              "range.")
27    695         if self.bounds_error and above_bounds.any():
28--> 696             raise ValueError("A value in x_new is above the interpolation "
29    697                              "range.")
30    698 
31
32
33ValueError: A value in x_new is above the interpolation range.
34

但我们可以通过参数设置允许超出范围的值存在,不过由于超出范围，所以插值的输出是非法值：

1ch4_cp = interp1d(data['TK'], data['Cp'], bounds_error=False)
2ch4_cp(8752)

1array(nan)

可以使用指定值替代这些非法值：

1ch4_cp = interp1d(data['TK'],
2                  data['Cp'],
3                  bounds_error=False,
4                  fill_value=-999.25)
5ch4_cp(8752)

1array(-999.25)

线性插值

interp1d
默认的插值方法是线性，关于线性插值的定义，请参见：

维基百科-线性插值：https://zh.wikipedia.org/wiki/%E7%BA%BF%E6%80%A7%E6%8F%92%E5%80%BC
百度百科-线性插值：http://baike.baidu.com/view/4685624.htm

其基本思想是，已知相邻两点对应的值，那么对于之间的某一点，线性插值对应的值满足：点在所形成的线段上。

应用线性插值：

1T = np.arange(100,355,5)
2plt.plot(T, ch4_cp(T), "+k")
3p = plt.plot(data['TK'][1:7], data['Cp'][1:7], 'ro', markersize=8)

其中红色的圆点为原来的数据点，黑色的十字点为对应的插值点，可以明显看到，相邻的数据点的插值在一条直线上。

多项式插值

我们可以通过 kind
参数来调节使用的插值方法，来得到不同的结果：

nearest
最近邻插值
zero
0阶插值
linear
线性插值
quadratic
二次插值
cubic
三次插值
4,5,6,7
更高阶插值

径向基函数

关于径向基函数，可以参阅：

维基百科-Radial basis fucntion：https://en.wikipedia.org/wiki/Radial_basis_function

径向基函数，简单来说就是点处的函数值只依赖于与某点的距离：

1x = np.linspace(-3,3,100)

常用的径向基（RBF
）函数有：

高斯函数：

1plt.plot(x, np.exp(-1 * x **2))
2t = plt.title("Gaussian")

Multiquadric
函数：

1plt.plot(x, np.sqrt(1 + x **2))
2t = plt.title("Multiquadric")

Inverse Multiquadric
函数：

1plt.plot(x, 1. / np.sqrt(1 + x **2))
2t = plt.title("Inverse Multiquadric")

径向基函数插值

对于径向基函数，其插值的公式为：

我们通过数据点来计算出的值，来计算处的插值结果。

1from scipy.interpolate.rbf import Rbf

使用 multiquadric
核的：

1cp_rbf = Rbf(data['TK'], data['Cp'], function = "multiquadric")
2plt.plot(data['TK'], data['Cp'], 'k+')
3p = plt.plot(data['TK'], cp_rbf(data['TK']), 'r-')

使用 gaussian
核：

1cp_rbf = Rbf(data['TK'], data['Cp'], function = "gaussian")
2plt.plot(data['TK'], data['Cp'], 'k+')
3p = plt.plot(data['TK'], cp_rbf(data['TK']), 'r-')

使用 nverse_multiquadric
核：

1cp_rbf = Rbf(data['TK'], data['Cp'], function = "inverse_multiquadric")
2plt.plot(data['TK'], data['Cp'], 'k+')
3p = plt.plot(data['TK'], cp_rbf(data['TK']), 'r-')

不同的 RBF
核的结果也不同。

高维 `RBF` 插值

1from mpl_toolkits.mplot3d import Axes3D

三维数据点：

1x, y = np.mgrid[-np.pi/2:np.pi/2:5j, -np.pi/2:np.pi/2:5j]
2z = np.cos(np.sqrt(x**2 + y**2))

1fig = plt.figure(figsize=(12,6))
2ax = fig.gca(projection="3d")
3ax.scatter(x,y,z)

1<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x1a03c1d0>

3维 RBF
插值：

1zz = Rbf(x, y, z)

1xx, yy = np.mgrid[-np.pi/2:np.pi/2:50j, -np.pi/2:np.pi/2:50j]
2fig = plt.figure(figsize=(12,6))
3ax = fig.gca(projection="3d")
4ax.plot_surface(xx,yy,zz(xx,yy),rstride=1, cstride=1, cmap=plt.cm.jet)

1<mpl_toolkits.mplot3d.art3d.Poly3DCollection at 0x1a20a828>

数据库

文章转载自漫谈大数据与数据分析，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。