Mastering Python Scientific Computing

Python scientific computing

Python's support for scientific computing consists of a number of packages and APIs, each covering a different area of functionality. Most categories offer several options, one of which is usually the most popular choice. The following are examples of the options available for scientific computing in Python:

  • Chart plotting: At present, the most popular two-dimensional chart plotting package is matplotlib. There are several other plotting packages, such as Visvis, Plotly, HippoDraw, Chaco, MayaVI, Biggles, Pychart, and Bokeh. There are some packages that are built on top of matplotlib to provide enhanced functionality, such as Seaborn and Prettyplotlib.
  • Optimization: The SciPy stack has an optimization package. The other choices for the optimization functionality are OpenOpt and CVXOpt.
  • Advanced data analysis: Python supports integration with the R statistical package for advanced data analysis using RPy or the RSPlus-Python interface. For data analysis in Python itself, there is a dedicated library called pandas.
  • Database: PyTables is a package for managing hierarchical databases. This package is developed on top of HDF5 and is designed to efficiently process large datasets.
  • Interactive command shell: IPython is a Python package that supports interactive programming.
  • Symbolic computing: Python has packages such as SymPy and PyDSTool for supporting symbolic computing. Later in this chapter, we are going to cover the idea of symbolic computing.
  • Specialized extensions: SciKits provides special-purpose add-ons for SciPy, NumPy, and Python. The following is a select list of SciKits packages:
    • scikit-aero: Aeronautical engineering calculations in Python
    • scikit-bio: Data structures, algorithms, and educational resources for bioinformatics
    • scikit-commpy: Digital communication algorithms with Python
    • scikit-image: Image processing routines for SciPy
    • scikit-learn: A set of Python modules for machine learning and data mining
    • scikit-monaco: Python modules for Monte Carlo integration
    • scikit-spectra: Spectroscopy in Python built on pandas
    • scikit-tensor: A Python module for multilinear algebra and tensor factorizations
    • scikit-tracker: Object detection and tracking for cell biology
    • scikit-xray: Data analysis tools for X-ray science
    • bvp_solver: A Python package for solving two-point boundary value problems
    • datasmooth: The Scikits data smoothing package
    • optimization: A Python module for numerical optimization
    • statsmodels: Statistical computations and models for use with SciPy
  • Third-party or non-scikit packages/applications/tools: There are a number of projects that have developed packages/tools for specific fields of science, such as astronomy, astrophysics, bioinformatics, geosciences, and many more. The following are some selected third-party packages/tools in Python for specific scientific fields:
    • Astropy: A community-driven Python package used to support astronomy and astrophysics computations
    • Astroquery: This package is a collection of tools used to access online astronomy data
    • BioPython: This is a collection of toolkits used to perform biological computations in Python
    • HTSeq: This package supports the analysis of high-throughput sequencing data in Python
    • Pygr: This is the toolkit for sequence and comparative genomic analysis in Python
    • TAMO: This is a Python application used to analyze transcriptional regulation using DNA sequence motifs
    • EarthPy: This is a collection of IPython notebooks that have examples from the earth science domain
    • Pyearthquake: A Python package for earthquake and MODIS analysis
    • MSNoise: This is a Python package for monitoring seismic velocity change using ambient seismic noise
    • AtmosphericChemistry: This tool supports exploration, construction, and conversion of atmospheric chemistry mechanisms
    • Chemlab: This package is a complete library used to perform computations related to chemistry
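As a small taste of the symbolic computing category listed above, the following is a minimal sketch using SymPy. The expressions and variable names are illustrative choices, not examples taken from the book; the point is only that results come back as exact expressions rather than floating-point numbers.

```python
import sympy as sp

# Declare a symbolic variable; x has no numeric value.
x = sp.symbols("x")

# Differentiate x^2 * sin(x) symbolically (product rule applied by SymPy).
derivative = sp.diff(x**2 * sp.sin(x), x)

# Integrate x * cos(x) symbolically (an antiderivative, no constant term).
antiderivative = sp.integrate(x * sp.cos(x), x)

# Solve the quadratic x^2 - 5x + 6 = 0 exactly.
roots = sp.solve(x**2 - 5 * x + 6, x)

print(derivative)
print(antiderivative)
print(roots)  # [2, 3]
```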

Introduction to NumPy

NumPy is a Python extension that adds support for large, multidimensional arrays and matrices, along with a library of mathematical functions for manipulating them. Following the success of the core NumPy implementation, a number of APIs and tools were built around it, including matplotlib, pandas, SciPy, and SymPy. Let's take a brief look at the functionality of some of these companion libraries.
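To make the idea of multidimensional arrays and array-wide mathematical functions concrete, here is a short sketch. The array contents and variable names are made up for illustration.

```python
import numpy as np

# Build a 2 x 3 array holding the integers 0..5.
a = np.arange(6).reshape(2, 3)

# Arithmetic and mathematical functions apply element by element,
# without an explicit Python loop.
scaled = a * 2.0
square_roots = np.sqrt(a)

# Reductions can run along a chosen axis: axis=0 sums down each column.
col_sums = a.sum(axis=0)

# The @ operator performs matrix multiplication (2x3 times 3x2 gives 2x2).
product = a @ a.T

print(a.shape)    # (2, 3)
print(col_sums)   # [3 5 7]
print(product)    # [[ 5 14]
                  #  [14 50]]
```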

The SciPy library

SciPy is a Python library designed for scientists and engineers who need to perform scientific computing tasks. It provides functionality for optimization, linear algebra, calculus, interpolation, image processing, fast Fourier transforms, signal processing, and special functions. It also solves ordinary differential equations (ODEs) and performs other tasks required in science and engineering. It is built on top of the NumPy array object and is an essential component of the NumPy stack, which is why the terms NumPy stack and SciPy stack are sometimes used interchangeably.
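As an illustration of the ODE-solving capability mentioned above, the following sketch integrates the simple decay equation dy/dt = -2y with y(0) = 1 and compares the result with the known exact solution exp(-2t). The equation, tolerances, and names are assumptions chosen for the example; it uses `scipy.integrate.solve_ivp`, which is one of several ODE interfaces SciPy offers.

```python
import numpy as np
from scipy.integrate import solve_ivp


def decay(t, y):
    """Right-hand side of dy/dt = -2 * y."""
    return -2.0 * y


# Integrate from t=0 to t=1, reporting the solution at five evenly spaced times.
times = np.linspace(0.0, 1.0, 5)
solution = solve_ivp(decay, (0.0, 1.0), [1.0], t_eval=times,
                     rtol=1e-8, atol=1e-10)

# Compare the numerical result with the analytic solution exp(-2t).
exact = np.exp(-2.0 * solution.t)
max_error = np.max(np.abs(solution.y[0] - exact))
print(solution.success, max_error)
```

Note that the state `y` is passed to the function as a NumPy array, which is how SciPy builds on the NumPy array object.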

The SciPy subpackages

The various subpackages of SciPy include the following:

  • constants: Physical constants and conversion factors
  • cluster: Hierarchical clustering, vector quantization, and K-means
  • fftpack: Discrete Fourier transform algorithms
  • integrate: Numerical integration routines
  • interpolate: Interpolation tools
  • io: Data input and output
  • lib: Python wrappers to external libraries
  • linalg: Linear algebra routines
  • misc: Miscellaneous utilities (for example, image reading and writing)
  • ndimage: Various functions for multidimensional image processing
  • optimize: Optimization algorithms, including linear programming
  • signal: Signal processing tools
  • sparse: Sparse matrices and related algorithms
  • spatial: KD-trees, nearest neighbors, and distance functions
  • special: Special functions
  • stats: Statistical functions
  • weave: A tool for writing C/C++ code as Python multiline strings
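The sketch below touches four of the subpackages listed above (constants, integrate, optimize, and linalg) to show how they are typically imported and called. The specific problems are invented for illustration, and only subpackages that remain available in current SciPy releases are used.

```python
import numpy as np
from scipy import constants, integrate, linalg, optimize

# constants: the speed of light in vacuum, in metres per second.
speed_of_light = constants.c

# integrate: numerically integrate sin(x) from 0 to pi (exact value is 2).
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# optimize: find the minimum of a one-variable function, here at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

# linalg: solve the linear system A @ x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print(speed_of_light)  # 299792458.0
print(area)
print(result.x)
print(x)
```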

Data analysis using pandas

The pandas library is an open source library designed to provide high-performance data manipulation and analysis functionality in Python. With pandas, users can carry out a complete data analysis workflow in Python. Combined with the IPython toolkit and other libraries, it makes Python a productive and well-performing environment for data analysis. One drawback of pandas is that its statistical modeling support is limited to linear and panel regression; for other functionality, we can use statsmodels and scikit-learn. The pandas library supports efficient merging and joining of datasets. It also provides tools for reading and writing data between in-memory data structures and a range of formats, including CSV and text files, Microsoft Excel, SQL databases, and the HDF5 format.
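The merging and reading/writing features described above can be sketched as follows. The column names and values are hypothetical, and an in-memory text buffer stands in for a CSV file on disk so that the example is self-contained.

```python
import io

import pandas as pd

# Read CSV data; io.StringIO stands in for a file path here.
csv_text = "sample_id,temperature\n1,20.5\n2,21.0\n3,19.8\n"
readings = pd.read_csv(io.StringIO(csv_text))

# A second dataset built directly in memory.
sites = pd.DataFrame({
    "sample_id": [1, 2, 4],
    "site": ["north", "south", "east"],
})

# Inner join on the shared key: only sample_id 1 and 2 appear in both.
merged = readings.merge(sites, on="sample_id", how="inner")

# Write the merged result back out as CSV text.
buffer = io.StringIO()
merged.to_csv(buffer, index=False)

print(merged)
print(buffer.getvalue())
```

Changing `how="inner"` to `"left"`, `"right"`, or `"outer"` controls which unmatched rows are kept, in the same spirit as SQL joins.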