
Python scientific computing
Python's support for scientific computing comes from a number of packages and APIs, each covering a different area of functionality. Most categories have several options, usually with one most popular choice. The following are examples of Python scientific computing options:
- Chart plotting: At present, the most popular two-dimensional chart plotting package is matplotlib. There are several other plotting packages, such as Visvis, Plotly, HippoDraw, Chaco, MayaVI, Biggles, Pychart, and Bokeh. Some packages, such as Seaborn and Prettyplotlib, are built on top of matplotlib to provide enhanced functionality.
- Optimization: The SciPy stack has an optimization package (scipy.optimize). Other choices for optimization functionality are OpenOpt and CVXOPT.
- Advanced data analysis: Python supports integration with the R statistical package for advanced data analysis using RPy or the RSPlus-Python interface. For data analysis within Python itself, there is the pandas library.
- Database: PyTables is a package for managing hierarchical datasets. It is built on top of HDF5 and is designed to process large amounts of data efficiently.
- Interactive command shell: IPython is a Python package that supports interactive programming.
- Symbolic computing: Python has packages such as SymPy and PyDSTool for supporting symbolic computing. Later in this chapter, we are going to cover the idea of symbolic computing.
- Specialized extensions: SciKits provides special-purpose add-ons for SciPy, NumPy, and Python. The following is a selected list of SciKits packages:
  - scikit-aero: Aeronautical engineering calculations in Python
  - scikit-bio: Data structures, algorithms, and educational resources for bioinformatics
  - scikit-commpy: Digital communication algorithms with Python
  - scikit-image: Image processing routines for SciPy
  - scikit-learn: A set of Python modules for machine learning and data mining
  - scikit-monaco: Python modules for Monte Carlo integration
  - scikit-spectra: Spectroscopy in Python built on pandas
  - scikit-tensor: A Python module for multilinear algebra and tensor factorizations
  - scikit-tracker: Object detection and tracking for cell biology
  - scikit-xray: Data analysis tools for X-ray science
  - bvp_solver: A Python package for solving two-point boundary value problems
  - datasmooth: The SciKits data smoothing package
  - optimization: A Python module for numerical optimization
  - statsmodels: Statistical computations and models for use with SciPy
- Third-party or non-scikit packages/applications/tools: A number of projects have developed packages and tools for specific scientific fields, such as astronomy, astrophysics, bioinformatics, and the geosciences. The following are some selected third-party Python packages and tools for specific scientific fields:
  - Astropy: A community-driven Python package used to support astronomy and astrophysics computations
  - Astroquery: A collection of tools used to access online astronomy data
  - BioPython: A collection of toolkits used to perform biological computations in Python
  - HTSeq: A package that supports the analysis of high-throughput sequencing data in Python
  - Pygr: A toolkit for sequence and comparative genomic analysis in Python
  - TAMO: A Python application used to analyze transcriptional regulation using DNA sequence motifs
  - EarthPy: A collection of IPython notebooks with examples from the earth science domain
  - Pyearthquake: A Python package for earthquake and MODIS analysis
  - MSNoise: A Python package for monitoring seismic velocity change using ambient seismic noise
  - AtmosphericChemistry: A tool that supports the exploration, construction, and conversion of atmospheric chemistry mechanisms
  - Chemlab: A library used to perform chemistry-related computations
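As a small illustration of the optimization category listed above, the following sketch uses SciPy's `scipy.optimize.minimize` with the BFGS method to find the minimum of the Rosenbrock test function. Both `rosen` and its gradient `rosen_der` ship with SciPy; the starting point `x0` is an arbitrary choice for illustration, and the true minimum is at (1, 1).

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Arbitrary starting point for the search (illustrative choice)
x0 = np.array([-1.2, 1.0])

# Minimize the Rosenbrock function with BFGS, supplying its analytic gradient
result = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print(result.x)    # should be close to [1.0, 1.0]
print(result.fun)  # function value at the solution, close to 0
```

Passing a different `method` name, such as `"Nelder-Mead"` (and dropping `jac`, which that method does not use), switches the algorithm without changing the overall call structure.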
Introduction to NumPy
NumPy is the Python extension that adds support for large, multidimensional arrays and matrices, together with a library of mathematical functions for manipulating these arrays. Following the success of NumPy, a wider ecosystem of libraries has grown up alongside it, including matplotlib, pandas, SciPy, and SymPy, most of which build on NumPy's array object. Let's take a brief look at the functionality of some of these libraries.
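The following minimal sketch shows the kind of array handling described above: creating a two-dimensional array, applying element-wise arithmetic and a universal function, reducing along an axis, and computing a matrix product. The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2-D array (a 3x3 matrix) built from nested Python lists
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

print(a.shape)  # (3, 3)
print(a.ndim)   # 2

# Element-wise arithmetic applies to every element without a Python loop
b = a * 2 + 1

# Universal functions (ufuncs) such as np.sqrt also work element-wise
roots = np.sqrt(a)

# Reductions along an axis: column sums (axis=0) and row means (axis=1)
col_sums = a.sum(axis=0)   # [12., 15., 19.]
row_means = a.mean(axis=1)

# Matrix product of a with its transpose, using the @ operator
product = a @ a.T
```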
The SciPy library
SciPy is a Python library designed and developed for scientists and engineers to perform scientific computing tasks. It provides functionality for optimization, linear algebra, calculus, interpolation, image processing, fast Fourier transforms, signal processing, and special functions. It can also solve ordinary differential equations (ODEs) and perform other tasks required in science and engineering. SciPy is built on top of the NumPy array object and is an essential component of the NumPy stack, which is why the terms NumPy stack and SciPy stack are sometimes used interchangeably.
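To make the ODE-solving capability concrete, here is a minimal sketch that uses `scipy.integrate.solve_ivp` to solve the exponential decay equation dy/dt = -K·y with y(0) = 1, and compares the result against the exact solution exp(-K·t). The decay rate `K` and the tolerances are illustrative choices, not values from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

K = 0.5  # illustrative decay rate

def decay(t, y):
    """Right-hand side of dy/dt = -K * y."""
    return -K * y

# Solve on t in [0, 10], reporting the solution at 11 evenly spaced times
t_eval = np.linspace(0.0, 10.0, 11)
sol = solve_ivp(decay, (0.0, 10.0), [1.0], t_eval=t_eval, rtol=1e-8, atol=1e-10)

# Compare the numerical solution with the analytic one
exact = np.exp(-K * sol.t)
max_error = np.max(np.abs(sol.y[0] - exact))
print(sol.success, max_error)
```

`solve_ivp` uses an explicit Runge-Kutta method (`RK45`) by default; stiff problems usually call for one of its implicit methods instead.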
The various subpackages of SciPy include the following:
- constants: Physical constants and conversion factors
- cluster: Hierarchical clustering, vector quantization, and K-means
- fftpack: Discrete Fourier transform algorithms
- integrate: Numerical integration routines
- interpolate: Interpolation tools
- io: Data input and output
- lib: Python wrappers to external libraries
- linalg: Linear algebra routines
- misc: Miscellaneous utilities (for example, image reading and writing)
- ndimage: Various functions for multidimensional image processing
- optimize: Optimization algorithms, including linear programming
- signal: Signal processing tools
- sparse: Sparse matrices and related algorithms
- spatial: KD-trees, nearest neighbors, and distance functions
- special: Special functions
- stats: Statistical functions
- weave: A tool for writing C/C++ code as Python multiline strings (no longer included in current SciPy releases)
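The short sketch below touches a few of these subpackages in turn: a physical constant from `constants`, a definite integral with `integrate.quad`, a linear system with `linalg.solve`, linear interpolation with `interpolate.interp1d`, and a normal distribution from `stats`. All inputs are toy values chosen for illustration.

```python
import numpy as np
from scipy import constants, integrate, interpolate, linalg, stats

# constants: the speed of light in m/s
c = constants.c

# integrate: integral of sin(x) over [0, pi]; the exact answer is 2
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# linalg: solve the linear system A x = b; the solution is x = [2, 3]
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# interpolate: linear interpolation between sample points
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.0, 10.0, 20.0])
f = interpolate.interp1d(xs, ys)
mid = float(f(1.5))  # 15.0

# stats: cumulative probability of a standard normal distribution at 0
p = stats.norm.cdf(0.0)  # 0.5

print(c, area, x, mid, p)
```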
Data analysis using pandas
The pandas library is an open source library designed to provide high-performance data manipulation and analysis functionality in Python. With pandas, users can carry out complete data analysis workflows in Python. Combined with the IPython toolkit and other libraries, pandas makes Python a productive, high-performance environment for data analysis. One drawback of pandas is its limited statistical modeling support, which covers only linear and panel regression; for other functionality, we can use statsmodels and scikit-learn. The pandas library supports efficient merging and joining of datasets, and it provides tools for reading and writing data across different types of data sources, including in-memory data structures, CSV and text files, Microsoft Excel, SQL databases, and the HDF5 format.
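The following minimal sketch illustrates the reading, merging, and writing features described above. It reads a small CSV held in memory with `pd.read_csv`, joins it with a second `DataFrame` on a shared `id` column using inner and left joins, and writes the result back out as CSV text. The column names and values are hypothetical and serve only as illustration.

```python
import io

import pandas as pd

# A small CSV held in memory; read_csv accepts any file-like object
csv_text = """id,name
1,alpha
2,beta
3,gamma
"""
names = pd.read_csv(io.StringIO(csv_text))

# A second, hypothetical dataset keyed by the same id column
scores = pd.DataFrame({"id": [1, 2, 4], "score": [90, 75, 60]})

# Inner join: keep only ids present in both frames
inner = pd.merge(names, scores, on="id", how="inner")

# Left join: keep every row of names; missing scores become NaN
left = names.merge(scores, on="id", how="left")

# Write the joined data back out as CSV text, without the row index
out = inner.to_csv(index=False)
print(out)
```

The `how` parameter controls which keys survive the join; `"outer"` and `"right"` are also available.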