Python for Data Science

( Reblogging from – )

Setting Up Scientific Python

I’ve found that one of the most difficult parts of using the Scientific Python libraries is getting them installed and set up on my computer. pandas, scipy, numpy, and sklearn make heavy use of C/C++ extensions, which can be difficult to compile and configure on whatever flavor of OS you use. In this post I’ll go over the easiest way to install the libraries you need to get up and running with Scientific Python.

Getting Python

Step 1 is getting Python, of course! Double-check first that you don’t already have Python 2.7 installed, since many UNIX distributions ship with it. If you don’t have it, select one of these distributions from … .
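
One way to do that double-check from Python itself is to look at `sys.version_info`. The helper name `is_python27` below is hypothetical, just for illustration; run the file with the same `python` you plan to use for everything else.

```python
import sys


def is_python27(version_info=sys.version_info):
    """Return True if the (major, minor) part of version_info is (2, 7)."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    # Report the interpreter that is actually running this script.
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Python 2.7 found" if is_python27() else "This is not Python 2.7")
```

From a shell, `python --version` gives you the same answer.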

Installing pip

Next, you need to install pip, the Python package manager.


$ curl
$ python


Windows
Download Christoph Gohlke’s installer.

Debian, Ubuntu
$ sudo apt-get install python-pip
CentOS, Fedora
$ sudo yum -y install python-pip

Make sure pip is on your PATH. If it isn’t, add Python’s Scripts directory (for a default Windows install, C:\Python27\Scripts) to your PATH.
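
If you’re not sure whether pip is on your PATH, the sketch below searches PATH roughly the way a shell does. `find_on_path` is a hypothetical helper written with only the standard library so it runs under Python 2.7 as well as Python 3; on Python 3.3+ the built-in `shutil.which` does the same job.

```python
import os
import sys


def find_on_path(name, path=None):
    """Return the full path to executable `name` on PATH, or None if absent."""
    if path is None:
        path = os.environ.get("PATH", "")
    # On Windows, commands are found via the extensions listed in PATHEXT.
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("pip")
    print(location if location else "pip was not found on your PATH")
    # pip should live next to the Python you are using; compare the two paths.
    print("This Python: %s" % sys.executable)
```

If the two printed paths point at different Python installations, the `pip` on your PATH will install packages for the other interpreter.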

Enthought Free Distribution

Enthought, which provides commercial support for Scientific Python, is nice enough to publish an installer that works on Windows, OS X, and Linux. This eliminates a lot of the headaches of compiling libraries yourself and gives you stable versions that are built to work together. There are different tiers of installers, including paid versions, but for most people the free version is all you’ll need. Their website is a little tricky to navigate (it tends to funnel you toward the non-free versions), but here’s the page you want. Select the distribution for your OS and the download will start. The installer is large, so this part could take a while. Packed into the installer are the following libraries:

  • scipy
  • numpy
  • ipython
  • matplotlib
  • pandas
  • sympy
  • nose
  • traits
  • chaco

Once the download finishes, double-click the installer and it will set you up with everything, including adding all of the libraries to your Python path.
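
To confirm the installer worked, you can try importing each library in the list above. Here is a minimal sketch; `check_imports` is a hypothetical helper, and note that IPython’s import name is capitalized (`IPython`) even though people usually write it lowercase.

```python
# Import names for the libraries bundled in the Enthought installer.
ENTHOUGHT_MODULES = ["scipy", "numpy", "IPython", "matplotlib", "pandas",
                     "sympy", "nose", "traits", "chaco"]


def check_imports(module_names):
    """Map each module name to its version, or None if it cannot be imported."""
    results = {}
    for name in module_names:
        try:
            module = __import__(name)
        except ImportError:
            results[name] = None
        except Exception as exc:
            # Installed but broken, e.g. an incompatible binary build.
            results[name] = "import failed: %s" % type(exc).__name__
        else:
            # Not every package defines __version__.
            results[name] = getattr(module, "__version__", "unknown version")
    return results


if __name__ == "__main__":
    for name, version in sorted(check_imports(ENTHOUGHT_MODULES).items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```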

Installing sklearn, statsmodels, and patsy

Now that we’ve got the core libraries installed, it’s time to add some fun stats packages. Because the Enthought distribution already took care of the compiled dependencies, pip makes installing these libraries a breeze:

$ pip install --upgrade scikit-learn
$ pip install --upgrade statsmodels
$ pip install --upgrade patsy

These libraries are going to start spitting out a lot of garbage into the terminal during the install. Don’t worry, this is normal! You might want to take this time to have someone non-technical come by your computer.
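
pip’s package names don’t always match import names: you install `scikit-learn` but import `sklearn`. One way to confirm what pip actually installed is to read the output of `pip freeze`, which lists installed packages as `name==version` lines. The helpers below are a hypothetical sketch; `pip_freeze` runs `python -m pip freeze` for the current interpreter (this form works with reasonably recent pip releases) rather than whichever `pip` comes first on your PATH.

```python
import subprocess
import sys

# pip distribution names of the stats packages installed above.
STATS_PACKAGES = ["scikit-learn", "statsmodels", "patsy"]


def normalize(name):
    """Simplified pip-style name matching: case-insensitive, _ treated as -."""
    return name.strip().lower().replace("_", "-")


def parse_freeze(text):
    """Turn `pip freeze` output into a {normalized name: version} dict."""
    versions = {}
    for line in text.splitlines():
        # Skip editable installs, comments and anything not pinned with ==.
        if "==" in line and not line.startswith(("-e", "#")):
            name, version = line.split("==", 1)
            versions[normalize(name)] = version.strip()
    return versions


def missing_packages(freeze_text, wanted=STATS_PACKAGES):
    """Return the names in `wanted` that do not appear in the freeze output."""
    installed = parse_freeze(freeze_text)
    return [name for name in wanted if normalize(name) not in installed]


def pip_freeze():
    """Run `python -m pip freeze` for the interpreter running this script."""
    output = subprocess.check_output([sys.executable, "-m", "pip", "freeze"])
    return output.decode("utf-8")
```

Calling `missing_packages(pip_freeze())` returns an empty list when all three packages are installed.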

Development environment

For ease of use and interactive computing, use IPython. It also makes it easy to share your work as an IPython Notebook.

That’s it! You should be ready to go. If you run into any problems (this typically happens when older versions of the libraries are already installed), check Stack Overflow or the numpy/pandas/sklearn docs.

