Beginner’s Guide To Setting Up Your First Python Data Science Workspace

by | Jun 15, 2016 | Data Science

If you’re new to data science and looking to set up your first Python data science workspace, this guide is for you.

If you prefer to read the guide on setting up your Python data science workspace, read on.

Over the years, we have used many different tools for data science, including MATLAB, RapidMiner, Weka and SAS.

We used those tools for tasks such as data mining, machine learning, clustering and other data science work. However, as we went deeper into the field and took on more data science activities, we found that the most commonly used and discussed tools are Python and R.

What Is The Difference Between Python & R?

If you do a quick search online, you’ll find an endless stream of debate on sites like Reddit and Quora about which one is better for data analysis and what advantages each has over the other.

If you asked me which to go for, I’d tell you that I don’t have a personal preference. What matters more is choosing the right tool for the right data science task.

Generally, use R when…

Your tasks are mainly standalone computing. R is a great tool for exploratory work, with a large number of tools that you can apply to various types of data analysis.

R’s primary focus is to be a user-friendly tool for data analysis, statistics and creating graphical models.

You’ll find many data scientists using R together with the RStudio IDE, which provides a graphical user interface (GUI).

Use Python when…

You want to focus on productivity and code readability. Python uses indentation to mark blocks of code, so well-written Python looks clean and consistently indented. This makes the code easier to edit and debug.
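To see what indentation means in practice, here is a minimal sketch. The function name and the sample numbers are made up for illustration; the point is only that the indentation level decides which lines belong to the loop, to the if statement, or to the function itself.

```python
def describe(values):
    """Return a short label for each number in values."""
    labels = []
    for value in values:
        # Lines indented under the for statement run once per value.
        if value < 0:
            # Lines indented under the if statement run only when it is true.
            labels.append("negative")
        else:
            labels.append("non-negative")
    # Back at function level: this line runs once, after the loop finishes.
    return labels


print(describe([-2, 0, 5]))
```

If you move the return line one level to the right, it becomes part of the loop and the function stops after the first value, which is why consistent indentation matters.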

Python started off as a programming language that emphasizes productivity and code readability. You’ll find a lot of people using Python, often with the Django web framework, to write web applications and scripts.

It has a gradual learning curve, which makes it easier for aspiring data scientists to pick up the language. Using Python also makes it easier to deploy your code to a production server and, later on, integrate it with your existing web applications or databases.

For example, you can write Python scripts to process data periodically, pass it on to the ETL processor and carry out other related database tasks. Finally, you can write your own scripts or data science programs to analyze the data.
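As a rough sketch of that kind of workflow, the script below reads some raw data, cleans it, stores it in a database and runs a simple analysis. Everything in it is an assumption made for illustration: the sample data, the table name and the function names are hypothetical, and it uses only modules from Python’s standard library (csv, io and sqlite3) with an in-memory database rather than a real ETL processor.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might be a file exported on a schedule.
RAW_CSV = """day,visits
2016-06-13,120
2016-06-14,95
2016-06-15,143
"""


def extract(text):
    """Parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert the visit counts from strings to integers."""
    return [(row["day"], int(row["visits"])) for row in rows]


def load(records, connection):
    """Store the cleaned records in a database table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS visits (day TEXT, visits INTEGER)"
    )
    connection.executemany("INSERT INTO visits VALUES (?, ?)", records)
    connection.commit()


# An in-memory database keeps the example self-contained.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)

# The analysis step: a simple average computed by the database.
average = connection.execute("SELECT AVG(visits) FROM visits").fetchone()[0]
print(f"Average daily visits: {average:.1f}")
```

In a real project you would likely point sqlite3.connect at a file or use your own database, and run the script on a schedule.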

Setting Up Your Python Data Science Workspace

Got your scoop on Python and R? I’m going to show you how to set up your Python data science workspace.

Remember, Python is just a programming language. You need additional packages and libraries to support your data science tasks.
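If you’re curious which of those packages your current Python already has, a small check like the one below can tell you. This is only a sketch: the package list is my own illustrative pick of commonly used data science libraries, and the is_installed helper is a hypothetical name. It relies on importlib.util.find_spec from the standard library, which looks a package up without importing it.

```python
import importlib.util

# Packages commonly used for data science; the list is illustrative.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def is_installed(name):
    """Return True if the named top-level package can be found."""
    return importlib.util.find_spec(name) is not None


for name in PACKAGES:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

On a plain Python install you may see several of these reported as missing, which is the gap a data science distribution is meant to fill.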

1 – Download & Set Up Anaconda

I recommend using Anaconda. Anaconda is a Python distribution for data science that bundles the popular packages and libraries needed for data science activities.

Head to this page: http://www.continuum.io/downloads

Here you’ll find the different installers for Windows (32 bit & 64 bit) and Mac OS X (64 bit).

Installation is relatively simple for both Windows & Mac. Simply follow the on-screen setup guide and you should be done in no time.

2 – Choosing your Prompt Console

I normally use Terminal on a Mac. If you’re on a Windows machine, you might want to use Command Prompt.

However, if you want something more powerful, I’d recommend you download cmder.

What Is cmder?

cmder is a console emulator for Windows. It has a great color scheme and a custom prompt layout. Download it for free and you can use commands and tools similar to those in Terminal on a Mac and on Linux systems.

3 – Go Into Anaconda Navigator to Test

After you have installed Anaconda and cmder, launch Anaconda Navigator to test if you have successfully installed your Python data science workspace.

If you go into the navigator, you’ll find common applications such as Jupyter Notebook, qtconsole, Spyder, glueviz and more, which we will go through in upcoming posts and videos.

For now, try launching Jupyter Notebook, creating a new file and typing in some code, like the example below.
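As one possible first cell, here is a minimal sketch that assumes you installed a Python 3 version of Anaconda. The sample numbers are made up; the cell simply reports which Python the notebook is running and does a small calculation to confirm that code executes.

```python
import platform
import statistics
import sys

# Confirm which Python interpreter the notebook is using.
print("Python version:", platform.python_version())
print("Interpreter path:", sys.executable)

# A tiny calculation to confirm that code in the cell actually runs.
samples = [2, 4, 4, 5, 7]  # made-up numbers
mean_value = statistics.mean(samples)
print("Mean of samples:", mean_value)
```

Press Shift+Enter to run the cell. If the interpreter path points inside your Anaconda folder, the notebook is using the Python that Anaconda installed.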

You can use this code to create a new file

And that’s it. You’ve successfully set up your first Python data science workspace, and you’re ready to go.

I’ll be coming out with more posts like this. The best way to know when I publish new data science lessons is to subscribe to the newsletter on cherhan.net.

And as always, feel free to ask questions if you run into any problems when setting up your first Python data science workspace.