Sample Project & Tutorial

Last updated: February 28th, 2019

RMOTR Notebooks Tutorial

Welcome to RMOTR Notebooks ❤️, a fully online 🤖, cloud-based ☁️ Data Science environment. All your work, analysis and datasets organized in the same place 🙌.

Objective of this tutorial:

Help you get started with RMOTR Notebooks for Data Science and Python programming.

Need help? Create an issue.

Jupyter Notebooks

The document you're currently reading is a "Jupyter Notebook", and you've probably heard about them before. It's like a text document, but you can run code in it! It can also display inline graphs, pull data from databases or show Excel spreadsheets live! Isn't it amazing? 😄

Mildly interesting fact of the day:

Jupyter is a nod to 3 languages: Julia, Python, and R. Source @jakevdp.

This is a really quick tutorial on how to get started with Jupyter Notebooks (and Jupyter Lab). It shouldn't take more than 10 minutes and you'll be writing Python code right away.

Part 1: everything is a cell

Jupyter Notebooks are organized as a set of "cells". Each cell can contain a different type of content: Python code (or R, Julia, etc.), images, or even human readable text (Markdown), like the one you're currently reading.

I've left a couple of empty cells below for you to see them:

In [ ]:
 
In [ ]:
 

This is another cell containing Markdown (human readable) code. And below, another empty cell:

In [ ]:
 

You can edit these cells just by double clicking on them. Try editing the following cell:

👉 Double click on me 👈

When you double click the cell, it should open an "edit mode", and you should see something similar to:

image

If you're seeing those asterisks, it's because you've correctly entered "Edit Mode". Once you've made the changes, you have to "execute", or "run" the cell to reflect the changes. To do that just click on the little play button on the top menu bar:

image

Jupyter notebooks are optimized for an efficient workflow. There are many keyboard shortcuts that will let you interact with your documents, run code and make other changes; mastering these shortcuts will speed up your work. For example, there are two shortcuts to execute a cell:

  1. shift + return: Run cell and advance to the next one.
  2. ctrl + return: Run the cell but don't change focus.

Try them with the following cell:

In [7]:
2 + 2
Out[7]:
4

You can execute these cells as many times as you want; it won't break anything.

ctrl + Return effect:

As you can see in the following animation, the code is correctly executed (it returns 4) and the focus (the blue line at the left side of the cell) stays in the same cell.

ctrl+enter effect

Now compare it to the next shortcut, shift + return:

shift + Return effect:

shift+enter effect

As you can see, every time I execute code the focus changes to the cell below.
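One detail worth knowing when you run cells like the 2 + 2 example above: a notebook only echoes the value of the last expression in a cell as Out[n]. The short sketch below illustrates that behaviour; the variable names total and doubled are just placeholders I made up.

```python
total = 2 + 2      # an assignment on its own displays nothing
doubled = total * 2

print(total)       # print() output appears wherever the line sits in the cell
doubled            # as the last line of a cell, this value is echoed as Out[n]
```

If you paste this into one of the empty cells and run it, you should see the printed value followed by an Out[n] entry for the final line.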


Part 2: Working with code

Jupyter notebooks have amazing features to include text and images and create beautiful, human readable documents as you've just seen. But their main benefit is working with code. Now we're going to import a few libraries and start experimenting with Python code. We've already done the simple 2 + 2 before, so let's do something a little bit more interesting. First, we need to import numpy and matplotlib:

In [10]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

RMOTR Notebooks include all the most popular Data Science and Deep Learning libraries already installed. And even if there's one missing, you can always install it in your own environment (more on that later). We've just imported these two libraries:

  • numpy: the most popular Python library for array manipulation and numeric computing.
  • matplotlib: the most popular visualization library in the Python ecosystem.
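To give a feel for what "array manipulation and numeric computing" means in practice, here is a minimal numpy sketch. It is only an illustration (the array and variable names are mine), not something the rest of the tutorial depends on.

```python
import numpy as np

# A small 2x3 array built from the integers 0..5.
a = np.arange(6).reshape(2, 3)

# Arithmetic applies to every element at once, with no explicit loop.
scaled = a * 10

# Reductions can run over the whole array or along a single axis.
column_sums = a.sum(axis=0)   # one sum per column
overall_mean = a.mean()       # mean of all six values

print(scaled)
print(column_sums, overall_mean)
```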

Let's now execute a few lines of code and generate some plots:

In [6]:
x = np.linspace(0, 10, 500)
y = np.cumsum(np.random.randn(500, 6), 0)
In [7]:
plt.figure(figsize=(12, 7))
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left')
Out[7]:
<matplotlib.legend.Legend at 0x7f7c0f8c6748>

But what is that 😱? Just randomly generated data points, but you can clearly see how simple it is to do numeric processing and plotting with RMOTR Notebooks.
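If you're curious why the lines wander the way they do: taking the cumulative sum of random steps down each column gives six independent "random walks". The sketch below reproduces that idea without plotting. Note that I've swapped in a seeded generator (np.random.default_rng) so the run is repeatable, which is an assumption on my side and not what the cell above uses.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so every run gives the same numbers
steps = rng.standard_normal((500, 6))   # 500 random steps for each of 6 series
walks = np.cumsum(steps, axis=0)        # running total down each column

# Row i of `walks` holds the sum of steps 0..i for each series.
print(walks.shape)
print(np.allclose(walks[2], steps[0] + steps[1] + steps[2]))
```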

Part 3: Interacting with data

RMOTR Notebooks and Jupyter Lab make it really simple to interact with files in your local storage. These files are securely stored in the cloud and you can access them from anywhere in the world.

To show you the full potential of RMOTR Notebooks, we're going to pull cryptocurrency prices from a public API and export them as Excel files, pretty fancy 😎. I need to import two libraries first: requests (to pull data from the web) and pandas (to process it).

In [3]:
import requests
import pandas as pd

I have a predefined function that simplifies the process of importing data from Cryptowatch (for reference, check their docs).

In [4]:
def get_historic_price(symbol, exchange='bitfinex', after='2018-09-01'):
    url = 'https://api.cryptowat.ch/markets/{exchange}/{symbol}usd/ohlc'.format(
        symbol=symbol, exchange=exchange)
    resp = requests.get(url, params={
        'periods': '3600',
        'after': str(int(pd.Timestamp(after).timestamp()))
    })
    resp.raise_for_status()
    data = resp.json()
    df = pd.DataFrame(data['result']['3600'], columns=[
        'CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice', 'ClosePrice', 'Volume', 'NA'
    ])
    df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s')
    df.set_index('CloseTime', inplace=True)
    return df
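To see what the second half of the function does without hitting the network, here is a small offline sketch. The sample dictionary is a hand-written stand-in for resp.json(): its shape is inferred from the data['result']['3600'] lookup above, and the two rows reuse values from the btc.head() output shown later in this tutorial. Treat it as an illustration rather than the exact API response.

```python
import pandas as pd

# Hand-written stand-in for resp.json(); not a real API response.
first_close = int(pd.Timestamp('2018-10-04 19:00:00').timestamp())
sample = {'result': {'3600': [
    [first_close, 6552.87, 6572.03, 6551.30, 6557.04, 177.17284, 1162345.8],
    [first_close + 3600, 6557.04, 6573.01, 6557.04, 6565.92, 192.69052, 1265504.5],
]}}

columns = ['CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice',
           'ClosePrice', 'Volume', 'NA']
df = pd.DataFrame(sample['result']['3600'], columns=columns)

# Unix seconds become proper timestamps, which then serve as the index.
df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s')
df = df.set_index('CloseTime')
print(df)
```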

I will now pull data for Bitcoin and Ether, two of the most popular cryptocurrencies, for the last 7 days:

In [5]:
last_week = (pd.Timestamp.now() - pd.offsets.Day(7))
last_week
Out[5]:
Timestamp('2018-10-04 19:05:56.958797')
In [6]:
btc = get_historic_price('btc', 'bitstamp', after=last_week)
In [7]:
eth = get_historic_price('eth', 'bitstamp', after=last_week)
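In case you wonder how a pandas Timestamp like last_week ends up in the request: get_historic_price converts it to whole seconds since 1970-01-01 and sends that as a string. A minimal sketch of that conversion, using the function's default date so the result is repeatable (the variable names are mine):

```python
import pandas as pd

after = pd.Timestamp('2018-09-01')        # default start date of get_historic_price
seconds = int(after.timestamp())          # whole seconds since 1970-01-01 (UTC)
param = str(seconds)                      # the form sent as the 'after' parameter

# Date arithmetic works the same way as in the last_week cell.
week_earlier = after - pd.offsets.Day(7)

print(param)
print(week_earlier)
```

Converting the integer back with pd.to_datetime(seconds, unit='s') should return the original date, which is a handy sanity check.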

Bitcoin:

In [8]:
btc.head()
Out[8]:
OpenPrice HighPrice LowPrice ClosePrice Volume NA
CloseTime
2018-10-04 19:00:00 6552.87 6572.03 6551.30 6557.04 177.172840 1162345.8
2018-10-04 20:00:00 6557.04 6573.01 6557.04 6565.92 192.690520 1265504.5
2018-10-04 21:00:00 6566.00 6572.98 6535.00 6545.05 206.450800 1351745.6
2018-10-04 22:00:00 6545.05 6557.58 6517.39 6545.00 178.217800 1163628.1
2018-10-04 23:00:00 6549.00 6563.98 6545.00 6554.20 62.939697 412657.6
In [11]:
btc['ClosePrice'].plot(figsize=(15, 7))
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff3cc4322b0>

Ether:

In [12]:
eth.head()
Out[12]:
OpenPrice HighPrice LowPrice ClosePrice Volume NA
CloseTime
2018-10-04 19:00:00 221.84 223.27 221.83 222.92 1555.00230 346043.220
2018-10-04 20:00:00 222.83 223.41 222.45 222.65 340.15000 75863.090
2018-10-04 21:00:00 222.76 223.06 220.81 221.45 707.83680 156992.550
2018-10-04 22:00:00 221.45 221.54 218.55 220.59 1649.08060 362885.030
2018-10-04 23:00:00 220.54 222.32 220.25 221.17 228.63876 50586.176
In [13]:
eth['ClosePrice'].plot(figsize=(15, 7))
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff3cc4175f8>

As you can see, we're able to pull data from the internet with just a few lines, create a DataFrame and plot it all within Jupyter Lab.

Bonus: Dynamic plots with Bokeh

We've also included Bokeh as part of the main distribution. Bokeh is a plotting library that generates interactive plots that can be manipulated right within your browser.

We first need to import the libraries:

In [1]:
from bokeh.plotting import figure, output_file, show
from bokeh.io import output_notebook
In [2]:
output_notebook()
Loading BokehJS ...

And we generate the plot:

In [18]:
p1 = figure(x_axis_type="datetime", title="Crypto Prices")
p1.grid.grid_line_alpha=0.3
p1.xaxis.axis_label = 'Date'
p1.yaxis.axis_label = 'Price'

# legend_label replaces the older `legend` keyword in recent Bokeh releases
p1.line(btc.index, btc['ClosePrice'], color='#f2a900', legend_label='Bitcoin')
p1.line(eth.index, eth['ClosePrice'], color='#A6CEE3', legend_label='Ether')

p1.legend.location = "top_left"

show(p1)

☝️ As you can see, the plot is interactive. Try zooming in and out, and scrolling in the plot.
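One thing you may notice in a combined plot like this: Bitcoin trades in the thousands of dollars and Ether in the hundreds, so on a shared axis the Ether line can look almost flat. A common workaround (my suggestion, not part of the original notebook) is to rebase both series to 100 at the first data point before plotting. Here is a minimal pandas sketch using closing prices copied from the head() outputs above; the names closes and rebased are just illustrative.

```python
import pandas as pd

# Closing prices copied from the btc.head() and eth.head() outputs above.
closes = pd.DataFrame({
    'Bitcoin': [6557.04, 6565.92, 6545.05, 6545.00, 6554.20],
    'Ether': [222.92, 222.65, 221.45, 220.59, 221.17],
}, index=pd.to_datetime([
    '2018-10-04 19:00:00', '2018-10-04 20:00:00', '2018-10-04 21:00:00',
    '2018-10-04 22:00:00', '2018-10-04 23:00:00',
]))

# Divide every column by its first value so both series start at 100.
rebased = closes / closes.iloc[0] * 100
print(rebased.round(2))
```

With the full btc and eth DataFrames you could build closes from their ClosePrice columns instead and pass the rebased columns to p1.line.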

Part 4: Exporting to Excel

We're now ready to generate an Excel file from the downloaded prices. Working with Excel and other formats (like CSV or JSON) is extremely simple in Jupyter Lab (thanks to pandas and Python). Our first step will be to create an "Excel writer", a component from the pandas package:

In [33]:
writer = pd.ExcelWriter('cryptos.xlsx')

We'll now write both our Bitcoin and Ether data as separate sheets:

In [30]:
btc.to_excel(writer, sheet_name='Bitcoin')
In [31]:
eth.to_excel(writer, sheet_name='Ether')

And finally, we can save the file:

In [32]:
writer.close()  # writes the file to disk; writer.save() was removed in pandas 2.0

Once you've saved the file, you should see it in the left side navigation bar:

Excel file
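Part 4 mentions that other formats such as CSV or JSON are just as simple. As a rough sketch of what that could look like, the example below exports a tiny stand-in DataFrame (two rows copied from the btc.head() output, named btc_sample by me) to CSV and JSON text and reads the CSV back. With the real data you would pass a file name such as 'cryptos.csv' to to_csv instead of working with strings.

```python
import io
import json
import pandas as pd

# Tiny stand-in frame; values copied from the btc.head() output earlier.
btc_sample = pd.DataFrame(
    {'ClosePrice': [6557.04, 6565.92]},
    index=pd.to_datetime(['2018-10-04 19:00:00', '2018-10-04 20:00:00']),
)
btc_sample.index.name = 'CloseTime'

csv_text = btc_sample.to_csv()    # returns a string when no path is given
json_text = btc_sample.to_json(orient='split', date_format='iso')
parsed = json.loads(json_text)    # plain dict with 'columns', 'index' and 'data'

# Reading the CSV back restores the timestamps as the index.
restored = pd.read_csv(io.StringIO(csv_text), index_col='CloseTime',
                       parse_dates=True)
print(csv_text)
print(parsed['columns'])
```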

Final words and how to get help

That's it! Now it's your turn to start working and playing around with Jupyter Lab and RMOTR Notebooks. This product is in an early stage, so we'd love to receive all your feedback and suggestions. If you need help or have ideas for us to implement, create an issue in the following repo: https://github.com/rmotr/notebooks-help. It'll be highly appreciated!
