Complementary file types and IO tools
The pandas I/O API provides top-level reader functions, such as pd.read_csv(), and matching writer methods, such as DataFrame.to_csv(), for a wide variety of file types.
In this lesson we'll show some file types pandas can work with besides the well-known CSV, JSON and XLSX formats.
import pandas as pd
In machine learning, for example, a trained model can't be saved to a .txt or .csv file, because it is a Python object containing complex binary data.
Saving an object as binary data to a file, so that it can be loaded again later, goes by various names in programming; you may know it as serialization. In Python, it is called pickling.
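To make the idea concrete, here is a minimal sketch using only the standard library's pickle module. The dictionary below is a hypothetical stand-in for a trained model; any picklable Python object would behave the same way.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable object works
model = {'weights': [0.25, -1.5], 'bias': 0.1}

data = pickle.dumps(model)     # serialize: object -> bytes
restored = pickle.loads(data)  # deserialize: bytes -> an equal object

print(type(data))         # <class 'bytes'>
print(restored == model)  # True
```

pickle.dumps and pickle.loads work in memory. pickle.dump and pickle.load do the same thing with an open binary file.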
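Fortunately, pandas handles pickles in its I/O module: every pandas object has a to_pickle method, and the top-level pd.read_pickle function loads a pickle back. Let's create a small DataFrame to work with: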
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
df
The to_pickle method uses Python's pickle module (cPickle in Python 2) to save data structures to disk in the pickle format:
df.to_pickle('out.pkl')
The top-level read_pickle function can be used to load any pickled pandas object (or any other pickled object) from a file:
df = pd.read_pickle('out.pkl')
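Here is a minimal round-trip sketch, assuming write access to a temporary directory. It pickles a DataFrame with to_pickle and reads it back with pd.read_pickle. It also shows that read_pickle loads a plain Python object pickled with the standard pickle module. The file name model.pkl and the dictionary contents are hypothetical.

```python
import os
import pickle
import tempfile

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])

with tempfile.TemporaryDirectory() as tmp:
    # Round-trip the DataFrame through a pickle file
    frame_path = os.path.join(tmp, 'out.pkl')
    df.to_pickle(frame_path)
    restored = pd.read_pickle(frame_path)

    # read_pickle also loads objects pickled with the standard library
    obj_path = os.path.join(tmp, 'model.pkl')  # hypothetical file name
    with open(obj_path, 'wb') as f:
        pickle.dump({'weights': [0.1, 0.2], 'bias': 0.5}, f)
    loaded = pd.read_pickle(obj_path)

print(restored.equals(df))  # True
print(loaded)               # {'weights': [0.1, 0.2], 'bias': 0.5}
```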
A handy way to grab data is the read_clipboard function, which takes the contents of the clipboard buffer and passes them to the read_csv function.
For instance, you can copy the following text to the clipboard (CTRL-C on many operating systems):
  A B C
x 1 4 p
y 2 5 q
z 3 6 r
Then import the data directly into a DataFrame by calling:
df = pd.read_clipboard()
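read_clipboard() hands the clipboard text to read_csv(), so you can reproduce the result without a clipboard (for example, on a headless server). Pass the same text to read_csv() with a whitespace separator. This sketch only mirrors read_clipboard(); it does not reimplement it. The header line is written without leading spaces here.

```python
import io

import pandas as pd

# The clipboard example text, header written without leading spaces
text = """A B C
x 1 4 p
y 2 5 q
z 3 6 r
"""

# Parse whitespace-separated text, as read_clipboard() does by default
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
print(df)
```

The header has one fewer field than each data row, so read_csv uses the first column (x, y, z) as the index.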
The to_clipboard method can be used to write the contents of a DataFrame to the clipboard.
You can then paste the clipboard contents into other applications (CTRL-V on many operating systems).
Reading the clipboard again with pd.read_clipboard() returns the same content that we had just written to it.
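By default, to_clipboard() formats the DataFrame as tab-separated text through to_csv() before placing it on the clipboard. The sketch below builds that text directly and parses it back, which shows why the round trip returns the same content. It does not access the clipboard itself.

```python
import io

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['p', 'q', 'r']},
                  index=['x', 'y', 'z'])

# Tab-separated text, like what to_clipboard() writes by default
text = df.to_csv(sep='\t')

# Parse it back, treating the first column as the index
back = pd.read_csv(io.StringIO(text), sep='\t', index_col=0)
print(back.equals(df))  # True
```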
The top-level function read_sas() can read (but not write) SAS xport (.XPT) and (since v0.18.0) SAS7BDAT (.sas7bdat) format files.
SAS files only contain two value types: ASCII text and floating-point values (usually 8 bytes but sometimes truncated). For xport files, there is no automatic type conversion to integers, dates, or categoricals. For SAS7BDAT files, the format codes may allow date variables to be automatically converted to dates. By default the whole file is read and returned as a DataFrame.
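You don't have to read the whole file at once. When read_sas() gets a chunksize argument, it returns a reader object that yields DataFrames chunk by chunk. In the sketch below, the helper count_sas_rows is hypothetical. Using the reader in a with statement assumes a reasonably recent pandas version.

```python
import pandas as pd


def count_sas_rows(path, chunksize=10_000):
    """Hypothetical helper: count rows of a SAS file chunk by chunk.

    With chunksize set, read_sas() returns a reader (not a DataFrame);
    iterating over it yields DataFrames of up to `chunksize` rows.
    """
    total = 0
    with pd.read_sas(path, chunksize=chunksize) as reader:
        for chunk in reader:
            total += len(chunk)
    return total
```

For example, assuming airline.sas7bdat is in the working directory, count_sas_rows('airline.sas7bdat') would return its number of rows. Only one chunk is in memory at a time.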
We are going to load the airline.sas7bdat file into a pandas DataFrame using the