Workshop 8

Last updated: August 19th, 20202020-08-19Project preview

Police Data Project EOU3 - Data Preparation

Before running the Import Data Cell:

  • Ensure that police_data.zip (download from Aptem WS component, do not unzip) is saved in the SAME FOLDER AS THIS NOTEBOOK.

  • If you're still getting an error, try restarting your notebooks kernel (Above the Run button: Kernel > Restart) and re-running cell.

Import Data

In [3]:
# We use the import statement:

import pandas as pd              # This imports functionality for working with data tables
import os                        # Mainly used to navigate the folder structure on your computer

#os.chdir("") #Only used if you want to change working directory to point to where they are saved, if they're not saved in this notebook.

try:
    outcomes_data = pd.read_csv('outcomes_data.csv')
    stopandsearch_data = pd.read_csv('stopandsearch_data.csv')
    street_data = pd.read_csv('street_data.csv')
    print('Code has finished... no need to run this block again unless you restart your kernel')
except:
    print('Ensure outcomes_data.csv, street_data.csv and stopandsearch_data.csv are saved in the same folder as this notebook OR change the working directory to point to where they are saved using os.chdir() above.')
Code has finished... no need to run this block again unless you restart your kernel
In [4]:
outcomes_data.head()
Out[4]:
Unnamed: 0 Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Outcome type Region
0 4 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset
1 28 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.576695 51.459326 On or near HARLESTON STREET E01033358 Bristol 054D Suspect charged Avon-and-somerset
2 78 6675b2f1d846767218e04e590f7df36f54031f791b984e... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Local resolution Avon-and-somerset
3 94 e8ccedaee2a5a0cb86f74037762108e0054674a9931138... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Suspect charged Avon-and-somerset
4 108 524aebdcc6cb569125af55e35bb3db6d73fa732e5a28a1... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset
In [3]:
stopandsearch_data.head(2)
Out[3]:
Unnamed: 0 Type Date Part of a policing operation Policing operation Latitude Longitude Gender Age range Self-defined ethnicity Officer-defined ethnicity Legislation Object of search Outcome Outcome linked to object of search Removal of more than just outer clothing Region Month
0 3490 Person and Vehicle search 2019-08-05T15:20:00+00:00 False NaN 51.542304 -0.054589 Male 18-24 Black/African/Caribbean/Black British - African Black Misuse of Drugs Act 1971 (section 23) Controlled drugs A no further action disposal NaN NaN Metropolitan 2019-08-
1 80 Person and Vehicle search 2019-07-05T18:35:00+00:00 False NaN NaN NaN Male 18-24 White - English/Welsh/Scottish/Northern Irish/... White Misuse of Drugs Act 1971 (section 23) Controlled drugs Arrest False True Hertfordshire 2019-07-
In [4]:
street_data.head(2)
Out[4]:
Unnamed: 0 Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context Region
0 12 aec28f2c86e9a44e79ab531dcf5bd1ee0c64eceb9a1846... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.511761 51.409966 On or near Caernarvon Close E01014399 Bath and North East Somerset 001A Violence and sexual offences Unable to prosecute suspect NaN Avon-and-somerset
1 70 a121b01ae9645cedfc75adc7f5570b86391a6ac80095f9... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.500575 51.411345 On or near Sherwood Road E01014403 Bath and North East Somerset 002B Criminal damage and arson Investigation complete; no suspect identified NaN Avon-and-somerset

Creating a Dataframe ready for modelling

Looking at the three tables (street_data, outcomes_data, stopandsearch_data) and considering how to answer the challenge posed in the brief, we are going to create a single table from the output with the following columns:

  • Crime ID x outcomes
  • Region x outcomes
  • Latitude x outcomes
  • Created a col that was a north south flag
  • Outcome type x
  • Crime type - street
  • create a col that was a pos neg flag x outcomes
  • Last outcome category - confirm final outcome of crime
In [5]:
### Positive == resolved/closed, positive effect, offender off the streets
In [6]:
# We will need numpy for the np.where function, used later
import numpy as np
In [7]:
outcomes_data['Outcome type'].unique()
Out[7]:
array(['Unable to prosecute suspect', 'Suspect charged',
       'Local resolution', 'Suspect charged as part of another case',
       'Formal action is not in the public interest',
       'Further investigation is not in the public interest',
       'Investigation complete; no suspect identified',
       'Offender given a caution',
       'Further action is not in the public interest',
       'Action to be taken by another organisation',
       'Offender given a drugs possession warning',
       'Offender given penalty notice'], dtype=object)
In [8]:
outcomes_data['Outcome type'].value_counts()
Out[8]:
Unable to prosecute suspect                            50460
Investigation complete; no suspect identified          48004
Suspect charged                                        13456
Local resolution                                        3571
Offender given a caution                                2205
Further investigation is not in the public interest     1831
Action to be taken by another organisation              1496
Formal action is not in the public interest             1233
Offender given a drugs possession warning                567
Offender given penalty notice                            392
Suspect charged as part of another case                  197
Further action is not in the public interest              66
Name: Outcome type, dtype: int64
In [9]:
### Defining Postive/Negative

#step 1. pull out distinct values of col

#.unique()
#.value_count()

outcomes_data['Outcome type'].unique()


##step define: pos % neg

## Positive - Resolution 
# Local resolution
# Suspect charged
# Offender given a drugs possession warning
# Offender given a caution
# Suspect charged as part of another case
# Offender given penalty notice

## Negative - No resolution 
# Unable to prosecute suspect
# no suspect identified

## Neutral (remove)
# Formal action is not in the public interest
# Further investigation is not in the public interest
# Further action is not in the public interest
# Action to be taken by another organisation
# Investigation complete
Out[9]:
array(['Unable to prosecute suspect', 'Suspect charged',
       'Local resolution', 'Suspect charged as part of another case',
       'Formal action is not in the public interest',
       'Further investigation is not in the public interest',
       'Investigation complete; no suspect identified',
       'Offender given a caution',
       'Further action is not in the public interest',
       'Action to be taken by another organisation',
       'Offender given a drugs possession warning',
       'Offender given penalty notice'], dtype=object)
In [10]:
outcomes_data['Outcome type'].isin({'Local resolution', 'Suspect charged'})
Out[10]:
0         False
1          True
2          True
3          True
4         False
          ...  
123473    False
123474    False
123475    False
123476    False
123477    False
Name: Outcome type, Length: 123478, dtype: bool
In [11]:
## Create new column called 'Pos_Neg'
import numpy as np

##new column where Positive outcomes is 1
outcomes_data['Pos_Neg']= np.where(outcomes_data['Outcome type'].isin({'Local resolution', 'Suspect charged','Awaiting court outcome','Offender given a drugs possession warning','Offender given a caution','Suspect charged as part of another case','Offender given penalty notice' }),1,0)
In [12]:
outcomes_data.head()
Out[12]:
Unnamed: 0 Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Outcome type Region Pos_Neg
0 4 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset 0
1 28 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.576695 51.459326 On or near HARLESTON STREET E01033358 Bristol 054D Suspect charged Avon-and-somerset 1
2 78 6675b2f1d846767218e04e590f7df36f54031f791b984e... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Local resolution Avon-and-somerset 1
3 94 e8ccedaee2a5a0cb86f74037762108e0054674a9931138... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Suspect charged Avon-and-somerset 1
4 108 524aebdcc6cb569125af55e35bb3db6d73fa732e5a28a1... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset 0
In [13]:
## They only asked us to analyse the positive v negative outcomes 
##only want to return positive or negative (Filter to remove Neutral outcomes)
outcomes_data_filtered = outcomes_data[outcomes_data['Outcome type'].isin({'Formal action is not in the public interest'
                                                  , 'Further investigation is not in the public interest'
                                                  , 'Further action is not in the public interest'
                                                  , 'Action to be taken by another organisation'}) == False]
In [14]:
outcomes_data_filtered.head()
Out[14]:
Unnamed: 0 Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Outcome type Region Pos_Neg
0 4 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset 0
1 28 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.576695 51.459326 On or near HARLESTON STREET E01033358 Bristol 054D Suspect charged Avon-and-somerset 1
2 78 6675b2f1d846767218e04e590f7df36f54031f791b984e... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Local resolution Avon-and-somerset 1
3 94 e8ccedaee2a5a0cb86f74037762108e0054674a9931138... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Suspect charged Avon-and-somerset 1
4 108 524aebdcc6cb569125af55e35bb3db6d73fa732e5a28a1... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary NaN NaN No location NaN NaN Unable to prosecute suspect Avon-and-somerset 0
In [15]:
outcomes_data_filtered = outcomes_data_filtered[['Crime ID','Outcome type','Latitude', 'Pos_Neg','Region','Month']]

### Outcome Type
### Crime ID
### Lattitude
### Pos Neg
### Region
### Month 
In [16]:
### Use only a selection of columns
outcomes_data_filtered = outcomes_data_filtered[['Crime ID','Outcome type','Latitude', 'Pos_Neg','Region','Month']]
outcomes_data_filtered.head(2)
Out[16]:
Crime ID Outcome type Latitude Pos_Neg Region Month
0 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07-
1 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... Suspect charged 51.459326 1 Avon-and-somerset 2019-07-
In [17]:
### Now we need to find some data from the street_data Dataframe
street_data.head(2)
Out[17]:
Unnamed: 0 Crime ID Month Reported by Falls within Longitude Latitude Location LSOA code LSOA name Crime type Last outcome category Context Region
0 12 aec28f2c86e9a44e79ab531dcf5bd1ee0c64eceb9a1846... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.511761 51.409966 On or near Caernarvon Close E01014399 Bath and North East Somerset 001A Violence and sexual offences Unable to prosecute suspect NaN Avon-and-somerset
1 70 a121b01ae9645cedfc75adc7f5570b86391a6ac80095f9... 2019-07- Avon and Somerset Constabulary Avon and Somerset Constabulary -2.500575 51.411345 On or near Sherwood Road E01014403 Bath and North East Somerset 002B Criminal damage and arson Investigation complete; no suspect identified NaN Avon-and-somerset
In [ ]:
 
In [18]:
left_table = outcomes_data_filtered

right_table = street_data[['Crime ID', 'Crime type', 'Last outcome category']]

new_merge = left_table.merge(right_table, on='Crime ID', how='inner')
In [19]:
new_merge.head()
Out[19]:
Crime ID Outcome type Latitude Pos_Neg Region Month Crime type Last outcome category
0 24f8c2ec57dd6c5d1cd7c24087813fa628fe6fcf02f3e2... Suspect charged NaN 1 Avon-and-somerset 2019-07- Public order Awaiting court outcome
1 17bcc4da67beec3a149b48bc573fe123883b006fd8f169... Investigation complete; no suspect identified NaN 0 Avon-and-somerset 2019-07- Criminal damage and arson Investigation complete; no suspect identified
2 90324fd7400d9e75fdc1874045a23ff09670cc134bf2bd... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- Violence and sexual offences Unable to prosecute suspect
3 7e2a52c80b34196226b07fca28e68cb687f746339770c6... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- Public order Unable to prosecute suspect
4 1e679eb27027b86205373f4115c851398e0b3709afe14c... Suspect charged NaN 1 Avon-and-somerset 2019-07- Drugs Awaiting court outcome
In [20]:
#- Crime ID x outcomes
#- Region x outcomes
#- Latitude x outcomes
#- Created a col that was a north south flag 
#- Outcome type x
#- Crime type - street x
#- create a col that was a pos neg flag x outcomes
#- Last outcome category - confirm final outcome of crime
In [21]:
### Create sub table that contains count of crimes per region
In [22]:
new_merge.groupby(['Region', 'Crime type'])['Crime ID'].count()
Out[22]:
Region             Crime type                  
Avon-and-somerset  Bicycle theft                     2
                   Burglary                         49
                   Criminal damage and arson       151
                   Drugs                            33
                   Other crime                      10
                                                  ... 
Wiltshire          Robbery                           6
                   Shoplifting                     169
                   Theft from the person            15
                   Vehicle crime                    95
                   Violence and sexual offences    355
Name: Crime ID, Length: 541, dtype: int64
In [23]:
region_count = new_merge.groupby(['Region'])['Crime ID'].count()
In [24]:
region_count = region_count.reset_index()
In [25]:
region_count.head()
Out[25]:
Region Crime ID
0 Avon-and-somerset 1875
1 Bedfordshire 1999
2 Cambridgeshire 2376
3 Cheshire 2839
4 City-of-london 121
In [26]:
left_table = new_merge
right_table = region_count

## join these two together to get final table :) - check in 10.56
## mutual col is region

final_table = left_table.merge(right_table, on='Region', how='inner')
In [27]:
final_table.head() ###End of workshop - the below cells are only there as they are part of the solns on Aptem
Out[27]:
Crime ID_x Outcome type Latitude Pos_Neg Region Month Crime type Last outcome category Crime ID_y
0 24f8c2ec57dd6c5d1cd7c24087813fa628fe6fcf02f3e2... Suspect charged NaN 1 Avon-and-somerset 2019-07- Public order Awaiting court outcome 1875
1 17bcc4da67beec3a149b48bc573fe123883b006fd8f169... Investigation complete; no suspect identified NaN 0 Avon-and-somerset 2019-07- Criminal damage and arson Investigation complete; no suspect identified 1875
2 90324fd7400d9e75fdc1874045a23ff09670cc134bf2bd... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- Violence and sexual offences Unable to prosecute suspect 1875
3 7e2a52c80b34196226b07fca28e68cb687f746339770c6... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- Public order Unable to prosecute suspect 1875
4 1e679eb27027b86205373f4115c851398e0b3709afe14c... Suspect charged NaN 1 Avon-and-somerset 2019-07- Drugs Awaiting court outcome 1875
In [28]:
### Reduce street_data to only the two relevant columns and merge onto what we have already
to_merge = street_data[['Crime ID','Crime type']]
merged_data = outcomes_data_filtered.merge(to_merge, left_on = 'Crime ID', right_on = 'Crime ID', how='left')
In [29]:
merged_data.head(2)
Out[29]:
Crime ID Outcome type Latitude Pos_Neg Region Month Crime type
0 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- NaN
1 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... Suspect charged 51.459326 1 Avon-and-somerset 2019-07- NaN
In [30]:
### Only column not yet created is the total crime count for each region
# First create a dataframe containing a summary of crime by region
region_summary = pd.DataFrame(merged_data.groupby('Region')['Crime ID'].count()).reset_index()
region_summary.columns = ['Region','Crime Count']
In [31]:
region_summary.head(2)
Out[31]:
Region Crime Count
0 Avon-and-somerset 3208
1 Bedfordshire 2304
In [32]:
merged_data.head(2)
Out[32]:
Crime ID Outcome type Latitude Pos_Neg Region Month Crime type
0 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- NaN
1 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... Suspect charged 51.459326 1 Avon-and-somerset 2019-07- NaN
In [33]:
### Merge on region totals for our final table before modelling
final_table = merged_data.merge(region_summary, left_on='Region', right_on='Region', how='inner')
In [34]:
final_table.head(2)
Out[34]:
Crime ID Outcome type Latitude Pos_Neg Region Month Crime type Crime Count
0 9f4e6d40277337c43f4e509dc2926f59dcb3c12b0c7e2b... Unable to prosecute suspect NaN 0 Avon-and-somerset 2019-07- NaN 3208
1 a13a45347e7b560b6dbdcb3e19f9c7bacaa33761287926... Suspect charged 51.459326 1 Avon-and-somerset 2019-07- NaN 3208
Notebooks AI
Notebooks AI Profile20060