Projects

Socioeconomic Indicators in Chicago Data Analysis (using sqlite3)

DaTALK 2021. 4. 25.

The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal. This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.

Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.

A detailed description of the dataset can be found on the city of Chicago's website, but to summarize, the dataset has the following variables:

Community Area Number (ca): Used to uniquely identify each row of the dataset
Community Area Name (community_area_name): The name of the region in the city of Chicago
Percent of Housing Crowded (percent_of_housing_crowded): Percent of occupied housing units with more than one person per room
Percent Households Below Poverty (percent_households_below_poverty): Percent of households living below the federal poverty line
Percent Aged 16+ Unemployed (percent_aged_16_unemployed): Percent of persons over the age of 16 years that are unemployed
Percent Aged 25+ without High School Diploma (percent_aged_25_without_high_school_diploma): Percent of persons over the age of 25 years without a high school education
Percent Aged Under 18 or Over 64:Percent of population under 18 or over 64 years of age (percent_aged_under_18_or_over_64): (ie. dependents)
Per Capita Income (per_capita_income_): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population
Hardship Index (hardship_index): Score that incorporates each of the six selected socioeconomic indicators

In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.

- the dataset looks like:

- we will connect the sqlite3 DB browser and make a new database file:

import sqlite3
conn = sqlite3.connect('new_name.sqlite')
cur = conn.cursor()
chicago_socioeconomic_data.to_sql('chicago',conn) # table name is 'chicago'

Q) How many rows are in the dataset?

cur.execute('select count(*) from chicago')
for row in cur:
    print(row[0])

ans) 78

Q) How many community areas in Chicago have a hardship index greater than 50.0?

cur.execute('select count(*) from chicago where hardship_index > 50.0')
for row in cur:
    print(row[0])

ans) 38

Q) What is the maximum value of hardship index in this dataset?

cur.execute('select MAX(hardship_index) from chicago')
for row in cur:
    print(row[0])

ans) 98.0

Q) Which community area which has the highest hardship index?

cur.execute('select community_area_name from chicago where hardship_index = 98.0')
for row in cur:
    print(row[0])

ans) Riverdale

Q) Which Chicago community areas have per-capita incomes greater than $60,000?

cur.execute('select community_area_name from chicago where per_capita_income_ > 60000')
for row in cur:
    print(row[0])

ans) Lake View, Lincoln Park, Near North Side, Loop

visualization Q) Create a scatter plot using the variables per_capita_income_ and hardship_index. Explain the correlation between the two variables.

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import pandas

selectQuery = "select per_capita_income_, hardship_index from chicago"
income_vs_hardship = pandas.read_sql(selectQuery, conn)
plot = sns.jointplot(x='per_capita_income_',y='hardship_index', data=income_vs_hardship)

- as Per Capita Income rises as the Hardship Index decreases

- the points on the scatter plot are somewhat closer to a straight line in the negative direction

(we have a negative correlation between the two variables)

'Projects' 카테고리의 다른 글

Analyzing US Economic Data - House Sales in King County, USA (0)	2021.06.28
Which is the best model to be able to predict car price based on the dataset? (0)	2021.06.05
What are the main characteristics which have the most impact on the car price? (0)	2021.05.24
Socioeconomic Indicators, Schools & Crimes in Chicago Data Analysis (using sqlite3) (0)	2021.04.25
Analyzing Historical Stock/Revenue Data and Building a Dashboard (0)	2021.04.07

Socioeconomic Indicators in Chicago Data Analysis (using sqlite3)

'Projects' 카테고리의 다른 글

댓글

티스토리툴바