San Francisco Crime Classification

Link to Original Kaggle Script

data_viz

Lately I’ve been playing around with visualizations on Kaggle.

Here is one for the SF Crime Classification competition, which has been getting some attention.

For fun, I thought I’d compare the visualization platforms available across several programming languages (plus Tableau).

1. JavaScript Charts

JavaScript has so much great, timeless visualization eye candy, with a plethora of classic looks to choose from.

j1

j2

2. R Charts

  • It may not be the best general-purpose language.
  • But R does so many things right when it comes to visualization.

r1

  • Its simple API design continues to be the standard reference.

r2

3. Julia

julia1

julia2

4. Python

import os
import io
import operator
from contextlib import contextmanager
from string import capwords
from zipfile import ZipFile, is_zipfile

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Plotting options: white background with light grid lines
sns.set_style("whitegrid")

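def plot_bar(df, title, filename):
    # Pick one of these palettes at random so each chart gets a different look
    # (matplotlib's colormap is 'Spectral'; the lowercase 'spectral' is gone)
    p = [
        'Set2', 'Paired', 'colorblind', 'husl',
        'Set1', 'coolwarm', 'RdYlGn', 'Spectral'
    ]
    # One color per bar
    colors = sns.color_palette(np.random.choice(p), len(df))
    ax = df.plot(kind='barh',
                 title=title,
                 fontsize=8,
                 figsize=(12, 8),
                 stacked=False,
                 width=.85,
                 color=colors)   # pandas expects `color`, not `colors`
    sns.despine(ax=ax)
    # df.plot returns a matplotlib Axes; save through its parent Figure
    ax.get_figure().savefig(filename)
    plt.show()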

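def plot_top_crimes(df, column, title, fname, items=0):
    # Normalize column names to lowercase (this renames df's columns in place)
    df.columns = df.columns.map(operator.methodcaller('lower'))

    # Count incidents per value of `column`, most frequent first
    col_freq = df.groupby(column).size().sort_values(ascending=False)
    col_freq.index = col_freq.index.map(capwords)

    # items=0 plots every value; otherwise keep only the top `items`
    if items:
        col_freq = col_freq.head(items)

    plot_bar(col_freq, title, fname)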


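def extract_csv(filepath):
    # Open the (last) .csv member inside a zip archive as a binary file object
    zp = ZipFile(filepath)
    csv = [f for f in zp.namelist() if os.path.splitext(f)[-1] == '.csv']
    return zp.open(csv.pop())

@contextmanager
def zip_csv_opener(filepath):
    # Yield a binary handle to the CSV whether filepath is a zip or a plain .csv
    fp = extract_csv(filepath) if is_zipfile(filepath) else open(filepath, 'rb')
    try:
        yield fp
    finally:
        fp.close()

def input_transformer(filepath):
    # Decode the bytes as UTF-8, use the first column as a parsed datetime
    # index, and treat the literal string 'NONE' as a missing value
    with zip_csv_opener(filepath) as fp:
        raw = fp.read().decode('utf-8')
    return pd.read_csv(io.StringIO(raw), parse_dates=True, index_col=0, na_values='NONE')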

df = input_transformer('../input/train.csv.zip')

plot_top_crimes(df, 'category',   'Top Crime Categories',        'category.png')
plot_top_crimes(df, 'resolution', 'Top Crime Resolutions',       'resolution.png')
plot_top_crimes(df, 'pddistrict', 'Police Department Activity',  'police.png')
plot_top_crimes(df, 'dayofweek',  'Days of the Week',            'weekly.png')
plot_top_crimes(df, 'address',    'Top Crime Locations',         'location.png', items=20)
plot_top_crimes(df, 'descript',   'Descriptions',                'descript.png', items=20)

Charts 1 to 6 (one bar chart per plot_top_crimes call above)
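The counting step behind plot_top_crimes can be tried without downloading the Kaggle data. Below is a minimal sketch on a hypothetical toy DataFrame; `toy` and the helper `top_counts` are illustrative names, not part of the original script, but they follow the same groupby-and-count idea: count rows per value, sort most frequent first, tidy the labels with capwords, and optionally keep only the top few.

```python
from string import capwords

import pandas as pd

# Hypothetical toy data standing in for the Kaggle training set
toy = pd.DataFrame({
    'category': ['LARCENY/THEFT', 'ASSAULT', 'LARCENY/THEFT',
                 'VANDALISM', 'ASSAULT', 'LARCENY/THEFT'],
})

def top_counts(df, column, items=0):
    """Count rows per value of `column`, most frequent first.

    Returns every value when items == 0, otherwise only the top `items`.
    """
    freq = df.groupby(column).size().sort_values(ascending=False)
    # capwords capitalizes each whitespace-separated word and lowercases the rest
    freq.index = freq.index.map(capwords)
    return freq.head(items) if items else freq

print(top_counts(toy, 'category'))
print(top_counts(toy, 'category', items=2))
```

Using head(items) on the already-sorted counts keeps the cutoff explicit: items=0 returns everything, and items=2 returns exactly two rows.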

5. Tableau

SF Crime Dashboard

<div class='tableauPlaceholder' style='width: 1024px; height: 694px;'>

<noscript>
    <a href='#'>
        <img alt='Dashboard ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Sa&#47;SanFranciscoCrimeClassification&#47;Dashboard&#47;1_rss.png' style='border: none' />
    </a>
</noscript>

<object class='tableauViz' width='1024' height='694' style='display:none;'>
    <param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' />
    <param name='site_root' value='' />
    <param name='name' value='SanFranciscoCrimeClassification&#47;Dashboard' />
    <param name='tabs' value='no' />
    <param name='toolbar' value='yes' />
    <param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Sa&#47;SanFranciscoCrimeClassification&#47;Dashboard&#47;1.png' />
    <param name='animate_transition' value='yes' />
    <param name='display_static_image' value='yes' />
    <param name='display_spinner' value='yes' />
    <param name='display_overlay' value='yes' />
    <param name='display_count' value='yes' />
    <param name='filter' value='%3Aembed=y' />
    <param name='filter' value='%3AshowVizHome=no' />
    <param name='filter' value='%3AshowTabs=y' />
    <param name='filter' value='%3Adisplay_count=y' />
    <param name='filter' value='%3Adisplay_static_image=y' />
    <param name='filter' value='%3Aretry=yes' />
</object>

</div>
<script type='text/javascript' src='https://public.tableau.com/javascripts/api/viz_v1.js'></script>
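The param values in the embed code above are escaped in two different ways: some use HTML character references (&#47; is a forward slash), others use percent-encoding (%3A is a colon, %2F a slash). A small Python sketch for reading them in plain form; the values are copied from the embed, and `decode_param` is a hypothetical helper, not part of Tableau's tooling.

```python
import html
from urllib.parse import unquote

# Attribute values copied from the <param> tags in the Tableau embed
params = {
    'host_url': 'https%3A%2F%2Fpublic.tableau.com%2F',
    'name': 'SanFranciscoCrimeClassification&#47;Dashboard',
    'filter': '%3AshowVizHome=no',
}

def decode_param(value):
    """Undo HTML character references (&#47; -> /), then percent-encoding (%3A -> :)."""
    return unquote(html.unescape(value))

decoded = {key: decode_param(value) for key, value in params.items()}
for key, value in decoded.items():
    print(f'{key}: {value}')
```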
