Dynamic time warping clustering with geo-referenced London crime data.
Crime analysis can assist the police in criminal apprehension, crime reduction, prevention, and evaluation. The three key types of information used to analyse crime data are sociodemographic, spatial, and temporal (Santos 2017). Crime mapping dates back to at least 1829, when Adriano Balbi and André Michel Guerry mapped educational levels against violent and property crime in France (Hunt 2019).
Spatial and temporal data are particularly important. As the environmental criminology theory of Paul and Patricia Brantingham states, crime is a complex event that requires a law, an offender, and a target to intersect at a place and time (Hunt 2019). This suggests that crimes may cluster spatially because offenders might have invested time and energy to master the “battleground,” or because there might be more susceptible targets there.
Finding these spatial crime clusters would therefore aid criminal apprehension, crime reduction, prevention, and evaluation. This analysis focuses on the crime recorded in London, which can be obtained from the London Datastore.
This assignment serves as groundwork for an interactive Shiny app. Its purpose is to explore the packages that can be used and to find out how interactivity can be added to the application to help users explore the data.
Ansari et al. (2019) identify six broad categories of spatiotemporal clustering.
As our data contains the total number of crimes of a particular type committed in a location per month, it is most appropriate for us to do geo-referenced time series clustering. We will identify boroughs in London that share similar patterns of crime rate increase and decrease. We can also compare the results of the geo-referenced time series clustering with regular clustering and geographically weighted clustering.
We will do the time series clustering using the dtwclust package in R. In the Shiny app, we can also compare the results with geospatial clustering packages, but this is outside the scope of this assignment; the geospatial clustering packages are explored separately by other contributors to the project.
Dynamic time warping (DTW) uses dynamic programming to find the optimum data points to compare in a time series (Sardá-Espinosa 2019). This means the comparison might not be one-to-one between points with identical observation times; instead, each point is matched to the point that produces the lowest distance. A graphical representation of the comparison is shown in Figure 1.
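To make this concrete, here is a minimal sketch (assuming the dtwclust package, which is loaded in the setup section below) comparing the DTW distance with a plain one-to-one Euclidean comparison for two short series, where one is a shifted copy of the other:
library(dtwclust)
x <- c(0, 1, 2, 3, 2, 1, 0)
y <- c(0, 0, 1, 2, 3, 2, 1) #same shape as x, shifted one step to the right
dtw_basic(x, y) #small distance: warping aligns the shifted peaks before comparing
sqrt(sum((x - y)^2)) #one-to-one Euclidean comparison yields a larger distance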
The dtwclust package used in this project was created by Alexis Sarda-Espinosa and is available on CRAN.
As the results of machine learning depend on the hyperparameters used in the setup, a lot of effort and time goes into iteratively trying and evaluating the results of different parameters (Li et al., n.d.). Traditionally, users would need to go through an iterative process of clustering: preparing the data, transforming it, performing the clustering analysis with arbitrary parameters, evaluating whether the results make sense, and going back to the data preparation or analysis steps to try other approaches in the hope of a better result. This process is time consuming and takes up a lot of resources. By incorporating visual analytics, we can provide faster feedback so users can better evaluate the results and choose the optimum parameters more easily (Ali et al. 2019; Li et al., n.d.).
Visual analytics, especially when interactive exploration is supported, is beneficial because it facilitates the collaboration between machine capabilities and human capabilities to explore, analyse, understand, and gain insights from data (Ali et al. 2019). Waehner (2016) further adds that visual analytics allows the users to perform analytics, data wrangling, and data discovery without any coding involved, thus democratising data analytics.
Following this trend, Ali et al. (2019) also found increasing usage of line plots, geographic maps, heat maps, histograms, and graphs for time series data. Several examples of interactive visual analytics are reviewed below.
The dtwclust package provides a sample Shiny app that can be launched using the interactive_clustering() function. This app is also publicly hosted on the shinyapps.io website.
This app provides an easy way to get a faceted time-series-by-cluster visualisation after taking the arguments from the user. However, it does not provide a way to evaluate the clusters formed visually other than the time series plot itself. The metrics are provided as text, and the clustering metrics can only be computed for the number of clusters specified in the plot. If we want to find the optimum number of clusters, we need to check different numbers and manually compare whether the evaluation metrics increase or decrease. Besides, because this package is focused on time series clustering, there is no map shown in the Shiny app.
SpatialEpiApp is a Shiny web application to visualise spatial and spatiotemporal clustering of disease data (Moraga 2017). However, the app is geared more towards event clustering than time series clustering, as it aims to find clusters of disease at certain locations and points in time. Like the dtwclust sample Shiny app, it is also available on the shinyapps.io website. The app consists of several interesting tabs for visual analytics:
- Shows the data and the results in the form of a map and a time series plot.
- Shows the maps and temporal trend plots.
- Shows the map and the time period of the clusters found.
The good thing about these three visualisations is that we can see both the spatial and temporal patterns in the data, because each provides both a map and the trend in the form of a parallel coordinate plot. However, the app has fewer parameters to change, and it still does not provide a convenient way to find the optimum number of clusters.
The packages that we are going to use in this analysis are in the table below.
Package | Description |
---|---|
dendextend | Functions to adjust the visualisation of a dendrogram object. |
dtwclust | Clustering using the DTW algorithm. |
egg | Extensions that help with customising ggplot2 objects. |
ggdendro | Functions to extract the data from a dendrogram object for plotting with ggplot2. |
RColorBrewer | Provides colour palettes. |
ggthemes | Adds extra themes, geoms, and scales for ggplot2. |
lubridate | Functions to deal with date-time data. |
plotly | Creates interactive versions of ggplot2 graphs using the plotly.js library. |
sf | A way to encode spatial vector data in the simple features format. |
tidyverse | A set of packages for data science built around the tidy data format. |
timetk | Toolkit for working with time series in R. |
tmap | Flexible visualisation package for geographical maps. |
zoo | Z’s ordered observations, for dealing with regular and irregular time series. |
We need to set up the environment and load all the required packages first. The code chunk below will install the packages if they are not yet installed, and load them into the environment.
packages = c("tidyverse","timetk", "egg", "lubridate","zoo","sf", "tmap","dtwclust","plotly", "ggthemes","dendextend","ggdendro", "RColorBrewer")
for (p in packages){
if (!require(p,character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The data is obtained from the London Datastore website (2021a). The website provides the monthly number of crimes recorded by ward and by borough. We will need other explanatory variables for the Shiny application, such as population and income. However, since these are only available at borough level, we will only look at the borough-level crime data.
The data is separated into 2 files: one for the 2008 to 2018 time period, and another for 2019 to 2020. Because there are some discrepancies in crime category naming between the two files, and 2020 is an unusual year due to COVID-19, only the historic data will be used. We will use the read_csv() function to load the data from CSV format, and the glimpse() function to take a quick look at the structure and values of each column.
crime <- read_csv('./data/aspatial/MPS_Borough_Level_Crime_Historic.csv')
The data is currently in a wide format, so we will use the pivot_longer() function from tidyverse (the successor to gather()) to transform it into a long format, turning the column names into a Date column and the values into a Count column.
crime_long <- crime %>%
pivot_longer(cols = starts_with("20"), #pivot only columns that starts with "20" because it contains the values
names_to="Date",
values_to = "Count")
glimpse(crime_long)
Rows: 139,392
Columns: 5
$ Borough <chr> "Barking and Dagenham", "Barking and Dag...
$ `Major Category` <chr> "Burglary", "Burglary", "Burglary", "Bur...
$ `Minor Category` <chr> "Burglary in a Dwelling", "Burglary in a...
$ Date <chr> "200801", "200802", "200803", "200804", ...
$ Count <dbl> 82, 71, 87, 74, 74, 103, 113, 96, 95, 10...
Notice that the date is still in string format. We can use the ymd() function from the lubridate package to convert year-month-date formatted strings or numbers into dates. However, since our data only has the year and month, we need to use the paste0() function to append “01” at the end so the format can be recognised.
crime_long$Date <- ymd(paste0(crime_long$Date,"01"))
glimpse(crime_long)
Rows: 139,392
Columns: 5
$ Borough <chr> "Barking and Dagenham", "Barking and Dag...
$ `Major Category` <chr> "Burglary", "Burglary", "Burglary", "Bur...
$ `Minor Category` <chr> "Burglary in a Dwelling", "Burglary in a...
$ Date <date> 2008-01-01, 2008-02-01, 2008-03-01, 200...
$ Count <dbl> 82, 71, 87, 74, 74, 103, 113, 96, 95, 10...
To explore the data, we can use the timetk package, which extends tidyverse to deal with time series data. This package also acts as a plotly wrapper, enabling interactive time series plots that are useful for exploring the data.
First, we are going to look at the time series at the lowest level of aggregation in the data, which is by Minor Category. We will choose two boroughs arbitrarily just to explore the data.
crime_long %>%
filter(Borough %in% c("Westminster", "Barking and Dagenham")) %>% #%in%, not ==, so all rows of both boroughs are kept
plot_time_series(Date,Count,
.color_var=`Minor Category`,
.facet_vars=Borough,
.facet_ncol = 2,
.facet_scales="fixed",
.title = "Crime Minor Category Count Time Series",
.x_lab = "Date",
.y_lab = "Count",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
Just based on these two time series, we can see that the crime pattern in these two boroughs is very different. The count of recorded crime for the Other Theft minor category is much higher than the other categories in Westminster, while it is similar to the other crimes in Barking and Dagenham.
We can also aggregate the data to see the total for each major category. The level of aggregation can be exposed as a toggle for users later in the Shiny app.
crime_major <- crime_long %>%
group_by(Date, Borough, `Major Category`) %>%
summarise(across(Count,sum)) %>%
ungroup()
glimpse(crime_major)
Rows: 38,016
Columns: 4
$ Date <date> 2008-01-01, 2008-01-01, 2008-01-01, 200...
$ Borough <chr> "Barking and Dagenham", "Barking and Dag...
$ `Major Category` <chr> "Burglary", "Criminal Damage", "Drugs", ...
$ Count <dbl> 141, 294, 102, 93, 16, 54, 19, 518, 449,...
crime_major %>%
filter(Borough %in% c("Westminster","Barking and Dagenham")) %>%
plot_time_series(Date,Count,
.color_var=`Major Category`,
.facet_vars=Borough,
.facet_ncol = 2,
.facet_scales="fixed",
.title = "Crime Major Category Count Time Series",
.x_lab = "Date",
.y_lab = "Count",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
An interesting insight revealed by these aggregated time series is that the number of fraud or forgery cases drastically decreased to fewer than 10 cases per month from 2013. There is also a decreasing trend in theft cases from 2012 in both boroughs, but it increased again from 2017 in Westminster. The slope of the change is also much steeper in Westminster.
However, taking the raw count as it is might make for an unfair comparison. In a borough with a larger population, the number of recorded crime cases will naturally be higher. Therefore, we will normalise the crime count by the population of each borough.
First, we need to read the data and join it to the crime time series data. Don’t forget to convert the Year column to a date field.
pop <- read_csv('./data/aspatial/housing-density-borough.csv')
pop$Year <- ymd(paste0(pop$Year,"-01-01"))
glimpse(pop)
Rows: 1,872
Columns: 10
$ Code <chr> "E09000001", "E09000001",...
$ Name <chr> "City of London", "City o...
$ Year <date> 1999-01-01, 2000-01-01, ...
$ Source <chr> "ONS MYE", "ONS MYE", "ON...
$ Population <dbl> 6581, 7014, 7359, 7280, 7...
$ `Inland_Area _Hectares` <dbl> 290.4, 290.4, 290.4, 290....
$ Total_Area_Hectares <dbl> 314.9, 314.9, 314.9, 314....
$ Population_per_hectare <dbl> 22.7, 24.2, 25.3, 25.1, 2...
$ Square_Kilometres <dbl> 2.9, 2.9, 2.9, 2.9, 2.9, ...
$ Population_per_square_kilometre <dbl> 2266.2, 2415.3, 2534.1, 2...
The population data is yearly, but we need monthly data because we will divide the count of crimes by the borough population (in millions). We will join the population data with a monthly date series and do linear interpolation. Note that we need to start the date sequence from the beginning of the year and extend it one year past the last date so that the interpolation covers the full range. lubridate also provides other functions to wrangle time data, such as replacing elements of a date and getting a relative date using the %m+% operator.
# Create monthly date
start_date <- min(crime_major$Date)
month(start_date) <- 1
end_date <- max(crime_major$Date)
month(end_date) <- 1
end_date <- end_date %m+% years(1)
monthly_dates <- seq(from=start_date,
to=end_date,
by="month")
# Cross join the date and borough
monthly_borough <- merge(monthly_dates, unique(pop$Name), by=NULL) %>%
rename(Date = x,
Borough = y)
glimpse(monthly_borough)
Rows: 4,921
Columns: 2
$ Date <date> 2008-01-01, 2008-02-01, 2008-03-01, 2008-04-01, ...
$ Borough <chr> "City of London", "City of London", "City of Lond...
We are going to use zoo package to do linear interpolation on the data. First we will join the monthly time series with the population, then we will group by the borough before doing the linear interpolation.
monthly_pop <- left_join(monthly_borough,pop,
by = c("Borough"="Name","Date"="Year")) %>%
select(Date,Borough,Population) %>%
group_by(Borough) %>%
mutate_at("Population",na.approx, na.rm=FALSE)%>%
ungroup()
#Plot the time series to check
monthly_pop %>%
plot_time_series(Date,Population,
.color_var=Borough,
.title = "Interpolated Monthly Population",
.x_lab = "Date",
.y_lab = "Population",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
The data points are nicely interpolated, but there is a break in the line for London. This suggests the name of the borough changed between 2010 and 2011. Let’s check which names are used in the crime data.
unique(crime_major$Borough)
[1] "Barking and Dagenham" "Barnet"
[3] "Bexley" "Brent"
[5] "Bromley" "Camden"
[7] "Croydon" "Ealing"
[9] "Enfield" "Greenwich"
[11] "Hackney" "Hammersmith and Fulham"
[13] "Haringey" "Harrow"
[15] "Havering" "Hillingdon"
[17] "Hounslow" "Islington"
[19] "Kensington and Chelsea" "Kingston upon Thames"
[21] "Lambeth" "Lewisham"
[23] "Merton" "Newham"
[25] "Redbridge" "Richmond upon Thames"
[27] "Southwark" "Sutton"
[29] "Tower Hamlets" "Waltham Forest"
[31] "Wandsworth" "Westminster"
Since we don’t have data for the London borough in the crime data, we can ignore the inconsistent naming; these rows will be dropped later.
Now we are going to join the monthly population with the major-category crime data so we can normalise the counts. But first, let’s check that the borough naming is consistent between the crime data and the population data.
#Borough name only exists in crime data
unique(crime_major$Borough[!(crime_major$Borough %in% pop$Name)])
character(0)
#Borough name only exists in pop data
unique(pop$Name[!(pop$Name %in% crime_major$Borough)])
[1] "City of London" "London" "Greater London"
[4] "Inner London" "Outer London"
The names are consistent; the population data simply has extra rows for the City of London and the London-wide aggregates, which have no crime counts. We can safely join the two dataframes using the left_join() function from tidyverse.
crime_rate <- left_join(crime_major,monthly_pop,
by = c("Date"="Date","Borough"="Borough")) %>%
mutate(Crime_Rate = Count/Population*10^6)
glimpse(crime_rate)
Rows: 38,016
Columns: 6
$ Date <date> 2008-01-01, 2008-01-01, 2008-01-01, 200...
$ Borough <chr> "Barking and Dagenham", "Barking and Dag...
$ `Major Category` <chr> "Burglary", "Criminal Damage", "Drugs", ...
$ Count <dbl> 141, 294, 102, 93, 16, 54, 19, 518, 449,...
$ Population <dbl> 172452, 172452, 172452, 172452, 172452, ...
$ Crime_Rate <dbl> 817.61882, 1704.82221, 591.46893, 539.28...
Let’s compare the crime rates through the years for the boroughs with the highest and the lowest population.
#Borough with highest population
crime_rate[crime_rate$Population==max(crime_rate$Population),]
# A tibble: 9 x 6
Date Borough `Major Category` Count Population Crime_Rate
<date> <chr> <chr> <dbl> <dbl> <dbl>
1 2018-12-01 Barnet Burglary 369 401920. 918.
2 2018-12-01 Barnet Criminal Damage 177 401920. 440.
3 2018-12-01 Barnet Drugs 67 401920. 167.
4 2018-12-01 Barnet Fraud or Forgery 5 401920. 12.4
5 2018-12-01 Barnet Other Notifiable Off~ 50 401920. 124.
6 2018-12-01 Barnet Robbery 68 401920. 169.
7 2018-12-01 Barnet Sexual Offences 29 401920. 72.2
8 2018-12-01 Barnet Theft and Handling 928 401920. 2309.
9 2018-12-01 Barnet Violence Against the~ 723 401920. 1799.
#Borough with lowest population
crime_rate[crime_rate$Population==min(crime_rate$Population),]
# A tibble: 9 x 6
Date Borough `Major Category` Count Population Crime_Rate
<date> <chr> <chr> <dbl> <dbl> <dbl>
1 2013-01-01 Kensingto~ Burglary 163 155995 1045.
2 2013-01-01 Kensingto~ Criminal Damage 77 155995 494.
3 2013-01-01 Kensingto~ Drugs 108 155995 692.
4 2013-01-01 Kensingto~ Fraud or Forgery 88 155995 564.
5 2013-01-01 Kensingto~ Other Notifiable ~ 12 155995 76.9
6 2013-01-01 Kensingto~ Robbery 36 155995 231.
7 2013-01-01 Kensingto~ Sexual Offences 17 155995 109.
8 2013-01-01 Kensingto~ Theft and Handling 955 155995 6122.
9 2013-01-01 Kensingto~ Violence Against ~ 207 155995 1327.
We can plot the time series to allow easier comparison.
crime_rate %>%
filter(Borough %in% c("Barnet","Kensington and Chelsea")) %>%
plot_time_series(Date,Count,
.color_var=`Major Category`,
.facet_vars=Borough,
.facet_ncol = 2,
.facet_scales="fixed",
.title = "Crime Major Category Count Time Series",
.x_lab = "Date",
.y_lab = "Crime Count",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
At first glance, it seems that Barnet has higher crime counts in general, except for the Theft and Handling category, which is roughly the same. Now let’s compare with the time series of the rate, normalised by population.
crime_rate %>%
filter(Borough %in% c("Barnet","Kensington and Chelsea")) %>%
plot_time_series(Date,Crime_Rate,
.color_var=`Major Category`,
.facet_vars=Borough,
.facet_ncol = 2,
.facet_scales="fixed",
.title = "Crime Major Category Rate Time Series",
.x_lab = "Date",
.y_lab = "Crime/1 million population",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
After normalising by population, the crime rates for Criminal Damage, Robbery, and Fraud or Forgery turn out to be similar in the two boroughs. However, Kensington and Chelsea has a significantly higher rate of Theft and Handling compared to Barnet.
We can use a map to see the distribution of recorded crime across the boroughs. To do so, we need the geospatial data that contains the polygon for each borough, also obtained from the London Datastore (2021b). Using the st_read() function, the file is read in the simple features format.
mp_borough <- st_read('./data/geospatial/statistical-gis-boundaries-london/ESRI',
layer = 'London_Borough_Excluding_MHW')
Reading layer `London_Borough_Excluding_MHW' from data source `D:\Uni\Masters\ISSS608 Visual Analytics for Applications\Prime Crime Project\Prime-Crime-Project-Weblog\_posts\2021-04-11-assignment-dtw-clust\data\geospatial\statistical-gis-boundaries-london\ESRI' using driver `ESRI Shapefile'
Simple feature collection with 33 features and 7 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
projected CRS: OSGB 1936 / British National Grid
glimpse(mp_borough)
Rows: 33
Columns: 8
$ NAME <chr> "Kingston upon Thames", "Croydon", "Bromley", ...
$ GSS_CODE <chr> "E09000021", "E09000008", "E09000006", "E09000...
$ HECTARES <dbl> 3726.117, 8649.441, 15013.487, 5658.541, 5554....
$ NONLD_AREA <dbl> 0.000, 0.000, 0.000, 60.755, 0.000, 210.763, 0...
$ ONS_INNER <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "...
$ SUB_2009 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ SUB_2006 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((516401.6 16..., M...
To draw the map, we will take the average of recorded crime for each major crime category. Then, we will join the crime data with the geospatial data using the left_join() function. As in the previous example, don’t forget to check that there are no discrepancies in the keys used to join the data tables. A faster way to check is the setdiff() function, which is based on set operations; note that setdiff() is asymmetric, returning only elements of its first argument that are missing from the second, so we may need to check both directions.
#Check difference in borough names
setdiff(mp_borough$NAME, crime_major$Borough)
[1] "City of London"
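We can run the check in the other direction as well; an empty result would confirm that every borough in the crime data has a matching polygon in the map:
#Check boroughs present in the crime data but missing from the map
setdiff(crime_major$Borough, mp_borough$NAME)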
#Compute average rate
avg_crime <- crime_rate %>%
group_by(Borough, `Major Category`) %>%
summarise(AvgCrimeRate = mean(Crime_Rate),
AvgCount = mean(Count),
AvgPop = mean(Population)) %>%
ungroup()
#Join map and data
crime_map <- left_join(mp_borough, avg_crime,
by = c("NAME"="Borough"))
glimpse(crime_map)
Rows: 289
Columns: 12
$ NAME <chr> "Kingston upon Thames", "Kingston upon T...
$ GSS_CODE <chr> "E09000021", "E09000021", "E09000021", "...
$ HECTARES <dbl> 3726.117, 3726.117, 3726.117, 3726.117, ...
$ NONLD_AREA <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ ONS_INNER <chr> "F", "F", "F", "F", "F", "F", "F", "F", ...
$ SUB_2009 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ SUB_2006 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
$ `Major Category` <chr> "Burglary", "Criminal Damage", "Drugs", ...
$ AvgCrimeRate <dbl> 579.53696, 580.49709, 311.18471, 111.995...
$ AvgCount <dbl> 97.20455, 96.70455, 52.37879, 18.01515, ...
$ AvgPop <dbl> 168378.8, 168378.8, 168378.8, 168378.8, ...
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((516401.6 16...
tmap provides a way to visualise the data with a grammar-of-graphics syntax, so transitioning from ggplot2 to tmap is rather intuitive. We will create choropleth maps using the tm_fill() function and examine whether there are any differences in the choropleth when using the raw count versus the crime rate normalised by population.
#Map of raw crime count
crime_count_map <- crime_map %>%
filter(`Major Category`=="Theft and Handling") %>%
tm_shape() +
tm_fill("AvgCount",
title="Avg Count",
style="quantile",
palette="Reds") +
tm_layout(legend.outside = TRUE)
#Map of crime/population in millions
crime_rate_map <- crime_map %>%
filter(`Major Category`=="Theft and Handling") %>%
tm_shape() +
tm_fill("AvgCrimeRate",
title="Avg Crime Rate",
style="quantile",
palette="Reds") +
tm_layout(legend.outside = TRUE)
#Map of average population
population_map <- crime_map %>%
filter(`Major Category`=="Theft and Handling") %>%
tm_shape() +
tm_fill("AvgPop",
title="Avg Population",
style="quantile",
palette="Reds") +
tm_layout(legend.outside = TRUE)
#Arrange the three maps
tmap_arrange(crime_count_map, crime_rate_map, population_map)
The missing value in the middle corresponds to the City of London area, which does not have any crime counts recorded in our data.
The intensity of the colours changes when we use the crime rate rather than the raw count, because the population used as the denominator differs between boroughs.
tmap also provides a wrapper for leaflet that enables an interactive widget for the map. We can use tmap_mode("view") to explore the data interactively.
tmap_mode("view") #switch to interactive viewing mode
crime_rate_map #printing the map now renders a leaflet widget
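Since tmap keeps the mode as a global option, we can switch back to static plotting afterwards so the later maps render as before:
tmap_mode("plot")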
The dtwclust package enables time series clustering using dynamic time warping through the tsclust() function. Before we can use the function, we need to transform our data. We need to provide a matrix with each time period as a column and each series (borough) as a row. This also means we cannot keep multiple series per borough, such as the different crime categories, in a single matrix. The category can be exposed to the user later in the Shiny app as a filter, but we will examine Theft and Handling first since it has the highest number of records compared to the other categories.
We will use the pivot_wider() function from tidyverse to cast the long table into a wide format. This function works similarly to spread(). We also need to set the Borough as the row names, using the column_to_rownames() function, so we can trace back which borough each time series belongs to after we do the clustering.
selected_category <- "Theft and Handling"
crime_rate_wide <- crime_rate %>%
filter(`Major Category`==selected_category) %>%
select(Date, Borough, Crime_Rate) %>%
pivot_wider(names_from = Date,
values_from = Crime_Rate)%>%
column_to_rownames(var="Borough")
tsclust() allows several types of clustering; here we focus on partitional and hierarchical. We can also directly plot the results of the clustering. By default, the partitional method will create a faceted time series plot by cluster, while a dendrogram will be created if the clustering type chosen is hierarchical.
cluster_partitional <- tsclust(crime_rate_wide,
type="partitional")
plot(cluster_partitional)
cluster_hierarchical <- tsclust(crime_rate_wide,
type="hierarchical")
plot(cluster_hierarchical)
We will now examine the arguments that can be passed to the function to improve the results of the clustering. The available arguments are shown in Figure 6.
To choose the number of clusters, we will show the silhouette plot for various numbers of clusters, which can be computed using the cvi() function in dtwclust. The package also has a function called compare_clusterings() which can automatically evaluate different parameters and return the best configurations; however, we want to give users the freedom to choose the number of clusters they want, so we will not use it.
Fuzzy clustering is a type of partitional clustering that allows overlapping partitions, so each member has a probability of belonging to each cluster (Sardá-Espinosa 2019). This method can be included in the Shiny app if there is a way to dynamically change the select box options, as it only accepts “fcm” and “fcmdd” as its centroid argument.
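As an illustration only (fuzzy clustering is not pursued further in this assignment), a minimal sketch of a fuzzy run on the crime_rate_wide matrix prepared earlier; the choice of k = 2 here is arbitrary:
#Illustrative only: fuzzy clustering returns a membership matrix
cluster_fuzzy <- tsclust(crime_rate_wide,
type="fuzzy",
k=2L,
centroid="fcm",
seed=123)
round(head(cluster_fuzzy@fcluster), 3) #rows are boroughs, columns are clusters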
#Testing different arguments
cluster_partitional <- tsclust(crime_rate_wide,
type="partitional",
distance = "gak",
centroid = "shape",
preproc = NULL)
plot(cluster_partitional)
As partitional clustering does not produce a dendrogram, we will use a silhouette plot to determine the optimum number of clusters. The silhouette value measures the similarity of objects within a cluster compared to other clusters, with a higher value indicating higher cohesion within the cluster and higher separation from the other clusters (Kumar 2020). The silhouette method is preferable to the elbow method because we only need to find the maximum value instead of guessing where the elbow bends.
# Create 2 to 20 clusters
cluster_partitional <- tsclust(crime_rate_wide,
k=2L:20L,
distance="dtw_basic",
type="partitional",
centroid="shape",
preproc=zscore,
seed = 123)
# Compute silhouette coefficient for all number of clusters
metrics <- data.frame(sapply(cluster_partitional, cvi, type="Sil"))
names(metrics) <- "Sil"
metrics$k <- 2L:20L
# Get the highest silhouette coefficient
max <- max(metrics$Sil)
best_k <- metrics%>%
filter(Sil==max)
silhouette_plot <- metrics %>% ggplot(aes(x=k,
y=Sil,
group=1)) +
geom_line() +
geom_point() +
geom_vline(xintercept = best_k$k,
linetype = "dashed",
color="red") +
ggtitle("Silhouette Plot")+
theme_classic()+
scale_x_continuous(breaks=seq(2,20,2))
silhouette_plot
Next, we can wrap this visualisation with plotly to allow the users to hover over the points and get the actual values of Silhouette for each point.
ggplotly(silhouette_plot)
We will use the timetk package to make the default plot interactive. First, we need to get the cluster assignment for each borough.
#Use the matrix row names so the borough order matches the cluster vector;
#k runs from 2, so list element (best_k$k - 1) holds the best clustering
cluster_assignment_partitional <- data.frame(Borough=rownames(crime_rate_wide),
cluster=cluster_partitional[[best_k$k - 1]]@cluster)
crime_rate_cluster <- crime_rate %>%
filter(`Major Category` == selected_category) %>%
left_join(cluster_assignment_partitional,
by=c("Borough"="Borough"))
crime_rate_cluster %>%
plot_time_series(Date,
Crime_Rate,
.color_var=Borough,
.line_alpha=1,
.line_size=0.2,
.facet_vars=cluster,
.facet_ncol = min(best_k$k,5),
.facet_scales="fixed",
.title = "Crime Time Series by Cluster",
.x_lab = "Date",
.y_lab = "Crime Count/1million Population",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
This package allows us to dynamically select, zoom, and hover over the data points for more information. This is more useful than the default static plot, but the disadvantage is that we lose the centroid information.
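That said, the centroid shapes can still be recovered from the TSClusters object and plotted separately. A small sketch (note that because of preproc = zscore, the centroids are on the z-normalised scale rather than the raw crime-rate scale):
#Retrieve the clustering object for the chosen k (the list starts at k = 2)
best_cluster <- cluster_partitional[[best_k$k - 1]]
#The @centroids slot holds one prototype series per cluster
str(best_cluster@centroids, max.level = 1)
#dtwclust can plot just the centroids
plot(best_cluster, type = "centroids")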
This divides the boroughs into 2 clusters. The first cluster has a decreasing trend from 2012 that reversed in 2016; although the magnitude varies across the boroughs in cluster 1, the general trend remains the same. For the second cluster, the trend seems to be flat, but there is some seasonality: the time series has peaks and troughs that repeat on a yearly basis.
crime_cluster_map <- left_join(mp_borough,cluster_assignment_partitional,
by=c("NAME"="Borough"))
#Create colour palette
colour_pal <- brewer.pal(n=9, #Set1 supports at most 9 colours
name="Set1")
tm_shape(crime_cluster_map) +
tm_fill("cluster",
title=paste(selected_category,"Cluster Map"),
style="cat",
palette=colour_pal) +
tm_borders()+
tm_layout(legend.outside = TRUE)
Members of cluster 2 are dispersed in various parts of London, but they appear to be in the outskirts.
For hierarchical clustering, we will show a dendrogram instead of a silhouette plot to allow users to select the optimum number of clusters. A dendrogram lets users see the similarity between different boroughs as well as the distance between different clusters.
Several packages can handle dendrogram data; we will review the dendextend and ggdendro packages to pick the one to be used in the Shiny app.
dendextend provides various functions to work with dendrogram data, and it is tidyverse-friendly, so we can use pipe (%>%) operations. By default, we can use the plot() function from base R to create the dendrogram.
To alter the visualisation of the dendrogram, we use the set() function, which takes three arguments: the dendrogram object, the attribute to change (what), and the value to set it to. The available attributes are shown in the following picture:
Hierarchical clustering does not need the centroid argument, but even if we provide it, the function will still run properly; it will just give a warning that the argument is ignored.
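A quick sanity check of this behaviour on the data prepared earlier (the centroid value is arbitrary and only serves to trigger the warning described above):
#Hierarchical clustering ignores centroid; tsclust() should warn and proceed
tsclust(crime_rate_wide,
type="h",
centroid="mean")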
cluster_hierarchical <- tsclust(crime_rate_wide,
type="h",
distance = "sdtw",
preproc = zscore)
dendextend_data <- as.dendrogram(cluster_hierarchical)
cut <- 2
dendextend_plot <- cluster_hierarchical %>%
as.dendrogram() %>%
set("branches_k_color",k=cut,value=colour_pal) %>%
set("branches_lwd",1) %>%
set("labels_col",k=cut,value=colour_pal)%>%
set("labels_cex", 0.6)
plot(dendextend_plot,
main = "Dendextend Dendrogram",
horiz=TRUE)
This package provides a very convenient way to customise how the dendrogram is visualised, but there is a problem with the tree when there are too many data points: the graph becomes so small that the labels overlap. Reducing the label size helps a little, but the problem will worsen as the number of objects to be clustered increases. Galili (2020), the creator of the package, mentions that changing the spacing between labels is not implemented. Furthermore, the labels are cropped and not fully shown.
Another workaround we can try is to change the aspect ratio after converting to a ggplot object using the as.ggdend() function available in dendextend.
ggdend_data <- as.ggdend(dendextend_plot)
ggplot(ggdend_data, horiz=TRUE) +
labs(title="as.ggdend() Dendrogram")+
scale_y_reverse(expand=c(0.6,3))
Changing the aspect ratio does not help much, but scaling the y-axis does. However, the labels are now placed directly at the end of the branches.
For the ggdendro package, we first need to convert the data into a suitable format using dendro_data(), then we can use the ggdendrogram() function to create the visualisation.
ggdendro_data <- dendro_data(as.dendrogram(cluster_hierarchical))
ggdendrogram(ggdendro_data,
rotate=TRUE)
Notice that there are some differences between the dendextend and ggdendro visualisations. dendextend shows the segment labels on the right side of the tree, while ggdendro shows them on the left. ggdendro also automatically adjusts the space between the labels and branches. However, it is less convenient to alter the visualisation: we need to extract the components of the tree, namely “segments,” “labels,” “leaf_labels,” and “class,” to customise them.
dendro_data() extracts the object into segment and label data that we can refer to when customising the visualisation. However, the cluster assignment is not in this object, so we need to use cutree() from the stats package and then join the cluster assignment to the dendro_data() output. The code below to customise the dendrogram is adapted from Sandy Muspratt’s answer to this StackOverflow question.
cluster_assignment_hierarchical <- cutree(cluster_hierarchical,
k=cut)
clust.df <- data.frame(label=names(cluster_assignment_hierarchical), cluster=factor(cluster_assignment_hierarchical))
ggdendro_data <- dendro_data(cluster_hierarchical,
type="rectangle")
#Split coloured and uncoloured segments
height <- unique(ggdendro_data$segments$y)[order(unique(ggdendro_data$segments$y),
decreasing = TRUE)]
cut.height <- mean(c(height[cut], height[cut-1]))
ggdendro_data$segments$line <- ifelse(ggdendro_data$segments$y == ggdendro_data$segments$yend &
ggdendro_data$segments$y > cut.height, 1, 2)
ggdendro_data$segments$line <- ifelse(ggdendro_data$segments$yend > cut.height, 1, ggdendro_data$segments$line)
# Number the clusters
ggdendro_data$segments$cluster <- c(-1, diff(ggdendro_data$segments$line))
change <- which(ggdendro_data$segments$cluster == 1)
for (i in 1:cut) ggdendro_data$segments$cluster[change[i]] = i + 1
ggdendro_data$segments$cluster <- ifelse(ggdendro_data$segments$line == 1, 1,
ifelse(ggdendro_data$segments$cluster == 0, NA, ggdendro_data$segments$cluster))
ggdendro_data$segments$cluster <- na.locf(ggdendro_data$segments$cluster)
# Consistent numbering between segment$cluster and label$cluster
ggdendro_data$labels$label <- factor(ggdendro_data$labels$label)
clust.df$label <- factor(clust.df$label, levels = levels(ggdendro_data$labels$label))
clust.df <- arrange(clust.df, label)
clust.df$cluster <- factor((clust.df$cluster), levels = rev(unique(clust.df$cluster)), labels = (1:cut) + 1)
ggdendro_data[["labels"]] <- merge(ggdendro_data[["labels"]], clust.df, by = "label")
# Positions for cluster labels
n.rle <- rle(ggdendro_data$segments$cluster)
N <- cumsum(n.rle$lengths)
N <- N[seq(1, length(N), 2)] + 1
N.df <- ggdendro_data$segments[N, ]
N.df$cluster <- N.df$cluster - 1
ggdendro_plot <- ggplot() +
geom_segment(data = segment(ggdendro_data),
aes(x = x,
y = y,
xend = xend,
yend = yend,
size = factor(line),
colour = factor(cluster)),
show.legend = FALSE)+
scale_colour_manual(values = c("grey60",
brewer.pal(n = cut,
name = "Set1")))+
scale_size_manual(values=c(.1,1))+
geom_text(data = label(ggdendro_data),
aes(x = x,
y = y,
label = label,
color=factor(cluster)),
hjust = -0.2,
size=2,
show.legend = FALSE)+
coord_flip() +
scale_y_reverse(expand=c(0.8, 3))+
theme_dendro()
ggdendro_plot
There is a new problem with this visualisation: the labels are not aligned properly. Using dendextend’s as.ggdend() function still yields a better result.
As we are only interested in seeing how the different boroughs are clustered, we do not need to wrap this visualisation with plotly.
We will use the same function that plotted the partitional cluster time series, only changing the data source and the number of facet columns.
cluster_assignment_hierarchical <- data.frame(Borough=unique(crime_rate$Borough),
cluster=cutree(cluster_hierarchical,
k=cut))
crime_rate_cluster <- crime_rate %>%
filter(`Major Category` == selected_category) %>%
left_join(cluster_assignment_hierarchical,
by=c("Borough"="Borough"))
crime_rate_cluster %>%
plot_time_series(Date,
Crime_Rate,
.color_var=Borough,
.line_alpha=1,
.line_size=0.2,
.facet_vars=cluster,
.facet_ncol = min(cut,5),
.facet_scales="fixed",
.title = "Crime Time Series by Cluster",
.x_lab = "Date",
.y_lab = "Crime Count/1million Population",
.smooth = FALSE,
.legend_show = FALSE,
.interactive=TRUE)
We still see the same pattern where cluster 1 experiences a downward turn in 2012 and an upward turn in 2016, while cluster 2 remains relatively stable. However, there seem to be fewer members in the second cluster.
crime_cluster_map <- left_join(mp_borough,cluster_assignment_hierarchical,
by=c("NAME"="Borough"))
tm_shape(crime_cluster_map) +
tm_fill("cluster",
title=paste(selected_category,"Cluster Map"),
style="cat",
palette=colour_pal) +
tm_borders()+
tm_layout(legend.outside = TRUE)
Boroughs in cluster 2 under the hierarchical method are located in the north-western area of London. They are also on the outskirts, further away from central London.
This method will not be used because it requires arguments that are very different from the other methods: we need to specify a cutoff distance, as well as use distance measures that require a window-size argument.
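This description matches dtwclust’s TADPole clustering. For reference only, a minimal sketch of how it would be invoked; the dc (cutoff distance) and window.size values below are arbitrary placeholders rather than tuned values:
#Sketch only: dc and window.size are arbitrary placeholders
cluster_tadpole <- tsclust(crime_rate_wide,
k=2L,
type="tadpole",
control=tadpole_control(dc=1.5,
window.size=12L))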
The proposed interactive visualisation is a dashboard that contains the map of the clusters, time series by cluster, as well as a plot to help justify the model, such as dendrogram or a silhouette plot.
If the user chooses a hierarchical clustering method, then the following will be shown.
For partitional clustering, the dendrogram will be replaced by a silhouette plot, as shown below.
The values that the user can change for the clustering are shown in the following table.
Arguments | Description |
---|---|
series | Data in the form of a list of series, a numeric matrix, or a data frame where each series is a row and the columns are the time periods. The users can choose the geospatial aggregation level (borough or ward), temporal aggregation level (monthly or yearly), crime aggregation level (major or minor category, and which category), and crime measure (raw count or normalised by population). |
type | Type of clustering method to use. [“partitional,” “hierarchical,” “fuzzy”] |
k | Number of clusters. |
preproc | Function to pre-process the data; the default is zscore() if centroid = “shape.” [“No preprocessing,” “zscore”] |
distance | Distance measure for the dissimilarity between two time series. [“dtw,” “dtw2,” “dtw_basic,” “dtw_lb,” “lbk,” “sbd,” “gak,” “sdtw”] |
centroid | A string or function to calculate the centroid. Fuzzy clustering uses the standard fuzzy c-means centroid by default. [“mean,” “median,” “shape,” “dba,” “sdtw_cent,” “pam,” “fcm,” “fcmdd”] |
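To give a sense of how these arguments could be wired into the app, below is a minimal sketch of the corresponding Shiny inputs; all input IDs and labels are illustrative rather than taken from the actual app:
library(shiny)
#Illustrative input controls mirroring the arguments table above
ui_controls <- tagList(
selectInput("type", "Clustering type",
choices = c("partitional", "hierarchical", "fuzzy")),
sliderInput("k", "Number of clusters (k)",
min = 2, max = 20, value = 2),
selectInput("preproc", "Preprocessing",
choices = c("No preprocessing", "zscore")),
selectInput("distance", "Distance measure",
choices = c("dtw", "dtw2", "dtw_basic", "dtw_lb",
"lbk", "sbd", "gak", "sdtw")),
selectInput("centroid", "Centroid",
choices = c("mean", "median", "shape", "dba",
"sdtw_cent", "pam"))
)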
This section summarises the advantages of incorporating interactive visual analytics from the literature review and provides an example specific to the proposed Shiny app.
1. Easier to find optimum cluster:
As there are many possible permutations of arguments we can pass to the clustering algorithm, finding an optimum cluster can be quite a hassle. Interactive visual analytics helps the user evaluate the results easily, as well as quickly change the arguments with only a few mouse clicks. For example, the user can try pre-processing with z-score and quickly evaluate whether the result is better or worse; if it is worse, the user can tune other arguments without having to try the z-score again.
2. Ad hoc drill-down of information:
When users find an interesting insight they want to drill into further, they can hover over the data points to get more information as needed. For example, a user can hover over the map to get the name of a borough of interest, without cluttering the map with all the borough names. The user can also see the dates and values that correspond to an observation in the time series, making it easier to spot seasonal patterns.
3. Democratising Data Analytics:
The arguments will be exposed to the user through a point-and-click web interface, so only clicks are needed to get the results of the analytics and everyone can use the application. This empowers users to be self-sufficient in gaining insights from the data, without requiring coding skills.