Overview
Low-cost, ubiquitous sensors in city infrastructure provide highly granular data at unprecedented spatiotemporal scales. “Smart Cities” aim to use these data to create a healthy, happy and sustainable urban ecosystem by integrating information and communication technology (ICT), the Internet of Things (IoT) and citizen participation to manage city infrastructure and services effectively. “Data Science” is an interdisciplinary field of scientific methods, processes, algorithms and systems for extracting knowledge from data in various forms; it offers a fast and efficient way to understand the current dynamics of cities and to improve their services. This course introduces scientific techniques for the analysis, inference and prediction of large-scale data (e.g. GPS vehicular data, social media data, mobile phone data, individual social network data, etc.) that are present in city networks. The basics of the data science methods used to analyze these datasets will be presented. The course focuses on both the methods and their application to smart-city problems. Python will be used to demonstrate the application of each method on datasets available to the instructor. Examples of problems that will be discussed include ridesharing platforms, smart and energy-efficient buildings, evacuation modeling, decision making during extreme events, and urban resilience.
Syllabus
Unit 1. Introduction to Data Mining
Week 1: Introduction to the Course & Syllabus, Review of Statistical Methods
Instructor introduction, introduction to data mining, course overview, student introductions, introduction to statistical methods, modeling uncertainty, random variables, population and samples, and statistical inference
Week 2: Optimization, Data Pre-Processing
Introduction to optimization, basic concepts of optimization, optimization problem formulation, optimization algorithms, data and measurement, types of datasets, data quality, data pre-processing, and task identification
Week 3: Project Discussion/Introduction to Python
Introduction to Python, Python for data mining, optimization using Python, and data pre-processing using Python
Unit 2. Data Mining Tasks
Week 4: Regression Analysis, Association Rule Mining
Introduction to regression analysis, linear regression, logistic regression models, Poisson regression models, applications of regression analysis to smart cities, introduction to association rule mining, association rule mining applications to urban systems, and association rule mining approaches
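As a small illustration of the linear regression topic listed for this week, the sketch below fits a one-predictor ordinary least squares model in plain Python. It is not taken from the course materials: the function name `fit_simple_linear_regression` and the trip data are invented for illustration.

```python
# Ordinary least squares for one predictor: y ~ intercept + slope * x.
# All data below are hypothetical and chosen only to keep the example small.

def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical trip distances (km) and travel times (minutes).
distances_km = [1.0, 2.0, 3.0, 4.0, 5.0]
travel_min = [5.0, 8.0, 11.0, 14.0, 17.0]

intercept, slope = fit_simple_linear_regression(distances_km, travel_min)
print(f"travel time ~ {intercept:.2f} + {slope:.2f} * distance")
```

In practice a library such as scikit-learn or statsmodels would normally be used; the closed-form version is shown only to make the estimator explicit.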
Week 5: Association Rule Mining, Statistical Classification
Apriori algorithm, FP-growth algorithm, ECLAT, evaluation methods, introduction to the classification problem, logistic regression, Naïve Bayes classifier, and Bayesian network classifier
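To make the Apriori topic concrete, here is a minimal sketch of level-wise frequent itemset mining in plain Python. It is an illustrative version under simple assumptions (an absolute support count as the threshold, transactions small enough to scan repeatedly), not the course's own implementation; the travel-mode transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate itemset of size one.
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-item
        # subsets were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical data: travel modes combined within one trip.
trips = [
    {"bus", "metro"},
    {"bus", "bike"},
    {"bus", "metro", "bike"},
    {"metro", "bike"},
    {"bus", "metro"},
]

frequent = apriori(trips, min_support=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force counting: an itemset can only be frequent if every one of its subsets is frequent.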
Weeks 6 and 7: Decision Tree, Support Vector Machines
Introduction to decision trees, decision tree training, decision tree algorithms, practical issues with decision trees, introduction to support vector machines, support vector machines, ensemble classifiers, and classifier performance evaluation
Weeks 8 and 10: Introduction to Data Clustering, Clustering Algorithms: Partitional and Hierarchical
Introduction to data clustering, (dis)similarity measures, distribution (model)-based clustering algorithms, types of clustering algorithms, partitional clustering (k-means and its variants), and hierarchical clustering
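The k-means topic above can be sketched in a few lines of plain Python. This is a minimal version of Lloyd's algorithm assuming 2-D points and squared Euclidean distance; it initializes centroids from the first k points, which is a simplification chosen here for reproducibility rather than a recommended practice. The coordinates are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    # Simplistic initialization (an assumption of this sketch): first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    (
                        sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members),
                    )
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Centroids stopped moving, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels

# Hypothetical pickup locations forming two well-separated groups.
pickups = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = kmeans(pickups, k=2)
print(labels)
print(centroids)
```

Variants covered under "k-means and its variants" typically change the initialization or the centroid definition, while the assign-then-update loop stays the same.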
Week 11: Other Clustering Approaches
Density-based clustering algorithms, cluster validity, and characteristics of “data, clusters, and clustering algorithms”
Unit 3. Advanced Data Mining Techniques
Week 12: Neural Networks
Introduction to neural networks, a neuron model, learning an ANN model, multi-layer feed-forward ANNs, and ANN application to land use prediction
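As one possible illustration of "a neuron model" and how its weights are learned, the sketch below trains a single threshold neuron with the classic perceptron update rule on the logical AND function. The choice of a step activation, the learning rate, and the toy dataset are assumptions of this example, not details from the course.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    """Output of a single neuron with two inputs."""
    return step(weights[0] * x[0] + weights[1] * x[1] + bias)

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Perceptron learning rule: nudge weights by the prediction error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights[0] += learning_rate * error * x[0]
            weights[1] += learning_rate * error * x[1]
            bias += learning_rate * error
    return weights, bias

# Toy, linearly separable dataset: logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)
```

A multi-layer feed-forward network stacks many such units and replaces the step function with a differentiable activation so that gradients can be propagated through the layers.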
Week 13: Deep Learning
Introduction to deep learning, deep learning, and deep learning for smart cities
Week 14: Case studies of Data Science Applications for Smart Cities
Week 15: Virtual Exam and Project Submission
Taught by
Satish Ukkusuri