Course leader: Daniel Stamate
Course Description: See Week 1 Course presentation below and Intranet Course information for more details
Coursework: available here
Lectures (in bold) / labs (in red) / optional homework:
|
Week 1 (6 Oct - ) |
Course presentation and Data Mining applications; Data Mining: Introduction pdf
No lab
Optional Homework: Search for the entry "Data Mining" in Wikipedia and read about its notable uses. |
|
Week 2 (13 Oct - ) |
Data Mining introductory concepts, and the k Nearest Neighbour algorithm (continued from week 1)
Lab: Code in Java an Automatic Diagnosing System based on the Nearest Neighbour and 3 Nearest Neighbour algorithms
Optional Homework: Explore KDnuggets, one of the most popular websites for information and resources related to Data Mining (see in particular DM software). |
|
Week 3 (20 Oct - ) |
Data Mining Strategies pdf
Lab: Finish your Java coding from the previous week
Optional Homework: Explore profiles of jobs in Data Mining on the KDnuggets website. |
|
Week 4 (27 Oct - ) |
Data Mining Techniques pdf
Lab: Code in Java an Automatic Diagnosing System based on the k Nearest Neighbour algorithm
Optional Homework: Read about Text Mining. |
|
Week 5 (3 Nov - ) |
Data Mining Techniques (continued from previous week) [+ demo with Weka using tutorial]
Lab: Java Data Mining (finish your coding from the previous lab sessions).
Optional Homework: Find a dataset in the UCI KDD dataset repository that you would be interested in mining. (The datasets are classified by the practical problems they were created to solve.) |
|
Week 6 (10 Nov - ) |
Reading week |
|
Week 7 (17 Nov - ) |
Data Mining Techniques (continued from previous week) [+ demo with Weka on association analysis with the dataset CCP_associations.csv and clustering with the datasets kmeans.arff and numeric_dataset_cluster.csv]
Lab: Applications of Weka's supervised learning algorithms in medical diagnosing and medical research
Optional Homework: Read about SAS Enterprise Miner (see additional resources entry below) |
|
Week 8 (24 Nov - ) |
Knowledge Discovery in Databases pdf
Lab: Perform association analysis with Weka (allocate 5 min), see an online demo for k-means (allocate 5 min), and perform a clustering and assess clustering quality with Weka (allocate 40 min)
Optional Homework: Read about IBM SPSS Modeler (formerly Clementine, see additional resources entry below). |
|
Week 9 (1 Dec - ) |
Data Mining with Neural Networks pdf [+ demo with Weka's Backpropagation algorithm on Feed-Forward Neural Nets with the telecomservice dataset for churner prediction, and the Portuguese wine dataset for wine quality estimation]
Lab: Data Mining application in Customer Analytics (Customer Churn/Defection/Attrition)
Optional Homework: Read about Bioinformatics and the application of Data Mining & Machine Learning in this area |
|
Week 10 (8 Dec - ) |
Statistical Techniques pdf [bring laptops with you for practice in class on: Linear Regression - Office buildings dataset (output: value) and Portuguese wine dataset (output: quality); estimation/prediction using Regression Trees - Portuguese wine dataset (output: quality); Logistic regression - Credit card promotions dataset (output: LIPromotion) and Breast cancer dataset (output: Class); Naive Bayesian classification - Credit card promotions dataset (output attribute: Sex)]
Lab: Finish work from the previous week
Optional Homework: Read about Music Data Mining. See in this paper how Data Mining, in particular Weka with the C4.5 decision tree building algorithm (J48), could be used in the Automatic Music Classification problem. |
|
Week 11 (15 Dec - ) |
Data Warehousing pdf
Seminar
Optional Homework: Explore this website with useful/practical information about Data Warehousing |
|
Past exams |
Recent exam papers are available here; see the student intranet for previous papers. |
|
Revision week |
Note: If a reference to a book (chapter, section, exercise, etc.) is made but the title is not given explicitly, assume it is Roiger's DM book. The slides are based on Roiger's DM book, supplemented by Han's DM book (see the first two book titles for lectures in the reading list below).
Lab software to be used
- Java Data Mining coding: Follow these instructions the first time you use a machine.
- Data Mining/Machine Learning software: Weka: lab working software (free download & documentation website); you are advised to install version 3.4.14 on your laptops for working at home and/or running demos in the lectures. This version ensures full compatibility with the recommended Weka/lab book (see Reading list below) and the online course material. Download 3.4.14 windows jre for PC or 3.4.14 osx for Mac; you may also need to install Java if it is not already installed. Weka will be presented in class (see online course material). Supplementary material on Weka can be found in Witten's book (available in the Library; see Reading list below), or here.
If you need further help with the software installation, or if you experience problems with a laptop provided by the Department, contact the System Admin team (email systems@doc.gold.ac.uk, Room 1, 25 St James).
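The nearest-neighbour labs (Weeks 2 to 5) ask for an automatic diagnosing system coded in Java. As a rough starting point only, the sketch below shows one possible shape for a k Nearest Neighbour classifier. It is not the lab solution: the class name `KnnSketch`, the two numeric features (temperature and a cough score), the toy patient records, and the choice of Euclidean distance with majority voting are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: names, features and data are hypothetical, not course material.
class KnnSketch {

    // One labelled training example: numeric features plus a diagnosis label.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classify the query by majority vote among its k nearest training examples.
    static String classify(List<Example> training, double[] query, int k) {
        List<Example> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(e -> distance(e.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            // Strict comparison: on a tie, the label that reached the count first is kept.
            if (count > bestCount) {
                bestCount = count;
                best = e.label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy records: {temperature in Celsius, cough score 0-3} -> diagnosis (made up).
        List<Example> training = new ArrayList<>();
        training.add(new Example(new double[] {39.0, 3}, "flu"));
        training.add(new Example(new double[] {38.5, 2}, "flu"));
        training.add(new Example(new double[] {39.5, 3}, "flu"));
        training.add(new Example(new double[] {36.8, 1}, "cold"));
        training.add(new Example(new double[] {37.0, 2}, "cold"));
        training.add(new Example(new double[] {36.6, 0}, "cold"));

        double[] patient = {37.8, 2};
        System.out.println("1-NN: " + classify(training, patient, 1)); // prints "1-NN: flu"
        System.out.println("3-NN: " + classify(training, patient, 3)); // prints "3-NN: cold"
    }
}
```

With this toy data the single nearest neighbour and the 3-neighbour vote disagree for the same patient, which is one reason to compare several values of k. The two features are on different scales, so a real system would probably normalise them before computing distances.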
Optional Java coding tasks (you may try one of these, in particular if you have finished the lab work in a session):
T1 Code in Java the 1R algorithm with dataset.
T2 Code in Java the Naïve Bayes classification algorithm seen in class, using a dataset of your choice.
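For task T1, the following is a minimal sketch of how 1R might be structured. For each attribute, 1R builds a rule that predicts the most frequent class for each attribute value, and it keeps the attribute whose rule makes the fewest errors on the training data. The class name `OneRSketch`, the `Rule` helper, and the tiny fever/cough dataset are hypothetical; the dataset referred to in T1 is not reproduced here. The sketch assumes nominal attributes with no missing values, breaks class ties alphabetically, and keeps the first attribute when two attributes have the same error count.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: names and data are hypothetical, not course material.
class OneRSketch {

    // Result of 1R: the chosen attribute index, its value -> class rule, and its training errors.
    static final class Rule {
        final int attribute;
        final Map<String, String> prediction;
        final int errors;

        Rule(int attribute, Map<String, String> prediction, int errors) {
            this.attribute = attribute;
            this.prediction = prediction;
            this.errors = errors;
        }
    }

    // rows[i] holds the nominal attribute values of instance i; labels[i] is its class.
    static Rule train(String[][] rows, String[] labels) {
        Rule best = null;
        int attributeCount = rows[0].length;
        for (int a = 0; a < attributeCount; a++) {
            // Count how often each class occurs with each value of attribute a.
            Map<String, Map<String, Integer>> counts = new TreeMap<>();
            for (int i = 0; i < rows.length; i++) {
                counts.computeIfAbsent(rows[i][a], v -> new TreeMap<>())
                      .merge(labels[i], 1, Integer::sum);
            }
            // For each value, predict its most frequent class; all other instances are errors.
            Map<String, String> prediction = new TreeMap<>();
            int errors = 0;
            for (Map.Entry<String, Map<String, Integer>> byValue : counts.entrySet()) {
                String majority = null;
                int majorityCount = 0;
                int total = 0;
                for (Map.Entry<String, Integer> byClass : byValue.getValue().entrySet()) {
                    total += byClass.getValue();
                    if (byClass.getValue() > majorityCount) {
                        majorityCount = byClass.getValue();
                        majority = byClass.getKey();
                    }
                }
                prediction.put(byValue.getKey(), majority);
                errors += total - majorityCount;
            }
            // Keep the attribute with the fewest training errors (first one wins a tie).
            if (best == null || errors < best.errors) {
                best = new Rule(a, prediction, errors);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data: columns are {fever, cough}; the class is a made-up diagnosis.
        String[][] rows = {
            {"yes", "yes"}, {"yes", "no"}, {"yes", "yes"}, {"no", "yes"},
            {"no", "no"}, {"no", "yes"}, {"yes", "no"}, {"no", "yes"}
        };
        String[] labels = {"flu", "flu", "flu", "cold", "cold", "cold", "cold", "flu"};

        Rule rule = train(rows, labels);
        System.out.println("Chosen attribute index: " + rule.attribute); // 0 (fever)
        System.out.println("Rule: " + rule.prediction);                  // {no=cold, yes=flu}
        System.out.println("Training errors: " + rule.errors);           // 2
    }
}
```

On this toy data the fever rule misclassifies 2 of the 8 instances and the cough rule 3, so fever is selected. Numeric attributes would need to be discretised first, which this sketch does not attempt.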
Reading list
1. [Lecture] Richard Roiger and Michael Geatz, "Data Mining, a tutorial-based primer", Addison Wesley, 2003
2. [Lecture] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2006
3. [Lab, the Weka book] Ian Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, 2005
+ additional titles in Course Description
Additional resources:
Datasets for mining
- see lab material for datasets used in class
- UCI Machine Learning repository
- Other dataset sources
Connect Weka to databases
- download instructions and files
Major commercial Data Mining & Statistical software:
- from IBM: IBM SPSS Modeler / SPSS Clementine and IBM SPSS Statistics (+Data Mining algorithms; College licence)
- from SAS Institute: SAS Enterprise Miner (with detailed documentation) and SAS Statistics (free use on the cloud for students)
Other software
KDnuggets (Data Mining, Knowledge Discovery, Genomic Mining, Web Mining)
KDNet (Information on data mining and knowledge discovery)
© Daniel Stamate 2011