Lecturer: Dr Daniel Stamate
Lectures / labs / optional homeworks
Week 1 (11 Jan - )
Lecture: Course presentation and Data Mining applications; Data Mining: Introduction (pdf)
Optional Homework: Search for the entry "Data Mining" in Wikipedia and read about its notable uses.

Week 2 (18 Jan - )
Lecture: Data Mining Strategies (pdf); read Chapter 2
Lab: Diagnosing patients using Data Mining
Optional Homework: Explore KDnuggets, one of the most popular websites for information and resources on Data Mining (see in particular DM software and suites).

Week 3 (25 Jan - )
Lecture: Data Mining with Weka: a tutorial
Lab: Practise Data Mining with Weka: classification with Decision Trees and Nearest Neighbour, with applications in medical diagnosis and research
Optional Homework: Explore the profiles of Data Mining jobs on the KDnuggets website.

Week 4 (1 Feb - )
Lecture: Data Mining Techniques (pdf) [+ demo with Weka on clustering and association analysis using the datasets CCP_associations.csv and kmeans.arff]
Lab: Data Mining application in Customer Analytics

Week 5 (8 Feb - )
Lecture: Data Mining Techniques (continued)
Lab: Continue the work from the previous week
Optional Homework: Find a dataset in the UCI KDD dataset repository that you would be interested in mining. (The datasets are classified by the practical problems they were created to address.)

Week 6 (15 Feb - )

Week 7 (22 Feb - )
Lecture: Data Mining Techniques (continued)
Lab: Association analysis
Optional Homework: Read about IBM SPSS Modeler (formerly Clementine; see the entry under Additional resources below).

Week 8 (1 Mar - )
Lecture: Knowledge Discovery in Databases (pdf)
Lab: Work on the assignment
Optional Homework: Read about SAS Enterprise Miner (see the entry under Additional resources below).

Week 9 (8 Mar - )
Lecture: Data Mining with Neural Networks (pdf) [+ demo]
Lab: Clustering numeric datasets
Optional Homework: Read about RapidMiner (see the entry under Additional resources below).

Week 10 (15 Mar - )
Lecture: Statistical Techniques (pdf). Bring your laptops for in-class practice on:
- Linear Regression: Office buildings dataset (output: value) and Portuguese wine dataset (output: quality)
- Estimation/prediction using Regression Trees: Deer hunter dataset (output: Yes)
- Logistic Regression: Credit card promotions dataset (output: LIPromotion) and Breast cancer dataset (output: Class)
- Naive Bayesian classification: Credit card promotions dataset (output attribute: Sex)
- Clustering using the EM technique: Iris plants dataset
- Conceptual clustering: Iris plants dataset
Lab: Apply the Knowledge Discovery in Data process with Weka's KnowledgeFlow Environment: Predictive Analytics for the Customer Retention problem
Optional Homework: Read about Music Data Mining. See in this paper how Data Mining, in particular Weka's J48 implementation of the C4.5 decision tree algorithm, can be used for the Automatic Music Classification problem.

Week 11 (22 Mar - )
Lecture: Data Warehousing (pdf)
Lab: Finish the work from the previous week
Optional Homework: Explore this website with practical information about Data Warehousing.

Past exams: Recent exam papers are available here; see the student intranet for earlier papers.
Revision week: Production rules and classifier evaluation
Note: where a book chapter, section, exercise, etc. is referenced without an explicit title, assume it refers to Roiger's book. The essential titles are listed below (Reading list).
- Java coding: javacis338.zip (library for handling datasets) and a Java online tutorial
- Data Mining/Machine Learning software: Weka, the software used in the labs (free download & documentation website). You are advised to install the Weka book version 3.4.14 on your laptop for working at home and/or running demos in the lectures (this ensures full compatibility with the lab material). A presentation of the software can be found here.
Optional Java coding tasks (try one of these, in particular if you finish the lab work early in a session):
T1 Code the 1R algorithm in Java and apply it to the dataset.
T2 Implement a Bayesian classifier.
T3 Implement the C4.5 algorithm.
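For T1, here is a minimal sketch of 1R for nominal attributes, not a reference solution. It assumes each row is a `String[]` whose last element is the class value (the class name `OneR` and this row layout are illustrative choices, not the javacis338 library's API). For every attribute, 1R makes one branch per attribute value, predicts that value's majority class, and counts the rows it gets wrong. It then keeps the attribute whose rule makes the fewest errors. To try it out, the nominal weather data from Witten and Frank (item 3 in the reading list) is embedded.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the 1R ("one rule") learner for nominal attributes.
// Assumption: each row is a String[] whose last element is the class value.
class OneR {
    int bestAttribute = -1;                      // attribute tested by the final rule
    int bestErrors = Integer.MAX_VALUE;          // training errors made by that rule
    Map<String, String> rule = new HashMap<>();  // attribute value -> predicted class

    void train(String[][] rows) {
        int classIdx = rows[0].length - 1;
        for (int a = 0; a < classIdx; a++) {
            // counts.get(value).get(cls) = number of rows with this value and this class
            Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
            for (String[] r : rows) {
                counts.computeIfAbsent(r[a], k -> new LinkedHashMap<>())
                      .merge(r[classIdx], 1, Integer::sum);
            }
            // One branch per value: predict the majority class, count the others as errors
            Map<String, String> candidate = new HashMap<>();
            int errors = 0;
            for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
                String majority = null;
                int best = -1, total = 0;
                for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                    total += c.getValue();
                    if (c.getValue() > best) { best = c.getValue(); majority = c.getKey(); }
                }
                candidate.put(e.getKey(), majority);
                errors += total - best;
            }
            // Keep the attribute whose rule makes the fewest errors (the first one wins ties)
            if (errors < bestErrors) {
                bestErrors = errors;
                bestAttribute = a;
                rule = candidate;
            }
        }
    }

    String predict(String[] row) {
        return rule.get(row[bestAttribute]); // null for a value never seen in training
    }

    // Nominal weather data (Witten & Frank): outlook, temperature, humidity, windy, play
    static final String[][] WEATHER = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    public static void main(String[] args) {
        OneR model = new OneR();
        model.train(WEATHER);
        System.out.println("Rule tests attribute " + model.bestAttribute + " with "
            + model.bestErrors + "/" + WEATHER.length + " errors: " + model.rule);
    }
}
```

On this data, outlook and humidity each make 4 errors out of 14. Because the comparison is strict, the sketch keeps outlook (attribute 0), with the rule sunny -> no, overcast -> yes, rainy -> yes. A fuller version would also need to discretise numeric attributes and treat missing values.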
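For T2, one common Bayesian classifier is naive Bayes. It scores each class c as P(c) multiplied by P(a_i = v_i | c) over all attributes, on the assumption that attributes are independent given the class, and predicts the class with the highest score. The sketch below assumes nominal attributes, a class value in the last column, and add-one (Laplace) smoothing of the conditional probabilities so that an unseen value does not zero out a class. Smoothing is one common choice, not a requirement of the task. The names `NaiveBayes`, `score` and `predict` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of a naive Bayes classifier for nominal attributes.
// Assumptions: the last column is the class; add-one (Laplace) smoothing on P(value | class).
class NaiveBayes {
    Map<String, Integer> classCounts = new HashMap<>();
    Map<String, Integer> valueClassCounts = new HashMap<>(); // key "attr|value|class"
    int[] distinctValues;                                     // distinct values per attribute
    int n;                                                    // number of training rows

    void train(String[][] rows) {
        int classIdx = rows[0].length - 1;
        n = rows.length;
        distinctValues = new int[classIdx];
        for (int a = 0; a < classIdx; a++) {
            Set<String> seen = new HashSet<>();
            for (String[] r : rows) seen.add(r[a]);
            distinctValues[a] = seen.size();
        }
        for (String[] r : rows) {
            String cls = r[classIdx];
            classCounts.merge(cls, 1, Integer::sum);
            for (int a = 0; a < classIdx; a++) {
                valueClassCounts.merge(a + "|" + r[a] + "|" + cls, 1, Integer::sum);
            }
        }
    }

    // Unnormalised score P(cls) * product of P(a_i = row[i] | cls); row holds attributes only
    double score(String[] row, String cls) {
        int nc = classCounts.getOrDefault(cls, 0);
        double s = (double) nc / n;
        for (int a = 0; a < row.length; a++) {
            int count = valueClassCounts.getOrDefault(a + "|" + row[a] + "|" + cls, 0);
            s *= (count + 1.0) / (nc + distinctValues[a]);
        }
        return s;
    }

    String predict(String[] row) {
        String best = null;
        double bestScore = -1;
        for (String cls : classCounts.keySet()) {
            double s = score(row, cls);
            if (s > bestScore) { bestScore = s; best = cls; }
        }
        return best;
    }

    // Nominal weather data (Witten & Frank): outlook, temperature, humidity, windy, play
    static final String[][] WEATHER = {
        {"sunny", "hot", "high", "FALSE", "no"},
        {"sunny", "hot", "high", "TRUE", "no"},
        {"overcast", "hot", "high", "FALSE", "yes"},
        {"rainy", "mild", "high", "FALSE", "yes"},
        {"rainy", "cool", "normal", "FALSE", "yes"},
        {"rainy", "cool", "normal", "TRUE", "no"},
        {"overcast", "cool", "normal", "TRUE", "yes"},
        {"sunny", "mild", "high", "FALSE", "no"},
        {"sunny", "cool", "normal", "FALSE", "yes"},
        {"rainy", "mild", "normal", "FALSE", "yes"},
        {"sunny", "mild", "normal", "TRUE", "yes"},
        {"overcast", "mild", "high", "TRUE", "yes"},
        {"overcast", "hot", "normal", "FALSE", "yes"},
        {"rainy", "mild", "high", "TRUE", "no"},
    };

    public static void main(String[] args) {
        NaiveBayes nb = new NaiveBayes();
        nb.train(WEATHER);
        String[] x = {"sunny", "cool", "high", "TRUE"};
        System.out.printf("yes: %.4f  no: %.4f  -> %s%n",
            nb.score(x, "yes"), nb.score(x, "no"), nb.predict(x));
    }
}
```

For the new day (sunny, cool, high, TRUE), the smoothed scores are about 0.0071 for yes and 0.0182 for no, so the sketch predicts no. Dividing each score by their sum would turn them into probabilities. Numeric attributes would need a density estimate (for example, a normal distribution per class) in place of the counts.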
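A full C4.5 implementation for T3 is a larger project. Its core step is choosing the split attribute by gain ratio: information gain divided by split information, the entropy of the partition sizes themselves. Dividing by split information penalises attributes with many values. The sketch below covers only that criterion, for a nominal attribute whose partitions are given as class counts. Numeric thresholds, missing values, recursion and pruning are left out, and the class name `GainRatio` is hypothetical.

```java
// Sketch of C4.5's split criterion (gain ratio) for one nominal attribute.
// partitions[i][k] = number of training rows with the i-th attribute value and the k-th class.
class GainRatio {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Entropy in bits of a distribution given as counts (zero counts contribute nothing)
    static double entropy(int... counts) {
        int total = 0;
        for (int c : counts) total += c;
        double h = 0;
        for (int c : counts) {
            if (c > 0) { double p = (double) c / total; h -= p * log2(p); }
        }
        return h;
    }

    // Information gain = entropy before the split - weighted entropy of the partitions
    static double infoGain(int[][] partitions) {
        int classes = partitions[0].length, total = 0;
        int[] classTotals = new int[classes];
        for (int[] p : partitions) {
            for (int k = 0; k < classes; k++) { classTotals[k] += p[k]; total += p[k]; }
        }
        double remainder = 0;
        for (int[] p : partitions) {
            int size = 0;
            for (int c : p) size += c;
            remainder += (double) size / total * entropy(p);
        }
        return entropy(classTotals) - remainder;
    }

    // Split information = entropy of the partition sizes, ignoring the class
    static double splitInfo(int[][] partitions) {
        int[] sizes = new int[partitions.length];
        for (int i = 0; i < partitions.length; i++) for (int c : partitions[i]) sizes[i] += c;
        return entropy(sizes);
    }

    static double gainRatio(int[][] partitions) {
        double si = splitInfo(partitions);
        return si == 0 ? 0 : infoGain(partitions) / si; // a single-valued split gives no information
    }

    public static void main(String[] args) {
        // outlook in the weather data: sunny, overcast, rainy as {yes, no} counts
        int[][] outlook = { {2, 3}, {4, 0}, {3, 2} };
        System.out.printf("gain=%.3f splitInfo=%.3f gainRatio=%.3f%n",
            infoGain(outlook), splitInfo(outlook), gainRatio(outlook));
    }
}
```

For outlook on the weather data this gives an information gain of about 0.247 bits, split information of about 1.577 and a gain ratio of about 0.156. A tree builder would compute this for every candidate attribute at each node and split on the best one.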
Reading list
1. [Lecture] Richard Roiger and Michael Geatz, "Data Mining: A Tutorial-Based Primer", Addison Wesley, 2003
2. [Lecture] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2006 (the 2000 edition can also be used)
3. [Lab] Ian Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, 2005
+ additional titles in the Course Description
Additional resources:
- Datasets for mining:
  the ARFF format used by Weka
  various dataset archives: UCI KDD dataset repository; other dataset sources
- Connect Weka to databases: download instructions and files
- Other Data Mining software:
  RapidMiner: free download & documentation website
  IBM SPSS Modeler (formerly SPSS Clementine) plus demo
  SAS Enterprise Miner plus documentation
- Various links:
  KDnuggets (Data Mining, Knowledge Discovery, Genomic Mining, Web Mining)
  KDNet (information on data mining and knowledge discovery)
Site maintained by Daniel Stamate. Updated frequently.