CIS338: Data Mining 

Course leader: Daniel Stamate

Announcement Board

Course Description: See Week 1 Course presentation below and Intranet Course information for more details

Coursework: available here

Lectures (in bold) / labs (in red) / optional homework:

Week 1 (6 Oct - )

Course presentation and Data Mining applications

Data Mining: Introduction pdf
Read Chapter 1 of Roiger's Data Mining book, complemented by Han's Data Mining book; see the reading list below

No lab

Optional Homework: Search for the entry Data Mining in Wikipedia, and read about its notable uses.

Week 2 (13 Oct - )

Data Mining introductory concepts, and the k Nearest Neighbour algorithm (continued from week 1)

Lab: Code an Automatic Diagnosing System in Java, based on the Nearest Neighbour and 3 Nearest Neighbour algorithms

Optional Homework: Explore KDnuggets, one of the most popular websites for information and resources related to Data Mining (see in particular DM software).

Week 3 (20 Oct - )

Data Mining Strategies pdf
Read Chapter 2 

Lab: Finish your Java coding from the previous week

Optional Homework: Explore profiles of jobs in Data Mining on the KDnuggets website.

Week 4 (27 Oct - )

Data Mining Techniques pdf
Read Chapter 3 (Section 3.4: Genetic Learning, is optional)

Lab: Code in Java an Automatic Diagnosing System based on the k Nearest Neighbour algorithm
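
As a starting point for this lab, a k Nearest Neighbour classifier could be sketched as below. This is a minimal illustration and not the course's reference solution: the names `KnnSketch`, `Case`, `sampleCases` and `classify` are hypothetical, the two numeric attributes (body temperature, cough severity) and the five training cases are made up, and Euclidean distance with simple majority voting is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal k Nearest Neighbour sketch; all names and data are illustrative. */
final class KnnSketch {

    /** One labelled training case: numeric attribute values plus a diagnosis. */
    static final class Case {
        final double[] features;
        final String label;

        Case(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Made-up toy cases: {body temperature in Celsius, cough severity 0-1}. */
    static List<Case> sampleCases() {
        return Arrays.asList(
            new Case(new double[] {37.0, 0.0}, "healthy"),
            new Case(new double[] {36.8, 1.0}, "healthy"),
            new Case(new double[] {39.5, 1.0}, "flu"),
            new Case(new double[] {39.0, 0.0}, "flu"),
            new Case(new double[] {38.8, 1.0}, "flu"));
    }

    /** Euclidean distance between two attribute vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the majority label among the k training cases nearest to the query.
     * Ties go to the label that first reaches the top count, scanning nearest first.
     */
    static String classify(List<Case> training, double[] query, int k) {
        List<Case> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble((Case c) -> distance(c.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        int limit = Math.min(k, sorted.size());
        for (int i = 0; i < limit; i++) {
            String label = sorted.get(i).label;
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Case> training = sampleCases();
        System.out.println(classify(training, new double[] {38.9, 1.0}, 3));
        System.out.println(classify(training, new double[] {36.9, 0.0}, 3));
    }
}
```

Calling `classify` with k = 1 gives plain Nearest Neighbour and k = 3 gives the 3 Nearest Neighbour variant, so one method covers both. In a real system the attributes would normally be rescaled first so that no single attribute dominates the distance.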

Optional Homework: Read about Text Mining.

Week 5 (3 Nov - )

Data Mining Techniques (continued from previous week) [+ demo with Weka using tutorial]

Lab: Java Data Mining – finish your coding from the previous lab sessions.

Optional Homework: Find a dataset in the UCI KDD dataset repository that you would be interested in mining. (The datasets are classified by the practical problems they were created to solve.)

Week 6 (10 Nov - )

Reading week

Week 7 (17 Nov - )

Data Mining Techniques (continued from previous week) [+ demo with Weka on association analysis with the dataset CCP_associations.csv, and on clustering with the datasets kmeans.arff and numeric_dataset_cluster.csv]

Lab: Applications of Weka's supervised learning algorithms in medical diagnosing and medical research

Optional Homework: Read about SAS Enterprise Miner (see additional resources entry below)

Week 8 (24 Nov - )

Knowledge Discovery in Databases  pdf
Read Chapter 5

Lab: Perform association analysis with Weka (allocate 5 min), watch an online demo of k-means (allocate 5 min), then perform a clustering with Weka and assess its quality (allocate 40 min)

Optional Homework: Read about IBM SPSS Modeler (formerly Clementine, see additional resources entry below).

Week 9 (1 Dec - )

Data Mining with Neural Networks pdf [+ demo with Weka's Backpropagation algorithm on Feed-Forward Neural Nets with telecomservice dataset for churner prediction, and Portuguese wine dataset for wine quality estimation]
Read Chapter 8 (Section 8.5 is optional)

Lab: Data Mining application in Customer Analytics (Customer Churn/Defection/Attrition)

Optional Homework: Read about Bioinformatics and the application of Data Mining & Machine Learning in this area

Week 10 (8 Dec - )

Statistical Techniques pdf [bring laptops for in-class practice on: Linear Regression (Office buildings dataset, output: value; Portuguese wine dataset, output: quality); estimation/prediction using Regression Trees (Portuguese wine dataset, output: quality); Logistic Regression (Credit card promotions dataset, output: LIPromotion; Breast cancer dataset, output: Class); Naive Bayesian classification (Credit card promotions dataset, output: Sex)]
Read Chapter 10 (Expectation Maximisation Clustering and Conceptual Clustering are optional)

Lab:  Finish work from the previous week

Optional Homework: Read about Music Data Mining. See in this paper how Data Mining, in particular Weka with the C4.5 decision tree building algorithm (J48), could be used in the Automatic Music Classification problem.

Week 11 (15 Dec - )

Data Warehousing  pdf
Read Chapter 6 (Sections 6.1 and 6.4 are optional)

Seminar

Optional Homework: Explore this website with useful/practical information about Data Warehousing

Past exams

Recent exam papers are available here; see the student intranet for previous papers.

Revision week

Production rules and classifier evaluation  

Note: If a reference to a book (chapter, section, exercise, etc.) is made without an explicit title, assume it is Roiger's DM book. The slides are based on Roiger's DM book, complemented by Han's DM book (see the first two lecture titles in the reading list below).

Lab software to be used
- Java Data Mining coding: Follow these instructions the first time you use a machine.
- Data Mining/Machine Learning software: Weka, the lab working software (free download & documentation website). You are advised to install version 3.4.14 on your laptops for working at home and/or running demos in the lectures. This version ensures full compatibility with the recommended Weka/lab book (see Reading list below) and the online course material. Download 3.4.14 windows jre for PC or 3.4.14 osx for Mac; you may also need to install Java if it is not already installed. Weka will be presented in class (see online course material). Supplementary material on Weka can be found in Witten's book (available in the Library; see Reading list below), or here.

If you need further help with the software installation, or if you experience problems with the laptops provided by the Department, contact the System Admin team (email systems@doc.gold.ac.uk, Room 1, 25 St James).

Optional Java coding tasks (you may try one of these, in particular if you have finished the lab work in a session):
T1 Code in Java the 1R algorithm with dataset.
T2 Code in Java the Naïve Bayes classification algorithm seen in class, using a dataset of your choice.
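
For task T1, the core of 1R could be sketched roughly as follows. For each attribute, every value is mapped to its most frequent class and the misclassified training rows are counted; the attribute with the fewest errors wins. This is only an outline under stated assumptions: the names `OneRSketch`, `Rule`, `ruleFor`, `oneR` and `sampleRows` are hypothetical, all attributes are assumed categorical with the class in the last column, and the weather-style rows are invented for illustration (they are not the dataset linked in T1).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal 1R sketch for categorical data; all names and data are illustrative. */
final class OneRSketch {

    /** The rules built from one attribute, with their training error count. */
    static final class Rule {
        final int attribute;                   // column index the rules test
        final Map<String, String> prediction;  // attribute value -> predicted class
        final int errors;                      // misclassified training rows

        Rule(int attribute, Map<String, String> prediction, int errors) {
            this.attribute = attribute;
            this.prediction = prediction;
            this.errors = errors;
        }
    }

    /** Made-up toy rows: {outlook, windy, play}; the last column is the class. */
    static String[][] sampleRows() {
        return new String[][] {
            {"sunny", "false", "no"},
            {"sunny", "true", "no"},
            {"overcast", "false", "yes"},
            {"overcast", "true", "yes"},
            {"rainy", "false", "yes"},
            {"rainy", "false", "yes"},
            {"rainy", "true", "no"}
        };
    }

    /** Builds one rule per value of the given attribute and counts the errors. */
    static Rule ruleFor(String[][] rows, int attribute) {
        int classIndex = rows[0].length - 1;
        // counts.get(value).get(cls) = number of rows with that value and class
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            counts.computeIfAbsent(row[attribute], v -> new LinkedHashMap<>())
                  .merge(row[classIndex], 1, Integer::sum);
        }
        Map<String, String> prediction = new LinkedHashMap<>();
        int errors = 0;
        for (Map.Entry<String, Map<String, Integer>> byValue : counts.entrySet()) {
            String bestClass = null;
            int bestCount = 0;
            int total = 0;
            for (Map.Entry<String, Integer> byClass : byValue.getValue().entrySet()) {
                total += byClass.getValue();
                if (byClass.getValue() > bestCount) { // ties keep the first class seen
                    bestCount = byClass.getValue();
                    bestClass = byClass.getKey();
                }
            }
            prediction.put(byValue.getKey(), bestClass);
            errors += total - bestCount;
        }
        return new Rule(attribute, prediction, errors);
    }

    /** Returns the rule set of the attribute with the fewest training errors. */
    static Rule oneR(String[][] rows) {
        int classIndex = rows[0].length - 1;
        Rule best = null;
        for (int a = 0; a < classIndex; a++) {
            Rule candidate = ruleFor(rows, a);
            if (best == null || candidate.errors < best.errors) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Rule best = oneR(sampleRows());
        System.out.println("attribute " + best.attribute + ", errors " + best.errors);
        System.out.println(best.prediction);
    }
}
```

Numeric attributes would need to be discretised before this sketch applies, and ties between attributes are resolved here simply by keeping the first one found.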

Reading list
    1. [Lecture] Richard Roiger and Michael Geatz, "Data Mining: A Tutorial-Based Primer", Addison Wesley, 2003
    2. [Lecture] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann, 2006
    3. [Lab – the Weka book] Ian Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, 2005
    + additional titles in Course Description

Additional resources:
- Datasets for mining: see lab material for datasets used in class; UCI Machine Learning repository; other dataset sources
- Connect Weka to databases: download instructions and files
- Major commercial Data Mining & Statistical software:
  from IBM: IBM SPSS Modeler / SPSS Clementine and IBM SPSS Statistics (+ Data Mining algorithms; College licence)
  from SAS Institute: SAS Enterprise Miner (with detailed documentation) and SAS Statistics (free use on the cloud for students)
  Other software
- KDnuggets (Data Mining, Knowledge Discovery, Genomic Mining, Web Mining)
- KDNet (information on data mining and knowledge discovery)

© Daniel Stamate 2011