Course catalogue no: 6117CIT
Course title: Adv Topics in Computing Science
Field of Education Code: Computer Science
Program/s: 2011 Bachelor of Information Technology with Honours (Program Convenor: V. Estivill-Castro); 5107 Master of Information and Communication Technology (Program Convenor: J. Gasston)
School: Computing and Information Technology
Faculty: Engineering and Information Technology
Status of Course within program/s or academic plan/s: Elective, honours
Credit point value: 10
Prerequisites: Enrolment in Honours Program or MIT
Year and semester: Semester 1, 2003 and 2004
Course convenor: Assoc. Prof. Vladimir Estivill-Castro, Office: Room 1.14
Teaching team members: Same as Course convenor
Date course outline was last modified: Feb. 26th, 2003.
Computer technology and databases have given many companies, institutions, government agencies and corporations extraordinary power to collect and manipulate data about almost every aspect of their function and activities. Data mining is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules. While the interpretation of discovered patterns demands their presentation in visual form, statistics is probably the most familiar approach to summarizing many observations into a few measurements of tendency and spread that translate raw data into information for decision-making. Machine learning techniques can be regarded as exploring more flexible non-parametric models as well as additional representations of knowledge. Many of the statistical or machine learning approaches translate into large and difficult optimisation and search problems that demand the use of heuristics developed in artificial intelligence.
The major aims and objectives of the subject are to:
Because it uses techniques from Machine Learning, this course has close links to 3146CIT: Machine Learning. Students who complete the Machine Learning course may find that many useful links can be established between Data Mining and Machine Learning. Also, because of the issues involved in managing large datasets, the contents of 3166CIT: Database Management Systems offer material that is closely related to Data Mining. In particular, topics like On-Line Analytical Processing can be studied from the perspective of Databases or from the perspective of Knowledge Discovery and Data Mining.
This subject introduces a selection of current research topics in computing science that are not covered elsewhere in the Honours course. The particular topics are Knowledge Discovery and Data Mining, Inductive Machine Learning (supervised and unsupervised), and applications like Privacy, Spatial Data Mining, WEB usage Mining and Multimedia Data Mining.
Lecture 1. - Introduction to Knowledge Discovery and Data Mining
Lecture 2. - On-line Analytical Processing (OLAP)
Lecture 3. - Association Rule Mining.
Lecture 4. - Relationship to Machine Learning, Classification and Evaluation of Classifiers
Lecture 5. - Relation to Statistics, illustration with linear discriminants.
Lecture 6. - Clustering and illustration with spatial and categorical data
Lecture 10. - Competitive Learning and Kohonen Networks
Lecture 11. - Spatial Data Mining
Lecture 12. - Web usage Mining
Lectures 1 to 8 cover fundamental concepts and techniques in the current body of knowledge for this field. Lectures 9 to 12 highlight current research topics and may be adjusted according to the interests of participants.
Emphasis will be placed on generic research skills. This course will teach written communication of summaries, executive reports and literature reviews of research articles in Knowledge Discovery and Data Mining. Students will practise these writing skills. This course will also teach analysis and critical evaluation. There will be guided practice in analysing the contribution and assessing the merit of research papers. Another aspect that will be emphasized is the analysis and critical evaluation of different paradigms in Machine Learning or in the techniques of Knowledge Discovery and Data Mining. One issue on which students will be asked to practise such evaluation is the comparison of traditional statistics with data mining.
Problem solving and decision-making will be further developed by practical problems in Data Mining where machine-learning techniques must be selected for their solution. Skills leading to professional effectiveness will be fostered by the debate of issues like Privacy in Data Mining and some links to the ethics of Data Mining Research.
This subject is Mode A - Web Supplemented. A complementary WEB site will make available lecture notes, WEB resources and reading lists with materials to complement lectures. Participation on-line is therefore optional for the student. Enrolled students will have access to information additional to that available in the University's calendar or handbook. This information includes the course descriptions and study guides, examination information, assessment overview and reading. It is used to supplement traditional forms of delivery.
There is flexibility in choosing 5 out of 11 packages of readings, and students can propose their own package. Students can also propose the topic for the application of their programming assignment and the programming language used to implement the algorithm.
The course will develop from an initial introduction to Knowledge Discovery and Data Mining. After the first lecture, the student should be able to understand the spirit of and motivation for the field. This multidisciplinary field has adopted and developed its own methods and techniques. These constitute the core material needed to appreciate tools, algorithms and methods and to be in a position to apply them in practical settings. The course will present and illustrate these core techniques and their fundamental algorithms. The impact of the techniques on applications and contemporary issues is explored in the later section of the subject.
The course will have 2-hour weekly lectures. Material as described in the course content will be presented and discussed. Students will have to complete suggested weekly readings to complement the material from the lectures. A package of weekly readings consists of 2 to 3 research papers and a chapter in a textbook. Practical activities will be assessed as items of assessment. These will consist of 1) the composition of summaries, 2) the comparison of learning methods and 3) the implementation and programming of a sample of algorithms and techniques towards a potential application.
Lectures will provide the material and subject matter towards objectives 1 to 5. The selected readings will contribute towards objective 6. The practical activities will be assessed and will contribute towards all objectives. The material of the first lecture and the first group of readings will address objective 1. Similarly, readings are grouped towards topics and in relation to objectives 1 to 4.
1. Five executive summaries or research surveys. (These are assignments that involve a package of reading and summarizing activities. Reading consists of reading 3-5 research articles; analysing and evaluating means preparing an executive summary or survey.) They must be between 1500 and 2000 words excluding references. Each is worth 4% (a total of 20% for the subject). You can submit up to 8 summaries from different packages of readings; the best 5 will be used to compute your grade. A list of readings and their packaging will be available on the WEB site. You must add a reading to each package yourself with publication date 2000 or later. You may choose to build a package of your own. In that case, all papers must be from proceedings of the ICDM or KDD Conferences after 2000 or from the journal Knowledge Discovery and Data Mining.
· DUE DATE: Weeks 2, 4, 6, 8 and 10
2. One programming assignment for 20%. You are required to program an algorithm of your choice yourself, in the programming language of your choice, but it must clearly be an algorithm for a core technique in Data Mining and Knowledge Discovery. You must provide a report of your implementation, including testing that validates, to some extent, its correctness.
· DUE DATE: Week 12
3. One research project for 40%. You are to propose a topic or problem that can be addressed by Data Mining techniques. You must obtain a dataset for your problem (perhaps data available on the WEB, or you may postulate an industrial partner who would supply the data). You must use at least two data mining techniques to attempt to solve the problem. You may use public domain software or demo versions of commercial systems. You must produce a report analysing and critically evaluating your experience.
· DUE DATE: Week 7
4. Exams. There will be one final exam worth 20%.
· Exam period
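As a rough illustration of how the weights listed above combine, the following Python sketch computes an overall mark. It assumes that every item is marked on a 0-100 scale and that the final mark is a plain weighted sum; neither assumption is stated in this outline, and the function name `final_mark` and the sample marks are hypothetical.

```python
# Assessment weights as stated in the outline (percent of the final grade).
SUMMARY_WEIGHT_EACH = 4   # each counted summary is worth 4%
SUMMARIES_COUNTED = 5     # only the best 5 summaries count
MAX_SUMMARIES = 8         # up to 8 summaries may be submitted
PROGRAMMING_WEIGHT = 20
PROJECT_WEIGHT = 40
EXAM_WEIGHT = 20


def final_mark(summary_marks, programming, project, exam):
    """Combine component marks into an overall mark out of 100.

    Assumption (not from the outline): every mark is on a 0-100 scale
    and the overall mark is a simple weighted sum.
    """
    if len(summary_marks) > MAX_SUMMARIES:
        raise ValueError("at most 8 summaries may be submitted")
    # Keep the best five summaries; summaries not submitted contribute nothing.
    best = sorted(summary_marks, reverse=True)[:SUMMARIES_COUNTED]
    weighted_points = (
        sum(best) * SUMMARY_WEIGHT_EACH
        + programming * PROGRAMMING_WEIGHT
        + project * PROJECT_WEIGHT
        + exam * EXAM_WEIGHT
    )
    return weighted_points / 100


# Hypothetical student: seven summaries submitted, so the two lowest are dropped.
print(final_mark([60, 80, 70, 90, 50, 75, 85], programming=70, project=65, exam=60))
# prints 68.0
```

In this example the five best summaries (90, 85, 80, 75, 70) contribute 16 points, the programming assignment 14, the research project 26 and the exam 12.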
Packages of readings are directly linked to specific objectives. For instance, the first package contains articles that debate the nature and role of Data Mining Technology. Reading this particular package will expand your understanding of why Data Mining technology has emerged and what Data Mining can do. The analysis and critical evaluation skills put into practice by preparing an executive summary of the readings in the package will confirm that understanding.
Readings and executive summaries will also advance your core research skills; in particular, comprehension of recent research publications, summarization and analysis of literature in a very specific topic.
The practical implementation of an algorithm for a core task of data mining will confirm the understanding from lectures and readings. Arguing about its correctness will ensure that the inner workings and the subtle aspects of the data structures are fully understood.
The comparison of two data mining techniques and their application to a particular problem will reinforce objective 6.
There is no prescribed textbook. However, the following constitute excellent references for expanding on the material in Lectures or in the readings. Readings will be made available through the course WEB site.
Han, Jiawei and Kamber, Micheline. Data mining: concepts and techniques. San Francisco: Morgan Kaufmann Publishers, 2001. xxiv, 550 p. : ill. ; 24 cm. (QGU Nathan: QA76.9.D343 H36 2001)
Witten, Ian H. and Frank, Eibe. Data mining: practical machine learning tools and techniques with Java implementations. San Francisco, Calif.: Morgan Kaufmann, 2000. (QGU Logan: QA76.9.D343 W58 2000)
Berry, Michael J.A. and Linoff, Gordon. Data mining techniques: for marketing, sales, and customer support. New York; Chichester [England]: Wiley Computer Pub., c1997. (QGU Nathan: HF5415.125 .B47 1997)
This course will be evaluated using a student questionnaire with some open questions.
Because of the close links between Machine Learning and Data Mining, an effort will be made to coordinate the two subjects so that they can provide some learning support to each other. In particular, lecture timetabling may be adjusted to facilitate attending both sets of lectures, and some topics may be rearranged in their sequence.
The course convenor should be contacted by e-mail in the first instance regarding any difficulties with the course. A weekly schedule of the course convenor's activities is available on his personal WEB page and can be used to arrange an appointment in case of an urgent matter.