Tarique Anwar

Advanced Data Mining - CS724

Objective: The objective of this course is to develop state-of-the-art technical and research skills in Big Data Mining and Management.
Instructor: Tarique Anwar
Office: Room 205, CSE Block, IIT Ropar
Email: tarique@iitrpr.ac.in
Teaching Assistants: TBA
Class Schedule: Monday, 09:00am-09:50am, Lecture
Tuesday, 10:00am-10:50am, Lecture
Wednesday, 11:00am-11:50am, Lecture
Venue: M2, Lecture Hall Complex
Credits: 4 (3 Lectures + 0 Tutorials + 2 Labs + 7 Self-study hours, weekly)
Who can take this course: Pre-requisites: (1) CS356 (ADA) / CS506 (DSA); (2) CS524 (DM) / CS503 (ML)
For UG students, the grade in CS503 / CS524 must be A- or higher.

Syllabus ♣

Topics (reference papers and lecture notes/slides are available for download for each topic):
1. Managing and Mining Streaming and Time-Series Data
2. Managing and Mining Discrete Sequence Data
3. Managing and Mining Graph and Multirelational Data
4. Managing and Mining Spatial, Spatio-temporal, Object, Multimedia, Text, and Web Data
5. Managing and Mining Dynamic and Evolving Networks
6. Managing and Mining Urban Data
7. Recent Advancements in Big Data Mining and Management: discussion of related research papers published in the last 5 years

Reference Papers

1. gSparsify: Graph Motif Based Sparsification for Graph Clustering (CIKM 2015) [Paper] [Presentation]
2. Automatic Discovery of Tactics in Spatio-Temporal Soccer Match Data (SIGKDD 2018) [Paper] [Presentation]
3. The Flexible Socio Spatial Group Queries (VLDB 2018) [Paper]
4. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection (ACM TKDD 2015) [Paper] [Presentation]
5. Efficient Computation of Multiple Density-Based Clustering Hierarchies (ICDM 2017) [Paper]
6. MustaCHE: A Multiple Clustering Hierarchies Explorer (VLDB 2018) [Paper] [Presentation]
7. Gotcha - Sly Malware! Scorpion: A Metagraph2vec Based Malware Detection System (SIGKDD 2018) [Paper]
8. A Framework for Clustering Evolving Data Streams (VLDB 2003) [Paper]
9. Clustering Stream Data by Exploring the Evolution of Density Mountain (VLDB 2017) [Paper]
10. Real-time Constrained Cycle Detection in Large Dynamic Graphs (VLDB 2018) [Paper]
11. Multiple Infection Sources Identification with Provable Guarantees (CIKM 2016) [Paper]
12. Fast and Scalable Big Data Trajectory Clustering for Understanding Urban Mobility (IEEE TITS 2018) [Paper] [Presentation]
13. Affective Neural Response Generation (ECIR 2018) [Paper] [Presentation]

Assessment Policy ♣

Oral Examination: Weightage 10%
Research Project: Weightage 50%
Mid-Semester Examination ♦: Weightage 20%
Syllabus: Topics covered up to the last class before the exam.
End-Semester Examination ♦: Weightage 20%
Syllabus: Entire syllabus
Grading Policy: A combination of absolute and relative grading will be followed.

♣ Tentative
♦ Some quizzes and exams will be open-book/open-notes. The exact format will be announced one day before the scheduled date. Keep checking the announcements at the bottom of this page.


Reference Books

1. Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2011
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2005
3. Mohammed J. Zaki and Wagner Meira Jr., Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2014
4. Charu C. Aggarwal, Data Mining: The Textbook, Springer, 2015

Apart from the above books, the following Journals and Conferences may also be referenced.

1. Journals: IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Big Data, ACM Transactions on Database Systems, ACM Transactions on Knowledge Discovery from Data, The VLDB Journal, Data Mining and Knowledge Discovery, and Information Systems


Announcements

20/01/2019: Students who are taking the course but have not yet submitted the ADD request on CRP need to submit it at the earliest (CRP lists the course code as CS7XX).

Also, please join the Advanced Data Mining course on Moodle using the enrolment key cs724_201820192.

18/01/2019: A list of papers has been uploaded to the "Reference Papers" section above. Students may form groups of 1, 2, or 3. Each group needs to select 1-2 papers per member of its own choice (e.g., a group of 3 selects 3 to 6 papers) and inform me of the selected papers. Each group is then expected to read its papers thoroughly and present and discuss them in class.

15/01/2019: Tomorrow, we will discuss the paper "gSparsify: Graph Motif Based Sparsification for Graph Clustering", CIKM 2015. Download a copy from here.

13/01/2019: I will be on leave tomorrow, so there will be no Advanced Data Mining class tomorrow (Monday, 14 January 2019). Any urgent communication can be sent to me by email.

09/01/2019: In response to some requests, the minimum requirements for CS524 (DM) and CS724 (ADM) have been changed. Students who have already completed CS503 (ML) with a grade of B or lower can now enrol in CS524 (DM). UG students can enrol in CS724 (ADM) only if their grade in CS524 is at least A-.

08/01/2019: The confusion about the timing of this course has been resolved and the schedule confirmed; please see the details above. Classes start tomorrow, with the first class at 11:00am in M2, Lecture Hall Complex.

08/01/2019: Welcome to CS724 - Advanced Data Mining. The course will be taught by me, Dr Tarique Anwar. The timing, venue, and other details will be announced here shortly.

Note: This page will be updated regularly with helpful information and announcements. Students are advised to check it regularly.