Tarique Anwar

My research so far has centered on Data Science and its applications in the domains of Cyber Security and Urban Transportation Systems. A summary of some of the major projects that I have worked on (or am currently working on) is given below.

Debunking Misinformation on the Web and its Treatment [2018 - Ongoing]

Did you see any fake news recently on the Internet? Please inform us.

These days we generate huge amounts of data online. While much of this data is of little importance, a significant portion carries vital information, and it is very important to ensure the correctness of information being shared globally on the Web. The spread of misinformation on the Web, whether deliberate or unintentional, does great harm to politics, journalism, democracy, cybersecurity, economics and other fields. Fake news and rumors shared on social media have a big impact on elections and on trust in governments. They also affect financial markets: a single tweet about explosions in the White House caused a loss of $136.5 billion in the stock market in 2013. The objective of this project is to develop techniques based on databases, data mining and artificial intelligence to manage voluminous Web data and detect the misinformation in it. It also aims to counter misinformation with corrective measures for proper and efficient rectification. The project is expected to result in the following outcomes: i) efficient techniques to process large-scale social network data in real time and detect the textual clusters and outliers in it; ii) AI-based techniques to detect fake news, rumors, and other kinds of misinformation online; iii) techniques based on AI and graph algorithms to counter the spread of misinformation through online social networks; and iv) a real-time system to continuously monitor and track the Web, detect misinformation, and suggest corrective measures.

Members involved: Tarique Anwar, Surya Nepal, Cecile Paris, Jia Wu, Jian Yang, Michael Sheng, Syed Shafat Ali

Funding source(s): Data61 and Macquarie University

Outcomes*: [C.15], [C.16], [J.10]

Early Detection of Emerging Spatial Events from Spatio-Temporal Data Streams [2018 - Ongoing]

In our daily lives we often take part in, or are affected by, various kinds of gatherings and incidents (aka events), including large weddings, sports matches, concerts, political or social protests, and natural disasters. Such events are associated with a particular location, and are thus called spatial events. In other words, spatial events are abstract entities that exist for a limited time at a particular location in space. They are characterized by their start and end times, the geographic coordinates of their occurrence, and some thematic attributes. Day-to-day activities are often severely affected by spatial events at the location where they occur. For example, a political protest or a religious procession in a big city halts nearby traffic and disrupts normal business activities, which sometimes causes a huge monetary loss. While some events have recurring patterns, others start without much advance notice. Detecting such events early, even before they take place, is very important for avoiding or managing unexpected disruptions of our routine work and for planning alternatives. These days we record huge amounts of spatio-temporal mobility data about people in real time through different sources, including installed sensors, crowd-sourcing platforms, smartphones, and location-based social networks. The data from these sources form data streams that are rich in information. The aim of this project is to mine spatio-temporal mobility data streams in order to detect emerging spatial events at an early stage, so that unforeseen problems can be alleviated.
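To make the idea of mining a mobility stream for emerging events concrete, here is a minimal sketch, not the project's actual method. It assumes the stream arrives as time windows of (latitude, longitude) points, maps each point to a grid cell, and flags a cell when its count in the current window far exceeds its own recent history. The class name GridBurstDetector, the grid size, the history length and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict, deque
from statistics import mean, pstdev


class GridBurstDetector:
    """Flags grid cells whose point count jumps well above their recent history."""

    def __init__(self, cell_size=0.01, history=5, z_threshold=3.0):
        self.cell_size = cell_size      # cell width in degrees (assumed value)
        self.z_threshold = z_threshold  # how many deviations count as a burst
        self.history = defaultdict(lambda: deque(maxlen=history))

    def cell_of(self, lat, lon):
        """Map a coordinate to the integer index of its grid cell."""
        return (math.floor(lat / self.cell_size), math.floor(lon / self.cell_size))

    def process_window(self, points):
        """Consume one time window of (lat, lon) points; return the bursting cells."""
        counts = defaultdict(int)
        for lat, lon in points:
            counts[self.cell_of(lat, lon)] += 1

        alerts = []
        for cell in set(counts) | set(self.history):
            past = self.history[cell]
            current = counts.get(cell, 0)
            # Only judge a cell once its history buffer is full.
            if len(past) == past.maxlen:
                baseline = mean(past)
                spread = max(pstdev(past), 1.0)  # floor avoids alerts on tiny noise
                if current > baseline + self.z_threshold * spread:
                    alerts.append(cell)
            past.append(current)
        return sorted(alerts)


detector = GridBurstDetector()
quiet_window = [(30.975, 76.535)] * 2
for _ in range(5):
    detector.process_window(quiet_window)   # builds up the history, no alerts yet
burst = detector.process_window([(30.975, 76.535)] * 20)
print(burst)
```

A fixed grid and a simple deviation test keep the sketch short; a real early-detection method would also need to handle events that span several cells and counts that vary with the time of day.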

Members involved: Tarique Anwar, Timos Sellis, Hai L. Vu, Rahul Pal Singh

Funding source(s): ISIRD, IIT Ropar


Spatial Partitioning of Road Traffic Networks and their Temporal Evolution (PhD thesis) [2013 - 2017]

Urban areas generally attract people from the surrounding regions. According to the current global trend, people are rapidly migrating from rural to urban areas for several reasons, including access to better services and better employment opportunities. Consequently, the population of cities all over the world is increasing significantly, raising mobility demands manyfold. This strongly motivates research in urban planning and urban computing to develop innovative technologies and move towards smart and more sustainable cities. As most of the urban population travels daily or frequently for work or study, traffic congestion has become a very important practical problem. It affects the urban population directly through extra fuel cost and extra travel time, and indirectly in many other ways. An important concern in the smart urbanization of our societies is avoiding such congestion and maintaining smooth transportation. While infrastructure development is one way to deal with this problem, another is the analysis of spatial traffic data to discover how congestion forms and propagates, and the use of these patterns to optimize traffic flow. Research on the analysis of road traffic network data is growing, with problems such as fastest route computation, traffic clustering, traffic prediction, emerging event detection, anomaly detection and bottleneck identification. To discover congestion patterns, continuously tracking the spatiotemporal evolution of the traffic load that leads to congestion is an important problem. Research on methods to identify congested partitions effectively and track their evolution efficiently has been very limited so far. In my PhD thesis, we aimed to capture the spatiotemporal evolution of urban road traffic networks.
To this end, we proposed methods to effectively partition road traffic networks in order to obtain the differently congested partitions at a point in time, and to incrementally update those partitions efficiently in order to track their evolution in real time.
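As a rough illustration of what "differently congested partitions" means, the sketch below groups adjacent road segments with similar traffic density using simple threshold-based region growing. This is a stand-in technique chosen for brevity, not the partitioning method developed in the thesis. The function name, the toy road network, the density values and the tolerance are all assumptions.

```python
from collections import deque


def partition_by_congestion(adjacency, density, tolerance=0.1):
    """Group adjacent road segments whose densities differ by at most `tolerance`.

    adjacency: dict mapping each segment to the segments it connects to
    density:   dict mapping each segment to a normalized traffic density in [0, 1]
    """
    label = {}
    partitions = []
    for start in sorted(adjacency):
        if start in label:
            continue
        # Grow a new partition outward from an unlabeled segment (breadth-first).
        label[start] = len(partitions)
        group = [start]
        queue = deque([start])
        while queue:
            segment = queue.popleft()
            for neighbor in adjacency[segment]:
                similar = abs(density[segment] - density[neighbor]) <= tolerance
                if neighbor not in label and similar:
                    label[neighbor] = label[start]
                    group.append(neighbor)
                    queue.append(neighbor)
        partitions.append(sorted(group))
    return partitions


# Toy corridor of five consecutive segments: a - b - c - d - e
roads = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

morning = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.2, "e": 0.25}
print(partition_by_congestion(roads, morning))   # [['a', 'b', 'c'], ['d', 'e']]

# Later, congestion on segment c clears and the partition boundary moves.
later = dict(morning, c=0.25)
print(partition_by_congestion(roads, later))     # [['a', 'b'], ['c', 'd', 'e']]
```

The sketch recomputes the partitions from scratch at each point in time. Avoiding that recomputation by updating only the affected partitions is exactly the efficiency problem that incremental tracking addresses.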

Members involved: Tarique Anwar, Chengfei Liu, Hai L. Vu, Christopher Leckie, Md. Saiful Islam, Timos Sellis, Serge P. Hoogendoorn

Funding source(s): NICTA / CSIRO Data61

Outcomes*: [C.7], [C.8], [C.9], [C.11], [J.6], [J.7], [J.8], [T.2]

A Web Surveillance Framework for Criminal Networks Identification and Monitoring [2012 - 2014]

Nowadays most concealed anti-social organizations and their supporters use the World Wide Web (WWW) for propaganda, training, recruitment and fund-raising. It is becoming a challenging task for intelligence agencies to track the movements and activities of suspects around the globe. Given the huge amount of Web data and the increase in cyber crimes, many researchers have realized that data mining techniques could be very useful for national and international security initiatives. Although they primarily focus on extracting implicit and novel patterns (knowledge) from huge databases (structured data), data mining techniques have also proven very effective for analyzing unstructured data (text documents) and semi-structured data (Web documents). In the context of Web surveillance, data mining is a potential means to identify terrorists and their activities by analyzing Web data and the social networks that evolve from communication channels, including e-mails, enterprise portals, online forums, and social networking sites like Facebook and Twitter. In this project, we worked to design a knowledge-based Web surveillance framework that combines information retrieval, natural language processing and data mining techniques to identify criminals and their networks on the Web. The major components of our project are: (i) a data crawling and indexing process to retrieve focused multilingual Web content, (ii) social network analysis to identify communities and leaders, (iii) sentiment analysis, (iv) in-depth Web content mining to generate criminal profiles, (v) in-depth Web usage mining and link mining to track individual criminals and their linkages with the mined communities, and (vi) creation of a knowledge portal to assist in national security.
The system would provide the Saudi cyber law and security enforcement agencies with the knowledge necessary to identify individuals who may spread criminal ideologies, along with criminal profiling, criminal (social) network analysis, and visualization of criminals' activities, linkages and relationships.

Members involved: Tarique Anwar, Muhammad Abulaish

Funding source(s): King Saud University, KACST NPST project 11-INF1594-02

Outcomes*: [C.3], [C.5], [Ch.1], [J.4], [J.5]

Datasets: Chat Log Dataset [This dataset can be downloaded for research purposes only. Please cite the following paper if you use this dataset: "Tarique Anwar and Muhammad Abulaish, A Social Graph based Text Mining Framework for Chat Log Investigation, Digital Investigation, vol. 11, no. 4, pp. 349-362, Dec 2014"]

Suspect Profiling and Tracking on the Web [2011 - 2012]

Facebook is a buzzword that everyone must have noticed around them; others include Twitter and Orkut. It has become common to create and maintain personal information on the WWW. Moreover, the rapid growth of online friend-making has resulted in a full-fledged global social networking community where like-minded people exchange their thoughts. Online forums are another common medium that attracts people to discuss their topics of interest, because of their simplicity and ease of access. Unfortunately, these trends have also led tech-savvy anti-social elements to use the Web for better communication among themselves and for practicing various types of cyber crime. The Web is being used as a tool to commit these crimes in a sophisticated manner, ranging from hacking for data theft or identity theft and online fraud to cyber bullying, online pedophilia and cyber terrorism. These criminal offences are a major threat to the social life of ordinary people and to national security. Criminal organizations find the Web a powerful means to spread their propaganda around the globe, simplifying their efforts to raise funds, recruit people, provide training, etc. The enormous amount of open data on the exponentially growing Web can surely yield a lot of valuable information to enable suspect profiling and tracking. In the project Warak-Warak we worked to develop a Web People Search Engine for profiling suspected users and tracking their malicious activities in cyberspace.

Members involved: Tarique Anwar, Muhammad Abulaish, Khaled Alghathbar

Funding source(s): King Saud University, KACST NPST project 11-INF1594-02

Outcomes*: [C.1], [C.6], [J.3]

Keyphrase Extraction from Text Documents (Master's thesis) [2010 - 2013]

To this date, the majority of useful information on the Web occurs in the form of text, which is either unstructured or semi-structured in nature. The easy accessibility of electronic media and the World Wide Web has led to exponential growth in text and, with it, information overload. Thus, even though there is no scarcity of information on the Web, locating, extracting and analyzing the required information from this vast unstructured collection is a complex and challenging task. These complexities call for an automatic text processing system that identifies and extracts the right information at the right time for effective decision making. Keyphrases provide semantic metadata that summarize and characterize documents and enable readers to quickly determine whether a given article is in their field of interest. We worked to devise a lightweight machine learning approach for automatic keyphrase extraction using various lexical and semantic features mined from text documents, with the objective of making it applicable to real-time environments. The approach first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents.
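The train-then-predict workflow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the approach developed in the thesis: it uses a small Naive Bayes model over two made-up binary features (whether a phrase occurs at least twice, and whether it first appears in the first quarter of the document) in place of the lexical and semantic features used in the actual work. The stopword list, function names and example texts are all hypothetical.

```python
import math
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "is", "are", "with"}


def phrase_features(text, max_len=2):
    """Map each candidate phrase to (occurs_at_least_twice, first_seen_early)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts, first_seen = Counter(), {}
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Candidates must not start or end with a stopword.
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrase = " ".join(gram)
            counts[phrase] += 1
            first_seen.setdefault(phrase, i)
    total = max(len(words), 1)
    return {p: (counts[p] >= 2, first_seen[p] / total < 0.25) for p in counts}


def train(documents):
    """documents: list of (text, set of known keyphrases). Returns the count tables."""
    feature_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, known in documents:
        for phrase, feats in phrase_features(text).items():
            is_key = phrase in known
            class_counts[is_key] += 1
            for index, value in enumerate(feats):
                feature_counts[is_key][(index, value)] += 1
    return feature_counts, class_counts


def keyphrase_score(model, feats):
    """Log-odds that a phrase with these features is a keyphrase (Laplace-smoothed)."""
    feature_counts, class_counts = model

    def log_prob(is_key):
        lp = math.log((class_counts[is_key] + 1) / (sum(class_counts.values()) + 2))
        for index, value in enumerate(feats):
            lp += math.log((feature_counts[is_key][(index, value)] + 1)
                           / (class_counts[is_key] + 2))
        return lp

    return log_prob(True) - log_prob(False)


def extract_keyphrases(model, text, k=3):
    """Rank the candidate phrases of a new document and return the top k."""
    feats = phrase_features(text)
    ranked = sorted(feats, key=lambda p: (-keyphrase_score(model, feats[p]), p))
    return ranked[:k]


training = [("Traffic congestion is a problem. Traffic congestion grows in cities. "
             "Planners study roads.", {"traffic congestion"})]
model = train(training)
new_doc = ("Keyphrase extraction helps readers. Keyphrase extraction summarizes "
           "documents for busy readers quickly today.")
print(extract_keyphrases(model, new_doc))
```

With a single training document the model simply learns that frequent, early phrases are more likely to be keyphrases; a usable model would need many labeled documents and richer features.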

Members involved: Tarique Anwar, Muhammad Abulaish

Outcomes*: [C.2], [J.1], [J.2], [T.1]

*PS: Please see the publications page for the papers referred to as outcomes above.