Mining High Utility Patterns Over Data Streams

Author: Morteza Zihayat Kermani
Release: 2016

Mining useful patterns from sequential data is a challenging topic in data mining. An important task in this area is sequential pattern mining, which discovers sequences of itemsets that appear frequently in a sequence database. In sequential pattern mining, sequences are generally selected under the frequency/support framework. However, most of the patterns it returns may not be informative enough for business users and are not tied to a particular business objective. In view of this, high utility sequential pattern (HUSP) mining has recently emerged as a research topic in data mining. The main objective of HUSP mining is to extract valuable and useful sequential patterns from data by considering the utility of a pattern, which captures a business objective (e.g., profit, user interest). In HUSP mining, the goal is to find sequences whose utility in the database is no less than a user-specified minimum utility threshold.
Nowadays, many applications generate huge volumes of data in the form of data streams. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Mining HUSPs from data streams poses several challenges. First, it is infeasible to keep all streaming data in memory because of the high volume of data accumulated over time. Second, mining algorithms need to process arriving data in real time with a single scan of the data. Third, depending on the minimum utility threshold, the number of patterns returned by a HUSP mining algorithm can be large enough to overwhelm the user. In general, it is hard for the user to choose a suitable threshold value. Thus, algorithms that find the most valuable patterns (i.e., top-k high utility patterns) are more desirable. Mining the most valuable patterns is of interest for both static data and data streams.
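The utility threshold criterion described above can be illustrated with a small sketch. It is not taken from the dissertation: the data layout, the names `sequence_utility` and `database_utility`, the toy profit table, and the convention that a pattern's utility in one sequence is the maximum over all of its occurrences are assumptions made here for illustration.

```python
from typing import Dict, FrozenSet, List, Optional

Itemset = Dict[str, int]        # item -> quantity (hypothetical layout)
QSequence = List[Itemset]       # ordered list of itemsets
Pattern = List[FrozenSet[str]]  # ordered list of item sets to match


def itemset_utility(wanted: FrozenSet[str], itemset: Itemset,
                    profit: Dict[str, int]) -> Optional[int]:
    """Utility of `wanted` inside one itemset, or None if it does not occur."""
    if not all(item in itemset for item in wanted):
        return None
    return sum(itemset[item] * profit[item] for item in wanted)


def sequence_utility(pattern: Pattern, seq: QSequence,
                     profit: Dict[str, int]) -> int:
    """Maximum utility over all occurrences of `pattern` in `seq` (0 if none)."""
    # best[j] = best utility found so far for the first j pattern elements
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Iterate j downwards so one itemset matches at most one pattern element.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            u = itemset_utility(pattern[j - 1], itemset, profit)
            if prev is None or u is None:
                continue
            cur = best[j]
            best[j] = prev + u if cur is None else max(cur, prev + u)
    return best[-1] or 0


def database_utility(pattern: Pattern, db: List[QSequence],
                     profit: Dict[str, int]) -> int:
    """Utility of `pattern` in the database: sum over all sequences."""
    return sum(sequence_utility(pattern, seq, profit) for seq in db)


# Toy data (made up for this example).
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}],
    [{"a": 3}, {"b": 1, "c": 2}],
    [{"b": 1}, {"a": 1}],
]
min_util = 13
candidates = {
    "<a>": [frozenset({"a"})],
    "<b>": [frozenset({"b"})],
    "<a, c>": [frozenset({"a"}), frozenset({"c"})],
}
# A pattern is "high utility" when its utility is no less than min_util.
high_utility = {name for name, p in candidates.items()
                if database_utility(p, db, profit) >= min_util}
```

With this toy data, the pattern <a> occurs in every sequence yet falls below the threshold, while <a, c> occurs in only two sequences and meets it, which is the kind of difference between frequency and utility that the abstract refers to.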
To address these research limitations and challenges, this dissertation proposes techniques and algorithms for mining high utility sequential patterns over data streams. We work on mining HUSPs over both a long portion of a data stream and a short period of time. We also work on how to efficiently identify the most significant high utility patterns (namely, the top-k high utility patterns) over data streams.
In the first part, we explore a fundamental problem: how limited memory space can be used effectively to produce high quality HUSPs over the entire data stream. An approximation algorithm, called MAHUSP, is designed; it employs memory adaptive mechanisms to use a bounded portion of memory while efficiently discovering HUSPs over the entire data stream.
The second part of the dissertation presents a new sliding window-based algorithm to discover recent high utility sequential patterns over data streams. A novel data structure named HUSP-Tree is proposed to maintain the essential information for mining recent HUSPs. An efficient single-pass algorithm named HUSP-Stream is proposed to generate recent HUSPs from HUSP-Tree.
The third part addresses the problem of top-k high utility pattern mining over data streams. Two novel methods, named T-HUDS and T-HUSP, are proposed for finding top-k high utility patterns over a data stream. T-HUDS discovers top-k high utility itemsets, and T-HUSP discovers top-k high utility sequential patterns. T-HUDS is based on a compressed tree structure, called HUDS-Tree, that can be used to efficiently find potential top-k high utility itemsets over data streams. T-HUSP incrementally maintains the content of top-k HUSPs in a data stream in a summary data structure, named TKList, and discovers top-k HUSPs efficiently.
All of the algorithms are evaluated on both synthetic and real datasets, and their performance is compared in terms of running time, memory consumption, precision, recall, and F-measure.
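The sliding-window and top-k ideas above can be sketched in a heavily simplified form. The sketch below is not HUSP-Stream, T-HUDS, T-HUSP, HUSP-Tree, or TKList: it tracks only single-item utilities over the most recent transactions, and the class name `SlidingWindowItemUtility`, the window size, and the profit table are hypothetical. It only shows how a window can be updated in a single pass with bounded memory and how the k most valuable entries can be read off without a user-chosen threshold.

```python
import heapq
from collections import Counter, deque
from typing import Deque, Dict, List, Tuple


class SlidingWindowItemUtility:
    """Per-item utility over the last `window_size` transactions (illustrative)."""

    def __init__(self, window_size: int, profit: Dict[str, int]) -> None:
        self.window_size = window_size
        self.profit = profit
        self.window: Deque[Dict[str, int]] = deque()
        self.utility: Counter = Counter()

    def add(self, transaction: Dict[str, int]) -> None:
        """Process one arriving transaction (item -> quantity) exactly once."""
        self.window.append(transaction)
        for item, qty in transaction.items():
            self.utility[item] += qty * self.profit[item]
        # Evict the oldest transaction once the window is over capacity.
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            for item, qty in expired.items():
                self.utility[item] -= qty * self.profit[item]
                if self.utility[item] == 0:
                    del self.utility[item]

    def top_k(self, k: int) -> List[Tuple[str, int]]:
        """The k items with the highest utility in the current window."""
        return heapq.nlargest(k, self.utility.items(), key=lambda kv: kv[1])


# Toy stream (made up for this example).
tracker = SlidingWindowItemUtility(window_size=2, profit={"a": 2, "b": 5, "c": 1})
for tx in [{"a": 1, "b": 2}, {"c": 3}, {"a": 2}, {"b": 1, "c": 2}]:
    tracker.add(tx)
top_two = tracker.top_k(2)
```

Memory use here is bounded by the window size rather than by the length of the stream. Extending the idea from single items to itemsets or sequences is where dedicated structures such as those named in the abstract come in.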
To show the effectiveness and efficiency of the proposed methods in real-life applications, the fourth part of this dissertation applies one of the proposed methods (MAHUSP) to extract meaningful patterns from a real web clickstream dataset and a real biosequence dataset. The utility-based sequential patterns are compared with the patterns found under the frequency/support framework. The results show that high utility sequential pattern mining provides meaningful patterns in real-life applications.


High-Utility Pattern Mining
Language: en
Pages: 337
Authors: Philippe Fournier-Viger
Categories: Technology & Engineering
Type: BOOK - Published: 2019-01-18 - Publisher: Springer

This book presents an overview of techniques for discovering high-utility patterns (patterns with a high importance) in data. It introduces the main types of hi
Adaptive Stream Mining
Language: en
Pages: 224
Authors: Albert Bifet
Categories: Computers
Type: BOOK - Published: 2010 - Publisher: IOS Press

This book is a significant contribution to the subject of mining time-changing data streams and addresses the design of learning algorithms for this purpose. It
Mining of Massive Datasets
Language: en
Pages: 480
Authors: Jure Leskovec
Categories: Computers
Type: BOOK - Published: 2014-11-13 - Publisher: Cambridge University Press

Now in its second edition, this book focuses on practical algorithms for mining data from even the largest datasets.
Frequent Pattern Mining
Language: en
Pages: 480
Authors: Charu C. Aggarwal
Categories: Computers
Type: BOOK - Published: 2014-08-29 - Publisher: Springer

This comprehensive reference consists of 18 chapters from prominent researchers in the field. Each chapter is self-contained, and synthesizes one aspect of freq