A of Arts and Science, Jothipuram, Coimbatore-47. Dr.N.Balakumar

 

A Survey of Sequence Patterns in Data Mining
Techniques

Mrs.N.Nithya Research scholar, Pioneer College of
Arts and Science, Jothipuram, Coimbatore-47. Dr.N.Balakumar Assistant
Professor, Pioneer College of Arts and Science, Jothipuram, Coimbatore-47.

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!


order now

Abstract

Data mining techniques are used in many areas in
the world to retrieve the useful knowledge from the very large amount of data.
Sequence pattern mining is the important techniques in data mining concepts
with the wide range of applications. The applications of the sequence patterns
data mining are weblog click streams, DNA sequences, sales analysis, telephone
calling patterns, stock markets and etc., The methods for sequential pattern
mining are categorized in to two approached. First approach is Apriori-based approach
and second is Pattern-Growth-based approaches. In this paper, a
methodical review of the sequential pattern mining algorithms is accomplished.
Finally, reasonable study is done on the base of important key features
reinforced by many algorithms and current research encounters are discoursed in
this area of data mining.

In this paper, an organized survey of the
sequential pattern mining algorithms is accomplished. This paper examines these
algorithms by studying the classification algorithm for
sequential pattern-mining. These algorithms classified into two
extensive classes. First, on the foundation of algorithms which are considered
to surge effectiveness of mining and the other, on the origin of numerous
additions of sequential pattern mining planned for certain application. At the
end, comparative analysis is done on the basis of important key features
supported by various algorithms and current research challenges are discussed.
4

Keywords: Data Mining, Sequence pattern, Association Rule,
Pattern Mining.

                                                                                                     

Mrs.N.Nithya, Dr.N.Balakumar

 

 

Introduction

In Knowledge Discovery
Process, Data mining techniques are divided into two major categories. These
are descriptive type and prediction type. Each of the type will have different
type of the approaches.

The sequential pattern
mining is anidenticalmain concept of data mining, a further extension to the
concept of association rule mining 1.The set of sequences of the given data
is called data-sequences. Customer transactions list is the data
sequences and the set of items is the transactions. Each transaction is
associated with the transaction time of the sequence database. Association rule
mining and the sequential pattern mining is more or less comparable, the events
linked with the time is the difference among them. The sequential pattern
mining determines the correlation between the dissimilar transactions, but in
the event of association rule mining it determines the association of items in
the similar transaction 2.

In this paper segments
are ordered as follows: Section II deals with types of the sequential pattern
mining models, Section III discusses limitations of sequential pattern mining
algorithms, Section IV discusses the comparative analysis of sequential pattern
mining algorithms, Section V discusses about the comparative analysis of
sequential pattern mining algorithms. Finally, the conclusion part is discussed
about the

Problem definition: Let I= {i1, i2, in} be a set of all items. An item set is
a non- empty set of items. A sequence is an ordered list of item sets. A
sequence is denoted by, where sj is an itemset, i.e., sj?I for 1?j?l. sj is also called an
element of the sequence and denoted as (x1,x2,…xm), where xk?I for 1?k?m. The number
of instances of items in a sequence is called the length of the sequence. A
sequence with length l is called a l-sequence. A sequence a=is called a
subsequence of b= and b a super sequence of a, denoted as a?b, if there exist integers

1?j1?j2…?jn?m such that a1? bj1 , a2? bj2 , … , an?bjn.

A sequence database D is a set of tuples where sid
is a sequence-id and s is a sequence. A tuple is said
to contain a sequence a, if a is a subsequence of s, i.e., a?s. The number of tuples in a sequence database D containing
sequence a is called the support of a, denoted as sup (a). 22

Given a sequence
database D and some user specified minimum support min_sup, a sequence a is a
sequential pattern in D if sup(a) min_sup. The sequential pattern mining
problem is to find the complete set of sequential pattern with respect to D and
min_sup.

                                      A Survey of Sequence Patterns in Data Mining
Techniques

Categories of sequence pattern mining Techniques

As defined
by Yen-Liang Chen and Ya-Han Hu 4 in latest years,
several methods in sequential pattern mining have been projected; these studies
cover a wide-ranging variety of problems. In general, there are two
different concerns in the area of sequential pattern mining in research. The
first is to increase the efficacy in sequential pattern mining process while
the other one is to. Secondly, extend the mining of sequential pattern to other
time- related patterns.

The algorithms of
sequential pattern mining are differed in two different ways, based on the
researches done on the fields of sequential pattern mining 3. First,
generating the sequences of candidates with storing, and the second is, how the
counting and testing performed on the candidate sequence in a frequent manner.
The main goal of the primary one is to reduce the generation of the total
number of candidate sequences, so that the I/O cost will be reduced. The main
goal of the second one is,to remove any database or data structure that has to
be sustained all the period for support of counting commitments only. The main
benefits and shortcomings of sequential pattern mining are listed in Table 1.

Table 1 Pros and Cons of Sequential Pattern Mining

 

Type/Techniques

Pros

Cons

Apriori-
Based Algorithms5

It is easy algorithm to implement.

It takes more memory, lot of space and it will
take more time for  the process of
candidate generation.

Pattern Growth Algorithm 7.

It can be faster when given large volume of
data.

Normally more multifarious to  progress, investigation and maintain

 

A Survey of Sequence Patterns in
Data Mining Techniques

Limitations of Sequential Pattern Mining
Algorithms

Sequential pattern
mining algorithms are typically centred on string. It is not focus on discovery
of the sequential patterns with the limitations in an agreed database. In query
languages like, SQL or MySQL, it will not permit the practice of the non-
aggregate functions for the portion of the query compilation 19.

Sequential pattern
mining retrieved the relationships among objects in sequential dataset 18.
The most familiar pattern mining in the sequential is Apriori. This algorithm,
also having the drawbacks like, too many candidate sets, more number of passes
over the databases. Another disadvantages of the above mentioned algorithm is,
requirement of the huge memory space 20.

The assignment of
determining entire frequent sequences in huge databases is relatively
interesting. The exploration of the memory space is tremendously large21.

Table 1 Comparative analysis of algorithm performance 3. The
symbol ?-?means an algorithm crashes with the parameters provided, and
memory usage could not be measured. 3

Algorithm

Data Set Size

Minimum Support

Execution

Memory Usage

Time(sec)

(MB)

 

 

 

 

Medium

Low(0.1%)

>3600

800

GSP Apriori

(D=200K)

Medium(1%)

2126

687

Large

Low(0.1%)

 

 

(D=800K)

Medium(1%)

 

Medium

Low(0.1%)

SPAM Apriori

(D=200K)

Medium(1%)

136

574

Large

Low(0.1%)

 

 

(D=800K)

Medium(1%)

674

1052

 

Medium

Low(0.1%)

31

13

PrefixSpanPattern

(D=200K)

Medium(1%)

5

10

Growth

Large

Low(0.1%)

1958

525

 

(D=800K)

Medium(1%)

798

320

 

 

 

 

 

Comparative Analysis of Sequential Pattern
Mining Algorithms

Sequential pattern
mining is precise significant because it is the foundation of numerous
applications. A sequential mining algorithm should discover the entire set of
patterns, when potentially, adequate the least support. Working with the big
data, the scalability is the one of the important issue of the mining the
knowledge from the huge amount of data. The above mentioned issue will be
raised in MapReduce model in the cloud. The SPAM algorithm, suggestively
decrease the mining period with big data, and also it will attain enormously great
scalability 11. The important and familiar algorithm for mining the data is
Apriori. Using this algorithm, finding the sequence data from
the d-dimensional sequence data is not possible. Using the PREFIXMD
SPAN algorithm, the retrieval of the sequence data is possible from
thed-dimensional data 14. Generating the huge amount of the unpromising
candidate sub sequences is difficult, while using
the Generate-and-test algorithm. This will be overcome, applying the
algorithm called Maximum weighted upper-bound model. The maximum
weighted upper-bound model will give the good performance of pruning
efficiency and also it will improve the performance efficiency 17.

The huge amount of
repeated projected databases in mining data sets will be creating applying
pattern growth type of algorithm. It will be overwhelmed, using the SMPM 13
algorithm. This algorithm will avoid the repeated projected database and evade
physical forecast 13. The greedy algorithm will raise the issues in the
sensor network applications, by creating the multiple interleaved patterns. The
GAIS 15 method algorithm will find the sequential pattern from the small
amount of quality data. The Frequent Pattern Tree type is another type for
finding the pattern using the sequence mining. In this algorithm will be work
in scanning the database many number of time. It will be time consuming
comparing with another type of algorithm. Yi Sui, Feng Jing Shao, Rencheng Sun
and Jinlong Wang were used the STMFP algorithm. In this algorithm required to
scan the database in a single. After the single scan itself, the tree can store
the all the sequences from the source data 9.

The Association rule
mining algorithm is the important type of algorithm in the Apriori model of
mining methods. The Apriori based association rule algorithm is the single
minimum support. The single minimum support cannot exactly discover the
interesting pattern.

The number of minimum
support is very high in the usage of MSCP growth algorithm 10. More number of
minimum supports will produce the interesting pattern.Xilu Wang and Weill Yao
used their optimum maximum sequence pattern mining for getting the sequence
pattern. The advantage of this algorithm is, to acquiring the sequential
pattern is very reliable. The existing mathematical models for mining the
sequential pattern will be failed in noisy data with the candidate
patterns12.

Measuring
the multidimensional-attribute of the material is not completely
measured concurrently in modified Apriori and PrefisSpan algorithms. It will be
overwhelmed using the Leaner Preference Tree (LPT) algorithm. The advantage of
this algorithm is, the learners actual learning favourite can be fulfilled
perfectly 16. Mining the pattern from the incremental data is very difficult
to handle. In this problem will be solved using the Direct Appending (DirApp)
algorithm. The improvement of this method, the incremental data can be easily
dealt and also the static database 8.

Performance Based
Comparative Study

The above table 1 described about the Comparative analysis of
different algorithm based on their performance in sequential mining. These
algorithms are studied with the help of the different size of the data sets.
The parameters chosen for these studies are Minimum Support, execution time (sec)
and memory usage (MB). The execution time is measured here is in the form
seconds and, the memory usages of these algorithm is measured in the form of
megabytes. In the data sizes, we have categories like medium size of data sets
and the large size of data sets.

The data size is denoted
as D. The value of D for the all the algorithms are categories in to medium
size and large size. The Medium size value is 200k and the large size value is
800k. Minimum support for the each algorithm is categorised as low and medium
for the both data sizes medium and large respectively.

The large amount of the
execution time taken by the GSP Apriori algorithm was more than the 3600sec
with the memory usage 800mb in the medium size of the data sets. The minimum
support of this highest execution time low. The Prefix Span pattern algorithm
execution time is very less. The execution time for this algorithm is 5sec with
the memory usage of MB in the large data set size of medium support size.

Table 2 Comparative Analysis of Sequential Pattern
Mining Algorithms

Reference paper

Author

Methodology Used

Algorithm Used

Existing System

Proposed System

ABCF 16

Mojtaba

Modified

Leaner

Multidimensional-

Learner’s actual

 

Salehi, Isa

Apriori and

Preference

attribute of

learning favourite

 

Nakhai

PrefixSpan

Tree (LPT)

materials is not

can be fulfilled

 

Kamalabadi

algorithms

 

Completely

perfectly

 

And

 

 

Measured

 

 

Mohammad

 

 

concurrently.

 

 

Bagher

 

 

 

 

 

Ghaznavi

 

 

 

 

 

Ghoushchi

 

 

 

 

OMSP

Xilu Wang and

Mathematic

Optimum

Noisy data, with

Acquired

M 12

Weill Yao

al Model.

maximum

fewer candidate

sequential patterns

 

 

 

sequence

patterns.

are reliable.

 

 

 

pattern

 

 

 

 

 

mining

 

 

MMS

Ya-Han Hu,

Apriori

MSCP-

Single minimum

Numerous

10

Fan Wu and

based

Growth

support cannot

minimum supports

 

Yi-Chun Liao

Association

 

exactly discover

possible.

 

 

Rule mining

 

interesting pattern.

 

 

Jen-Wei

 

Dealing the

It can easily deal

 

Huang, Taipei,

 

 

incremental data is

with a static

SPMP

Chi-Yao

 

Direct

difficult.

database or an

Tseng, Jian-

 

Appending

 

incremental

D 8

 

 

Chih Ou and

 

(DirApp)

 

database as well.

 

 

 

 

Ming-Syan

 

 

 

 

 

Chen

 

 

 

 

IFPT

Yi Sui,

Frequent

STMFP

Need to scan the

After the single

9

FengJing Shao,

Pattern Tree

Algorithm

database many

scan, the tree can

 

Rencheng Sun

 

 

times.

store the all the

 

and Jinlong

 

 

 

sequences.
 

 

Wang

 

 

 

 

 
 
GAIS

Ruotsalainen,

Greedy

GAIS

Issues in sensor

Sequential

15

M, Ala-

Algorithm

method.

network

patterns can be

 

Kleemola, T

 

 

application are

identified from

 

and Visa

 

 

multiple

little quality data

 

 

 

 

interleaved

 

 

 

 

 

patterns.

 

WSP

Guo-Cheng

Generate-

Maximum

Generate a large

Good enactment

17

Lan, Tzung-Pei

and-test.

weighted

number of

of pruning

 

Hong and

 

upper-

unpromising

efficiency and

Conclusions

In this paper, we discussed about the sequential
pattern mining and also briefly represented the major categories of the sequential
pattern mining. The comparison between the some of the types of algorithm was
discussed with the help of previously completed work. Primarily, this topicwas
initiated based on the improvement of the performance of the algorithm with the
help of the dissimilar data structure and representation. The comparative study
of different type of the algorithm is used for the mining the sequential
pattern. As well as, we discussed about the comparative analysis of the
algorithm performance. From the discussion about the pros and cons of
sequential mining, easily can be define the strength and their limitations. The
analysis of the comparison based on the different type of methodology and their
algorithms are discussed in detail.

Reference

1J. Han and M. Kamber,
?Data Mining: Concepts and Techniques?, Morgan Kaufman publishers, 2001.

2Vishal S.
Motegaonkar, Prof. Madhav V. Vaidya ?A Survey on Sequential Pattern Mining
Algorithms?, International Journal of Computer Science and Information
Technologies, Vol. 5 (2) , 2014, 2486-2492.

3Nizar R. Mabroukeh
and C. I. Ezeife, ?A Taxonomy of Sequential Pattern Mining Algorithms?, ACM
Computing Surveys, Vol. 43, No. 1, Article 3, Publication date: November 2010.

4J.Pei, J.Han,
B.MortazaviAsl, J.Wang, H.Pinto, Q.Chen, U.Dayal and M.- C.Hsu, ?Mining
sequential patterns by pattern-growth: The PrefixSpan approach?, IEEE
Transactions on Knowledge and Data Engineering, vol.16, no.11, 2004,
pp. 1424-1440.

5Hilderman R. J.,
Hamilton H. J.,?Knowledge Discovery and Interest Measures?,In: Kluwer Academic
Publishers, Boston, 2002.

6V.Chandra Shekhar
Rao, P.Sammulal,Ph.D, ?Survey on Sequential Pattern Mining Algorithms ?
International Journal of Computer Applications (0975 – 8887) Volume 76– No.12,
August 2013.

7Carl H. Mooney and
John F. Roddick, ACM Journal Name, Vol. V, No. N, M 20YY, Pages 1–46.

8Jen-Wei Huang ,
Nat. Taiwan University, Taipei, Chi-Yao Tseng, Jian-ChihOu
and Ming-Syan Chen,?A General Model for Sequential Pattern Mining
with a Progressive Database?, IEEE Trans. Knowledge and Data Eng., Volume:20 ,
Issue 9, page no 1153-1167, September 2008.

9Yi Sui, Feng, Jing
Shao, Rencheng Sun and Jinlong Wang,?A Sequential Pattern Mining Algorithm
Based on Improved FP-tree?,. Ninth ACIS International Conference on
Software Engineering, Artificial Intelligence, Networking, and
Parallel/Distributed Computing, 2008. SNPD 2008, Page(s): 440- 444.

10Ya-Han Hu, Fan
Wu and Yi-Chun Liao, ?Sequential pattern mining with multiple minimum
supports: A tree based approach?, 2nd International Conference on Software
Engineering and Data Mining (SEDM), 2010, Page(s): 428-433.

11Chun-Chieh Chen, Chi-Yao Tseng
and Ming-Syan Chen ?Highly Scalable Sequential Pattern Mining Based
on MapReduce Model on the Cloud?, IEEE International Congress on Big Data (Big
Data Congress), 2013, Page(s) 310- 317.

12Xilu Wang and Weill
Yao, ?Sequential Pattern Mining: Optimum Maximum Sequential Patterns and
Consistent Sequential Patterns?, IEEE International Conference on Integration
Technology, 2007, Page(s): 365-368.

13Yong-Gui Zou
and Hong Yu, ?Moving sequential pattern mining based on Spatial Constraints in
Mobile Environment?, IEEE International Conference on Intelligent Computing and
Intelligent Systems (ICIS), 2010, Page(s) 103- 107.

14Chung-Ching Yu
and Yen-Liang Chen, ?Mining Sequential Patterns from

15Ruotsalainen,
M, Ala-Kleemola, T and Visa, ?A GAIS: A Method for Detecting
Interleaved Sequential Patterns from Imperfect Data?, IEEE Symposium on
Computational Intelligence and Data Mining, 2007, Pages(s) 530- 534.

16Mojtaba Salehi, Isa
Nakhai Kamalabadi, Mohammad Bagher Ghaznavi Ghoushchi, ?Personalized recommendation
of learning material using sequential pattern mining and attribute based
collaborative filtering?, Education and Information Technologies, December
2014, Volume 19, Issue 4, page(s) 713-735.

17Guo-Cheng Lan, Tzung-Pei Hong
and Hong-Yu Lee, ?An efficient approach for finding weighted
sequential patterns from sequence databases?, Applied Intelligence, September
2014, Volume 41, Issue 2, pp 439-452.

18Thanh-Trung Nguyen, Phi-Khu Nguyen,
?A New Approach for Problem of Sequential Pattern Mining?, Lecture Notes in
Computer Science on Computational Collective Intelligence. Technologies and
Applications Volume 7653, 2012, pp 51-60.

19VangipuramRadhakrishna,
Chintakindi Srinivas and C.V.Guru Rao, “Constraint Based Sequential
Pattern Mining in Time Series Databases – A Two Way Approach”, AASRI
Conference on Intelligent Systems and Control, AASRI
Procedia 4(2013)313-318.

20ShamilaNasreen,
Muhammad AwaisAzamb, KhurramShehzada, UsmanNaeemc, and Mustansar Ali
Ghazanfara, ?Frequent Pattern Mining Algorithms for Finding Associated Frequent
Patterns for Data Streams: A Survey?, The 5th International Conference on
Emerging Ubiquitous Systems and Pervasive
Networks (EUSPN-2014), Procedia Computer Science 37

( 2014 ) 109 – 116.

21Mohammed J. Zak,
?SPADE: An Efficient Algorithm for MiningFrequent Sequences?, Kluwer Academic
Publishers. Manufactured in The Netherlands, 42, 31–60, 2001.

22RamakrishnanSrikant,
Rakesh Agrawal,? Mining Sequential Patterns: Generalizations and Performance
Improvements?, Advances in Database Technology — EDBT ’96, Lecture Notes in
Computer Science, Springer, Volume 1057, 1996, pp 1-17.

 

 

x

Hi!
I'm Isaac!

Would you like to get a custom essay? How about receiving a customized one?

Check it out