Title: | Frequent Contiguous Sequential Pattern Mining of Text |
---|---|
Description: | Mines contiguous sequential patterns in text. |
Authors: | Anantha Janakiraman |
Maintainer: | Anantha Janakiraman <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.2 |
Built: | 2025-03-13 03:45:55 UTC |
Source: | https://github.com/cran/CSeqpat |
Takes the path to a text file and a minimum support threshold, and mines the frequent contiguous sequential patterns (phrases) in the text.
CSeqpat(filepath, phraselenmin = 1, phraselenmax = 99999, minsupport = 1, docdelim, stopword = FALSE, stemword = FALSE, lower = FALSE, removepunc = FALSE)
filepath | Path to the text file/text corpus |
---|---|
phraselenmin | Minimum number of words in a phrase |
phraselenmax | Maximum number of words in a phrase |
minsupport | Minimum absolute support for mining the patterns |
docdelim | Document delimiter in the corpus |
stopword | Remove stopwords from the document corpus (boolean) |
stemword | Perform stemming on the document corpus (boolean) |
lower | Convert all words in the document corpus to lower case (boolean) |
removepunc | Remove punctuation from the document corpus (boolean) |
A data frame containing the frequent phrase patterns with their absolute support
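To make "contiguous phrase" and "absolute support" concrete, here is a minimal base R sketch of the general idea. It is not the package's implementation: `contiguous_patterns` and its argument names are hypothetical, and it assumes that absolute support means the number of documents in which a phrase occurs at least once. It also skips the stopword, stemming, case, and punctuation options.

```r
# Hypothetical helper, not part of CSeqpat: collect contiguous word n-grams
# per document and keep those that meet a minimum absolute support.
# Assumption: support = number of documents containing the phrase.
contiguous_patterns <- function(docs, lenmin = 1, lenmax = 2, minsupport = 1) {
  per_doc <- lapply(docs, function(doc) {
    words <- strsplit(trimws(doc), "\\s+")[[1]]
    words <- words[nzchar(words)]
    phrases <- character(0)
    for (n in seq(lenmin, lenmax)) {
      if (n > length(words)) break
      starts <- seq_len(length(words) - n + 1)
      # Each phrase is n adjacent words, so only contiguous runs are produced
      phrases <- c(phrases, vapply(starts, function(i) {
        paste(words[i:(i + n - 1)], collapse = " ")
      }, character(1)))
    }
    unique(phrases)  # count each phrase once per document
  })
  support <- table(unlist(per_doc))
  support <- support[support >= minsupport]
  data.frame(phrase = names(support), support = as.integer(support),
             stringsAsFactors = FALSE)
}

docs <- c("hoagie institution food year road ",
          "place little dated opened weekend fresh food")
result <- contiguous_patterns(docs, lenmin = 1, lenmax = 2, minsupport = 2)
print(result)
```

With these two documents, the only phrase of one or two words that appears in both is "food", so it is the only row that survives a minimum support of 2 in this sketch.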
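library(CSeqpat)
# Two short documents, written line by line to a temporary file
test1 <- c("hoagie institution food year road ", "place little dated opened weekend fresh food")
tf <- tempfile()
writeLines(test1, tf)
# Phrases of 1 to 2 words, minimum support 2, tab as document delimiter;
# stopword removal and lower-casing on, stemming and punctuation removal off
CSeqpat(tf, 1, 2, 2, "\t", TRUE, FALSE, TRUE, FALSE)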