Analysis Of Lazy Super Parent Tree Augmented Naive Bayes

In this section, we propose a heuristic, called Lazy Super Parent Tree Augmented Naive Bayes (LSPTAN), that seeks to solve the problems discussed above, enabling the application of semi-Naive Bayes techniques to large ADC tasks. Thus, we can evaluate whether the attribute-independence assumption made by Naive Bayes impacts effectiveness in large ADC tasks, which remains an open research problem.\looseness=-1

The Lazy Super Parent TAN (LSPTAN) heuristic is a deferred version of SP-TAN that constructs a Tree Augmented Naive Bayes model for each test example. Attribute dependencies are generated based on information from the example being classified. To build a lazy version of SP-TAN, we adapted the evaluation method and the selection of candidates for Super Parent and Favorite Children.\looseness=-1

The SP-TAN algorithm exploits classification accuracy to select a candidate to Super Parent ($a_{sp}$). In our strategy, we select the candidate $a_{sp}$ whose classification model generates the highest probability for the test document.
Therefore, LSPTAN builds a simpler network than SP-TAN: we select only the best Super Parent for each test document, but there is no limit on the number of Favorite Children. Thus, every child attribute that increases the probability that the document belongs to a class is included in the classification model.\looseness=-1
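The Favorite Children selection described above can be sketched as a greedy inclusion loop: a child is kept only if adding it raises the model's score for the test document. This is a minimal sketch, not the paper's implementation; the helper `score_with_child` and all names below are hypothetical.

```python
def select_favorite_children(candidates, base_score, score_with_child):
    # Greedily keep every candidate child attribute whose inclusion raises
    # the probability score of the model for the current test document.
    # score_with_child(child, children) returns the score of the model if
    # `child` were added to the already-accepted `children` (hypothetical).
    children, score = [], base_score
    for child in candidates:
        new_score = score_with_child(child, children)
        if new_score > score:
            children.append(child)
            score = new_score
    return children
```

Because every beneficial child is kept, the resulting structure is bounded only by the vocabulary of the test document, consistent with the unlimited Favorite Children choice above.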

The LSPTAN heuristic initially builds the model based on Naive Bayes and initializes a set of orphans $O$, inserting into $O$ all the terms of the vocabulary. Then, for each test document, the technique evaluates each term as a Super Parent ($a_{sp}$) and, at the end, selects as $a_{sp}$ the term with the highest probability $P(c_i | d_t, a_{sp})$. The probability $P(c_i | d_t, a_{sp})$ for a candidate $a_{sp}$ is defined by Equation~\ref{eq::lsptan}, where $f$ is the frequency of a term in the document $d_t$.\looseness=-1

\begin{center}
\begin{equation}
\label{eq::lsptan}
P(c_i \mid d_t, a_{sp}) \propto P(c_i)\, P(a_{sp} \mid c_i)^{f(a_{sp}, d_t)} \prod_{\substack{a_j \in d_t \\ a_j \neq a_{sp}}} P(a_j \mid c_i, a_{sp})^{f(a_j, d_t)}
\end{equation}
\end{center}
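The Super Parent selection can be sketched in log space to avoid underflow on long documents. This is a minimal sketch under stated assumptions: the probability tables `prior`, `p_term`, and `p_term_given_sp`, and both function names, are hypothetical placeholders for the estimates a trained Naive Bayes model would supply.

```python
import math
from collections import Counter

def lsptan_score(doc_terms, c, a_sp, prior, p_term, p_term_given_sp):
    # Log-space score proportional to P(c | d_t, a_sp):
    #   log P(c) + f(a_sp, d_t) * log P(a_sp | c)
    #            + sum over the other terms a_j in d_t of
    #              f(a_j, d_t) * log P(a_j | c, a_sp)
    freq = Counter(doc_terms)  # f: term frequencies in the test document d_t
    score = math.log(prior[c]) + freq[a_sp] * math.log(p_term[(a_sp, c)])
    for a, f in freq.items():
        if a != a_sp:
            score += f * math.log(p_term_given_sp[(a, c, a_sp)])
    return score

def pick_super_parent(doc_terms, classes, prior, p_term, p_term_given_sp):
    # Evaluate every term of the test document as a candidate Super Parent
    # and keep the one whose best class score is highest.
    return max(set(doc_terms),
               key=lambda a_sp: max(lsptan_score(doc_terms, c, a_sp, prior,
                                                 p_term, p_term_given_sp)
                                    for c in classes))
```

Restricting candidates to the terms of $d_t$ reflects the lazy character of the heuristic: the dependency structure is decided per test document rather than once over the whole vocabulary.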