Left Wing Extremism

WORDNET BASED DOCUMENT CLUSTERING
Ashok Chirla
Computer Science Engineering, V.R. Siddhartha Engineering College, Kanuru, Vijayawada, A.P., India
ashok.chirla@gmail.com

Abstract: Document clustering is an important tool in the current era of rapidly growing information. It is the process of grouping text documents into categories, and it has found applications in domains such as information retrieval and web information systems. Ontology-based computing is emerging as a natural evolution of existing technologies to deal with the information onslaught. In this dissertation work, background knowledge derived from WordNet, used as an ontology, is applied during the preprocessing of documents for document clustering. Document vectors constructed from WordNet synsets are used as input for clustering. A comparative analysis is carried out between clustering with k-means and clustering with bisecting k-means. A document categorization tool is developed that summarizes the hierarchy of concepts obtained from WordNet during the clustering phase. The GUI tool shows the association between WordNet concepts and the documents belonging to each concept.

Keywords: Document clustering, Ontology, BOW, POS Tagging, Stemming, Labeling, bisecting k-means algorithm.
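As an illustration of the bisecting k-means variant mentioned in the abstract, the following is a minimal Python sketch and not the tool's actual implementation. It assumes each document has already been reduced to a numeric vector (for example, counts per WordNet synset); the function names, the `trials` parameter, and the toy vectors are all hypothetical. The idea is to start with one cluster holding every document and repeatedly split the cluster with the largest sum of squared errors using plain 2-means until `k` clusters exist.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    # Component-wise mean of a non-empty list of vectors.
    return [sum(col) / len(points) for col in zip(*points)]

def sse(points):
    # Sum of squared distances from each point to the cluster centroid.
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, iters=20):
    # Plain k-means with k=2, seeded with two randomly chosen points.
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards this trial
        centers = [centroid(g) for g in groups]
    return groups

def bisecting_kmeans(points, k, seed=0, trials=5):
    # Assumption: the cluster with the largest SSE is the one chosen for splitting.
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        target = max(candidates, key=sse)
        best = None
        for _ in range(trials):
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
        if best is None:
            break  # no usable split found; stop early
        clusters.remove(target)
        clusters.extend(best[1:])
    return clusters

# Hypothetical document vectors over three concept dimensions.
docs = [[5, 0, 0], [4, 1, 0], [0, 5, 0], [0, 4, 1], [0, 0, 5], [1, 0, 4]]
clusters = bisecting_kmeans(docs, k=3)
```

Each split is tried several times and the lowest-error split is kept, because 2-means can settle in a poor local optimum depending on its random starting points.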
I. INTRODUCTION
With the abundance of text documents available through the Web and corporate document management systems, the partitioning of document sets into previously unseen categories ranks high on the priority list for many applications like business intelligence systems. Nowadays the problem is often not to access text information but to select the relevant documents [2].

The steady development of computer hardware technology over the last few years has led to a large supply of powerful and affordable computers, data collection equipment, and storage media. These technologies provide strong support to the database and information industry and make a huge number of databases and information repositories

References:
[1] A. Hotho, S. Staab and A. Maedche (2001), “Ontology-based Text Clustering”, In Proceedings of the IJCAI-2001 Workshop Text Learning: Beyond Supervision.
[3] Michael Steinbach, George Karypis and Vipin Kumar (2001), “A Comparison of Document Clustering Techniques”, Department of Computer Science and Engineering, University of Minnesota, Technical Report 00-034.
[4] Christiane Fellbaum (2005), “WordNet and wordnets”, In Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670.
[9] S. C. Punitha, K. Mugunthadevi and M. Punithavalli (2011), “Impact of Ontology based Approach on Document Clustering”, International Journal of Computer Applications 22(2):22-26, May 2011. Published by Foundation of Computer Science.
[10] Sam Scott and Stan Matwin (1997), “Text Classification Using WordNet Hypernyms”, Computer Science Dept., University of Ottawa, Ottawa, Canada.
