
Database Relationship

Linking Named Entities to Any Database
Avirup Sil∗, Temple University, Philadelphia, PA, avi@temple.edu
Yinfei Yang, St. Joseph’s University, Philadelphia, PA, yangyin7@gmail.com
Ernest Cronin∗, St. Joseph’s University, Philadelphia, PA, ernest.cronin@gmail.com
Penghai Nie, St. Joseph’s University, Philadelphia, PA, nph87903@gmail.com
Ana-Maria Popescu, Yahoo! Labs, Sunnyvale, CA, amp@yahoo-inc.com
Alexander Yates, Temple University, Philadelphia, PA, yates@temple.edu

Abstract

Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
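To make the Open-DB NED task setting concrete, the sketch below shows one naive way to collect candidate referents for a mention from a database whose schema is not known in advance. It is an illustration only and not the method of this paper: the function names candidate_rows and build_demo_db, the toy tables, and the exact case-insensitive string match are all assumptions made for the example. Neither distant supervision nor domain adaptation is modeled here.

```python
import sqlite3


def candidate_rows(conn, mention):
    """Return (table, rowid, column) triples whose value matches the mention.

    Hypothetical helper: exact, case-insensitive string match over every
    column of every table, using only the schema that SQLite itself reports.
    """
    needle = mention.strip().lower()
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    matches = []
    for table in tables:
        quoted_table = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [
            row[1] for row in conn.execute(f"PRAGMA table_info({quoted_table})")
        ]
        for column in columns:
            quoted_column = '"' + column.replace('"', '""') + '"'
            query = (
                f"SELECT rowid FROM {quoted_table} "
                f"WHERE lower(CAST({quoted_column} AS TEXT)) = ? ORDER BY rowid"
            )
            for (rowid,) in conn.execute(query, (needle,)):
                matches.append((table, rowid, column))
    return matches


def build_demo_db():
    """Build a toy in-memory database (made-up rows, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE restaurant (name TEXT, city TEXT)")
    conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
    conn.executemany(
        "INSERT INTO restaurant VALUES (?, ?)",
        [("Casablanca", "Philadelphia"), ("Zahav", "Philadelphia")],
    )
    conn.executemany(
        "INSERT INTO movie VALUES (?, ?)",
        [("Casablanca", 1942), ("Eraserhead", 1977)],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # The same surface form matches rows in two different tables.
    print(candidate_rows(demo, "casablanca"))
```

In this toy database the mention "casablanca" matches one row in the movie table and one in the restaurant table, which is exactly the ambiguity a disambiguation system has to resolve using the surrounding text. A practical system would also need fuzzy matching and a ranking model; SQLite's built-in lower() only folds ASCII characters, so this sketch would miss some non-English names.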

referents, but exclusive focus on Wikipedia as a target for NED systems has significant drawbacks: despite its breadth, Wikipedia still does not contain all or even most real-world entities mentioned in text. As one example, it has poor coverage of entities that are mostly important in a small geographical region, such as hotels and restaurants, which are widely discussed on the Web. 57% of the named-entities in the Text Analysis Conference’s (TAC) 2009 entity linking task refer to an entity that does not appear in Wikipedia (McNamee et al., 2009). Wikipedia is clearly a highly valuable resource, but it should not be thought of as the only one. Instead of relying



References

Kedar Bellare and Andrew McCallum. 2007. Learning extractors from unlabeled text using relevant databases. In Sixth International Workshop on Information Integration on the Web.
Kedar Bellare and Andrew McCallum. 2009. Generalized Expectation Criteria for Bootstrapping Extractors using Record-Text Alignment. In Empirical Methods in Natural Language Processing (EMNLP-09).
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79:151–175.
John Blitzer, Ryan McDonald, and Fernando Pereira. 2006. Domain adaptation with structural correspondence learning. In EMNLP.
Razvan Bunescu and Raymond Mooney. 2007. Learning to extract relations from the web using minimal supervision. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07).
R. Bunescu and M. Pasca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06).
Ying Chen and James Martin. 2007. Towards Robust Unsupervised Personal Name Disambiguation. In EMNLP, pages 190–198.
Silviu Cucerzan. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 708–716.
Nilesh N. Dalvi, Ravi Kumar, Bo Pang, and Andrew Tomkins. 2009. Matching Reviews to Objects using a Language Model. In EMNLP, pages 609–618.
Nilesh N. Dalvi, Ravi Kumar, and Bo Pang. 2012. Object matching in tweets with spatial models. In WSDM, pages 43–52.
Hal Daumé III, Abhishek Kumar, and Avishek Saha. 2010. Frustratingly easy semi-supervised domain adaptation. In Proceedings of the ACL Workshop on Domain Adaptation (DANLP).
D. Downey, M. Broadhead, and O. Etzioni. 2007. Locating complex named entities in web text. In Procs. of the 20th International Joint Conference on Artificial Intelligence (IJCAI 2007).
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2009. Scaling Wikipedia-based named entity disambiguation to arbitrary web text. In Proceedings of the WikiAI 09 - IJCAI Workshop: User Contributed Knowledge and Artificial Intelligence: An Evolving Synergy.
Xianpei Han and Jun Zhao. 2009. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 215–224.
Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust Disambiguation of Named Entities in Text. In EMNLP, pages 782–792.
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Fei Huang and Alexander Yates. 2009. Distributional representations for handling sparsity in supervised sequence labeling. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective annotation of Wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 457–466.
Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical Generalization in CCG Grammar Induction for Semantic Parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
M.E. Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the SIGDOC Conference.
Thomas Lin, Mausam, and Oren Etzioni. 2012. Entity linking at web scale. In Knowledge Extraction Workshop (AKBC-WEKEX).
D.C. Liu and J. Nocedal. 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming B, 45(3):503–528.
G.S. Mann and D. Yarowsky. 2003. Unsupervised personal name disambiguation. In CoNLL.
Paul McNamee, Mark Dredze, Adam Gerber, Nikesh Garera, Tim Finin, James Mayfield, Christine Piatko, Delip Rao, David Yarowsky, and Markus Dreyer. 2009. HLTCOE Approaches to Knowledge Base Population at TAC 2009. In Text Analysis Conference.
Rada Mihalcea and Andras Csomai. 2007. Wikify!: Linking documents to encyclopedic knowledge. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM), pages 233–242.
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL-2009), pages 1003–1011.
Patrick Pantel and Ariel Fuxman. 2011. Jigs and Lures: Associating Web Queries with Structured Entities. In ACL.
L. Ratinov, D. Roth, D. Downey, and M. Anderson. 2011. Local and global algorithms for disambiguation to Wikipedia. In Proc. of the Annual Meeting of the Association of Computational Linguistics (ACL).
Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the Sixteenth European Conference on Machine Learning (ECML-2010), pages 148–163.
Avi Silberschatz, Henry F. Korth, and S. Sudarshan. 2010. Database System Concepts. McGraw-Hill, sixth edition.
Daniel S. Weld, Raphael Hoffmann, and Fei Wu. 2009. Using Wikipedia to Bootstrap Open Information Extraction. In ACM SIGMOD Record.
Limin Yao, Sebastian Riedel, and Andrew McCallum. 2010. Collective cross-document relation extraction without labelled data. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP-2010), pages 1013–1023.
Yiping Zhou, Lan Nie, Omid Rouhani-Kalleh, Flavian Vasile, and Scott Gaffney. 2010. Resolving surface forms to Wikipedia topics. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling), pages 1335–1343.
