Secure Document Similarity Detection

Document similarity detection is useful in many areas, such as copyright protection and plagiarism detection. However, it is difficult to test the similarity between documents when no information may be disclosed or when privacy is a concern. This paper suggests a solution that relies on two metrics: utility and security.

Problem
Suppose that two parties want to find out whether or not they hold related or similar documents. Both parties are concerned about privacy. Their goal is to discover only whether their documents are similar, without disclosing the documents themselves.

Solution
a. Without privacy concerns
If the parties have no privacy concerns, there are many ways to discover the similarities. One of them is the “similarity of ranked list” utility: given a document D from party A, find a ranked list of the top 10 documents held by party B that are most similar to D.
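As a rough illustration of the ranked-list utility, the sketch below ranks party B's documents against D by cosine similarity over raw term counts. The similarity measure, the naive tokenizer, and the names `top_k_similar` and `corpus_b` are assumptions made for this example; the text does not say which similarity function is used.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_similar(query, corpus, k=10):
    """Return the ids of the k corpus documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in corpus.items()]
    # Highest score first; ties are broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical documents held by party B.
corpus_b = {
    "b1": "patient treated for lung cancer with chemotherapy",
    "b2": "quarterly revenue report for retail stores",
    "b3": "lung cancer screening guidelines for patients",
}
ranked = top_k_similar("lung cancer treatment for a patient", corpus_b, k=2)
print(ranked)
```

In practice a weighting scheme such as TF-IDF would likely rank better than raw counts, but the ranked-list interface stays the same.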

b. Privacy is a concern
If the two parties do not want to disclose their documents to each other, a secure solution has to be found. A suggested solution uses the same utility as above, “similarity of ranked list”, together with the security metric “t-Plausibility”: given a document D, produce D’, a generalized version of D that satisfies t-Plausibility. Pass D’ to party B and retrieve the ranked list of similar documents.
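The generalization step can be pictured with a toy sketch. It assumes that t-Plausibility is read as "at least t source texts could have been generalized to D’", and it uses a tiny hand-made ontology in place of a real lexical hierarchy. The ontology, the set of sensitive terms, the value of `t`, and all function names are hypothetical and not taken from the text.

```python
from math import prod

# Hypothetical toy ontology: general term -> specific terms it covers.
ONTOLOGY = {
    "disease": {"tuberculosis", "influenza", "malaria"},
    "city": {"boston", "chicago"},
}
# Reverse lookup: specific term -> its general term.
PARENT = {child: parent for parent, children in ONTOLOGY.items() for child in children}

def generalize(tokens, sensitive):
    """Replace each sensitive token with its parent term; leave other tokens as they are."""
    return [PARENT.get(tok, tok) if tok in sensitive else tok for tok in tokens]

def plausible_sources(generalized):
    """Count the texts that could have produced `generalized` under the toy ontology.

    Every general term could stand for any of its children, so the count is the
    product of the child-set sizes. A general term that appeared in the original
    text on its own would also be counted, which is acceptable for a sketch.
    """
    return prod(len(ONTOLOGY[tok]) for tok in generalized if tok in ONTOLOGY)

d = "patient from boston diagnosed with malaria".split()
d_prime = generalize(d, sensitive={"boston", "malaria"})
t = 5  # hypothetical plausibility requirement
is_t_plausible = plausible_sources(d_prime) >= t
print(" ".join(d_prime), plausible_sources(d_prime), is_t_plausible)
```

Party A would then send only `d_prime` to party B, who runs the usual ranked-list retrieval on it.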

Analysis and testing
To measure how well the suggested solution works, the top 10 ranked list produced by solution (a) is compared with the top-ranked list produced by solution (b). If the number of documents common to both lists meets or comes close to a given threshold, the solution can be considered sufficient.
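The comparison of the two lists can be sketched as a simple overlap score. The function name `overlap_at_k`, the example lists, and the threshold of 0.5 are assumptions for illustration; the text does not fix a threshold or say whether rank order should count.

```python
def overlap_at_k(baseline, private, k=10):
    """Fraction of the baseline top-k that also appears in the private top-k."""
    base = set(baseline[:k])
    if not base:
        return 0.0
    return len(base & set(private[:k])) / len(base)

# Hypothetical results: list (a) from the original D, list (b) from the generalized D'.
baseline = ["d1", "d2", "d3", "d4", "d5"]
private = ["d2", "d9", "d1", "d7", "d5"]

threshold = 0.5  # hypothetical acceptance threshold
score = overlap_at_k(baseline, private, k=5)
sufficient = score >= threshold
print(score, sufficient)
```

This score ignores the order of the documents within each list; a rank-aware measure could be substituted if position matters.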

Comments and Ideas
- The more general D’ is, the lower the probability that D’ was generated from D. This may make it difficult to deduce similarity.
- The top-ranked list will contain the documents in the domain of D, not the documents similar to it. On other
