
Data Mining for Business Intelligence: Data Visualization and Summary Statistics

Chapter 3 – Data Visualization
Chapter 4 – Summary Statistics
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2010

Data Visualization
• “A picture is worth a thousand words”
• Data visualization and summary statistics help condense data
• Enable effective presentation
• Support data cleaning (identifying missing values, outliers, incorrect values, duplicates) and data exploration (e.g., combining groups)
• Help identify suitable variables
• Mandatory initial step for most data mining applications

Graphs for Data Exploration
Basic Plots
Line Graphs
Bar Charts
Scatterplots

Distribution Plots
Boxplots
Histograms

Two Examples

Amtrak Ridership:
• Amtrak routinely collects data on ridership
• Goal: To predict future ridership using the series of monthly ridership data between Jan 1991 and March 2004

Boston Housing Data:
• Census tracts in Boston
• Several variables (14) – crime rate, location, etc.
• Goal 1: Predict median value of a home in the tract
• Goal 2: Cluster census tracts

Line Graph for Time Series

Shows how ridership patterns of Amtrak trains change over time
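A line graph puts time on the horizontal axis, in order, with one point per period. As a minimal sketch that assumes only the Python standard library (no plotting package), the code below builds the monthly axis for the Jan 1991 to March 2004 window and renders a crude text "sparkline" as a stand-in for a drawn line. The helper names `month_starts` and `sparkline` and the sample values are hypothetical; they are not Amtrak data.

```python
from datetime import date

# Eight block characters from lowest to highest; a text stand-in for a plotted line.
BLOCKS = "▁▂▃▄▅▆▇█"


def month_starts(start, end):
    """Return the first day of each month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        # Advance one month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def sparkline(values):
    """Map each value, in time order, to a block whose height reflects its level."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BLOCKS[0] * len(values)
    return "".join(BLOCKS[int((v - lo) / span * (len(BLOCKS) - 1))] for v in values)


axis = month_starts(date(1991, 1, 1), date(2004, 3, 1))
print(len(axis), axis[0], axis[-1])  # 159 1991-01-01 2004-03-01

# Hypothetical values for six consecutive months (illustration only).
print(sparkline([1700, 1650, 1800, 1900, 1850, 2000]))
```

The 159-point axis follows from 13 full years (1991 to 2003, 156 months) plus January to March 2004; keeping the points in time order is what lets the line reveal trend and seasonality.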

Bar Chart for Categorical Variable
Shows differences between subgroups
Example: 95% of tracts do not border the Charles River
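A bar chart for a categorical variable shows the count or percentage of records in each category. The sketch below, standard library only, computes those percentages and prints one text bar per category. The 0/1 river flag and the 20 sample tracts are hypothetical, chosen so that the split mirrors the 95% example; `category_shares` and `text_bar_chart` are illustrative helpers.

```python
from collections import Counter


def category_shares(values):
    """Return {category: share of records}, ordered by category."""
    counts = Counter(values)
    total = len(values)
    return {cat: counts[cat] / total for cat in sorted(counts)}


def text_bar_chart(shares, width=40):
    """Print one horizontal bar per category, length proportional to its share."""
    for cat, share in shares.items():
        print(f"{cat!s:>3} | {'#' * round(share * width):<{width}} {share:.0%}")


# Hypothetical river flag for 20 tracts: 0 = does not border, 1 = borders.
river_flag = [0] * 19 + [1]
shares = category_shares(river_flag)
print(shares)  # {0: 0.95, 1: 0.05}
text_bar_chart(shares)
```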

Scatterplot
Displays the relationship between two numerical variables
– Example: median home value decreases as the percentage of lower-status population increases
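A scatterplot shows the relationship visually; the Pearson correlation coefficient is a common summary statistic that puts a single number on its direction and strength (negative here, as the example describes). A minimal sketch, assuming hypothetical (percentage lower-status, median home value) pairs; `pearson_r` is an illustrative helper, not a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric lists.

    Uses unnormalized sums: the 1/n factors cancel in the ratio.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical tracts: percentage of lower-status population vs. median home value.
pct_low_status = [4.0, 8.0, 12.0, 18.0, 25.0, 30.0]
median_value = [34.0, 29.0, 23.0, 19.0, 14.0, 12.0]

r = pearson_r(pct_low_status, median_value)
print(f"r = {r:.2f}")  # negative: value falls as the percentage rises
```

A coefficient near -1 indicates a strong, roughly linear decreasing pattern; the scatterplot is still worth drawing, since r can miss curvature and outliers.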

Graphs
• Three most effective plots:
  – Bar charts: usually for categorical variables
  – Line graphs: time series data
  – Scatterplots: relationship between two numerical variables
• Used widely in the business world
• Domain knowledge and the nature of the task determine the appropriate chart for the data at hand

Distribution Plots
• Display the entire distribution of a numerical variable
• Display “how many” of each value occur in a data set or, for continuous data or data with many possible values, “how many” values fall in each of a series of ranges or “bins”
• Generally useful for prediction tasks (supervised
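The "how many values fall in each bin" counts behind a histogram, and the quartiles behind a boxplot, can both be computed directly. A minimal standard-library sketch, assuming equal-width bins and hypothetical values; `histogram_counts` and `five_number_summary` are illustrative helpers. `statistics.quantiles` is used with its default "exclusive" method, which is only one of several quartile conventions plotting tools apply. The summary below corresponds to a basic boxplot with whiskers at the minimum and maximum; many tools instead stop the whiskers at 1.5 × IQR and plot points beyond as outliers.

```python
import statistics


def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max].

    Each bin is half-open [left, right) except the last, which also holds the maximum.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum would land in bin n_bins; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the quantities a basic boxplot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Hypothetical numeric variable (e.g., ten tract-level values).
data = [2, 3, 3, 5, 6, 7, 7, 8, 9, 14]
edges, counts = histogram_counts(data, n_bins=4)
print(edges)   # [2.0, 5.0, 8.0, 11.0, 14.0]
print(counts)  # [3, 4, 2, 1]
print(five_number_summary(data))
```

The bin count is a design choice: too few bins hide the shape of the distribution, too many make it noisy.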
