CS2032 DATA MINING AND DATA WAREHOUSING
ANNA UNIVERSITY QUESTION BANK
Anna University
DEPARTMENT OF INFORMATION TECHNOLOGY
Data Warehousing and Data Mining
QUESTION BANK
2012 Edition
Sub Code : CS2032
Sub Name: Data Warehousing and Data Mining
UNIT-I
PART A
DATA WAREHOUSING
1. Define the term ‘Data Warehouse’.
2. Write down the applications of data warehousing.
3. When is a data mart appropriate?
4. List out the functionality of metadata.
5. What are the nine decisions in the design of a data warehouse?
6. List out the two different types of reporting tools.
7. Why is data mining used in all organizations?
8. What are the technical issues to be considered when designing and implementing a data warehouse environment?
9. List out some examples of access tools.
10. What are the advantages of data warehousing?
11. Give the difference between horizontal and vertical parallelism.
12. Draw a neat diagram for the distributed memory shared disk architecture.
13. Define star schema.
14. What are the reasons for the very good performance achieved by SYBASE IQ technology?
15. What are the steps to be followed to store an external source in the data warehouse?
16. Define legacy data.
17. Draw the standard framework for metadata interchange.
18. List out the five main groups of access tools.
19. Define data visualization.
20. What are the various forms of data preprocessing?
21. How is a data warehouse different from a database? How are they similar?
22. What is data transformation? Give an example.
23. With an example, explain what metadata is.
24. What is a data mart?
PART-B
1. Enumerate the building blocks of a data warehouse. Explain the importance of metadata in a data warehouse environment. [16]
2. Explain the various methods of data cleaning in detail. [8]
3. Diagrammatically illustrate and discuss the data warehousing architecture, briefly explaining the components of a data warehouse. [16]
4. (i) Distinguish between data warehousing and data mining. [8]
(ii) Describe in detail data extraction and cleanup. [8]
5. Write short notes on (i) Transformation [8] (ii) Metadata [8]
6. List and discuss the steps involved in mapping the data warehouse to a multiprocessor architecture. [16]
7. Discuss bitmapped indexing in detail. [16]
8. Explain the different vendor solutions in detail. [16]
UNIT-II
BUSINESS ANALYSIS
PART A
1. Differentiate between OLAP and OLTP.
2. Classify OLAP tools.
3. What is meant by OLAP?
4. Differentiate between OLAP and OLTP.
5. Define concept hierarchy.
6. List out the five categories of decision support tools.
7. Define Cognos Impromptu.
8. List out any five OLAP guidelines.
9. Distinguish between multidimensional and multi-relational OLAP.
10. Define ROLAP.
11. Draw a neat diagram for the web processing model.
12. Define MQE.
13. Draw a neat sketch of the three-tiered client/server architecture.
14. List out the applications that organizations use to build a query and reporting environment for the data warehouse.
15. Distinguish between the window painter and the DataWindow painter.
16. Define ADF, SGF and DEF.
17. What is the function of the PowerPlay administrator?
PART-B
1. Discuss the typical OLAP operations with an example. [6]
2. List and discuss the basic features provided by reporting and query tools used for business analysis. [16]
3. Describe Cognos Impromptu in detail. [16]
4. Explain OLAP in detail. [16]
5. With relevant examples, discuss multidimensional online analytical processing and multi-relational online analytical processing. [16]
6. Discuss OLAP tools and the Internet. [16]
7. (i) Explain the multidimensional data model. [10]
(ii) Discuss how computations can be performed efficiently on data cubes. [6]
UNIT-III
DATA MINING
PART A
1. Define data.
2. State why data preprocessing is an important issue for data warehousing and data mining.
3. What is the need for discretization in data mining?
4. What are the various forms of data preprocessing?
5. What is a concept hierarchy? Give an example.
6. What are the various forms of data preprocessing?
7. Mention the various tasks to be accomplished as part of data preprocessing.
8. Define data mining.
9. List out any four data mining tools.
10. What do data mining functionalities include?
11. Define patterns.
PART-B
1. (i) Explain the various primitives for specifying a data mining task. [10]
(ii) Describe the various descriptive statistical measures for data mining. [6]
2. Discuss the different types of data and functionalities. [16]
3. (i) Describe in detail the interestingness of patterns. [10]
(ii) Explain in detail data mining task primitives. [6]
4. (i) Discuss the different issues of data mining. [6]
(ii) Explain data preprocessing in detail. [10]
5. How are data mining systems classified? Discuss each classification with an example. [16]
6. How can a data mining system be integrated with a data warehouse? Discuss with an example. [16]
UNIT-IV
ASSOCIATION RULE AND CLASSIFICATION
PART A
1. What is meant by market basket analysis?
2. What is the use of multilevel association rules?
3. What is meant by pruning in decision tree induction?
4. Write the two measures of an association rule.
5. With an example, explain correlation analysis.
6. Define conditional pattern base.
7. List out the major strengths of the decision tree method.
8. In classification trees, what are surrogate splits, and how are they used?
9. What assumptions does the Naïve Bayes classifier make that motivate its name?
10. What is the frequent itemset property?
11. List out the major strengths of decision tree induction.
12. Write the two measures of an association rule.
13. How are association rules mined from large databases?
14. What is tree pruning in decision tree induction?
15. What is the use of multilevel association rules?
16. What are the Apriori properties used in the Apriori algorithm?
17. How is prediction different from classification?
18. What is a support vector machine?
19. What are the means to improve the performance of an association rule mining algorithm?
20. State the advantages of the decision tree approach over other approaches for performing classification.
PART-B
1. Decision tree induction is a popular classification method. Taking one typical decision tree induction algorithm, briefly outline the method of decision tree classification. [16]
2. Consider the following training dataset and the original decision tree induction algorithm (ID3). Risk is the class label attribute. The Height values have already been discretized into disjoint ranges. Calculate the information gain if Gender is chosen as the test attribute. Calculate the information gain if Height is chosen as the test attribute. Draw the final decision tree (without any pruning) for the training dataset. Generate all the “IF-THEN” rules from the decision tree.
Gender  Height      Risk
F       (1.5, 1.6)  Low
M       (1.9, 2.0)  High
F       (1.8, 1.9)  Medium
F       (1.8, 1.9)  Medium
F       (1.6, 1.7)  Low
M       (1.8, 1.9)  Medium
F       (1.5, 1.6)  Low
M       (1.6, 1.7)  Low
M       (2.0, ∞)    High
M       (2.0, ∞)    High
F       (1.7, 1.8)  Medium
M       (1.9, 2.0)  Medium
F       (1.8, 1.9)  Medium
F       (1.7, 1.8)  Medium
F       (1.7, 1.8)  Medium
[16]
(a) Given the following transactional database:
1 C, B, H
2 B, F, S
3 A, F, G
4 C, B, H
5 B, F, G
6 B, E, O
(i) We want to mine all the frequent itemsets in the data using the Apriori algorithm. Assume the minimum support level is 30%. (You need to give the set of frequent itemsets in L1, L2, … and the candidate itemsets in C1, C2, ….) [9]
(ii) Find all the association rules that involve only B, C, H (on either the left- or right-hand side of the rule). The minimum confidence is 70%. [7]
3. Describe multidimensional association rules, giving a suitable example. [16]
4. (a) Explain the algorithm for constructing a decision tree from training samples. [12]
(b) Explain Bayes' theorem. [4]
6. Develop an algorithm for classification using Bayesian classification. Illustrate the algorithm with a relevant example. [16]
7. Discuss the approaches for mining multilevel association rules from transactional databases. Give a relevant example. [16]
8. Write and explain the algorithm for mining frequent itemsets without candidate generation. Give a relevant example. [16]
9. How is attribute-oriented induction implemented? Explain in detail. [16]
10. Discuss Bayesian classification in detail. [8]
11. A database has four transactions. Let min sup = 60% and min conf = 80%.
TID | DATE | ITEMS_BOUGHT
T100 | 10/15/07 | {K,A,B}
T200 | 10/15/07 | {D,A,C,E,B}
T300 | 10/19/07 | {C,A,B,E}
T400 | 10/22/07 | {B,A,D}
Find all frequent itemsets using Apriori and FP-growth, respectively. Compare the efficiency of the two mining processes. [16]
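As a study aid for question 11, the Apriori half of the task can be checked with a short script. The following is a minimal sketch, not a model answer: the function names (support, apriori, rules) and the dictionary layout are illustrative assumptions, the DATE column is ignored because it plays no part in mining, and FP-growth is not covered.

```python
from itertools import combinations

# Transactions copied from question 11 (DATE column omitted; assumption:
# it is irrelevant to frequent-itemset mining).
transactions = {
    "T100": {"K", "A", "B"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUP = 0.60   # minimum support given in the question
MIN_CONF = 0.80  # minimum confidence given in the question


def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in db.values()) / len(db)


def apriori(db, min_sup):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    items = sorted(set().union(*db.values()))
    candidates = [frozenset([i]) for i in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Lk: candidates of size k that meet the minimum support.
        level = {c: support(c, db) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    yield lhs, itemset - lhs, conf


frequent_itemsets = apriori(transactions, MIN_SUP)
strong_rules = list(rules(frequent_itemsets, MIN_CONF))

for itemset, sup in frequent_itemsets.items():
    print(sorted(itemset), f"support={sup:.2f}")
for lhs, rhs, conf in strong_rules:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With four transactions, a 60% minimum support means an itemset must occur in at least three of them, so only A, B and {A, B} survive. Changing the transactions dictionary and the two thresholds lets the same sketch be tried on the six-transaction database of the earlier Apriori question.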
UNIT-V
CLUSTERING AND APPLICATIONS AND TRENDS IN DATA MINING
PART A
1. What are the requirements of clustering?
2. What are the applications of spatial databases?
3. What is text mining?
4. Distinguish between classification and clustering.
5. Define a spatial database.
6. List out any two commercial data mining tools.
7. What is the objective function of the K-means algorithm?
8. Mention the advantages of hierarchical clustering.
9. Distinguish between classification and clustering.
10. List the requirements of clustering in data mining.
11. What is web usage mining?
12. What are the requirements of clustering?
13. What are the applications of spatial databases?
14. What is text mining?
15. What is cluster analysis?
16. What are the two data structures in cluster analysis?
17. What is an outlier? Give an example.
18. What is audio data mining?
19. List two applications of data mining.
PART-B
1. BIRCH and CLARANS are two interesting clustering algorithms that perform effective clustering in large data sets.
(i) Outline how BIRCH performs clustering in large data sets. [10]
(ii) Compare and outline the major differences between the two scalable clustering algorithms BIRCH and CLARANS. [6]
2. Write a short note on web mining taxonomy. Explain the different activities of text mining.
3. Discuss and elaborate on the current trends in data mining. [6+5+5]
4. Discuss spatial databases and text databases. [16]
5. What is a multimedia database? Explain the methods of mining multimedia databases. [16]
6. Explain the following clustering methods in detail: (a) BIRCH (b) CURE [16]
7. Discuss any four data mining applications in detail. [16]
8. Write short notes on (i) Partitioning methods [8] (ii) Outlier analysis [8]
9. Describe K-means clustering with an example. [16]
10. Describe hierarchical methods in detail.
11. With a relevant example, discuss constraint-based cluster analysis. [16]