Data Analysis

The process of inspecting, cleansing, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

procedure

qualitative research

methods

interpretation

statistical treatment

probability

analyst

mathematics

correlation

demographic

tracking earned value

business analyst

Which function provides the value of a function at any particular value of x but does NOT directly give the probability of the random variable?

  • a. Probability density
  • b. continuous probability
  • c. discrete probability
  • d. probability mass

Which is NOT a correct correlation coefficient?

  • a. 0.56
  • b. 1.2
  • c. -0.43
  • d. 0.9

What is the value of the mean and standard deviation in a normal probability density function?

  • a. Mean =10 s=1
  • b. Mean =50 s=5
  • c. Mean =50 s=10
  • d. Mean =10 s=5

_____________ is rated as the number one business analytics software.

  • a. WEKA
  • b. Orange
  • c. Rapid miner
  • d. Knime

Which is NOT a characteristic feature of data structure?

  • a. It defines as to how components relate to each other.
  • b. Set of operations is on one or more component items.
  • c. It contains component data
  • d. It contains a fixed structure.

What is the value of the standard deviation in a standard normal distribution?

  • 1
  • 0
  • 2
  • 5

Which of the following is TRUE when a distribution is normal?

  • Mean=Median=Mode
  • Mean >Mode >Median
  • Mean < Median <Mode
  • Mean > Median >Mode

Which is NOT a component of KR?

  • fundamental conception
  • set of inferences that the representation sanctions
  • set of inferences that it recommends
  • it adheres to the function

It assigns a cost to every machine operation.

  • a. uniform cost model
  • b. exponential cost model
  • c. variable cost model
  • d. logarithmic cost model

Displays the performance of a model and enables a comparison to be made with other models.

  • a. DAC
  • b. GLM
  • c. SBC
  • d. ROC curve

The term "analysis of algorithms" was coined by

  • a. Ronald Kunt
  • b. Donald Knot
  • c. Ronald Knut
  • d. Donald Knuth

The major outcome of correlation.

  • a. prediction
  • b. interpretation
  • c. analysis
  • d. critical thinking

Which is primarily written in C and in Fortran?

  • a. WEKA
  • b. Rapid miner
  • c. IBM Cognos
  • d. R-programming

It is a theoretical classification that estimates and anticipates the increase in running time (or run-

  • a. Run-time behaviour
  • b. Run-time cost
  • c. Run-time average
  • d. Run-time analysis

The classification table that XLSTAT can display

  • a. poison matrix
  • b. hypergeometric matrix
  • c. confusion matrix
  • d. logistic matrix

The standard deviation for the data in 2,4,4,4,5,5,6,8,9

  • a. 2.15
  • b. 2.16
  • c. 2.18
  • d. 2.17
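As a quick check, the standard deviation of this data set can be computed with Python's standard `statistics` module. This is a minimal sketch; it assumes the question intends the sample formula (dividing by n − 1), which is the one that lands on a listed option.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 6, 8, 9]

# Sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(scores)

# Population standard deviation (divides by n), shown for comparison.
population_sd = statistics.pstdev(scores)

print(round(sample_sd, 2))      # 2.17
print(round(population_sd, 2))  # 2.04
```

The two formulas give different results on small data sets, so it is worth checking which one a question expects.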

What technique can be used to measure an algorithm's running time?

  • a. software profiling
  • b. software operation
  • c. software analysis
  • d. software behavior

There are how many data mining techniques?

  • a. 6
  • b. 7
  • c. 5
  • d. 8

The method that does NOT require the assumption that the parameters are normally distributed.

  • a. profile
  • b. profile link
  • c. profile likelihood
  • d. identity likelihood

It is used to enable an entity to determine consequences by thinking rather than acting.

  • Artificial Intelligence
  • Intelligent reasoning
  • Knowledge Channel
  • Knowledge Representation

Which of the following is TRUE?

  • a. AB=BA
  • b. BC=CB
  • c. AB is not possible
  • d. A + B = B+ A

Which of the following is a predictive data mining technique?

  • a. Prediction
  • b. tracking pattern
  • c. association
  • d. regression

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. What percent of the tomatoes weigh less than 0.71 lb?

  • a. 85
  • b. 95
  • c. 97
  • d. 84

The proportion of a well-defined classified positive events.

  • a. clarity
  • b. sensitivity
  • c. volume
  • d. velocity

If the standard deviation of a distribution is 3, the variance is

  • a. 6
  • b. 1.5
  • c. 9
  • d. 1.41

The following are elements in an analytic plan EXCEPT

  • a. graphs
  • b. analytic models
  • c. decision support tools
  • d. interlinked data output

As of 2014, there are _______ million tweets a day.

  • a. 400
  • b. 200
  • c. 500
  • d. 300

A network purporting to describe family memberships.

  • a. network area
  • b. network facility
  • c. network access
  • d. network topology

How many bytes of data are generated every two days in today's world?

  • a. 5 terabytes
  • b. 5 exabytes
  • c. 5 gigabytes
  • d. 5 megabytes

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many of them lie between 27 and 43?

  • 90
  • 95
  • 92
  • 88
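The range 27 to 43 is two standard deviations either side of the mean, so the empirical rule applies. The sketch below checks this with `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Price per kilo of rice: mean 35, standard deviation 4.
prices = NormalDist(mu=35, sigma=4)

# Share of the distribution between 27 and 43 (mean ± 2 SD).
share = prices.cdf(43) - prices.cdf(27)

# Expected number of the 100 surveyed consumers in that range.
expected = 100 * share

print(round(expected))  # 95
```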

It is often used as a model of the number of arrivals at a facility in a given period of time.

  • a. multinomial probability distribution
  • b. Poisson probability distribution
  • c. binomial probability distribution
  • d. logic probability distribution

By the empirical rule, ________% of the data in a normal distribution lie within 2 standard deviations above and below the mean.

  • 80
  • 95
  • 90
  • 85

It is a method for discovering patterns in large data sets.

  • a. Business Intelligence
  • b. Text Analytics
  • c. Statistics Analytics
  • d. Data Mining

An array is a good example of _________data structure.

  • a. linear
  • b. static
  • c. dynamic
  • d. nonlinear

What range of values lie between 3 standard deviations above and below the mean if the mean is 80 and the standard deviation is 3?

  • 70-89
  • 71-88
  • 71-89
  • 72-89
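The bounds are simply the mean minus and plus k standard deviations. A minimal sketch, with `sd_range` as a hypothetical helper name:

```python
def sd_range(mean, sd, k):
    """Return the (low, high) values lying k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)


print(sd_range(80, 3, 3))  # (71, 89)
print(sd_range(10, 2, 3))  # (4, 16)
```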

It views the world in terms of attribute-object-value triples.

  • a. semantic nets
  • b. frame
  • c. rule-based
  • d. logic based

A distribution with 4 modes is said to be a _________distribution.

  • a. bimodal
  • b. multimodal
  • c. unimodal
  • d. trimodal

Which of the following is a continuous distribution?

  • a. negative binomial
  • b. hypergeometric
  • c. geometric
  • d. Chi-square

Which of the following is NOT a data mining tool?

  • a. Python
  • b. Orange
  • c. WEKA
  • d. Knime

It is a variety of formal calculation, typically deduction.

  • Artificial Intelligence
  • GLM
  • Intelligent Reasoning
  • KR

It is a free software programming language.

  • a. Orange
  • b. WEKA
  • c. R-programming
  • d. Knime

Which is the concatenation αβ of α = babaa and β = a^6b^8a?

  • a. babab^8a
  • b. babaaaa^6b^8a
  • c. a^6b^8ababba
  • d. babaa b^8a

KR means __________________________.

  • Knowledge Representation
  • Knowledge Replenished
  • Knowledge Requisition
  • Knowledge Request

It includes identifying groups of data records

  • a. cluster analysis
  • b. data analysis
  • c. database
  • d. data mining

A bell-shaped distribution that is symmetric about a vertical line?

  • skewed
  • symmetric
  • standard
  • normal

Which is NOT a measure of variability?

  • a. standard deviation
  • b. variance
  • c. quartile
  • d. range

The developer of FarmVille, a famous game on the internet.

  • a. Supercell
  • b. Moontoon
  • c. Electronic Arts
  • d. Zynga Incorporated

What does GLM mean?

  • a. Generalized Linear model
  • b. Generalized Linear Mode
  • c. General Line Mode
  • d. General Linear Model

What is the size of the product of a 5x6 and a 6x8 matrix?

  • a. 5x 8
  • b. 8x8
  • c. 5x5
  • d. 8x5
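The product of an m×n matrix and an n×p matrix is m×p, and it is only defined when the inner dimensions agree. A small sketch of that rule, with `product_shape` as a hypothetical helper:

```python
def product_shape(a_shape, b_shape):
    """Return the shape of A @ B given the (rows, cols) shapes of A and B."""
    rows_a, cols_a = a_shape
    rows_b, cols_b = b_shape
    if cols_a != rows_b:
        # Inner dimensions must match for the product to exist.
        raise ValueError("columns of A must equal rows of B")
    return (rows_a, cols_b)


print(product_shape((5, 6), (6, 8)))  # (5, 8)
print(product_shape((2, 5), (5, 3)))  # (2, 3)
```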

It is a variety of formal calculation, typically deduction.

  • a. Intelligent Ratio
  • b. Intelligence Rationing
  • c. Intelligent Reasoning
  • d. Intelligence Ratio

He is someone who asks interesting questions on formal and informal theory.

  • a. data analyst
  • b. data scientist
  • c. data expert
  • d. data drive

If there are 101 scores, the median is equal to the _____ ranked score.

  • a. 55th
  • b. 54th
  • c. 52nd
  • d. 51st
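For an odd number of ranked scores n, the median sits at rank (n + 1) / 2. A minimal sketch, with `median_rank` as a hypothetical helper:

```python
def median_rank(n):
    """Return the 1-based rank of the median for an odd number of scores."""
    if n % 2 == 0:
        raise ValueError("with an even count the median falls between two ranks")
    return (n + 1) // 2


print(median_rank(101))  # 51
print(median_rank(103))  # 52
```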

It views the world in terms of attribute-object-value triples.

  • rule based
  • logic
  • semantic net
  • frame

Data involving two variables are called _________data.

  • a. bivariate
  • b. dichotomy
  • c. dichotomal
  • d. multivariate

If there are 103 scores, the median is equal to the _____ ranked score.

  • a. 51st
  • b. 55th
  • c. 54th
  • d. 52nd

It includes identifying groups of data records.

  • a. database
  • b. data mining
  • c. data analysis
  • d. cluster analysis

What percent of data will lie within 2 standard deviations of the mean?

  • 68
  • 90
  • 95
  • 99

The area of the standard normal curve to the right of z=0.82 is _______.

  • 0.294
  • 0.209
  • 0.295
  • 0.206
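The area to the right of a z-value is one minus the cumulative probability up to that value. A short check using `statistics.NormalDist` from the standard library, instead of a printed z-table:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area under the curve to the right of z = 0.82.
right_tail = 1 - standard_normal.cdf(0.82)

print(round(right_tail, 3))  # 0.206
```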

Earlier name for data science.

  • a. dataloogy
  • b. datology
  • c. datalogy
  • d. dataology

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?

  • a. 65th
  • b. 48th
  • c. 60th
  • d. 50th

What conditions must be satisfied in the development of a probability function for a discrete random variable? a. must be nonnegative b. sum of the probabilities for each value must be equal to 1 c. may assume any value d. assumes specific values

  • a. a and b
  • b. b and c
  • c. a and c
  • d. a and d

The process of inspecting, cleansing, transforming and modelling data with the goal of discovering useful information.

  • a. data statistics
  • b. data analysis
  • c. Data mining
  • d. data retriever

Data involving two variables.

  • a. bichotomous
  • b. divariate
  • c. bivariate
  • d. binary

Which of the following statements is TRUE?

  • a. Q2=Range
  • b. Q2=Mode
  • c. Q2=Mean
  • d. Q2=median

The most common function used to link probability to explanatory variables.

  • a. Probit model
  • b. Proba model
  • c. logit model
  • d. legit model

It displays the performance of a model and enables a comparison to be made with other models.

  • a. ROC
  • b. GML
  • c. LR
  • d. IOT

What is the value of the mean in a normal probability density function?

  • a. 1
  • b. 10
  • c. 50
  • d. 5

What range of values lies within 3 SD below and above the mean in a normal distribution if the mean is 10 and the standard deviation is 2?

  • 5-15
  • 4-16
  • 8-14
  • 10-14

Which is NOT a basic representation technology?

  • a. rules
  • b. graph
  • c. frame
  • d. logic

What programming language is used in Rapid miner?

  • a. Python
  • b. Cobol
  • c. Java
  • d. Pascal

ROC comes from ______theory.

  • a. signal attraction
  • b. signal execution
  • c. signal detraction
  • d. signal detection

In the equation of the regression line represented by Y= 1.24 X + 6.9 if X=2 then Y =?

  • a. 9.21
  • b. 9.38
  • c. 9.30
  • d. 9.02
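Evaluating a regression line is a direct substitution, and solving for X just inverts it. The sketch below uses the slope and intercept from the question; the function names are illustrative.

```python
SLOPE = 1.24
INTERCEPT = 6.9


def predict_y(x):
    """Predict Y from X using the regression line Y = 1.24 X + 6.9."""
    return SLOPE * x + INTERCEPT


def solve_x(y):
    """Invert the regression line to recover X from a given Y."""
    return (y - INTERCEPT) / SLOPE


print(round(predict_y(2), 2))   # 9.38
print(round(solve_x(13.1), 2))  # 5.0
```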

GLM means_____________.

  • a. Generalized Line Mode
  • b. General Line Model
  • c. Generalized Linear Mode
  • d. Generalized Linear Model

What is an organized collection of information and set of information used to manage that operation?

  • a. data science
  • b. ML
  • c. data structure
  • d. ADT

What is the earlier name for data science?

  • a. datology
  • b. datatology
  • c. dataology
  • d. datalogy

Which of the following algorithms is the fastest?

  • a. platform search
  • b. binary search
  • c. linear search
  • d. software search

What is the value of the mean if a score of 110 is 3 standard deviations above the mean?

  • 90
  • 85
  • 91
  • 95

The number that occurs most frequently is called________.

  • a. range
  • b. Mode
  • c. median
  • d. mean

It provides the height or the value of the function at any particular value of x

  • a. probability density function
  • b. probability dense function
  • c. probability mass function
  • d. probability massive function

Which is usually denoted as n in algorithms?

  • a. sample size
  • b. population size
  • c. Input size
  • d. output size

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh between 0.31 lb and 0.91 lb in a shipment of 4500 tomatoes?

  • a. 4000
  • b. 4275
  • c. 4215
  • d. 4100
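The limits 0.31 lb and 0.91 lb are two standard deviations either side of the mean. The sketch below compares the empirical-rule estimate (95%) with the value from the normal curve itself; it assumes the question expects the rounded 95% rule.

```python
from statistics import NormalDist

weights = NormalDist(mu=0.61, sigma=0.15)
shipment = 4500

# Empirical rule: about 95% of values fall within 2 SD of the mean.
empirical_count = round(0.95 * shipment)

# Share taken from the normal curve itself, for comparison.
exact_share = weights.cdf(0.91) - weights.cdf(0.31)
exact_count = round(exact_share * shipment)

print(empirical_count)  # 4275
print(exact_count)      # 4295
```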

Which of the following is NOT a module in rapid Miner?

  • a. radoop
  • b. studio
  • c. loop
  • d. server

It involves a commitment in viewing the world in terms of individual entities and relations.

  • rules
  • logic
  • frame
  • semantic nets

He said that “In mathematics the art of proposing a question must be held of higher value than solving it”.

  • a. Francis Galton
  • b. Eric Schmidt
  • c. William Gibson
  • d. Georg Cantor

Another term for variability.

  • a. dispersion
  • b. mean
  • c. frequent
  • d. center

Which of the following does not use a discrete distribution?

  • a. geometric
  • b. hypergeometric
  • c. chi-square
  • d. negative binomial

Lists the percent of data in each distribution.

  • relative frequency distribution
  • ogive
  • histogram
  • grouped frequency distribution

It partitions a ranked data into four equal groups.

  • a. median
  • b. mean
  • c. quartile
  • d. percentile

Which of the matrices is singular?

  • a. none
  • b. A
  • c. B
  • d. C

These are the data skills that a good data scientist needs to cultivate EXCEPT

  • a. Math and Stats
  • b. speaking
  • c. Communication
  • d. coding

What is the mean for a standard normal distribution?

  • 5
  • 1
  • 2
  • 0

A vegetable distributor knows that during the month of August, the weights of tomatoes are normally distributed with a mean of 0.61 lb and a standard deviation of 0.15 lb. How many can be expected to weigh less than 0.31 lb in a shipment of 6000 tomatoes?

  • a. 150
  • b. 200
  • c. 100
  • d. 120

It expands available data enormously since there is so much more text being generated than numbers.

  • a. Text mining
  • b. text analysis
  • c. data ranking
  • d. data mining

LR means ________________________.

  • a. Logistic Reinforcement
  • b. Linear Regression
  • c. Logistic Regression
  • d. Linear Relativity

“All models are wrong but some are useful”

  • a. Georg cantor
  • b. William Gibson
  • c. George E. P. Box
  • d. DJ Patil

The integral of all the values of a random variable in a probability density function is equal to______.

  • a. zero
  • b. One
  • c. negative integer
  • d. positive integer

It extracts meaningful numerical indices from information and makes them available to statistical and machine learning.

  • a. data visualization
  • b. Text analytics
  • c. data mining
  • d. business intelligence

The following are software tools used in data mining EXCEPT

  • a. Weka
  • b. Rapid miner
  • c. SPSS
  • d. Orange

The explosion of _______data is the main reason why every 2 days 5 exabytes of data are generated.

  • a. gargantuan
  • b. reaction
  • c. transaction
  • d. interaction

It is a process of finding the computational complexity of algorithms.

  • a. history of algorithms
  • b. length of algorithms
  • c. analysis of algorithms
  • d. analogy of algorithms

It offers a way to examine trends from collected data and derive insights from it.

  • a. Text analytics
  • b. Business Intelligence
  • c. data visualization
  • d. Data mining

He coined the term “analysis of algorithms”.

  • a. Donald Knuth
  • b. David Knut
  • c. David Knuth
  • d. Donald Knoth

The goal is to transform raw data into understandable business information.

  • a. business intelligence
  • b. data visualization
  • c. text analytics
  • d. Data mining

The intersection of the two sets A = {2,3} and B = {4,5} is a

  • a. null set
  • b. singleton
  • c. singular
  • d. nonsingular
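Python's built-in `set` type makes the intersection easy to check. A minimal sketch using the two sets from the question:

```python
a = {2, 3}
b = {4, 5}

# Elements that appear in both sets.
common = a & b

# An intersection with no elements is the null (empty) set.
is_null_set = len(common) == 0

print(common, is_null_set)  # set() True
```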

PAW means____________.

  • a. Preliminary Assumption Web
  • b. Predictive Analytics World
  • c. Predictive Analytics web
  • d. Predicting Analytics Web

What is the shape of a normal probability distribution?

  • a. bell-shaped
  • b. asymmetrical
  • c. leptokurtic
  • d. skewed

It is a numerical description of the outcome of a statistical experiment.

  • a. continuous variable
  • b. random variable
  • c. numerical variable
  • d. discrete variable

The quantification of data into information.

  • a. mining
  • b. dataology
  • c. datafication
  • d. analytics

IOT means

  • a. Internet of things
  • b. Internet of time
  • c. Interconnection of things
  • d. Interaction of time

What is a data structure that has a fixed size?

  • a. static
  • b. linear
  • c. nonlinear
  • d. dynamic

The following are large inputs EXCEPT

  • a. Big beta notation
  • b. big theta notation
  • c. Big O notation
  • d. Big omega notation

The creation of a data product contains 3 components EXCEPT

  • a. process
  • b. technical expertise
  • c. data
  • d. time

It is a powerful tool that shows the network of data.

  • a. WEKA
  • b. Knime
  • c. Orange
  • d. Rapid Miner

The following provided inspirations of what constitute intelligent reasoning EXCEPT

  • a. economics
  • b. philosophy
  • c. psychology
  • d. mathematical logic

By the empirical rule for a normal distribution, ______% of data lie within 1 standard deviation below and above the mean.

  • 64
  • 75
  • 79
  • 68

It is a numerical function of the outcome of a statistical experiment.

  • a. variable
  • b. constant
  • c. dense variable
  • d. random variable

If A = {2,3} and B = {4,5}, which of the following is the Cartesian product of the two sets?

  • a. { (3,4), (3,5), (2,4), (2,2) }
  • b. { (3,4), (3,3), (2,4), (2,5) }
  • c. { (3,4), (3,5), (2,4), (2,5) }
  • d. { (3,3), (3,5), (2,4), (2,5) }
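The Cartesian product A × B pairs every element of A with every element of B. A short sketch using `itertools.product` from the standard library:

```python
from itertools import product

a = {2, 3}
b = {4, 5}

# Every ordered pair (x, y) with x from A and y from B.
cartesian = set(product(a, b))

print(sorted(cartesian))  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```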

Which is not a measure of central tendency?

  • a. mode
  • b. median
  • c. standard deviation
  • d. mean

Who said that "The future is not google-able"?

  • a. Wiliam Harvey
  • b. Roland Patil
  • c. William Gillason
  • d. Dennis Grant

It refers to well based theories and sound business judgement.

  • a. Data visualization
  • b. Data Mining
  • c. Data Analytics
  • d. Data Science

It is used for prototyping in Rapid miner.

  • a. radoop
  • b. studio
  • c. loop
  • d. server

The normal distribution with a mean of 0 and standard deviation of 1.

  • Skewed
  • kurtic
  • skewed to the right
  • Standard

Another term for an empty set.

  • a. singleton
  • b. null
  • c. zero
  • d. cipher

A negative correlation exists when___________.

  • a. x increases y increases
  • b. x increases y decreases
  • c. x decreases y decreases
  • d. x and y remains constant

What increases data volume?

  • a. velocity
  • b. viscosity
  • c. vastness
  • d. variety

Exabyte means ________bytes

  • million million
  • trillion trillion
  • thousand thousand
  • billion billion

ROC means

  • a. Receiving Operator Character
  • b. Receiver Operator Characteristics
  • c. Receiving Operating Character
  • d. Receiver Operating Characteristics

It views the world in thinking of prototypical objects.

  • frame
  • rule
  • logic
  • semantic net

The proportion of well defined negative events is called ________________.

  • a. regression
  • b. sensitivity
  • c. specificity
  • d. probability
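Sensitivity and specificity both come from the cells of a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; none of the numbers come from the quiz.

```python
# Hypothetical confusion-matrix counts (illustrative only).
true_positives = 40
false_negatives = 10
true_negatives = 45
false_positives = 5

# Sensitivity: share of actual positives that were classified as positive.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: share of actual negatives that were classified as negative.
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)  # 0.8 0.9
```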

KR is a set of __________commitments.

  • ontological
  • anthropological
  • social
  • psychological

Which of the following is NOT a method used in data analysis?

  • a. Statistics Analytics
  • b. Business Intelligence
  • c. Data Mining
  • d. Text Analytics

A graph used to indicate intervals in a frequency distribution is referred to as a______________.

  • pie graph
  • histogram
  • ogive
  • bar graph

The most frequent score.

  • a. mean
  • b. standard deviation
  • c. mode
  • d. median

A bell-shaped distribution that is symmetric about a vertical line.

  • a. normal
  • b. standard
  • c. kurtic
  • d. skewed

It relates the length of an algorithm to the number of storage location it uses.

  • a. space analysis
  • b. space covered
  • c. space flexibility
  • d. space complexity

Algorithm analysis is an important part of a broader_____________.

  • a. computed complex theory
  • b. computerized complexity theory
  • c. computational complex theorem
  • d. computational complexity theory

If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a binary relation, the domain is

  • a. {3,5,6,10,12}
  • b. {5,10,12}
  • c. {3,5,6}
  • d. {3,6}
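The domain of a binary relation collects the first components of its pairs, and the range collects the second components. A minimal sketch, reading the last pair of the relation as (6, 12):

```python
relation = {(3, 3), (3, 6), (5, 5), (5, 10), (6, 12)}

# Domain: all first components of the ordered pairs.
domain = {x for x, _ in relation}

# Range: all second components of the ordered pairs.
value_range = {y for _, y in relation}

print(sorted(domain))       # [3, 5, 6]
print(sorted(value_range))  # [3, 5, 6, 10, 12]
```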

What is the correct meaning of ADT?

  • a. Abstract Data Type
  • b. Abstract Data Topography
  • c. Adequate Data Type
  • d. Adequate Data Tautology

Which of the following is a set equal to the distinct letters of the word "MISSISSIPPI"?

  • a. {I,L,M,P}
  • b. {I,M,S}
  • c. {I,M,S,P}
  • d. {I,M,P}
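Building a set from a string keeps one copy of each letter, which is a quick way to list the distinct letters of a word:

```python
# A set discards repeated letters automatically.
letters = set("MISSISSIPPI")

print(sorted(letters))  # ['I', 'M', 'P', 'S']
```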

SBC means_________

  • a. Schwarz’s Bayesian Criterion
  • b. Schmidt’s Bayesian Criterion

An algorithm is said to be efficient when its function values are

  • a. sufficient
  • b. small
  • c. big
  • d. enough

A frequently used method, as it enables binary or polytomous variables to be modelled.

  • a. logistic regression
  • b. binomial regression
  • c. linear regression
  • d. exponential regression

It is a module in rapid miner that considers the workflow.

  • a. radoop
  • b. studio
  • c. server
  • d. loop

The following are abstract notions EXCEPT

  • casualty
  • beliefs
  • processes
  • actions

Which pair belongs to the same family of models called GLM? i) logistic ii) linear regression iii) multinomial regression iv) probability

  • a. iii and iv
  • b. I and iv
  • c. ii and iii
  • d. I and ii

A score of 3 in 2,4,4,4,5,5,6,8,9 is

  • a. 1.02 below the mean
  • b. 1.2 above the mean
  • c. 1.18 below the mean
  • d. 1.92 above the mean
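A score's distance from the mean in standard deviations is its z-score, (score − mean) / SD. The sketch below assumes the sample standard deviation, as in the related standard deviation question; the result is roughly 1.02 to 1.03 standard deviations below the mean depending on rounding.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 6, 8, 9]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1)

# Negative z means the score lies below the mean.
z = (3 - mean) / sd

print(z)
```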

Any way to get new expressions from old ones.

  • surrogate
  • semantic
  • inference
  • reasoning

Matrix B is

  • a. invertible
  • b. singular
  • c. transpose
  • d. inverse

The distribution 2,4,4,4,5,5,6,8,9 is said to be

  • a. multimodal
  • b. bimodal
  • c. trimodal
  • d. unimodal

What type of text are processed in Text analytics?

  • a. raw
  • b. structured
  • c. altered
  • d. unstructured

The following are data mining techniques EXCEPT:

  • a. Clustering
  • b. Classification
  • c. Collection
  • d. Regression

What is the focus of data science?

  • a. collection of data
  • b. statistical computation
  • c. manipulate data efficiently and effectively
  • d. organization of data

It involves a commitment in viewing the world in terms of individual entities and relations between them.

  • a. rule-based
  • b. semantic nets
  • c. frame
  • d. logic

Two of the most widely used discrete probability distribution.

  • a. logic and exponential
  • b. binomial and logic
  • c. poisson and logic
  • d. poisson and binomial

It shows a high correlation between the incidence of flu and searches about flu on google.

  • a. Google Flu Reactions
  • b. Google Flu trends
  • c. Google Flu Viral
  • d. Google Flu Searches

Data is NOT information unless we add_________.

  • a. volume
  • b. depth
  • c. analytics
  • d. velocity

A survey of 100 consumers said that the price charged for a kilo of rice could be approximated by a normal distribution with a mean of 35 and a standard deviation of 4. How many are less than 39?

  • 80
  • 78
  • 82
  • 84

Null strings are indicated by

  • a. λ
  • b. α
  • c. γ
  • d. β

It does NOT require the assumption that the parameters are normally distributed.

  • a. mass likelihood
  • b. density likelihood
  • c. profile likelihood
  • d. definite likelihood

It refers to a data structure that grows and shrinks at execution time.

  • a. dynamic
  • b. linear
  • c. static
  • d. nonlinear

The expected value or mean of a random variable in discrete case.

  • a. probability logit function
  • b. probability density distribution
  • c. probability dense function
  • d. probability mass distribution

Which of the following type of text is processed in text analytics?

  • a. unorganized
  • b. structured
  • c. unstructured
  • d. raw

It is used in organization’s strategic and tactical business decision making.

  • a. data mining
  • b. data visualization
  • c. text analytics
  • d. business intelligence

It relates the length of an algorithm’s input to the number of steps it takes.

  • a. time flexibility
  • b. time elapsed
  • c. time series
  • d. time complexity

Positive correlation means that_______________.

  • a. as x increases y remains constant
  • b. as x increases y also increases and vice versa
  • c. as x increases y decreases
  • d. as x decreases y increases

If R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is a Cartesian product of sets X and Y and X = {3,5,6}, then Y = ?

  • a. {3,5,6}
  • b. {3,6}
  • c. {5,10,12}
  • d. {3,5,6,10,12}

A bell shaped curve that is symmetric about a vertical line.

  • skewed
  • kurtic
  • normal distribution
  • standard distribution

He coined the term "data scientist"

  • a. G.Cantor
  • b. J Pastor
  • c. N.R. Drops
  • d. DJ Patil

Which of the following belong to the GLM?

  • a. multivariate
  • b. exponential
  • c. logistic
  • d. quadratic

On an examination given to 1000 students, Jef’s score of 80 was higher than the score of 480 students who took the exam. What is the percentile for Jef’s score?

  • a. 50th
  • b. 65th
  • c. 48th
  • d. 60th

The following processes are used in data analysis EXCEPT:

  • a. transforming
  • b. collecting
  • c. inspecting
  • d. cleansing

The proportion of a well-classified negative event.

  • a. clarity
  • b. identity
  • c. specificity
  • d. sensitivity

The range of the binary relation R = {(3,3), (3,6), (5,5), (5,10), (6,12)} is

  • a. {3,5,6}
  • b. {3,6}
  • c. {5,10,12}
  • d. {3,5,6,10,12}

The equation of the _______line predicts the value of Y given X.

  • a. Regression
  • b. prediction
  • c. analytic
  • d. correlation

A positive z-score means that the score is

  • a. Higher than the mean
  • b. Equal to the mean
  • c. Lower than the mean
  • d. One standard deviation higher than the mean

All representations are ________.

  • stable
  • perfect
  • unstable
  • imperfect

It is often used as a model of the number of arrivals at a facility in a given period of time.

  • a. binomial probability distribution
  • b. normal probability distribution
  • c. logistic probability distribution
  • d. Poisson probability distribution

The symbol used to indicate strings with no elements.

  • a. β
  • b. δ
  • c. λ
  • d. α

_____________ includes identifying groups of data record.

  • a. Text Analytics
  • b. Cluster analysis
  • c. Statistics Analytics
  • d. Business Intelligence

Algorithms are independent of its

  • a. operation
  • b. platform
  • c. time
  • d. behaviour

The most common functions used to link probability to the explanatory variables are the LOGIT model and ________model.

  • a. PROBIT
  • b. PROBET
  • c. LEGIT
  • d. LOGET

The constant multiplicative factor in which algorithms are related are_______ constants.

  • a. alternative
  • b. rational
  • c. hidden
  • d. additive

What is the process of deriving useful information from text?

  • a. Statistics Analytics
  • b. Business Intelligence
  • c. Text Analytics
  • d. Data Mining

Which is NOT a basic representation technology?

  • semantic net
  • graph
  • frame
  • logic

A matrix that has the same number of rows and columns is called

  • a. identity
  • b. invertible
  • c. square
  • d. transpose

Primarily used for data pre-processing.

  • a. Knime
  • b. WEKA
  • c. Orange
  • d. Rapid miner

It corresponds to the case where the dependent variable has more than 2 categories.

  • a. trinomial logit model
  • b. binomial logit model
  • c. multinomial logit model
  • d. polynomial logit model

Which is an example of a discrete random variable?

  • a. number of books
  • b. weight
  • c. height
  • d. time

It is a perfect software which is written in Python computing language.

  • a. Rapid miner
  • b. WEKA
  • c. Orange
  • d. Knime

A distribution where large distribution are displayed.

  • histogram
  • Relative frequency distribution
  • ogive
  • Grouped frequency distribution

Which is NOT a KR technology?

  • semantic nets
  • logic
  • frames
  • roles

Which of the following data mining techniques is predictive?

  • a. clustering
  • b. outlier detection
  • c. classification
  • d. tracking pattern

The score NOT easily affected by extreme values.

  • a. mode
  • b. Median
  • c. mean
  • d. range

The following are discrete distributions EXCEPT

  • a. negative binomial
  • b. geometric
  • c. chi-square
  • d. hypergeometric

Example of a data product.

  • a. google search
  • b. google drive
  • c. google games
  • d. google map

ML means:

  • a. Mobile Learning
  • b. Math Learning
  • c. Machine Learning
  • d. Machine Landscaping

A score of 50 lies 2 standard deviations above a mean of 30. What is the value of the standard deviation?

  • 15
  • 10
  • 20
  • 25

A special type of function where the domain is a set of consecutive integers.

  • a. series
  • b. string
  • c. regression
  • d. sequence

Running time for algorithms is usually measured in

  • a. microseconds
  • b. picoseconds
  • c. nanoseconds
  • d. milliseconds

Which of the following is a discrete distribution?

  • a. gamma
  • b. Hypergeometric
  • c. exponential
  • d. chi-square

_______________ is a data structure in which every component has a unique predecessor and successor.

  • a. dynamic
  • b. linear
  • c. static
  • d. nonlinear

A positive z-score means that the score is

  • a. One standard deviation higher than the mean
  • b. Higher than the mean
  • c. Equal to the mean
  • d. Lower than the mean

3A + B =

  • d
  • a
  • c
  • b

It is a theoretical classification that estimates and anticipates the increase in running time for algorithms.

  • a. run time analysis
  • b. run turn analysis
  • c. running turn analysis
  • d. running time analysis

The following are continuous distributions EXCEPT

  • a. geometric
  • b. F-test
  • c. chi-square
  • d. exponential

The most widely used continuous probability distribution.

  • a. Normal
  • b. discrete
  • c. uniform
  • d. standard

Which is NOT interaction data?

  • a. geo-location
  • b. RFID data
  • c. data base
  • d. browser action

The method of correlation used for ranked score is ________.

  • a. Spearman rho
  • b. Pearson r
  • c. Chi-square
  • d. Kendal tau

A data set in which every score occurs the same number of times is said to be

  • a. trimodal
  • b. no mode
  • c. unimodal
  • d. bimodal

It makes complex data more understandable and usable.

  • a. data visualization
  • b. data mining
  • c. business intelligence
  • d. text analytics

The middle-most value in a ranked list of numbers.

  • a. mean
  • b. median
  • c. mode
  • d. percentile

The most commonly used continuous probability distribution.

  • a. standard
  • b. normal
  • c. multinomial
  • d. linear

The value of X in the regression equation Y= 1.24 X + 6.9 if Y=13.1 is

  • a. 3
  • b. 6
  • c. 5
  • d. 4

If A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"}, then their intersection is

  • a. {C,I,S}
  • b. {A,C,I,S,T}
  • c. {A,C,I,S}
  • d. {A,C,S}
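The intersection can be checked by turning each word into a set of letters and keeping the letters common to both:

```python
a = set("MATHEMATICS")
b = set("STATISTICS")

# Letters that appear in both words.
common_letters = a & b

print(sorted(common_letters))  # ['A', 'C', 'I', 'S', 'T']
```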

The function describing the performance of an algorithm is usually an upper bound determined from ______inputs.

  • a. best case
  • b. better case
  • c. worst case
  • d. average case

If the standard deviation of a distribution is 3.5, the variance is

  • a. 12.50
  • b. 14.50
  • c. 12.25
  • d. 15.25
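
Variance is the square of the standard deviation, so the check is a single line:

```python
std_dev = 3.5
variance = std_dev ** 2  # variance = (standard deviation) squared
print(variance)          # → 12.25
```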

What is a great example of a data product?

  • a. google maps
  • b. google navigation
  • c. google drive
  • d. google navigation

It refers to a frequently used method as it enables binary or polytomous variables to be modelled.

  • a. exponential regression
  • b. multinomial regression
  • c. logistic regression
  • d. linear regression

Which pair belongs to the same family of models called GLM? i) logistic ii) linear regression iii) multinomial regression iv) probability

  • a. iii and iv
  • b. ii and iii
  • c. i and iv
  • d. i and ii

KR as a _________is a substitute for the thing itself.

  • semantic
  • pragmatic
  • surrogate
  • ontological

The score easily affected by extreme values is the _________.

  • a. Mean
  • b. median
  • c. range
  • d. mode
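
A small illustration of how one extreme value moves the mean but not the median. The data set is borrowed from the quartile question on this page, and the outlier (90) is a made-up value for the demonstration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 6, 8, 9]
with_outlier = [2, 4, 4, 4, 5, 5, 6, 8, 90]  # hypothetical: 9 replaced by 90

# The mean is pulled toward the extreme value.
print(statistics.mean(data), statistics.mean(with_outlier))

# The median depends only on the middle position, so it stays the same.
print(statistics.median(data), statistics.median(with_outlier))
```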

The classification table that XLSTAT can display.

  • a. square matrix
  • b. inverse matrix
  • c. confusion matrix
  • d. identity matrix

The product of a 2x5 matrix and a 5x3 matrix is a ______ matrix

  • a. 3x5
  • b. 5x2
  • c. 2x3
  • d. 5x5
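
The rule is that an m x n matrix times an n x p matrix gives an m x p matrix. A rough sketch in plain Python; the matrices are filled with ones as placeholder values, since only the shapes matter here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Each entry is the dot product of a row of `a` and a column of `b`.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


left = [[1] * 5 for _ in range(2)]   # 2x5 matrix (placeholder values)
right = [[1] * 3 for _ in range(5)]  # 5x3 matrix (placeholder values)

product = matmul(left, right)
shape = (len(product), len(product[0]))
print(shape)  # → (2, 3)
```

The same rule applies to any pair of sizes whose inner dimensions agree.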

The person who said that “The future is not google-able”.

  • a. Eric Schmidt
  • b. William Gibson
  • c. D J Patil
  • d. Georg Cantor

It is a collection of machine learning algorithms for data mining tasks.

  • a. Knime
  • b. Rapid miner
  • c. Orange
  • d. WEKA

A graph that is used to indicate frequency distribution.

  • a. bar graph
  • b. pie graph
  • c. histogram
  • d. ogive

The method used to iteratively find a solution to a multinomial logit model.

  • a. Bayes’ algorithm
  • b. Kun’s algorithm
  • c. Newtonian algorithm
  • d. Newton-Raphson algorithm

What is the value of quartile 3 (Q3) in 2, 4, 4, 4, 5, 5, 6, 8, 9?

  • a. 6
  • b. 8
  • c. 5
  • d. 7
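
Quartiles depend on the convention used, and the question does not say which one applies. The sketch below uses Python's `statistics.quantiles` with both of its built-in methods to show that the two conventions give different values of Q3 for this data.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 6, 8, 9]

# "exclusive" treats the data as a sample from a larger population.
q_exclusive = statistics.quantiles(data, n=4, method="exclusive")

# "inclusive" treats the data as the whole population (min and max included).
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(q_exclusive[2])  # Q3 under the exclusive method → 7.0
print(q_inclusive[2])  # Q3 under the inclusive method → 6.0
```

The "median of the upper half" method taught in many courses (upper half 5, 6, 8, 9) agrees with the exclusive result here.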

It is used to discover patterns in large data sets

  • a. data analysis
  • b. Data mining
  • c. data retriever
  • d. data statistics

It sees a set of prototypes, in particular prototypical diseases, to be matched against the case at hand.

  • LOGIC
  • MYCIN
  • SEMANTIC NETS
  • INTERNIST

If A = {2, 3} and B = {4, 5}, the two sets are said to be

  • a. disjoint
  • b. equal
  • c. adjoint
  • d. joint
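
Two sets are disjoint when they share no elements. A quick check with the sets from the question:

```python
a = {2, 3}
b = {4, 5}

shared = a & b                  # elements common to both sets
no_overlap = a.isdisjoint(b)    # True when the intersection is empty
print(shared, no_overlap)
```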

Matrix B is

  • a. singular
  • b. transpose
  • c. invertible
  • d. inverse

Another term for text analytics.

  • a. text miner
  • b. text mining
  • c. text examiner
  • d. text analysis

The method that does not require the assumption that parameters are normally distributed.

  • a. profile likelihood
  • b. feedback
  • c. profile likeness
  • d. parameter range

What does ROC mean?

  • a. Receiver Operating Character
  • b. Receptor Operating Characteristics
  • c. Receiver Operating Characteristics
  • d. Receiver Operating Channel

The creation of data from varied sources and its quantification into information.

  • a. datacation
  • b. datafition
  • c. datafication
  • d. datafitration

It is a language in which we say things about the world.

  • Medium of human expression
  • Medium of ontological commitments
  • Medium of human experiences
  • Medium of pragmatic evidences

Refers to using tools of statistics to present data visually.

  • a. Data mining
  • b. data visualization
  • c. data analysis
  • d. data statistics

Which of the following is the transpose of B?

  • a
  • b
  • c
  • d

If in a distribution all scores are distinct then_____________.

  • a. there is no mode.
  • b. the mean is higher than the mode
  • c. it is skewed.
  • d. it is normal

He proposed the use of a penalized likelihood function.

  • a. Firth
  • b. Herth
  • c. Waiz
  • d. Gombartz

He pointed out that until 2003, all of mankind had generated just 5 exabytes of data.

  • a. Eric Smicht
  • b. Eric Schmidt
  • c. Eric Smith
  • d. Eric Smidth

It lists the percentage of data in a distribution.

  • a. relative distribution
  • b. frequency distribution
  • c. percent distribution
  • d. relative frequency distribution

The normal distribution with a mean of 0 and standard deviation of 1.

  • a. Standard
  • b. skewed to the right
  • c. kurtic
  • d. Skewed

It allows you to see which value of the explanatory variable corresponds to a given probability of success.

  • a. ogive
  • b. probability analysis table
  • c. histogram
  • d. probability table

Which belong to the GLM family?

  • a. exponential only
  • b. linear only
  • c. logistic and linear
  • d. logistic only

A network purporting to describe family memberships.

  • network adherence
  • network topology
  • network tautology
  • networking

The creation of data from varied sources and its quantification into information.

  • a. datology
  • b. datalization
  • c. Datafication
  • d. dataology

The following are distinct roles that KR plays EXCEPT

  • Medium for pragmatically diligent interpretation
  • Surrogate
  • Medium of human expression
  • Set of ontological commitments

The proportion of well-classified positive events is called _________________.

  • a. probability
  • b. specificity
  • c. anonymity
  • d. sensitivity

Which of the following does NOT use continuous distribution?

  • a. F-test
  • b. chi-square
  • c. hypergeometric
  • d. gamma

What is the size of the product of a 5x6 matrix and a 6x8 matrix?

  • a. 5x8
  • b. 8x5
  • c. 5x5
  • d. 8x8

Which of the following pertains to predictive data mining technique?

  • a. Prediction
  • b. Association
  • c. Clustering
  • d. Regression

It sees the medical world as made of empirical associations connecting symptoms to diseases.

  • a. IOP
  • b. KR
  • c. MYCIN
  • d. INTERNIST

It is a perfect software for machine learning.

  • a. orange
  • b. R-programming
  • c. WEKA
  • d. Knime

It refers to the degree of relationship between two variables?

  • a. synthesis
  • b. regression
  • c. analysis
  • d. Correlation

The following are artifacts used in data analysis EXCEPT:

  • a. pivot tables
  • b. ANOVA
  • c. statistical tools
  • d. graphs

3A + B

  • a. B
  • b. C
  • c. D
  • d. A

The following are the 3V's of big data EXCEPT

  • a. variety
  • b. volume
  • c. velocity
  • d. veracity

According to Hilary Mason, which is NOT a skill that a good data scientist must cultivate?

  • a. math and stat
  • b. critical thinking
  • c. coding
  • d. communication

Given α = babaa and β = a^6b^5bb, what is the length of the concatenation of the two strings?

  • a. 20
  • b. 18
  • c. 15
  • d. 16
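
A sketch of the count, assuming the usual formal-language reading in which a^6 means the symbol a repeated six times and b^5 means b repeated five times:

```python
alpha = "babaa"
beta = "a" * 6 + "b" * 5 + "bb"  # a^6 b^5 bb under the assumed notation

concatenation = alpha + beta
length = len(concatenation)
print(length)  # → 18
```

The length of a concatenation is simply the sum of the two lengths: 5 + 13.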

The following provided inspirations of what constitutes intelligent reasoning EXCEPT

  • Statistics
  • Psychology
  • Sociology
  • Biology

___________ uses artifacts to present data visually.

  • a. Data Mining
  • b. Statistics Analytics
  • c. data visualization
  • d. Text Analytics

Given the sets A = {x | x is a distinct letter in the word "MATHEMATICS"} and B = {x | x is a distinct letter in the word "STATISTICS"}, the two sets are

  • a. disjoint
  • b. joint
  • c. equal
  • d. equivalent

What is the value of the mean if a score of 110 is 3 standard deviations above the mean?

  • a. 90
  • b. 95
  • c. 85
  • d. 90

Addition and subtraction of matrices are possible only if the two or more matrices

  • a. Have same sizes.
  • b. Have same number of columns.
  • c. Are square matrices.
  • d. Have same number of rows

By the empirical rule for a normal distribution, the interval within 3 standard deviations above and below the mean covers ______% of the data.

  • 99.7
  • 92
  • 98
  • 95
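
The empirical-rule percentages can be reproduced from the normal distribution itself: the share of data within k standard deviations of the mean is erf(k / sqrt(2)). A short sketch using only the standard library; the helper name `coverage_within` is made up for this example.

```python
import math


def coverage_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * coverage_within(k), 1))
```

For k = 3 this gives about 99.7%.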

It is a process that goes on internally, while most things it wishes to reason about exist only externally.

  • inference
  • logic
  • reasoning
  • actions

To estimate the parameters of the model, the ________ function is maximized.

  • a. multinomial
  • b. sensitivity
  • c. likelihood
  • d. specificity

It is popular among financial data analysts.

  • a. WEKA
  • b. orange
  • c. Knime
  • d. R-programming

Which of the following is NOT a goal in data mining?

  • a. evaluating data
  • b. aids in business decision making
  • c. discovering useful information
  • d. collecting data

Which is NOT a value of r ?

  • a. -0.05
  • b. 1.02
  • c. 0.98
  • d. 0.03

What programming language does Orange use?

  • a. Fortran
  • b. JAVA
  • c. Cobol
  • d. Python

An example of an abstract computer.

  • a. Turing machine
  • b. Cartographic machine
  • c. Adding machine
  • d. Cryptographic machine

It expands available data enormously.

  • a. text mining
  • b. sorting
  • c. volume
  • d. text

The score NOT easily affected by extreme values.

  • a. range
  • b. mean
  • c. Median
  • d. mode

Time needed to execute an algorithm is a function of its________.

  • a. input
  • b. output
  • c. usage
  • d. reaction

A model that corresponds to the case where the dependent variable has more than two categories.

  • a. logistic model
  • b. multinomial legit model
  • c. hypergeometric model
  • d. multinomial logit model

A perfect positive correlation coefficient is equal to

  • a. 0
  • b. 1
  • c. -1
  • d. 2
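
Pearson's r always lies between -1 and 1, reaching the extremes only when the points fall exactly on a line. The sketch below computes r by hand for two made-up data sets, one rising and one falling; the function name and the data are illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]  # exact increasing line
falling = [-x for x in xs]        # exact decreasing line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```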

In 2,4,4,4,5,5,6,8,9 the range is

  • a. 3
  • b. 6
  • c. 5
  • d. 7
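
The range is the highest value minus the lowest value. For the data in the question:

```python
data = [2, 4, 4, 4, 5, 5, 6, 8, 9]

data_range = max(data) - min(data)  # highest value minus lowest value
print(data_range)                   # → 7
```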

The difference between the highest and lowest value.

  • a. range
  • b. mean
  • c. variance
  • d. deviation

Which of the following is TRUE?

  • a. A + B = B+ A
  • b. BC=CB
  • c. AB=BA
  • d. AB is not possible

It sees a set of prototypes in particular to be matched to cases at hand

  • a. INTERNIST
  • b. MYCIN
  • c. ONTOLOGY
  • d. IOP

Which of the following is used as a method for Correlation?

  • a. Chi-square
  • b. Pearson r
  • c. f-test
  • d. Kendall tau

AUC means___________.

  • a. Artificial Under Cover
  • b. Area Under Coverage
  • c. Artificial Unit Curve
  • d. Area Under the Curve

A new phenomenon for the explosion of _________data

  • a. transient
  • b. transaction
  • c. interaction
  • d. communication

The _______value is the weighted average of the value the random variable may assume.

  • a. finite
  • b. infinite
  • c. Expected
  • d. middle
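
As a sketch of "weighted average", the example below weights each outcome of a fair six-sided die by its probability. The die is a hypothetical example chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical random variable: one roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6

# Each value is weighted by the probability that it occurs.
expected = sum(x * p for x, p in zip(outcomes, probabilities))
print(expected)  # → 7/2
```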

Classification table is also called ________

  • a. confusion matrix
  • b. confidential matrix
  • c. conditional matrix
  • d. criteria matrix

What is KR?

  • a. Knowledge Rational
  • b. Knowledge Representation
  • c. Knowledge Representative
  • d. Knowledge Race

It has the goal of discovering useful information to support decision making.

  • a. data analysis
  • b. data visualization
  • c. database
  • d. data mining

It transforms data into actionable intelligence for business purposes.

  • a. Text Analytics
  • b. Data Mining
  • c. Business Intelligence
  • d. Statistics Analytics

What is the value when it is 2 standard deviations above the mean in a normal probability distribution?

  • a. 65
  • b. 60
  • c. 70
  • d. 55