University of South Queensland- Assignment blog: CIS8008


Executive summary

Data mining supports decision makers by letting them base decisions on data collected from multiple sources and analyzed with suitable analytical tools. In the first task, the Rattle data mining tool is used to carry out a statistical analysis that estimates how likely a customer is to respond favorably to a bank's marketing campaign for a term deposit. In the pivot analysis, data on technology adoption, population and urbanization across the African continent is used to understand the overall level of technology adoption in different African countries.

Introduction

Business intelligence is an important discipline in today's business environment: it gives organizations broad knowledge of the different aspects of their business so that they can take informed decisions that help them grow. This paper applies knowledge of markets, technology and management to understand their implications for organizational systems. Business intelligence also allows decision makers to solve critical problems by making suitable use of data, which supports informed decisions and helps the organization gain a sustainable competitive advantage over its competitors. Finally, the paper shows the importance of communicating clearly with the management team by using a report format, so that the main ideas behind the statistical information are easy to grasp.

Task-1

CRISP-DM process

CRISP-DM stands for the Cross-Industry Standard Process for Data Mining, a process model used to make better decisions from the wide range of data now available. It has six stages: business understanding, data understanding, data preparation, modeling, evaluation and deployment. Applying CRISP-DM brings a range of opportunities and challenges, which are given below:

Opportunities

· CRISP-DM supports predictive decision making in the financial industry, so that the likely results of a decision can be estimated in advance.

· Data gathering becomes part of the decision-making process for financial institutions, which helps them handle data in a better way.

· Implementing CRISP-DM for loan-sanctioning decisions in the financial sector allows socio-technical changes to be introduced.

· Implementing CRISP-DM allows quick decisions based on the data gathered.

Key challenges

· No standard procedure: CRISP-DM does not prescribe a single standard procedure that can be followed step by step, so decision making can be highly subjective and vary from person to person.

· Predictive modeling carries a high degree of risk, and relying on CRISP-DM for prediction adds to the risk borne by decision makers.

This task develops a statistical model to predict whether bank customers will respond positively to the bank's marketing campaign for a newly launched product, a term deposit.

PC	PC1	PC2	PC3	PC4	PC5	PC6	PC7
Standard deviation	1.21	1.06	1.04	0.98	0.94	0.91	0.76
Proportion of variance	0.21	0.16	0.15	0.13	0.12	0.11	0.08
Cumulative proportion	0.21	0.37	0.52	0.66	0.79	0.91	1.00

Table 1: Standard deviation and variance explained for the seven principal components

Seven principal components are identified, and together they explain 100% of the variance in the data. The first component explains the largest share, 21%, while the 7th component explains the smallest, 8%. Because the variance is spread fairly evenly and no single component dominates, all seven principal components need to be considered to explain the variance in the available data.
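The proportions in Table 1 follow from the standard deviations: each component's variance is its squared standard deviation, and its share is that variance divided by the total. The sketch below recomputes the shares from the rounded values in Table 1, so results may differ from the table by about 0.01. The function name `variance_explained` is an illustrative choice, not part of Rattle.

```python
# Standard deviations of the seven principal components, copied from Table 1.
STD_DEVS = [1.21, 1.06, 1.04, 0.98, 0.94, 0.91, 0.76]


def variance_explained(std_devs):
    """Return (proportions, cumulative proportions) of variance per component."""
    variances = [sd ** 2 for sd in std_devs]  # variance is the squared SD
    total = sum(variances)
    proportions = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


if __name__ == "__main__":
    props, cums = variance_explained(STD_DEVS)
    for i, (p, c) in enumerate(zip(props, cums), start=1):
        print(f"PC{i}: proportion={p:.2f} cumulative={c:.2f}")
```

Because the inputs are rounded to two decimals, small differences from the cumulative row of Table 1 are expected.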

Figure 1: Variance explained by each principal component

The seven numeric variables behind these principal components are described below:

· Age: The age of the customer, a numeric variable. Older customers would be expected to respond more positively to the campaign, since they are more likely to take a term deposit, whereas younger customers tend to look for higher returns.

· Balance: The average yearly balance in euros. A higher balance would be expected to raise the likelihood of a positive response, since it makes a term deposit more likely.

· Day: The day of the month on which the customer was last contacted. It is expected to have a negative relationship with the chance of a positive response.

· Duration: The duration of the last contact in seconds. Longer contacts would be expected to go with a higher chance of a positive response.

· Campaign: The number of contacts made with the customer during this campaign. It is expected to have a positive effect on the response to the term deposit campaign.

· Pdays: The number of days since the customer was last contacted, a numeric variable. A higher pdays value would be expected to go with a higher chance of a negative response.

· Previous: The number of contacts made with the customer before this campaign. More previous contacts would be expected to go with a higher chance of a positive response.

Table 2 below shows the pairwise correlations between the variables: a negative value indicates a negative correlation and a positive value a positive correlation between the two variables.

	Duration	Day	Pdays	Campaign	Previous	Balance	Age
Duration	1.00	-0.03	-0.01	-0.09	-0.00	0.02	-0.01
Day	-0.03	1.00	-0.09	0.17	-0.05	-0.00	-0.01
Pdays	-0.15	-0.09	1.00	-0.08	0.54	0.01	-0.01
Campaign	-0.09	0.17	-0.08	1.00	-0.03	-0.00	-0.00
Previous	-0.00	-0.05	0.54	-0.03	1.00	-0.02	0.00
Balance	0.02	-0.00	0.01	-0.00	0.02	1.00	0.09
Age	-0.01	-0.01	-0.01	-0.00	0.00	0.09	1.00

Table 2: Correlation matrix

Figure 2: Correlation plot

Figure 2 plots the correlation of each variable with every other variable of interest. A dark blue circle indicates a perfect positive correlation, which is why the correlation of each variable with itself appears as a dark blue circle (Tan et al., 2005). A lighter blue circle indicates a weaker positive correlation, while a red circle indicates a negative correlation; the strength of the correlation is shown by the intensity of the color. For example, duration is perfectly correlated with itself and is therefore drawn as a solid dark blue circle, whereas duration and campaign are negatively correlated, but at only -0.09 the relationship is weak.
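For readers who want to see how a single entry of such a correlation matrix is obtained, the sketch below computes the Pearson correlation coefficient from scratch. The sample lists are made-up illustrative values, not rows from the bank data set, and the function name `pearson` is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations (denominator)
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the sign of the result.
    contacts = [1, 2, 3, 4, 5]
    rising = [10, 20, 30, 40, 50]
    falling = [50, 40, 30, 20, 10]
    print(pearson(contacts, rising))   # close to +1: positive correlation
    print(pearson(contacts, falling))  # close to -1: negative correlation
```

A value near zero, such as the -0.09 between duration and campaign, indicates that the two variables have almost no linear relationship.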

Table 3 below gives the rotation, that is the loading of each variable, for each principal component in the model.

Factor	PC1	PC2	PC3	PC4	PC5	PC6	PC7
Age	-0.01	0.13	0.69	-0.21	-0.65	0.16	0.05
Balance	0.02	0.19	0.67	0.14	0.68	-0.12	0.01
Day	-0.293	-0.48	0.12	0.49	0.03	0.64	0.04
Duration	0.067	0.45	-0.05	0.80	-0.28	-0.21	0.02
Campaign	-0.28	-0.58	0.16	0.12	-0.14	-0.70	0.08
Pdays	0.65	-0.22	0.02	0.06	0.00	0.02	0.71
previous	0.62	-0.31	0.11	0.12	-0.06	-0.01	-0.69

Table 3: Rotation for each of the seven principal components

Table 3 shows how strongly each variable loads on each of the seven components that describe the customer data for the term deposit campaign. Each principal component is matched to a variable that loads strongly on it, judged by the absolute value of the loading so that positive and negative rotations are treated alike. For example, the first principal component is represented by pdays, which has the highest loading on it (0.65). The variable matched to each of the seven principal components, together with the variance that component explains, is given below:

PC	Variable	Variance explained
PC1	Pdays	21%
PC2	Campaign	16%
PC3	Age	15%
PC4	Duration	13%
PC5	Balance	12%
PC6	Day	11%
PC7	Previous	8%

Table 4: Variable matched to each principal component
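One way to reproduce the mapping in Table 4 from the loadings in Table 3 is a greedy pass over the components: for each component in turn, pick the variable with the largest absolute loading among those not yet used. The rule that each variable is used only once is an assumption of this sketch; the source does not state how repeated variables were handled. The name `assign_variables` is likewise illustrative.

```python
# Loadings copied from Table 3; each list holds the values for PC1..PC7.
LOADINGS = {
    "age":      [-0.01, 0.13, 0.69, -0.21, -0.65, 0.16, 0.05],
    "balance":  [0.02, 0.19, 0.67, 0.14, 0.68, -0.12, 0.01],
    "day":      [-0.293, -0.48, 0.12, 0.49, 0.03, 0.64, 0.04],
    "duration": [0.067, 0.45, -0.05, 0.80, -0.28, -0.21, 0.02],
    "campaign": [-0.28, -0.58, 0.16, 0.12, -0.14, -0.70, 0.08],
    "pdays":    [0.65, -0.22, 0.02, 0.06, 0.00, 0.02, 0.71],
    "previous": [0.62, -0.31, 0.11, 0.12, -0.06, -0.01, -0.69],
}


def assign_variables(loadings):
    """Greedily match each component to its strongest unused variable."""
    n_components = len(next(iter(loadings.values())))
    remaining = sorted(loadings)  # sorted for a deterministic order
    assigned = {}
    for pc in range(n_components):
        if not remaining:
            break
        # Compare absolute values so negative loadings count equally.
        best = max(remaining, key=lambda name: abs(loadings[name][pc]))
        assigned[f"PC{pc + 1}"] = best
        remaining.remove(best)
    return assigned


if __name__ == "__main__":
    for component, variable in assign_variables(LOADINGS).items():
        print(component, variable)
```

Without the one-variable-per-component assumption, the largest absolute loadings on PC6 and PC7 would belong to campaign (-0.70) and pdays (0.71), so the choice of rule matters for the last two rows.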

Figure 3: Variance explained by each principal component

Hence pdays dominates the first principal component, which alone explains 21% of the variance in the data, and it is treated here as the most important factor when assessing the probability that a customer responds positively to the marketing campaign.

The clustering of the variables can be examined further in the figure below.

Figure 4: Variable correlation clusters

The figure shows a low degree of correlation between duration and age, and between age and balance. Variables joined by longer lines are less strongly correlated, while variables joined by shorter lines are more strongly correlated (Witten et al., 2011). The correlation between previous and pdays is much higher than that between day and campaign, which in turn is higher than the correlation between age and balance; this is also evident from the correlation table and plot.

Decision tree model

Several factors matter when building a model to predict how likely customers are to respond positively to the bank's marketing campaign for the term deposit. As the figure shows, the duration of the last contact in seconds is taken as the key variable for the decision and forms the first split of the tree. The split is at 382 seconds: about 81% of customers fall into the branch with a shorter last contact, where a positive response is unlikely, while the remaining 19% fall into the branch with a longer last contact, where a positive response is more likely.

Figure 5: Decision tree model

The probability of responding to the bank's marketing campaign has been assessed on a total sample of 23,736 customers. Where the last contact lasted less than 381 seconds, 589 customers responded favorably to the campaign while 18,717 responded unfavorably. Other variables, such as the contact mode, also affect the final probability that a customer responds to the marketing campaign.
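The counts quoted above can be turned into a branch share and a response rate with simple arithmetic. The sketch below uses only the three figures given in the text; the function name `branch_summary` and the dictionary keys are illustrative choices, not Rattle output.

```python
# Counts quoted in the text for the short-duration branch of the tree.
TOTAL_CUSTOMERS = 23736
FAVORABLE = 589
UNFAVORABLE = 18717


def branch_summary(favorable, unfavorable, total):
    """Summarize one branch: its size, share of the sample and response rate."""
    size = favorable + unfavorable
    return {
        "size": size,
        "share_of_sample": size / total,     # fraction of all customers
        "response_rate": favorable / size,   # fraction responding favorably
    }


if __name__ == "__main__":
    summary = branch_summary(FAVORABLE, UNFAVORABLE, TOTAL_CUSTOMERS)
    print(f"customers in branch: {summary['size']}")
    print(f"share of sample:     {summary['share_of_sample']:.1%}")
    print(f"favorable responses: {summary['response_rate']:.1%}")
```

On these counts the branch holds roughly four fifths of the sample, and only a small fraction of the customers in it responded favorably.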

Linear logistic regression

The output of the logistic regression is given below, with the p-value of each term in the model shown in the Pr(>|z|) column. Terms with a p-value above 0.05 are not statistically significant at the 5% level and are neglected in the present modeling:

Coefficients:

Term	Estimate	Std. Error	z value	Pr(>|z|)	Signif.
(Intercept)	-2.342738380	0.254914383	-9.190	< 2e-16	***
age	0.001566255	0.003052009	0.513	0.607820
jobblue-collar	-0.318673374	0.102655668	-3.104	0.001907	**
jobentrepreneur	-0.316647948	0.172889041	-1.832	0.067025	.
jobhousemaid	-0.538388738	0.191183831	-2.816	0.004861	**
jobmanagement	-0.102527911	0.102615934	-0.999	0.317726
jobretired	0.376518263	0.133761955	2.815	0.004880	**
jobself-employed	-0.342504442	0.158459442	-2.161	0.030659	*
jobservices	-0.194232947	0.118919971	-1.633	0.102404
jobstudent	0.436806785	0.151637794	2.881	0.003969	**
jobtechnician	-0.130960267	0.096823566	-1.353	0.176194
jobunemployed	-0.095461551	0.154354754	-0.618	0.536275
jobunknown	-0.324410056	0.341731512	-0.949	0.342462
maritalmarried	-0.204017340	0.081121122	-2.515	0.011904	*
maritalsingle	0.067809569	0.092891094	0.730	0.465396
educationsecondary	0.144754469	0.090122466	1.606	0.108231
educationtertiary	0.335025063	0.104212182	3.215	0.001305	**
educationunknown	0.280421030	0.144362709	1.942	0.052080	.
defaultyes	-0.206265166	0.245001364	-0.842	0.399847
balance	0.000007552	0.000006997	1.079	0.280404
housingyes	-0.598551149	0.060724215	-9.857	< 2e-16	***
loanyes	-0.403288451	0.083548021	-4.827	1.39e-06	***
contacttelephone	-0.212671512	0.103962989	-2.046	0.040791	*
contactunknown	-1.881500801	0.104719349	-17.967	< 2e-16	***
day	0.015389112	0.003459778	4.448	8.67e-06	***
monthaug	-0.854139270	0.106503112	-8.020	1.06e-15	***
monthdec	0.745806323	0.234774040	3.177	0.001490	**
monthfeb	-0.244513923	0.121338985	-2.015	0.043891	*
monthjan	-1.482841489	0.168533560	-8.798	< 2e-16	***
monthjul	-1.011211970	0.105775958	-9.560	< 2e-16	***
monthjun	0.490577739	0.127371095	3.852	0.000117	***
monthmar	1.508293633	0.166565279	9.055	< 2e-16	***
monthmay	-0.541757623	0.098919286	-5.477	4.33e-08	***
monthnov	-1.054958271	0.115861428	-9.105	< 2e-16	***
monthoct	0.562862311	0.152434421	3.692	0.000222	***
monthsep	0.854664108	0.158987534	5.376	7.63e-08	***
duration	0.004172679	0.000089337	46.707	< 2e-16	***
campaign	-0.090290211	0.013999226	-6.450	1.12e-10	***
pdays	-0.001008817	0.000434729	-2.321	0.020310	*
previous	0.013926615	0.008771871	1.588	0.112367
poutcomeother	0.114061243	0.127666809	0.893	0.371627
poutcomesuccess	2.121250059	0.113068828	18.761	< 2e-16	***
poutcomeunknown	-0.266338419	0.129018678	-2.064	0.038985	*

From this output it is evident that several terms, such as age, single marital status, management job, services job, balance and previous, have p-values above the 0.05 threshold and can therefore be neglected in the predictive model.
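The 0.05 screening rule, and the reading of a logistic coefficient as an odds ratio, can be sketched as follows. Only a handful of terms from the output are included to keep the example short, and p-values reported as "< 2e-16" are stored as 2e-16, which is an upper bound rather than the true value. The helper names `insignificant_terms` and `odds_ratio` are assumptions of this sketch.

```python
import math

# (estimate, p-value) pairs copied from the regression output above.
# Values printed as "< 2e-16" are stored as 2e-16 (an upper bound).
COEFFICIENTS = {
    "age": (0.001566255, 0.607820),
    "jobmanagement": (-0.102527911, 0.317726),
    "jobservices": (-0.194232947, 0.102404),
    "maritalsingle": (0.067809569, 0.465396),
    "housingyes": (-0.598551149, 2e-16),
    "duration": (0.004172679, 2e-16),
    "campaign": (-0.090290211, 1.12e-10),
    "previous": (0.013926615, 0.112367),
    "poutcomesuccess": (2.121250059, 2e-16),
}

ALPHA = 0.05  # significance level used in the text


def insignificant_terms(coefficients, alpha=ALPHA):
    """Names of terms whose p-value exceeds alpha, in alphabetical order."""
    return sorted(name for name, (_, p) in coefficients.items() if p > alpha)


def odds_ratio(estimate):
    """Multiplicative change in the odds for a one-unit increase in the term."""
    return math.exp(estimate)


if __name__ == "__main__":
    print("not significant:", insignificant_terms(COEFFICIENTS))
    for name in ("poutcomesuccess", "housingyes", "duration"):
        estimate, _ = COEFFICIENTS[name]
        print(f"{name}: odds ratio = {odds_ratio(estimate):.3f}")
```

An odds ratio above 1 means the term raises the odds of a positive response, and a value below 1 means it lowers them, holding the other terms fixed.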

Task-2

Datafication is the process of turning various aspects of human life into data, so that value can be created by analyzing that data. Key examples are the ways Twitter and LinkedIn create value by turning aspects of human life into computerized data. The concept can be defined through three key aspects: dematerialization, liquification and density. Dematerialization is the process by which data is separated from the physical resource it describes. Liquification means that, once information has been dematerialized from resources and assets, it can be moved and manipulated, which allows resources and assets that were previously linked to be unbundled and re-bundled. Density is the outcome of this value creation process (Shah et al., 2012).

Datafication faces key challenges, because privacy and security are major concerns for both individuals and business organizations. It does not require a semantic or causal relationship between variables such as economic, political and social factors; instead, technology provides an alternative way to track trends across these external market sources. Concerns about individual privacy and security nevertheless hinder the progress of datafication. Users' daily activities are tracked through a variety of tools, and many aspects of human life are datafied. For example, a person's network of friends is datafied by Facebook, professional connections by LinkedIn, location by Foursquare, thoughts by Twitter and music preferences by Spotify. Many aspects of daily life are thus recorded, and organizations can base their decisions on these records.

Activities such as reading books are also tracked online. Websites like Amazon track users' reading lists and make suggestions that match the reading preferences they have shown. The analysis goes so deep that these sites can measure reading speed and estimate when a reader will finish a book, so that they can time a new offer for the books they sell. For instance, a website like Amazon could offer the next book in a series at a discount just as someone finishes the previous one. Datafication also applies to business organizations: commercial vehicles are tracked through GPS devices, and even the tires fitted to the vehicles are monitored.

With advances in technology, datafication can answer several questions that business organizations face in tracking human behavior, which helps them take strategic business decisions. At the same time it invades the privacy of individuals and organizations, which calls its usefulness into question. Individuals and organizations that use the internet are heavily tracked by websites, and this tracking goes well beyond simply recording consumers' preferences (Anderson, 2008). These websites have access to consumer details, personal activities, preferences and buying decisions, which lets them shape their offerings to suit consumer preferences but also intrudes on the security of individuals and organizations. Anyone with ill intent who obtained this key information could easily use it for personal gain. Unauthorized access can also take place without the knowledge of the users concerned, and consumers' financial details are sometimes shared, which may lead to financial losses. Overall, datafication compromises the internet security of the individuals and organizations that work online.

Government organizations such as the military and banks also fall within the purview of datafication, and tracking these organizations raises even greater security concerns because the information held in their systems is highly critical. Any security lapse may result in an attack by outsiders on the information system, with the aim of taking critical information and using it for their own benefit. Governments have developed several regulations on the invasion of internet users' privacy, so that datafication does not compromise user security and users adopt proper information security measures. These regulations impose restrictions on websites that engage in datafication and compromise users' information security. The government has prepared a list of such websites and blocked access to them from government offices, so that critical information in government systems does not fall into the wrong hands. Further information security arrangements have been put in place to guard against such incidents arising from datafication.

User training is another important step in this direction. It covers safe browsing, information security and the steps needed to maintain proper information security arrangements in the organization. Users are given a detailed manual and the procedures to follow when working on government information systems, and responsibilities are assigned to users in case a breach of information security takes place (Tuomi, 1999). Such training is of immense importance in ensuring that users do not fall into the traps set by websites that engage in datafication to obtain information from internet users and compromise their information security. Similarly, individuals who buy products online are tracked for the payment methods they use and for their credit or debit card passwords, and several websites offer to remember the password so that payment can be made quickly. Such tracking of users' financial information may compromise that information and result in heavy financial losses.

Ethical issues are a major concern with datafication, because individuals and organizations are tracked in everything they do and the data is created and analyzed to generate value for the business organizations involved. Although it has often been argued that this data is useful to the individuals and organizations using the internet, it remains a major ethical concern that advocates of datafication need to address. The first ethical issue is that consumers know little about the tracking that websites carry out on their every action online. Individuals and organizations working on the internet do not have sufficient information about datafication, and they only become aware of it when they come across advertisements and offers that match their own choices. Tracking consumers' actions without their prior knowledge is ethically problematic: users have not agreed to share the information, yet it is collected and used by business organizations for their own benefit, which makes the conduct of the websites involved unethical.

Marketing and advertising companies involved in datafication track consumer preferences for particular products and services. Based on consumers' history or their searches for particular products or services, they offer similar products, since consumers are more likely to buy things they have been searching for. Advocates of datafication argue that this is of great value to consumers as well, since they receive offers on products they like and can find products and services without searching in many different places (Maitlis, 2005). However, once advertisements and marketing communication are customized to suit consumer preferences, consumers may buy without knowing that the offering has been tailored by the marketer, and so unknowingly buy products of the marketer's choosing. In addition, users often agree to terms and conditions that may allow their information to be shared with other parties on the internet for commercial benefit. Since consumers are not aware of the consequences of sharing information, it should be the responsibility of websites to make users aware of them.

Task-3

1) Top 10 best performing countries in terms of technology adoption of mobile phones by year and region

The figure below analyzes the 10 best performing countries in terms of mobile phone adoption by year and region. Among these 10 countries across the different years, Algeria has the highest adoption and Tanzania the lowest.

2) Top 10 worst performing countries in terms of technology adoption of Internet by year and region

The figure below highlights the 10 worst performing countries on the African continent in terms of internet adoption. Comoros is the worst performing country, followed by Djibouti and others.

3) Top 10 best performing countries in terms of technology adoption of landlines per head of population by year and region

The figure below shows the 10 best performing countries in terms of landline adoption. Algeria is again the best performing country, while Tunisia ranks 10th.

4) A summary of key technology adoption factors for each region of Africa for a given year

The figure below summarizes the technology adoption factors by region for the African continent for the given year. The graph indicates that East Africa is the best performing region and South Africa the worst.
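The rankings and regional summaries in this task can be thought of as filter, sort and group operations on a table of country records. The sketch below shows the idea in plain Python. All country names, regions and values in `RECORDS` are hypothetical placeholders and not figures from the World Bank data, and the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical placeholder records:
# (country, region, year, mobile subscriptions per 100 people)
RECORDS = [
    ("Country A", "North", 2010, 92.0),
    ("Country B", "North", 2010, 75.5),
    ("Country C", "East", 2010, 40.2),
    ("Country D", "East", 2010, 46.8),
    ("Country E", "West", 2010, 61.0),
    ("Country A", "North", 2011, 98.0),
]


def rank_countries(records, year, n=10, best=True):
    """Top n (best=True) or bottom n (best=False) countries for one year."""
    rows = [r for r in records if r[2] == year]
    rows.sort(key=lambda r: r[3], reverse=best)
    return [(country, region, value) for country, region, _, value in rows[:n]]


def regional_average(records, year):
    """Mean adoption value per region for one year."""
    by_region = defaultdict(list)
    for _, region, rec_year, value in records:
        if rec_year == year:
            by_region[region].append(value)
    return {region: sum(vals) / len(vals) for region, vals in by_region.items()}


if __name__ == "__main__":
    print(rank_countries(RECORDS, 2010, n=2, best=True))
    print(rank_countries(RECORDS, 2010, n=2, best=False))
    print(regional_average(RECORDS, 2010))
```

A pivot table in a spreadsheet or BI tool performs the same steps interactively, with the year and region acting as filters.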

b) Graphic design and functionality for the World Bank regional unit for the African continent

The dashboard prepared for the World Bank regional unit for the African continent is important for identifying key trends in population, urbanization and the adoption of technologies such as mobile phones, landlines and the internet. Each dashboard developed above highlights the key figures for technology adoption across several years in the different countries of the continent. These dashboards can form an important part of the analysis when the higher authorities in the bank take decisions. Some of the key features and functionality of the dashboard are given below:

· eDeployment: Several users access the dashboard for key information, so a real-time update feature should be provided to let them see the most current information held by the organization. This allows users at different geographic locations to access the same dashboard and discuss the implications and the improvement measures that could be deployed to address the situation in the African continent. Email, internet and intranet tools can be used to update the information on the dashboard in real time. Enabling eDeployment helps a large number of cubes and dashboards to be updated in a much shorter time. Data would be updated during off-peak hours, so that the update process does not affect users' work during business hours.

· Extension builder: A hassle-free data transfer process needs to be implemented with the dashboard so that transfers cause no delays and the data stays safe while it moves from one system to another. The extension builder functionality enables users to transfer data at high speed without errors or loss of data. Its role is not limited to data transfer: it also resolves the compatibility issues users face when moving data between systems.

· Dashboard designer: The dashboard designer tool is of great importance to users, as it enables them to design the various components of the dashboard. Visualization improves the efficiency of the dashboard, since the data shown is real time and properly synchronized. The tool also allows real-time data refresh, so that users can access the data as soon as it is updated (Günnemann et al., 2011).

· Named consumer users: This functionality reduces duplication in the available data: one department enters the data into the system, and the analysis can then be accessed by multiple users at different locations. It helps users synchronize their efforts and obtain the output without entering the same data several times. This improves the efficiency of the organization and reduces duplicated data entry effort.

· Web access server: A web access server would be deployed for the dashboard so that users can be given faster access without any intervention (Battiti and Andrea, 2010). Users would access the dashboard through a URL from any device by logging into the portal, with access rights granted to a limited set of users only. URL-based access gives users fast access to the critical information on the dashboard anytime and anywhere.

These functionalities are therefore useful to the various users of the dashboard: they ensure that the data is updated in real time, that users have faster access to it, and that data can be transferred from one system to another without any hassle.

References

Anderson C (2008) The end of theory: the data deluge makes the scientific method obsolete. Wired, http://www.wired.com/science/discoveries/magazine/1607/pb_theory (accessed 25 May 2015).

Maitlis S (2005) The social processes of organizational sensemaking. Academy of Management Journal 48(1), 21–49.

Tuomi I (1999) Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory. Journal of Management Information Systems 16(3), 103–117.

Shah S, Horne A and Capellá J (2012) Good data won’t guarantee good decisions. Harvard Business Review 90(4), 23–25.

Günnemann S, Kremer H and Seidl T (2011) An extension of the PMML standard to subspace clustering models. Proceedings of the 2011 workshop on Predictive markup language modeling - PMML '11, p. 48. DOI:10.1145/2023598.2023605. ISBN 9781450308373.

Battiti R and Andrea P (2010) Brain-Computer Evolutionary Multi-Objective Optimization (BC-EMO): a genetic algorithm adapting to the decision maker. IEEE Transactions on Evolutionary Computation 14(15), 671–687.

Tan P-N, Steinbach M and Kumar V (2005) Introduction to Data Mining. ISBN 0-321-32136-7.

Witten IH, Frank E and Hall MA (2011) Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Elsevier. ISBN 978-0-12-374856-0.


Friday, August 28, 2015
