This report has two parts: a case study of Fluitec Wind (Fluitec), followed by an assessment of a sales dashboard. The case study gives a critical analysis that follows selected stages of the Cross-Industry Standard Process for Data Mining (CRISP-DM) framework: business understanding, data understanding, data preparation and model assessment. The dashboard is assessed on its general design and its choice of visualisations.

Part 1 – Fluitec Case Study

Business Understanding

The first stage of a data mining project for Fluitec is to understand the business. Four main issues are involved.

(1) Understanding, from a business perspective, what objectives Fluitec really wants to achieve and how to achieve them. The aim here is to get quickly to the crux of the company’s situation.

As a clean technology company, Fluitec operates in the wind turbine services market and serves wind turbine owners and operators. In the long term, it aims to create “one of the world’s largest databases in wind”. In the shorter term, given a host of options for going to market, its primary objective could be to gain rapid adoption in the booming wind energy market (Ivey Publishing, 2015). Beyond this primary objective, there are typically related business questions that the firm would like to address. Specifically, Fluitec focused on lowering adoption barriers and leveraging network effects to build scale. Since its product has low adoption barriers, the firm can start a project whose objective is to leverage those low barriers to improve the accuracy of its predictive analytics product. The business success criterion can therefore be set as an increase in accuracy to a specified level, and the business objectives should be achieved once that criterion is met.
(2) Assessing the situation in more detail. This involves fact-finding about the sources of Fluitec’s competitive advantages, the barriers to entry in this type of business, the necessary assumptions, and other factors to be considered in determining data mining goals. Many hardware-based technologies have emerged to monitor gearbox conditions, but they are often costly to install. As a subsidiary of Fluitec SA, Fluitec Wind has access to a wealth of untapped data resources, which enables it to diagnose gearbox health without additional monitoring and sensor hardware. This technical advantage in conditions monitoring could in turn support a competitive advantage in cost and therefore in price. However, that potential advantage depends on accurate and robust predictive analytics. It should also be assumed that the necessary data may be used for data mining. As for barriers to entry, the intense competition in the markets for SCADA systems and conditions monitoring should be taken into account. In addition, the changes brought by the big data industry and the need for up-to-date products could create barriers in terms of capital and technical resources.

(3) Determining, from a technical perspective, the data mining goals. Since the intended outputs of the project should enable the business objective to be achieved, the data mining goal should be to have a model in place that accurately returns the likelihood of gearbox failure (IBM, 2014).

(4) Determining how Fluitec plans to convince a customer of the value of its analytics. Given current gearbox failure rates, replacement prices and crane costs, the savings made by using Fluitec’s analytics could serve as evidence of that value.
More importantly, since predicting and preventing gearbox failures is immensely valuable to wind farm owners, demonstrating such prediction or prevention would be a preferable way to prove the value of the analytics. Therefore, to convince a customer, Fluitec should present both the expected savings and examples of gearbox failures that were effectively predicted or prevented. These are also the two things that prospective customers are curious about.

Data Understanding and Data Preparation

Once the business objectives and the project plan are established, the data understanding stage collects initial data, describes and explores it, and verifies its quality in order to understand the data resources and their characteristics. In this stage, Fluitec should consider at least three issues when selecting the relevant data from the many available databases (SPSS, 2005). The first issue is to set up a concise description of the problem, which in this case is to identify features or patterns of gearbox failures in order to make more precise predictions. The second is to identify the data relevant to that problem description. Since data can be integrated from various sources, including weather, wind speed, SCADA and oil samples, it is necessary to determine which attributes of the data are relevant. Notably, certain practical challenges may arise when using these sources. Outliers are usually removed so that they do not obscure the patterns produced by a good classification model. In this case, however, extreme values such as those caused by gusts of wind could provide valuable insights and contribute to predictions of equipment failure. If those extreme values are excluded from the predictive model, its robustness may be weakened. Including them, however, may cost time and money, so a cost-benefit analysis may be needed to decide.
The third issue is to ensure that the variables selected as relevant do not contain overlapping information. This allows simpler data mining algorithms and therefore quicker discovery of useful knowledge patterns.

Apart from the above issues, Fluitec needs to explain the meaning of each field in the datasets in the data-description report (IBM, 2014). For example, a field named OS may be ambiguous to participants in the project and might need renaming, for example to Oil_Samples. The abbreviations encountered in the data should be listed with their meanings, and the values used in each field should be explained so that modelling results can be interpreted later. In addition, because one field could be named differently in different datasets, the names of the same field across datasets should be mapped to each other. The data-exploration report should include graphs of the fields of interest so that the data can be inspected visually. The graphs may indicate data characteristics or reveal interesting subsets worthy of further study. The quality of the data is then examined to establish whether the data is complete and whether it contains errors or missing values. Such instances should be reported and their causes investigated.

Notably, in this stage the datasets acquired could be listed together with their locations, the means of acquiring them, the problems encountered and the resolutions achieved. These records can be used to replicate the analytics in future or for similar future projects.

After the data understanding phase, data pre-processing is needed to clean the selected data and improve its quality. The data preparation phase covers all activities needed to construct the final dataset that will be fed into the modelling tools, including selecting, cleaning, constructing and formatting data and combining datasets (IBM, 2014).
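As a minimal sketch of the cleaning and formatting activities named above, the following Python example fills missing values with the mean of a field and then averages readings to a common hourly interval. The field name oil_temp, the hourly interval, the function names and all sample values are hypothetical illustrations and are not taken from the case.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a field with no observed values")
    field_mean = mean(observed)
    return [field_mean if v is None else v for v in values]


def aggregate_to_hourly(readings):
    """Average (timestamp, value) readings within each clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical sensor readings with one missing value (assumed data).
timestamps = [
    datetime(2015, 1, 1, 0, 0),
    datetime(2015, 1, 1, 0, 10),
    datetime(2015, 1, 1, 0, 20),
    datetime(2015, 1, 1, 1, 0),
]
oil_temp = [60.0, None, 64.0, 65.0]

cleaned = fill_missing_with_mean(oil_temp)
hourly = aggregate_to_hourly(list(zip(timestamps, cleaned)))
```

Mean imputation is only one possible rule. Whichever rule is chosen, recording it in the data-cleaning report allows the step to be replicated.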
When selecting data, fields that are a consequence of the field that Fluitec wants to predict should be excluded, and fields that can serve as predictors should be included in the model. An expert may be needed to make these decisions. Records of groups that are not of interest should be removed as well. The data-cleaning report should list all the actions taken during data cleaning, for example replacing missing data with the mean value of a field. Constructing data may produce derived attributes or transformed values of existing attributes. For example, conditions monitoring based on oil sample analysis and on SCADA can be closely integrated, so a new attribute may be derived from the two existing ones. Formatting data means restructuring it into the form that the analysis requires. For example, the data needed may be delivered at different time intervals, in which case the dataset may need to be transformed so that all the data is at the same level of detail.

Assessment of the new analysis

Once the new predictive analysis has been created, its quality and potential impact should be evaluated. Two main aspects of quality should be assessed. On the one hand, does it meet the data mining goal? This is a purely technical assessment based on the outcome of the modelling tasks. On the other hand, the results of the new analysis should be evaluated against the business success criteria (IBM, 2014). In other words, has the project achieved the predetermined business objectives? To answer this, two issues are considered essential (SPSS, 2005).
One is “how to determine the business value from the patterns discovered during the stages?” The other is “which tool should be used to visualize the data mining results?” Recognising business value should depend on interaction between, in this case, the data scientist and Frank Magnotti, the business analyst and decision maker. This is because the new data scientist may not be fully aware of the purpose behind the data mining goal, while understanding sophisticated mathematical results could be a challenge for Magnotti. Interaction between the two can also help ensure that the data scientist stays on track and does not get lost in technical details. As the chief executive officer (CEO) of Fluitec Wind, Magnotti has clear ideas about the business objectives and the needs of important prospective customers. Conveying those ideas to the data scientist could help her understand which functionalities of the model Fluitec and its major clients value. As for the visualisation tool, the choice of visual forms such as pie charts, histograms and scatter plots can be important for interpreting the discovered patterns properly. An efficient and productive business decision often starts with a good interpretation, whereas a poor interpretation may lead to useful information being overlooked. In addition, since “customers are always asking for additional functionality”, the extension of functionality may be an indicator of the quality of the analysis for those customers. However, the costs and benefits of such an extension should be evaluated.
In addition, if resources and time are available, testing the analysis model in the real application is suggested, both to assess its quality and to check whether the data scientist has stayed on track (SPSS, 2005).

After the quality assessment, the potential impact of the new analysis can be evaluated. For this, probing questions can be asked about the main users of the analysis and the other main stakeholders: existing and prospective customers, existing and prospective competitors, Fluitec Wind and its shareholders, the wind turbine services industry and the wind industry. For all customers, what cost savings could they expect from the new analysis? How likely are existing customers to be satisfied with its performance and therefore renew their contracts? Could potential customers be convinced to purchase the predictive analytics? As for Fluitec, what profits or other benefits, such as reputation, could it expect from the analysis, and how likely and to what extent will the analysis strengthen its competitiveness and help extend its market share? The answers regarding the impact on Fluitec may help Magnotti judge how likely the analysis is to be a threat to existing competitors or a barrier to entry for potential ones. As for Fluitec’s shareholders, especially its parent company, there is no problem if they are satisfied with the analytical results, but there may be one if the analysis does not meet their expectations, because Fluitec’s major technical support comes from its parent company. Therefore, the product’s impact on shareholders’ decisions should be considered.
As for the impact on the relevant industries, the answers regarding the above parties could be aggregated to give an indication.

Part 2 – Dashboards

This part comments on the general design and the chosen visualisations of the given dashboard, which a company advertises as a dashboard that every sales team member needs. For simplicity, it is assumed that the six key performance indicators (KPIs) shown on the dashboard are suitable for their purpose, so the KPIs themselves are not discussed.

Looking first at the design, although each component of the dashboard has a subtitle, there is no headline to tell viewers that the dashboard shows six KPIs for a fictitious coffee shop. Moreover, although it is good that the charts use a consistent colour scheme, there is no need to use sequential colours to distinguish the attributes in each chart. This adds no value to the design and makes it difficult to identify and distinguish certain attributes. The chosen colours should differ in intensity so as to direct attention around the dashboard. The colours used should be replaced by colours that complement each other, and colours that blur or clash should be avoided. The layout can be criticised as well. The general rule is that key information should be displayed first, with the most important view at the top or top-left (IB9BW0 Lecture notes). However, the KPI charts appear to be placed randomly on the dashboard, with each KPI treated as equally important.

A significant part of presenting business information effectively on a dashboard is the choice of visualisation types. The visualisation chosen for each KPI is therefore discussed in turn.
The main idea here is that a visualisation should help users interpret and analyse data clearly and effectively (IB9BW0 Lecture notes).

For the first chart (Annual Sales by Region), a pie chart is not the best choice. Although such a chart can be scanned easily and users should be able to identify the largest slice, it is difficult to compare the sizes of slices accurately. The situation is worse when similar colours are used. The second (Annual Sales by Year) and third (Annual Sales by Product Type) charts share similar good and bad points. The colours and 3-D effects are unnecessary because they add no value or extra information to the visualisation and instead make it harder to interpret. However, bar charts are appropriate for quick comparisons, and such a clear and compact form is easily understood. In the fourth chart, a line chart is used to compare the sales of different categories. However, line charts work better for trends, whereas bar charts are ideal for comparisons, especially between individual categories.

For the fifth chart, the use of gauges can be criticised because gauges take up too much space and perform poorly when comparing “Actual vs Plan” results across regions (IB9BW0 Lecture notes). The only good point may be that it is easy to see, for each region, whether actual or plan is higher. However, it can be argued that these comparisons are much less effective than those made with other tools such as bullet graphs: the eye is better at comparing lengths than angles, and the differences between actual and plan would be easier to identify on bullet graphs.

In the last chart, a stacked bar chart and a line chart are combined to show, respectively, the sales and the percent growth of different products.
Therefore, comparing the sales of each product across the two years is not a problem. It is also straightforward to distinguish and rank the growth rates of the products. However, comparing the sales within each year would be hard.
In conclusion, the general design has three problems: the missing headline, the colours and the layout. Turning to the individual visualisations, the main issue is how well they support comparisons between the attributes of interest, although the reasons for the issue vary. Of the six charts, the second, the third and the last are satisfactory overall. The fourth chart may be acceptable, but a bar chart would be a better choice. The remaining charts are likely to be unacceptable to dashboard users and need much greater improvement.