Expert Systems With Applications 175 (2021) 114820
Available online 4 March 2021
0957-4174/© 2021 Elsevier Ltd. All rights reserved.

Review

Machine Learning for industrial applications: A comprehensive literature review

Massimo Bertolini a, Davide Mezzogori b,*, Mattia Neroni b, Francesco Zammori b

a Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Via P. Vivarelli, 10, 41125 Modena, Italy
b Department of Engineering and Architecture, University of Parma, Parco Aree delle Scienze, 181/A, 43124 Parma, Italy

ARTICLE INFO

Keywords: Literature review; Industrial applications; Deep Learning; Machine Learning; Operation management

ABSTRACT

Machine Learning (ML) is a branch of artificial intelligence that studies algorithms able to learn autonomously, directly from the input data. Over the last decade, ML techniques have made a huge leap forward, as demonstrated by Deep Learning (DL) algorithms implemented by autonomous driving cars, or by electronic strategy games. Hence, researchers have started to consider ML also for applications within the industrial field, and many works indicate ML as one of the main enablers to evolve a traditional manufacturing system up to the Industry 4.0 level. Nonetheless, industrial applications are still few and limited to a small cluster of international companies. This paper deals with these topics, intending to clarify the real potentialities, as well as potential flaws, of ML algorithms applied to operation management. A comprehensive review is presented and organized in a way that should facilitate the orientation of practitioners in this field. To this aim, papers from 2000 to date are categorized in terms of the applied algorithm and application domain, and a keyword analysis is also performed, to detail the most promising topics in the field.
What emerges is a consistent upward trend in the number of publications, with a spike of interest for unsupervised and especially deep learning techniques, which recorded a very high number of publications in the last five years. Concerning trends, along with consolidated research areas, recent topics that are growing in popularity were also discovered. Among these, the main ones are production planning and control and defect analysis, thus suggesting that in the years to come ML will become pervasive in many fields of operation management.

1. Introduction

In the new global economy, competition fosters complexity, which directly affects manufacturing processes, products, companies, and supply chain dynamics. Now that we are entering the Industry 4.0 era (Lu, 2017), the new managerial paradigm is shifting from the need for low variability, through products’ commonalities and processes’ repeatability, as advocated in the lean thinking theory (Liker, 2004), toward the so-called mass-customization where, conversely, wide-market goods should be rapidly modified and re-manufactured, at low cost, to satisfy a specific customer’s need (Coronado et al., 2004). In this scenario, resilience, reconfigurability, and flexibility are key issues of competitiveness, as clearly expressed by the ‘smart manufacturing’ concept, indicating a company that has the potential to fundamentally change how products are designed, manufactured, supplied, used, remanufactured, and eventually retired (Kusiak, 2018). Information technology, sensor networks, computerized controls, production management software, and, more in general, the Industrial Internet of Things (IIoT) are basic prerequisites for a company to be smart.
Yet, these devices alone are not enough, and a manufacturing system cannot be considered smart unless its overall functioning is regulated by intelligent control technologies, for a quick, accurate, and reliable response to internal and external events (Mittal et al., 2016). Furthermore, as noted by Kusiak (2017), smart manufacturing must embrace big data and, to this aim, information systems and production management software must be coupled and/or enriched with deep analytical skills (Waller and Fawcett, 2013) and with learning ability (Monostori, 2003), to ensure competitiveness and effectiveness. Evidence also suggests that data are one of the most valuable assets of a firm and, especially for innovative companies, big data management is a key issue of competitiveness (Harding et al., 2006). Not only may proper data management help in differentiating from competitors and gaining a competitive advantage, but companies that use data-driven decision-making approaches have proven to outperform their competitors, being, on average, 5% more productive and 6% more profitable (McAfee et al., 2012). Unfortunately, while in many cases companies perceive the utility of their data, often they do not have the knowledge needed to exploit their data silos and lack a clear understanding of what is important to measure. As a result, the informative content of the data is missed, and real and valuable knowledge gets lost (Harding et al., 2006). If so, the well-known managerial expression that ‘quality trumps quantity’ becomes true because, if managers do not know how to select truly meaningful data easily and rapidly, a large and detailed data warehouse can be as harmful as a total lack of relevant information. Hence, optimizing data collection, usage, and sharing has become vital for many companies (Kusiak, 2017), and Machine Learning (ML), a branch of Artificial Intelligence (AI) dealing with algorithms that learn directly from the input data, is expected to play a key role in the fulfillment of these needs.

* Corresponding author at: Department of Engineering and Architecture, University of Parma, Parco Aree delle Scienze, 181/A, 43124 Parma, Italy.
E-mail addresses: massimo.bertolini@unimore.it (M. Bertolini), davide.mezzogori@unipr.it (D. Mezzogori), mattia.neroni@unipr.it (M. Neroni), francesco.zammori@unipr.it (F. Zammori).
Journal homepage: www.elsevier.com/locate/eswa
https://doi.org/10.1016/j.eswa.2021.114820
Received 15 March 2020; Received in revised form 29 December 2020; Accepted 28 February 2021

Not surprisingly, many works (Lu, 2017; Xu et al., 2018) indicate ML as one of the main enablers to evolve a traditional manufacturing system up to the Industry 4.0 level. It is worth noting that a spike of academic interest followed the report by Pham and Afify (2005), one of the first to have shown potential applications of ML to operation management. From that moment, researchers started to consider ML applications also within industrial fields, especially for pattern and image recognition, natural language processing, operations optimization, data mining, and knowledge discovery (Wuest et al., 2016). Since then, as will be described in later sections, the number of papers published in this field has steadily increased, and the trend has been recently fueled by many government initiatives, like Industry 4.0 (Germany), Smart Factory (South Korea), and Smart Manufacturing (USA), calling for a radical change in the manufacturing paradigm, based on processes’ augmentation and enhancements due to Information Technologies (IT).

Especially in the last decade, the state of the art of ML techniques has made a huge leap forward, as demonstrated by the algorithms used by autonomous driving cars or by electronic strategy games. Both tasks were considered many years away from a practical solution (Martínez-Díaz and Soriguera, 2018; Müller, 2002), yet autonomous driving cars are already being tested in urban environments, and AlphaGo has overwhelmed the world champion of the Go game (Silver, 2016). Similarly, DeepMind recently reported the development of an AI that successfully learned to play better than humans in many other strategy games (Silver et al., 2017). Furthermore, enabling technologies (i.e., sensors, open-source software, public datasets, computational power, cloud services, etc.)
are now mature and available at low cost, and government initiatives offer interest-free (or even non-refundable) loans and/or fiscal incentives to support investments in IT projects. Owing to these favorable conditions, the time seems to be right to implement ML in the industry, and indeed, according to the Gartner Hype Cycle for Emerging Technologies (Burton & Barnes, 2017), Artificial Intelligence and especially Machine and Deep Learning have reached the peak of inflated expectations. Nonetheless, industrial applications of these technologies are still rare and generally confined within a small cluster of big international companies. Should this trend continue, a ‘disillusion phase’ may follow soon and the ‘plateau of productivity’ may never be reached. Presumably, an element that hinders acceptance can be found in the widespread concern that AI could jeopardize many jobs, increasing unemployment, as noted by Korinek and Stiglitz (2017) and as clearly indicated in the McKinsey report by Manyika et al. (2017). In our opinion this is a misconception: if, on the one hand, it is true that automation will replace human labor, on the other hand the replacement will concern redundant and repetitive tasks. Having the ability to learn representations autonomously, ML and especially DL models can extract knowledge directly from raw data, freeing researchers from the expensive and time-consuming step of feature extraction and feature engineering (LeCun et al., 2015). Thus, it is not far-fetched to assume that the most successful implementations will be those augmenting and assisting human decision making, freeing people from low value-added tasks.

Apart from that, one of the main barriers to pervasive industrial adoption of ML is the lack of a clear understanding of these methodologies and the lack of awareness of what ML can and cannot do (LaValle et al., 2011).
As stated by the well-known ‘No Free Lunch Theorem’ formulated by Wolpert and Macready (1997), ML cannot solve all industrial problems and its practical adoption, as an alternative to more mature technologies, must be carefully evaluated and pondered. Clearly, it is important to make an informed decision, without being influenced by the trend and the fashion of the moment. The ability to choose an algorithm (or a subset of algorithms) suitable for a specific task or problem is a core competence for data analysts and/or practitioners who want to apply ML in industrial settings, as this choice can make the difference between failure and success. Yet, in the absence of experience and/or of previous studies of a similar nature, envisioning a way to deploy ML at the industrial level to improve business performance is challenging, especially considering the vast number of algorithms (and possible variations, differing in terms of operating characteristics and of complexity) that have been proposed in the technical literature. Such variety can be disorienting and misleading, and the problem is further complicated by the lack of a repository of best use-cases for each industry and organization. So, we believe that a systematic literature review focused on the historical developments of ML for industrial applications may be extremely useful to highlight present and future trends and, above all, to orient industrial practitioners in the selection and in a more conscious use of ML techniques. Specifically, to clarify the real potentialities, as well as potential flaws, of ML algorithms applied in the field of operation management, papers from 2000 to date will be reviewed and categorized in terms of applied algorithm and application field. Insights concerning trends and evolutions in the subject matter will be provided, and possible future developments will be investigated as well.

The remainder of the paper is organized as follows.
Section 2 gives a brief introduction and defines the technical lexicon that will be used in the paper. Section 3 describes the searching methodology that led to the identification of the set of papers that will be analyzed, in a general and more detailed way, in Section 4. Lastly, conclusions and general remarks will be drawn in Section 5.

2. A brief introduction of Machine Learning theory

A single definition of ML cannot be properly formulated, as this term encompasses a multitude of different approaches taken from the fields of computer science and of multivariate statistics. Nonetheless, a good definition can be found in Murphy (2012), who defines ML as the «set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty». Although very clear, this definition gives too much emphasis to pattern recognition and decision-making that, as important as they may be, do not cover the whole spectrum of ML approaches and methodologies. So, more in general, we could define ML as a set of methodologies and algorithms capable of extracting knowledge from data, and of continuously improving their capabilities by learning from experience (i.e., from data accumulating over time). Please note that learning, as defined by Simon (1983), denotes a change that makes a system more and more adaptive, enabling it to perform the same task (or tasks drawn from the same population) more effectively the next time.

It is also worth noting that, in many ways, ML overlaps with the so-called Statistical Learning (SL), an important field of statistics aimed at modeling and understanding complex datasets (Gareth et al., 2013). Both ML and SL models are characterized by the ability to self-adapt (at least to some extent) to changes in the data and/or in the environment, and to readjust their output accordingly.
This pivotal element explains the recent increasing interest in these disciplines, as they perfectly match the need to process and to analyze Big Data generated by the widespread use of electronic devices, web searches, social media, and social media marketing.

2.1. Machine Learning areas

ML is commonly divided into three broad areas, namely Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL) (Murphy, 2012), as detailed below.

2.1.1. Supervised Learning (SL)

Supervised Learning, also called predictive learning, includes many algorithms, of which the most common are: Neural Networks, Support Vector Machines, Decision Trees (and their extensions, such as Random Forests and XGBoost), Logistic Regression, and Naïve Bayes Classifiers. Apart from implementation and operational differences, all SL methods aim to learn a good approximation $\hat{f}$ of the true mapping $f$ from the input vector $\vec{x}$ to the output vector $\vec{y}$, using information contained in a dataset of training examples, generated either by performing experiments or through the direct observation of the phenomenon under analysis. More precisely, the data set is built by registering, for each observed example, the true value of the response variable $y$, together with the known values of the input vector $\vec{x}$. The data set of examples is then split into a ‘training’ and a ‘test’ set; the first one is used to reconstruct $\hat{f}$ by iteratively minimizing a predefined cost (or loss) function, whereas the second one is used to assess the prediction accuracy of the model on data that were ‘not seen’ during the training phase.

Output variables may be either categorical or continuous.
In the first case, the problem is known as a classification task, and a classic example is building a model to detect process failures or to predict the quality level (expressed on a categorical scale) of new production batches, starting from a dataset containing the physical properties $\vec{x}$ and the quality level $y$ of completed production batches. Conversely, if variables are continuous, the problem is known as a regression task, and an industrial example is predicting a certain physical property, such as the thickness or the surface roughness of items processed by a numerical control machine. In this case, the task could be traced back to an image recognition problem, as different pictures of the manufactured items, taken before and after the machining process, could be converted into a vector of features, to generate the predictive variables $\vec{x}$.

2.1.2. Unsupervised Learning (UL)

Unsupervised Learning is concerned with unlabelled datasets, where no ground truth is available (i.e., the output vector $\vec{y}$ is missing). Hence, the goal is not to make a prediction, but rather to detect and to extract patterns in the data, whose nature or even whose existence could be partially or completely unknown. For these reasons, UL is sometimes referred to as descriptive learning and it is associated with knowledge discovery techniques. Broadly speaking, UL could be divided into three sub-areas (Murphy, 2012): clustering, density estimation, and dimensionality reduction.

Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. A common example is a marketing-driven need to find groups of customers similar in terms of purchase behavior. If information about class membership is not known, well-known algorithms, such as Hierarchical Clustering or K-Means, can be effectively used for this purpose.
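As an illustration of the clustering idea, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not taken from any of the reviewed papers: the function name `kmeans`, the `customers` data (purchases per year, average spend), and the choice of initializing centroids from randomly sampled observations are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct observations drawn at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


# Hypothetical customers: (purchases per year, average spend per purchase).
customers = [(1.0, 10.0), (2.0, 12.0), (1.0, 11.0),
             (20.0, 200.0), (22.0, 210.0), (21.0, 190.0)]
centroids, labels = kmeans(customers, k=2)
```

On such well-separated data the two groups of customers end up with different labels; on real purchase data the features would normally be rescaled first, since Euclidean distance is sensitive to units.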

Density estimation is a wide set of techniques that can be used to discover useful properties (e.g., skewness or multimodality) or even to generate an estimate of an unobservable underlying probability density function of a dataset of observed data. Rescaled histograms are the most basic approach for density estimation, but more complex techniques can also be used, such as Parzen Windows and vector quantization.

Dimensionality reduction is frequently needed, especially in the case of Big-Data analysis, as a way to compress data without altering and/or distorting their original informative content. Principal Component Analysis is the classical way to perform this task, but many neural network topologies (such as Autoencoders) can be employed too, to learn the best-compressed representation of the original data. In a broader sense, all Deep Learning (DL) models can be considered as a way to capture both the hidden representation of the data and the most relevant relationships among them. Accordingly, DL is also referred to as Representational Learning (Bengio et al., 2013).

2.1.3. Reinforcement Learning (RL)

Reinforcement Learning differs from the other ML approaches, as it implements a computational approach to learn from interactions with an environment (Sutton and Barto, 1998). Rather than generating a mapping from the input to the output space, RL generates a mapping from situations (environment state) to actions. Akin to the learning process of a person, RL does not require a pre-existing dataset but, with the goal of learning autonomously how to make decisions, it exploits a set of agents that learn by doing, following a rewarded trial-and-error approach. More precisely, the agent is free to interact with the environment, by performing a predefined set of actions, according to a predefined policy. Each action modifies the system’s state, and such modification is quantified through a specific reward signal, which is sent back to the agent.
Since the objective of the agent is to maximize its total reward, it will learn, by doing, the best reaction to each possible external scenario, or system’s state. It is worth noting that Q-learning (Watkins, 1989) is one of the most popular reinforcement learning algorithms, in which the agent learns actions’ values, which define the agent policy, without the need to have an explicit model of the environment.

In addition to the reward signal, the learning process can also be supported by a superset of supervised and/or unsupervised algorithms, which should optimize the exploration and the exploitation of the action space of the agent. When all, or at least a part, of the implemented superset of algorithms are neural networks, the approach is known as Deep Reinforcement Learning (Li, 2017). In this regard, double Q-Learning is one of the examples of the application of Deep Learning models to improve the classic Q-learning algorithm (Van Hasselt et al., 2015).

Anyhow, regardless of the implementation details, the final goal of an RL algorithm is to produce an artificial agent (or multiple agents interacting with each other) capable of making good decisions, based on the current state of the environment and its experience. For instance, from an industrial perspective, RL agents could be used to automate ordering strategies in multi-tier supply chain networks, or to update production parameters to maximize yield while keeping operating costs at a minimum level.

3. Searching methodology

In line with the objectives of the present work, and aiming to identify trends, potentialities, and criticalities concerning the use of ML for operation management, the review focuses on the following Research questions (Rq):

- Rq. 1 – Which are the main application domains (i.e., industrial processes) where ML has been successfully adopted?
- Rq. 2 – Is the trend stable or has it changed over time, starting from 2000?
- Rq. 3 – Which are the most popular ML methodologies for operation management?
- Rq. 4 – Is it possible to identify interesting development patterns?
- Rq. 5 – Are there any criticalities in the use of ML algorithms for Industrial Applications?
- Rq. 6 – Which are the least studied domains and algorithms, which could benefit from renewed approaches?

To answer the above-mentioned questions, the whole publications’ domain was investigated following a specific search protocol, based on four main steps, as detailed below.

3.1. Initial query-based search

To collect as many publications as possible, in January 2020, a keywords-based search was made on three reliable and comprehensive scientific databases: Scopus, Web of Science, and Google Scholar. Aiming to restrict the search to the papers dealing with Machine and Reinforcement Learning for operation management, possibly with a focus on Industry 4.0, data were filtered using the following query, where the asterisk (*) is the ‘all’ (wildcard) operator.

    KEY ({manufact*} OR {supply chain} OR {industry 4*})
    AND ({machine learning} OR {reinforcement learning} OR {deep learning})
    AND PUBYEAR ≥ 2000
    AND DOCTYPE (Article)
    AND (LIMIT-TO (LANGUAGE, English))

The query is reported here with a syntax similar to the one required by Scopus; yet, with minor adjustments, it was used to collect papers from Web of Science too. Conversely, papers retrieved from Google Scholar were manually filtered, due to a binding restriction, set by the search engine, that allows searching and filtering by title only.
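
For readers who want to reproduce a comparable filter on their own bibliographic export, the query logic can be sketched as a simple Python predicate. This is only an approximation under stated assumptions: the record fields (`keywords`, `year`, `doctype`, `language`) are hypothetical names, and the wildcard is interpreted as "any run of characters, starting at a word boundary", which may differ from the exact matching rules of Scopus or Web of Science.

```python
import re

# Keyword patterns taken from the query; '*' is treated as a wildcard.
SET_A = ["manufact*", "supply chain", "industry 4*"]
SET_B = ["machine learning", "reinforcement learning", "deep learning"]


def _to_regex(pattern):
    # Assumption: '*' matches any characters; the rest of the pattern is literal
    # and must start at a word boundary, ignoring letter case.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(r"\b" + body, re.IGNORECASE)


def has_keyword(keywords, patterns):
    """True if at least one keyword matches at least one pattern."""
    regexes = [_to_regex(p) for p in patterns]
    return any(rx.search(kw) for kw in keywords for rx in regexes)


def passes_query(paper):
    """Apply the keyword, year, document-type, and language conditions."""
    return (
        has_keyword(paper["keywords"], SET_A)
        and has_keyword(paper["keywords"], SET_B)
        and paper["year"] >= 2000
        and paper["doctype"] == "Article"
        and paper["language"] == "English"
    )


# Hypothetical record, used only to show the expected input shape.
example = {
    "keywords": ["Smart manufacturing", "Deep learning"],
    "year": 2018,
    "doctype": "Article",
    "language": "English",
}
selected = passes_query(example)
```

A predicate of this kind is mainly useful for the manual filtering step, where the database itself cannot apply the keyword conditions.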
Anyhow, the filter (either applied manually or automatically) returned papers with at least one keyword belonging to Set A = {manufacturing, supply chain, industry 4.0} and one keyword belonging to Set B = {machine learning, reinforcement learning, deep learning}, provided that all the following inclusion criteria were met:

- C1 – Studies must be peer-reviewed conference or journal publications (i.e., other kinds of scientific works, such as books, patents, and Ph.D. theses, were not considered);
- C2 – Language must be English;
- C3 – Only recent studies, published starting from 2000, are considered.

Such an extensive search returned a total of 678 publications: 370 from Scopus, 218 from Web of Science, and 90 from Google Scholar.

3.2. Search enlargement

Next, despite the high number of collected papers, to avoid possible omissions of other relevant works, the search was enlarged using cross-reference and citation graph analysis, as detailed next.

3.2.1. Cross-reference analysis

To enlarge the search, we considered the list of all citations found in the original set of 678 papers. Such a list was automatically generated by leveraging the Scopus APIs (https://dev.elsevier.com/sc_apis.html), which allow us to retrieve all citations and their related metadata (i.e., keywords, abstract, authors, etc.). Also, to exclude papers unrelated to the stream of research herein considered, the inclusion criteria C1, C2, and C3 were re-used to filter the obtained citations’ list. However, the constraints imposed on the keywords were partially relaxed, as we accepted papers with at least one keyword belonging to Set A or to Set B. By operating in this way, we obtained a list of 767 filtered citations.

3.2.2. Relevance assessment through citation graph analysis

The 767 citations and the original set of 678 papers were joined together and used as input for Gephi©, a freeware software application for the creation of citation networks.
The simplified version of the generated network, where only connections among the main nodes are displayed, is shown in Fig. 1. In the network, nodes correspond to papers, and arcs indicate citations among them. More precisely, green nodes are the source of a citation, whereas blue nodes are the papers that received at least one citation from the other ones. Also, the nodes’ size is an indication of importance, evaluated as the number of received citations. Using this relevance criterion, we decided to add to the original list all the nodes having at least three incoming arcs in the citation graph. As an example, let us consider the blue node labeled as A in Fig. 1. This node, whose size has been enlarged for display purposes, refers to Shiue (2009), a work that did not belong to the original list of the collected papers. However, the citation graph allowed us to include A in the list too, as A is cited by, and thus connected with, three relevant works that were already part of the list. These works, namely Priore et al. (2010), Shiue et al. (2011), and Shiue et al. (2012), correspond to the green nodes (or citation sources) labeled as B, C, and D in Fig. 1. By operating in this way, the original list increased from 678 to 714 papers.

Fig. 1. The simplified citation network (sources of citation in green, cited papers in blue).

3.3. Abstract analysis and final screening of the selected works

Lastly, to refine the selection, all the abstracts were read and filtered using three additional inclusion criteria:

- C4 – Only works with an informative abstract clearly stating the papers’ contributions and industrial results are considered;
- C5 – Studies must be unique; copies (or very similar papers) are removed;
- C6 – Purely theoretical or conceptual studies were not considered.
Specifically, to be included, studies should present industrial applications tested on experimental data or, at least, tested on accessible datasets (used as a benchmark by the research community).

By doing so, mainly due to the application of criteria C4 and C6, 569 papers were considered of low operating value and were discarded, leaving a final corpus of 147 papers. The full list of the selected papers can be found in Tables 3a–3d of Section 4, where the papers are analyzed in detail.

4. Systematic review

4.1. Preliminary classification

To answer the first three research questions, all papers were carefully read and classified in terms of their:

- Application Domain (AD) – The industrial area or process considered in the paper;
- ML Area (MLA) – The SL, UL, and RL clusters, as described in Section 2, to which the adopted algorithms belong.

In line with the content of the articles that were collected during the search, we tried to define clusters of comparable size containing papers sufficiently detailed and homogeneous. In light of this, a good compromise was reached considering the following four ADs:

1. Maintenance Management (MM), which includes 23 papers dealing with:
   - Failure modes classification and prediction (6),
   - Condition monitoring and fault detection (14),
   - Downtime minimization and maintenance planning (3).
2. Quality Management (QM), which includes 53 papers dealing with:
   - On-line quality control (10),
   - Defects detection and classification (33),
   - Image recognition for defect identification (9),
   - Life cycle management (1).
3. Production Planning and Control (PPC), which includes 49 papers dealing with:
   - Performance prediction and maximization (18),
   - Job scheduling and dispatching (15),
   - Dynamic process control (16).
4. Supply Chain Management (SCM), which includes 19 papers dealing with:
   - Demand planning and forecasting (6),
   - Inventory management (4),
   - Supply chain modeling and coordination (9).

The above-mentioned classification is graphically displayed in Fig. 2, where the distribution of the papers in terms of AD and of MLA is clearly shown. Please note that the histogram chart includes an additional category, namely Engineering Design (ED), that was purposely introduced to include three relevant papers in the field of technical design (Cholette et al., 2017; Loyer et al., 2016; Stocker et al., 2019) that could not be put in any other category. Also, note that the sum of the bars of a certain AD may be greater than the number of papers belonging to the same AD. This is because, quite frequently, there are papers that use and/or compare different methodologies to solve the same problem.

Fig. 2. Number of papers for Application Domain (AD) and Machine Learning Area (MLA).

As can be seen, the number of ML applications to industrial problems is considerable and, most of all, in terms of the application domain (i.e., Research Question #1), applications are distributed fairly evenly among the various fields of operations management. Only SCM is not yet a much-explored domain, a fact that can probably be explained considering that, most of the time, SCM involves strategic optimization models, requiring complex and less known approaches, such as Deep Learning and/or Reinforcement Learning. A further discussion on this topic is postponed to Section 4.4, where a detailed analysis of the collected papers is given. For some additional statistics, the interested reader is referred to Appendix B, where a bibliometric analysis (in terms of journals with most publications, authors with more citations, etc.) is provided.

4.2. Trend analysis

The trend in the number of publications, for each AD, is shown in Fig. 3, which incontrovertibly responds to Research Question #2.
Indeed, after an initial phase of latency, in which only some pioneering works were occasionally published, scientific and industrial interest in ML applications has exploded. Especially over the last five years, the growing trend of publications is evident, with a very high spike in 2019.

Fig. 3. The trend of publications, for each Application Domain (AD).

Concerning the evolution through time of the application areas (i.e., Research Question #3), a clear picture is given by Fig. 4, which shows the evolution of the distribution of the published papers, in terms of MLAs, for each of the four 5-year periods from 2000 to 2019. In line with the overall increase of published papers, the trend is positive in each of the three MLAs, and it is particularly pronounced for SL approaches. This is not surprising because, historically, SL methods have always been the most studied and applied ones. Indeed, thanks to the ground-truth information (recorded in the training data set), they fully exploit available data, and they are also easier to interpret.

Fig. 4. Time distributions of ML Areas (MLA).

Due to the relevance of SL approaches, the trend analysis is deepened in Fig. 5, which shows the trend of Neural Networks (NNs), Support Vector Machines (SVM), and Tree-Based (TB) techniques (i.e., Decision Trees, Random Forests, and Gradient Boosting), which have been shown to be the most used techniques belonging to this ML area. As is clear from the chart, SVM was the prominent technique until 2010, and although its use has not faded away, lately it has been overtaken by NNs. Albeit informally, the start of the Deep Learning era can be approximately placed around 2010–2012, and indeed, in the last 7–8 years, the use of Neural Networks (especially of deep architectures) has been prominent.

Nonetheless, collecting and labeling data is expensive and time-consuming, and this explains why, more recently, UL methods are increasingly being used too. As shown in Fig. 4, although they were almost absent till 2014, in the last five years they account for about 1/5 of the total, with a very rapid growth trend, as clearly highlighted by the black line displayed in Fig. 5.

Conversely, after a modest peak toward the end of the first decade of 2000, RL has stabilized at a lower growth rate. Probably, notwithstanding recent breakthrough developments in such areas and the greater understanding of RL’s potentialities, the high complexity of reinforcement learning algorithms is still a hurdle for its full acceptance and industrial applicability.

The above-mentioned analyses are summarized in Table 1, which shows the number of published papers in terms of MLA and AD. Please note that, as for the histogram of Fig. 3, the row sums of the values of Table 1 may also be higher than the corresponding number of papers.

4.3. Keywords analysis

From the corpus of the investigated papers, we extracted around 350 different keywords. After discarding the obvious ones (e.g., Machine Learning, Supervised Learning, Unsupervised Learning, etc.) and merging synonyms among the remaining ones, a total of 61 basic keywords remained. Of these, 32 concern the application domain, and the other 29 refer to the adopted ML techniques.

The total count is graphically shown in Figs. 6a and 6b, where three fictional macro-keywords, namely ‘Metaheuristics’, ‘Statistic Techniques’, and ‘Neural Networks’, have been added to group similar and recurrent items.

As can be seen, consistent with the results reported in the previous sections, NNs and SVMs are very common, together with RL and Metaheuristics, which occur quite frequently too. With regard to the application domain, ‘Diagnosis & Fault Detection’, ‘Additive Manufacturing’, and ‘Manufacturing Processes’ are, by far, the most frequent keywords.
Immediately after, other interesting fields follow, such as ‘Supply Chain Management’, ‘Big Data’, ‘Intelligent Manufacturing’, ‘Production Planning & Control’, ‘Quality Control’ and ‘Simulation’. For more in-depth information, a Word Cloud representation of the 20 most relevant keywords is also provided in Fig. 6c. As is evident, there is a very good match between the most frequent keywords and the Application Domains that were used to classify the investigated papers. Apart from this rather predictable result, the presence of ‘Intelligent Manufacturing’ is a strong indication of how important machine learning techniques are considered to obtain a competitive edge in the Industry 4.0 era. Lastly, it is also worth noting that the term ‘neural network’ is the only one that explicitly refers to a particular ML algorithm. This is a further indication of the prominence and importance attached by researchers to this specific technique. However, the presence of the ‘feature extraction’ term suggests that the practice of data pre-processing and data engineering is still common and dominant. This fact is in partial contrast with the development and dissemination of Deep Learning techniques that, as known, can exploit raw data, without needing sophisticated feature extraction techniques. Although the presence of the ‘feature extraction’ term is probably due to the older works (that used standard ML techniques), it may also indicate a rather immature approach to Deep Learning techniques, which is still influenced by the most popular approaches of the recent past.

Fig. 5. Publication trend of papers dealing with NNs, SVM, and TB algorithms.

Table 1 Rq. 3 – Trend Analysis: results summary.
Application Domain (no. of papers) | Unsupervised Learning | Reinforcement Learning | Supervised Learning: NNs | Supervised Learning: SVM | Supervised Learning: TB | Supervised Learning: Other SL
Maintenance Management (23) | 3 | 4 | 13 | 9 | 7 | 6
  Failure Mode Analysis (6) | 1 | [–] | 5 | 2 | 1 | 2
  Condition Monitoring (14) | 2 | [–] | 7 | 6 | 6 | 3
  Downtime Minimization (3) | [–] | 4 | 1 | 1 | [–] | 1
Quality Management (53) | 16 | 1 | 27 | 24 | 19 | 24
  On-Line Quality Control (10) | 3 | 1 | 7 | 4 | [–] | [–]
  Defect Detection & Class. (33) | 12 | [–] | 16 | 14 | 15 | 20
  Image Recognition (9) | 1 | [–] | 3 | 5 | 3 | 4
  Life Cycle Management (1) | [–] | [–] | 1 | 1 | 1 | [–]
Prod. Planning & Control (49) | 10 | 12 | 22 | 10 | 8 | 10
  Performance Prediction (18) | 6 | 2 | 7 | 4 | 3 | 7
  Scheduling (16) | 1 | 7 | 6 | 2 | 5 | 2
  Process Control (15) | 3 | 3 | 9 | 4 | [–] | 1
Logistic & Supply Chain (19) | [–] | 10 | 3 | 5 | 3 | 3
  Demand Forecasting (6) | [–] | [–] | 3 | 4 | 1 | 1
  Inventory Management (4) | [–] | 5 | [–] | [–] | [–] | [–]
  Modelling & Coordination (9) | [–] | 5 | [–] | 1 | 2 | 2
Engineering Design (3) | [–] | 1 | [–] | 2 | [–] | 1

Fig. 6a. Keywords relative to the adopted ML technique.

Fig. 6b. Keywords relative to the solution domain.

4.3.1. Current trends and hot topics

To get a better idea of the current trends, and to answer Research Question #4, we also organized keywords in the 3D bubble chart of Fig. 7. Each keyword k (denoted using the same abbreviations used in Fig. 6b) is identified with a triplet of data (age, trend, size), and it is plotted as a sphere, with volume proportional to the size, and centrally located at coordinates (x, y) corresponding to ‘age’ and ‘trend’, respectively. Specifically, age, trend, and size are defined as follows:

- Size ($S_k$) – The total number of occurrences of k.
- Age ($A_k$) – The number of years since the first occurrence of k.
- Trend ($T_k$) – The percentage misalignment of the Centre Of Gravity (COG) of k, as defined in Eqs. (1) and (2):

\[ T_k = \frac{t_{COG,k} - (t_n - 0.5\,A_k)}{A_k} = \frac{t_{COG,k} - \bar{t}_k}{A_k} \tag{1} \]

\[ t_{COG,k} = \frac{\sum_{i=1}^{n} s_{i,k}\, t_i}{\sum_{i=1}^{n} s_{i,k}} = \frac{\sum_{i=1}^{n} s_{i,k}\, t_i}{S_k} \tag{2} \]

where: $t_n$ is the current year, $\bar{t}_k = t_n - 0.5\,A_k$ is the midpoint of the life of k, $s_{i,k}$ is the number of occurrences of k at year i, and $t_{COG,k}$ is the ‘temporal’ coordinate of the COG of k.
Specifically, for a consolidated and stable keyword k, $t_{COG,k}$ should lie at the midpoint of its life (i.e., $t_{COG,k} = \bar{t}_k$) and $T_k$ should be close to zero. Instead, a positive value of $T_k$ indicates a keyword that is being used more and more frequently, or that has come back into vogue after a period of latency. Conversely, a negative value of $T_k$ denotes a keyword that is out of fashion or no longer in use. Using these metrics, five main clusters can be identified. These are:

1. Question Marks (Low Age and Negative Trend) – Recently introduced topics that have not got a follow-up yet. Thermography (THER), Cyber-Physical Systems (CPS), and Design For (D4) belong to this category.
2. Hot Topics (Low Age and Positive Trend) – Very recent topics of booming interest. At present, none of the keywords properly belongs to this category. Yet, Additive Manufacturing (ADD_MN), Prediction & Prognostic (PR_PR), and Industry 4.0 (I4.0) are those that come closest to it. For this reason, they have been labeled as ‘new promises’.
3. Consolidated (Medium Age and Stable Trend) – Not recent topics, which are still studied, but without the initial spike of interest. Topics such as Supply Chain Management (SCMI), Flexible Manufacturing Systems (FMS), Inventory Control (INV_CTRI), and Tool Monitoring (TLL_MN) belong to this category.
4. Stars (High Age and Positive Trend) – Old and consolidated topics that are still attracting increasing research interest. Topics such as Diagnosis and Fault Detection (DG_FLT), Manufacturing Process (MN_PR), Intelligent Manufacturing (INT_MN), and Big Data analysis (BD_DM) certainly belong to this class. Probably, Simulation (SIM) and the Internet of Things (IoT) are on their way to becoming stars.
5. Obsoletes (High Age and Negative Trend) – Old topics that have never received much scientific interest and that have almost disappeared from the technical literature.
Due to the recent introduction of ML for operation management, no keywords can be classified as Obsoletes yet. However, Order Management (OM) and, probably, also Feature Extraction (FT_EX) are moving toward this class.

We also note that, as indicated by the direction arrows shown in Fig. 7, according to a standard evolutionary trajectory, question marks should become consolidated topics, moving diagonally from the bottom left corner to the center of the graph. However, in case of rapid success, question marks can move vertically to reach the hot topics area and, if the growing trend continues, they can proceed straight toward the stars’ area. In this regard, three additional clusters, namely New Promises, Emerging Trends, and Young Stars can be identified. The first one contains recent topics that have already overcome the initial phase of uncertainty and that are likely to remain of interest in the years to come. As already noted, Additive Manufacturing (ADD_MN), Prediction & Prognostic (PR_PR), and Industry 4.0 (I4.0) belong to this cluster. The second one contains rather recent topics growing in popularity, which can be expected to become consolidated or even star topics in the next few years. Production Planning & Control (PPC), Defect Detection (Q_DF), and Signal Processing (SIGN_PR) are the main topics in this area. The last one contains young and consolidated topics that are still in a booming phase. Automation (AUT) and Process Control (Q_PR) are the main topics in this area.

Fig. 6c. Word cloud of the 20 most relevant keywords.

Fig. 7. Topics’ evolution map measured in terms of age and trend.

4.3.2. Gaps’ investigation

To partially explain the difference between Stars and Question Marks, a gap analysis is provided in Table 2. Specifically, the occurrence of each ML algorithm is reported, both for the items classified as Stars and Question Marks.
Some interesting differences between the two groups are clear. Indeed:

- Applications of Reinforcement Learning algorithms are completely missing in the Question Marks group.
- As far as Supervised Learning algorithms are concerned, the number of algorithms applied by the Question Marks group is lower and limited to the most classic and widespread techniques. Several gaps are noted, even in the case of some very common techniques, such as NN, RF and SVM, that are little used, if not completely ignored.
- A similar gap can also be found in terms of Unsupervised Learning techniques. The gap is particularly marked in the Quality Management area, where Unsupervised Learning is widely investigated by Stars, but it is totally neglected in the Question Marks group.

It is therefore evident that, in the case of Stars, the whole spectrum of possible ML solutions has been tested and, to emerge in this group, where research is almost mature, researchers have to resort to innovative and frontier techniques. Conversely, concerning Question Marks, ML applications are still a niche and only standard and consolidated ML techniques have been tested. There is therefore room for further investigations, which could certainly lead to a positive development in all the involved Application Domains.

4.4. Detailed analysis of selected papers

To answer Research Questions #5 and #6, all selected papers were analyzed in detail. For each of the four ADs defined in Section 4.1, results are summarized by providing a brief description of the papers deemed more significant and innovative and a summary table that highlights the main features of all the analyzed papers. Specifically, for each paper, the following fields are reported:

- Article – The reference to the described paper.
- Sub Area – The sub-area to which the described paper belongs.
- # Citations – The number of obtained citations.
- Alg_Class – The class (i.e., Supervised, Unsupervised, and Reinforcement Learning) to which the algorithms used in the paper belong.
- Algorithm – The full list of the adopted algorithms. Please note that, for reasons of space, algorithms are indicated with an acronym; the full list is reported in Table A1 in the appendix section.
- Sim_Based – A Boolean field that is equal to one for the papers based on discrete event simulation. If none of the articles belonging to a specific AD is based on simulation, this field is not considered.
- CPS – A Boolean field that is equal to one for the papers dealing with a Cyber-Physical System. Also in this case, if this field is missing, none of the papers deals with a CPS.
- Goals & Approaches – A short summary of the papers’ methodologies and objectives.

4.4.1. Maintenance management

Maintenance management concerns administrative, financial, and technical approaches for assessing and planning maintenance operations, on a scheduled basis. The objective is to keep assets and machines at a full operating state, so that production proceeds effectively, and no money is wasted due to inefficiencies.

Papers belonging to this area are listed in Table 3a, from which it is easy to see that ML perfectly fits this area, especially within the SL framework, for condition monitoring and failure analysis (i.e., faults detection and classification). Indeed, the problem can be easily interpreted as a prediction task, where historical data are collected on the production floor, and faulty and non-faulty events are used as ‘ground-truth data’ against which a prediction model can be trained. NNs and SVM are commonly used, with a total of thirteen and nine applications, respectively.
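The record fields listed above, and the way per-technique totals are obtained from them, can be sketched as follows. This is a minimal illustration, not the authors' tooling: the `PaperRecord` class and the `tally_algorithms` helper are hypothetical names, the CPS field is omitted for brevity, and the three sample rows are copied from Table 3a.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PaperRecord:
    """One row of a summary table (hypothetical container)."""
    article: str
    sub_area: str
    citations: int
    alg_class: Tuple[str, ...]
    algorithms: Tuple[str, ...]
    sim_based: bool
    goals: str = ""


def tally_algorithms(records: List[PaperRecord]) -> Counter:
    """Count how many papers use each algorithm.

    A paper using several algorithms contributes to several counters,
    so the counts can add up to more than the number of papers.
    """
    counts: Counter = Counter()
    for rec in records:
        counts.update(set(rec.algorithms))
    return counts


records = [
    PaperRecord("Cho et al. (2005)", "Condition Monitoring", 62,
                ("Supervised",), ("SVM",), False),
    PaperRecord("Kankar et al. (2011)", "Condition Monitoring", 100,
                ("Supervised",), ("NN", "SVM"), False),
    PaperRecord("Susto et al. (2015)", "Downtime Minimization", 20,
                ("Supervised",), ("SVM", "KNN"), True),
]
counts = tally_algorithms(records)
print(counts, "papers:", len(records))
```

With these three rows, the algorithm counts add up to five although only three papers are tallied, which is the same effect noted for the row sums of Table 1.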
Table 2 Gap analysis of consolidated and new emerging clusters.
Columns: Question Marks (ED, MM, PPC, QM) | Stars (MM, PPC, QM)
Reinforcement Learning
Deep Q Learning 1
Proximal Policy Optimization 1
Trust Region Policy Optimization 1
Supervised Learning
Boosting 1 1 1
Decision Tree 1 3 6
Linear Discriminant Analysis 1 1
Logistic Regression OGIT 1
Linear Regression 1 1
Neighbor Based Clustering 2 2
Neural Network 1 2 10 3 13
Quadratic Discriminant Analysis 1
Random Forests 2 2
Rough Set Algorithm 2 3
Support Vector Data Description 2
Support Vector Machines 2 2 6 12
Unsupervised Learning
Gaussian Density Estimation 1
Gaussian Mixture Modelling 1 3
Hierarchical Clustering 1
K-Means/K-Median 1
K-Means clustering 1 3
K Nearest Neighbors 1 1 3
Local Outlier Factor 2
Non-negative Matrix Factoriz. 1
Principal Comp. Analysis 4
Parzen Windows 3
Self-Organizing Maps 1 1

Table 3a Maintenance Management Papers.
Sub-Area | Article | Received Citations | Algorithm Classification | Main Algorithm | Simul. Based | Goals & Approach
Condition Monitoring | Cho et al. (2005) | 62 | Supervised Learning | SVM | 0 | Multiple sensors are used to record cutting forces and power consumption of milling machines. Using a Support Vector Regression, the tool breakage detection rate is increased, with a huge impact on manufacturing performance.
Condition Monitoring | Saxena and Saad (2007) | 116 | Supervised Learning | NN, GA | 0 | A Genetic algorithm is coupled to a NN for feature selection and topology search. The NN is used for fault detection of roller bearing health monitoring. Data were collected on-site, from three accelerometers and one acoustic sensor.
Condition Monitoring | Kankar et al. (2011) | 100 | Supervised Learning | NN, SVM | 0 | NN and SVM are compared on a dataset of ball bearings’ faults that has been pre-processed for dimensionality reduction. Results show that an automated diagnosis system is feasible.
Condition Monitoring | Azadeh et al. (2013) | 38 | Supervised Learning | SVM, NN, GA | 0 | A flexible algorithm based on an ensemble of SVM, NN, and metaheuristics is used for condition monitoring and fault detection. The ensemble is tested against noisy and corrupted data of centrifugal pumps.
Condition Monitoring | Zhang et al. (2015) | 31 | Supervised Learning | SVM, ACO | 0 | An Ant Colony Optimization metaheuristic is applied for feature selection and hyperparameter optimization of an SVM for intelligent fault diagnosis. The method is evaluated on a rotor system and locomotive roller bearings.
Condition Monitoring | Li et al. (2017) | 0 | Supervised Learning | CNN | 0 | A novel fault diagnosis algorithm, leveraging an ensemble of Deep Convolutional NNs, is presented. The algorithm is tested on a public database of bearings’ failure data.
Condition Monitoring | Syafrudin et al. (2018) | 3 | Supervised & Unsupervised Learning | RF | 0 | A two-step approach for fault detection is presented. First, the DBSCAN algorithm is used to detect possible outliers; next, a random forest is used to predict possible faults.
Condition Monitoring | Liu et al. (2018b) | 2 | Supervised Learning | LDA, Clustering | 0 | Acoustic emission signals, collected from additive manufacturing machines, are used to recognize different operating states. To this aim, data are pre-processed through LDA (both in time and frequency domains) and clustered with unsupervised methods.
Condition Monitoring | Hesser and Markert (2019) | 0 | Supervised Learning | NN | 0 | A programmable prototype platform, equipped with onboard sensors, is coupled with a NN to make existing milling machines compliant with the Industry 4.0 standards.
Condition Monitoring | Wang et al. (2019) | 3 | Supervised Learning | NN | 0 | A newly developed deep heterogeneous GRU model is used with local feature extraction for long-term prediction of equipment deterioration.
Condition Monitoring | Li et al. (2019) | 0 | Supervised & Unsupervised Learning | PCA, DT, RF, KNN, SVM | 0 | A tool wear detection framework is proposed, based on audio signal processing. A compression stage based on PCA is followed by a classification stage that makes use of standard ML techniques.
Condition Monitoring | Bukkapatnam et al. (2019) | 1 | Supervised Learning | Balanced-RF | 0 | The paper introduces a non-parametric random forest (Manufacturing system-wide Balanced RF) that takes into account complex dynamic dependencies among parts and failures. The approach allows a long-term prognosis of machine breakdowns and greatly reduces prediction error.
Condition Monitoring | Kammerer et al. (2019) | 0 | Supervised Learning | DT, RF, NN | 0 | The work considers two data sets (taken from Industry 4.0 scenarios) and aims to detect sensor data anomalies. The focus is on the collection and processing steps, whereas analysis is performed using standard machine learning techniques.
Condition Monitoring | Alegeh et al. (2019) | 0 | Supervised Learning | SVM, DT, KNN | 0 | The paper focuses on the “product-service system” (PSS). Specifically, a case study is discussed where the manufacturer of a 5-axis gantry machine monitors the degradation of the equipment (using sensor data) and uses the analysis to offer maintenance services.
Downtime Minimization | Susto et al. (2015) | 20 | Supervised Learning | SVM, KNN | 1 | A multi-classifier is proposed to optimize a cost-based maintenance decision system. Each classifier can deal with high-dimensional censored data and is trained with different prediction horizons.
Downtime Minimization | Wan et al. (2017) | 2 | Supervised Learning | NN | 0 | A NN is proposed to predict the remaining lifetime of mechanical components, subjected to specific processing conditions. Using the NN in a big-data system, an active preventive maintenance is developed.
Downtime Minimization | Kuhnle et al. (2018) | 0 | Reinforcement Learning | DQN, VPF, TRPO, PPO | 1 | Downtime reduction and lower maintenance costs are achieved using a Reinforcement Learning approach, based on the Proximal Policy Optimization algorithm.
Failure analysis | Prieto et al. (2013) | 95 | Supervised Learning | NN | 0 | The paper considers 6 bearing scenarios, in 25 operating conditions. After feature selection and dimensionality reduction (for physical interpretation), a NN is used for the multiclassification task.
Failure analysis | Perzyk et al. (2014) | 5 | Supervised Learning | DT, RST, NBC, NN, SVM | 0 | The paper shows how simple statistical methods, such as contingency tables, may perform similarly to, or better than, ML techniques in detecting the main parameters for fault diagnosis.
Failure analysis | | 0 | Supervised Learning | CNN | 1 | (continued on next page)

Although SVM was generally considered the best-performing technique (see for example the review by Widodo and Yang, 2007), thanks to the introduction of new sophisticated algorithms (generally taken from the Deep Learning area), in the last decade its popularity has started to decrease, in favor of more promising NN approaches. Most of the papers dealing with ‘Failure Mode Analysis’ employ NNs to efficiently determine the cause of failures of both equipment and machines. For instance, Prieto et al. (2013) proposed a novel approach for on-line fault detection of electrical machines, which considers both local and distributed defects. The model integrates a curvilinear component analysis (used for dimensionality reduction) with a final classifier based on a two-level hierarchical NN. Li et al. (2017) used an ensemble of Convolutional Neural Networks (CNN) for bearings’ fault diagnosis and classification. The same problem was also tackled by Sobie et al. (2018), who used Dynamic Time Warping along with CNNs.
In doing so, they demonstrated that a training dataset generated through high-resolution simulations may be effectively used to integrate or even to replace missing and/or insufficient data. This is an important achievement because all previous works (concerning fault detection and classification) were highly dependent on historical data collected in the field, a clear obstacle to their adoption in industry. Unsupervised techniques, instead, are less frequent, and they are generally limited to defects’ classification, as in the work by Wu et al. (2019), where a Self-Organizing Map, based on acoustic data, is used to cluster filaments in terms of different failure modes.

Even in the field of ‘Condition Monitoring’ NNs are, by far, the most applied techniques and, in this case, the most common applications concern condition monitoring of rotating mechanical systems (Saxena and Saad, 2007; Zhang et al., 2015) and rolling bearings (Kankar et al., 2011). In both cases, the problem is solved using vibrations and/or acoustic signals as classifier inputs, for faulty and non-faulty prediction. It is interesting to note that, to exploit the information content of the acoustic signal, the oldest works paid close attention to feature selection, hyper-parameter optimization, and dimensionality reduction. More recent ML techniques, instead, have eliminated part of these limitations and especially Deep Learning allows an ‘As Is’ use of the original data set, without requiring careful data pre-processing. In this regard, Azadeh et al. (2013) proposed an ensemble of Deep NNs and SVM for condition monitoring of centrifugal pumps and effective maintenance management. The ensemble, optimized with a novel metaheuristic, has proved to be particularly resilient to corrupted or noisy data. On the other hand, Deep Learning requires a massive dataset, which is not always available.
For this reason, historical data are often enriched with additional data generated through simulation, as in Sobie et al. (2018) and in Kuhnle et al. (2018), where an innovative approach for downtime reduction and lower maintenance costs is proposed, based on four different Reinforcement Learning algorithms. Similar approaches can also be found in the field of ‘Downtime Minimization’, as in Susto et al. (2015), who used an ensemble of SVM and k-Nearest Neighbors to plan predictive maintenance tasks in a way that minimizes all the costs generated by unexpected breakdowns and/or by unexploited machine lifetime. Their interesting approach was successfully tested on a well-known semiconductor manufacturing maintenance problem.

4.4.2. Quality management

Quality Management, a major area within the field of operation management, can be defined as the process of achieving and maintaining a certain level of business excellence so that products and/or services are consistent with what customers want and are willing to pay for. From this perspective, quality management is not limited to product and/or service compliance, but it also encompasses all the processes that are needed to achieve the desired quality level, such as quality planning, quality assurance, quality control, and quality improvement. As shown by Table 3b, in the context of ML the focus is mainly on quality assurance and quality control and, overall, the main aim is to understand what customers want and, more generally, what the true drivers for better quality are.

A typical example is that of quality monitoring and ‘Defects’ Detection and Classification’, a topic that counts several applications in the electronic industry. Typically, to discriminate between defective and non-defective items, manufacturing data are collected from sensors, PLCs, and Manufacturing Execution Systems, and they are used as decision variables of an ensemble of classifiers. Lenz et al.
(2013) used an ensemble of Decision Trees, NNs, and SVM to tackle a virtual metrology problem, that is, to predict the thickness of dielectric layers deposited during the manufacturing of semiconductor wafers. Saucedo-Espinosa et al. (2014) used sound analysis to detect defective bearings in home appliances and showed that Random Forests are the most effective classification technique. Liu et al. (2017) implemented a Deep Belief Network (a composition of Restricted Boltzmann Machines) for fault detection and isolation and demonstrated that this peculiar network topology can capture highly discriminative semantic features; indeed, impressive accuracy levels, up to 100%, were obtained.

It is interesting to note that, when the aim is to detect defective items, the so-called imbalance problem is frequently found. Indeed, this issue is rather common when the objective is to discriminate positive events from negative and rare ones, such as defects. A detailed discussion of this problem can be found in Lee et al. (2016) and in Kim et al. (2018), who compared a comprehensive set of ML classification techniques showing that, in case of heavily unbalanced data sets, Random Forests offer the best results. Other relevant works are those by Ye et al. (2013) and by Ko et al. (2017). The first one proposed an ensemble of NNs and SVMs (based on a weighted majority vote), for functional diagnosis of printed-circuit boards. The ensemble was successfully applied to a highly unbalanced manufacturing dataset that was artificially augmented with synthetic data. The second work presented a framework to detect anomalies of heavy machinery engines, based on manufacturing, inspection, and after-sales data. Specifically, it was shown that in the case of unbalanced data, Gaussian Mixture Models and Parzen Window Density Estimation are very effective, compared to other techniques such as Principal Component Analysis or K-Means Clustering.
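Two ideas from the paragraph above, a weighted majority vote and the reason why plain accuracy is misleading on unbalanced data, can be illustrated with a short sketch. The function names, the weights, and the toy labels are assumptions made for this example; this is not the scheme actually used by Ye et al. (2013) or by any other cited work.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_majority_vote(predictions: Sequence[Hashable],
                           weights: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight among the classifiers."""
    if not predictions or len(predictions) != len(weights):
        raise ValueError("need one weight per prediction")
    score = defaultdict(float)
    for label, weight in zip(predictions, weights):
        score[label] += weight
    # On a tie, the label that was seen first wins.
    return max(score, key=score.get)


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of the per-class recalls, so rare classes weigh as much as common ones."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Three hypothetical classifiers: two weaker ones outvote a stronger one.
vote = weighted_majority_vote(["ok", "defect", "defect"], [0.5, 0.3, 0.3])

# Toy unbalanced data set: 95 good items, 5 defects, and a model that
# always answers "ok".
y_true = ["ok"] * 95 + ["defect"] * 5
y_pred = ["ok"] * 100
print(vote, accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

In the toy data set the trivial model reaches 95% accuracy while missing every defect, whereas its balanced accuracy is only 0.5; this is why works on unbalanced data usually report class-sensitive metrics or rebalance the training set.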
Besides the assessment of product compliance, ML has also been used to implement ‘On-Line Quality Control’ systems, thus enabling more

Table 3a (continued)
Sub-Area | Article | Received Citations | Algorithm Classification | Main Algorithm | Simul. Based | Goals & Approach
Sobie et al. (2018) | Statistical methods are compared with Convolutional NN for bearing fault classification. Data are generated from high-resolution simulations and a novel application of Dynamic Time Warping is also presented.
Failure analysis | Liu et al. (2018a) | 0 | Supervised Learning, Unsupervised Learning | NN, SVM, AE | 0 | A Denoising Auto-Encoder is used to extract meaningful representations of failure modes, and newly generated data is compared to historical ones, using KL-divergence. The approach emphasizes new fault modes while maintaining a dynamic and compensatory behavior.
Failure analysis | Ren et al. (2018) | 4 | Unsupervised Learning | AE DNN | 0 | To predict the remaining useful life of a rolling bearing, a Deep Auto-Encoder and a Deep Neural Network are used. Specifically, they are coupled with a novel eigenvector-based method and can accurately reproduce the bearings’ degradation process.
Failure analysis | Wu et al. (2019) | 5 | Unsupervised Learning | SOM | 0 | The paper proposes a data-driven monitoring method, based on acoustic emissions, for online process failure diagnosis of fused filament fabrication. Specifically, the diagnosis of different failure modes is formalized using a self-organizing map.

Table 3b Quality Management Papers.
Sub-Area | Article | Received Citations | Algorithm Classification | Main Algorithm | Simul. Based | Goals & Approach
Defect Detection | Kusiak and Kurasek (2001) | 47 | Supervised Learning | RST, DT | 0 | Data mining techniques are used to identify the cause of solder-ball defects, in circuit board manufacturing. The Rough Set algorithm is used because it can provide explicit rules, in contrast with NN or Linear Regression.
Defect Detection | Kim et al. (2012) | 16 | Supervised & Unsupervised Learning | GDE, GMM, PW, KMC, SVM, PCA | 0 | Aiming to detect faulty wafers, 7 different ML algorithms and 3 dimensionality reduction methods are used.
Defect Detection | Çaydaş and Ekici (2010) | 48 | Supervised Learning | SVM, NN | 0 | SVM and NN are compared to estimate the surface roughness of stainless steel. SVM shows the best performance, but the NN is very shallow, and only three input variables are used.
Defect Detection | Ye et al. (2013) | 36 | Supervised Learning | SVM, NN | 0 | An ensemble of NN and SVM, based on majority voting, is applied both to defect detection (of three complex boards) and to propose repair suggestions.
Defect Detection | Lenz et al. (2013) | 1 | Supervised Learning | DT, NN, SVM | 0 | Using 27 features from process data, Decision Trees, NN, and SVM are compared for predicting the thickness of dielectric layers in a semiconductor manufacturing scenario.
Defect Detection | Tan et al. (2015) | 13 | Supervised Learning | Evolutionary NN | 0 | An evolutionary Neural Network is applied to an imbalanced data set (of semiconductor manufacturing) for defect detection. Based on the adaptive resonance theory, it combines a fuzzy set and stability-plasticity characteristic. It is benchmarked against other cost-sensitive NN and non-cost-sensitive ML algorithms.
Defect Detection | Adly et al. (2015) | 5 | Supervised Learning | SVM, NN | 0 | A novel regression algorithm is introduced and compared to state-of-the-art ML methods for the identification of defects in wafer manufacturing. Results show comparable performance, with the benefit of a reduced computational footprint.
Defect Detection | Gao et al. (2016) | 4 | Unsupervised Learning | NMF | 0 | A sparsity-adaptive sparse non-negative matrix factorization is proposed to detect defects in an unsupervised way, without requiring manual selection of specific frequencies. Experimental tests are made on metal manufacturing data.
Defect Detection Lee et al., (2016) 0 Supervised Learning SVM, DT, Bagging, Boosting, RF, KNN 0 The performance of three sampling-based algorithms, four ensemble algorithms, four instance-based algorithms, and two support vector machine algorithms are compared to effectively tackle the imbalance problem for the development of high- performance fault detection systems. Defect Detection Mohammadi and Wang, (2016) 0 Supervised Learning SVM 0 Based on data collected throughout an abrasion- resistant material manufacturing process, product quality prediction of burned balls is achieved using Support Vector Machine. Defect Detection Saucedo-Espinosa et al., (2017) 1 Supervised Learning SVM, NN, NBC, KNN, DT 0 Home appliances with defective embedded bearings are detected using ML algorithms for sound signals analysis. Results show that intuitive and simple methods yield high performance. Defect Detection Ko et al., (2017) 0 Supervised & Unsupervised Learning GMM, PW, LOF, K- MEANS, PCA, k- PCA, SVDD 0 A novel method for feature extraction is proposed for the manipulation of multidimensional time-series data. Specifically, the method is tested on after-sales data of heavy machine engines. Defect Detection Tušar et al., (2017) 0 Supervised Learning DT, RF 0 A quality prediction framework based on machine vision, Decision Tree-based algorithms, and evolutionary optimization algorithms are studied in terms of overfitting problems, and authors show that, in some cases, over-optimization leads to overfitting. Defect Detection Liu et al., (2017) 0 Unsupervised Learning RBM 0 A Deep Belief Network is employed to capture different semantic representations of the voltage signal for fault detection and isolation system. The method proved to be superior to traditional feature extraction methods. Defect Detection Kim et al., (2018) 4 Supervised Learning DT 0 The paper deals with defect detection and focuses on the imbalance problem. 
Using a die-cast data set, it is shown that the AdaC2 algorithm, a cost-sensitive Decision Tree algorithm, outperforms other classifiers in case of unbalanced data.
Defect Detection | Khanzadeh et al. (2017) | 7 | Unsupervised Learning | SOM | 0 | A Self-Organizing Map is employed for measuring geometric accuracy, with fewer data and avoiding the need to define custom landmark features. Identified clusters correspond to specific types of deviation from the ideal shape.
Defect Detection | Manohar et al. (2018) | 1 | Unsupervised Learning | SS, PCA | 0 | ML is employed to learn, from past data, the distribution of shim gaps in aircraft assembly. ML is coupled with optimized sparse sensing to gather new data. Also, Robust Principal Component Analysis is used for dimensionality reduction.
Defect Detection | Khanzadeh et al. (2018) | 12 | Supervised Learning | DT, KNN, SVM, LDA, QDA | 0 | ML algorithms are used to regress defect occurrence from melt pool characteristics, in additive manufacturing. DT shows the lowest type II error, while KNN achieves the highest accuracy. The combination of a morphological model with supervised learning techniques outperforms state-of-the-art results.
Defect Detection | Zhu et al. (2018) | 3 | Supervised Learning | Gauss. PR | 0 | A multi-task Gaussian Process is employed to analyze in-plane geometric deviations from an additive manufacturing process to estimate geometric deviation.
Defect Detection | Carvajal Soto et al. (2019) | 3 | Supervised Learning | GrB NN | 1 | A Multi-layer Perceptron, a Random Forest, and a Gradient Boosting algorithm are applied to build a real-time online failure identification solution. Decision Tree-based methods outperform the NN, mainly due to data unbalance.
Defect Detection | Peres et al. (2019) | 2 | Supervised Learning | NBC, KNN, XGB, RF, SVM | 0 | Different methods are compared to recognize product dimensional variability, for defect detection in a real automotive multistage assembly line.
Defect Detection | Stoyanov et al. (2019) | 0 | Supervised Learning | SVM | 0 | SVM is employed for failure testing in electronics manufacturing. The objective is to develop an intelligent optimization of the tests’ sequence and a reduced number of tests.
Defect Detection | Chen et al. (2019), Chen et al. (2019b) | 0 | Supervised Learning | NN, SVM | 0 | NN and SVM are compared for automatic detection and classification of welding defects. Applied to a dataset of galvanized steel sheets, NN outperformed SVM.
Defect Detection | Kim and Kang (2019) | 0 | Supervised Learning | NN, DT, KNN | 0 | NN, DT, and KNN are compared for defect detection, using a data set containing irrelevant variables. KNN shows the maximum degradation, while DT is more resilient.
Defect Detection | Ruiz et al. (2019) | 0 | Supervised Learning | KNN, RF, NN | 0 | Three methodologies are compared to detect breakage during the drawing of steel. The imbalance problem is tackled using different techniques (under-sampling, oversampling, SMOTE).
Defect Detection | Caggiano et al. (2019) | 3 | Supervised Learning | CNN | 0 | A Deep Convolutional NN is used for online defects detection. Specifically, the NN is trained to analyze in-process images of a Selective Laser Melting manufacturing process.
Defect Detection | Tsutsui and Matsuzawa (2019) | 1 | Supervised Learning | DNN | 0 | Deep Learning models are applied to Optical Emission Spectroscopy for predicting measurements of an ongoing semiconductor process. The proposed network topology outperforms standard models for image analysis.
Defect Detection | Imoto et al. (2019) | 0 | Supervised Learning | CNN, TL | 0 | A Convolutional NN, trained on a real semiconductor fabrication dataset, is used for defect classification, based on the analysis of electron microscope images.
To reduce the amount of data needed in the training step, a Transfer Learning algorithm is also used.
Defect Detection | Oh et al. (2019b) | 0 | Supervised Learning | ASVM | 0 | The paper presents a framework for on-line quality control of a sunroof assembly line. Thanks to an iterating loop between a data pre-processing module and an SVM learning module, the defect classifier continuously learns from past experiences.
Defect Detection | Yacob et al. (2019) | 2 | Supervised & Unsupervised Learning | SVM, DT, KNN | 0 | The aim is to detect parts’ anomalies, based on surface characteristics, and categorize them as systematic and random variations. To reduce the number of physical parts needed to train the models, digital twins (Skin Model Shapes) are also used. This has the additional benefit of avoiding biases and unbalancing problems.
Defect Detection | Papananias et al. (2019) | 2 | Supervised Learning | Bayesian R., ANOVA | 0 | The paper develops a probabilistic model, based on Bayesian linear regression, for flatness tolerance evaluation. Two case studies demonstrate the effectiveness of the probabilistic model.
Defect Detection | Saqlain et al. (2019) | 5 | Supervised Learning | NN, LogR, GrB, RF | 0 | The paper proposes a soft voting ensemble classifier with multi-types features, to identify wafer map defect patterns in semiconductor manufacturing. Four classifiers are used, and results are combined assigning higher weights to the classifiers with higher prediction accuracy.
Defect Detection | Iqbal et al. (2019) | 3 | Supervised Learning | AE, Clustering | 1 | The paper presents a novel approach for automated Fault Detection and Isolation.
Deep Auto Encoders coupled with other ML tools (hierarchical clustering and Markov Chains) model the spatial/temporal patterns found in the data and successfully diagnose and locate multiple classes of faults.
Image Recognition | Ravikumar et al. (2011) | 28 | Supervised Learning | NBC, DT | 0 | A Decision Tree and a Naïve Bayes classifier are compared on an image classification task, for automated visual inspection. Feature pre-processing is performed to generate the images’ histogram features used as input for the classifiers.
Image Recognition | El-Bendary et al. (2015) | 10 | Supervised & Unsupervised Learning | LDA, PCA, SVM | 0 | The aim is to classify tomato ripeness based on color. SVM, Linear Discriminant Analysis, and Principal Components Analysis are combined and tested on a sample of 250 images. Results are validated with 10-fold cross-validation.
Image Recognition | Chen et al. (2016) | 2 | Supervised Learning | SVM | 0 | Aiming to enhance yield and to reduce defect rate, an automatic optical inspection system is proposed. The system makes use of an SVM classifier, which is strengthened by a similarity approach capable of reducing the number of false alarms.
Image Recognition | Yang et al. (2018) | 2 | Supervised Learning | CNN | 0 | A Convolutional Neural Network, coupled with a three-point circle fitting method, is used for automatic aperture detection of LED cups.
Image Recognition | Gobert et al. (2018) | 5 | Supervised Learning | SVM | 0 | To enable in-process re-melting and defects correction (of an additive manufacturing process), an in-situ defect detection protocol is proposed. Using SVM, digital single-lens images are pre-processed and classified, with an accuracy rate of around 80%.
Image Recognition | Yuan et al. (2018) | 2 | Supervised Learning | CNN | 0 | A Convolutional NN is trained (in a supervised fashion) to analyze 10 ms video clips of laser powder additive manufacturing. The CNN can predict LPBF track widths and track continuity, from in situ video data.
Image Recognition | Scime and Beuth (2018) | 1 | Supervised Learning | CNN | 0 | The input layer of a Convolutional NN is modified to allow the NN to learn the appearance of the powder bed anomalies and key contextual information with the scale-invariance property. This alteration improves accuracy and mitigates human biases.
Image Recognition | Penumuru et al. (2019) | 0 | Supervised Learning | SVM, DT, RF, LogR, KNN | 0 | Alternative methodologies are compared in the recognition of metallic materials from surface images. The robustness of the classifiers is checked for various camera orientations, illumination angles, and focal lengths.
Image Recognition | Scime and Beuth (2019) | 12 | Supervised & Unsupervised Learning | SIFTS, SVM | 0 | The goal is to detect keyholing porosity and balling instabilities in laser powder bed fusion additive manufacturing. A scale-invariant description of the melt pool morphology is constructed by applying the “Bag-of-Words” technique to features extracted using Scale Invariant Feature Transforms. SVM is then applied to classify the observed melt pools.
Life cycle Management | Jennings et al. (2016) | 2 | Supervised Learning | RF, NN, SVM | 0 | The aim is to predict the obsolescence risk level at a certain stage of the lifecycle of a device. Specifically, NN, Random Forests, and SVM are compared, to partition “active” and “obsolete” smartphones.
Online Quality Control | Ribeiro (2005) | 47 | Supervised Learning | SVM, NN | 0 | The work compares NN and SVM to predict product quality using process data. Using real data from an injection molding process, the paper shows that both methods can quickly react to unexpected disturbances.
Online Quality Control | Lin et al. (2011) | 16 | Supervised Learning | SVM, NN | 0 | Support Vector Machine and Neural Networks are compared to effectively classify seven different control chart patterns for specific causes. SVM proves less prone to overfitting and more robust to background noise.
Online Quality Control | Wuest et al. (2013) | 16 | Supervised & Unsupervised Learning | HC SVM | n.a. (theoretical) | Hierarchical Clustering and SVM are used to analyze multidimensional data (of the product’s state along the whole manufacturing process) and to trigger corrective actions if needed.
Online Quality Control | Yang and Zhou (2015) | 2 | Supervised Learning | NN, LVQ | 0 | This study proposes a NN, ensemble-enabled, autoregressive, and coefficient-invariant control chart patterns recognition model. Each NN is trained to recognize CCP with a specific autoregressive (continued)

flexible processes with the ability to automatically take corrective actions as soon as possible. For instance, Wuest et al. (2013) proposed a hybrid approach, namely Semi-Supervised Learning, to estimate the product’s state along its manufacturing process. In the first stage, an unsupervised hierarchical clustering is used to label the records of the training set. Next, labeled data are processed by a supervised layer, based on SVM, which performs the final classification. In this way, the need for a manually labeled data set is avoided, with great benefit in terms of development time and classification accuracy. Similarly, Nakata et al. (2017) proposed a Big-Data based, long-term automated monitoring system of micro-conductors manufacturing, which allowed production engineers to obtain significant yield enhancements.

More recently, visual quality inspection, supported by automated ‘Image Recognition’, is emerging as a promising field for defect identification and classification. For instance, Chen et al. (2016) used an automatic optical inspection system, based on an SVM classifier with a similarity approach, to reduce the false alarm rate (of defect classification) in the production of CMOS image sensors. El-Bendary et al. (2015) proposed the application of machine learning techniques to assess tomato ripeness.
Posed as a multi-class classification task, the problem was solved with a hybrid classifier (based on SVM and Linear Discriminant Analysis), supported by Principal Component Analysis for feature extraction. Other interesting works concern the increasingly common use of CNNs for image recognition and visual control, as in the works by Yuan et al. (2018) and Scime and Beuth (2018), who used this neural network topology to detect anomalies in laser powder additive manufacturing.

Other interesting applications propose integrating ML and statistical control charts, to understand whether a drift in operating parameters is taking place. Notable examples can be found in Lin et al. (2011) and in Yang and Zhou (2015), who used an ensemble of NNs to handle autocorrelated data in control chart patterns. The model can detect up to seven types of unnatural patterns and drifts and can be used by quality managers to promptly identify the root causes of process anomalies.

4.4.3. Production Planning & Control (PPC)

As noted in Section 4.3, in terms of ML applications, Production Planning and Control is an emerging trend that has attracted much academic and industrial interest in the last decade. Mainly, it includes all the activities that are needed to manage a manufacturing process and to improve its operating performance; as shown in Table 3c, ‘Performance Prediction and Optimization’ is the most studied problem. For instance, Arredondo and Martinez (2010) proposed an RL approach based on Locally Weighted Regression to implement an order acceptance policy, similar to Workload Control. In particular, jobs can be placed either in a rejection set or in an acceptance set and, in this way, the average revenue can be maximized relative to the installed capacity. Doltsinis et al. (2012) used RL for production ramp-up optimization.
To this aim, they formulated the problem as a sequence of technical decisions needed to progress the system toward the desired steady state, in the shortest amount of time. Other interesting contributions can be found in Li et al. (2016), who combined Q-Learning and SVM to reduce the electricity consumption of an automated manufacturing system, and in Agarwal et al. (2019), who used an autoencoder to find the best set of process parameters for optimizing process productivity and profit.

‘Scheduling’ is the second most studied problem. This topic has always attracted a lot of industrial and academic interest, not only for its immediate practical implications but also because it is extremely challenging from a research perspective. Indeed, scheduling problems are almost always NP-hard and, for this reason, they create a fertile ground for the application of novel ML algorithms. One of the first works is that by Priore et al. (2001), who studied the implementation of

Table 3b (continued)
coefficient. The outputs are combined through Learning Vector Quantization.
Online Quality Control | Nakata et al. (2017) | 0 | Supervised Learning, Unsupervised Learning | CNN, K-means | 0 | A Convolutional NN is applied to classify wafers’ failure map patterns. It is integrated into a three-stage automated monitoring system fed with real-time massive manufacturing data.
Online Quality Control | Zhang et al. (2019a), Zhang et al. (2019b) | 0 | Supervised Learning | LSTM | 0 | A Long Short-Term Memory NN takes as input temperature and vibration data of an additive manufacturing process and predicts the tensile strength of the manufactured item. Layer-wise Relevance Propagation is used to assess parameters’ influence.
Online Quality Control | Oh et al. (2019a) | 1 | Supervised Learning | SVM | 0 | A cost-effective SVM is used for online QC of a manufacturing process.
The SVM incorporates inspection-related expenses and error types and is tested against an automotive door-trim manufacturing process. Design of Experiment is carried out to perform sensitivity analysis.
Online Quality Control | Zhu et al. (2019) | 2 | Reinforcement Learning | QLrn, TS | 0 | Acoustic emission sensing, through fiber Bragg grating, is coupled with Q-Learning and Tabu Search for quality monitoring of an additive manufacturing process.
Online Quality Control | Yu (2019) | 0 | Supervised Learning | SDAE | 0 | An enhanced stacked denoising autoencoder (ESDAE), with manifold regularization, is used for wafer map pattern recognition (WMPR). The approach, which can be used for on-line detection of map defects, has been successfully validated using a real-world wafer map dataset.
Online Quality Control | Yu et al. (2019b) | 4 | Supervised & Unsupervised Learning | SDAE | 1 | A Stacked Denoising Autoencoder is used for pattern recognition. SDAE denoises the input signal and extracts the important features used as input of a final classification layer. SDAE layers are trained in an unsupervised way, whereas the regression is fine-tuned with a supervised approach. By doing so SDAE greatly improves its generalization performance and can learn more robust and compact features.

Table 3c. Production Planning and Control Papers.
Sub-Area | Article | Rec. Cit. | Algorithm Class. | Main Alg. | Sim. Based | Cyber Ph. Sys. | Goals & Approach
Performance Prediction | Arredondo and Martinez (2010) | 20 | Reinforc. Learning | LWR | 1 | 0 | A locally weighted regression is used to learn the value of accepting or rejecting a production order. The approach maximizes the average revenue obtained per unit cost of the installed capacity.
Performance Prediction | Meidan et al. (2011) | 19 | Superv. Learning | NBay | 1 | 0 | Using Conditional Mutual Information Maximization, a selective Naïve Bayesian Classifier is used to select the most discriminative features for cycle time prediction.
Performance Prediction | Doltsinis et al. (2012) | 2 | Reinforc. Learning | QLrn | 0 | 0 | A Q-Learning algorithm supports decision-making during production ramp-up. It significantly reduces the time needed to reach a stable state.
Performance Prediction | Duan et al. (2015) | 0 | Superv. Learning | SVM, DT, Bay.R | 0 | 0 | Based on production capacity and orders’ properties and requirements, a tree-based classifier is used to accept or reject incoming orders, to maximize profit.
Performance Prediction | Heger et al. (2016) | 5 | Superv. Learning | Gauss. PR | 1 | 0 | Gaussian Process Regression predicts the best parameters’ settings, conditioned on current system status. Results showed a significant mean tardiness reduction.
Performance Prediction | Delgoshaei and Gomes (2016) | 1 | Superv. Learning | SA, NN | 1 | 0 | A hybrid model based on NN and Simulated Annealing is used to optimize the prediction mix. The focus is on a cellular shopfloor with parallel machines and bottlenecks.
Performance Prediction | Li (2016) | 0 | Superv. & Reinfor. Learning | SVM, QLrn | 1 | 0 | To reduce the electricity consumption of a multi-route transportation system, SVM and Q-learning algorithms are proposed. The approach is validated through simulation.
Performance Prediction | Diaz-Rozo et al. (2017) | 0 | Unsuper. Learning | K-Mean, HC, GMM | 0 | 1 | A Cyber-Physical system is described, and 3 clustering algorithms are compared, to group high throughput machining cycle conditions.
Performance Prediction | Rude et al. (2015) | 0 | Unsuper. Learning | HMM | 0 | 0 | An unsupervised Hidden Markov Model, used for recognition of worker activity in manufacturing processes, shows comparable results with supervised techniques, thus reducing the need for labeled data.
Performance Prediction | Chan et al. (2018) | 0 | Superv. Learning | LASSO, Cluster. | 1 | 0 | The aim is to estimate the costs of new jobs, based on historical data and technical features. A model based on dynamic clustering for model selection, coupled with Lasso and/or Elastic Regression, is proposed.
Performance Prediction | Ghadai et al. (2018) | 0 | Superv. Learning | CNN | 0 | 0 | Difficult-to-manufacture geometries are predicted with a 3D Convolutional NN. A second method is proposed to explain the causes of non-manufacturability.
Performance Prediction | Tulsyan et al. (2018) | 3 | Superv. Learning | Gauss. PR | 1 | 0 | The paper addresses the “Low-N” problem, relative to a batch manufacturing process for which scarce historical data are available. The problem is tackled using a multi-dimensional approach based on Gaussian Processes.
Performance Prediction | Gyulai et al. (2018) | 2 | Superv. Learning | RF, SVM | 0 | 1 | Analytical and ML techniques are applied, within a Digital Data Twin, to predict Lead Time. The focus is on flow-shops with frequent changes in customer demand. Frequent retraining and on-line learning are adopted.
Performance Prediction | Silbernagel et al. (2019) | 0 | Superv. & Unsuperv. Learning | AE, PCA, K-mean | 0 | 0 | A Convolutional Autoencoder is used to cluster images of the processing of pure copper in a laser powder bed fusion printer. The quality of each cluster is mapped manually to the original set of process parameters.
Performance Prediction | Stathatos and Vosniakos (2019) | 0 | Superv. Learning | NN | 0 | 0 | Three NNs are used to predict, given a laser trajectory, the evolution of temperature and density. The trajectory is decomposed using a custom method that provides a local description relative to the surroundings.
Performance Prediction | Agarwal et al. (2019) | 0 | Superv. & Unsup. Learning | AE, NN, SVM | 1 | 0 | The paper presents 2 approaches to find the ranges of process inputs optimizing process productivity and profit.
Supervised and unsupervised deep learning techniques are investigated, and the layer-wise relevance propagation algorithm is used to prune the inputs of the NNs.
Performance Prediction | Jang et al. (2019) | 0 | Superv. Learning | NN | 0 | 0 | The paper presents a model to predict the yield of new wafer maps. The approach is based on a deep NN and exploits spatial relationships relative to the positions of dies (on a wafer) and die-level yield variations.
Performance Prediction | Gurgenc et al. (2019) | 0 | Superv. Learning | NN | 0 | 0 | A deep NN is used to estimate the machining times of a CNC milling machine. Design and manufacturing parameters are used as input and the network is trained with an extreme learning machine (ELM), with optimal results.
Process Control | Chinnam (2002) | 49 | Superv. Learning | SVM, NN | 0 | 0 | NNs and SVM are applied to recognize quality drifts in related and unrelated manufacturing processes. It is shown that even simple linear kernels perform better than Statistical Process Control techniques.
Process Control | Sun et al. (2004) | 70 | Superv. Learning | NN, SVM | 0 | 0 | A tool condition monitoring system, based on acoustic emission sensing, is presented. NN and SVM are used to classify the tool state; the performance evaluation is based on manufacturing loss, due to misclassification.
Process Control | Shin et al. (2012) | 13 | Reinfor. Learning | AHC | 1 | 0 | A fuzzy Reinforcement Learning system is applied for manufacturing control. The agent has the ability of self-regulating in response to the system’s changes. It can also dynamically re-set its goal.
Process Control | García Nieto et al. (2012) | 20 | Superv. Learning | SVM, NN | 0 | 0 | SVM and NN are used to control the manufacturing process of a paper mill.
NN and SVM are chosen, given their capabilities to reproduce non-linear relationships among explanatory variables.
Process Control | Wang et al. (2018) | 1 | Superv. Learning | CNN | 0 | 0 | A Convolutional NN is applied for continuous human motion analysis. The aim is to infer human actions and future intentions for human-robot collaboration. Experiments showed a 96% accuracy.
Process Control | Maggipinto et al. (2018) | 0 | Supervised Learning | CNN | 0 | 0 | A Convolutional NN is employed to avoid the feature extraction phase that is generally needed for image processing in virtual metrology. It is applied to optical emission spectroscopy data.
Process Control | Mezzogori et al. (2020) | 0 | Superv. Learning | NN | 1 | 0 | Deep NN and Linear Regression are used to predict throughput time, given the current system’s state. The system is regulated by Workload Control and the aim is to define reliable due dates reducing the % of tardy jobs.
Process Control | Zan et al. (2019) | 5 | Superv. Learning | CNN, NN | 1 | 0 | A 1-D Convolutional NN is applied to a dataset with 6 control chart patterns. The CNN performance is compared to manual feature extraction methods and a simple NN, showing advantages.
Process Control | Ma et al. (2019) | 4 | Reinfor. Learning | DDPG | 1 | 0 | Deep Deterministic Policy Gradient is applied to chemical process control. Many operating features, such as action boundaries and reward definitions, are discussed.
Process Control | Joswiak et al. (2019) | 0 | Superv. & Unsuperv. Learning | PCA, t-SNE, UMAP | 0 | 0 | 16 dimensionality reduction techniques are compared using data sets of chemical plants. UMAP (Uniform Manifold Approximation and Projection) outperforms other methods.
Process Control | Zhang et al. (2019a), Zhang et al. (2019b) | 1 | Unsuperv. Learning | K-mean | 0 | 0 | A K-Means with Davies-Bouldin Criterion is used to decompose the surface of additive manufactured parts, to optimize the build orientation dynamically.
Process Control | Gardner et al. (2019) | 2 | Superv. Learning | NN, GrB. | 0 | 0 | The work proposes a combined approach (based on NN and Gradient Boosting) for optimal parameters’ selection depending on the location of a 3D printing process.
Process Control | Chen et al. (2019), Chen et al. (2019b) | 1 | Superv. & Unsuperv. Learning | NN | 0 | 0 | A Deep Neural Network is applied for energy consumption modeling, which usually relies on abundant labeled data. The NN is trained with a semi-supervised approach, to better exploit non-labeled data. An experimental study on furnace energy consumption data is described.
Process Control | Dornheim et al. (2019) | 0 | Reinforc. Learning | QLrn | 1 | 0 | The paper proposes a self-learning optimal control algorithm (based on Q-Learning), for manufacturing processes subject to nonlinear dynamics and stochastic influences. It accounts for stochastic variations of the process conditions and can cope with partial observability.
Process Control | Denkena et al. (2019) | 0 | Superv. Learning | SVM, SA | 1 | 0 | The aim is to optimize the operating parameters of a grinding process of helical flutes. The model integrates simulation, SVM, and an optimizer (based on simulated annealing) to fine-tune both the cutting feed and the speed of the grinder.
Scheduling | Aydin and Oztemel (2000) | 96 | Reinforc. Learning | QLrn | 1 | 1 | An RL agent learns how to select the most appropriate dispatching rule and performs dynamic scheduling based on available information. An extension of the Q-learning algorithm, called Q-III, is also presented.
Scheduling | Priore et al. (2001) | 7 | Superv. Learning | DT | 1 | 1 | Decision Trees are used to identify the best dispatching rule for flow and job shop systems. Results are good, but many simulation runs are needed to generate training examples.
Scheduling | Mönch et al. (2006) | 40 | Superv. Learning | DT, NN | 1 | 0 | Decision Trees and NN are used to fine-tune a simple heuristic for dispatching rule selection. Data is generated via simulation.
Scheduling | Priore et al. (2006) | 58 | Superv. Learning | DT, NN, CBR | 1 | 0 | Inductive Learning, NNs, and Case-Based Reasoning are compared to find the best dispatching rule. Tests are performed via simulation.
Scheduling | Csáji et al. (2006) | 19 | Reinforc. Learning | SA, TD, NN | 1 | 0 | Simulated Annealing, Temporal Difference Learning, and NNs are used to solve a dynamic job-shop scheduling problem in a distributed and iterative way. Each machine and job is associated with an agent, which has the role of selecting the best schedule.
Scheduling | Shiue (2009) | 14 | Superv. Learning | DT | 1 | 1 | A Decision Tree is applied to a two-stage real-time scheduling scenario with a nonstationary product mix: first, a knowledge base class is selected, then a scheduling rule is chosen.
Scheduling | Gaham and Bouzouia (2009) | 3 | Superv. Learning | NN, GA | 1 | 0 | A Genetic Algorithm is used to solve a flexible job shop scheduling problem, while 2 NNs are used for machine allocation and priority assignment.
Scheduling | Priore et al. (2010) | 5 | Superv. Learning | SVM | 1 | 1 | SVM is used to find the best dispatching rule. Tests are made with simulated data.

a Decision Tree algorithm (namely C4.5) to select the best dispatching rule, depending on the current system state. The work was subsequently extended and refined with the addition of other ML techniques, and it was shown that SVM is the best approach when the objective is the minimization of the mean tardiness and of the mean flow time (Priore et al., 2006, 2010). Similar work was done by Heger et al. (2016), who investigated the application of Gaussian Processes to dynamically readjust the parameters of dispatching rules, based on current shop-floor conditions. As a result, their approach drastically reduced jobs’ mean tardiness.

More recently, scientific interest has moved from the selection of the best dispatching rule to the definition of fully automated adaptive and real-time schedulers.
Multiple and diverse approaches have been proposed, ranging from Self-Organizing Maps (SOM), which autonomously extract the classes that best explain the data (as in Shiue et al., 2011), to hybrid models based on simulation, population-based metaheuristics, and NNs (as in Gaham & Bouzouia, 2009), which make it possible to dynamically regenerate optimal scheduling sequences whenever certain manufacturing events take place. In some cases, even hybrid models, based on metaheuristics and RL, are used. For instance, Csáji et al. (2006) proposed a combination of Simulated Annealing, RL, and NN to implement an adaptive, iterative, distributed scheduling algorithm for market-based production, in which each machine and job is seen as an agent and can participate in a bid system with the global aim of minimizing the total production time. Palombarini and Martínez (2012) implemented a Q-Learning algorithm to reschedule jobs whenever an unforeseen event (e.g. arrival of a rush order or breakdown of a working machine) takes place on the shop floor.

Many works have also dealt with the problem of ‘Process Control’, a field closely related to ‘Online Quality Control’, in which the objective is to govern a manufacturing process (regulating and fine-tuning its main operating parameters) so that it behaves as planned with few or no non-conforming situations. Shin et al. (2012) proposed an RL-based approach to build a self-adapting manufacturing system. The RL model, based on two collaborating neural networks, makes the manufacturing system completely autonomous, as it becomes able to set its goals and reconfigure itself in response to changing environmental conditions. Next, similar work was proposed by Dornheim et al. (2019), who used a Q-Learning method for optimal and automatic control of manufacturing processes characterized by non-linear dynamics. To conclude, we cite the interesting and very recent work by Joswiak et al. (2019), who tackled process control following a more practitioner-oriented approach. In particular, the authors compared different dimensionality reduction algorithms to create a dashboard of meaningful process data aimed at supporting and enhancing human decision-making.

4.4.4. Supply chain management

SCM is the process of planning, controlling, and executing all logistic flows, from the acquisition of raw materials to the delivery of end products, in the most streamlined and cost-effective way. In this sense, SCM encompasses a diversified set of activities that broadly includes demand planning, sourcing, inventory management, and transportation. As shown by the rather limited number of papers included in Table 3d, in terms of ML applications, SCM is not yet a much-explored domain, as already confirmed by the keyword analysis of Section 4.3, which revealed that, although SCM is a consolidated field, with a recently increasing trend, it is not a star topic yet.

‘Modelling and Coordination’ is the most studied topic of the SCM area. Generally, a two-stage supply chain with non-stationary demand is considered and a varied set of performance indicators is optimized using a multi-agent-based simulated scenario. One of the first works of this kind is that of Kim et al. (2008), who used an Action Value-based RL algorithm to optimize and compare a centralized and a decentralized supply chain, whose state is described by customer-demand patterns. Chaharsooghi et al. (2008) tested the Q-Learning algorithm using, as a simulation setting, the famous Beer Game (Coppini et al. 2010), showing that purchasing orders generated by the RL-based decision system greatly reduce the bullwhip effect of the supply chain. More recently, a similar approach was used by Mortazavi et al. (2015), who used Q-Learning to coordinate a four-echelon supply chain with non-stationary demand.
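
The Q-Learning schemes cited in this section share the same core update rule. The snippet is a minimal sketch of that rule for a single ordering agent, assuming a tabular representation in which the state is a discretized inventory position and the actions are candidate order quantities. The state and action encodings, the cost figures, the values of alpha, gamma, and epsilon, and the function names are illustrative assumptions, not taken from the cited papers.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical usage: states are inventory positions, actions are order sizes,
# and the reward is the negative of an assumed inventory cost.
q_table = defaultdict(float)
order_sizes = [0, 1, 2]
q_update(q_table, state=0, action=1, reward=-5.0, next_state=2, actions=order_sizes)
```

In a multi-echelon setting such as the Beer Game, one such table could be kept per tier, with the reward derived from the inventory costs that the cited works aim to minimize.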
Table 3c (continued)
Sub-Area | Article | Rec. Cit. | Algorithm Class. | Main Alg. | Sim. Based | Cyber Ph. Sys. | Goals & Approach
Scheduling | Shiue et al. (2011) | 2 | Unsuper. Learning | SOM | 1 | 1 | A Self-Organizing Map is used to select multiple scheduling rules. The SOM outperforms, in the long run, traditional approaches based on a single scheduling rule.
Scheduling | Palombarini and Martínez (2012) | 9 | Reinforc. Learning | QLrn | 1 | 1 | A Q-Learning system is proposed for adaptive rescheduling, to respond to non-planned events such as new orders or equipment failures.
Scheduling | Shiue et al. (2012) | 5 | Superv. Learning | NN, DT, Bagging, SVM, GA | 0 | 1 | A real-time scheduling system is proposed. NNs, SVM, and Decision Tree (based on Bagging) are integrated, and a Genetic Algorithm is used for feature selection. The approach is evaluated using 10-fold cross-validation.
Scheduling | Drakaki and Tzionas (2017) | 0 | Reinforc. Learning | QLrn | 1 | 0 | An order-picking scheduling problem is tackled through a Q-learning algorithm (without a Neural Network function approximator) coupled with hierarchical Colored Petri Nets.
Scheduling | Priore et al. (2018) | 2 | Superv. Learning | Bagging, Boosting | 1 | 1 | Bagging, boosting, and stacking methods are tested for dispatching rule selection. Mean tardiness and mean flow time are improved.
Scheduling | Tan et al. (2019) | 1 | Reinforc. Learning | QLrn | 1 | 1 | A multi-agent reinforcement learning approach for dynamic planning and scheduling is proposed. The focus is on robot assembly lines, to minimize the makespan.
Scheduling | De Jong et al. (2019) | 1 | Superv. Learning | CNN | 1 | 0 | A CNN is proposed for quick and accurate makespan forecast, both for job and shop floor systems. A visual representation of the layout and the system’s state is also provided as an additional input.
Scheduling | Lin et al. (2019) | 2 | Reinforc. Learning | DQN | 1 | 1 | The paper integrates Deep Q-Learning with edge computing to solve complex scheduling problems requiring different dispatching rules.

Table 3d Logistic and Supply Chain Management Papers.
Sub-Area | Article | Received Citations | Algorithm Classification | Main Algorithm | Simul. Based | Goals & Approach
Modeling and Coordination | Chi et al. (2007) | 19 | Supervised Learning | SVM, GA | 1 | SVM for regression is compared to a DOE approach to predict 7 performance measures of Vendor Managed Inventory. The optimal input settings are generated using a genetic algorithm. The main benefit of SVM is the possibility to avoid system disruption, as opposed to the DOE approach.
Modeling and Coordination | Kim et al. (2008) | 12 | Reinforcement Learning | AVL | 1 | An asynchronous RL agent is used for inventory control in a serial supply chain. Time-varying rewards are used, and the approach is tested for both centralized and decentralized supply chains.
Modeling and Coordination | Chaharsooghi et al. (2008) | 37 | Reinforcement Learning | QLrn | 1 | Q-Learning is proposed to coordinate a multi-agent supply chain (with 4 tiers) and to minimize the bullwhip effect. The environment state is described by inventory position, ordering size to the upstream level, and distribution amount at each level, and the objective is to minimize the total inventory costs.
Modeling and Coordination | Zarandi et al. (2012) | 2 | Reinforcement Learning | TD | 1 | A fuzzy-inference system is used to approximate the value function returned by a Reinforcement Learning approach for inventory control. Specifically, the agent models a supplier and determines the number of orders for each retailer, with supply capacity constraints.
Modeling and Coordination | Mortazavi et al. (2015) | 7 | Reinforcement Learning | QLrn | 1 | A Q-Learning algorithm is used in an agent-based simulation of a 4-echelon chain with non-stationary demand. The objective is to coordinate the ordering processes. The Value-at-Risk methodology is also applied both for risk evaluation and sensitivity analysis.
Modeling and Coordination | Cavalcante et al. (2019) | 0 | Supervised Learning | LogRT, KNN | 1 | Simulation and ML are combined for supplier selection in resilient chains. K-nearest neighbors and Logistic Regression are compared for the classification task.
Modeling and Coordination | Priore et al. (2019) | 3 | Supervised Learning | DT | 1 | A dynamic framework for automated inventory management is proposed. Specifically, a Decision Tree periodically selects the best inventory model for a node of the supply chain according to its state and the network state.
Modeling and Coordination | Du and Jiang (2019) | 0 | Reinforcement Learning | QLrn | 1 | A multi-agent reinforcement learning approach is proposed. The aim is to optimize the manufacturer’s strategies, in a dynamic supply chain, mitigating the risk of the supplier. The approach is successfully validated in a simulated environment with a single manufacturer and a single supplier.
Modeling and Coordination | González Rodríguez et al. (2019) | 0 | Supervised | Fuzzy Inf System + Tree | 0 | The paper proposes a decision support system to coordinate a Closed-Loop Supply Chain in the presence of uncertainties. The support system makes use of a Fuzzy Inference System, whose rules are automatically generated with a regression tree. One of the main contributions is the ability to limit the impact, on inventories, of imbalances in the rest of the chain.
Demand Forecasting | Carbonneau et al. (2007) | 6 | Supervised Learning | SVM, NN, RNN | 0 | ML algorithms are compared to statistical methods for demand forecasting in supply chains. Tests showed that ML techniques are generally outperformed when applied to single-feature time-series. The performance of ML rapidly increases using multi-dimensional time-series.
Demand Forecasting | Villegas et al. (2018) | 0 | Supervised Learning | SVM | 0 | SVM is used to select the best forecasting model, based on the prediction output of each model. Also, a comprehensive feature selection analysis was carried out.
Demand Forecasting | Mezzogori and Zammori (2019) | 0 | Supervised Learning | AE, RNN | 0 | An Entity Embedding based neural network is used to learn vector representations of past and current products. The vector representations are exploited to trace similarities of the current product to past products, so as to build pseudo-time-series, analyzed by an RNN-based network to predict the quantity sold for each product at the end of a sales campaign.
Demand Forecasting | Fu and Chien (2019) | 0 | Supervised Learning | KNN, SVM, NN | 0 | Machine Learning and a temporal aggregation mechanism are integrated to forecast the demand for intermittent products. The proposed framework is tested using the data of a semiconductor distributor.
Demand Forecasting | Ji et al. (2019) | 0 | Supervised Learning | XGB | 0 | A novel XGBoost algorithm is proposed and tested against classical ARIMA models, to forecast sales of an e-commerce platform.
Demand Forecasting | Wu (2010) | 45 | Supervised Learning | SVM, PSO | 0 | A hybrid approach based on Particle Swarm Optimization and SVM is tested on a dataset of car sales. The aim is to optimize the reorder points of each tier of the supply chain.
Inventory Management | Kim et al. (2005) | 41 | Reinforcement Learning | AVL | 1 | Centralized and non-centralized inventory models are proposed to manage a supply chain with one supplier and multiple retailers. Specifically, an action-value based algorithm is proposed to constantly react to the changes in customers’ demand.
Inventory Management | Kwon et al. (2008) | 7 | Reinforcement Learning | CBR | 1 | A case-based RL approach is presented to control inventory (at supply chain scale) in case of non-stationary customer demand. Specifically, the case-based reasoning discretizes the state space, thus reducing the number of possible configurations to be learned.
Inventory Management | Jiang and Sheng (2009) | 37 | Reinforcement Learning | CBR | 1 | (continued on next page)

Other than RL-based models, some recent papers applied SL approaches to coordinate the supply chain. For instance, Cavalcante et al. (2019) used K-Nearest Neighbours and Logistic Regression for supplier selection, while Priore et al. (2019) used a Decision Tree to dynamically select the best replenishment model for each tier of the supply chain.

‘Demand Forecasts’, a cornerstone of SCM, is the second most considered topic, with applications in different settings and demand patterns. For instance, Carbonneau et al. (2007) compared classical statistical methods against ML and concluded that, for multidimensional time-series, the ML methods are clearly more accurate. Wu (2010) showed that a combination of Particle Swarm Optimization and SVM can be used to optimize reorder points at each level of a supply chain. Villegas et al. (2018) used SVM to select the best forecasting model among a diverse pool of predictive models for non-stationary and/or lumpy demand. Ji et al. (2019) presented a hybrid forecasting model based on XGBoost and ARIMA and successfully tested it on an e-commerce dataset. Mezzogori and Zammori (2019) integrated an Entity Embedding and a Recurrent Neural Network for demand forecasting in the fashion market.

Finally, a limited number of works applied RL to ‘Inventory Control’. The first paper in this area is that by Kim et al. (2005), who used an Action Value RL algorithm to optimize centralized and non-centralized inventory models. Kwon et al. (2008) applied a case-based myopic approach to a Vendor Managed Inventory model, to minimize the probability of infringing the contracted service level.
Next, Jiang and Sheng (2009) expanded this model to study a simulated multi-agent supply chain with 80 retailers and 10 customers, in which each tier of the supply chain is modeled as an independent reinforcement learning agent. Lastly, Kara and Dogan (2018) focused on perishable products, with random demand and deterministic lead time. The aim was to minimize the retailer’s total cost, using two alternative inventory policies optimized with either the Q-Learning or the SARSA algorithm. Specifically, the latter ensured better results when applied to items with a short lifetime and lumpy demand.

4.4.5. Models’ complexity, Input-Output variables

To give a better idea of the complexity and variety of the considered problems, we deemed it useful to include Table 4, which provides some indications concerning the variables that are commonly used in each application domain and sub-area. Specifically, this information is exemplified by the list of the input and output variables of the datasets used by the most representative papers belonging to each investigated sub-area.

4.4.6. Concluding remarks

As already mentioned, given the sudden rise of ML and DL applications, the vastness of the scientific literature being produced can be confusing, if not misleading, both for researchers and for practitioners aiming to apply these methodologies to specific industrial tasks. Apart from this general issue, some operational issues that could hamper the diffusion of ML in industry have also emerged from the literature review (i.e., Research Question #5). Generally, problems are related to the dataset needed to train the ML models. Indeed, if data are collected directly in the field, issues related to missing, dirty, or even insufficient data are frequently encountered.
Table 3d (continued)
Sub-Area | Article | Received Citations | Algorithm Classification | Main Algorithm | Simul. Based | Goals & Approach
Inventory Management | Jiang and Sheng (2009) (continued) | 37 | Reinforcement Learning | CBR | 1 | Under a simulated scenario with nonstationary demand, a case-based RL approach is tested, both in a periodic-review order-up-to system and in an order-quantity reorder-point system.
Inventory Management | Kara and Dogan (2018) | 0 | Reinforcement Learning | QLrn, SARSA | 1 | The Q-learning algorithm and the SARSA method are compared for solving an inventory management problem with perishable products. RL shows better results with high-variance demand for short-lifetime products.

Table 4 Overview of dataset characteristics.
Area | Sub-Area | Article | Input Variables | Output Variable(s) | # of samples
Maintenance Management | Condition Monitoring | Saxena and Saad (2007) | 38 statistical features of accelerometer and microphone data | Type of fault predicted | 1152
Maintenance Management | Downtime minimization | Susto et al. (2015) | 125 statistical moments calculated from 31 time series (i.e., current, deceleration, position, pressure) | Predicted faultiness class | 3671
Maintenance Management | Failure Analysis | Prieto et al. (2013) | 25 statistical-time features calculated from the vibration signal | Prediction of 6 bearing statuses | 120
Quality Management | Defect Detection | Kusiak and Kurasek (2001) | 14: stencil composition, stencil thickness, …, paste application, position | Presence of solder defect | 2052
Quality Management | Image Recognition | Ravikumar et al. (2011) | 20 histogram features of image data | 3 different component statuses | 300
Quality Management | Life Cycle Management | Jennings et al. (2016) | 18 product characteristics (weight, screen resolution, etc.) | Prediction of product discontinuity | 7000
Quality Management | Online Quality Control | Ribeiro (2005) | 26 sensor readings (temperature, pressures, etc.) | 6 kinds of plastic part fault | 200
Production Planning & Control | Performance Prediction | Arredondo and Martinez (2010) | 4 order type attributes (size, composition, due date, arrival date) | Order value | N.A.
Production Planning & Control | Process Control | Sun et al. (2004) | 9 cutting conditions (speed, depth, feed rate), and statistics of band power | 3 tool states | N.A.
Production Planning & Control | Scheduling | Mönch et al. (2006) | 4: batch machine factor, due date tightness, due date variance, ready time tightness | Scheduling look-ahead parameter | N.A.
Supply Chain Management | Modeling and Coordination | Priore et al. (2019) | 7 firm and supply state variables | Replenishment model | 2000
Supply Chain Management | Demand Forecasting | Mezzogori and Zammori (2019) | 26 product attributes | Product demand prediction | 1020
Supply Chain Management | Inventory Management | Kara and Dogan (2018) | 4 state variables measuring product remaining life and inventory position | Action value | N.A.

Apart from the standard data pre-processing methods (e.g., imputation using the most frequent value, a zero/constant value, or k-NN), many papers (see, for example, Sobie et al., 2018) demonstrated that this problem can be reduced using a training dataset generated through high-resolution simulations or through generative methods, such as Generative Adversarial Networks (Douzas and Bacao, 2018). Also, and perhaps more importantly, Deep Learning methodologies allow working on almost raw data with little or no need for data pre-processing (Azadeh et al., 2013). Even in the case of very noisy data (especially for signal and/or image processing), data can be effectively denoised using stacked autoencoders, as in Yu et al. (2019). On the other hand, Deep Learning techniques and, more generally, all NN-based approaches are difficult to interpret and could be negatively perceived as a black box by most practitioners. However, new and effective techniques, such as ‘layer-wise relevance propagation’ and ‘Grad-CAM’, can be used either to interpret a concept learned by a NN or to produce visual explanations for decisions made by CNNs (see, for example, Ayodele and Yussof, 2019; Montavon et al., 2018; Selvaraju et al., 2017).
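As an illustration of the standard imputation strategies mentioned above (most frequent value, constant value, and k-NN), the following is a minimal sketch in plain Python. It is only meant to show the mechanics on a tiny, made-up table of sensor readings: the function names, the use of `None` to mark missing entries, and the distance definition (root mean squared difference over the columns observed in both rows) are assumptions of this example, not a procedure taken from the reviewed papers.

```python
from collections import Counter
import math


def impute_constant(rows, fill_value=0.0):
    """Replace every missing entry (None) with a fixed value."""
    return [[fill_value if v is None else v for v in row] for row in rows]


def impute_most_frequent(rows):
    """Replace missing entries with the most frequent observed value of the column."""
    modes = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def impute_knn(rows, k=2):
    """Replace missing entries with the mean of the k nearest rows observing that column."""
    def distance(a, b):
        # Root mean squared difference over the columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                # Candidate donors: every other row with an observed value in column j.
                donors = sorted(
                    (distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )
                nearest = [value for _, value in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical readings: [temperature, pressure]; None marks a missing value.
readings = [
    [20.0, 1.0],
    [21.0, None],
    [20.0, 1.2],
    [35.0, 3.0],
    [None, 2.9],
]

print(impute_constant(readings))
print(impute_most_frequent(readings))
print(impute_knn(readings, k=2))
```

The k-NN variant tends to preserve local structure (a missing pressure is filled from rows with a similar temperature), whereas the constant and most-frequent strategies ignore the other columns altogether. The sketch assumes every column has at least one observed value.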
Thus, also considering the extreme flexibility of Deep Learning techniques and the outstanding results that have been obtained in seemingly unrelated applications, such as Natural Language Processing (Vaswani et al., 2017), their use in operation management is expected to further increase. It is not a wild guess to speculate that, in the near future, Deep Learning could find its way into many industrial fields where these techniques are still only superficially explored (i.e., Research Question #6). The first evidence comes from SCM, a domain that, although still little explored, is rapidly growing, thanks to the adoption of Deep and Reinforcement Learning techniques that make it possible to model and optimize complex problems of a strategic nature. It is not difficult to predict that a similar approach could help to obtain concrete improvements over state-of-the-art results in traditional industrial problems, such as scheduling and inventory management.

Finally, the so-called data unbalancing problem is worth mentioning. This issue is typical of quality and/or defect classification tasks, where the objective is to discriminate positive events from rare negative ones. Also in this case, standard methods exist, ranging from classical under-sampling (e.g., the Near Miss algorithm) and over-sampling approaches (e.g., the Synthetic Minority Oversampling Technique) to more elaborate techniques based on Competitive NNs (Nugroho et al., 2002). However, as noted by Ko et al. (2017) and by Kim et al. (2018), even the use of ensemble methods (Random Forest in the simplest case) is frequently enough to overcome this issue.

5. Conclusions and directions for future works

The hype surrounding Machine Learning and Deep Learning algorithms is ever-growing and, given recent breakthrough developments, their use has been experiencing a steep increase in many fields.
This trend is very marked in industry, especially in the operation management area, as revealed by the literature analysis described herein. The number of published papers is very large and covers the whole spectrum of operation management. Moreover, all the application domains considered in this study show a steady and significant increase in the number of publications (especially in the last two years), thus further demonstrating an ever-growing interest in such applications. Historically, in terms of application domains, studies concerning Maintenance and Quality appeared first, followed by applications in Production Planning and Control and, lastly, in Supply Chain Management. Quality management, as of today, is the most studied topic, probably given its impact on total sales and, consequently, its quicker return on investment. Recently, the investigated domain has been extended with the introduction of new research fields such as Cyber-physical systems, Additive Manufacturing and, more generally, Industry 4.0. These fields seem to be promising for ML applications, and preliminary results are encouraging. However, it is not certain that this enthusiasm will be sustained, and the initial interest could rapidly fade, as has already happened in other fields. A typical example is ‘Order Management’, which, after an initial boom, is now displaying a rapid decrease in interest. Probably, this is due to the use of boundary algorithms that, although appealing to the academic community, are of scarce interest to industrial practitioners. This fact highlights the need to find a trade-off between novelty and industrial applicability; a trade-off that is particularly critical for ‘young domains’ (or question marks) where, to foster acceptance, it may be preferable to leverage simple and more consolidated techniques, rather than novel and complex ones that, conversely, might even have a detrimental effect.
Concerning the adopted techniques, the most explored ones are based on Supervised Learning, closely followed by Unsupervised Learning algorithms, which have risen quickly, especially in the last decade. Reinforcement Learning methods, given their higher complexity, are still few, but they are also increasing (with a spike in 2018–2019), mainly in the SCM area.

Anyhow, this positive trend and the even distribution of ML applications across many different industrial areas confirm the flexibility of ML methodologies and their high potential for operation management tasks. It is also important to note that enabling technologies are now mature and that only a few operational problems must still be solved for the definitive dissemination of these methods. As discussed, most of the problems concern either the generation of meaningful benchmark datasets or the low interpretability of the obtained results. However, as discussed in the paper, thanks to recently emerging techniques, both problems can already be satisfactorily solved. Perhaps the only real problem that still needs to be solved is to provide practitioners with a proper key to interpret and choose appropriate ML methods, without getting lost in the vastness of scientific works published on the subject. Hence, we hope that this systematic literature review, which classified the existing corpus of works in a structured and operative way, could help to solve this problem. For the same reason, a topic trend analysis has also been made, aiming to give useful indications on the research areas on which academic researchers should focus, depending on the tasks at hand and the scope of their study. Surely, it can be presumed that more effort should be placed on topics classified as ‘Question Marks’ and ‘Hot Topics’ that, being the youngest and least explored, are the ones where the biggest innovations can be made.
On the other hand, if the problem falls within the ‘Consolidated’ or ‘Stars’ category, then the innovation rate will be lower, but a solid corpus of works can be found with precise indications of the implementation strategy. In this regard, we note that the creation and sharing of open datasets (of real industrial data) could be very helpful to further accelerate the diffusion and acceptance of ML methods. Indeed, this would allow practitioners to develop, test, and compare new algorithms, leveraging common datasets.

Our belief is that many opportunities and potentials are yet to be discovered in the application and/or integration of ML methods with existing operation management techniques. In a certain sense, the adoption and hybridization of standard operation management approaches with ML algorithms could further strengthen the smart manufacturing concept. To name just a few examples, embedding ML models within discrete event simulations (or in Digital Twins) could exploit the concept of cyber-physical system, boosting operating performance and bringing to light new and interesting results. Similarly, Reinforcement Learning techniques should be studied not only for classic ‘hard’ applications in the field of robotics and automation, but also for more ‘soft’ tasks, such as expert systems and/or decision support systems. Another field worth investigating could be the applicability of ML methods in real-world environments, in terms of required computing power and latency. Also, economic assessments of the impact of ML techniques could be helpful to further show the utility of such methods. All these could be interesting topics for future streams of research.

To conclude, we note that, due to the vastness of the considered domain, our analysis was mainly of an explanatory nature. The aim, in fact, was to assess the current diffusion of ML and the potential it offers and/or may offer to solve problems typical of the operation management field.
We have simply tracked which algorithms are used in which field, without any ambition of establishing which are the best ones. This was not the aim of our work and, frankly speaking, we do not think it is possible to do so, as the problems analyzed are so varied and specific that it would be difficult to make a fair comparison. Certainly, by narrowing the field (for instance, to one of the sub-areas identified in the paper), such a detailed analysis and comparison could be made and would be extremely useful for a further advancement of ML in industry. Hence, this could be another interesting field for future research.

CRediT authorship contribution statement

Massimo Bertolini: Conceptualization, Supervision. Davide Mezzogori: Data curation, Writing - original draft. Mattia Neroni: Visualization, Resources. Francesco Zammori: Methodology, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A Acronyms of the algorithms cited in the literature review

Table A1 Glossary of the Acronyms of the cited algorithms.
Acronym | Full Name | Explanation
AHC | Adaptive Heuristic Critic | Reinforcement Learning Algorithm
ANOVA | Analysis Of Variance | Mean Test among groups
AVL | Action Value | Reinforcement Learning Algorithm
ASVM | Adaptive Support Vector Machine | Adaptive Classification
ACO | Ant Colony Optimization | Metaheuristic for optimization
AuNN | Autoencoder Neural Network | Network for Dimensionality Reduction
Bay.R | Bayesian Regression | Non-Parametric Regression model
Boosting | Boosting | Ensemble learning technique
Bagging | Bootstrap Aggregating | Resampling Technique for Variance Reduction
CBR | Case-Based Reasoning | Using past knowledge to solve new problems
CNN | Convolutional Neural Network | Feed Forward Network
DT | Decision Tree | Classification and Regression
DDPG | Deep Deterministic Policy Gradient | Reinforcement Learning Algorithm
DQN | Deep Q Learning | Reinforcement Learning Algorithm
GMM | Gaussian Mixture Modelling | Clustering
Gauss. PR | Gaussian Process Regression | Non-Parametric Regression model
GA | Genetic Algorithm | Metaheuristic Optimization
GDE | Gaussian Density Estimation | Gaussian Distribution estimation method
GrB | Gradient Boosting | Ensemble Technique for decision trees
HMM | Hidden Markov Model | Prediction and Prognostic Model
HC | Hierarchical Clustering | Clustering
KNN | K Nearest Neighbor | Classification and Regression
K-PCA | Kernel Principal Component Analysis | Dimensionality Reduction
K-Means | K-Means | Clustering
KMC | K-Means/K-Median | Clustering
LASSO | Lasso Regression | Regression model
LVQ | Learning Vector Quantization | Classification (labeled data)
LDA | Linear Discriminant Analysis | Classification and Pattern recognition
LOF | Local Outlier Factor | Anomaly Detection
LWR | Locally Weighted Regression | Regression
LogR | Logistic Regression | Parametric Regression model (for probabilities)
LSTM | Long-Short Term Memory | Recurrent Neural Network
Nbay | Naive Bayes | Classification Technique
NBC | Neighbor Based Clustering | Clustering
NN | Neural Network (Multilayer perc.) | Standard Feed Forward Network
NMF | Non-Negative Matrix Factorization | Matrix Factorization
PSO | Particle Swarm Optimization | Optimization Metaheuristic
PCA | Principal Component Analysis | Dimensionality Reduction
PPO | Proximal Policy Optimization | Reinforcement Learning Algorithm
PW | Parzen Windows | Unsupervised Density Estimation
QLrn | Q-Learning | Reinforcement Learning Algorithm
QDA | Quadratic Discriminant Analysis | Classification and Pattern recognition
RnF | Random Forest | Ensemble of Decision Trees
RNN | Recurrent Neural Network | Neural Networks for time series analysis
RBM | Restricted Boltzmann Machine | Network for Probability Distribution Learning
RST | Rough Set Algorithm | Rule Mining Algorithm
SIFT | Scale Invariant Feature Transform | Computer Vision Feature Detection technique
SOM | Self-Organizing Maps | Network for Dimensionality Reduction
SA | Simulated Annealing | Optimization Heuristic
SS | Sparse Sensing | Signal Processing Technique
SDA | Stacked Denoising Auto Encoder | Network for Dimen. Reduction and data denoising
SARSA | State–action–reward–state–action | Reinforcement Learning Algorithm
SVM | Support Vector Machine | Classification and Regression
SVDD | Support Vector Data Description | Classification technique for unbalanced datasets
TS | Tabu Search | Optimization Heuristic
t-SNE | t-distributed stochastic neighbor embedding | Dimensionality Reduction
TD | Temporal Difference Learning | Reinforcement Learning Algorithm
TL | Transfer Learning | Storing past learning to solve new problems
TRPO | Trust Region Policy Optimization | Reinforcement Learning Algorithm
UMAP | Uniform Manifold Approx. and Projection | Dimensionality Reduction
VPF | Variable Picket Fence | Harmonic Analysis
XGB | XGBoost | Parallel Tree Gradient Boosting technique

Appendix B Bibliometric analysis

Some additional statistics, of a bibliometric nature, are reported herein.
Specifically, the following figures and Tables B1a–B1d show:
- the top journals, evaluated both in terms of number of publications and number of obtained citations,
- the most cited authors, for each application domain.

Fig. B1. Top ten Journals measured in terms of number of published papers.
Fig. B2. Top 25 Journals measured in number of received citations.

Table B1a Most cited authors in Maintenance Management domain.
Authors | # Papers | # Citations
Abhinav Saxena | 1 | 116
Ashraf Saad | 1 | 116
Satish C. Sharma | 1 | 100
Antonio Garcia Espinosa | 1 | 95
Giansalvo Cirrincione | 1 | 95
Humberto Henao | 1 | 95
Juan Antonio Ortega | 1 | 95
Miguel Delgado Prieto | 1 | 95
Arzu Onar | 1 | 62
Nandita Kaundinya | 1 | 62

Table B1b Most cited authors in Production Planning and Control domain.
Authors | # Papers | # Citations
Ercan Oztemel | 1 | 96
M. Emin Aydin | 1 | 96
Javier Puente | 4 | 72
Paolo Priore | 4 | 72
David De La Fuente | 2 | 65
José Parreño | 2 | 63
Ratna Babu Chinnam | 1 | 49
Jens Zimmermann | 1 | 40
Lars Moench | 1 | 40
Peter Otto | 1 | 40

Table B1c Most cited authors in Quality Management domain.
Authors | # Papers | # Citations
Sami Ekici | 1 | 48
Ulaş Çaydaş | 1 | 48
Bernardete Ribeiro | 1 | 47
Kusiak & Kurasek | 1 | 47
Fangming Ye | 1 | 36
Krishnendu Chakrabarty | 2 | 36
Xinli Gu | 1 | 36
Zhaobo Zhang | 1 | 36
Linkan Bian | 2 | 19
Mark A. Tschopp | 2 | 19

Table B1d Most cited authors in Supply Chain Management domain.
Authors | # Papers | # Citations
Qi Wu | 1 | 45
Chengzhi Jiang | 1 | 37
Jafar Heydari | 1 | 37
S. Hessameddin Zegordi | 1 | 37
S. Kamal Chaharsooghi | 1 | 37
Zhaohan Sheng | 1 | 37
Chang Ouk Kim | 3 | 19
Herbert Moskowitz | 1 | 19
Hoi-Ming Chi | 1 | 19
Ick-Hyun Kwon | 2 | 19

References

Adly, F., Alhussein, O., Yoo, P. D., Al-Hammadi, Y., Taha, K., Muhaidat, S., … Ismail, M. (2015). Simplified subspaced regression network for identification of defect patterns in semiconductor wafer maps. IEEE Transactions on Industrial Informatics, 11(6), 1267–1276. https://doi.org/10.1109/TII.2015.2481719
Agarwal, P., Tamer, M., Sahraei, M. H., & Budman, H. (2019). Deep learning for classification of profit-based operating regions in industrial processes. Industrial & Engineering Chemistry Research, 59(6), 2378–2395. https://doi.org/10.1021/acs.iecr.9b04737
Alegeh, N., Shagluf, A., Longstaff, A. P., & Fletcher, S. (2019). Accuracy in detecting failure in Ballscrew assessment towards machine tool servitization. International Journal of Mechanical Engineering and Robotics Research, 8(5), 667–673. https://doi.org/10.18178/ijmerr.8.5.667-673
Arredondo, F., & Martinez, E. (2010). Learning and adaptation of a policy for dynamic order acceptance in make-to-order manufacturing. Computers & Industrial Engineering, 58(1), 70–83. https://doi.org/10.1016/j.cie.2009.08.005
Aydin, M. E., & Oztemel, E. (2000). Dynamic job-shop scheduling using reinforcement learning agents. Robotics and Autonomous Systems, 33(2–3), 169–178. https://doi.org/10.1016/s0921-8890(00)00087-7
Ayodele, O. O., & Yussof, N. (2019). Explainable deep learning: Methods and challenges. Journal of Advanced Research in Dynamical and Control Systems, 11(8), 1186–1205.
Azadeh, A., Saberi, M., Kazem, A., Ebrahimipour, V., Nourmohammadzadeh, A., & Saberi, Z. (2013). A flexible algorithm for fault diagnosis in a centrifugal pump with corrupted data and noise based on ANN and support vector machine with hyper-parameters optimization. Applied Soft Computing, 13(3), 1478–1485. https://doi.org/10.1016/j.asoc.2012.06.020
Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828. https://doi.org/10.1109/TPAMI.2013.50
Bukkapatnam, S. T. S., Afrin, K., Dave, D., & Kumara, S. R. T. (2019). Machine learning and AI for long-term fault prognosis in complex manufacturing systems. CIRP Annals, 68(1), 459–462. https://doi.org/10.1016/j.cirp.2019.04.104
Burton, B., & Barnes, H. (2017). Hype Cycles Highlight Enterprise and Ecosystem Digital Disruptions: A Gartner Trend Insight.
Caggiano, A., Zhang, J., Alfieri, V., Caiazzo, F., Gao, R., & Teti, R. (2019). Machine learning-based image processing for on-line defect recognition in additive manufacturing. CIRP Annals, 68(1), 451–454. https://doi.org/10.1016/j.cirp.2019.03.021
Carbonneau, R., Vahidov, R., & Laframboise, K. (2007). Machine learning-based demand forecasting in supply chains. International Journal of Intelligent Information Technologies, 3(4), 40–57. https://doi.org/10.4018/jiit.2007100103
Carvajal Soto, J. A., Tavakolizadeh, F., & Gyulai, D. (2019). An online machine learning framework for early detection of product failures in an Industry 4.0 context. International Journal of Computer Integrated Manufacturing, 32(4–5), 452–465. https://doi.org/10.1080/0951192x.2019.1571238
Cavalcante, I. M., Frazzon, E. M., Forcellini, F. A., & Ivanov, D. (2019). A supervised machine learning approach to data-driven simulation of resilient supplier selection in digital manufacturing. International Journal of Information Management, 49, 86–97.
https://doi.org/10.1016/j.ijinfomgt.2019.03.004 Çaydaş, U., & Ekici, S. (2010). Support vector machines models for surface roughness prediction in CNC turning of AISI 304 austenitic stainless steel. Journal of Intelligent Manufacturing, 23(3), 639–650. https://doi.org/10.1007/s10845-010-0415-2 Chaharsooghi, S. K., Heydari, J., & Zegordi, S. H. (2008). A reinforcement learning model for supply chain ordering management: An application to the beer game. Decision Support Systems, 45(4), 949–959. https://doi.org/10.1016/j.dss.2008.03.007 Chan, S. L., Lu, Y., & Wang, Y. (2018). Data-driven cost estimation for additive manufacturing in cybermanufacturing. Journal of Manufacturing Systems, 46, 115–126. https://doi.org/10.1016/j.jmsy.2017.12.001 Chen, C., Liu, Y., Kumar, M., Qin, J., & Ren, Y. (2019a). Energy consumption modelling using deep learning embedded semi-supervised learning. Computers & Industrial Engineering, 135, 757–765. https://doi.org/10.1016/j.cie.2019.06.052 Chen, Y.-J., Fan, C.-Y., & Chang, K.-H. (2016). Manufacturing intelligence for reducing false alarm of defect classification by integrating similarity matching approach in CMOS image sensor manufacturing. Computers & Industrial Engineering, 99, 465–473. https://doi.org/10.1016/j.cie.2016.05.009 Chen, Y., Chen, B., Yao, Y., Tan, C., & Feng, J. (2019b). A spectroscopic method based on support vector machine and artificial neural network for fiber laser welding defects detection and classification. NDT & E International, 108, 102176. https://doi.org/ 10.1016/j.ndteint.2019.102176 Chi, H.-M., Ersoy, O. K., Moskowitz, H., & Ward, J. (2007). Modeling and optimizing a vendor managed replenishment system using machine learning and genetic algorithms. European Journal of Operational Research, 180(1), 174–193. https://doi. org/10.1016/j.ejor.2006.03.040 Chinnam, R. B. (2002). Support vector machines for recognizing shifts in correlated and other manufacturing processes. 
International Journal of Production Research, 40(17), 4449–4466. https://doi.org/10.1080/00207540210152920 Cho, S., Asfour, S., Onar, A., & Kaundinya, N. (2005). Tool breakage detection using support vector machine learning in a milling process. International Journal of Machine Tools and Manufacture, 45(3), 241–249. https://doi.org/10.1016/j. ijmachtools.2004.08.016 Cholette, M. E., Borghesani, P., Gialleonardo, E. D., & Braghin, F. (2017). Using support vector machines for the computationally efficient identification of acceptable design parameters in computer-aided engineering applications. Expert Systems with Applications, 81, 39–52. https://doi.org/10.1016/j.eswa.2017.03.050 Coppini, M., Rossignoli, C., Rossi, T., & Strozzi, F. (2010). Bullwhip effect and inventory oscillations analysis using the beer game model. International journal of production Research, 48(13), 3943–3956. https://doi.org/10.1080/00207540902896204 Coronado, A.E., Lyons, A.C., Kehoe, D.F. and Coleman, J. (2007) Enabling mass customization: extending build-to-order concepts to supply chains. Production Planning and Control, 15(4) Special Issue Mass Customisation, 398–411. https://doi. org/10.1080/0953728042000238809. Csáji, B. C., Monostori, L., & Kádár, B. (2006). Reinforcement learning in a distributed market-based production control system. Advanced Engineering Informatics, 20(3), 279–288. https://doi.org/10.1016/j.aei.2006.01.001 De Jong, A. W., Rubrico, J. I. U., Adachi, M., Nakamura, T., & Ota, J. (2019). A generalised makespan estimationfor shop scheduling problems, using visual data and a convolutional neural network. International Journal of Computer Integrated Manufacturing, 32(6), 559–568. https://doi.org/10.1080/0951192x.2019.1599430 Delgoshaei, A., & Gomes, C. (2016). A multi-layer perceptron for scheduling cellular manufacturing systems in the presence of unreliable machines and uncertain cost. Applied Soft Computing, 49, 27–55. 
https://doi.org/10.1016/j.asoc.2016.06.025 Denkena, B., Dittrich, M.-A., Böß, V., Wichmann, M., & Friebe, S. (2019). Self-optimizing process planning for helical flute grinding. Production Engineering, 13(5), 599–606. https://doi.org/10.1007/s11740-019-00908-0 Diaz-Rozo, J., Bielza, C., & Larrañaga, P. (2017). Machine learning-based CPS for clustering high throughput machining cycle conditions. Procedia Manufacturing, 10, 997–1008. https://doi.org/10.1016/j.promfg.2017.07.091 Doltsinis, S., Ferreira, P., & Lohse, N. (2012). Reinforcement Learning for production ramp-up: A Q-batch learning approach. In 2012 11th international conference on machine learning and applications. https://doi.org/10.1109/icmla.2012.113 Dornheim, J., Link, N., & Gumbsch, P. (2019). Model-free adaptive optimal control of episodic fixed-horizon manufacturing processes using reinforcement learning. International Journal of Control, Automation and Systems., 18(6), 1593–1604. https:// doi.org/10.1007/s12555-019-0120-7 Douzas, G., & Bacao, F. (2018). Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Systems with applications, 91, 464–471. https://doi.org/10.1016/j.eswa.2017.09.030 Drakaki, M., & Tzionas, P. (2017). Manufacturing scheduling using colored petri nets and reinforcement learning. Applied Sciences, 7(2), 136. https://doi.org/10.3390/ app7020136 Du, H., & Jiang, Y.e. (2019). Backup or reliability improvement strategy for a manufacturer facing heterogeneous consumers in a dynamic supply chain. IEEE Access, 7, 50419–50430. https://doi.org/10.1109/Access.628763910.1109/ ACCESS.2019.2911620 Duan, Q., Zeng, J., Chakrabarty, K., & Dispoto, G. (2015). Data-driven optimization of order admission policies in a digital print factory. ACM Transactions on Design Automation of Electronic Systems, 20(2), 1–25. https://doi.org/10.1145/2699836 El-Bendary, N., El Hariri, E., Hassanien, A. E., & Badr, A. (2015). 
Using machine learning techniques for evaluating tomato ripeness. Expert Systems with Applications, 42(4), 1892–1905. https://doi.org/10.1016/j.eswa.2014.09.057 Ye, F., Zhang, Z., Chakrabarty, K., & Xinli, G.u. (2013). Board-level functional fault diagnosis using artificial neural networks, support-vector machines, and weighted- majority voting. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(5), 723–736. https://doi.org/10.1109/tcad.2012.2234827 Fu, W., & Chien, C.-F. (2019). UNISON data-driven intermittent demand forecast framework to empower supply chain resilience and an empirical study in electronics distribution. Computers & Industrial Engineering, 135, 940–949. https://doi.org/ 10.1016/j.cie.2019.07.002 Gaham, M., & Bouzouia, B. (2009). Intelligent product-driven manufacturing control: A mixed genetic algorithms and machine learning approach to product intelligence synthesis. In 2009 XXII International Symposium on Information, Communication and Automation Technologies. https://doi.org/10.1109/icat.2009.5348452 Gao, B., Woo, W. L., Tian, G. Y., & Zhang, H. (2016). Unsupervised diagnostic and monitoring of defects using waveguide imaging with adaptive sparse representation. IEEE Transactions on Industrial Informatics, 12(1), 405–416. https://doi.org/10.1109/ tii.2015.2492924 García Nieto, P. J., Martínez Torres, J., Araújo Fernández, M., & Ordóñez Galán, C. (2012). Support vector machines and neural networks used to evaluate paper manufactured using Eucalyptus globulus. Applied Mathematical Modelling, 36(12), 6137–6145. https://doi.org/10.1016/j.apm.2012.02.016 Gardner, J. M., Hunt, K. A., Ebel, A. B., Rose, E. S., Zylich, S. C., Jensen, B. D., … Sauti, G. (2019). Machines as craftsmen: Localized parameter setting optimization for fused filament fabrication 3D printing. Advanced Materials Technologies, 4(3), 1800653. https://doi.org/10.1002/admt.v4.310.1002/admt.201800653 Gareth, J., Witten, D., Hastie, T., & Tibshirani, R. 
(2013). An introduction to statistical learning. New York: Springer. Ghadai, S., Balu, A., Sarkar, S., & Krishnamurthy, A. (2018). Learning localized features in 3D CAD models for manufacturability analysis of drilled holes. Computer Aided Geometric Design, 62, 263–275. https://doi.org/10.1016/j.cagd.2018.03.024 Gobert, C., Reutzel, E. W., Petrich, J., Nassar, A. R., & Phoha, S. (2018). Application of supervised machine learning for defect detection during metallic powder bed fusion additive manufacturing using high resolution imaging. Additive Manufacturing, 21, 517–528. https://doi.org/10.1016/j.addma.2018.04.005 González Rodríguez, G., Gonzalez-Cava, J. M., & Méndez Pérez, J. A. (2019). An intelligent decision support system for production planning based on machine learning. Journal of Intelligent Manufacturing, 31(5), 1257–1273. https://doi.org/ 10.1007/s10845-019-01510-y Gurgenc, T., Ucar, F., Korkmaz, D., Ozel, C., & Ortac, Y. (2019). A study on the extreme learning machine based prediction of machining times of the cycloidal gears in CNC milling machines. Production Engineering, 13(6), 635–647. https://doi.org/10.1007/ s11740-019-00923-1 Gyulai, D., Pfeiffer, A., Nick, G., Gallina, V., Sihn, W., & Monostori, L. (2018). Lead time prediction in a flow-shop environment with analytical and machine learning approaches. IFAC-PapersOnLine, 51(11), 1029–1034. https://doi.org/10.1016/j. ifacol.2018.08.472 Li, H. (2016). An approach to improve flexible manufacturing systems with machine learning algorithms. In IECON 2016–42nd annual conference of the IEEE industrial electronics society. https://doi.org/10.1109/iecon.2016.7793838 Harding, J., Shahbaz, M., & Kusiak, A. (2006). Data mining in manufacturing: A review. Journal of Manufacturing Science and Engineering, 128(4), 969–976. https://doi.org/ 10.1115/1.2194554 Heger, J., Branke, J., Hildebrandt, T., & Scholz-Reiter, B. (2016). 
Dynamic adjustment of dispatching rule parameters in flow shops with sequence-dependent set-up times. International Journal of Production Research, 54(22), 6812–6824. https://doi.org/ 10.1080/00207543.2016.1178406 Hesser, D. F., & Markert, B. (2019). Tool wear monitoring of a retrofitted CNC milling machine using artificial neural networks. Manufacturing Letters, 19, 1–4. https://doi. org/10.1016/j.mfglet.2018.11.001 Imoto, K., Nakai, T., Ike, T., Haruki, K., & Sato, Y. (2019). A CNN-based transfer learning method for defect classification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 32(4), 455–459. https://doi.org/10.1109/ TSM.2019.2941752 Iqbal, R., Maniak, T., Doctor, F., & Karyotis, C. (2019). Fault detection and isolation in industrial processes using deep learning approaches. IEEE Transactions on Industrial Informatics, 15(5), 3077–3084. https://doi.org/10.1109/TII.2019.2902274 Jang, S.-J., Kim, J.-S., Kim, T.-W., Lee, H.-J., & Ko, S. (2019). A wafer map yield prediction based on machine learning for productivity enhancement. IEEE Transactions on Semiconductor Manufacturing, 32(4), 400–407. https://doi.org/ 10.1109/TSM.2019.2945482 M. Bertolini et al. 
Jennings, C., Wu, D., & Terpenny, J. (2016). Forecasting obsolescence risk and product life cycle with machine learning. IEEE Transactions on Components, Packaging and Manufacturing Technology, 6(9), 1428–1439. https://doi.org/10.1109/tcpmt.2016.2589206
Ji, S., Wang, X., Zhao, W., & Guo, D. (2019). An application of a three-stage XGBoost-based model to sales forecasting of a cross-border e-commerce enterprise. Mathematical Problems in Engineering, 2019, 1–15. https://doi.org/10.1155/2019/8503252
Jiang, C., & Sheng, Z. (2009). Case-based reinforcement learning for dynamic inventory control in a multi-agent supply-chain system. Expert Systems with Applications, 36(3), 6520–6526. https://doi.org/10.1016/j.eswa.2008.07.036
Joswiak, M., Peng, Y., Castillo, I., & Chiang, L. H. (2019). Dimensionality reduction for visualizing industrial chemical process data. Control Engineering Practice, 93, 104189. https://doi.org/10.1016/j.conengprac.2019.104189
Kammerer, K., Hoppenstedt, B., Pryss, R., Stökler, S., Allgaier, J., & Reichert, M. (2019). Anomaly detections for manufacturing systems based on sensor data: Insights into two challenging real-world production settings. Sensors, 19(24), 5370. https://doi.org/10.3390/s19245370
Kankar, P. K., Sharma, S. C., & Harsha, S. P. (2011). Fault diagnosis of ball bearings using machine learning methods. Expert Systems with Applications, 38(3), 1876–1886. https://doi.org/10.1016/j.eswa.2010.07.119
Kara, A., & Dogan, I. (2018). Reinforcement learning approaches for specifying ordering policies of perishable inventory systems. Expert Systems with Applications, 91, 150–158. https://doi.org/10.1016/j.eswa.2017.08.046
Khanzadeh, M., Rao, P., Jafari-Marandi, R., Smith, B. K., Tschopp, M. A., & Bian, L. (2017). Quantifying geometric accuracy with unsupervised machine learning: Using self-organizing map on fused filament fabrication additive manufacturing parts. Journal of Manufacturing Science and Engineering, 140(3). https://doi.org/10.1115/1.4038598
Khanzadeh, M., Chowdhury, S., Marufuzzaman, M., Tschopp, M. A., & Bian, L. (2018). Porosity prediction: Supervised-learning of thermal history for direct laser deposition. Journal of Manufacturing Systems, 47, 69–82. https://doi.org/10.1016/j.jmsy.2018.04.001
Kim, A., Oh, K., Jung, J.-Y., & Kim, B. (2018). Imbalanced classification of manufacturing quality conditions using cost-sensitive decision tree ensembles. International Journal of Computer Integrated Manufacturing, 31(8), 701–717. https://doi.org/10.1080/0951192x.2017.1407447
Kim, C. O., Jun, J., Baek, J. K., Smith, R. L., & Kim, Y. D. (2005). Adaptive inventory control models for supply chain management. The International Journal of Advanced Manufacturing Technology, 26(9–10), 1184–1192. https://doi.org/10.1007/s00170-004-2069-8
Kim, C. O., Kwon, I.-H., & Baek, J.-G. (2008). Asynchronous action-reward learning for nonstationary serial supply chain inventory control. Applied Intelligence, 28(1), 1–16. https://doi.org/10.1007/s10489-007-0038-2
Kim, D., Kang, P., Cho, S., Lee, H., & Doh, S. (2012). Machine learning-based novelty detection for faulty wafer detection in semiconductor manufacturing. Expert Systems with Applications, 39(4), 4075–4083. https://doi.org/10.1016/j.eswa.2011.09.088
Kim, D., & Kang, S. (2019). Effect of irrelevant variables on faulty wafer detection in semiconductor manufacturing. Energies, 12(13), 2530. https://doi.org/10.3390/en12132530
Ko, T., Lee, J. H., Cho, H., Cho, S., Lee, W., & Lee, M. (2017). Machine learning-based anomaly detection via integration of manufacturing, inspection and after-sales service data. Industrial Management & Data Systems, 117(5), 927–945. https://doi.org/10.1108/imds-06-2016-0195
Korinek, A., & Stiglitz, J. E. (2017). Artificial intelligence and its implications for income distribution and unemployment (No. w24174). National Bureau of Economic Research. https://doi.org/10.3386/w24174
Kuhnle, A., Jakubik, J., & Lanza, G. (2018). Reinforcement learning for opportunistic maintenance optimization. Production Engineering, 13(1), 33–41. https://doi.org/10.1007/s11740-018-0855-7
Kusiak, A., & Kurasek, C. (2001). Data mining of printed-circuit board defects. IEEE Transactions on Robotics and Automation, 17(2), 191–196. https://doi.org/10.1109/70.928564
Kusiak, A. (2017). Smart manufacturing must embrace big data. Nature, 544(7648), 23–25. https://doi.org/10.1038/544023a
Kusiak, A. (2018). Smart manufacturing. International Journal of Production Research, 56(1–2), 508–517. https://doi.org/10.1080/00207543.2017.1351644
Kwon, I., Kim, C., Jun, J., & Lee, J. (2008). Case-based myopic reinforcement learning for satisfying target service level in supply chain. Expert Systems with Applications, 35(1–2), 389–397. https://doi.org/10.1016/j.eswa.2007.07.002
LaValle, S., Lesser, E., Shockley, R., Hopkins, M., & Kruschwitz, N. (2011). Big data, analytics and the path from insights to value. MIT Sloan Management Review, 52(2), 21–31.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
Lee, T., Lee, K. B., & Kim, C. O. (2016). Performance of machine learning algorithms for class-imbalanced process fault detection problems. IEEE Transactions on Semiconductor Manufacturing, 29(4), 436–445. https://doi.org/10.1109/tsm.2016.2602226
Lenz, B., Barak, B., Muhrwald, J., Leicht, C., & Lenz, B. (2013). Virtual metrology in semiconductor manufacturing by means of predictive machine learning models. In 2013 12th international conference on machine learning and applications. https://doi.org/10.1109/icmla.2013.186
Li, S., Liu, G., Tang, X., Lu, J., & Hu, J. (2017). An ensemble deep convolutional neural network model with improved D-S evidence fusion for bearing fault diagnosis. Sensors, 17(8), 1729. https://doi.org/10.3390/s17081729
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274.
Li, Z., Liu, R., & Wu, D. (2019). Data-driven smart manufacturing: Tool wear monitoring with audio signals and machine learning. Journal of Manufacturing Processes, 48, 66–76. https://doi.org/10.1016/j.jmapro.2019.10.020
Liker, J. K. (2004). The Toyota Way: 14 management principles from the world’s greatest manufacturer. New York: McGraw-Hill.
Lin, C.-C., Deng, D.-J., Chih, Y.-L., & Chiu, H.-T. (2019). Smart manufacturing scheduling with edge computing using multiclass deep Q network. IEEE Transactions on Industrial Informatics, 15(7), 4276–4284. https://doi.org/10.1109/TII.2019.2908210
Lin, S.-Y., Guh, R.-S., & Shiue, Y.-R. (2011). Effective recognition of control chart patterns in autocorrelated data using a support vector machine based approach. Computers & Industrial Engineering, 61(4), 1123–1134. https://doi.org/10.1016/j.cie.2011.06.025
Liu, J., An, Y., Dou, R., & Ji, H. (2018a). Dynamic deep learning algorithm based on incremental compensation for fault diagnosis model. International Journal of Computational Intelligence Systems, 11(1), 846. https://doi.org/10.2991/ijcis.11.1.64
Liu, J., Hu, Y., Wu, B., & Wang, Y. (2018b). An improved fault diagnosis approach for FDM process with acoustic emission. Journal of Manufacturing Processes, 35, 570–579. https://doi.org/10.1016/j.jmapro.2018.08.038
Liu, Z., Jia, Z., Vong, C.-M., Bu, S., Han, J., & Tang, X. (2017). Capturing high-discriminative fault features for electronics-rich analog system via deep learning. IEEE Transactions on Industrial Informatics, 13(3), 1213–1226. https://doi.org/10.1109/tii.2017.2690940
Loyer, J.-L., Henriques, E., Fontul, M., & Wiseall, S. (2016). Comparison of Machine Learning methods applied to the estimation of manufacturing cost of jet engine components. International Journal of Production Economics, 178, 109–119. https://doi.org/10.1016/j.ijpe.2016.05.006
Lu, Y. (2017). Industry 4.0: A survey on technologies, applications and open research issues. Journal of Industrial Information Integration, 6, 1–10. https://doi.org/10.1016/j.jii.2017.04.005
Ma, Y., Zhu, W., Benton, M. G., & Romagnoli, J. (2019). Continuous control of a polymerization system with deep reinforcement learning. Journal of Process Control, 75, 40–47. https://doi.org/10.1016/j.jprocont.2018.11.004
Maggipinto, M., Terzi, M., Masiero, C., Beghi, A., & Susto, G. A. (2018). A computer vision-inspired deep learning architecture for virtual metrology modeling with 2-dimensional data. IEEE Transactions on Semiconductor Manufacturing, 31(3), 376–384. https://doi.org/10.1109/tsm.2018.2849206
Manyika, J., Lund, S., Chui, M., Bughin, J., Woetzel, J., Batra, P., Ko, R., & Sanghvi, S. (2017). Jobs lost, jobs gained: What the future of work will mean for jobs, skills and wages. McKinsey report, accessed online at: https://www.mckinsey.com/featured-insights/future-of-work/jobs-lost-jobs-gained-what-the-future-of-work-will-mean-for-jobs-skills-and-wages#.
Manohar, K., Hogan, T., Buttrick, J., Banerjee, A. G., Kutz, J. N., & Brunton, S. L. (2018). Predicting shim gaps in aircraft assembly with machine learning and sparse sensing. Journal of Manufacturing Systems, 48, 87–95. https://doi.org/10.1016/j.jmsy.2018.01.011
Martínez-Díaz, M., & Soriguera, F. (2018). Autonomous vehicles: Theoretical and practical challenges. Transportation Research Procedia, 33, 275–282. https://doi.org/10.1016/j.trpro.2018.10.103
McAfee, A., Brynjolfsson, E., & Davenport, T. (2012). Big data: The management revolution. Harvard Business Review, 90(10), 60–68.
Meidan, Y., Lerner, B., Rabinowitz, G., & Hassoun, M. (2011). Cycle-time key factor identification and prediction in semiconductor manufacturing using machine learning and data mining. IEEE Transactions on Semiconductor Manufacturing, 24(2), 237–248. https://doi.org/10.1109/tsm.2011.2118775
Mezzogori, D., & Zammori, F. (2019). An entity embeddings deep learning approach for demand forecast of highly differentiated products. Procedia Manufacturing, 39, 1793–1800. https://doi.org/10.1016/j.promfg.2020.01.260
Mezzogori, D., Romagnoli, G., & Zammori, F. (2020). Defining accurate delivery dates in make to order job-shops managed by workload control. Flexible Services and Manufacturing Journal, 1–36. https://doi.org/10.1007/s10696-020-09396-2
Mittal, S., Khan, M., Romero, D., & Wuest, T. (2016). Smart manufacturing: Characteristics and technologies. In International conference on product lifecycle management (pp. 539–548). Cham: Springer. https://doi.org/10.1007/978-3-319-54660-5_48
Mohammadi, P., & Wang, Z. J. (2016). Machine learning for quality prediction in abrasion-resistant material manufacturing process. In 2016 IEEE Canadian conference on electrical and computer engineering (CCECE). https://doi.org/10.1109/ccece.2016.7726783
Mönch, L., Zimmermann, J., & Otto, P. (2006). Machine learning techniques for scheduling jobs with incompatible families and unequal ready times on parallel batch machines. Engineering Applications of Artificial Intelligence, 19(3), 235–245. https://doi.org/10.1016/j.engappai.2005.10.001
Monostori, L. (2003). AI and machine learning techniques for managing complexity, changes and uncertainties in manufacturing. Engineering Applications of Artificial Intelligence, 16(4), 277–291. https://doi.org/10.1016/S0952-1976(03)00078-2
Montavon, G., Samek, W., & Müller, K.-R. (2018). Methods for interpreting and understanding deep neural networks. Digital Signal Processing: A Review Journal, 73, 1–15. https://doi.org/10.1016/j.dsp.2017.10.011
Mortazavi, A., Arshadi Khamseh, A., & Azimi, P. (2015). Designing of an intelligent self-adaptive model for supply chain ordering management system. Engineering Applications of Artificial Intelligence, 37, 207–220. https://doi.org/10.1016/j.engappai.2014.09.004
Müller, M. (2002). Computer go. Artificial Intelligence, 134(1–2), 145–179. https://doi.org/10.1016/S0004-3702(01)00121-7
Murphy, K. (2012). Machine learning: A probabilistic perspective. Cambridge: The MIT Press.
Nakata, K., Orihara, R., Mizuoka, Y., & Takagi, K. (2017). A comprehensive big-data-based monitoring system for yield enhancement in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 30(4), 339–344. https://doi.org/10.1109/tsm.2017.2753251
Nugroho, A. S., Kuroyanagi, S., & Iwata, A. (2002). A solution for imbalanced training sets problem by CombNET-II and its application to fog forecast. IEICE Transactions on Information and Systems, 85(7), 1165–1174.
Oh, Y., Busogi, M., Ransikarbum, K., Shin, D., Kwon, D., & Kim, N. (2019a). Real-time quality monitoring and control system using an integrated cost effective support vector machine. Journal of Mechanical Science and Technology, 33(12), 6009–6020. https://doi.org/10.1007/s12206-019-1145-9
Oh, Y., Ransikarbum, K., Busogi, M., Kwon, D., & Kim, N. (2019b). Adaptive SVM-based real-time quality assessment for primer-sealer dispensing process of sunroof assembly line. Reliability Engineering & System Safety, 184, 202–212. https://doi.org/10.1016/j.ress.2018.03.020
Palombarini, J., & Martínez, E. (2012). SmartGantt – An intelligent system for real time rescheduling based on relational reinforcement learning. Expert Systems with Applications, 39(11), 10251–10268. https://doi.org/10.1016/j.eswa.2012.02.176
Papananias, M., McLeay, T. E., Mahfouf, M., & Kadirkamanathan, V. (2019). A Bayesian framework to estimate part quality and associated uncertainties in multistage manufacturing. Computers in Industry, 105, 35–47. https://doi.org/10.1016/j.compind.2018.10.008
Penumuru, D. P., Muthuswamy, S., & Karumbu, P. (2019). Identification and classification of materials using machine vision and machine learning in the context of industry 4.0. Journal of Intelligent Manufacturing, 31(5), 1229–1241. https://doi.org/10.1007/s10845-019-01508-6
Peres, R. S., Barata, J., Leitao, P., & Garcia, G. (2019). Multistage quality control using machine learning in the automotive industry. IEEE Access, 7, 79908–79916. https://doi.org/10.1109/ACCESS.2019.2923405
Perzyk, M., Kochanski, A., Kozlowski, J., Soroczynski, A., & Biernacki, R. (2014). Comparison of data mining tools for significance analysis of process parameters in applications to process fault diagnosis. Information Sciences, 259, 380–392. https://doi.org/10.1016/j.ins.2013.10.019
Pham, D. T., & Afify, A. A. (2005). Machine-learning techniques and their applications in manufacturing. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 219(5), 395–412. https://doi.org/10.1243/095440505X32274
Prieto, M. D., Cirrincione, G., Espinosa, A. G., Ortega, J. A., & Henao, H. (2013). Bearing fault detection by a novel condition-monitoring scheme based on statistical-time features and neural networks. IEEE Transactions on Industrial Electronics, 60(8), 3398–3407. https://doi.org/10.1109/tie.2012.2219838
Priore, P., De La Fuente, D., Gomez, A., & Puente, J. (2001). Dynamic scheduling of manufacturing systems with machine learning. International Journal of Foundations of Computer Science, 12(06), 751–762. https://doi.org/10.1142/s0129054101000849
Priore, P., de la Fuente, D., Puente, J., & Parreño, J. (2006). A comparison of machine-learning algorithms for dynamic scheduling of flexible manufacturing systems. Engineering Applications of Artificial Intelligence, 19(3), 247–255. https://doi.org/10.1016/j.engappai.2005.09.009
Priore, P., Priore, J., Pino, R., Gomez, A., & Puente, J. (2010). Learning-based scheduling of flexible manufacturing systems using support vector machines. Applied Artificial Intelligence, 24(3), 194–209. https://doi.org/10.1080/08839510903549606
Priore, P., Ponte, B., Puente, J., & Gomez, A. (2018). Learning-based scheduling of flexible manufacturing systems using ensemble methods. Computers & Industrial Engineering, 126, 282–291. https://doi.org/10.1016/j.cie.2018.09.034
Priore, P., Ponte, B., Rosillo, R., & de la Fuente, D. (2019). Applying machine learning to the dynamic selection of replenishment policies in fast-changing supply chain environments. International Journal of Production Research, 57(11), 3663–3677. https://doi.org/10.1080/00207543.2018.1552369
Ravikumar, S., Ramachandran, K. I., & Sugumaran, V. (2011). Machine learning approach for automated visual inspection of machine components. Expert Systems with Applications, 38(4), 3260–3266. https://doi.org/10.1016/j.eswa.2010.09.012
Ren, L., Sun, Y., Cui, J., & Zhang, L. (2018). Bearing remaining useful life prediction based on deep autoencoder and deep neural networks. Journal of Manufacturing Systems, 48, 71–77. https://doi.org/10.1016/j.jmsy.2018.04.008
Ribeiro, B. (2005). Support vector machines for quality monitoring in a plastic injection molding process. IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 35(3), 401–410. https://doi.org/10.1109/tsmcc.2004.843228
Rude, D. J., Adams, S., & Beling, P. A. (2015). Task recognition from joint tracking data in an operational manufacturing cell. Journal of Intelligent Manufacturing, 29(6), 1203–1217. https://doi.org/10.1007/s10845-015-1168-8
Ruiz, E., Cuartas, M., Ferreno, D., Romero, L., Arroyo, V., & Gutierrez-Solana, F. (2019). Optimization of the fabrication of cold drawn steel wire through classification and clustering machine learning algorithms. IEEE Access, 7, 141689–141700. https://doi.org/10.1109/ACCESS.2019.2942957
Saqlain, M., Jargalsaikhan, B., & Lee, J. Y. (2019). A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 32(2), 171–182. https://doi.org/10.1109/TSM.2019.2904306
Saucedo-Espinosa, M. A., Escalante, H. J., & Berrones, A. (2014). Detection of defective embedded bearings by sound analysis: A machine learning approach. Journal of Intelligent Manufacturing, 28(2), 489–500. https://doi.org/10.1007/s10845-014-1000-x
Saxena, A., & Saad, A. (2007). Evolving an artificial neural network classifier for condition monitoring of rotating mechanical systems. Applied Soft Computing, 7(1), 441–454. https://doi.org/10.1016/j.asoc.2005.10.001
Scime, L., & Beuth, J. (2018). A multi-scale convolutional neural network for autonomous anomaly detection and classification in a laser powder bed fusion additive manufacturing process. Additive Manufacturing, 24, 273–286. https://doi.org/10.1016/j.addma.2018.09.034
Scime, L., & Beuth, J. (2019). Using machine learning to identify in-situ melt pool signatures indicative of flaw formation in a laser powder bed fusion additive manufacturing process. Additive Manufacturing, 25, 151–165. https://doi.org/10.1016/j.addma.2018.11.010
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626).
Shin, M., Ryu, K., & Jung, M. (2012). Reinforcement learning approach to goal-regulation in a self-evolutionary manufacturing system. Expert Systems with Applications, 39(10), 8736–8743. https://doi.org/10.1016/j.eswa.2012.01.207
Tan, S. C., Watada, J., Ibrahim, Z., & Khalid, M. (2015). Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects. IEEE Transactions on Neural Networks and Learning Systems, 26(5), 933–950. https://doi.org/10.1109/tnnls.2014.2329097
Shiue, Y.-R. (2009). Development of two-level decision tree-based real-time scheduling system under product mix variety environment. Robotics and Computer-Integrated Manufacturing, 25(4–5), 709–720. https://doi.org/10.1016/j.rcim.2008.06.002
Shiue, Y.-R., Guh, R.-S., & Lee, K.-C. (2011). Study of SOM-based intelligent multi-controller for real-time scheduling. Applied Soft Computing, 11(8), 4569–4580. https://doi.org/10.1016/j.asoc.2011.07.022
Shiue, Y.-R., Guh, R., & Lee, K. (2012). Development of machine learning-based real time scheduling systems: Using ensemble based on wrapper feature selection approach. International Journal of Production Research, 50(20), 5887–5905.
https://doi.org/ 10.1080/00207543.2011.636389 Silbernagel, C., Aremu, A., & Ashcroft, I. (2019). Using machine learning to aid in the parameter optimisation process for metal-based additive manufacturing. Rapid Prototyping Journal, 26(4), 625–637. https://doi.org/10.1108/RPJ-08-2019-0213 Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961 Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Hassabis, D. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv preprint arXiv:1712.01815. Simon, H. (1983). Machine learning: An artificial intelligence approach. Tioga Press. Sobie, C., Freitas, C., & Nicolai, M. (2018). Simulation-driven machine learning: Bearing fault classification. Mechanical Systems and Signal Processing, 99, 403–419. https:// doi.org/10.1016/j.ymssp.2017.06.025 Stathatos, E., & Vosniakos, G.-C. (2019). Real-time simulation for long paths in laser- based additive manufacturing: A machine learning approach. The International Journal of Advanced Manufacturing Technology, 104(5–8), 1967–1984. https://doi. org/10.1007/s00170-019-04004-6 Stocker, C., Schmid, M., & Reinhart, G. (2019). Reinforcement learning-based design of orienting devices for vibratory bowl feeders. The International Journal of Advanced Manufacturing Technology, 105(9), 3631–3642. https://doi.org/10.1007/s00170- 019-03798-9 Stoyanov, S., Ahsan, M., Bailey, C., Wotherspoon, T., & Hunt, C. (2019). Predictive analytics methodology for smart qualification testing of electronic components. Journal of Intelligent Manufacturing, 30(3), 1497–1514. https://doi.org/10.1007/ s10845-018-01462-9 Sun, J., Rahman, M., Wong, Y., & Hong, G. (2004). Multiclassification of tool wear with support vector machine by manufacturing loss consideration. 
International Journal of Machine Tools and Manufacture, 44(11), 1179–1187. https://doi.org/10.1016/j. ijmachtools.2004.04.003 Susto, G. A., Schirru, A., Pampuri, S., McLoone, S., & Beghi, A. (2015). Machine learning for predictive maintenance: A multiple classifier approach. IEEE Transactions on Industrial Informatics, 11(3), 812–820. https://doi.org/10.1109/tii.2014.2349359 Sutton, R., & Barto, A. (1998). Reinforcement learning: An introduction. Cambridge: MIT press. Syafrudin, M., Alfian, G., Fitriyani, N., & Rhee, J. (2018). Performance Analysis of IoT- based sensor, big data processing, and machine learning model for real-time monitoring system in automotive manufacturing. Sensors, 18(9), 2946. https://doi. org/10.3390/s18092946 Tan, Q., Tong, Y., Wu, S., & Li, D. (2019). Modeling, planning, and scheduling of shop- floor assembly process with dynamic cyber-physical interactions: A case study for CPS-based smart industrial robot production. The International Journal of Advanced Manufacturing Technology, 105(9), 3979–3989. https://doi.org/10.1007/s00170- 019-03940-7 Tsutsui, T., & Matsuzawa, T. (2019). Virtual metrology model robustness against chamber condition variation using deep learning. IEEE Transactions on Semiconductor Manufacturing, 32(4), 428–433. https://doi.org/10.1109/TSM.6610.1109/ TSM.2019.2931328 Tulsyan, A., Garvin, C., & Ündey, C. (2018). Advances in industrial biopharmaceutical batch process monitoring: Machine-learning methods for small data problems. Biotechnology and Bioengineering, 115(8), 1915–1924. https://doi.org/10.1002/bit. v115.810.1002/bit.26605 Tušar, T., Gantar, K., Koblar, V., Ženko, B., & Filipič, B. (2017). A study of overfitting in optimization of a manufacturing quality control procedure. Applied Soft Computing, 59, 77–87. https://doi.org/10.1016/j.asoc.2017.05.027 M. Bertolini et al. 
Van Hasselt, H., Guez, A., & Silver, D. (2015). Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 5998–6008.
Villegas, M. A., Pedregal, D. J., & Trapero, J. R. (2018). A support vector machine for model selection in demand forecasting applications. Computers & Industrial Engineering, 121, 1–7. https://doi.org/10.1016/j.cie.2018.04.042
Waller, M. A., & Fawcett, S. E. (2013). Data science, predictive analytics, and big data: A revolution that will transform supply chain design and management. Journal of Business Logistics, 34(2), 77–84.
Wan, J., Tang, S., Li, D., Wang, S., Liu, C., Abbas, H., & Vasilakos, A. V. (2017). A manufacturing big data solution for active preventive maintenance. IEEE Transactions on Industrial Informatics, 13(4), 2039–2047. https://doi.org/10.1109/tii.2017.2670505
Wang, J., Yan, J., Li, C., Gao, R. X., & Zhao, R. (2019). Deep heterogeneous GRU model for predictive analytics in smart manufacturing: Application to tool wear prediction. Computers in Industry, 111, 1–14. https://doi.org/10.1016/j.compind.2019.06.001
Wang, P., Liu, H., Wang, L., & Gao, R. X. (2018). Deep learning-based human motion recognition for predictive context-aware human-robot collaboration. CIRP Annals, 67(1), 17–20. https://doi.org/10.1016/j.cirp.2018.04.066
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis. England: University of Cambridge.
Widodo, A., & Yang, B.-S. (2007). Support vector machine in machine condition monitoring and fault diagnosis. Mechanical Systems and Signal Processing, 21(6), 2560–2574. https://doi.org/10.1016/j.ymssp.2006.12.007
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82. https://doi.org/10.1109/4235.585893
Wu, Q. (2010). Product demand forecasts using wavelet kernel support vector machine and particle swarm optimization in manufacture system. Journal of Computational and Applied Mathematics, 233(10), 2481–2491. https://doi.org/10.1016/j.cam.2009.10.030
Wu, H., Yu, Z., & Wang, Y. (2019). Experimental study of the process failure diagnosis in additive manufacturing based on acoustic emission. Measurement, 136, 445–453. https://doi.org/10.1016/j.measurement.2018.12.067
Wuest, T., Irgens, C., & Thoben, K.-D. (2013). An approach to monitoring quality in manufacturing using supervised machine learning on product state data. Journal of Intelligent Manufacturing, 25(5), 1167–1180. https://doi.org/10.1007/s10845-013-0761-y
Wuest, T., Weimer, D., Irgens, C., & Thoben, K. D. (2016). Machine learning in manufacturing: Advantages, challenges, and applications. Production & Manufacturing Research, 4(1), 23–45. https://doi.org/10.1080/21693277.2016.1192517
Yacob, F., Semere, D., & Nordgren, E. (2019). Anomaly detection in Skin Model Shapes using machine learning classifiers. The International Journal of Advanced Manufacturing Technology, 105(9), 3677–3689. https://doi.org/10.1007/s00170-019-03794-z
Xu, L. D., Xu, E. L., & Li, L. (2018). Industry 4.0: State of the art and future trends. International Journal of Production Research, 56(8), 2941–2962. https://doi.org/10.1080/00207543.2018.1444806
Yang, W.-A., & Zhou, W. (2015). Autoregressive coefficient-invariant control chart pattern recognition in autocorrelated manufacturing processes using neural network ensemble. Journal of Intelligent Manufacturing, 26(6), 1161–1180. https://doi.org/10.1007/s10845-013-0847-6
Yang, Y., Lou, Y., Gao, M., & Ma, G. (2018). An automatic aperture detection system for LED cup based on machine vision. Multimedia Tools and Applications, 77(18), 23227–23244. https://doi.org/10.1007/s11042-018-5639-8
Yu, J. (2019). Enhanced stacked denoising autoencoder-based feature learning for recognition of wafer map defects. IEEE Transactions on Semiconductor Manufacturing, 32(4), 613–624. https://doi.org/10.1109/TSM.2019.2940334
Yu, J., Zheng, X., & Wang, S. (2019). A deep autoencoder feature learning method for process pattern recognition. Journal of Process Control, 79, 1–15. https://doi.org/10.1016/j.jprocont.2019.05.002
Yuan, B., Guss, G. M., Wilson, A. C., Hau-Riege, S. P., DePond, P. J., McMains, S., … Giera, B. (2018). Machine-learning-based monitoring of laser powder bed fusion. Advanced Materials Technologies, 3(12), 1800136. https://doi.org/10.1002/admt.201800136
Zan, T., Liu, Z., Wang, H., Wang, M., & Gao, X. (2019). Control chart pattern recognition using the convolutional neural network. Journal of Intelligent Manufacturing, 31(3), 703–716. https://doi.org/10.1007/s10845-019-01473-0
Zarandi, M. H. F., Moosavi, S. V., & Zarinbal, M. (2012). A fuzzy reinforcement learning algorithm for inventory control in supply chains. The International Journal of Advanced Manufacturing Technology, 65(1–4), 557–569. https://doi.org/10.1007/s00170-012-4195-z
Zhang, J., Wang, P., & Gao, R. X. (2019a). Deep learning-based tensile strength prediction in fused deposition modeling. Computers in Industry, 107, 11–21. https://doi.org/10.1016/j.compind.2019.01.011
Zhang, X., Chen, W., Wang, B., & Chen, X. (2015). Intelligent fault diagnosis of rotating machinery using support vector machine with ant colony algorithm for synchronous feature selection and parameter optimization. Neurocomputing, 167, 260–279. https://doi.org/10.1016/j.neucom.2015.04.069
Zhang, Y., Harik, R., Fadel, G., & Bernard, A. (2019b). A statistical method for build orientation determination in additive manufacturing. Rapid Prototyping Journal, 25(1), 187–207. https://doi.org/10.1108/rpj-04-2018-0102
Zhu, Z., Anwer, N., Huang, Q., & Mathieu, L. (2018). Machine learning in tolerancing for additive manufacturing. CIRP Annals, 67(1), 157–160. https://doi.org/10.1016/j.cirp.2018.04.119