Data Handling in Science and Technology
Volume 32

Hyperspectral Imaging

Series Editor: José Manuel Amigo

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, Netherlands
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2020 Elsevier B.V. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the Publisher. Details on how to seek permission, further information about the Publisher's permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-444-63977-6
ISSN: 0922-3487

For information on all Elsevier publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Susan Dennis
Acquisition Editor: Kathryn Eryilmaz
Editorial Project Manager: Vincent Gabrielle
Production Project Manager: Divya KrishnaKumar
Cover Designer: Matthew Limbert

Typeset by TNQ Technologies

Contributors

Nuria Aleixos Departamento de Ingeniería Gráfica, Universitat Politècnica de València, València, Spain
José Manuel Amigo Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, University of Copenhagen, Denmark
Josselin Aval ONERA/DOTA, Université de Toulouse, Toulouse, France
Touria Bajjouk Laboratoire d'Ecologie Benthique Côtière (PDG-ODE-DYNECO-LEBCO), Brest, France
Jean-Baptiste Barré Univ. Grenoble Alpes, Irstea, LESSEM, Grenoble, France
Jon Atli Benediktsson University of Iceland, Reykjavik, Iceland
Jose Blasco Centro de Agroingeniería, Instituto Valenciano de Investigaciones Agrarias (IVIA), Valencia, Spain
Johan Bøtker Department of Pharmacy, University of Copenhagen, Copenhagen, Denmark
Xavier Briottet ONERA/DOTA, Université de Toulouse, Toulouse, France
Ingunn Burud Faculty of Science and Technology, Norwegian University of Life Sciences NMBU, Norway
Daniel Caballero Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, Faculty of Science, Copenhagen, Denmark; Computer Science Department, Research Institute of Meat and Meat Product (IproCar), University of Extremadura, Cáceres, Spain
Rosalba Calvini Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
Gustau Camps-Valls Image Processing Laboratory (IPL), Universitat de València, València, Spain
Véronique Carrère Laboratoire de Planétologie et Géodynamique (LPG), Nantes, France
Andrea Casini Istituto di Fisica Applicata "Nello Carrara" - National Research Council (IFAC-CNR), Sesto Fiorentino (Florence), Italy
Jocelyn Chanussot Gipsa-lab, Grenoble INP, Grenoble, Rhône-Alpes, France; Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), GIPSA-lab, Grenoble, France
Sergio Cubero Centro de Agroingeniería, Instituto Valenciano de Investigaciones Agrarias (IVIA), Valencia, Spain
Costanza Cucci Istituto di Fisica Applicata "Nello Carrara" - National Research Council (IFAC-CNR), Sesto Fiorentino (Florence), Italy
Mauro Dalla Mura Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), GIPSA-lab, Grenoble, France; Tokyo Tech World Research Hub Initiative (WRHI), School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Michele Dalponte Department of Sustainable Agro-ecosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy
Florian de Boissieu TETIS, Irstea, AgroParisTech, CIRAD, CNRS, Université Montpellier, Montpellier, France
Anna de Juan Chemometrics group, Department of Chemical Engineering and Analytical Chemistry, Universitat de Barcelona (UB), Barcelona, Spain
Sylvain Douté Institut de Planétologie et d'Astrophysique de Grenoble (IPAG), Grenoble, France; Météo-France-CNRS, CNRM/CEN, Saint Martin d'Hères, France
Lucas Drumetz Lab-STICC, IMT Atlantique, Brest, Brittany, France
Marie Dumont Météo-France-CNRS, CNRM/CEN, Saint Martin d'Hères, France
Sophie Fabre ONERA/DOTA, Université de Toulouse, Toulouse, France
Nicola Falco Lawrence Berkeley National Laboratory, Berkeley, CA, United States
Baowei Fei Department of Bioengineering, The University of Texas at Dallas, Richardson, TX, United States; Department of Radiology, The University of Texas Southwestern Medical Center, Dallas, TX, United States; Advanced Imaging Research Center, The University of Texas Southwestern Medical Center, Dallas, TX, United States
Jean-Baptiste Féret TETIS, Irstea, AgroParisTech, CIRAD, CNRS, Université Montpellier, Montpellier, France
João Fortuna Idletechs AS, Trondheim, Norway; Department of Engineering Cybernetics, Norwegian University of Science and Technology NTNU, Trondheim, Norway
Pierre-Yves Foucher ONERA/DOTA, Université de Toulouse, Toulouse, France
Neal B. Gallagher Chemometrics, Eigenvector Research, Inc., Manson, WA, United States
Luis Gómez-Chova Image Processing Laboratory (IPL), Universitat de València, València, Spain
Aoife Gowen UCD School of Biosystems and Food Engineering, University College of Dublin (UCD), Belfield, Dublin, Ireland
Silvia Grassi Department of Food, Environmental and Nutritional Sciences (DeFENS), Università degli Studi di Milano, Milano, Italy
Ana Herrero-Langreo UCD School of Biosystems and Food Engineering, University College of Dublin (UCD), Belfield, Dublin, Ireland
Christian Jutten Gipsa-lab, Université Grenoble Alpes, Grenoble, Rhône-Alpes, France
Xudong Kang Hunan University, College of Electrical and Information Engineering, Hunan, China
Tatiana Konevskikh Faculty of Science and Technology, Norwegian University of Life Sciences NMBU, Norway; Department of Fundamental Mathematics, Perm State University PSU, Perm, Russia
Valero Laparra Image Processing Laboratory (IPL), Universitat de València, València, Spain
Anthony Laybros AMAP, IRD, CNRS, CIRAD, INRA, Univ. Montpellier, Montpellier, France
Shutao Li Hunan University, College of Electrical and Information Engineering, Hunan, China
Giorgio Antonino Licciardi E. Amaldi Foundation, Rome, Italy
Federico Marini Department of Chemistry, University of Rome La Sapienza, Roma, Italy
Rodolphe Marion CEA/DAM/DIF, Arpajon, France
Harald Martens Idletechs AS, Trondheim, Norway; Department of Engineering Cybernetics, Norwegian University of Science and Technology NTNU, Trondheim, Norway
Gabriel Martín Instituto de Telecomunicações, Lisbon, Portugal
Luca Martino Image Processing Laboratory (IPL), Universitat de València, València, Spain
Théo Masson Univ. Grenoble Alpes, CNRS, Grenoble INP, Institute of Engineering Univ. Grenoble Alpes, GIPSA-Lab, Grenoble, France
Gonzalo Mateo-García Image Processing Laboratory (IPL), Universitat de València, València, Spain
Audrey Minghelli Laboratoire des Sciences de l'Information et des Systèmes (LSIS), University of South Toulon Var ISITV, La Valette, France
Jean-Matthieu Monnet Univ. Grenoble Alpes, Irstea, LESSEM, Grenoble, France
Pascal Mouquet Saint Leu, La Réunion, France
Sandra Munera Centro de Agroingeniería, Instituto Valenciano de Investigaciones Agrarias (IVIA), Valencia, Spain
Jordi Muñoz-Marí Image Processing Laboratory (IPL), Universitat de València, València, Spain
José Nascimento ISEL - Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Lisbon, Portugal; Instituto de Telecomunicações, Lisbon, Portugal
Hodjat Rahmati Idletechs AS, Trondheim, Norway
Jukka Rantanen Department of Pharmacy, University of Copenhagen, Copenhagen, Denmark
Carolina Santos Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
Amalia G.M. Scannell UCD Institute of Food and Health, University College of Dublin (UCD), Belfield, Dublin, Ireland; UCD Center for Food Safety, University College of Dublin (UCD), Belfield, Dublin, Ireland; UCD School of Agriculture and Food Science, University College of Dublin (UCD), Belfield, Dublin, Ireland
Frédéric Schmidt GEOPS, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France
Petter Stefansson Faculty of Science and Technology, Norwegian University of Life Sciences NMBU, Norway
Daniel H. Svendsen Image Processing Laboratory (IPL), Universitat de València, València, Spain
Irina Torres Department of Bromatology and Food Technology, University of Córdoba, Campus of Rabanales, Córdoba, Spain
Eduardo Tusa Univ. Grenoble Alpes, Irstea, LESSEM, Grenoble, France; Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), GIPSA-lab, Grenoble, France; Universidad Técnica de Machala, Facultad de Ingeniería Civil, AutoMathTIC, Machala, Ecuador
Alessandro Ulrici Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
Kuniaki Uto School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Jochem Verrelst Image Processing Laboratory (IPL), Universitat de València, València, Spain
Grégoire Vincent AMAP, IRD, CNRS, CIRAD, INRA, Univ. Montpellier, Montpellier, France
Gemine Vivone Department of Information Engineering, Electrical Engineering and Applied Mathematics, University of Salerno, Salerno, Italy
Christiane Weber TETIS, Irstea, AgroParisTech, CIRAD, CNRS, Université Montpellier, Maison de la Télédétection, Montpellier, France
Jian X. Wu Oral Formulation Research, Novo Nordisk A/S, Denmark
Junshi Xia RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

List of Figures

Chapter 1.1
Figure 1 Confocal laser scanning microscopy image of a yogurt sample. The image is one of the images used in Ref. [2]. The right part of the figure shows the histogram of the grayscale values and the result of applying two different threshold values (24 for protein and 100 for microparticulated whey protein).
Figure 2 Real color picture in RGB color space of a butterfly, and representation of the red, green, and blue monochannels using the color map displayed in the right part of the figure. The two images at the bottom are the transformed image in L*a*b* color space.
Figure 3 Multispectral image of a 10-euro banknote (top left). The top right part shows the intensities of the 19 different wavelengths for two pixels. The bottom part shows different single-channel pictures extracted for eight channels.
Figure 4 Representation of the image of a cookie measured with a hyperspectral camera in the wavelength range of 940–1600 nm (near-infrared, NIR) with a spectral resolution of 4 nm. The spectra obtained in two pixels are shown, together with the false color image (single-channel image) obtained at 1475 nm. The selected single-channel image highlights three areas of the cookie where water was intentionally added. This water is invisible in the VIS region (nothing can be seen in the real color picture). Nevertheless, water is one of the main elements that can be spotted in the NIR.
Figure 5 Comprehensive flowchart of the analysis of hyperspectral and multispectral images.
ANN, artificial neural networks; AsLS, asymmetric least squares; CLS, classical least squares; EMSC, extended multiplicative scatter correction; FSW-EFA, fixed size window-evolving factor analysis; ICA, independent component analysis; LDA, linear discriminant analysis; MCR, multivariate curve resolution; MLR, multiple linear regression; MSC, multiplicative scatter correction; NLSU, nonlinear spectral unmixing; OPA, orthogonal projection approaches; PCA, principal component analysis; PLS-DA, partial least squares-discriminant analysis; PLS, partial least squares; SIMCA, soft independent modeling of class analogy; SIMPLISMA, simple-to-use interactive self-modeling mixture analysis; SNV, standard normal variate; SVM, support vector machine; WLS, weighted least squares.

Chapter 1.2
Figure 1 Comparison of the basic setups for conventional imaging, hyperspectral and multispectral cameras, and conventional spectroscopy.
Figure 2 Light–matter interaction. Depiction of the physical and chemical effects that a photon might undergo when it interacts with matter.
Figure 3 RGB picture of part of a hand and the corresponding image taken at 950 nm.
Figure 4 Point, line, and plane scan configurations in hyperspectral and multispectral (only the plane scan) imaging devices and the structure of the final data cube of dimensions X × Y × λ.
Figure 5 Spectral irradiance spectra of terrestrial (atmospheric) and extraterrestrial light.
Figure 6 Different geometries that the light sources can adopt, highlighting the emitted and reflected (or transmitted) light path, and how light interacts with the sample.
Figure 7 Black, reflectance of Spectralon reference material for camera calibration. Red, energy emission of a tungsten halogen lamp at 3300 K. Green, emission of a green LED. Dark red, behavior of a band-pass filter at 1200 nm. The Y-axis scale belongs only to the reflectance of Spectralon.
Figure 8 Example of printable checkerboards (the USAF 1951 and a customized one) used for line mapping and plane scan HSI and MSI cameras. HSI, hyperspectral imaging; MSI, multispectral imaging.

Chapter 2.1
Figure 1 Original and distorted aerial images of houses.
Figure 2 Left, false RGB of a hyperspectral image showing four different clusters of plastic pellets. The spectral range was from 940 to 1640 nm with a spectral resolution of 4.85 nm. Further information about the acquisition, type of hyperspectral camera, and calibration can be found in Ref. [3]. Right, raw spectra of the selected pixels (marked with an "x" in the left figure). The black line in the left figure indicates a missing scan line.
Figure 3 Top, distorted image (dashed squares) and corrected image (solid squares). The pixel of interest is highlighted in bold. Bottom, three of the methodologies for pixel interpolation, highlighting in each one the pixels of the distorted image used for the interpolation.
Figure 4 Left, spectrum containing one spiked point. The continuous red line denotes the mean of the spectrum. The upper and lower dashed red lines denote the mean ± six times the standard deviation. Right, corrected spectrum where the spike has been localized and its value substituted by an average of the neighboring spectral values.
Figure 5 Top, raw spectra of Fig. 2 and the spectra after different spectral preprocessing methods. Bottom, the image resulting at 1220 nm for each spectral preprocessing.
Figure 6 Example of spectral preprocessing for minimizing the impact of the shape of the sample on the spectra.
Top left, RGB image of a nectarine. Top middle, the average signal of the pixels contained in the green line of the top left figure. Top right, spectra of the green line of the top left figure showing the effect of the curvature. Bottom middle, average signal of the preprocessed spectra (with standard normal variate, SNV) of the green line of the top left figure. Bottom right, preprocessed spectra of the green line of the top left figure.
Figure 7 Depiction of three different methodologies for background removal. Left, false RGB image. Top, K-means analysis of the hyperspectral image in SNV and the selection of clusters 2 and 4 to create the mask. Middle, false color image obtained at 971 nm of the hyperspectral image in SNV and the result of applying a proper threshold to create the mask. Bottom, PCA scatter plot of the hyperspectral image in SNV with the selected pixels highlighted in red to create the mask. All the analyses have been made using HYPER-Tools [28], freely downloadable from Ref. [29]. SNV, standard normal variate; PCA, principal component analysis.
Figure 8 Comparison of two PCA models performed on the hyperspectral image of the plastics [3]. For each PCA model, the score surfaces of the first two PCs, the scatter plot of PC1 versus PC2, and the corresponding loadings are shown. All the analyses have been made using HYPER-Tools [28], freely downloadable from Ref. [29]. PCA, principal component analysis; PC, principal component.

Chapter 2.2
Figure 1 Vector quantization compression block diagram.
Figure 2 Predictive coding compression block diagram.
Figure 3 Transform-based compression block diagram.

Chapter 2.3
Figure 1 An example of spectral distortion for component substitution approaches (see, e.g., the river area on the right side of the images). An image acquired by the IKONOS sensor over the city of Toulouse is fused: (A) ground truth (reference image) and the fusion products using (B) the Gram–Schmidt (GS) and (C) the GS adaptive (GSA) approaches. A greater spatial distortion can be pointed out in the case of GS, where a lower similarity between the panchromatic and the multispectral spatial (intensity) component is shown with respect to the GSA case.
Figure 2 An example of spectral distortion for component substitution approaches. An image acquired by the IKONOS sensor over the city of Toulouse is fused. Error maps between the ground truth (reference image) and the fusion products using (A) the Gram–Schmidt (GS) and (B) the GS adaptive (GSA) approaches. A greater spatial distortion can be pointed out in the case of GS, where a lower similarity between the panchromatic and the multispectral spatial (intensity) component is shown with respect to the GSA case.
Figure 3 An example of the spectral distortion reduction due to the histogram matching for component substitution approaches (see, e.g., the river area on the right side of the images). An image acquired by the IKONOS sensor over the city of Toulouse is fused: (A) ground truth (reference image) and the fusion products using (B) principal component substitution (PCS) without histogram matching and (C) PCS with histogram matching. A greater spatial distortion can be pointed out in the case of PCS without histogram matching with respect to the same procedure including the histogram-matching processing step.
Figure 4 An example of the spectral distortion reduction due to the histogram matching for component substitution approaches.
An image acquired by the IKONOS sensor over the city of Toulouse is fused. Error maps between the ground truth (reference image) and the fusion products using (A) the principal component substitution (PCS) without histogram matching and (B) the PCS with histogram matching. A greater spatial distortion can be pointed out in the case of PCS without histogram matching with respect to the same procedure including the histogram-matching processing step.
Figure 5 Flowchart presenting the blocks of a generic component substitution pansharpening procedure. HSR, higher spectral resolution; LPF, low-pass filter.
Figure 6 Flowchart of a generic multiresolution analysis pansharpening approach. HSR, higher spectral resolution.
Figure 7 Reduced resolution Hyp-ALI data set (Red = band 30, Green = band 20, Blue = band 14): (A) ground truth; (B) EXP; (C) principal component substitution; (D) Gram–Schmidt; (E) Gram–Schmidt adaptive; (F) smoothing filter-based intensity modulation; (G) modulation transfer function-generalized Laplacian pyramid; (H) modulation transfer function-generalized Laplacian pyramid with high-pass modulation.
Figure 8 Close-ups of the full resolution Hyp-ALI data set (Red = band 30, Green = band 20, Blue = band 14): (A) panchromatic; (B) EXP; (C) principal component substitution; (D) Gram–Schmidt; (E) Gram–Schmidt adaptive; (F) smoothing filter-based intensity modulation; (G) modulation transfer function-generalized Laplacian pyramid; (H) modulation transfer function-generalized Laplacian pyramid with high-pass modulation.

Chapter 2.4
Figure 1 Exploration of a Raman image of an emulsion [1,2]. Left, false color image of a Raman hyperspectral image. Top right, 30 random spectra taken from the image; bottom right, the corresponding images obtained for some selected wavelengths.
Figure 2 Graphical representation of a principal component analysis model of a hyperspectral sample containing two chemical compounds.
Figure 3 Principal component analysis (PCA) model of the emulsion sample. Top left, the false color image. Top right, the first four PCs with the corresponding explained variance. Bottom left, a composite image using the PC1, PC2, and PC3 score surfaces as the RGB channels. Bottom right, the loadings corresponding to the first four PCs. PC, principal component.
Figure 4 Principal component analysis (PCA) model of a multispectral image of a 10-euro banknote. Top left, the true color (RGB) image. Bottom left, a composite image using the PC1, PC2, and PC3 score surfaces as the RGB channels. Middle, the first four PCs with the corresponding explained variance. Right, the loadings corresponding to the first four PCs. PC, principal component.
Figure 5 Sign ambiguity shown in the example of Fig. 2. Left shows the result obtained in the analysis in Fig. 2. Right shows the same result, but multiplied by −1. PC, principal component.
Figure 6 PC1 versus PC2 density scatter plot of the multispectral image of the 10-euro banknote, the selection of four different pixel regions in the scatter plot, and their position in the sample. PC, principal component.
Figure 7 Principal component analysis (PCA) models performed on a hyperspectral image of a tablet (further information in Ref. [11]). For every line the first four PCs and the corresponding loadings are shown. (A) PCA model of the whole surface. (B) PCA model of the coatings and the core of the tablet. (C) PCA model of only the core of the tablet.
(D) PCA model of only the coatings of the tablet. PC, principal component.
Figure 8 Schematic representation of a dendrogram for a simulated data set involving 37 objects from three clusters with different within-group variance. Complete linkage was used as the metric, and the resulting hierarchical tree shows the progressive agglomeration of the groups from individual sample clusters up to the last step, when all objects are grouped into a single cluster.
Figure 9 K-means clustering of the banknote using four, five, and six clusters. Top, cluster assignation by colors. Bottom, the corresponding centroids.
Figure 10 Cluster analysis of a mixture composed of ibuprofen and starch. Top, K-means models with 2, 3, and 10 clusters with the corresponding centroids. Bottom, fuzzy clustering model with two clusters and the corresponding centroids.

Chapter 2.5
Figure 1 Image cube and bilinear model.
Figure 2 (A) Bilinear model of a 4D image formed by three spatial dimensions (x, y, and z) and one spectral dimension. (B) Trilinear model of a 4D image formed by two spatial dimensions (x and y) and two spectral (excitation/emission) dimensions.
Figure 3 Principal component analysis (PCA) model (top plot) and multivariate curve resolution (MCR) model (bottom plot) from a Raman emulsion image.
Figure 4 (A) Fixed size image window-evolving factor analysis application to a hyperspectral image. Principal component analysis (PCA) analyses and local rank map. (B) Combination of local rank and reference spectral information to obtain masks of absent components in pixels (in red). These absences are used as local rank constraints in multivariate curve resolution analysis.
Figure 5 (A) Four-dimensional excitation-emission fluorescence measurement image structured as a data matrix. (B) Implementation of the trilinearity constraint in the S^T matrix of emission spectra signatures. PCA, principal component analysis.
Figure 6 Multiset structures and bilinear models for (A) several images obtained with the same spectroscopic platform and (B) a single image obtained with several platforms.
Figure 7 Multivariate curve resolution results (maps and spectral signatures) obtained from a multiset analysis of ink images obtained at different depths in a document. The sequence of use of the inks can be seen from the distribution maps (Pilot BPG is more dominant in the upper layers in the ink intersection and crosses over Pilot BAB).
Figure 8 Incomplete multiset used to couple images obtained from different spectroscopic platforms with different spatial resolutions.
Figure 9 Multivariate curve resolution results obtained from the analysis of an incomplete multiset formed by Raman and FT-IR images from a sample of tonsil tissue. FT-IR, Fourier-transform infrared.
Figure 10 (A) Image fusion of 3D and 4D excitation-emission fluorescence measurement images.
Figure 11 (A) Maps and resolved spectra for a kidney stone Raman image. (B) Segmentation maps and centroids obtained from raw image spectra and from multivariate curve resolution (MCR) scores.
Figure 12 Use of multivariate curve resolution scores for quantitative image analysis at a bulk image and local pixel level.
Figure 13 Heterogeneity information obtained from multivariate curve resolution (MCR) maps of compounds in a pharmaceutical formulation, see Ref. [37] (top plot). Constitutional heterogeneity represented by histograms obtained from map concentration values (middle plots).
Distributional heterogeneity represented by heterogeneity curves (bottom plots). AAS, acetylsalicylic acid.
Figure 14 Superresolution strategy based on the combination of multivariate curve resolution multiset analysis and superresolution applied to the resolved maps from a set of images slightly shifted from one another. MCR-ALS, multivariate curve resolution-alternating least squares.
Figure 15 Combination of multivariate curve resolution resampling and use of resolved spectral signatures to develop compound-specific PLS-DA or ASCA models.

Chapter 2.6
Figure 1 Illustration of the linear mixture model.
Figure 2 Illustration of different nonlinear scenarios. (A) Multilayered mixtures. (B) Intimate mixtures.
Figure 3 Schematic diagram of the radiative transfer model. IFOV, instantaneous field of view.
Figure 4 Color image corresponding to the 3D model used to generate the synthetic hyperspectral data sets. (A) Orchard image with only two endmembers (soil and trees) and (B) orchard data set considering three endmembers (soil, trees, and weeds).
Figure 5 Washington, DC, data set (band 50).
Figure 6 Abundance fractions. (Top) Grass; (center) trees; (bottom) shadows.
Figure 7 Endmember spectral signatures: anorthite, enstatite, and olivine.
Figure 8 Two-dimensional scatterplot of the intimate mixture. (A) Reflectance. (B) Average single-scattering albedo. True endmembers (circles), intimate mixtures (dots), endmember estimates by the nonlinear unmixing method (squares), simplex identification via split augmented Lagrangian (SISAL) endmember estimates (triangles), vertex component analysis (VCA) (stars).

Chapter 2.7
Figure 1 (A) Geometric interpretation of the linear mixing model (LMM) in the case of three endmembers (red dots). The axes represent a basis of the linear subspace spanned by the endmembers. A pixel that does not satisfy the usual LMM (x_k') is shown. (B) A nonlinear mixture of the three endmembers for pixel x_k'. (C) Spectral variability in an LMM framework.
Figure 2 (A) Concept of spectral bundles. (B) Geometric interpretation of using fully constrained least-squares unmixing (FCLSU) on the whole extracted dictionary. The red polytopes are the convex hulls of the different bundles. The yellow points are accessible endmembers when using FCLSU, whereas they were not extracted by the endmember extraction algorithm.
Figure 3 (A) Example of the construction of a binary partition tree (BPT). At each step of the merging process, the two most similar regions are merged. (B) Example of pruning of the BPT of (A).
Figure 4 (A) Geometric interpretation of local spectral unmixing. (B) The fluctuations of local endmembers around references (in green) are at the core of most computational models to address material variability. (C) A simple parametric model to deal with endmember variability (one free parameter). (D) A more complex model (two free parameters).
Figure 5 Acquisition angles for a given spatial location (red dot). The tangent plane at this point of the surface is in brown. The incidence angle is θ₀, the emergence angle is θ, and the angle between the projections of the sun and the sensor is the azimuthal angle, denoted as φ. γ is the phase angle. θ₀ and θ are defined with respect to the zenith, which is defined locally (at each point of the observed surface) as the normal vector to the observed surface at this point.
Figure 6 Geometric interpretation of the extended linear mixing model in the case of three endmembers. In blue are two data points, in red are the reference endmembers, and in green are the scaled versions for the two considered pixels. The simplex used in the linear mixing model is shown in dashed lines.
Figure 7 (A) An RGB representation of the Houston hyperspectral data set. (B) High-spatial-resolution color image acquired over the same area at a different time. (C) Associated LiDAR data, where black corresponds to 9.6 m and white corresponds to 46.2 m.
Figure 8 The abundance maps estimated by all algorithms for the Houston data set. The color scale goes from 0 (blue) to 1 (red). ELMM, extended linear mixing model; FCLSU, fully constrained least-squares unmixing; NCM, normal compositional model; PLMM, perturbed linear mixing model; SCLSU, scaled (partially) constrained least-squares unmixing.
Figure 9 Magnitude of the perturbed linear mixing model (PLMM) variability term (top row), the scaling factors estimated by scaled (partially) constrained least-squares unmixing (SCLSU) (middle row), and the proposed approach (bottom row). ELMM, extended linear mixing model.
Figure 10 Scatterplots of the results of the tested algorithms, represented using the first three principal components of the data. Data points are in blue, extracted endmembers are in red, and reference endmembers are in black (except for the bundles, where all the endmember candidates are in black). ELMM, extended linear mixing model; FCLSU, fully constrained least-squares unmixing; NCM, normal compositional model; PLMM, perturbed linear mixing model; SCLSU, scaled (partially) constrained least-squares unmixing.

Chapter 2.8
Figure 1 Development of the X matrix from different hyperspectral imaging or multispectral imaging samples by extracting the region of interest (RoI).
Figure 2 Raw image (A), and prediction maps of chlorophyll-a (B), chlorophyll-b (C), total chlorophyll (D), and carotenoids (E) obtained by applying the corresponding partial least squares models using optimal wavelengths on a randomly selected image.
Figure 3 Visualization of internal quality index (IQI) prediction using partial least squares and optimal wavelengths for different cultivars of nectarines.
Figure 4 Tenderness distribution maps for beef longissimus muscle from PLS-DA models using the mean spectrum of the whole rib eye area. SF50 and SF300b: the region of interest (RoI) is the rib eye area. SF300a: the RoI is the core position.
Figure 5 An overview of the overall process monitoring of roll compaction and tableting; the implementation of NIR-CI to gain information related to the physical or chemical properties of the intermediate or final product.
Figure 6 Active pharmaceutical ingredient (API) distribution map of tablets predicted by a partial least squares regression (PLS-R) model.

Chapter 2.9
Figure 1 (A) The model implied by Eq. (9), where the measured signal is x = x_c + c s + e. (B) The model implied by Eq. (10), where the measured signal is x = c_c x_c + c s + e. (C) The model implied by Eq. (10) with strict closure on the contributions.
Figure 2 (A) Scores image for principal component (PC) 1 for a near-infrared image of wheat gluten (no signal processing was used). (B) Scores on PC 2 versus PC 1 with approximate 95% and 99% confidence ellipses based on the assumption of normality. (C) Scores histograms for PC 1 (top) and PC 2 (bottom) compared to Gaussian distributions.
Figure 3 (Left) RGB image of the PCA scores on PCs 1, 2, and 3. (Right) Image of PCA Q residuals showing that Pixel 560 has high Q (bright yellow). Its measured spectrum is plotted in Fig. 4 (bottom right). PCA, principal component analysis; PC, principal component.
Figure 4 (Left) Contrasted image of target contributions. (Top right) Pixel Group A spectra. (Bottom right) Normalized spectrum for Pixel 560 compared to Pixel 383.
Figure 5 (Left) RGB image of the contributions (MCR scores). (Top right) Estimated normalized pure component spectra. (Bottom right) Scores profile for the white arrow in the image. In each image/graph: blue is Component 1 = major wheat gluten signal, red is Component 2 = minor wheat gluten signal (pixels interspersed), and green is Component 3 = melamine target.
Figure 6 (Left) Image of Lake Chelan and the surrounding area based on bands 3, 2, 1 (RGB) listed in Table 1. (Right) Binary image of pixels comprising signal primarily associated with water (yellow). Lake Chelan, the Chelan River, and four small regions (circled) were correctly classified as water.
Figure 7 (Left) Class Lawn detections (yellow) and other Class Green (dark blue). (Right) Class Cherries detections (yellow) and other Class Green (dark blue).
Figure 8 (Left) RGB image with an overlay of Class Lawn (green), Class Cherries (blue), Class Green (non-Lawn and non-Cherries) (yellow), and Class Bare Earth (red).

Chapter 2.10
Figure 1 The ensemble topologies. (A) Concatenation style; (B) parallel style.
Figure 2 Flowchart of rotation random forest-kernel principal component analysis (RoRF-KPCA). RF, random forest.
Figure 3 (A) Three-band color composite of the AVIRIS image. (B) Reference map.
Figure 4 Classification maps obtained by (A) random forest, (B) support vector machine, (C) rotation random forest-principal component analysis, (D) rotation random forest-kernel principal component analysis (RoRF-KPCA) with linear, (E) RoRF-KPCA with RBF, and (F) RoRF-KPCA with polynomial kernels.
Figure 5 Reduced AP (rAP) obtained by fusing the multiscale information extracted by a large AP built on a single input feature. Thickening and thinning profiles are the two components that compose the entire AP. The final rAP is composed of the original feature (middle), a feature for the thickening component (left), and one for the thinning component (right).
Figure 6 Overview of the Hyperion hyperspectral image over Sodankylä: (A) RGB true color composition; (B) area defined for the training (green) and test (red) sets.
Figure 7 Classification performance for the Hyperion hyperspectral image over Sodankylä: (A) reference map and (B) classification map obtained by using the proposed approach.
Figure 8 Schematic of (A) EPF-based feature extraction and (B) EPF-based probability optimization for classification of hyperspectral images. EPF, edge-preserving filtering.
Figure 9 The effect of edge-preserving filtering-based feature extraction. (A) Input hyperspectral band. (B) Filtered image.
Figure 10 The effect of edge-preserving filtering in probability optimization. (A) Input probability map. (B) Guidance image. (C) Filtered probability map.
Figure 11 Indian Pines data set. (A) Three-band color composite. (B) Reference data. (C) Class names.
Figure 12 Classification maps obtained by different methods on the Indian Pines data set using 1% of the available samples as training set: (A) support vector machine, overall accuracy (OA) = 52.96%; (B) extended multiattribute profile, OA = 68.71%; (C) guided filtering-based probability optimization, OA = 66.55%; (D) hierarchical guidance filtering, OA = 77.81%; (E) image fusion and recursive filtering, OA = 73.92%; (F) principal component analysis-based edge-preserving features, OA = 84.17%.
Figure 13 Classification maps obtained by different methods on the Indian Pines data set using 10% of the available samples as training set: (A) support vector machine, overall accuracy (OA) = 52.96%; (B) extended multiattribute profile, OA = 93.66%; (C) guided filtering-based probability optimization, OA = 93.36%; (D) hierarchical guidance filtering, OA = 96.89%; (E) image fusion and recursive filtering, OA = 97.77%; (F) principal component analysis-based edge-preserving features, OA = 98.91%.

Chapter 2.11
Figure 1 Fusion categories defined by five different authors.
Figure 2 Graphical representation of processes for illustrating fusion methods: (A) a unit of data symbolizes the spatial space and the type of information; (B) a block expresses the task for processing data and information; (C) an interaction arrow represents the inputs and outputs of processing blocks; (D) input simultaneity to a processing block.
Figure 3 Illustration of fusion at low level or observation level.
Figure 4 Fusion at medium level or feature level.
Figure 5 Fusion at high level or decision level.

Chapter 2.12
Figure 1 The experiment. (A) Illustration of the experimental setup used to measure the spectral reflectance and weight of a drying wood sample. (B) RGB rendering of the wood sample in the wet state (drying time = 0 h). (C) RGB rendering of the wood sample in the dry state (drying time = 21 h).
Figure 2 Overview of experimental data acquisition and modeling of hyperspectral video. (a) Input data: (2200 × 1070) pixels × 159 wavelength channels × 150 time points. (b–i) Model what is known about the input data: EMSC modeling of two-way input data for 159 wavelength channels at 353,100,000 pixels (2200 × 1070 × 150) × 159 wavelengths, and spatiotemporal averaging. (j–o) Model what is unknown: adaptive bilinear modeling of two-way residual data for 353,100,000 pixels × 159 wavelengths. EMSC, extended multiplicative signal correction; OTFP, on-the-fly processing.
Figure 3 Modeling the known: spectral and temporal structure of the parameters from extended multiplicative signal correction (EMSC). Left column shows the EMSC model spectra chosen for modeling apparent absorbance: (A) absorbance spectrum m for estimating optical path length, calculated as the average spectrum of the last (driest) image in the series; (B) constant "spectrum" for estimating baseline offset; (C) linear "spectrum" for estimating baseline slope; (D) dominant pigment spectrum Δs_WoodPigment, defined as the average difference between early- and latewood pixels in the last (driest) image in the series; (E) water spectrum Δs_Water. Right column shows the temporal development of all EMSC parameters (estimated at each point in time by averaging over all image pixels).
Figure 4 Weight vector used to assign weights to different wavelength regions during EMSC and OTFP. Red dotted line represents the measured signal-to-noise ratio. Dark solid line represents the smoothed S/N curve used as weight vector v in both EMSC and OTFP.
EMSC, extended multiplicative signal correction; OTFP, on-the-fly processing.
Figure 5 (A) Weight of the wood sample as a function of drying time. (B) Percentage of water in the wood sample as a function of drying time, calculated as water% = 100 × (w_wood − 245.46 g)/245.46. (C) Rate of change d(water%)/dt as a function of drying time. (D) Rate of change d(water%)/dt as a function of water%, with four local approximation lines. (E) ln(w_wood − 245.46 g) as a function of drying time. (F) ln(water%_wood) as a function of drying time, with three local approximation lines.
Figure 6 Apparent absorbance spectra from 10 typical pixels of the wood sample in wet condition (left) at t = 0 h and dry condition (right) at t = 21 h. The black dotted line represents the chosen reference spectrum m, which is the average of all pixels in the image taken after 21 h of drying. Top figures show spectra before EMSC preprocessing. Middle figures show spectra after EMSC preprocessing. Bottom figures show unmodeled spectral residuals after the EMSC modeling. Windows within the figures show a magnification of the 940–1005 nm region strongly associated with water absorption. EMSC, extended multiplicative signal correction.
Figure 7 Modeling the known: spatial structure of EMSC parameters. 2D visualization of fitted EMSC parameters in the wet wood sample (t = 0 h, upper row) and the dry wood sample (t = 21 h, lower row) for all parameters used in the EMSC model. EMSC, extended multiplicative signal correction.
Figure 8 Modeling the unknown: spectral and temporal structure of the parameters from adaptive bilinear modeling in the on-the-fly processing (OTFP) implementation. Left column shows the OTFP model spectra estimated for modeling of apparent absorbance: deweighted loadings for components 1–5. Right column shows the temporal development of the adaptive bilinear modeling parameters (estimated at each point in time by averaging over all image pixels).
Figure 9 Modeling the unknown: spatial structure of OTFP parameters. 2D visualization of reconstructed OTFP scores of the wet wood sample (t = 0 h, upper row) and the dry wood sample (t = 21 h, lower row) for the first five principal components (PCs).
Figure 10 How the variation at the different wavelengths was explained by the sequence of modeling steps: in the input data, after EMSC, and after OTFP PCs #1, #2, #3, #4, and #5. Left: residual standard deviations, statistically weighted. Right: residual standard deviations, deweighted. EMSC, extended multiplicative signal correction; OTFP, on-the-fly processing; PC, principal component.
Figure 11 Kinetic modeling of the hyperspectral video developments: analysis of the known and unknown temporal developments of the parameters from EMSC (left) and OTFP (right), averaged over all pixels, as first-order dynamic processes. Dotted: ln(normalized parameters); densely dotted points represent the data used in the least squares estimation of the kinetic parameters. Straight red lines: fitted model. EMSC, extended multiplicative signal correction; OTFP, on-the-fly processing; PC, principal component.

Chapter 2.13
Figure 1 Forward and inverse problems in remote sensing data modeling. RTM, radiative transfer model.
Figure 2 Statistical inverse modeling.
Figure 3 Leaf area index (LAI) map [m²/m²] processed by Gaussian process regression (GPR) using all 125 bands (top left), LAI map processed by PLS-GPR using five components (top right), and the associated GPR uncertainty estimates (σ), respectively (bottom). Relative differences in SD (σ) between GPR on all 125 bands and PLS-GPR are also provided (bottom right) [36].
Figure 4 Subsets of the S2 composite LAI_green and LAI_brown product [m²/m²] for the south of Toulouse, France (left), and the west of Valladolid, Spain (right). LAI, leaf area index.
Figure 5 (A) LAI and fAPAR MODIS 8-day time series of Spanish rice from 2003 to 2014. (B) Predictions made using individual, single-output models. (C) Predictions made using the ICM model. fAPAR, fraction of absorbed photosynthetically active radiation; LAI, leaf area index; MODIS, Moderate Resolution Imaging Spectroradiometer.
Figure 6 (A) Predictions with individual Gaussian processes using a covariance function with three terms: bias, linear, and Matérn kernel. (B) The same predictions using the linear model of coregionalization multioutput model. fAPAR, fraction of absorbed photosynthetically active radiation; LAI, leaf area index.
Figure 7 Projection of data onto the NDVI–LAI space, showing how different crop types tend to cluster together. LAI, leaf area index; NDVI, normalized difference vegetation index.
Figure 8 Different Gaussian process (GP) approximations to estimate surface temperature from infrared atmospheric sounding interferometer radiances: classical GPs, sparse spectrum GPs (SSGPs), and sparse approximation based on inducing variables (FITC). (Left) Root mean squared error (RMSE) as a function of the number of data points present in the kernel (m). (Center) Training time as a function of m. (Right) RMSE as a function of training time.
Figure 9 Scheme of an automatic emulator. RTM, radiative transfer model.
Figure 10 General sketch of the Automatic Emulation (AE) procedure. Top: the radiative transfer model g(y) (solid line) and its approximation ĝ_t(y) (dashed line). Bottom: the acquisition function A_t(y); its maximum suggests where a new node should be added.
Figure 11 RMSE on the test grid computed for emulators using different sampling methods. Each method is initialized with 50 points sampled with the LHS scheme, upon which 50 more are sampled. AMOGAPE, Automatic Multi-Output Gaussian Process Emulator; LHS, Latin hypercube sampling; RMSE, root mean squared error.

Chapter 3.1
Figure 1 Three-dimensional oblique view of a portion of the Russell dune on Mars. Information on the nature of the materials and on their texture, as well as on the active areas, is extracted from different types of imagery and represented using color coding.
Figure 2 Spectral endmembers extracted from the data set presented in Fig. 1 using different methods: (A) VCA (vertex component analysis), (B) BPSS (Bayesian positive source separation), (C) MVC-NMF (minimum volume constrained non-negative matrix factorization), and (D) spatial-VCA (vertex component analysis) (see Ref. [44] for more details).
Figure 3 Global distribution of coastal and inland aquatic ecosystems. Red indicates regions where water depth is less than 50 m and where land elevation is less than 50 m. Light violet to dark violet gives the concentration of inland wetlands, lakes, rivers, and other aquatic systems. Increased darkness means a greater percentage of areal coverage for inland aquatic ecosystems [48,49].
Figure 4 Example of spectra of the main materials of interest in the acquisition campaign.
Figure 5 From left to right: coral reef, hyperspectral image acquired over the Saint Gilles reef flat (Reunion Island, Indian Ocean), bathymetry estimation, classified coral vitality index distribution from low (orange) to high (blue-green) values, and evolution of the coral cover with areas of degradation in red and progression in blue.
Figure 6 True color atmospherically corrected hyperspectral images (R = 638 nm, G = 551 nm, B = 471 nm) of (top left) Boucan and (bottom left) Ermitage in Reunion Island, hyperspectral-derived bathymetry (top right), and RGB composition of unmixing results for coral, algae, and sand abundances corresponding, respectively, to the red, green, and blue channels (bottom right). Seagrass (absent in the outer reef area) is indirectly represented by the dark pixels (modified from Petit et al. [86]).
Figure 7 Typical minimal and maximal snow extent over the northern hemisphere.
Figure 8 (Top) Reflectance of the first seven bands of MODIS over an area of 100 by 80 km near Grenoble, France. Pixels covered by clouds are masked and reported in white. (Bottom) Reflectance of the seven materials considered as endmembers for spectral unmixing.
Figure 9 Results of a binary detection of the snow cover from a simple thresholding of the NDSI (left, snow in yellow), and fractional results obtained by spectral unmixing, following the approach of Pascal et al. [100]. NDSI, normalized difference snow index.
Figure 10 Example of biodiversity mapping results throughout the CICRA site located in lowland Amazonia. Panels are as follows: (A) natural color composite image from the CAO visible-to-shortwave infrared (VSWIR) imaging spectrometer; (B) α-diversity based on the Shannon index; and (C) β-diversity based on Bray–Curtis dissimilarity (no color scale is applicable; a larger Bray–Curtis dissimilarity between two plots corresponds to larger differences in color in the RGB space between the two corresponding pixels).
Figure 11 Simulation of airborne optical imaging in a tropical forest from French Guiana (Paracou). The simulation is based on the integration of airborne LiDAR and field spectroscopy into the 3D radiative transfer model DART [163]. Three-dimensional mockups were computed from airborne LiDAR point clouds using AMAPvox [164], and leaf optical properties corresponding to sampled trees (delineated in red) were assigned to leaf elements from all voxels in the vertical column of the mockup. A generic set of leaf optical properties was applied to all trees with undocumented leaf optical properties using the PROSPECT leaf model [127]. Left: original image (red = 640 nm; green = 549 nm; blue = 458 nm); center: simulation using a turbid representation for leaf elements; right: simulation using a triangle approach for leaf elements.

Chapter 3.2
Figure 1 Example of spectral signatures for the tree species considered in this study.
Figure 2 Airborne data: (A) visible near-infrared, (B) short-wavelength infrared, (C) panchromatic, (D) digital surface model.
Figure 3 Results: (A) visible near-infrared, (B) short-wavelength infrared, (C) panchromatic, (D) digital surface model. Red tree crowns indicate misclassifications while blue crowns indicate good classifications.
Figure 4 Pléiades image of the Mangegarri waste storage area (upper left) and field spectra of the main materials: alumina, bauxite, and Bauxaline (upper right).
Mangegarri is located at Gardanne, southern France, and the area extension is about 0.5 × 1.5 km². Pléiades image of the Ochsenfeld waste storage area (lower left) and laboratory spectra of the main materials: red and white gypsum (middle right), and samples presenting a mixture of calcite and iron oxyhydroxides in various proportions compared to red gypsum (lower right). The Ochsenfeld site is located at Thann, eastern France, and the area extension is about 0.5 × 1.0 km².
Figure 5 HySpex VNIR color composites of Gardanne (upper left) and maps of alumina (green), bauxite (red), and Bauxaline (blue) (lower left) based on band positions in the VNIR and SWIR. APEX VNIR color composite of Thann (upper right) and map of red gypsum (red), white gypsum (green), and calcite (blue) (lower right) based on band positions in the VNIR and SWIR. VNIR, visible near-infrared; SWIR, short-wavelength infrared.
Figure 6 CH4 map in ppm/m retrieved during NAOMI (collaboration project between ONERA and TOTAL) using the HySpex-NEO SWIR hyperspectral camera and the ONERA algorithm.
Figure 7 Top: CH4 map in ppm/m retrieved over the Kern Oil River site (California) using the HyTES LWIR hyperspectral camera (JPL) and the ONERA algorithm. Bottom: CH4 map in ppm/m retrieved during NAOMI (collaboration project between ONERA and TOTAL) using the TELOPS LWIR HyperCam hyperspectral camera and the ONERA algorithm IMGSPEC. LWIR, long-wave infrared.
Figure 8 Top: SWIR spectral sensitivity (reflectance change in % corresponding to a nominal reflectance of 0.1) due to CH4 corresponding to 2000 ppm m (red line), compared to reflectance uncertainties due to a signal-to-noise ratio (SNR) of 1000 and atmospheric correction. Bottom: LWIR spectral sensitivity (brightness temperature signal change after atmospheric compensation) due to CH4 corresponding to 1000 ppm m for a thermal contrast of 1 K (red), 2 K (blue), and 3 K (green), compared to brightness temperature uncertainties from LWIR SNR spectra. SWIR, short-wavelength infrared; LWIR, long-wave infrared.
Figure 9 UAV route and the ground truth acquisition points (field 3).
Figure 10 A color image converted from a reflectance image along a UAV route of field 3.
Figure 11 Reflectance of rice, white sheet, and orange sheet.

Chapter 3.3
Figure 1 Precision agriculture among different crop fields.
Figure 2 Clustering results by new methods. (A) FCM, fuzzy c-means; (B) FCIDE, automatic fuzzy clustering using an improved differential evolution algorithm; (C) AMASFC, adaptive memetic fuzzy clustering algorithm with spatial information; (D) AFCMDE, automatic fuzzy clustering method based on adaptive multi-objective differential evolution; (E) AFCMOMA, adaptive multi-objective memetic fuzzy clustering algorithm.
Figure 3 Heavy metal pollution in rice in all of the research area.
Figure 4 Light reflections in healthy, stressed, and dead leaves. The graphs below indicate the relative reflection in the blue (B), green (G), red (R), and near-infrared (NIR) channels.
Figure 5 Levels of water concentration in crop fields. (A) True color image of an NVT (National Variety Trials) wheat field after sowing (the red rectangles are the experimental field boundaries); (B) image resulting from dividing the first and second principal component images.
Figure 6 Interpretation of hyperspectral images of maize for detecting healthy and dead leaves.

Chapter 3.5
Figure 1 EA.hy926 endothelial cell infected with Staphylococcus aureus. (A) White light imaging.
Intracellular spherical particles (1 µm, indicated with white arrows) in single forms or grapelike groupings, characteristic of staphylococci. (B) False color Raman image (pixel size: 1 × 1 µm²) centered at 2938 cm⁻¹ (CH stretching vibration) in arbitrary units shown in the grayscale bar. (C) High-resolution Raman scan N-FINDR (pixel size: 0.25 × 0.25 µm²) of the section marked in part A in a false color image. The relative RGB color contributions are assigned to bacteria (green), cellular nucleus (blue), perinuclear region (red), and background (black) spectral profiles. (D) Labeled fluorescence image of the same cell stained with an antibody binding S. aureus (Alexa Fluor 488, green) and DAPI binding DNA (blue), present in the nucleus and the bacteria. (E) Z-stack (depth imaging) of N-FINDR images recorded at 1-µm intervals (C corresponds to the 2-µm plane).
Figure 2 (A) Raman spectra of biofilm treated with CS-PLGA NPs. The lower and upper spectra indicate Raman spectra of CS-PLGA NPs and biofilm, respectively, within biofilm treated with CS-PLGA NPs. (B) Visualization of biofilm treated with CS-PLGA NPs for 4 h by slit-scanning Raman spectromicroscopy. The Raman band images were reconstructed from Raman bands at 1770 cm⁻¹ (i) and 3180 cm⁻¹ (ii). A superimposed image (iii) of CS-PLGA NPs (1770 cm⁻¹), shown in blue in (i), and biofilm (3180 cm⁻¹), shown in yellow in (ii), is also shown. Scale bars = 5 µm. CS, chitosan; NPs, nanoparticles; PLGA, poly(lactide-co-glycolide).
Figure 3 (A) Raman image acquired from a 48-h swarm plate constructed from a 1338–1376 cm⁻¹ filter to include the marker band for quinolones/quinolines. (B) Loading plot of PC1 generated from analysis of the Raman image of the 48-h swarm plate. PC1 contains features that correspond to bands from standard spectra of quinolines possessing the same functional group.
Figure 4 Principal component loading plots from (A) the "ex situ" protocol: Pantoea sp. YR343 planktonic cells mixed with preformed Ag nanoparticles, and (B) the "in situ" protocol: Pantoea sp. YR343 cells intimately coated with Ag NPs.
Figure 5 Theoretical Raman spectrum (red), calculated from vibrational frequencies (cm⁻¹) of pyocyanin with corresponding assignments, and experimental resonant Raman (black) spectra of pyocyanin obtained at the indicated excitation wavelengths.
Figure 6 Imaging of violacein and pyocyanin expression in coculture of Chromobacterium violaceum (CV026) and Pseudomonas aeruginosa (PA14) grown for 20 h. The dashed squares indicate the confrontation zone. The culture was grown on gold 60 nm nanospheres on glass covered by lysogeny-broth agar (Au@agar). (A) SERS mapping of violacein (727 cm⁻¹). (B) SERRS mapping of pyocyanin (544 cm⁻¹). (C) SERS mapping of violacein and pyocyanin. (D) SERS intensities of violacein (727 cm⁻¹) and SERRS intensities of pyocyanin (544 cm⁻¹) as a function of distance. Three repetitions were measured at the spots indicated in (C) with white asterisks and plotted. The standard deviation for each spot is shown as error bars. All measurements were acquired with an excitation laser wavelength of 785 nm, a 5× objective, and a laser power of 12.21 kW cm⁻² for 10 s. SERS, surface-enhanced Raman scattering; SERRS, surface-enhanced resonance Raman scattering.
Figure 7 Detection of bacteria in milk. (A) CARS image of an Escherichia coli–milk mixture. (B) Reconstructed map of E. coli and milk components after MCR analysis. E. coli is mapped in red and milk in green.
The red dashed rectangle in (A) and (B) indicates the location where E. coli and milk overlap. (C) Corresponding phase-retrieved output spectra of each component after MCR analysis. (D) CARS image of milk alone dried on glass. (E) Reconstructed map of E. coli and milk components after MCR analysis. (F) Corresponding phase-retrieved output spectra of each component after MCR analysis. Scale bars: 10 µm. CARS, coherent anti-Stokes Raman scattering; MCR, multivariate curve resolution.

Chapter 3.6
Figure 1 Comparison between hypercube and RGB image. The hypercube is a three-dimensional data set containing a 2D image at each wavelength. The lower left is the reflectance curve (spectral signature) of a pixel in the image. The RGB color image only has three image bands, at the red, green, and blue wavelengths, respectively. The lower right is the intensity curve of a pixel in the RGB image.
Figure 2 Schematic diagram of a push broom hyperspectral imaging system. NIR, Near-infrared.
Figure 3 A gray scale image of a melanoma lesion showing the transmission spectra in the nuclear and interstitial areas.
Figure 4 Spatial oxygen saturation maps. (A) Healthy male (29 years old) oxygen saturation map. Vascular separation from the background is seen as well as reasonable saturation values for veins versus arteries. (B) Zero-order color image. (C) Healthy male (58 years old) oxygen saturation map. (D) Zero-order color images [99].
Figure 5 (A) Cross-section diagram of tissue sample for ADSI testing. (B) Color photographs of mouse tumor tissue sandwiched between two glass slides. The opening due to the black mask that was used for transmission imaging is marked by the yellow dashed line. The black line (left panel) indicates the location of bone embedded in the tissue. (C) Normalized spectra from regions of tumor and muscle tissue (as indicated in (B)). (D) Correlation map of data cube based on the reference spectral signature related to the muscle tissue. (E) Correlation map of data cube based on the reference spectral signature related to the tumor tissue [153]. ADSI, Angular domain spectroscopic imaging.
Figure 6 (A) Photomicroscopic and corresponding medical hyperspectral imaging image from breast tumor in situ (4 × 3 cm) (upper left and upper middle panels). Resected tumor and surrounding tissue (5 × 7 mm) was stained with hematoxylin and eosin and evaluated by histopathology after resection. Microscopic histological images with further resolution are displayed (right panels). (B) Representative examples of normal tissue (grade 0), benign tumor (grade 1), intraductal carcinomas (grade 2), papillary and cribriform carcinoma (grade 3), and carcinoma with invasion (grade 4) are represented [56].
Figure 7 (A) Photographic image of the biliary tissue structure. (B) Classification of the biliary tissue types based on hyperspectral imaging, superimposed with the fluorescence image of the ICG-loaded microballoons. The dual-mode image clearly identifies the biliary anatomy and its relative location with respect to the surrounding tissue components.
Figure 8 The RGB image is shown on the left side. Using the method described, the segmented image can be viewed on the right side. Spleen is shown in red, peritoneum in pink, urinary bladder in blue, colon in green, and small intestine in yellow.

Chapter 3.7
Figure 1 An example of image analytical quantification of crystal count and area coverage of a metastable amorphous drug crystallizing over time.
Top image series (A–D) illustrates a series of polarized light micrographs obtained from the center position of a sample over time. Image series (E–H) presents the polarized light micrographs obtained from the edge of the same sample over time. It was readily observed that the crystallization was more extensive and occurred earlier for sample series (E–H) as compared with (A–D). Bottom: A–D illustrate the response from the image analysis, where different samples followed over time demonstrate different crystallization extents. A total of over 80 polarized light micrographs were obtained during this study; without image analysis, drawing objective chemical conclusions based on the 80 micrographs would be very challenging.
Figure 2 Water distribution in freeze-dried well plates after different storage times at 11% RH (relative humidity) for Plate 1 and 43% RH for Plate 3. The color scale depicts the percentage moisture content. Recrystallization of the sample occurred after 10-day storage in Plate 3 (storage at 43% RH).
Figure 3 Fourier transform infrared images of the dissolution process of tablets with 20 wt% drug loading. The color scales indicate the integrated absorbance that each color represents. PEG, Polyethylene glycol.
Figure 4 Dissolution troubleshooting with Raman imaging; example of polymorphic transformations in the extruded pharmaceutical (polymer drug mixture).
Figure 5 Coherent anti-Stokes Raman scattering (CARS) and sum-frequency generation (SFG) combined for sensitive multimodal imaging of multiple solid-state forms and their changes on drug tablet surfaces.
Figure 6 Decision-making tool based on optical images and fuzzy logic.
Figure 7 X-ray computed microtomography data generated subvolumes of 3D-printed geometries filled with silicone oil. The brighter regions correspond to the silicone oil, which is visible in the outer and inner compartment (A), only in the outer compartment (B), and only in the inner compartment (C).

Chapter 3.8
Figure 1 Schematic view of the hyperspectral data cube and the different approaches to data processing in typical cultural heritage applications. IR, infrared; PCA, principal component analysis; MNF, minimum noise fraction.
Figure 2 The prototype of the IFAC-CNR hyperspectral scanner during a measurement campaign on a panel painting belonging to the San Marco museum collection in Florence (Italy).
Figure 3 (A) RGB colorimetric reconstruction using the Vis image cube of "The Annunciation," a scene from a tempera panel attributed to Beato Angelico, part of the artwork "Armadio degli Argenti" (1451 ca.) from the collection of the San Marco museum in Florence. Image reconstructed by IFAC-CNR with the permission granted by the San Marco museum (Ministero per i Beni e le Attività Culturali). The reproduction rights are reserved. (B) Elaborated false color image: PC2 = R; PC4 = G; PC5 = B. The numbers indicate the pixel locations from which the four endmember spectra were extracted. (C) Reflectance spectra of the endmembers: 1) ultramarine blue (lapis); 2) mixture 1: ultramarine blue with an unknown pigment; 3) mixture 2: ultramarine blue mixed with an unknown pigment; 4) cobalt blue. (D) Spectral angle mapping classification map obtained using the endmember spectra reported in (C).

Chapter 3.9
Figure 1 Handprint near-infrared false-color hyperspectral images.
The presence of explosive residues is highlighted by colored pixels: (A) in pink for ammonium nitrate, (B) in yellow for dynamite, (C) in red for single-base (SB) smokeless gunpowder, (D) in blue for double-base (DB) smokeless gunpowder, and (E) in green for black powder. There are 15,402 pixels in a finger sample and 24,821 pixels in a hand palm sample. Enlargements of regions in (B) and (D) are included for clarification.
Figure 2 Classical least squares (CLS) classification model applied to a mixture of semen and vaginal fluid on cotton fabric. The CLS colored maps for each class (cotton, semen, urine, and vaginal fluid) are displayed. The maximum CLS weight values obtained for each class within the two selected stained regions are indicated above every color map, the first value corresponding to region 1 and the second to region 2.
Figure 3 False color images obtained from the elaboration of Raman images of crossing lines drawn with gel and oil blue pen inks. Different times separating the application of each line were considered, and the horizontal line (in green) was always applied first.
Figure 4 RGB images of (A and C) two concrete drill core samples and (B and D) corresponding prediction images obtained with a partial least-squares discriminant analysis classification model. The classes are aggregates (A, in red) and mortar (M, in blue).
Figure 5 (A–D) Four examples of prediction images obtained from a near-infrared hyperspectral system and corresponding RGB images. The imaged objects are made of polypropylene (PP), polyvinyl chloride (PVC), paper, high-density polyethylene (HDPE), low-density polyethylene (LDPE), polyethylene terephthalate (PET), polystyrene (PS), and other plastic polymers (OTHER; N.A.).
Figure 6 Relationships between remote sensing and fieldwork operations.

List of Tables

Chapter 2.1
Table 1 Summary of the main preprocessing steps, the techniques for their application, and their benefits/drawbacks.

Chapter 2.3
Table 1 Reduced resolution assessment. Quantitative results on the Hyp-ALI data set. Best results among the compared fusion approaches are in boldface.
Table 2 Full resolution assessment. Quantitative results on the Hyp-ALI data set. Best results are in boldface.

Chapter 2.4
Table 1 Parameters of the Lance–Williams update formula for the different agglomeration methods, together with the definition of the initial dissimilarity measure.

Chapter 2.6
Table 1 Normalized mean squared errors (NMSE) between estimated and reconstructed abundance fractions by the bilinear and linear mixture models, for the data sets orchard2 and orchard3, with and without considering shadows.
Table 2 Normalized mean squared errors (NMSE) between the estimated and the reconstructed images by the bilinear and the linear mixture models for the data sets orchard2 and orchard3, with and without considering shadows.
Table 3 Mass fractions of the endmembers for each mixture [25].
Table 4 Evaluation results for the intimate mixture experiment.

Chapter 2.7
Table 1 Running times and reconstruction errors of the tested algorithms on the Houston data set.

Chapter 2.8
Table 1 Overview of near-infrared hyperspectral imaging in agricultural products.
Table 2 Overview of near-infrared hyperspectral imaging in other food products (fish and meat).
Table 3 Overview of near-infrared hyperspectral imaging in other products.

Chapter 2.9
Table 1 Landsat 8 imaging bands.
Chapter 2.10
Table 1 Studies of remote sensing image classification using ensemble methods published in journals since 2008.
Table 2 Classification results in percentage by using 20 training samples per class.
Table 3 Classification performance obtained using the standard method (all the spectral bands) and the proposed method. The table reports the mean class accuracies (%) and relative standard deviations over 10-fold cross-validation. The best accuracies are highlighted in bold.
Table 4 Classification performance of the SVM [72], EMAP [41], EPF [69], HGF [73], IFRF [68], and PCA-EPFs [70] methods for the Indian Pines data set with 1% training samples. Class accuracies, average accuracies (AA), overall accuracies (OA), and the kappa coefficients are reported with the relative standard deviations. For each row, the highest accuracy is shown in bold.
Table 5 Classification performance of the SVM [72], EMAP [41], EPF [69], HGF [73], IFRF [68], and PCA-EPFs [70] methods for the Indian Pines data set with 10% training samples. Class accuracies, average accuracies (AA), and overall accuracies (OA) are reported with the relative standard deviations. For each row, the highest accuracy is shown in bold.
Table 6 Classification performance of different filters, i.e., the BF [60], GF [66], RGF [81], BTF [61], WLS [62], and DTRF [63], on the feature extraction framework. Class accuracies, average accuracies (AA), overall accuracies (OA), and kappa coefficients are reported with the relative standard deviations. For each row, the highest accuracy is shown in bold.
Table 7 Classification performance of different filters, i.e., the BF [60], GF [66], RGF [81], BTF [61], WLS [62], and DTRF [63], on the probability optimization framework. Class accuracies, average accuracies (AA), overall accuracies (OA), and kappa coefficients are reported with the relative standard deviations. For each row, the highest accuracy is shown in bold.

Chapter 2.11
Table 1 List of statistical feature descriptors with the respective references divided by the data source: height from 3D point cloud, amplitude of the return signal, CHM, or spectral band.
Table 2 List of topographic feature descriptors with the respective references divided by the data source: height from 3D point cloud or CHM.
Table 3 List of structural feature descriptors with the respective references divided by the data source: height from 3D point cloud or return intensity.
Table 4 List of VI with the respective references.
Table 5 Studies of forest monitoring classified by the type of application and the level of fusion.

Chapter 2.13
Table 1 Root mean squared error (RMSE) for different Gaussian process (GP) schemes, when the source crops of the training data are low in leaf area index, and vice versa for the test data.
Table 2 Values of physical parameters used for simulating with the PROSAIL model, corresponding to wheat.

Chapter 3.1
Table 1 List of the main remote sensing imaging spectrometer instruments (more than one hundred spectral channels) in Earth and Planetary science, with their main characteristics (date of arrival, mission platform, planetary body, spectral range, maximum number of recorded wavelengths).

Chapter 3.2
Table 1 Observed fields.
Table 2 Correlation coefficients between SPAD readings and estimated chlorophyll indices.

Chapter 3.3
Table 1 Detection of heavy metals at crop fields by using remote sensing.
Table 2 Reflectance indices for water stress assessment from different species by using hyperspectral imaging and remote sensing.
Table 3 Main applications of hyperspectral imaging in quality evaluation of agricultural and preharvest products.

Chapter 3.5
Table 1 Vis-NIR, IR HSI application articles reviewed.
Table 2 Raman research articles reviewed.

Chapter 3.6
Table 1 Summary of representative hyperspectral imaging systems and their medical applications.

Personal Thinking and Acknowledgments

It has been an amazing journey. It allowed me to realize how widespread hyperspectral and multispectral cameras are in an increasingly complex world that looks for fast and reliable answers. It also allowed me to see how data mining/chemometrics/multivariate data analysis is approached from different perspectives depending on the case, the application, and the educational background of the researcher. This helped me to acquire knowledge that would have been difficult to acquire otherwise. This book has encouraged me to continue with further projects (like my immediate one, AVITech (Artificial Vision Technologies research group), granted by IKERBASQUE, the Basque Foundation for Science) and to make the knowledge (with courses) and the methods (with www.hypertools.org) more widely available.

Rephrasing a friend of mine (Dr. Manel Bautista), "one image is worth 1000 spectra." He might be right, but I would add "depending on the wavelength you are looking at" and, of course, on the type of camera, radiation, and algorithms used to obtain that image. All in all, the spirit of this sentence makes me forecast that we will see real-time applications of hyperspectral and multispectral cameras in many different fields in the near future.

Lastly, I would like to express my deepest gratitude to all the invited authors who have been part of this project. Without their wise contributions and advice, this book would never have been possible. Also, I would like to thank all the people who, in one way or another, were involved in the development of this book.

Bilbao, July 2019
José Manuel Amigo
Ikerbasque Research Professor (July 2019–Present)
Distinguished Professor of the University of the Basque Country, Spain (July 2019–Present)
Associate Professor of the University of Copenhagen, Denmark (January 2011–June 2019)
Guest Professor of the Federal University of Pernambuco, Brazil (January 2017–April 2018)

Preface

This book offers a wide overview of modern hyperspectral and multispectral imaging (HSI and MSI, respectively), the ways of obtaining such images, the most important algorithms used to extract the relevant information hidden in them, and the different research fields that benefit from them. Nowadays, cameras that are able to measure at wavelengths where the human eye cannot are becoming more and more available. From the first applications in remote sensing to the ultramodern hyperspectral microscopes, there have been around 40 years of evolution in the sensing capability, reliability, and portability of the devices and improvement in data/image handling and analysis. This evolution has made it normal nowadays to find general and tailored HSI or MSI devices in laboratories and industries.
This breakthrough means that scientists coming from different disciplines must face the analysis of the massive amount of information lying in HSI. Therefore, scientists with different points of view must handle similar data structures. It is here that the spirit of this book resides. This book collects, for the first time, examples and algorithms used by the different communities working with HSI. The scientific communities represented here (remote sensing, chemometrics, food science, pharmaceutics, forensics, art, analytical chemistry, medicine, etc.) are a clear example of how widespread and how attractive HSI technology is. This book shows, as an example, that even when analyzing very different samples (potatoes, a tablet, the surface of a planet, or an artwork), we can find tailored equipment and we can use the same (or similar) algorithms. That is why this book is addressed to graduate and postgraduate students, researchers in many different fields, industries, and practitioners in any field dealing with any kind of HSI or MSI.

This book contains chapters written individually by different invited authors. All of them are renowned experts in their fields. Despite the individuality of the chapters, they have been meticulously chosen and arranged to represent a global understanding of what HSI and MSI are and how to analyze them. This is why the book is divided into three major sections.

The first section introduces the basic spectroscopic and instrumental concepts in a very general manner. This was done for two main reasons: (1) there are fundamental and basic concepts about spectroscopy (visible, near and middle infrared, Raman, X-ray, etc.) that can be found in well-known textbooks, and (2) technology is continuously advancing, and it was decided that the essential aspects must be covered, acknowledging that new advances may have been released by the time this book is published.

The second section is devoted to explaining in detail different aspects of the analysis of HSI and MSI images. The chapters are arranged in a logical connecting thread when dealing with the application of methods, although I must say that this arrangement might vary depending on the type of HSI and its final aim. In that sense, the chapters deal with spatial and spectral preprocessing, image compression, pansharpening, exploratory analysis, spectral unmixing/multivariate curve resolution together with finding endmembers, regression, classification, image fusion, time series analysis, and statistical analysis. All the chapters are nicely conducted by the authors, and they contain dozens of examples. Moreover, most of the HSI and MSI algorithms are open source or available upon request.

The third section is the final demonstration of the wide applicative range of HSI and MSI technology. Covering all the application areas is almost impossible; that is why the best-known ones have been chosen. The section starts, obviously, with different applications in remote sensing. Three chapters collect applications in natural landscapes, anthropogenic activities, and vegetation and crops (precision agriculture). Then, the following chapters show how HS and MS cameras can be adapted to more laboratory and industrial applications (e.g., food industry, biochemistry, medicine, pharmaceutical industry, and artwork).
Since the physical space of the book was limited, I decided to include a final chapter that shows different fields in which HSI and MSI are also used (e.g., forensics, waste sorting/recycling, archaeology, and entomology). Being aware that there are topics, algorithms, and fields of expertise that could have been included in the book (maybe for a new edition), the book represents an overall vision of HSI and MSI, and it includes an extensive bibliography.

Chapter 1.1
Hyperspectral and multispectral imaging: setting the scene
José Manuel Amigo*
Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, University of Copenhagen, Denmark
*Corresponding author. e-mail: jmar@life.ku.dk

1. Images: basic concepts, spatial resolution, and spectral information

It might be a bit odd to start a book about hyperspectral imaging (HSI) and multispectral imaging (MSI) by defining what an "image" is. Nevertheless, I found it very appropriate, since nowadays the literature is full of qualifiers adopted depending on the type of image we are talking about. Terms like chemical imaging, confocal imaging, HSI, MSI, satellite imaging, microscope imaging, etc., fill the papers, depending on the type of device used for acquiring the image or, sometimes, the use we want to give to the image. Thus, according to the Oxford Dictionary, an image is the representation of the external form of a person or thing in art. It is also a visible impression obtained by a camera, telescope, microscope, or other device, or displayed on a computer video screen [1]. For our purpose, let us say that an image is a bidimensional representation of a surface produced by any device that has the ability to obtain information in an X–Y direction of the surface in a correlated manner.

1.1 Spatial resolution

From my point of view, the most appropriate concept of an image is the mathematical one: "an image is a point or set formed by mapping from another point or set." That is, to create an image, you need at least two point sets. Putting together both definitions, it is clear that these two point sets must be somehow spatially correlated, providing us the indirect definition of pixel, the second most used term in imaging. A pixel is the spatial subdivision of an image. Pixels are unique pieces of information that are, as well, spatially correlated. In other words, a pixel is affected by the surrounding pixels in such a way that the information contained in one of them depends on the information that the surrounding pixels (i.e., neighbors) contain.

A clear distinction must be made between the total amount of pixels and the size of each pixel with respect to the full surface being measured. If an image is the spatial representation of a surface, and that image is divided into pixels, the term spatial resolution comes up. According to international standards, the spatial resolution of an image is the total amount of pixels into which an image is divided. A basic example: if an image contains X pixels in the row direction and Y pixels in the column direction, the image has a spatial resolution of X × Y pixels.
Even accepting this as a common practice, it must be stressed that this definition is incomplete. The spatial resolution must always be measured in relative terms of distance. For instance, let us take the example of a camera that is 10 × 10 pixels. If that camera takes the picture of a 1 × 1 m surface, the real spatial resolution should be said to be 0.01 m² per pixel, while if the same camera takes a picture of 0.5 × 0.5 m, the spatial resolution should be said to be 0.0025 m² per pixel. This directly addresses the need of stating the area of the surface whose information every pixel contains.
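To make this relative spatial resolution arithmetic concrete, the following minimal sketch (Python; the numbers are the illustrative ones used above, not tied to any specific camera) computes the surface area represented by each pixel:

    def pixel_footprint(n_rows, n_cols, height_m, width_m):
        """Area of the measured surface (in m2) represented by one pixel."""
        return (height_m / n_rows) * (width_m / n_cols)

    # A 10 x 10 pixel camera imaging a 1 x 1 m surface:
    print(pixel_footprint(10, 10, 1.0, 1.0))    # ~0.01 m2 per pixel
    # The same camera imaging a 0.5 x 0.5 m surface:
    print(pixel_footprint(10, 10, 0.5, 0.5))    # ~0.0025 m2 per pixel

The same sensor thus yields very different effective resolutions depending on the imaged area, which is why a pixel count alone does not define spatial resolution.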
1.2 Spectral information: types of images

There are different ways of classifying images. One of the most adequate is considering the amount of information that a pixel can contain. The information can come from a simple difference in intensities with respect to a reference, generating one single channel of information per pixel (e.g., scanning electron microscopy (SEM) images). Nevertheless, it is common nowadays to find images that contain more channels of information for each pixel. These are the color-space images, the multispectral images, and the hyperspectral images.

1.2.1 Single channel images

Images for which there is only a single value of intensity for each pixel are called monochannel or single channel images. The values that each pixel can have vary depending on the type of device taking the image, and they normally range between certain limits. For instance, Fig. 1 shows one single channel image of yogurt obtained with confocal laser scanning microscopy (CLSM) (see Ref. [2] for further details). Single channel images can be extracted from any image device by choosing one channel of spectral information or by using a device that only gives one point of information per pixel (e.g., CLSM, SEM, atomic force microscopy (AFM), etc.).

There are many ways of representing a single channel image by using different color maps. The most popular one is the grayscale color map, in which the intensity value corresponds to a specific value of the grayscale (denoted with the corresponding color bar in Fig. 1). Normally, the spatial resolution of these devices tends to be considerably good, reaching microscope scale. The main drawback is normally the lack of chemical information, since all the information relies on one single value. Therefore, one of the most common methods to analyze these images is the direct application of thresholds, grouping the pixels into groups with similar pixel values. Coming back to the example in Fig. 1, two different thresholds applied to the same picture give two different responses. In this case, the threshold around 24 gives an account of total protein content, while the threshold around 100 gives an account of microparticulated whey protein [2].

FIGURE 1 Confocal laser scanning microscopy image of a yogurt sample. The image is one of the images used in Ref. [2]. The right part of the figure shows the histogram of the grayscale values and the result of applying two different threshold values (24 for protein and 100 for microparticulated whey protein).

1.2.2 Color-space images

Color-space images are the images that try to mimic human vision. They are normally composed of three channels of spectral information, the red (around 700 nm), green (around 550 nm), and blue (around 450 nm), which combined are able to recreate real colors in what is known as the RGB color space (image denoted as real color in Fig. 2) [3]. It is important to remark that RGB is not the only color space used in this type of image. There are many ways of coding the color in other color spaces (like the hue–saturation–value color space or the luminosity-red to green-blue gradient (L*a*b*), among others [3]) (Fig. 2).

Color-space images have been known for many years, and the technology for obtaining them is continuously advancing. Defining that technology is outside the scope of this book chapter. However, it is important to remark that these images normally have a high spatial resolution, and once the illumination and the focal distance are controlled, they can be extremely powerful devices with limited chemical information. For instance, a flatbed scanner in RGB mode can be an extraordinary analytical tool for specific cases, since it has a high spatial resolution, the incoming light is constant, and the focal distance is normally fixed [4–6].

FIGURE 2 Real color picture in RGB color space of a butterfly, and representation of the red, green, and blue monochannels using the color map displayed in the right part of the figure. The two images at the bottom are the transformed image in L*a*b* color space.

1.2.3 Multiband images

Multiband or multispectral images are those that capture individual images at specific wave numbers or wavelengths, frequently taken by specific filters or LEDs, across the electromagnetic spectrum (normally the visible (VIS) and near-infrared (NIR) regions) [7]. Multispectral images can be considered a special case of hyperspectral images in which the wavelength range collected cannot be considered continuous. That is, instead of a continuous measurement over a certain wavelength range, multispectral images contain information at discrete and specific wavelengths. Fig. 3 shows an example of a 10 euros paper note that has been measured at 18 different wavelengths. That is, every single pixel contains the specific information collected at those wavelengths. As can be observed in the figure, when the sample is irradiated with light at different wavelengths, specific information is obtained. Each individual image can be considered a single channel image. Therefore, it is mandatory to display the corresponding values of the color intensities obtained in each individual image. As we will see in later chapters, MSI is differentiated from HSI because of the different treatment that must be given to the images. The signals obtained with MSI cannot strictly be considered spectra, since the spectral signatures are normally measured at nonequidistant, discrete wavelengths. MSI was first applied in remote sensing, and it was the precursor of hyperspectral imaging. A good example of MSI applied in remote sensing is the well-known Landsat satellite [8].

FIGURE 3 Multispectral image taken of a 10 euros paper note (top left). The top right part shows the intensities measured at the different wavelengths for two pixels. The bottom part shows different single channel pictures extracted for eight channels.
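As a minimal sketch of this structure (Python with NumPy; the wavelengths and image contents are illustrative stand-ins, not those of Fig. 3), a multiband image can be represented as a stack of single channel images acquired at discrete wavelengths, from which each pixel's spectral signature is read:

    import numpy as np

    # Discrete acquisition wavelengths of a hypothetical filter-based MSI camera (nm).
    wavelengths = [450, 500, 550, 600, 650, 700, 750, 800, 850]

    # One single channel image per wavelength, stacked into a multispectral cube.
    bands = [np.random.rand(256, 256) for _ in wavelengths]   # stand-ins for measurements
    msi_cube = np.stack(bands, axis=-1)                       # shape: (256, 256, 9)

    # The "spectral signature" of one pixel is its vector of intensities,
    # defined only at the discrete wavelengths above.
    x, y = 100, 120
    signature = msi_cube[x, y, :]
    print(dict(zip(wavelengths, signature.round(2))))

Note that the signature is defined only at nonequidistant, discrete points, which is precisely why it cannot strictly be treated as a continuous spectrum.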
1.2.4 Hyperspectral images

Hyperspectral images are the images in which one continuous spectrum is measured for each pixel [9]. Normally, the spectral resolution is given in nanometers or wave numbers (Fig. 4). Hyperspectral images can be obtained from many different electromagnetic measurements. The most popular are visible (VIS), NIR, middle infrared (MIR), and Raman spectroscopy. Nevertheless, there are many other types of HSI that are gaining popularity, like confocal laser microscopy scanners that are able to measure the complete emission spectrum at a certain excitation wavelength for each pixel, Terahertz spectroscopy, X-ray spectroscopy, 3D ultrasound imaging, or even magnetic resonance. Hyperspectral images are the only type of images where we can talk about spectral resolution (also known as radiometric resolution in the remote sensing field). The spectral resolution is defined as the interval or separation (gap) between the different wavelengths measured in a specific wavelength range. Obviously, the more bands (or spectral channels) acquired in a smaller wavelength range, the higher the spectral resolution will be.

FIGURE 4 Representation of the image of a cookie measured with a hyperspectral camera in the wavelength range of 940–1600 nm (near infrared, NIR) with a spectral resolution of 4 nm. The spectra obtained in two pixels are shown, together with the false color image (single channel image) obtained at 1475 nm. The single channel image selected highlights three areas of the cookie where water was intentionally added. This water is invisible in the VIS region (nothing can be appreciated in the real color picture). Nevertheless, water is one of the main elements that can be spotted in NIR.

2. Data mining: chemometrics

2.1 Structure of a hyperspectral image

A hyperspectral or multispectral image can be visualized as a hypercube of data (Figs. 3 and 4). The hypercube is defined by three dimensions: two spatial (X and Y) and one spectral (λ) [9]. In mathematical notation, then, a hypercube D will have dimensions (X × Y × λ) [9]. This structure contains all the chemical information related to the surface measured. That is, hyperspectral data cubes are normally multicomponent systems. The pixels measured seldom contain selective wavelengths for a specific component, since they tend to contain mixed information of more than one component. Moreover, the cube also contains artifacts like spectral noise, spatial interferences, and redundant information. Therefore, there is a strong need to extract the desired information and get rid of the noise and further artifacts. As we will see further on in the book, there is a plethora of algorithms that are able to extract the desired information from the data cube, and more and more are coming due to the generation of faster computers and more reliable sensors.

2.2 Chemometrics

It can be said without mistake that one of the major reasons for the expansion of HSI and MSI is the integration of data mining to extract the relevant information from the data cube in a multivariate fashion. Most of the information gathered with HSI and MSI can be considered chemical information; therefore, this data mining is also called chemometrics. Chemometrics is basically data mining applied to chemical information, using mathematical, statistical, and data analysis methods to achieve objective data evaluation by extracting the most important information from related and unrelated collections of chemical data [10]. The main aim of chemometrics is to provide a final image where selective information for a specific component can be found (in terms of concentration/quantitative or presence/qualitative).
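Since most chemometric methods operate on a matrix of spectra rather than directly on a cube, a common first step when applying them to the hypercube of Section 2.1 is to unfold the X × Y × λ structure into an (X·Y) × λ matrix of pixel spectra and to fold pixel-wise results back into an image. A minimal sketch (Python with NumPy; the array contents and dimensions are illustrative, not from the book):

    import numpy as np

    X, Y, n_bands = 120, 80, 200            # illustrative cube dimensions
    D = np.random.rand(X, Y, n_bands)       # stands in for a measured hypercube

    # Unfold: every pixel becomes one row (one spectrum) of a 2D matrix.
    D2 = D.reshape(X * Y, n_bands)

    # Any pixel-wise chemometric model can now be applied to D2;
    # here a simple mean intensity stands in for a model output.
    scores = D2.mean(axis=1)

    # Fold back: rearrange the pixel-wise results into an image again.
    score_image = scores.reshape(X, Y)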
Nevertheless, one of the major problems of chemometrics is that sometimes it becomes cumbersome to know which method to apply in each situation [9]. This will be unraveled and defined during the book. Fig. 5 gives an account of the major building blocks of data mining/chemometrics applications in HSI and MSI. This flowchart is merely a guide, the methods included there are not exclusive to their building blocks, and the path to follow is strongly dependent on the type of analysis and the final target. The main building blocks in the analysis of HSI and MSI images (Fig. 5) are preprocessing, pattern recognition/exploration, resolution/spectral unmixing, segmentation, regression, classification, and image processing [9]. Each one of them aims at a different purpose.

FIGURE 5 Comprehensive flowchart of the analysis of hyperspectral and multispectral images. ANN, artificial neural networks; AsLS, asymmetric least squares; CLS, classical least squares; EMSC, extended multiplicative scatter correction; FSW-EFA, fixed size window-evolving factor analysis; ICA, independent component analysis; LDA, linear discriminant analysis; MCR, multivariate curve resolution; MLR, multiple linear regression; MSC, multiplicative scatter correction; NLSU, nonlinear spectral unmixing; OPA, orthogonal projection approaches; PCA, principal component analysis; PLS-DA, partial least squares-discriminant analysis; PLS, partial least squares; SIMCA, soft independent modeling of class analogy; SIMPLISMA, simple-to-use interactive self-modeling mixture analysis; SNV, standard normal variate; SVM, support vector machines; WLS, weighted least squares. Partially extracted and modified from J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51. doi:10.1016/j.aca.2015.09.030, with permission of Elsevier.

2.2.1 Preprocessing

Sometimes this previous step does not receive the importance it deserves; nevertheless, it is mainly responsible for obtaining optimal results when any multivariate data model is applied afterward. The presence of erroneous or missing data values (e.g., dead pixels [7]), noninformative background, or extreme outliers, or the presence of spatial and spectral artifacts (e.g., scattering or atmospheric influence), are aspects that must be considered well before or even during the modeling part. There are many methods for minimizing artifacts in our data or for highlighting information in it (in both the spectral and spatial directions), and, luckily, there are many algorithms that can help in this quest. Nevertheless, the decision on the proper preprocessing method is sometimes not straightforward, and it is normally based on the combination of different methods to achieve a preprocessed data cube that, still, needs to be processed properly (a minimal code sketch of one such preprocessing method is given below).

2.2.2 Pattern recognition/exploration

Pattern recognition methods are, among the building blocks of Fig. 5, the only ones that can be purely denoted as unsupervised. They do not need a previous calibration (training) step, nor a decision step (e.g., the number of components needed), in order to find hidden patterns in the data. The purpose of the unsupervised methods is to identify relationships between pixels, without any prior knowledge of classes or groups. They are used to give a first overview of the main sources of variance (variability) in the images.
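As the promised illustration of the preprocessing step of Section 2.2.1, the sketch below applies the standard normal variate (SNV), one of the scatter-correction methods listed in Fig. 5, to every pixel spectrum of an unfolded cube (Python with NumPy; array names and sizes are illustrative):

    import numpy as np

    def snv(spectra):
        """Standard normal variate: center and scale each spectrum (row)."""
        mean = spectra.mean(axis=1, keepdims=True)
        std = spectra.std(axis=1, keepdims=True)   # spectra assumed non-constant
        return (spectra - mean) / std

    # D2 is an (X*Y) x wavelength matrix of pixel spectra,
    # as produced by the unfolding sketch in Section 2.2.
    D2 = np.random.rand(9600, 200)   # stands in for real data
    D2_snv = snv(D2)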
Among these unsupervised methods, the most common is principal component analysis (PCA). PCA is useful in order to elucidate the complex nature of HSI and MSI multicomponent systems by using mapping and displaying techniques [9,11].

2.2.3 Segmentation

Segmentation methods comprise all the clustering methodologies and dendrograms [12]. They divide the pixels into different groups considering their spectral similarities and dissimilarities. Even though they are unsupervised methods (no training step needed), there is a step in which a decision must be made: in the case of clustering, it is essential to guess the final number of clusters, and in dendrograms, a threshold must be set in order to group the pixels. Even though segmentation methods group the pixels according to their similarity, they cannot be considered classification methods, since no training step is used.

2.2.4 Curve resolution methods/spectral unmixing

Curve resolution or spectral unmixing methods aim at resolving the mixtures that each individual pixel might contain, given the correct number of constituents [13,14]. The final result is a set of selective images for each constituent and their pure spectral profiles. The main difference with explorative methods like PCA is that curve resolution methods do not aim at studying the main sources of variation in the data, but at giving an account of the hidden physicochemical behavior of each constituent in each pixel. A big debate could be established concerning the nature of curve resolution methods. In some aspects, they behave as unsupervised methods. Nevertheless, on many occasions they rely on good initial estimates and proper spectral and spatial constraints to obtain a suitable response.

2.2.5 Regression and classification

Calculating the concentration of several compounds in an image (regression) and, especially, the classification of elements into different well-defined categories (classification) are among the major targets in HSI and MSI analysis. Regression and classification methods are purely supervised methods, since a robust and reliable set of samples of well-known concentration or well-known category is needed for the essential step of training the model (calibration step). Once the training is properly performed, the properties or the classes are predicted in new images in a pixel-by-pixel manner [15]. Many algorithms for linear or nonlinear training models, like partial least squares (PLS), multilinear regression, support vector machines, or artificial neural networks, can be found in regression, together with their adaptations for classification purposes (e.g., PLS-discriminant analysis). Moreover, many algorithms are specifically designed for a purpose, like the single-class approach that SIMCA (soft independent modeling of class analogy) offers [16]. Being supervised methods, the core of their reliability depends on the mandatory validation step. Validation (internal cross-validation or external validation) is the only tool able to give a real account of the ability of the model to predict.

2.2.6 Image processing

Once a selective image for each individual constituent is achieved, the final aim might be to analyze the distribution, amount, and shape of those constituents on the surface measured. This is directly linked to the well-known digital image processing methodologies [3]. There is, again, a plethora of algorithms that can be used for many different purposes.
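For instance, a minimal sketch of one such measurement (Python with SciPy's ndimage module, which is one possible tool and not one prescribed by this chapter; the threshold value and arrays are illustrative) counts the connected objects in a constituent-selective image and their area coverage:

    import numpy as np
    from scipy import ndimage

    # score_image stands in for a constituent-selective image
    # produced by any of the methods above.
    score_image = np.random.rand(120, 80)

    mask = score_image > 0.8                  # illustrative threshold
    labels, n_objects = ndimage.label(mask)   # connected-component labeling
    coverage = 100.0 * mask.mean()            # area coverage in % of pixels

    print(n_objects, "objects covering", round(coverage, 1), "% of the image")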
It is not the aim of this book chapter to focus on those methods; therefore, we encourage the readers to read the provided references to have a better account of them [3].

3. Growing the number of application fields of HSI and MSI

The evolution of HSI and MSI cameras on the market has grown exponentially since the first works in remote sensing were published in the late 1970s and beginning of the 1980s. The first scientific instrument capable of measuring MSI images was developed by Goetz et al. [17]. It was called the Shuttle Multispectral Infrared Radiometer (SMIRR), and it was placed in the second flight of the space shuttle in 1981 [17,18]. SMIRR was able to measure 10 narrow bands, and its main purpose was the identification of minerals on the surface of the planet. Seeing the excellent results, Goetz et al. proposed the creation of what would be the first HSI camera, the Airborne Imaging Spectrometer (AIS) [19]. AIS was the first HSI camera able to collect 128 spectral bands in the range of 1200–2400 nm with a spectral resolution of 9.6 nm. The detector was able to collect a line of 32 pixels moving as a scanner (what is commonly known as line mapping or push broom systems [19]).

From this point on, the technological advances in HSI and MSI have generated an eruption of application fields within the area of remote sensing [20]. These technological advances are due to the rapid increase in sensing technology, higher computational capability, more robust and versatile instruments that can be adapted to different scenarios, and, of course, improvements in the data mining algorithms for processing the overwhelming amount of data being generated [21–23]. Airborne and satellite imaging opened up applications in mineralogy [17], oceanography [24], environmental monitoring and water resources management [25], vegetation [23,26], and precision agriculture [27]. Moreover, the scientific knowledge generated to create cameras working in the visible and NIR spectral range was soon also applied to other spectral radiations like MIR, Raman, nuclear magnetic resonance (NMR), fluorescence, X-ray, or even Terahertz spectroscopy, exponentially increasing the number of applications of HSI and MSI cameras [28].

Another major breakthrough came when HSI and MSI started to be adapted to more controlled environments. Normally, there is a "from minor to major" path in the sciences; that is, new devices are developed in the laboratories, and then the devices go out of the laboratory environment. With HSI and MSI, the evolution was "from major to minor," from satellites scanning the planet to the laboratory. Fields like biochemistry [29], food processing [27,30–34], pharmaceutical research and processing [12,35–38], forensic investigations [39,40], artwork and cultural heritage [41,42], medical research [43], and recycling [9,44,45], among other areas, started to use the same cameras as in remote sensing but adapted to their particular problem. All in all, and to finish this introductory chapter, I can certify that even though HSI and MSI constitute a science that is around 50 years old, it is still new and exciting. And there is an exciting and challenging future ahead of us, where faster and more reliable hyperspectral cameras will be developed and, consequently, the data analysis technology to analyze them.

References
[1] Oxford University Press, Oxford Dictionary, (n.d.). https://www.oxforddictionaries.com/.
[2] I.C. Torres, J.M. Amigo Rubio, R.
Ipsen, Using fractal image analysis to characterize microstructure of low-fat stirred yoghurt manufactured with microparticulated whey protein, Journal of Food Engineering 109 (2012) 721–729, https://doi.org/10.1016/j.jfoodeng.2011.11.016.
[3] R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using Matlab, 2004, https://doi.org/10.1117/1.3115362.
[4] M. Vidal, J.M. Amigo, R. Bro, F. van den Berg, M. Ostra, C. Ubide, Image analysis for maintenance of coating quality in nickel electroplating baths – real time control, Analytica Chimica Acta 706 (2011) 1–7, https://doi.org/10.1016/j.aca.2011.08.007.
[5] L. de Moura França, J.M. Amigo, C. Cairós, M. Bautista, M.F. Pimentel, Evaluation and assessment of homogeneity in images. Part 1: unique homogeneity percentage for binary images, Chemometrics and Intelligent Laboratory Systems 171 (2017), https://doi.org/10.1016/j.chemolab.2017.10.002.
[6] N.C. da Silva, L. de Moura França, J.M. Amigo, M. Bautista, M.F. Pimentel, Evaluation and assessment of homogeneity in images. Part 2: homogeneity assessment on single channel non-binary images. Blending end-point detection as example, Chemometrics and Intelligent Laboratory Systems 180 (2018), https://doi.org/10.1016/j.chemolab.2018.06.011.
[7] M. Vidal, J.M. Amigo, Pre-processing of hyperspectral images. Essential steps before image analysis, Chemometrics and Intelligent Laboratory Systems 117 (2012) 138–148, https://doi.org/10.1016/j.chemolab.2012.05.009.
[8] NASA, Landsat Science, (n.d.). https://landsat.gsfc.nasa.gov/.
[9] J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51, https://doi.org/10.1016/j.aca.2015.09.030.
[10] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. de Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics, 1997. https://doi.org/10.1016/S0922-3487(97)80056-1.
[11] A.K. Smilde, R. Bro, Principal component analysis (tutorial review), Analytical Methods 6 (2014) 2812–2831. https://doi.org/10.1039/c3ay41907j.
[12] J.M. Amigo, J. Cruz, M. Bautista, S. Maspoch, J. Coello, M. Blanco, Study of pharmaceutical samples by NIR chemical-image and multivariate analysis, Trends in Analytical Chemistry 27 (2008). https://doi.org/10.1016/j.trac.2008.05.010.
[13] F. Franch-Lage, J.M. Amigo, E. Skibsted, S. Maspoch, J. Coello, Fast assessment of the surface distribution of API and excipients in tablets using NIR-hyperspectral imaging, International Journal of Pharmaceutics 411 (2011) 27–35. https://doi.org/10.1016/j.ijpharm.2011.03.012.
[14] A. de Juan, R. Tauler, Multivariate curve resolution (MCR) from 2000: progress in concepts and applications, Critical Reviews in Analytical Chemistry 36 (2006) 163–176. https://doi.org/10.1080/10408340600970005.
[15] C. Ravn, E. Skibsted, R. Bro, Near-infrared chemical imaging (NIR-CI) on pharmaceutical solid dosage forms-comparing common calibration approaches, Journal of Pharmaceutical and Biomedical Analysis 48 (2008) 554–561. https://doi.org/10.1016/j.jpba.2008.07.019.
[16] S. Wold, W.J. Dunn, E. Johansson, J.W.
Hogan, D.L. Stalling, J.D. Petty, T.R. Schwartz, Application of Soft Independent Method of Class Analogy (SIMCA) in Isomer Specific Analysis of Polychlorinated Biphenyls, 2009, pp. 195–234. https://doi.org/10.1021/bk-1985-0284.ch012.
[17] A.F.H. Goetz, L.C. Rowan, M.J. Kingston, Mineral identification from orbit: initial results from the shuttle multispectral infrared radiometer, Science 218 (1982) 1020–1024. https://doi.org/10.1126/science.218.4576.1020.
[18] J.S. MacDonald, S.L. Ustin, M.E. Schaepman, The contributions of Dr. Alexander F.H. Goetz to imaging spectrometry, Remote Sensing of Environment 113 (2009). https://doi.org/10.1016/j.rse.2008.10.017.
[19] G. Vane, A.F.H. Goetz, J.B. Wellman, Airborne imaging spectrometer: a new tool for remote sensing, IEEE Transactions on Geoscience and Remote Sensing GE-22 (2013) 546–549. https://doi.org/10.1109/tgrs.1984.6499168.
[20] J.B. Campbell, R.H. Wynne, Introduction to Remote Sensing, 2011.
[21] S.C. Yoon, B. Park, Hyperspectral image processing methods, in: Food Eng. Ser., 2015, pp. 81–101. https://doi.org/10.1007/978-1-4939-2836-1_4.
[22] L. Wang, C. Zhao, Hyperspectral Image Processing, 2015. https://doi.org/10.1007/978-3-662-47456-3.
[23] A. Pelizzari, R.A. Goncalves, M. Caetano, Computational Intelligence for Remote Sensing, 2008. https://doi.org/10.1007/978-3-540-79353-3.
[24] S. Martin, An Introduction to Ocean Remote Sensing, 2013. https://doi.org/10.1017/CBO9781139094368.
[25] P.S. Thenkabail, Remote Sensing Handbook: Remote Sensing of Water Resources, Disasters, and Urban Studies, 2015. https://doi.org/10.1201/b19321.
[26] P.S. Thenkabail, R.B. Smith, E. De Pauw, Hyperspectral vegetation indices and their relationships with agricultural crop characteristics, Remote Sensing of Environment (2000). https://doi.org/10.1016/S0034-4257(99)00067-X.
[27] B. Park, R. Lu, Hyperspectral Imaging Technology in Food and Agriculture, 2015. https://doi.org/10.1007/978-1-4939-2836-1.
[28] S. Delwiche, J. Qin, K. Chao, D. Chan, B.-K. Cho, M. Kim, Line-scan hyperspectral imaging techniques for food safety and quality applications, Applied Sciences 7 (2017) 125. https://doi.org/10.3390/app7020125.
[29] R. Vejarano, R. Siche, W. Tesfaye, Evaluation of biological contaminants in foods by hyperspectral imaging: a review, International Journal of Food Properties 20 (2017) 1264–1297. https://doi.org/10.1080/10942912.2017.1338729.
[30] J.M. Amigo, I. Martí, A. Gowen, Hyperspectral imaging and chemometrics.
A perfect combination for the analysis of food structure, composition and quality, Data Handling in Science and Technology 28 (2013) 343–370. https://doi.org/10.1016/B978-0-444-59528-7.00009-0.
[31] S. Munera, J.M. Amigo, N. Aleixos, P. Talens, S. Cubero, J. Blasco, Potential of VIS-NIR hyperspectral imaging and chemometric methods to identify similar cultivars of nectarine, Food Control 86 (2018). https://doi.org/10.1016/j.foodcont.2017.10.037.
[32] X. Zou, J. Zhao, Nondestructive Measurement in Food and Agro-Products, 2015. https://doi.org/10.1007/978-94-017-9676-7.
[33] Y. Liu, H. Pu, D.W. Sun, Hyperspectral imaging technique for evaluating food quality and safety during various processes: a review of recent applications, Trends in Food Science & Technology 69 (2017) 25–35. https://doi.org/10.1016/j.tifs.2017.08.013.
[34] D. Wu, D.W. Sun, Advanced applications of hyperspectral imaging technology for food quality and safety analysis and assessment: a review – Part II: fundamentals, Innovative Food Science and Emerging Technologies 19 (2013) 1–14. https://doi.org/10.1016/j.ifset.2013.04.014.
[35] J.X. Wu, D. Xia, F. Van Den Berg, J.M. Amigo, T. Rades, M. Yang, J. Rantanen, A novel image analysis methodology for online monitoring of nucleation and crystal growth during solid state phase transformations, International Journal of Pharmaceutics 433 (2012) 60–70. https://doi.org/10.1016/j.ijpharm.2012.04.074.
[36] A.A. Gowen, C.P. O'Donnell, P.J. Cullen, S.E.J. Bell, Recent applications of Chemical Imaging to pharmaceutical process monitoring and quality control, European Journal of Pharmaceutics and Biopharmaceutics 69 (2008) 10–22. https://doi.org/10.1016/j.ejpb.2007.10.013.
[37] M. Khorasani, J.M. Amigo, C.C. Sun, P. Bertelsen, J. Rantanen, Near-infrared chemical imaging (NIR-CI) as a process monitoring solution for a production line of roll compaction and tableting, European Journal of Pharmaceutics and Biopharmaceutics 93 (2015). https://doi.org/10.1016/j.ejpb.2015.04.008.
[38] A.V. Ewing, S.G. Kazarian, Recent advances in the applications of vibrational spectroscopic imaging and mapping to pharmaceutical formulations, Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 197 (2018) 10–29. https://doi.org/10.1016/j.saa.2017.12.055.
[39] Near promising future of near infrared hyperspectral imaging in forensic sciences, NIR news 25 (2014) 6–9. https://doi.org/10.1255/nirn.1443.
[40] G.J. Edelman, E. Gaston, T.G. Van Leeuwen, P.J. Cullen, M.C.G. Aalders, Hyperspectral imaging for non-contact analysis of forensic traces, Forensic Science International 223 (2012) 28–39. https://doi.org/10.1016/j.forsciint.2012.09.012.
[41] C. Cucci, M. Picollo, Reflectance Spectroscopy Safeguards Cultural Assets, SPIE Newsroom, 2013. https://doi.org/10.1117/2.1201303.004721.
[42] M. Bacci, C. Cucci, A.A. Mencaglia, A.G. Mignani, Innovative sensors for environmental monitoring in museums, Sensors 8 (2008) 1984–2005. https://doi.org/10.3390/s8031984.
[43] G. Lu, B. Fei, Medical hyperspectral imaging: a review, Journal of Biomedical Optics 19 (2014) 010901. https://doi.org/10.1117/1.jbo.19.1.010901.
[44] M. Vidal, A. Gowen, J.M. Amigo, NIR hyperspectral imaging for plastics classification, NIR news (2012). https://doi.org/10.1255/nirn.1285.
[45] D. Caballero, M. Bevilacqua, J. Amigo, Application of hyperspectral imaging and chemometrics for classifying plastics with brominated flame retardants, Journal of Spectral Imaging 8 (2019). https://doi.org/10.1255/jsi.2019.a1.

Chapter 1.2
Configuration of hyperspectral and multispectral imaging systems
José Manuel Amigo(a,*) and Silvia Grassi(b)
(a) Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, University of Copenhagen, Denmark
(b) Department of Food, Environmental and Nutritional Sciences (DeFENS), Università degli Studi di Milano, Milano, Italy
*Corresponding author. e-mail: jmar@life.ku.dk

1. Introduction

Instrumentation is the key point of any reliable measurement system. In the field of hyperspectral imaging (HSI) and multispectral imaging (MSI), instrumentation has undergone an incredible expansion in the last few years due to the advances in sensing materials. New HSI and MSI instrumentation is continuously being generated, increasing the wavelength ranges of the electromagnetic spectrum covered and the types of spectroscopy used (ultraviolet-visible, NIR, MIR, Raman, confocal laser fluorescence microscopy, XRI, X-ray computed tomography (X-ray CT), TI, or ultrasound imaging (UI)). As we will see further in this chapter, all HSI and MSI methods, independently of the nature of the radiation, are a mixture between conventional imaging and conventional single-point spectroscopy (Fig. 1), in such a way that the combination of both makes it possible to create HSI and MSI devices.

The basic setup of an HSI or MSI system consists of a light source (i.e., a lighting system), adequate objective lenses, a wavelength dispersion device, and a camera with a 2D detector (Fig. 1). The configurations shown in Fig. 1 might be different in some scenarios, as it is well known that single-point spectroscopy can be adapted to map a surface, converting the technique into a hyperspectral technique, with the adequate equipment. Also, HSI and MSI configurations will depend on the type of camera and the final usage.
While MSI tends to be closer to an advanced imaging system, HSI combines the spectral abilities of wavelength dispersion devices with state-of-the-art 2D detectors to produce highly resolved images. Citing Qin et al.: "If conventional imaging tries to answer the question where and conventional spectroscopy tries to answer the question what, then hyperspectral (and multispectral) imaging tries to answer the question where is what" [1].

This chapter primarily focuses on instrumentation for HSI and MSI as normally applied in the visible and NIR wavelength range. That is why the first section (Section 2) deals with what happens when light interacts with matter. Section 3 shows the basic structure of an HSI and MSI image and defines the main methods used to obtain such data structures. The following sections focus on the most important features of the main components of cameras: the light source (Section 4), the wavelength dispersion devices (Section 5), and the detectors (Section 6). To finalize the chapter, the importance of the calibration of the camera is further discussed (Section 7). Throughout the chapter, some key references will be provided (with further explanations of the important concepts).

2. Light–matter interaction

The fundamental principle of HSI and MSI is the fact that light interacts with the molecules that the sample contains. This interaction depends on the chemical nature of the sample (chemical interactions) and on the physical properties of the surface (roughness, hardness, compaction level, among others).

FIGURE 1 Comparison of the basic setups for conventional imaging, hyperspectral and multispectral cameras, and conventional spectroscopy.

When a photon is emitted from a certain light source, it does so with a specific energy and trajectory. That energy and trajectory will be affected by the interaction with the sample [2]: first, the energy will decrease, and second, the trajectory will change. The energy will decrease depending mostly on the chemical properties of the molecules of the sample that the photon is hitting [2]. Therefore, when the photon arrives at the detector, it does so with a different energy, creating the bands that are characteristic of the spectrum of the molecules. We say that the molecules absorb a certain amount of energy from the photon, allowing the remaining energy to arrive at the detector. Nevertheless, before the photon (now carrying a different energy) arrives at the detector, several effects can occur along its trajectory. Basically, the photon, after hitting the sample, can be absorbed completely (and therefore converted into heat energy), reflected, or transmitted (Fig. 2).

Absorption: It occurs when a photon at a given frequency hits a molecule (or, better, the atoms of that molecule) whose electrons have the same vibrational frequency (Fig. 2). The electrons of the molecule and the photons are then said to be in resonance and, consequently, the photons will not arrive at the detector.
Reflection and transmission: Reflection and transmission of photon waves occur because the frequencies of the photon waves are not the same as the natural frequencies of the molecules they hit (or, better said, as the vibrational frequencies of the electrons conforming those molecules). Consequently, the electrons vibrate for short periods of time with smaller amplitudes of vibration, and the energy is reemitted as photons with a different frequency. If the object is transparent to the vibrational frequencies of the photon, the vibrations of the electrons are passed on to neighboring atoms through the bulk of the material and reemitted on the opposite side of the object.

FIGURE 2 Light–matter interaction. Depiction of the physical and chemical effects that a photon might undergo when interacting with matter.

The photon is then said to be transmitted. If the molecules are opaque to the frequencies of the photon, the electrons of the molecules at the surface vibrate for short periods of time and the energy is then reflected, allowing the photon with a different energy to arrive at the detector (Fig. 2). Depending on the energy of the photon and the properties of the molecules, as well as the surface characteristics, the photon can be reflected in a specular mode (reflected at the same angle as the incident one) or in a scattered mode (reflected at a different angle than the incident one).

Reflection might come with previous transmission from the interior of the sample; that is, there is a certain degree of penetration of the photon. Depending on the energy of the photon and on some characteristics of the sample, the photon may penetrate several micrometers or even millimeters into the sample, follow a path inside it (thus being affected by the different vibrational energies that it crosses), and then be reflected toward the detector (if the trajectory is adequate) [3]. As an example, Fig. 3 shows the color image of part of a hand. The right part of the figure shows the intensity (reflectance) image taken at 950 nm, where the details of the veins in the fingers can be observed, demonstrating that the photons at 950 nm had enough energy to penetrate several millimeters into the human skin.

Reflection and transmission are the two most common modes of measuring light in HSI and MSI. Reflection is especially popular and has been used for many years in important scientific fields like remote sensing and in industrial implementations. As an example, Fig. 2 shows the different paths that a photon can take when it hits the sample and before it arrives at the detector. In remote sensing, moreover, there might be an interaction with solids dispersed in the atmosphere or with aerosols, like clouds. In that case, the photon can undergo the same effects as with the sample. Moreover, there might be photons that are transmitted through the cloud and arrive at the sample, and photons that are directly reflected to the detector before arriving at the sample.

FIGURE 3 RGB picture of part of a hand and the corresponding image taken at 950 nm.

3. Acquisition modes

HSI and MSI images are basically data cubes with two spatial dimensions (normally denoted as the X and Y directions) and a third dimension containing the spectral information per pixel (λ).
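To make this structure concrete, the following minimal Python sketch (using NumPy; the cube and its dimensions are hypothetical stand-ins for a measured image) shows how a data cube of dimensions X × Y × λ is typically indexed: a full spectrum is a vector along the third dimension, while a single-wavelength image is a 2D slice.

```python
import numpy as np

# Hypothetical cube: 200 x 150 pixels measured at 256 wavelength channels.
X, Y, n_bands = 200, 150, 256
cube = np.random.rand(X, Y, n_bands)   # stands in for a measured image

pixel_spectrum = cube[50, 75, :]    # spectrum of the pixel at (50, 75)
band_image = cube[:, :, 120]        # spatial image at one wavelength
mean_spectrum = cube.reshape(-1, n_bands).mean(axis=0)  # average spectrum
```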
The cameras used to acquire such data cubes can be classified according to the mechanical procedure used to acquire an image. They can be grouped into spectral scanning (area and snapshot scanning), spatial scanning (point and line scanning), spatial-spectral scanning [4], and snapshot imaging [1]. Fig. 4 shows a graphical representation of the point, line, and plane scan configurations.

3.1 Area or plane scanning. Global imaging

The most straightforward way to obtain an HSI or MSI image is by taking a snapshot of all the spatial information at one single wavelength. The wavelength can then be tuned and a new snapshot taken, allowing 2D images to be recorded one wavelength after the other. This is also known as global imaging, and the sequence of signals at different wavelengths gives a complete data cube with two spatial dimensions (X and Y) and one spectral dimension (λ). This configuration is the preferred one for MSI systems, where a limited number of wavelengths is used. Several snapshot approaches are making inroads in bioimaging, such as tomography [6] and microscopy [7]. The cameras are normally very fast in the acquisition of images, and they have affordable prices. The light source is normally the environmental light if a multiobjective camera or a camera with tunable filters is used.

FIGURE 4 Point, line, and plane scan configurations in hyperspectral and multispectral (only the plane scan) imaging devices and the structure of the final data cube of dimensions X × Y × λ. Figure extracted from J.M. Amigo, Practical issues of hyperspectral imaging analysis of solid dosage forms, Analytical and Bioanalytical Chemistry 398 (2010) 93–109. doi:10.1007/s00216-010-3828-z, and slightly modified with permission of Springer.

Nevertheless, there are versions of MSI devices using light-emitting devices as the light source [8]. The main drawback of these types of cameras is that the chemical information is normally poor, except for those cameras working with acousto-optic tunable filters (AOTFs), in which a larger number of wavelengths can be measured. Another drawback is the fact that the sample must remain still in order to acquire a sharp image.

3.2 Point scanning. Single-point mapping or whisker-broom imaging

This scanning methodology consists of the acquisition of one spectrum at a single spatial location (pixel) by moving the sample, therefore giving one-dimensional spectral information in every measurement. The data cube is then obtained by multiple scans in two orthogonal spatial directions (X and Y). These configurations result in a time-consuming measurement, as they entail the acquisition of a single spectrum at a time, practically working as a normal spectrometer. However, point scanning reduces the side effects of the sample illumination, as it guarantees the retention of a constant lighting path between the optical system and the sample. Moreover, it also guarantees a higher spectral resolution, which can be much more important than speed in specific applications. These cameras are normally expensive and require additional supplies of, for instance, liquid nitrogen to cool the sensors.

3.3 Line scanning or push-broom imaging

Line scanning systems are another type of spatial scanning imager [9]. For each scan, the sensor acquires the intensity spectra of multiple spatial positions along one of the spatial dimensions (X or Y).
Each line gives 2D spatial-spectral information and, thanks to a line-by-line augmentation, an HSI data cube is obtained with movement in only one direction between the sample and the detector. Generally, the direction is transverse to the slit, making it a convenient solution for industrial implementations such as conveyor belt systems. Recording a line at a time, push-broom systems are time-saving compared to whisker-broom systems, reaching up to 100 times faster performance depending on the scanning area [10]. Nowadays, this is one of the preferred technologies for bench-top instruments, but also for industrial applications.

3.4 Spatial-spectral or spatiospectral scanning

Another methodology is what is known as time-scan imaging. With this method, a set of images is acquired and then superimposed along the spectral or spatial dimension and transformed into the spectral image using, for example, Fourier methods [11].

3.5 Snapshot imaging

Snapshot imaging is the only method in which there is no spatial or spectral scanning. The snapshot methodology aims at recording both the spatial and the spectral information of the sample with one single shot (i.e., it captures a hyperspectral image in a single integration time of a detector array). This type of hyperspectral imaging is still evolving, since only average-size areas can be measured at relatively few wavelengths, mostly in the range of 400–910 nm [1], making it more a multispectral device than a hyperspectral one. Nevertheless, the high speed of acquiring a multispectral image (up to 20 multispectral images per second) makes it a technology worth exploring for video recording of scenes or real-time actions.

3.6 Encoding the data

When a hyperspectral image is acquired, it is stored in a file where the spectra must be ordered in a logical manner in order to be able to reconstruct the data cube in any software. The most common formats are band interleaved by pixel (BIP), band interleaved by line (BIL), and band sequential (BSQ). BIP is the encoding system in which the first pixels of all bands are placed in sequential order, followed by the second pixels of all bands, and so on. Instead, BIL stores the first line of the first band followed by the first line of the second band, etc. BSQ refers to the method in which each line of the data is followed immediately by the next line of the same spectral band. This format is optimal for accessing any part of a single spectral band, and it is the most used in MSI. Nowadays, storing MSI or HSI images in one format or another makes little difference for data reconstruction or retrieval, since most software packages contain dedicated algorithms to reconstruct the data cube properly; a reading routine is sketched below.
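As an illustration of these three layouts, the following NumPy sketch reconstructs a (lines × samples × bands) cube from a raw binary file for each interleave. The file path, dimensions, and data type are hypothetical; in practice they would be taken from the image header (e.g., an ENVI .hdr file).

```python
import numpy as np

def read_cube(path, lines, samples, bands, interleave, dtype=np.uint16):
    """Read a raw spectral image into a (lines, samples, bands) array."""
    raw = np.fromfile(path, dtype=dtype)
    if interleave == "bsq":    # band after band
        return raw.reshape(bands, lines, samples).transpose(1, 2, 0)
    if interleave == "bil":    # for each line: that line in every band
        return raw.reshape(lines, bands, samples).transpose(0, 2, 1)
    if interleave == "bip":    # full spectrum, pixel by pixel
        return raw.reshape(lines, samples, bands)
    raise ValueError(f"unknown interleave: {interleave}")
```

Whatever the interleave, the result is the same X × Y × λ cube, which is why the storage format makes little practical difference once a proper reader is available.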
4. Light source

MSI and especially HSI systems can be considered as the merging of a spectrometer and a vision-based device. Thus, the radiation source plays a key role in the setup, since both spectrometers and vision-based devices are based on light–matter interaction. Therefore, it is essential to design a lighting system that maximizes the performance of the HSI system for the specific purpose. In the case of cameras implemented in satellites or airborne devices, the light source is the sun. The sun behaves as a black body, emitting light in the ultraviolet (UV), visible, and near-infrared (NIR) range of the electromagnetic spectrum (Fig. 5).

FIGURE 5 Spectral irradiance spectra of atmospheric terrestrial and extraterrestrial light.

Concerning bench-top instruments, the selection of the light source should take into consideration the emitting range of the source, i.e., the source should emit in the spectral range necessary for the defined aim. Moreover, a source warranting the highest homogeneity of illumination over a large area and the preservation of the samples is preferred. Regardless of the technology or the spectral range considered, HSI systems normally need more light than other vision-based systems. Indeed, after the light emission, before or after interacting with the sample, the radiation is always dispersed into narrow wavelength bands which are then measured individually.

As for spectrometers, the available sources mainly consist of halogen lamps, light-emitting diodes (LEDs), and lasers. Besides these sources, whose specific performance is detailed below, xenon lamps, low-pressure metal vapor lamps, and ultraviolet fluorescent lamps have been implemented as excitation means for fluorescence spectroscopy.

The optimal illumination must be as homogeneous as possible, covering a large area and without damaging the sample [5]. Moreover, how the light arrives at the sample will strongly depend on the geometry of the sample and on the light geometry as well (Fig. 6). The lighting configuration will vary according to the different acquisition configurations seen before (Fig. 4). In this way, line mapping systems are characterized by the fact that only the specific spatial position of the line being acquired needs to be illuminated at every moment. In most cases, the light sources are placed forming a 45-degree angle with the sample. Plane scan configurations require an illumination able to cover a much larger area with a homogeneous distribution of the light.

4.1 Halogen lamps

In the operating range of the ultraviolet, visible, and NIR regions, appropriate illumination can be achieved by halogen lamps, generally tungsten halogen lamps (Fig. 7). The broadband illumination emitted by halogen lamps has a smooth and continuous spectrum (Fig. 7), which is convenient for different acquisition modes. Generally, good performance has been achieved in reflectance applications. Nevertheless, when implementing high-intensity bulbs, transmittance measurements can also be achieved.

The main advantages of halogen lamps are their commercial availability, their low cost (linked to the bulb price and the low voltage required), and the wide electromagnetic range covered (340–2500 nm; 29,500–4000 cm−1). However, some disadvantages should be remarked. One of the main ones is the heat generated through the incandescent emission of the tungsten filament. The high temperature generated in the filament can be dangerous for temperature-sensitive samples [5], from explosive materials to ancient or precious pieces of art; moreover, temperature-related changes can lead to shifts of the spectral peaks. Finally, in the instrument setup it should be considered that halogen lamps are also sensitive to vibrations, which can lead to source damage.

4.2 Light-emitting diodes

LED solutions have been developed as a cheap alternative to generate light in VIS–NIR HSI systems.
Even though LEDs emit monochromatic light in the VIS region, generating different narrow-band wavelengths (Fig. 7) depending on the material used for the p-n junction, there are several ways to combine them to generate white light; mainly dichromatic, trichromatic, or tetrachromatic approaches can be implemented in HSI systems [8].

FIGURE 6 Different geometries that the light sources can adopt, highlighting the emitted and reflected (or transmitted) light path, and how light interacts with the sample.

Indeed, from two monochromatic visible-spectrum emitters, one emitting in the blue and the other in the yellow spectral region, it is possible to generate complementary wavelengths. A more efficient dichromatic approach consists of a blue-emitting LED combined with a semiconductor, AlGaInP, which acts as a wavelength converter to generate white light. Higher-quality white light can also be generated by mixing three primary colors (trichromatic) or more (tetrachromatic). In trichromatic approaches, mainly GaInN blue, GaInN green, and AlGaInP orange emitters are implemented, covering the VIS range (400–660 nm) with Gaussian distributions around the maxima at 455, 525, and 605 nm.

In the infrared range, the first reported LEDs were based on GaAs, covering a region of 870–980 nm [12]. Since then, a high number of LED system patents have been presented. Among them, it is worth mentioning the patent presented by Rosenthal [13], who developed an NIR quantitative analysis instrument with an LED light source covering the 1200–1800 nm region by isolating and employing harmonic wavelengths emitted by an array of eight commercially available LEDs optically isolated via opaque light baffles.

FIGURE 7 Black, reflectance of the Spectralon reference material for camera calibration. Red, energy emission of a tungsten halogen lamp at 3300 K. Green, emission of a green LED. Dark red, behavior of a band-pass filter at 1200 nm. The Y axis scale only belongs to the reflectance of Spectralon.

The implementation of LEDs as light sources in imaging systems brings various advantages, such as high energy efficiency, long lifespan (10,000–50,000 h), low maintenance cost, small size, fast response, and enhanced reliability and durability even in the presence of vibrations.

4.3 Lasers

While tungsten lamps and LEDs are widely implemented in visible and NIR HSI systems in reflectance and transmittance modes, lasers are widely implemented for Raman and fluorescence measurements. Indeed, lasers are truly monochromatic sources with high power and directional energy, thus acting as proper excitation sources. In this case, measurements are based on changes of the incident light, after the interaction with the sample, in terms of intensity at different wavelengths. These light intensity changes, also defined as frequency shifts, correspond to Raman scattering or fluorescence emission related to the chemical composition of the sample under study. Lasers have been implemented in HSI systems as continuous wave or pulsed mode emitters in different Raman and fluorescence imaging applications [14–16].

4.4 Tunable light sources with wavelength dispersion devices

Fig. 1 shows a configuration in which the wavelength dispersive device is placed before the light arrives at the sample. That is, the sample is hit with light at specific wavelengths because the incident light is filtered or dispersed before arriving at the sample.
This type of illumination is achieved with the wavelength dispersion devices that will be introduced in the next section.

5. Wavelength dispersion devices

The broadband light emitted by the light source, or the broadband light reflected by the sample (see Fig. 1 for the different configurations), is dispersed by wavelength-dispersing elements before arriving at the detector. This part is an essential element in any hyperspectral and multispectral configuration. Here we emphasize the features of the most commonly used dispersive devices, namely variable and tunable filters, imaging spectrographs, and Fourier-transform spectrometers, acknowledging that this is a field in continuous development where new devices are constantly being proposed (e.g., the computer-generated hologram disperser used in snapshot hyperspectral imaging [17]).

5.1 Variable and tunable filters

In spectral scanning systems, i.e., area scanning, variable or tunable filters are normally implemented as wavelength dispersion devices and can be positioned between the sample and the detector or between the lighting system and the sample (Fig. 1). Variable filters are the simplest method for wavelength selection. They consist of band-pass filters assembled on a wheel surface, whose rotation causes the light to pass through the different band-pass filters. The filters are characterized by a narrow band gap, allowing only a narrow part of the light frequencies to pass (Fig. 7). Features of the filters to consider are the center wavelength (CWL), the full width at half maximum (FWHM), and the peak transmission (PT). Filters are normally constructed with dielectric materials that allow the light to pass at a certain CWL, characterized by an FWHM. Depending on the material, only a small fraction of the light intensity will pass through the filter, which is the PT. Variable filters are mainly used in MSI devices, as these acquire a few separated wavelengths.

In global imaging (plane scan) systems, the so-called tunable filters [18] are the most common wavelength selection systems, mainly represented by electronically tunable filters such as liquid crystal (LC) filters and AOTFs. AOTFs use acoustic wave frequencies to deform a birefringent crystal, normally tellurium dioxide, which acts as a grating by dispersing light of different wavelengths in a given direction [19]. The liquid crystal tunable filter (LCTF) is another optical filter that uses electronically controlled LC elements to transmit the selected wavelength [1,19]. One of the most classical examples of an LCTF is the Lyot filter, constructed with birefringent plates, or the Lyot–Ohman system [20]. It works by varying a voltage on two linear polarizers, which causes the polarization of an LC interlayer. Thus, the light is transmitted in narrow-band wavelengths with a resolution of several nanometers. Tunable filters can be customized according to the desired wavelength range. They can be used in arrays in which each filter selects a spectral band, allowing the exposure time to be optimized for each separate wavelength. The main disadvantage is the unmodifiable spectral resolution, which depends on the hardware structure.

5.2 Imaging spectrographs

In general terms, a grating is a structure composed of transmitting or dispersing elements arranged periodically, capable of splitting the broadband light into several beams, each with its own amplitude and phase.
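For reference, the direction into which such a periodic structure sends each wavelength is governed by the classical grating equation, a textbook relation quoted here for context rather than taken from this chapter (one common sign convention is assumed):

$$d\,(\sin\theta_i + \sin\theta_m) = m\,\lambda$$

where $d$ is the period of the grating, $\theta_i$ the incidence angle, $\theta_m$ the diffraction angle of order $m$, and $\lambda$ the wavelength. For a fixed geometry, each wavelength leaves the grating at its own angle, which is what spreads the spectrum over the detector.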
In point and line scanning systems, the dispersive element is normally a high-efficiency prism, even though diffraction grating solutions are available. Indeed, gratings are normally used in combination with prisms [21]. There are two main approaches in the construction of imaging spectrographs: reflection and transmission gratings.

A reflection grating is mainly composed of reflective optical components (mirrors), which can be classified on the basis of their surface geometry, distinguishing plane and curved (concave or convex) gratings. There are many different optical configurations, allowing the generation of high-quality images in terms of signal-to-noise ratio (SNR), absence of high-order aberrations, and large field size [22]. On the other hand, reflection gratings require expensive solutions to correct naturally induced distortions. These solutions are often implemented in line scanning systems and can be appropriate for Raman and fluorescence imaging, which require high reflective efficiency, as the low lighting conditions must be fully exploited.

In transmission-based spectrographs, the dispersive element is generally constituted by a transmission grating placed between two prisms, with the integration of short- and long-pass filters. The light from the source is collimated onto the first prism, the beam then reaches the grating, which diffracts the light toward the second prism. Thus, the light propagation depends on the specific wavelength: the central wavelengths propagate symmetrically, while the external ones (shorter and longer) are dispersed above and below the central wavelengths. The dispersed wavelengths are then projected, through a back lens, onto the detector.

5.3 FT-imaging spectrometers

Interferometers are widely used in infrared and Raman spectroscopy. They are based on the self-interference of a broadband light beam, which is split into two equal beams that are recombined after traveling different paths. The principle is easily translatable into imaging systems. The basis of the interferometer is that one beam of light is deviated to a fixed mirror and the other is oriented to a moving mirror. The beams are reflected back and recombined, but the changes in the distance of the moving mirror generate an optical path difference and, thus, interference, which is collected as an interferogram by the detector and converted into a spectrum by Fourier transform; a numerical illustration is given below.

The mirror translation is highly influenced by vibrations, which can change the mirror positions and thus the recombination of the signals. Nevertheless, more reliable solutions to this problem have been proposed in recent years, such as the corner cube Michelson interferometer in a Fourier transform LWIR hyperspectral camera [23]. Even the Sagnac interferometer has been implemented as a robust alternative in HSI systems [24]. In this case, the two beams originated from the beam splitter cover a ring path, reflecting off two or more mirrors and following a triangular or squared path before exiting the interferometer. The changes in the interference are generated by rotating the beam splitter in a stepwise manner. The ring interferometer strategy allows more stable HSI solutions and covers a shorter wavelength range (up to the visible region). In any case, interferometers are highly sensitive solutions compared to other wavelength dispersion devices, as they can reach a high spectral resolution along a wide spectral range.
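To illustrate the interferogram-to-spectrum conversion mentioned above, here is a minimal NumPy sketch under idealized assumptions (a single monochromatic line, uniform sampling of the optical path difference, no apodization); the sampling range and the wavenumber are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical monochromatic source at wavenumber nu0: the recorded
# interferogram is a cosine in the optical path difference (OPD).
n = 2048
opd = np.linspace(0.0, 1.0, n)   # OPD axis in cm (assumed sampling)
nu0 = 400.0                      # source wavenumber in cm^-1 (assumed)
interferogram = np.cos(2 * np.pi * nu0 * opd)

# The modulus of the Fourier transform recovers the spectral line.
spectrum = np.abs(np.fft.rfft(interferogram))
wavenumbers = np.fft.rfftfreq(n, d=opd[1] - opd[0])
print(wavenumbers[np.argmax(spectrum)])  # close to 400 cm^-1
```

A broadband source is simply a superposition of such cosines, so the same transform returns the full spectrum at once.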
6. Detectors

The detectors are designed to collect the incident light and convert it into electrical signals that can be transduced into a visual interpretation of the spectral signature. In HSI and MSI, the detectors are normally mounted in the shape of focal plane array (FPA) architectures. An FPA is a sensing array (normally rectangular) assembled at the focal plane of the device and composed of hundreds of individual detectors. Those detectors are, indeed, the ones in charge of transforming the incoming light into electronic pulses. On the market there are two main solid-state detectors, charge-coupled devices (CCDs) and complementary metal-oxide-semiconductor (CMOS) detectors, together with the main variations of both architectures.

CCDs are devices composed of many light-sensitive elements (a.k.a. photodiodes) made of materials such as silicon (Si), germanium (Ge), indium gallium arsenide (InGaAs), indium antimonide (InSb), or mercury cadmium telluride (HgCdTe). Silicon is the most widely used material in semiconductor devices due to its low cost, relatively simple processing, and useful temperature range. Combined with other materials, it is nowadays possible to build CCDs that work in the range of 400–1000 nm (Si arrays) or even between 1000 and 5000 nm (InSb, HgCdTe, InGaAs) [25,26]. The CCD detector is characterized by high-quality images when there is sufficient light reaching the sensor. Nevertheless, for other applications in which the light intensity is low (e.g., fluorescence and Raman), high-performance cameras with electron-multiplying CCDs or intensified CCDs are usually preferred due to their high SNR [1].

Despite the advantages of CCDs, CMOS detectors are the alternative that is gaining more acceptance, since they include both the photodetector and the readout amplification for every pixel. This makes the architecture of a CMOS less complex than that of CCDs, since the light-to-electricity transformation is done for every single pixel, without the vertical and horizontal collectors of the CCDs. That means lower manufacturing costs and power consumption. Nevertheless, the main problem of CMOS is the higher spectral noise. There is a continuous debate on which type of detector (CCD or CMOS) performs better. This is a very difficult question, and it strongly depends on many factors. Litwiller wrote a well-detailed manuscript [26] comparing both architectures, pros and cons, making clear that both architectures have room to grow in the coming years.

7. Calibration

HSI and MSI cameras are mostly spectroscopic devices. Therefore, like all spectrometers, they need to be calibrated in order to obtain reliable spectral information. Moreover, they collect spatial information that must be well correlated with the ground coordinates (X–Y spatial directions) of the scene being measured. Variations in the intensity of the light source, the capability of all the sensing technologies involved in the process, mechanical parts, and vibrations may create spatial and spectral signatures that are biased from the real ones. Therefore, calibrating the camera is an essential step to be done before, and during, the collection of images to make sure that the spatial and spectral signatures are collected in the right conditions. When an HSI or MSI camera is purchased, the spatial response of the camera and the correct position of the wavebands have already been calibrated by the manufacturer.
Nevertheless, there are some operations for calibrating the lenses and the spectral response that must be done before, and sometimes during, the measurements.

The spatial calibration is aimed at determining the range and the resolution of the spatial information collected. It will depend on the type of acquisition method used (line scan or plane scan). In bench-top instruments, this calibration can easily be performed by adjusting the focal distance and the lenses on one of the lines or images at a specific wavelength by using a printed checkerboard (Fig. 8).

FIGURE 8 Example of printable checkerboards (the USAF 1951 and a customized one) used for line mapping and plane scan HSI and MSI cameras. HSI, hyperspectral imaging; MSI, multispectral imaging.

The reflectance calibration accounts for correcting and adjusting the 0%–100% reflectance values of the sensors. This is done by taking an image of the dark response (0% reflectance, obtained by turning off the light sources or covering the lenses with a nonreflective opaque black cap) and an image of the background response (100% reflectance, obtained by measuring a uniform, high-reflectance standard or white ceramic, like Spectralon) (Fig. 7) [10]. With these two images, the relative reflectance image of the sample (I) is obtained from the raw spectral image (I0) as follows:

I = (I0 − D) / (W − D)    (1)

where D and W correspond to the dark and background (white) reference images, respectively. Considering that absorbance follows a linear behavior, the consequent step is to transform the reflectance into absorbance, as shown in Eq. (2):

A = −log10[(I0 − D) / (W − D)]    (2)
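In practice, Eqs. (1) and (2) amount to a few array operations. A minimal NumPy sketch, assuming the raw cube and the dark and white reference images have already been acquired with matching spatial dimensions (all names are hypothetical):

```python
import numpy as np

def calibrate(raw, dark, white, eps=1e-9):
    """Apply Eqs. (1) and (2): relative reflectance and absorbance.

    raw   : raw spectral image I0, e.g., shape (rows, cols, bands)
    dark  : dark reference D (0% reflectance)
    white : white reference W (100% reflectance, e.g., Spectralon)
    eps   : small constant guarding against division by zero (assumed)
    """
    reflectance = (raw - dark) / (white - dark + eps)        # Eq. (1)
    absorbance = -np.log10(np.clip(reflectance, eps, None))  # Eq. (2)
    return reflectance, absorbance
```

The clipping before the logarithm is a practical safeguard for saturated or dead pixels, which would otherwise produce infinities.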
The operations in remote sensing devices are somewhat similar, with two main drawbacks. The spectral calibration must be done in a prelaunch step and, moreover, the light sources used to calibrate the cameras in the laboratory are different from the light of the sun (the laboratory signals are too low in the blue and too high in the red). Therefore, the prelaunch radiance calibration is done by using light diffusers or radiometric spheres [1]. Nevertheless, in this field, and especially in drone imaging, many alternatives are continuously being proposed for obtaining in-flight radiance calibrations and corrections [27–29].

8. Conclusions

Hyperspectral and multispectral devices are continuously improving. That is a fact. More efficient sensors are developed, and higher spatial and spectral resolution and shorter measurement times of the data cube are being delivered by a growing offer that supplies the high demand for cameras. In this chapter, we have highlighted the most common aspects concerning instrumentation.

The acquisition mode is arguably a decision-making point in the purchase of a camera. The final application will determine whether a line scan or a single-point system (or any other configuration) is needed. This is a difficult decision to make that depends on the spatial and spectral resolution needed and also on the speed of the measurement. Light sources also play an important role in the decision to purchase a camera, since, depending on the application, the light source should comply not only with the spatial-spectral needs but also with the setup possibilities. From a customer point of view, we tend to pay less attention to the wavelength-dispersive devices implemented and to the detectors. Fortunately, these two elements are normally quite robust and reliable and depend on the manufacturer. Nevertheless, it is important to know which dispersive device is implemented in the purchased camera, as well as which type of CCD or CMOS configuration we purchase (if they are the ones implemented as detectors).

Considering all these aspects, we acknowledge that what we look for as customers is a combination of speed, high spatial and spectral resolution, and, of course, an affordable price. These aspects will be treated in another chapter of this book. Nevertheless, it seems that the direction to go is the new technologies of single-shot devices, which are able to measure a data cube in milliseconds with a reasonable spatial and spectral resolution. All in all, we can conclude that further and better advances in instrumentation are expected in a very promising future for HSI and MSI devices.

References

[1] J. Qin, Hyperspectral imaging instruments, in: Hyperspectral Imaging for Food Quality Analysis and Control, 2010, pp. 129–172. https://doi.org/10.1016/B978-0-12-374753-2.10005-X.
[2] V.F. Weisskopf, How light interacts with matter, Scientific American 219 (1968) 60–71. https://doi.org/10.1038/scientificamerican0968-60.
[3] J.A. Abbott, Quality measurement of fruits and vegetables, Postharvest Biology and Technology 15 (1999) 207–225. https://doi.org/10.1016/S0925-5214(98)00086-6.
[4] Y. Garini, I.T. Young, G. McNamara, Spectral imaging: principles and applications, Cytometry, Part A 69 (2006) 735–747. https://doi.org/10.1002/cyto.a.20311.
[5] J.M. Amigo, Practical issues of hyperspectral imaging analysis of solid dosage forms, Analytical and Bioanalytical Chemistry 398 (2010) 93–109. https://doi.org/10.1007/s00216-010-3828-z.
[6] N. Hagen, M.W. Kudenov, Review of snapshot spectral imaging technologies, Optical Engineering 52 (2013) 090901. https://doi.org/10.1117/1.oe.52.9.090901.
[7] T. Gottschall, T. Meyer, M. Schmitt, J. Popp, J. Limpert, A. Tünnermann, Advances in laser concepts for multiplex, coherent Raman scattering micro-spectroscopy and imaging, Trends in Analytical Chemistry 102 (2018) 103–109. https://doi.org/10.1016/j.trac.2018.01.010.
[8] E.F. Schubert, Light-Emitting Diodes, second ed., 2006. https://doi.org/10.1017/CBO9780511790546.
[9] J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51. https://doi.org/10.1016/j.aca.2015.09.030.
[10] G. ElMasry, D.W. Sun, Principles of hyperspectral imaging technology, in: Hyperspectral Imaging for Food Quality Analysis and Control, 2010, pp. 3–43. https://doi.org/10.1016/B978-0-12-374753-2.10001-2.
[11] S. Grusche, Basic slit spectroscope reveals three-dimensional scenes through diagonal slices of hyperspectral cubes, Applied Optics 53 (2014) 4594. https://doi.org/10.1364/ao.53.004594.
[12] T.M. Quist, R.H. Rediker, R.J. Keyes, W.E. Krag, B. Lax, A.L. McWhorter, H.J. Zeigler, Semiconductor maser of GaAs, Applied Physics Letters 1 (1962) 91–92. https://doi.org/10.1063/1.1753710.
[13] R.D. Rosenthal, Using LED Harmonic Wavelengths for Near-Infrared Quantitative, US Patent 5,218,207, 1993.
[14] S. Begin, B. Burgoyne, V. Mercier, A. Villeneuve, R. Vallee, D. Cote, Coherent anti-Stokes Raman scattering hyperspectral tissue imaging with a wavelength-swept system, Biomedical Optics Express 2 (2011) 1296–1306. https://doi.org/10.1364/BOE.2.001296.
[15] J. Ando, A.F. Palonpon, M. Sodeoka, K. Fujita, High-speed Raman imaging of cellular processes, Current Opinion in Chemical Biology 33 (2016) 16–24. https://doi.org/10.1016/j.cbpa.2016.04.005.
[16] S. Delwiche, J. Qin, K. Chao, D. Chan, B.-K. Cho, M. Kim, Line-scan hyperspectral imaging techniques for food safety and quality applications, Applied Sciences 7 (2017) 125. https://doi.org/10.3390/app7020125.
[17] C.E. Volin, M.F. Hopkins, E.L. Dereniak, T.M. Gleeson, M.R. Descour, D.W. Wilson, P.D. Maker, Demonstration of a computed-tomography imaging spectrometer using a computer-generated hologram disperser, Applied Optics 36 (1997) 3694. https://doi.org/10.1364/ao.36.003694.
[18] H.H. Szu, M. Vetterli, W.J. Campbell, J.R. Buss, N. Gat, Imaging spectroscopy using tunable filters: a review, SPIE - The International Society for Optical Engineering 4056 (2000) 50–64.
[19] H.R. Morris, C.C. Hoyt, P.J. Treado, Imaging spectrometers for fluorescence and Raman microscopy: acousto-optic and liquid crystal tunable filters, Applied Spectroscopy 48 (1994) 857–866. https://doi.org/10.1366/0003702944029820.
[20] J. Beeckman, Liquid-crystal photonic applications, Optical Engineering 50 (2011) 081202. https://doi.org/10.1117/1.3565046.
[21] C.H. Palmer, Diffraction Grating Handbook, 2016. https://doi.org/10.1364/JOSA.46.000050.
[22] D. Bannon, R. Thomas, Harsh environments dictate design of imaging spectrometer, Laser Focus World 41 (2005) 93.
[23] M. Fridlund, D. Bergström, T. Svensson, L. Axelsson, T. Hallberg, Design, calibration and characterization of a low-cost spatial Fourier transform LWIR hyperspectral imaging camera with spatial and temporal scanning modes, 2018, p. 33. https://doi.org/10.1117/12.2304628.
[24] Y. Ferrec, Optimal geometry for Sagnac and Michelson interferometers used as spectral imagers, Optical Engineering 45 (2006) 115601. https://doi.org/10.1117/1.2395923.
[25] J.S. MacDonald, S.L. Ustin, M.E. Schaepman, The contributions of Dr. Alexander F.H. Goetz to imaging spectrometry, Remote Sensing of Environment 113 (2009). https://doi.org/10.1016/j.rse.2008.10.017.
[26] D. Litwiller, CMOS vs. CCD: maturing technologies, maturing markets, Photonics Spectra 39 (2005) 54–61. https://doi.org/10.1016/S0030-3992(03)00078-1.
[27] T. Hakala, L. Markelin, E. Honkavaara, B. Scott, T. Theocharous, O. Nevalainen, R. Näsi, J. Suomalainen, N. Viljanen, C. Greenwell, N. Fox, Direct reflectance measurements from drones: sensor absolute radiometric calibration and system tests for forest reflectance characterization, Sensors 18 (2018). https://doi.org/10.3390/s18051417.
[28] P. Wang, J. Zhang, Y. Lan, Z. Zhou, X. Luo, Radiometric calibration of low altitude multispectral remote sensing images, Nongye Gongcheng Xuebao/Transactions of the Chinese Society of Agricultural Engineering 30 (2014) 199–206. https://doi.org/10.3969/j.issn.1002-6819.2014.19.024.
[29] Y. Guo, J. Senthilnath, W. Wu, X. Zhang, Z. Zeng, H. Huang, Radiometric calibration for multispectral camera of different imaging conditions mounted on a UAV platform, Sustainability 11 (2019) 978.

Chapter 2.1

Preprocessing of hyperspectral and multispectral images

José Manuel Amigo(a),* and Carolina Santos(b)

(a) Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, University of Copenhagen, Denmark
(b) Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil
*Corresponding author. e-mail: jmar@life.ku.dk

1. Why preprocessing?

Hyperspectral imaging (HSI) and multispectral imaging (MSI) are analytical techniques based on the study of the chemical and physical behavior of the reflected or scattered light coming from a specific surface. The camera (sensor), the surface, and the light source are the analytical elements involved in each measurement. And, as with any analytical element, they provide a response that is composed of the relevant analytical information, spectral noise, and different artifacts [1].

Regarding the light source, the analytical information is related to the number of emitted photons arriving at the surface. The light source is also subject to fluctuations, whether it is the sun, in a remote sensing scenario, or a halogen lamp, for bench-top hyperspectral devices. In both cases, the energy emitted may vary with time. Moreover, in remote sensing, that energy will pass through different atmospheric conditions before arriving at the sample.

The measured surface (sample) is hit by the photons coming from the light source. The chemical nature (composition) and the physical nature (relief, roughness) of the surface make the photons behave differently in different parts of the sample. Thus, the number of reflected photons and their remaining energy carries not only the analytically relevant information but also the physical influence of the surface.

When the photons arrive at the camera in reflectance mode, they are detected by sensors that are also subject to instrumental noise. Moreover, different sensors have different sensitivities to the photons, making the spectral signature dependent not only on the arriving photons but also on the quality of the sensing device.
The effects in each one of these three parts of the measurement cannot be considered individually, since the collected signal is a combination/mixture of all the effects. They are reflected in two main types of distortions: those affecting the geometry of the sample and those affecting the spectral signal. When the imaging area is small enough, the lens and/or Earth curvature does not cause significant distortions. However, if this is not the case, distortions such as the ones shown in Fig. 1 are observed. The figure shows an image that has been geometrically distorted due to the sensor and its position at the moment of image acquisition. On the other hand, when the field of view is small, different types of distortions can be observed, such as the ones indicated in Fig. 2 [3]. This image was taken using a bench-top instrument. This point is important to consider since, as said before, the stability of the light source, the sample, and the sensor also plays a fundamental role, in such a way that bench-top instruments (HSI and MSI cameras adapted to a platform in the laboratory) are more affected by certain types of artifacts than mobile/portable cameras (cameras implemented in satellites, drones, or industrial setups), and vice versa.

1.1 Spatial/geometric distortions

Spatial issues are those that arise from the geometry of the sample, from uncontrolled movements of the camera, and from the optics of the camera. The uncontrolled movements of the imaging system are especially relevant in mobile and portable cameras. In satellite imaging, for instance, the Earth's rotation and curvature will generate a known distortion in the acquired image, while cameras implemented in drones or airborne cameras will suffer from wind exposure and from the skill of the operator during the flight. Typical geometric aberrations are optical distortions, such as the well-known pincushion and barrel distortions shown in Fig. 1, or the aspect ratio between the scales in the vertical and horizontal directions of the viewing geometry, among others [4]. Cameras implemented in industrial setups working on conveyor belts can produce deformities in the acquired image.

FIGURE 1 Original and distorted aerial images of houses. The image is a free-access photo taken by Blake Wheeler from the Unsplash website (https://unsplash.com/).

In bench-top instruments, the platform where the camera is implemented is normally quite robust, avoiding any problem of deformity in the acquired image. However, if confocal equipment is being employed, a specific portion of the sample could be out of focus, producing spatial distortions. Nevertheless, all cameras suffer from the roughness of the sample or from irregularities of the measured surface. One problem that is more important in bench-top instruments is the fact that many samples are not square or are smaller than the field of view. That is, part of the acquired image contains irrelevant information from the surface where the sample is placed for the measurement. As an example, Fig. 2 shows four clusters of plastics lying on a neutral black paper.

1.2 Spectral/radiometric distortions

Spectral distortions are mainly due to the data recording instruments, fluctuations of the light source, and the nature of the sample. Regardless of the spectral technique employed for image acquisition, the most common spectral issue is noise.
Sensors nowadays have the ability to measure the spectral information with a high signal-to-noise ratio (SNR). Nevertheless, spectral noise will still be present (e.g., the spectra shown in Fig. 2 are somewhat affected by noise).

In an image analysis scenario, another important aspect to consider is the saturation of light that some pixels can exhibit. Since samples are normally a distribution of different elements on the surface, it is normal that, due to the shape of the elements, their different chemical nature, and the incident angle of the light, many pixels contain saturated information. This can be observed in the bottom right part of the false RGB image of Fig. 2, where many pixels are white due to the saturation of light in the detector.

FIGURE 2 Left, false RGB of a hyperspectral image showing four different clusters of plastic pellets. The spectral range was from 940 to 1640 nm with a spectral resolution of 4.85 nm. Further information about the acquisition, type of hyperspectral camera, and calibration can be found in Ref. [3]. Right, raw spectra of the selected pixels (marked with an "x" in the left figure). The black line in the left figure indicates a missing scan line. This figure is partially inspired by J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51. https://doi.org/10.1016/j.aca.2015.09.030.

Light scattering is also a major concern when talking about reflectance mode and, especially, about near-infrared radiation. Scattering (in an additive or multiplicative way) will make the baseline of the absorbance spectra drift in a nonparametric manner toward higher or lower absorbance values. Scattering is due to the nature of the elements, but also to their different shapes, the roughness of the surface, and the ability of the radiation to penetrate the element. See, for instance, how the baselines of the spectra shown in Fig. 2 all differ. In remote sensing, the effect of atmospheric scattering is of major concern. Atmospheric scattering is caused by gases such as oxygen, nitrogen, and ozone, and also by aerosols like airborne particulate matter or clouds.

When talking about Raman spectroscopy, scattered light is the source of the relevant information, and fluorescence is the main cause of baseline drift, which is usually highly intense. This issue arises either from the sample itself or from highly energetic particles hitting the detector [5].

The background also plays an important role in the spectral signatures. If the background is not sufficiently neutral in the signal, it will influence the pixels belonging to the edge between the background and the elements. Background information is not a major concern in remote sensing. The signal is, however, affected by the atmosphere and by the presence of aerosols and particles in suspension, which is not that relevant in bench-top instruments.

For both imaging scenarios, one crucial spectral issue is the presence of dead pixels. Images are usually a set of a high number of pixels and, eventually, there are pixels that contain spiked signals in the spectrum, or parts of the spectrum that are saturated, or that simply do not contain any information (black line in the false RGB image of Fig. 2).
This is normally generated by a malfunction of the sensor (in the case of spiked signals or no information) or by a wrong light exposure (saturated levels of light).

Different approaches have been proposed to retain the analytical signal and minimize the effect of the different issues commented on before. This chapter offers a revision of the main methodologies for HSI and MSI preprocessing. Moreover, we will emphasize the benefits and drawbacks arising from the application of each one of them.

2. Geometric corrections of distortions coming from the instrument

A geometric correction is a transformation of the HSI or MSI image applied to each individual channel in such a way that the distorted image is translocated to a standard reference axis (e.g., projected onto map coordinates). This type of correction is common in remote sensing scenarios, where a reference map of the scanned land is normally available [6–8]. The correction involves two steps: finding a proper set of coordinates in the image and in the reference map, and then a step of interpolation/resampling of the distorted image to the correct reference points.

The most common geometric correction method is based on ground control points (GCPs). The GCPs are reference points that are common to the distorted image and the reference map. They are normally permanent elements like road intersections or airport runways. Once the GCPs are chosen, a step of interpolation is performed. The most common resampling methods are the nearest neighbor, bilinear, and bicubic interpolations (Fig. 3). Nearest neighbor resampling is the most straightforward option: it consists of assigning the nearest pixel value to the corrected pixel (Fig. 3). Bilinear interpolation, instead, considers the closest 2 × 2 neighborhood of known pixel values of the distorted image surrounding the unknown pixel. This method gives much smoother-looking images than the nearest neighbor (Fig. 3). Finally, bicubic interpolation considers the 4 × 4 neighborhood instead. Bicubic interpolation normally produces sharper images than the previous two methods. A short code sketch of these three resampling options is given below.

FIGURE 3 Top, distorted image (dashed squares) and corrected image (solid squares). The pixel of interest is highlighted in bold. Bottom, three of the methodologies for pixel interpolation, highlighting in each one the pixels of the distorted image used for the interpolation.
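The following sketch uses scipy.ndimage.map_coordinates, whose order parameter selects nearest neighbor (0), bilinear (1), or cubic spline (3, close in spirit to the bicubic approach) interpolation; the geometric transformation itself (here a hypothetical half-pixel shift) would in practice be derived from the fitted GCPs.

```python
import numpy as np
from scipy import ndimage

def resample_channel(channel, rows, cols, order):
    """Resample one spectral channel at (possibly non-integer) coordinates.

    order: 0 = nearest neighbor, 1 = bilinear, 3 = cubic spline.
    """
    return ndimage.map_coordinates(channel, [rows, cols], order=order)

# Hypothetical example: sample one band on a diagonally shifted grid.
channel = np.random.rand(100, 100)  # stands in for one spectral channel
r, c = np.meshgrid(np.arange(100.0), np.arange(100.0), indexing="ij")
rows, cols = r + 0.5, c + 0.5       # corrected sampling positions
nearest = resample_channel(channel, rows, cols, order=0)
bilinear = resample_channel(channel, rows, cols, order=1)
bicubic = resample_channel(channel, rows, cols, order=3)
```

As the text notes, the same resampling is applied to every spectral channel of the cube.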
3. Dead pixels, spikes, and missing scan lines

Dead pixels, spiked points in the spectra, and missing scan lines are caused by a punctual malfunction of the detector or by a saturation of light at certain points of the measured surface [9]. They can be encoded as missing, zero, or infinite values, and their location and spread might vary depending on the quality of the detector and the reflectance of the sample. They can all distort the further processing of the HSI or MSI image. This is highly problematic, since many of the routines for data mining can only handle a limited amount of missing values.

Missing scan lines and dead pixels occur when a detector fails to operate during a scan (Fig. 2). Detecting missing scan lines and dead pixels can be done using different algorithms of greater or lesser complexity (like thresholding techniques [10], genetic or evolutionary algorithms [11–13], or the minimum volume ellipsoid (MVE) [14]). A much simpler methodology is the establishment of a predefined threshold on the number of zero values allowed in the spectrum of a pixel. Evidently, the threshold must be selected considering the nature of the data.

Spiked wavelengths are sudden and sharp rises and falls in the spectrum (Fig. 4) [15]. They are caused by an abnormal behavior of the detector or by saturation of light in a certain spectral region. Spikes can normally be distinguished in an easy manner, as they tend to present a high deviation from the mean value of the spectrum (Fig. 4). Therefore, if a proper threshold based on the mean and standard deviation is chosen, they can be easily detected, always considering the difference between the normal signal, the spiked signal, and the SNR [16].

Once the missing scan lines, dead pixels, and spikes have been detected, they must be replaced by a more appropriate value. In the case of spiked wavelengths, the most straightforward way of replacing the spiked value is to substitute it by the mean or median of a spectral window centered on the spiked point (Fig. 4). For missing scan lines and dead pixels, many alternatives have been proposed in the literature [16–21]. Nevertheless, profiting from the rich amount of spatial information that HSI and MSI normally provide, one straightforward manner to replace missing scan lines and dead pixels is to substitute them by the mean or the median of the spectra of the neighboring pixels. A minimal sketch of the detect-and-replace scheme for spikes follows.

FIGURE 4 Left, spectrum containing one spiked point. The continuous red line denotes the mean of the spectrum. The upper and lower dashed red lines denote the mean ± six times the standard deviation. Right, corrected spectrum where the spike has been localized and its value substituted by an average of the neighboring spectral values.
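A minimal sketch of such a spike filter, following the mean ± k standard deviations rule illustrated in Fig. 4 (k = 6 and the window size are assumptions to be tuned to the data):

```python
import numpy as np

def despike(spectrum, k=6.0, window=5):
    """Replace points deviating more than k std devs from the mean
    by the median of a small spectral window around them."""
    spectrum = spectrum.astype(float).copy()
    mean, std = spectrum.mean(), spectrum.std()
    spikes = np.where(np.abs(spectrum - mean) > k * std)[0]
    half = window // 2
    for i in spikes:
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        neighbors = np.delete(spectrum[lo:hi], i - lo)  # exclude the spike
        spectrum[i] = np.median(neighbors)
    return spectrum
```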
4. Spectral preprocessing

Spectral preprocessing can be defined as the set of mathematical operations that minimize and/or eliminate the influence of undesirable phenomena directly affecting the spectral signature obtained (e.g., light scattering, particle size effects, or morphological differences, such as surface roughness and detector artifacts [22]). In HSI, it is common to adapt the preprocessing methods coming from classical spectroscopy [5,23]. They can be divided into different families according to their purpose. Fig. 5 shows an example of their effect on the spectral signal and on the visualization of the surface information:

- Smoothing/denoising: The instrumental noise can be partly removed by using smoothing techniques, Savitzky-Golay being the most popular one [23]. Savitzky-Golay methodologies are based on the selection of a subwindow around a specific point and the calculation of its projection onto a polynomial fitting of the points of the subwindow. It is simple to implement. Nevertheless, special care must be taken in the selection of the spectral subwindow, since large subwindows will eliminate informative peaks, while small windows might retain more noise.

- Scatter correction: Scattering is reflected in a drift in the baseline of the spectra (Figs. 2 and 6). That drift can be additive or multiplicative, depending on the nature of the sample and the physical interaction of the sample with the light. There are two main methods for scattering removal. The first one, standard normal variate (SNV), is the most straightforward method. It subtracts the mean of the spectrum and divides it by the standard deviation. SNV removes additive scattering without changing the shape of the original spectrum. Nevertheless, it cannot handle the multiplicative scattering effect. Therefore, multiplicative scatter correction (MSC) is preferred when multiplicative scattering appears. MSC is also a quite straightforward method, since it projects the spectrum of the pixels against one reference spectrum; the corrected spectrum is then the subtraction of the offset from the original spectrum divided by the slope [23]. The main drawback of MSC is that it depends on the correct selection of a reference spectrum. This is quite complicated to achieve in hyperspectral images, since the samples tend to be a mixture of different compounds with different spectra.

- Derivatives: The Savitzky-Golay methodology can also be used for calculating the derivative profile of the spectrum. In that sense, the subwindow of points chosen is first fitted to a polynomial of a given degree and then the derivative is calculated. First (1D) and second (2D) derivatives are the most common ones in spectroscopy. The first derivative removes additive scattering, while the second derivative removes multiplicative scattering [26]. Another effect of derivatives is their ability to highlight minor spectral differences. Nevertheless, special care must be taken in the choice of the derivative degree and the subwindow size, since a high derivative degree with a small window size can create a high amount of noise, while large subwindows can eliminate informative parts of the spectra [23].

FIGURE 5 Top, raw spectra of Fig. 2 and the spectra after different spectral preprocessing methods. Bottom, the image resulting at 1220 nm for each spectral preprocessing. This sample belongs to a data set by J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34-51, https://doi.org/10.1016/j.aca.2015.09.030.

One of the main properties of some spectral preprocessing methods is that they are also able to remove physical artifacts reflected in the signal. This is the case of the scattering promoted by the roughness of the measured sample or its nonplanar shape.
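Before turning to the example of Fig. 6, a minimal sketch of the two workhorses above is given, assuming the cube has been unfolded into a (pixels × channels) matrix. The helper names and the synthetic drifted spectra are illustrative: SNV is a one-line, per-spectrum standardization, while scipy.signal.savgol_filter provides both the smoothing and the derivatives discussed above.

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectra):
    """Standard normal variate: per-spectrum mean centering and scaling."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def sg_derivative(spectra, window=11, poly=2, deriv=1):
    """Savitzky-Golay smoothing/derivative along the spectral axis."""
    return savgol_filter(spectra, window_length=window, polyorder=poly,
                         deriv=deriv, axis=1)

# spectra: (n_pixels, n_channels) unfolded cube; stand-in data with an
# additive baseline drift that SNV and the first derivative both remove.
spectra = np.random.default_rng(3).random((100, 120)) + np.linspace(0, 2, 120)
print(snv(spectra).mean(axis=1)[:3])   # ~0 for every spectrum after SNV
print(sg_derivative(spectra).shape)
```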
As a matter of example, Fig. 6 shows how SNV can minimize the impact of the round shape of a nectarine [24,25]. The figure shows the raw data and the preprocessed data of one line of the hyperspectral cube. This line of spectra is clearly affected by the shape of the nectarine, an effect that is minimized when SNV is applied.

FIGURE 6 Example of spectral preprocessing for minimizing the impact of the shape of the sample in the spectra. Top left, RGB image of a nectarine. Top middle, the average signal of the pixels contained in the green line of the top left figure. Top right, spectra of the green line of the top left figure showing the effect of the curvature. Bottom middle, average signal of the preprocessed spectra (with standard normal variate (SNV)) of the green line of the top left figure. Bottom right, preprocessed spectra of the green line of the top left figure. This sample belongs to a data set by S. Munera, J.M. Amigo, N. Aleixos, P. Talens, S. Cubero, J. Blasco, Potential of VIS-NIR hyperspectral imaging and chemometric methods to identify similar cultivars of nectarine, Food Control 86 (2018), https://doi.org/10.1016/j.foodcont.2017.10.037; S. Munera, J.M. Amigo, J. Blasco, S. Cubero, P. Talens, N. Aleixos, Ripeness monitoring of two cultivars of nectarine using VIS-NIR hyperspectral reflectance imaging, Journal of Food Engineering 214 (2017) 29-39, https://doi.org/10.1016/j.jfoodeng.2017.06.031.

5. Background removal

The selection of the regions of interest (RoI) of a sample is an important step before the analysis of the sample. This is especially relevant when the geometry of the sample does not cover the whole measured area (as illustrated in the example of Fig. 2). If the sample does not cover all the scanned area, the area left outside the sample is usually composed of highly noisy spectra and, thus, it might hamper the good performance of further models. Moreover, removing it implies a substantial saving of computing time.

As a matter of fact, successful background removal starts at the image acquisition step. For bench-top instruments, it is usually possible to choose an appropriate background that facilitates its segmentation from the elements of interest. Manual selection of the RoIs, the use of specific thresholds in the histogram of the image obtained at specific wavelengths, K-means clustering, or even the use of the score surfaces of a principal component analysis (PCA) model are some of the methodologies that can be employed [27]. All of them have their own implications in the final result, also considering the nature and the shapes of the samples in the surface. For example, Fig. 7 shows the performance of three different methodologies for removing the background of the preprocessed hyperspectral image (missing lines removed and SNV applied). As can be seen in the figure, the three methodologies provide three different answers. In this case, discerning between the edges of the samples and the background is a hard task, since the edges are pixels that contain strongly mixed information of the spectral signature of the plastic and the background. Therefore, special care must be taken to avoid the accidental removal of informative areas of the sample.

FIGURE 7 Depiction of three different methodologies for background removal. Left, false RGB image. Top, K-means analysis of the hyperspectral image in SNV and the selection of clusters 2 and 4 to create the mask. Middle, false color image obtained at 971 nm of the hyperspectral image in SNV and the result of applying a proper threshold to create the mask. Bottom, PCA scatter plot of the hyperspectral image in SNV with the selected pixels highlighted in red to create the mask. All the analyses have been made using HYPER-Tools [28], freely downloadable from Ref. [29]. SNV, standard normal variate; PCA, principal component analysis.
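A minimal sketch of K-means-based background removal, in the spirit of the masks shown in Fig. 7, follows; the random cube is a stand-in, and deciding which cluster labels correspond to the background (the background_clusters argument below) always requires visual inspection, as discussed above.

```python
import numpy as np
from sklearn.cluster import KMeans

def background_mask(cube, n_clusters=4, background_clusters=(0,)):
    """Cluster the unfolded spectra with K-means and build a boolean mask;
    which labels are background must be decided by inspecting the result."""
    rows, cols, bands = cube.shape
    labels = KMeans(n_clusters=n_clusters, n_init=4,
                    random_state=0).fit_predict(cube.reshape(-1, bands))
    mask = ~np.isin(labels, background_clusters)
    return mask.reshape(rows, cols)  # True where the sample is

cube = np.random.default_rng(4).random((30, 30, 50))
mask = background_mask(cube)
sample_spectra = cube[mask]  # only RoI pixels enter further modeling
print(mask.shape, sample_spectra.shape)
```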
6. Some advice on preprocessing

Preprocessing is probably the main engine for a successful interpretation of our hyperspectral and multispectral images. As a matter of example, Fig. 8 shows two PCA models performed on the hyperspectral image of the plastics. The first PCA model is made on the raw data (normalized prior to PCA). It can be seen that, even though PC1 explains almost 89% of the variance, this variance is wasted explaining the difference between the background and the plastics (Table 1). The second PCA is made on the preprocessed data (using the spectra in first derivative and removing the background). This PCA clearly shows the spectral differences of the four plastics by using only two principal components, making the model much more understandable, parsimonious, and probably more stable.

Most of the software packages for processing HSI and MSI already provide some preprocessing algorithms. In this chapter, we have shown the benefits and drawbacks of some of the methodologies that can be applied. Nevertheless, one of the major issues in applying preprocessing methods is that the effectiveness of the correction must be evaluated after the application of the processing algorithms. That is, as shown in Fig. 8, the efficiency of the preprocessing methodology must be evaluated after seeing the results of PCA. Therefore, applying preprocessing is, most of the time, a game of trial and error, although there are specific reports of genetic algorithms for preprocessing optimization [30,31].

Some main blocks have been presented here. Table 1 collects all the methods revisited here, giving an account of their major benefits and drawbacks. One question that might arise is the order and number of preprocessing steps that must be used. Unfortunately, there is no specific answer to that question. Or, better said, the answer can again be a game of trial and error. Sometimes the background is easily removed from the raw data and then the data included in the RoI are preprocessed; and sometimes a spectral preprocessing is needed for removing the background, while another spectral preprocessing is needed for the analysis of the sample. In any case, there are some major pieces of advice that can be given:

- Parsimony. The simpler, the better: Preprocessing normally changes the spatial and the spectral information in such a way that those changes can remove informative parts of our image. Moreover, it can also introduce artifacts or generate the loss of important information if the proper method is not selected or correctly applied. Therefore, the simpler a preprocessing methodology is, the better, as long as we achieve the desired results.

- Spatial and spectral corrections are connected: Smoothing the spectra of an HSI sample will not only remove the spectral noise but will also smooth the images arising from the data. The application of spectral corrections has an implication on the surface and vice versa.

- There is a price to pay: By applying preprocessing, there will always be lost information. It is our responsibility to lose only the information that we can consider noise and keep the analytically relevant information.

FIGURE 8 Comparison of two PCA models performed on the hyperspectral image of the plastics [3]. For each PCA model, the score surfaces of the first two PCs, the scatter plot of PC1 versus PC2, and the corresponding loadings are shown. All the analyses have been made using HYPER-Tools [28], freely downloadable from Ref. [29]. PCA, principal component analysis; PC, principal component.
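The comparison of Fig. 8 can be reproduced with a few lines of linear algebra: running a sketch like the following on the raw cube and on the preprocessed cube yields the score surfaces and explained variances discussed above. The helper below is an illustrative SVD-based PCA on the unfolded cube, not the HYPER-Tools implementation.

```python
import numpy as np

def pca_scores(cube, n_pc=2):
    """Unfold the cube, mean-center, compute PCA via SVD, and refold the
    first n_pc score vectors into score surfaces (images)."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_pc] * S[:n_pc]
    explained = (S**2 / (S**2).sum())[:n_pc]
    return scores.reshape(rows, cols, n_pc), Vt[:n_pc], explained

cube = np.random.default_rng(5).random((40, 40, 60))
score_img, loadings, expl = pca_scores(cube)
print(score_img.shape, loadings.shape, expl)  # score surfaces + variance
```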
TABLE 1 Summary of the main preprocessing steps, the techniques for their application, and their benefits/drawbacks.

Dead pixels: detection
- Median spectra/thresholding [10,32] (MSI and HSI). Benefits: easy to implement and calculate. Drawbacks: highly dependent on the signal-to-noise ratio; risk of false positives.
- Genetic/evolutionary algorithms [9,12,13] (MSI and HSI). Benefits: robust and reliable. Drawbacks: finding the best combination of parameters to optimize the models.
- Chosen criteria (MSI and HSI). Benefits: easy to implement and calculate. Drawbacks: difficulties in finding the proper threshold.

Dead pixels: suppression
- Neighboring interpolation (MSI and HSI). Benefits: easy to implement and calculate. Drawbacks: if the cluster of dead pixels is big, there is a risk of losing resolution in this area.

Spikes: detection
- Manual inspection (MSI and HSI). Benefits: robust and reliable. Drawbacks: time-consuming, especially in HSI images.
- Neighbor filters [15,18,19] (MSI and HSI). Benefits: robust and reliable. Drawbacks: finding the best combination of parameters to optimize the filters; when the background is an important part of the image, there may be problems to differentiate the spikes.
- Wavelets [17,20,21] (MSI and HSI). Benefits: robust and reliable. Drawbacks: the selection of the proper wavelet (in the spatial and the spectral channels) for each type of image.
- Chosen criteria (MSI and HSI). Benefits: robust and reliable. Drawbacks: difficulties in finding the proper threshold.

Spikes: suppression
- Neighbor interpolation [15,17-21] (MSI and HSI). Benefits: easy to implement and calculate.

Background/RoI
- PCA thresholding [10,33] (MSI and HSI). Benefits: robust selection of a specific area based on PC score images. Drawbacks: the selection of the proper threshold is tedious and not obvious in some situations.
- Manual (MSI and HSI). Benefits: selection of the desired area. Drawbacks: time-consuming, especially when working with time series images or large data sets.
- K-means [34] (MSI and HSI). Benefits: easy to implement and calculate.

Spectral preprocessing: denoising
- Savitzky-Golay smoothing [23] (HSI). Benefits: easy to implement. Drawbacks: finding the best combination of parameters to optimize the filter, especially the window size.

Spectral preprocessing: scatter correction
- MSC, SNV [23] (HSI). Benefits: SNV does not change the shape of the spectra. Drawbacks: sometimes the suppression of artifacts is not totally achieved; MSC and derived techniques need additional information and may change the shape of the spectra.
- Derivatives [23] (HSI). Benefits: removal of different baseline artifacts. Drawbacks: finding the best combination of parameters to optimize the filter, especially the window size and the derivative order.

Geometric corrections
- Nearest neighbor interpolation [35] (MSI and HSI). Benefits: easy to implement. Drawbacks: finding the proper set of reference points to make a proper interpolation.
- Bilinear interpolation [35] (MSI and HSI). Benefits: easy to implement; smooth edges are created. Drawbacks: finding the proper set of reference points to make a proper interpolation.
- Bicubic interpolation [35] (MSI and HSI). Benefits: easy to implement; smooth edges are created; sharper images. Drawbacks: finding the proper set of reference points to make a proper interpolation.

HSI, hyperspectral imaging; MSC, multiplicative scatter correction; MSI, multispectral imaging; PCA, principal component analysis; SNV, standard normal variate. Extracted and reproduced from M. Vidal, J.M. Amigo, Pre-processing of hyperspectral images. Essential steps before image analysis, Chemometrics and Intelligent Laboratory Systems 117 (2012) 138-148, https://doi.org/10.1016/j.chemolab.2012.05.009, and modified with permission of Elsevier.
References

[1] M. Vidal, J.M. Amigo, Pre-processing of hyperspectral images. Essential steps before image analysis, Chemometrics and Intelligent Laboratory Systems 117 (2012) 138-148, https://doi.org/10.1016/j.chemolab.2012.05.009.
[2] Unsplash, (n.d.). https://unsplash.com/.
[3] J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34-51, https://doi.org/10.1016/j.aca.2015.09.030.
[4] P.K. Varshney, M.K. Arora, Advanced Image Processing Techniques for Remotely Sensed Hyperspectral Data, Springer, 2004.
[5] T. Bocklitz, A. Walter, K. Hartmann, P. Rösch, J. Popp, How to pre-process Raman spectra for reliable and stable models? Analytica Chimica Acta 704 (2011) 47-56, https://doi.org/10.1016/j.aca.2011.06.043.
[6] T. Toutin, Review article: geometric processing of remote sensing images: models, algorithms and methods, International Journal of Remote Sensing 25 (2004) 1893-1924, https://doi.org/10.1080/0143116031000101611.
[7] N.G. Kardoulas, A.C. Bird, A.I. Lawan, Geometric correction of SPOT and Landsat imagery: a comparison of map and GPS-derived control points, Photogrammetric Engineering & Remote Sensing 62 (1996) 1173-1177.
[8] A.J. De Leeuw, L.M.M. Veugen, H.T.C. Van Stokkom, Geometric correction of remotely-sensed imagery using ground control points and orthogonal polynomials, International Journal of Remote Sensing 9 (2007) 1751-1759, https://doi.org/10.1080/01431168808954975.
[9] R. Leardi, Genetic algorithms in chemometrics and chemistry: a review, Journal of Chemometrics 15 (2001) 559-569, https://doi.org/10.1002/cem.651.
[10] J. Burger, P. Geladi, Hyperspectral NIR image regression part I: calibration and correction, Journal of Chemometrics 19 (2005) 355-363, https://doi.org/10.1002/cem.986.
[11] R. Leardi, Experimental design in chemistry: a tutorial, Analytica Chimica Acta 652 (2009) 161-172, https://doi.org/10.1016/j.aca.2009.06.015.
[12] B. Walczak, Outlier detection in multivariate calibration, Chemometrics and Intelligent Laboratory Systems 28 (1995) 259-272, https://doi.org/10.1016/0169-7439(95)80062-E.
[13] P. Vankeerberghen, J. Smeyers-Verbeke, R. Leardi, C.L. Karr, D.L. Massart, Robust regression and outlier detection for non-linear models using genetic algorithms, Chemometrics and Intelligent Laboratory Systems 28 (1995) 73-87, https://doi.org/10.1016/0169-7439(95)80041-7.
[14] C. Junghwan, P.J. Gemperline, Pattern recognition analysis of near-infrared spectra by robust distance method, Journal of Chemometrics 9 (1995) 169-178, https://doi.org/10.1002/cem.1180090304.
[15] L. Zhang, M.J. Henson, A practical algorithm to remove cosmic spikes in Raman imaging data for pharmaceutical applications, Applied Spectroscopy 61 (2007) 1015-1020, https://doi.org/10.1366/000370207781745847.
[16] Z. Nenadic, J.W. Burdick, Spike detection using the continuous wavelet transform, IEEE Transactions on Biomedical Engineering 52 (2005) 74-87, https://doi.org/10.1109/TBME.2004.839800.
[17] F. Ehrentreich, L. Summchen, Spike removal and denoising of Raman spectra by wavelet transform methods, Analytical Chemistry 73 (2001) 4364-4373, https://doi.org/10.1021/ac0013756.
[18] C.J. Behrend, C.P. Tarnowski, M.D. Morris, Identification of outliers in hyperspectral Raman image data by nearest neighbor comparison, Applied Spectroscopy 56 (2002) 1458-1461, https://doi.org/10.1366/00037020260377760.
[19] C.V. Cannistraci, F.M. Montevecchi, M. Alessio, Median-modified Wiener filter provides efficient denoising, preserving spot edge and morphology in 2-DE image processing, Proteomics 9 (2009) 4908-4919, https://doi.org/10.1002/pmic.200800538.
[20] K. Koshino, H. Zuo, N. Saito, S. Suzuki, Improved spike noise removal in the scanning laser microscopic image of diamond abrasive grain using wavelet transforms, Optics Communications 239 (2004) 67-78, https://doi.org/10.1016/j.optcom.2004.05.056.
[21] P. Du, W.A. Kibbe, S.M. Lin, Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics 22 (2006) 2059-2065, https://doi.org/10.1093/bioinformatics/btl355.
[22] J.M. Amigo, Practical issues of hyperspectral imaging analysis of solid dosage forms, Analytical and Bioanalytical Chemistry 398 (2010) 93-109, https://doi.org/10.1007/s00216-010-3828-z.
[23] A. Rinnan, F. van den Berg, S.B. Engelsen, Review of the most common pre-processing techniques for near-infrared spectra, TRAC Trends in Analytical Chemistry 28 (2009) 1201-1222.
[24] S. Munera, J.M. Amigo, N. Aleixos, P. Talens, S. Cubero, J. Blasco, Potential of VIS-NIR hyperspectral imaging and chemometric methods to identify similar cultivars of nectarine, Food Control 86 (2018), https://doi.org/10.1016/j.foodcont.2017.10.037.
[25] S. Munera, J.M. Amigo, J. Blasco, S. Cubero, P. Talens, N. Aleixos, Ripeness monitoring of two cultivars of nectarine using VIS-NIR hyperspectral reflectance imaging, Journal of Food Engineering 214 (2017) 29-39, https://doi.org/10.1016/j.jfoodeng.2017.06.031.
[26] P. Geladi, D. MacDougall, H. Martens, Linearization and scatter-correction for near-infrared reflectance spectra of meat, Applied Spectroscopy (1985), https://doi.org/10.1366/0003702854248656.
[27] N.R. Pal, S.K. Pal, A review on image segmentation techniques, Pattern Recognition 26 (1993) 1277-1294.
[28] N. Mobaraki, J.M. Amigo, HYPER-Tools. A graphical user-friendly interface for hyperspectral image analysis, Chemometrics and Intelligent Laboratory Systems 172 (2018),
https://doi.org/10.1016/j.chemolab.2017.11.003.
[29] J.M. Amigo, HYPER-tools Official Website, (n.d.). Hypertools.org (accessed March 10, 2019).
[30] O. Devos, L. Duponchel, Parallel genetic algorithm co-optimization of spectral pre-processing and wavelength selection for PLS regression, Chemometrics and Intelligent Laboratory Systems 107 (2011) 50-58, https://doi.org/10.1016/j.chemolab.2011.01.008.
[31] R.M. Jarvis, R. Goodacre, Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data, Bioinformatics 21 (2005) 860-868, https://doi.org/10.1093/bioinformatics/bti102.
[32] P. Geladi, J. Burger, T. Lestander, Hyperspectral imaging: calibration problems and solutions, Chemometrics and Intelligent Laboratory Systems 72 (2004) 209-217.
[33] J. Burger, P. Geladi, Hyperspectral NIR image regression part II: dataset preprocessing diagnostics, Journal of Chemometrics 20 (2006) 106-119, https://doi.org/10.1002/cem.986.
[34] N. Dhanachandra, K. Manglem, Y.J. Chanu, Image segmentation using K-means clustering algorithm and subtractive clustering algorithm, Procedia Computer Science 54 (2015) 764-771, https://doi.org/10.1016/j.procs.2015.06.090.
[35] D. Han, Comparison of commonly used image interpolation methods, in: Proc. 2nd Int. Conf. Comput. Sci. Electron. Eng. (ICCSEE 2013), 2013, pp. 1556-1559.

Chapter 2.2

Hyperspectral compression

Giorgio Antonino Licciardi
E. Amaldi Foundation, Rome, Italy

1. Introduction

The necessity to extract increasingly detailed information content has led to the evolution of hyperspectral sensors toward the acquisition of measurements with significantly greater spectral breadth and resolution. On the one hand, this permits hyperspectral sensors to acquire information in hundreds of contiguous bands, providing detailed spectral signatures; on the other hand, hyperspectral images can be extremely large, and their management, storage, and transmission can be extremely difficult. Thus, in several applications of hyperspectral image processing, data compression becomes mandatory. For instance, in Earth observation, hyperspectral images are acquired by sensors mounted on airborne or satellite-borne carriers. However, due to the size of a typical hyperspectral data set, not all the acquired data can be downlinked to a ground station. To give an example, a typical scene acquired by the EO-1 Hyperion instrument covers an area of 42 × 7 km, corresponding approximately to 3129 × 256 pixels, each having 242 bands at 16 bit. In this case, the dimensionality reduction of hyperspectral data becomes necessary in order to match the available transmission bandwidth.
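A back-of-the-envelope calculation with the figures quoted above makes the downlink problem concrete (a simple sketch, not from the chapter):

```python
# Raw data volume of the EO-1 Hyperion scene described above:
# 3129 x 256 pixels, 242 bands, 16 bits per sample.
rows, cols, bands, bits = 3129, 256, 242, 16
size_bytes = rows * cols * bands * bits // 8
print(f"Raw scene size: {size_bytes / 2**30:.2f} GiB")  # about 0.36 GiB
```

A single scene therefore approaches 400 MB before any compression, which must then be multiplied by the number of scenes acquired per orbit.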
In general, image compression approaches can be divided according to the preservation of information. Lossless compression techniques preserve the total amount of information, and the reconstructed image is identical to the original. In near-lossless compression, the maximum absolute difference between the reconstructed and original image does not exceed a user-defined value. On the other hand, lossy compression approaches are oriented toward obtaining a given target bit rate; thus, the reconstructed image should be as similar as possible to the original one. In general, lossless compression is highly desired to preserve all the information contained in the image. However, lossless algorithms are not able to provide high compression ratios.

Aside from this general subdivision, compression algorithms can be grouped according to the type of redundancy or correlation exploited. Basically, the information compression process depends mainly on the four types of redundancy present in hyperspectral images: statistical redundancy, spatial redundancy, spectral redundancy, and visual redundancy.

Methods based on statistical redundancy analyze the probability of symbols. Popular techniques are designed to assign short code words to high-probability symbols and long code words to low-probability symbols. These methods are usually called entropy coding.

Spatial redundancy, also called intraband correlation, is based on the assumption that the pixel information can be partially obtained from neighboring pixels. Spatial redundancy can be removed through the use of spatial decorrelation techniques, such as transformation, which transforms the image from the spatial domain into another domain, or prediction, which is used to predict the pixel values from the neighboring pixels.

Spectral redundancy, or interband correlation, is based on the high correlation existing between neighboring bands in hyperspectral images. Thus, spectral decorrelation is used to project the original spectra of the image into a low-dimensional feature space.

Finally, visual redundancy is based on the fact that human eyes tend not to be very sensitive to high frequencies; thus, compression based on visual redundancy is obtained by using data quantization.

In general, the use of interband or intraband correlations makes it possible to divide compression algorithms into 2D and 3D approaches. 2D image compression algorithms usually exploit the intraband and interband correlations separately, while 3D approaches refer to the simultaneous use of both inter- and intraband correlations.

Aside from the different techniques presented in the literature, hyperspectral compression usually makes use of quantization followed by entropy encoding in order to further compress the file size. The term quantization usually indicates the process of reducing a large set of values to a discrete set. In hyperspectral image compression, quantization is in general applied in the frequency domain. Quantization is in general followed by entropy encoders that compress data by representing each input symbol with a variable-length code word. The length of the code words depends on the frequency of the associated symbols. This means that the most frequent symbols are assigned the shortest codes. The most common entropy encoders used for image compression are Huffman and arithmetic coding [1,2].
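A minimal sketch of the entropy-coding idea follows: building a Huffman code from symbol frequencies and reporting the resulting code lengths. The helper name is illustrative, and real coders (including the arithmetic coders of [2]) are considerably more refined.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Code length per symbol for a Huffman code built from frequencies."""
    freq = Counter(symbols)
    # Each heap entry: (weight, tie_breaker, {symbol: current code length}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # merge the two lightest subtrees
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Frequent symbols get short codes, rare symbols long ones:
data = [0] * 90 + [1] * 6 + [2] * 3 + [3] * 1
print(huffman_code_lengths(data))  # e.g., {0: 1, 1: 2, 2: 3, 3: 3}
```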
2. Lossless approaches

Most lossless hyperspectral image compression techniques are based on one of the following techniques: vector quantization (VQ), predictive coding, or transform coding. However, transform-based schemes can yield excellent coding gain for lossy compression at low bit rates, while their lossless coding performance is inferior to specialized lossless compression schemes.

2.1 Vector quantization

VQ is a technique extensively analyzed for the compression of speech and images and can be summarized as a four-step process (Fig. 1). In the first step, the original data are decomposed into a set of vectors. Then, a subset of those vectors is selected to form a training set in a second step. In the third step, an iterative clustering algorithm is used to generate a codebook from the training set. In the final step, defined as quantization, for each vector, the code word for the closest code vector in the codebook is found and transmitted [3].

FIGURE 1 Vector quantization compression block diagram.

2.1.1 Vector formation

The formation of vectors applied to hyperspectral images depends on the type of vector to be formed, which can be spatial or spectral, and on the size of the vectors formed. Then, a codebook is generated for each block. For instance, a vector formed in the spectral domain of a hyperspectral image comprising N bands has a length L, resulting from subdividing the N bands into B blocks such that N = B × L. The compression in VQ is then obtained by replacing each vector with a label. Intuitively, code words associated with larger vectors can lead to higher compression rates. However, in lossless compression, there is a balance between the increased matching error and the reduction in address entropy that results in an optimum length.

2.1.2 Training set selection

Once the vectors for the original image are formed, the following step is to select the training sets and generate the relative codebooks. The quality of the training set depends mainly on the vector selection process and on the number of vectors selected. Since the statistical information of the training set should be representative of the original image, the training vectors are usually selected so as to be evenly distributed across the image [4]. The number of training vectors, on the other hand, depends mainly on the desired generalization properties of the compression algorithm. Using a number of training vectors limited to a small number of images will produce a codebook able to obtain a high compression rate but probably not applicable to other images. In this case, it is possible to transmit and store the codebook along with the compressed image. However, this may not be acceptable for practical transmission and storage uses, since the development of the codebook for each image would require considerable time. Conversely, the use of universal codebooks generated from a large pool of images will result in a marginal degradation of the compression rate but will provide good generalization and faster results.
2.1.3 Codebook generation

The algorithm used to generate the codebook influences the quality of the selected codebook. For instance, the K-means clustering method can be used to produce codebooks of a fixed size, giving better control over address entropy [5]. Another important issue influencing the quality of the codebook is related to the size of the codebook itself. As the size of the codebook is increased, each input vector will be able to find a closer reference vector, and therefore the difference image entropy will decrease. At the same time, however, the address entropy will increase.

2.1.4 Quantization

This last step deals with the method used to search the codebook. There exist several approaches in the literature, spanning from full search to tree search [5] to lattice search [6]. However, since in general the codebook is unstructured, a full search is required despite its high time consumption.

In the literature, several other approaches have been derived from the VQ technique. For instance, the mean-normalized vector quantization (M-NVQ) has been proposed in Ref. [1], while in Ref. [7] a discrete cosine transform has been used in both the spatial and spectral domains to exploit the redundancy in the M-NVQ output. In Ref. [8], the input vectors have been partitioned into a number of consecutive subsegments, and a variation of the generalized Lloyd algorithm has been used to train vector quantizers for these subsegments. The subsegments are then quantized independently, and the quantization residual is entropy coded to achieve lossless compression. An optimization of this method, named locally optimal partitioned vector quantization (LPVQ), has been presented in Ref. [9], where the distortion caused by local partition boundaries has been minimized. In Ref. [10], a technique called mean/shape vector quantization (M/SVQ) subtracts the sample mean from the input vector, and the mean is scalar quantized for transmission; the resulting shape is vector quantized. In Ref. [11], each block of the image is converted into a vector with zero mean and unit standard deviation. Each input vector is then vector quantized, and the mean and variance are scalar quantized for transmission.

Usually, VQ-based techniques require offline codebook training and online quantization-index searching. For these reasons, these methods are extremely expensive from a computational point of view and not always well suited for real-time applications.
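The four VQ steps can be condensed into a short sketch using K-means for the codebook generation, in the spirit of [5]; the random cube, the codebook size, and the use of scikit-learn are illustrative choices, not the setup of any of the referenced methods.

```python
import numpy as np
from sklearn.cluster import KMeans

# cube: hyperspectral image (rows, cols, bands); vectors are full spectra.
rng = np.random.default_rng(0)
cube = rng.random((50, 40, 100)).astype(np.float32)
vectors = cube.reshape(-1, cube.shape[2])

# Codebook generation with K-means clustering (here 64 code vectors);
# a real codec would train on a representative subset of vectors.
km = KMeans(n_clusters=64, n_init=4, random_state=0).fit(vectors)
codebook = km.cluster_centers_

# Quantization: each spectrum is replaced by the index ("label") of its
# closest code vector; indices plus codebook form the compressed stream.
labels = km.predict(vectors)

# For lossless VQ, the residual (matching error) is entropy coded as well.
residual = vectors - codebook[labels]
print(labels.shape, codebook.shape, float(np.abs(residual).max()))
```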
2.2 Predictive coding

In predictive approaches for image compression, the main idea is to predict the value of a pixel using previously visited neighboring pixels. This process often consists of two distinct and independent components: modeling and coding. In the modeling part, once a prediction for a pixel is made, the difference between the pixel and its prediction (defined as the prediction error) is stored. Then an encoding method is used to compress the prediction errors. In more detail, approaches using predictive coding are based on spatial, spectral, or hybrid predictors to decorrelate the original image, and the prediction error samples are then fed to an entropy coder. However, since prediction error samples usually present some residual correlation, in several cases context-based entropy coding is carried out. In this framework, the samples are classified into a predefined number of homogeneous classes based on their spatial or spectral context. This means that the entropy of the context-conditioned model will be lower than that of the stationary one, resulting in the entropy coder providing improved compression.

In general, the modeling part can be formulated as an inductive inference problem, in which the image is observed sample by sample (e.g., in raster scan) according to a predefined order. At each interval t, after having scanned the past data x^t = x_1 x_2 … x_t, it is possible to make an inference on the next sample value x_{t+1} by assigning to it a conditional probability distribution P(·|x^t). In a sequential formulation, the distribution P(·|x^t) can be derived from the past and is available to the decoder as it decodes the past string sequentially (Fig. 2). An alternative approach is based on a two-step scheme where the conditional distribution is learned in a first step from the whole image instead of a single sequence; the obtained conditional distribution is then provided to the decoder as header information.

FIGURE 2 Predictive coding compression block diagram.

Today, the most widely used compression approach is JPEG-LS (name derived from the Joint Photographic Experts Group), thanks to its good speed and compression ratio [12]. This technique implements the LOCO-I algorithm (low-complexity lossless compression for images), where the prediction is based on three neighboring points. Moreover, in JPEG-LS, context modeling is used to detect local features such as smoothness and texture patterns. Other techniques have been developed to improve the compression rate but with significantly more complex algorithms, such as the context-based, adaptive, lossless image codec (CALIC) [13], TMW [14], and EDP [15] (the latter acronyms are derived from the author names). CALIC uses a more complex context-based method, which uses a large number of modeling states to condition a nonlinear predictor and adapt the predictors to the varying statistics of the source. The approach proposed in CALIC is extended in TMW by extracting global image information used to improve the overall quality of the predictions. EDP, on the other hand, makes use of an edge-directed prediction approach using a large number of neighboring points. An extension of the CALIC algorithm from 2D to 3D has been proposed in Ref. [16], where, depending on the correlation coefficient, the algorithm switches between intraband and interband predictions. The resulting residual is then coded using context-based arithmetic codes. A further modification of the 3D-CALIC algorithm has been proposed in Ref. [17], where, rather than switching between interband and intraband modes, the M-CALIC algorithm uses the full interband mode and a universal spectral predictor. In Ref. [18], an optimum linear predictor in terms of minimum mean square error and entropy coding are used, while in Ref. [19], the number of predictors is selected in both the spatial and spectral domains through the use of fuzzy clustering and fuzzy prediction.

A method based on Spectral-oriented Least SQuares (SLSQ) has been proposed in Ref. [20], where linear prediction is used to exploit spectral correlation while the prediction error is entropy coded. The same authors proposed a low-complexity method using an interband linear predictor and an interband least squares predictor [21]. In Ref. [22], the spectral bands of the image are clustered and filtered by means of an optimum linear filter for each cluster. The output of the filters is then encoded using an adaptive entropy coder. This concept has been further extended in Ref. [23], where lookup tables have been introduced. In this approach, the pixel that is nearest and equal to the pixel colocated with the one to be predicted in the previous band is taken as the prediction. The sign of the residual is coded first, followed by adaptive range coding of its absolute value.
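A minimal sketch of spectral (interband) prediction follows: each band is predicted from the previous one with a least-squares gain and offset, and only the residuals (plus the per-band coefficients) would be entropy coded. The helper name and the synthetic cube are illustrative, and the cited predictors are far more sophisticated.

```python
import numpy as np

def interband_residuals(cube):
    """Predict each band from the previous one with a least-squares gain and
    offset (a crude interband linear predictor); return the residuals."""
    rows, cols, bands = cube.shape
    residuals = np.empty_like(cube)
    residuals[:, :, 0] = cube[:, :, 0]           # first band sent as-is
    for k in range(1, bands):
        ref = cube[:, :, k - 1].ravel()
        cur = cube[:, :, k].ravel()
        gain, offset = np.polyfit(ref, cur, 1)   # fit cur ≈ gain*ref + offset
        pred = gain * cube[:, :, k - 1] + offset
        residuals[:, :, k] = cube[:, :, k] - pred
    return residuals  # gain/offset per band must also reach the decoder

rng = np.random.default_rng(1)
base = rng.random((32, 32))
cube = np.stack([base * (1 + 0.01 * k) for k in range(20)], axis=2)
res = interband_residuals(cube)
print(float(np.abs(res[:, :, 1:]).mean()))  # near zero: bands decorrelated
```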
3. Lossy approaches

Recently, there has been an increasing interest in lossy compression because it makes it possible to obtain higher compression rates than lossless approaches with a negligible loss of information. Many lossy approaches are based on transform coding in order to perform spatial or spectral decorrelation, followed by a quantization stage and an entropy coder (Fig. 3). More specifically, transform coding works in two steps: the first step is to transform the data into a domain where the representation of the data is more compact and less correlated; the second step is to encode this information as efficiently as possible.

FIGURE 3 Transform-based compression block diagram.

One of the most widely used approaches is JPEG2000 (name derived from the Joint Photographic Experts Group), which uses the wavelet transform to perform decorrelation in the two spatial dimensions. The typical wavelet filter used in JPEG2000 is a reversible 5/3 filter. This approach has the ability to perform both lossless and lossy compression with the same algorithm. Indeed, lossy compression can be obtained from a lossless encoded file by truncating the bitstream at the appropriate point [24]. Since hyperspectral images exhibit both spatial and spectral redundancy, several methods in the literature follow a high-performance scheme combining a spectral decorrelator, such as the Karhunen-Loève transform (KLT) or principal component analysis (PCA), the discrete wavelet transform (DWT), or the discrete cosine transform (DCT), followed by the JPEG2000 algorithm performing the spatial decorrelation, rate allocation, and entropy coding.

3.1 Karhunen-Loève transform

The most efficient transform is the KLT, which is strongly related to PCA. The KLT procedure uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated features. The transformation is designed so that the first feature has the highest possible variance, and each successive feature has the highest possible variance subject to being orthogonal to the preceding features. In general, features showing high variance are associated with information, while low variance is associated with noise. This makes it possible to approximate the data in the feature space by discarding the features having the less relevant variances.

Although the KLT is provably an optimal transformation, it has a few drawbacks. In particular, the KLT algorithm consists of sequential processes that are computationally intensive, such as the covariance matrix computation, eigenvector evaluation, and matrix multiplications. Moreover, since the KLT transformation matrix is data dependent, it is necessary to transmit it to the decoder in any KLT-based compression system. In order to deal with the high computational complexity of the KLT, in Ref. [25], precomputed transform coefficients, obtained using a set of typical images, are applied to any image. However, this technique fits well with multispectral images but becomes problematic with hyperspectral images, because the variations in the spectra between pixels become too important to be efficiently decorrelated by a KLT. Other approaches have proposed a simplified version of the KLT to be implemented onboard satellites [26,27]. In Ref. [28], the KLT has been deployed in JPEG2000 to provide spectral decorrelation as well as spectral dimensionality reduction.
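The KLT pipeline described above (covariance, eigenvectors, projection) fits in a short sketch; the helper names are illustrative. Note that the transform matrix and the band means are data dependent and must be transmitted to the decoder, as remarked above.

```python
import numpy as np

def klt_spectral(cube, n_keep):
    """Karhunen-Loève (PCA-like) spectral decorrelation: project each spectrum
    onto the eigenvectors of the band covariance matrix, keeping n_keep."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort by variance, descending
    basis = eigvecs[:, order[:n_keep]]           # transform matrix (to transmit)
    features = (X - mean) @ basis                # decorrelated features
    return features.reshape(rows, cols, n_keep), basis, mean

def klt_inverse(features, basis, mean):
    rows, cols, k = features.shape
    X = features.reshape(-1, k) @ basis.T + mean
    return X.reshape(rows, cols, basis.shape[0])

cube = np.random.default_rng(6).random((20, 20, 50))
feat, basis, mean = klt_spectral(cube, n_keep=5)
print(float(np.abs(cube - klt_inverse(feat, basis, mean)).mean()))
```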
3.2 Discrete wavelet transform

The other most popular transform family is the wavelets. The DWT [29] is a widely used technique for the efficient decorrelation of data, obtained by splitting the data into two half-rate subsequences carrying information, respectively, on the approximation and the detail of the original signal, or equivalently on the low- and high-frequency half-bands of its spectrum. Since most of the signal energy of real-world signals is typically concentrated in low frequencies, this process splits the signal into a very significant and a little significant part, leading to good energy compaction.

Wavelet-based lossy compression techniques are of particular interest due to their long history of providing excellent rate-distortion performance for traditional 2D imagery. Consequently, a number of prominent 2D compression algorithms have been extended to 3D. These include the 3D extensions of JPEG2000, SPIHT (set partitioning in hierarchical trees), and SPECK (set partitioned embedded block coder) [30-34]. In particular, these approaches employ a 1D discrete wavelet transform for the spectral decorrelation, while a 2D DWT works spatially. In the literature, several papers have proposed hyperspectral lossy compression based on the DWT, which is becoming the standard [26,29,35,36]. In general, a multiresolution wavelet transform is first applied to each spectrum, and then the dyadic 2D wavelet decomposition is applied to each resulting plane. Wavelet-based compression techniques typically implement progressive transmission through the use of embedded coding. Progressive image transmission allows an approximate image to be built up quickly and the details to be transmitted progressively through several passes over the image.

3.3 Discrete cosine transform

The DCT is a technique allowing the conversion of a signal into elementary frequency components. More specifically, in the DCT the input signal is represented as a linear combination of weighted basis functions that are related to its frequency components. In general, the DCT does not directly reduce the number of bits required to represent the block. For instance, for an 8 × 8 block of 8-bit pixels, the DCT produces an 8 × 8 block of 11-bit coefficients due to the range of coefficient values. However, considering that the DCT concentrates the signal energy in the low-frequency coefficients, while the remaining coefficients are mainly near zero, compression can be achieved by discarding or coarsely coding the near-zero coefficients and by quantizing and coding the remaining coefficients. The DCT, even if it can be considered an approximation of the full optimality of the KLT, offers a favorable trade-off between computational cost and performance and has been adopted for international standards, such as JPEG [37].
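A small sketch of the energy-compaction argument: the DCT of a smooth 8 × 8 block concentrates most of the energy in a few low-frequency coefficients, so keeping only a handful of them still reconstructs the block accurately. The synthetic block and the choice of keeping 8 of 64 coefficients are illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(2)
# A smooth 8x8 block (real imagery is locally smooth, hence compressible).
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = 100 + 10 * np.sin(x / 4) + 5 * np.cos(y / 4) + rng.normal(0, 0.5, (8, 8))

coeffs = dctn(block, norm="ortho")
# Keep only the 8 largest-magnitude coefficients (energy compaction).
thr = np.sort(np.abs(coeffs).ravel())[-8]
compressed = np.where(np.abs(coeffs) >= thr, coeffs, 0.0)
recon = idctn(compressed, norm="ortho")
print(float(np.abs(block - recon).max()))  # small error from 8 of 64 coeffs
```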
3.4 Quality evaluation

The introduction of lossy and near-lossless compression leads to the need to evaluate the quality of the reconstructed images. From a theoretical point of view, the best practice would be to use ground truth. However, in many cases the ground truth is not available or not completely accurate. Besides the availability of ground truth, there are several ways to measure the quality of the reconstructed images, for instance, through the use of statistical distortion measures, such as the signal-to-noise ratio (SNR), defined in Ref. [29] as:

SNR = 10 log10(σ² / MSE)    (4.1)

where σ² is the variance of the original image and MSE is the mean square error between the original and the reconstructed images. In the case of real images, noise-free references may not be available. Thus, the SNR can be derived as the ratio between the mean value μ_signal of the pixels in the image and the standard deviation σ_noise of the pixels of a uniform area in the image:

SNR = 10 log10(μ_signal / σ_noise)    (4.2)

In general, a lossy compression will remove noise, redundancies, and irrelevant information from the image; thus, an improvement of the SNR of the reconstructed image is expected.

Another evaluation measure can be obtained through the use of the spectral angle mapper (SAM) algorithm, which measures the spectral distance between the reconstructed image and the original one:

SAM = arccos(⟨X, X′⟩ / (‖X‖₂ ‖X′‖₂))    (4.3)

SAM produces positive values, with an ideal value of 0. However, due to noise suppression, values lower than three are usually considered indicative of a good reconstructed image.
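Eqs. (4.1) and (4.3) translate directly into code; the following is a minimal sketch with illustrative helper names, where the SAM map is returned per pixel in degrees.

```python
import numpy as np

def snr_db(original, reconstructed):
    """SNR (Eq. 4.1): ratio of original-image variance to the MSE, in dB."""
    mse = np.mean((original - reconstructed) ** 2)
    return 10 * np.log10(original.var() / mse)

def sam_map(original, reconstructed, eps=1e-12):
    """Per-pixel spectral angle (Eq. 4.3) between two (rows, cols, bands) cubes."""
    dot = np.sum(original * reconstructed, axis=2)
    norms = np.linalg.norm(original, axis=2) * np.linalg.norm(reconstructed, axis=2)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

cube = np.random.default_rng(7).random((16, 16, 30))
recon = cube + 0.01 * np.random.default_rng(8).standard_normal(cube.shape)
print(float(snr_db(cube, recon)), float(sam_map(cube, recon).mean()))
```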
4. Conclusions

Even if several approaches have been presented in the literature and implemented or used in real situations, there is still a high number of open issues in the compression of hyperspectral images. First of all, the choice between lossless, lossy, or near-lossless approaches mainly depends on the final use of the data. In the case of laboratory images, where archiving and distribution are more important than data transmission and error recovery, the issues related to compression are focused more on the processing and visualization sides. This means that a high compression rate is preferable over information preservation. For instance, progressive encoding permits a quick look at the compressed image with limited computational resources. On the other hand, in the case of satellite-based sensors, data transmission becomes relevant. Indeed, the trend for hyperspectral sensors is toward a continuous increase in the spatial, spectral, and radiometric resolutions of the images. This means that the increase in the amount of data produced by the sensors is in contrast with the limited transmission capability; thus, there is the need to perform the compression onboard the satellites. However, onboard compression presents several challenges. The first point is that when lossless compression is used, it is not possible to obtain high compression rates, and consequently the acquisition capability is strongly reduced. On the other hand, lossy compression techniques allow high compression rates, but part of the information is lost. Near-lossless compression offers the best trade-off between compression rates and quality preservation, but the community still refuses to accept these kinds of approaches, even if the impact is proven to be negligible. Another important point is related to the processing capability. While the compression of laboratory images does not present any particular problems in terms of processing approach and computational time, the compression of images acquired onboard satellites presents several criticalities. In particular, the electronic instrumentation onboard satellites has to follow several design constraints in terms of power consumption, heating, radiation protection, and data storage. This means that the processing power is reduced compared to consumer electronics; thus, not all the existing approaches can be used for the compression of hyperspectral images acquired from satellites. For these reasons, a direct comparison of the different compression methods is not possible.

References

[1] D. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the IRE 40 (9) (1952) 1098-1101.
[2] I.H. Witten, R.M. Neal, J.G. Cleary, Arithmetic coding for data compression, Communications of the ACM 30 (6) (1987) 520-540.
[3] M.J. Ryan, J.F. Arnold, The lossless compression of AVIRIS images by vector quantization, IEEE Transactions on Geoscience and Remote Sensing 35 (3) (1997) 546-550.
[4] Y. Linde, et al., An algorithm for vector quantizer design, IEEE Transactions on Communications 28 (1980) 84-95.
[5] B. Ramamurthi, A. Gersho, Image vector quantization with a perceptually based classifier, in: Proc. IEEE Int. Conf. Acoust., Speech, Sign. Process., San Diego, CA, vol. 2, 1984, pp. 32.10.1-32.10.4.
[6] J. Conway, N. Sloane, Fast quantizing and decoding algorithms for lattice quantizers and codes, IEEE Transactions on Information Theory 28 (1982) 227-232.
[7] M.R. Pickering, M.J. Ryan, Efficient spatial-spectral compression of hyperspectral data, IEEE Transactions on Geoscience and Remote Sensing 39 (7) (2001) 1536-1539.
[8] G. Motta, F. Rizzo, J. Storer, Partitioned vector quantization: application to lossless compression of hyperspectral images, in: Proc. ICASSP, Apr. 6-10, 2003, vol. 3, pp. III-241-III-244.
[9] G. Motta, F. Rizzo, J.A. Storer, Compression of hyperspectral imagery, in: Proc. DCC, Mar. 2003, pp. 333-342.
[10] R. Baker, R. Gray, Image compression using nonadaptive spatial vector quantization, in: Conf. Rec. 16th Asilomar Conf. Circuits, Syst., Comput., 1982, pp. 55-61.
[11] T. Murakami, K. Asai, E. Yamazaki, Vector quantizer of video signals, Electronics Letters 7 (1982) 1005-1006.
[12] M. Weinberger, G. Seroussi, G. Sapiro, The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS, IEEE Transactions on Image Processing 9 (2000) 1309-1324.
[13] X. Wu, N. Memon, Context-based, adaptive, lossless image coding, IEEE Transactions on Communications 45 (1997) 437-444.
[14] B. Meyer, P. Tischer, TMW - a new method for lossless image compression, in: Proc. 1997 Int. Picture Coding Symp., 1997.
[15] X. Li, M.T. Orchard, Edge directed prediction for lossless compression of natural images, in: Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348), Kobe, vol. 4, 1999, pp. 58-62.
[16] X. Wu, N. Memon, Context-based lossless interband compression - extending CALIC, IEEE Transactions on Image Processing 9 (2000) 994-1001.
[17] E. Magli, G. Olmo, E. Quacchio, Optimized onboard lossless and near-lossless compression of hyperspectral data using CALIC, IEEE Geoscience and Remote Sensing Letters 1 (1) (2004) 21-25.
[18] R.E. Roger, M.C. Cavenor, Lossless compression of AVIRIS images, IEEE Transactions on Image Processing 5 (5) (1996) 713-719.
[19] B. Aiazzi, P. Alba, L. Alparone, S. Baronti, Lossless compression of multi/hyper-spectral imagery based on a 3-D fuzzy prediction, IEEE Transactions on Geoscience and Remote Sensing 37 (5) (1999) 2287-2294.
[20] F. Rizzo, B. Carpentieri, G. Motta, J.A. Storer, High performance compression of hyperspectral imagery with reduced search complexity in the compressed domain, in: Proc. DCC, 2004,
pp. 479-488.
[21] F. Rizzo, B. Carpentieri, G. Motta, J.A. Storer, Low-complexity lossless compression of hyperspectral imagery via linear prediction, IEEE Signal Processing Letters 12 (2) (2005) 138-141.
[22] J. Mielikainen, P. Toivanen, Clustered DPCM for the lossless compression of hyperspectral images, IEEE Transactions on Geoscience and Remote Sensing 41 (12) (2003) 2943-2946.
[23] J. Mielikainen, Lossless compression of hyperspectral images using lookup tables, IEEE Signal Processing Letters 13 (3) (2006) 157-160.
[24] JPEG2000 Part 2 - Extensions, Document ISO/IEC 15444-2.
[25] C. Thiebaut, E. Christophe, D. Lebedeff, C. Latry, CNES studies of on-board compression for multispectral and hyperspectral images, in: SPIE, Satellite Data Compression, Communications, and Archiving III, vol. 6683, SPIE, 2007.
[26] B. Penna, T. Tillo, E. Magli, G. Olmo, Transform coding techniques for lossy hyperspectral data compression, IEEE Transactions on Geoscience and Remote Sensing 45 (5) (2007) 1408-1421.
[27] B. Penna, T. Tillo, E. Magli, G. Olmo, A new low complexity KLT for lossy hyperspectral data compression, in: IEEE International Geoscience and Remote Sensing Symposium, IGARSS'06, 2006, pp. 3525-3528.
[28] Q. Du, J.E. Fowler, Hyperspectral image compression using JPEG2000 and principal component analysis, IEEE Geoscience and Remote Sensing Letters 4 (2) (2007) 201-205.
[29] J.E. Fowler, J.T. Rucker, 3D wavelet-based compression of hyperspectral imagery, in: C.-I. Chang (Ed.), Hyperspectral Data Exploitation: Theory and Applications, Wiley, Hoboken, 2007, pp. 379-407 (Chapter 14).
[30] B. Penna, T. Tillo, E. Magli, G. Olmo, Progressive 3-D coding of hyperspectral images based on JPEG 2000, IEEE Geoscience and Remote Sensing Letters 3 (1) (2006) 125-129.
[31] B.-J. Kim, Z. Xiong, W.A. Pearlman, Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT), IEEE Transactions on Circuits and Systems for Video Technology 10 (8) (2000) 1374-1387.
[32] X. Tang, W.A. Pearlman, Scalable hyperspectral image coding, in: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, vol. 2, 2005, pp. 401-404.
[33] X. Tang, C. Sungdae, W.A. Pearlman, 3D set partitioning coding methods in hyperspectral image compression, in: Proc. of ICIP - IEEE International Conference on Image Processing, Barcelona, Spain, 2003.
[34] X. Tang, W.A. Pearlman, Three-dimensional wavelet-based compression of hyperspectral images, in: Hyperspectral Data Compression, Kluwer Academic Publishers, 2005.
[35] E. Christophe, C. Mailhes, P. Duhamel, Hyperspectral image compression: adapting SPIHT and EZW to anisotropic 3D wavelet coding, IEEE Transactions on Image Processing 17 (12) (2008) 2334-2346.
[36] G. Liu, F. Zhao, Efficient compression algorithm for hyperspectral images based on correlation coefficients adaptive 3D zerotree coding, IET Image Processing 2 (2) (2008) 72-82.
[37] G. Wallace, The JPEG still picture compression standard, IEEE Transactions on Consumer Electronics 38 (1) (1992).

Chapter 2.3

Pansharpening

Gemine Vivone
Department of Information Engineering, Electrical Engineering and Applied Mathematics, University of Salerno, Salerno, Italy
e-mail: gvivone@unisa.it

1. Introduction

Pansharpening, which stands for panchromatic (PAN) sharpening, refers to the fusion of a PAN image and a multispectral (MS) image.
These images are usually simultaneously acquired over the same area. Pansharpening can be included in the data fusion framework because its goal is to combine, in a unique synthetic image, the spatial information provided by the PAN image (but not present in the MS) with the spectral information of the MS image (against the single PAN channel).

Nowadays, PAN and MS images can be simultaneously acquired by several commercial satellites, see, e.g., the four-band MS cases, such as IKONOS and GeoEye, and the eight-band MS cases, such as the WorldView satellites (capturing bands from the visible near-infrared (VNIR) spectrum to the shortwave infrared (SWIR) spectrum). Thus, the possibility of reaching very high-resolution images in both the spatial and spectral domains is really appealing. Unfortunately, physical constraints preclude this goal from being achieved by using a unique sensor, and data fusion approaches are the sole viable solution to reach this ambitious goal. Hence, the demand for pansharpened products is continuously growing, and commercial products, such as Google Earth and Bing Maps, make massive use of them. Furthermore, pansharpening is a crucial preliminary step for many remote sensing tasks, such as change detection [1], object recognition [2], visual image analysis, and scene interpretation [3]. This interest can even be noticed in the scientific community, and it is justified by (1) the contest launched by the data fusion committee of the Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society in 2006 [4], (2) the detailed surveys that can easily be found in the literature, see, e.g., Refs. [5-7], and (3) the comprehensive book dedicated to this problem recently published in Ref. [8].

This chapter proposes an overview of this issue. Section 2 is dedicated to the classification of pansharpening methods. Pansharpening techniques are divided into three main classes. The first two classes historically split methods into spectral and spatial techniques. The first presented category, the so-called component substitution (CS), is based on a spectral transformation in order to separate the spatial and the spectral information of the MS image. The second class, usually named multiresolution analysis (MRA), relies upon the decomposition of the PAN image in order to extract its spatial details to be injected into the MS image. The last (third) class consists of new-generation approaches for pansharpening. These are mainly based on the application of constrained optimization algorithms to solve the ill-posed problem of pansharpening.

Afterward, in Section 3, we focus attention on the tricky problem of the assessment of pansharpening approaches. Unfortunately, as in many data fusion problems, a reference image is missing and, thus, universal measures for evaluating the quality of the enrichment introduced by pansharpening cannot be explicitly formulated. A first solution to this issue dates back to Wald et al. [9]. They define a protocol based on two properties: consistency and synthesis. Whereas the former is more easily achievable, the latter requires the knowledge of the original MS image at a higher resolution (i.e., the PAN one).
This leads to some critical issues about the implementation of this protocol, such as the unavailability of a (reference) high-resolution MS image, thus precluding the evaluation of the synthesis property. In order to face these problems, two main solutions are presented in this chapter. The first one relies on the reduction of the spatial resolution of both the original MS and PAN images in order to exploit the original MS image as reference [4]. It implicitly leads to a hypothesis of invariance among scales of the fusion procedures. Unfortunately, this hypothesis may not always be valid in practice [9,10]. Hence, the second solution employs indexes that do not require the availability of a reference image, see, e.g., Refs. [11,12].

Pansharpening has also been proposed for the fusion of PAN and hyperspectral (HS) data [13]. In this chapter, and, in particular, in Section 4, the HS pansharpening problem is discussed. The new challenges in fusing HS data instead of classical MS images are outlined first. Afterward, a real example of fusing HS and PAN data simultaneously acquired by the HS imager (Hyperion) and the advanced land imager (ALI) sensors over the center of Paris (France) is presented. Finally, this chapter ends with concluding remarks in Section 5.

2. Classification of pansharpening methods

Pansharpening approaches are historically classified into two main classes: CS and MRA [5,8]. The main difference between them is how PAN details are extracted. MRA methods extract PAN details using spatial filters applied to the PAN image. Instead, CS exploits both the high spectral resolution image and the PAN image to extract PAN details. This difference in detail extraction influences the main features of the final outcomes [5,14]. Recently, other fusion approaches, the so-called new generation of pansharpening [8], have been proposed in the literature. These are mainly related to Bayesian approaches [15], compressed sensing [16–18], and total variation techniques [19,20]. These methods cannot be recast into one of the two main classes, but they have shown, in some cases, appreciable results, usually paid for by an increase of the computational burden. In the next sections, we will go into the details of all these three classes. Some powerful examples of pansharpening techniques, even tested for HS pansharpening, will be detailed. All the approaches that will be presented in this chapter are global, i.e., the fusion is applied in the same way for the whole image. Some generalizations to context-based (or local) approaches can be found in the literature, see, e.g., Refs. [21–24], where the injection model is context-dependent and thus varies inside the acquired image.

2.1 Notation

The notation and conventions used in the next sections are detailed first. Vectors are indicated in bold lowercase (e.g., $\mathbf{x}$) with the ith element indicated as $x_i$. Two- and three-dimensional arrays are expressed in bold uppercase (e.g., $\mathbf{X}$). A high spectral resolution image $\mathbf{X} = \{\mathbf{X}_k\}_{k=1,\ldots,N}$ is a three-dimensional array composed of $N$ bands indexed by the subscript $k = 1, \ldots, N$; accordingly, $\mathbf{X}_k$ indicates the kth band of $\mathbf{X}$.
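To make this convention concrete, the following minimal sketch (in NumPy, with illustrative array names that are not taken from the chapter) stores a high spectral resolution image band-first, so that indexing the first axis retrieves a band and indexing the spatial axes retrieves a pixel spectrum:

```python
import numpy as np

# A high spectral resolution image X with N bands, each of size rows x cols,
# stored band-first so that X[k - 1] plays the role of the kth band X_k.
N, rows, cols = 8, 128, 128
X = np.random.rand(N, rows, cols)   # placeholder data, for illustration only

X_1 = X[0]               # the first band, a 2-D array
spectrum = X[:, 10, 20]  # the N-dimensional spectral vector of pixel (10, 20)
```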
2.2 Component substitution techniques

CS approaches are based on a forward transformation of the higher spectral resolution (HSR) image, usually MS or, similarly, HS images, in order to separate the spatial and spectral information [6]. The sharpening process is obtained by substituting the spatial component, which theoretically retains all the spatial information, with the PAN image, which represents the data with the highest spatial content. In the literature, approaches that only partially replace the PAN image have also been proposed, see, e.g., Ref. [25]. The greater the correlation between the PAN and the replaced spatial component, the lesser the spectral distortion in the final fusion product [6]. Fig. 1 shows an example of fusion using two CS approaches (i.e., Gram–Schmidt (GS) and GS adaptive (GSA)). In the case of GS, the spatial component, also called intensity component, is less correlated with the PAN image, causing a greater spectral distortion with respect to the GSA, as can be seen by comparing Fig. 1B with the reference image in Fig. 1A. In order to increase the correlation, thus reducing the spectral distortion in the fused product, a histogram-matching (or equalization) procedure between the PAN and the intensity component is usually exploited. Figs. 2 and 3 show an example of the benefits of applying this processing step before the CS using a principal component substitution (PCS) fusion approach [5]. The reduction of the spectral distortion is clear in Fig. 3. A greater color fidelity can be remarked when the histogram-matching is applied. Finally, once the substitution of the spatial (intensity) component with the PAN image is performed, a backward transformation is applied to transform the new spatial and spectral components into bands, yielding the final fused image. Summarizing, the main steps to get a fused product in the CS framework are:

- The upsampling of the HSR image to reach the same scale as the PAN image (this operation is preparatory for the image fusion);
- The forward transformation of the HSR image in order to separate the spatial and the spectral contents;
- The substitution of the spatial component with the PAN image (a histogram-matching can also be required in this phase);
- The backward transformation to get the fused product.

FIGURE 1 An example of spectral distortion for component substitution approaches (see, e.g., the river area on the right side of the images). An image acquired by the IKONOS sensor over the city of Toulouse is fused: (A) ground-truth (reference image) and the fusion products using (B) the Gram–Schmidt (GS) and (C) the GS adaptive (GSA) approaches. A greater spectral distortion can be pointed out in the case of GS, where a lower similarity between the panchromatic and the multispectral spatial (intensity) component is shown with respect to the GSA case.

FIGURE 2 An example of spectral distortion for component substitution approaches. An image acquired by the IKONOS sensor over the city of Toulouse is fused. Error maps between the ground-truth (reference image) and the fusion products using (A) the Gram–Schmidt (GS) and (B) the GS adaptive (GSA) approaches. A greater spectral distortion can be pointed out in the case of GS, where a lower similarity between the panchromatic and the multispectral spatial (intensity) component is shown with respect to the GSA case.

Under the hypotheses of (1) linear transformation and (2) spatial information retained in a unique component [26], the CS fusion process can be strongly simplified [5]:
$$\widehat{\mathrm{HSR}}_k = \widetilde{\mathrm{HSR}}_k + g_k \left( \mathbf{P} - \mathbf{I}_L \right), \qquad (1)$$

where $\widehat{\mathrm{HSR}}$ is the pansharpened image, $\widetilde{\mathrm{HSR}}$ is the upsampled HSR image, the subscript $k$ indicates the kth band, $N$ is the number of bands, $\mathbf{g} = [g_1, \ldots, g_k, \ldots, g_N]$ is the vector of the injection gains, while $\mathbf{I}_L$ is defined as follows:

$$\mathbf{I}_L = \sum_{i=1}^{N} w_i \, \widetilde{\mathrm{HSR}}_i, \qquad (2)$$

where the weights $\mathbf{w} = [w_1, \ldots, w_i, \ldots, w_N]$ measure the spectral overlap among the spectral bands and the PAN image [6,27] (Fig. 4). Fig. 5 shows a flowchart describing the fusion process of the CS approach. Specifically, it is possible to notice blocks for (1) upsampling the HSR image to reach the PAN scale; (2) calculating the intensity component by Eq. (2); (3) histogram-matching the PAN image with the intensity component; (4) injecting the extracted details according to Eq. (1).

The main advantages of CS-based fusion techniques are (1) high fidelity in rendering the spatial details in the final product [28] and (2) fast and easy implementation [26]. Furthermore, robustness with respect to spatial misalignments can also be remarked [14]. On the contrary, the main shortcoming is the generation of a significant spectral distortion in the final fused product due to the spectral mismatch between the PAN and the intensity component [6].

The CS family includes many popular pansharpening approaches. In Refs. [13,29], three approaches based on Karhunen–Loève [30] and Gram–Schmidt [28,31] transformations have been compared for sharpening HS data. In the following, we will focus attention on these techniques, which will be included in our benchmark for fusing HS and PAN data in the real example in Section 4.

FIGURE 3 An example of the spectral distortion reduction due to the histogram-matching for component substitution approaches (see, e.g., the river area on the right side of the images). An image acquired by the IKONOS sensor over the city of Toulouse is fused: (A) ground-truth (reference image) and the fusion products using (B) the principal component substitution (PCS) without histogram-matching and (C) PCS with histogram-matching. A greater spectral distortion can be pointed out in the case of PCS without the histogram-matching with respect to the same procedure including the histogram-matching processing step.

FIGURE 4 An example of the spectral distortion reduction due to the histogram-matching for component substitution approaches. An image acquired by the IKONOS sensor over the city of Toulouse is fused. Error maps between the ground-truth (reference image) and the fusion products using (A) the principal component substitution (PCS) without histogram-matching and (B) the PCS with histogram-matching. A greater spectral distortion can be pointed out in the case of PCS without the histogram-matching with respect to the same procedure including the histogram-matching processing step.

2.2.1 Principal component substitution

The PCS, proposed in Ref. [30], is a technique widely employed for pansharpening. It is based on the principal component analysis (PCA), a.k.a. Karhunen–Loève transform or Hotelling transform. A rotation of the original data (i.e., a linear transformation) is performed to yield the so-called PCs. The hypothesis underlying its application to pansharpening is that the spatial information (shared by all the channels) is concentrated in the first PC, while the spectral information (specific to each single band) is accounted for by the other PCs. The whole fusion process can be described by the general formulation stated by Eq. (1), where the w and g coefficient vectors are image-dependent because they are derived by the PCA procedure on the HSR image.
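The simplified CS fusion of Eqs. (1) and (2), together with the histogram-matching step discussed above, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the weights w and the gains g are assumed to be given (each CS variant derives them differently), and all names are hypothetical.

```python
import numpy as np

def cs_fusion(hsr_up, pan, w, g):
    """Generic component substitution fusion, Eqs. (1)-(2).

    hsr_up : (N, rows, cols) HSR image already upsampled to the PAN scale
    pan    : (rows, cols) PAN image
    w      : (N,) spectral weights building the intensity component I_L
    g      : (N,) band-wise injection gains
    """
    # Eq. (2): intensity component as a weighted sum of the HSR bands.
    i_l = np.tensordot(w, hsr_up, axes=1)

    # Histogram matching (here mean/variance equalization) of PAN to I_L,
    # the step shown in Figs. 3 and 4 to reduce the spectral distortion.
    pan_eq = (pan - pan.mean()) * (i_l.std() / pan.std()) + i_l.mean()

    # Eq. (1): band-wise injection of the extracted details P - I_L.
    return hsr_up + g[:, None, None] * (pan_eq - i_l)

# Toy usage with random data: averaging weights and unit gains.
hsr_up, pan = np.random.rand(4, 64, 64), np.random.rand(64, 64)
fused = cs_fusion(hsr_up, pan, w=np.full(4, 0.25), g=np.ones(4))
```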
2.2.2 Gram–Schmidt

The GS transformation is often exploited by pansharpening approaches. Its first implementation for pansharpening dates back to a patent by Kodak [31]. GS constitutes a more general method than PCS, which can be obtained by using the first PC as the low-resolution PAN image in the GS framework [23]. The fusion process starts by using, as the first basis vector, a synthetic low spatial resolution PAN image $\mathbf{I}_L$ at the same scale as the HSR image. The representation of the HSR image is then carried out by building a complete orthogonal basis. The pansharpening procedure is completed by substituting the first component with the PAN image and by inverting the transformation. This fusion process can be expressed using Eq. (1), with gains [28]

$$g_k = \frac{\operatorname{cov}\left( \widetilde{\mathrm{HSR}}_k, \mathbf{I}_L \right)}{\operatorname{var}(\mathbf{I}_L)}, \qquad (3)$$

where cov(X, Y) indicates the covariance between X and Y and var(X) is the variance of X.

FIGURE 5 Flowchart presenting the blocks of a generic component substitution pansharpening procedure. HSR, higher spectral resolution; LPF, low-pass filter.

By changing the way to generate the low spatial resolution PAN image, we can have different techniques. The simplest way to generate $\mathbf{I}_L$ consists in averaging the HSR bands (i.e., in setting $w_i = 1/N$, for all $i = 1, \ldots, N$); this fusion approach is often called Gram–Schmidt [31]. In Ref. [28], the authors proposed an enhanced version, called GS adaptive (GSA), in which $\mathbf{I}_L$ is generated by the linear model in Eq. (2) with the weights estimated via the minimization of the mean square error between Eq. (2) and a filtered and downsampled version of the PAN image.
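The two ingredients just described translate into a few lines of code: the gains of Eq. (3) and, for GSA, a least squares estimate of the weights w against a low-pass filtered version of the PAN image. This is a sketch under simplifying assumptions (the decimation step is omitted, and the Gaussian filter and its width are placeholder choices, not the authors' exact design).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gs_gains(hsr_up, i_l):
    """Eq. (3): g_k = cov(HSR_k, I_L) / var(I_L), computed band by band."""
    d = i_l - i_l.mean()
    return np.array([((band - band.mean()) * d).mean()
                     for band in hsr_up]) / i_l.var()

def gsa_weights(hsr_up, pan):
    """GSA: weights minimizing the mean square error between the linear
    model of Eq. (2) and a low-pass filtered version of the PAN image."""
    pan_lp = gaussian_filter(pan, sigma=2.0)   # stand-in low-pass filter
    A = hsr_up.reshape(hsr_up.shape[0], -1).T  # pixels x bands
    w, *_ = np.linalg.lstsq(A, pan_lp.ravel(), rcond=None)
    return w
```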
2.3 Multiresolution analysis methods

The main concept underlying the approaches belonging to this category is to extract details by decomposing the PAN image exploiting the MRA framework. In recent years, this approach has been questioned, considering that it is time consuming for the specific application. Indeed, it has been demonstrated that the full decomposition of the PAN image is not generally required and only the low-pass filters have to be properly designed to extract PAN details [5,32]. Thus, the MRA approaches can collapse into the category of the spatial filtering methods, where the key issue is represented by the design of the low-pass filter to extract the details of the PAN image [5]. Several approaches have been developed in the literature to deal with this issue, and many pansharpening methods differ in this step. Indeed, the literature proposes the application of both linear filters (such as Gaussian filters [32], box filters [33], wavelet decompositions [34,35]) and, recently, nonlinear filters (see, e.g., Ref. [36]). Filter estimation procedures, based on the deconvolution framework, have also been proposed [37]. Summarizing, the fusion process, for $k = 1, \ldots, N$, is formalized as follows [5]:

$$\widehat{\mathrm{HSR}}_k = \widetilde{\mathrm{HSR}}_k + \mathbf{G}_k \left( \mathbf{P} - \mathbf{P}_L \right), \qquad (4)$$

where $\mathbf{P}_L$ indicates the low spatial resolution PAN image and $\mathbf{G}$ is a matrix of injection coefficients with the same size as $\widetilde{\mathrm{HSR}}$. According to Eq. (4), the different approaches belonging to this category can differ in (1) the spatial filters used to get $\mathbf{P}_L$ (some widespread solutions have been listed above) and (2) the injection coefficients $\{\mathbf{G}_k\}_{k=1,\ldots,N}$. Common choices for the latter are:

1. $\mathbf{G}_k = \mathbf{1}$ for each $k = 1, \ldots, N$, where $\mathbf{1}$ is a unitary matrix with the same size as $\mathbf{P}$. This choice identifies the so-called additive injection scheme [5,38];
2. $\mathbf{G}_k = \widetilde{\mathrm{HSR}}_k / \mathbf{P}_L$ (element-wise) for each $k = 1, \ldots, N$. In this case, the details are weighted by the ratio of the HSR and $\mathbf{P}_L$, with the aim of reproducing, in the fused image, the local intensity contrast of the PAN [38]. This coefficient selection is often named high pass modulation (HPM) method or multiplicative injection scheme;
3. $\mathbf{G}_k = \mathbf{1} \cdot \operatorname{cov}\left( \widetilde{\mathrm{HSR}}_k, \mathbf{P}_L \right) / \operatorname{var}(\mathbf{P}_L)$ for each $k = 1, \ldots, N$. This injection model is often called projective in the literature [5,32].

The general scheme of MRA fusion methods is reported in Fig. 6. Accordingly, the required blocks are (1) upsampling of the HSR image to reach the PAN scale; (2) low-pass filtering of $\mathbf{P}$ to get $\mathbf{P}_L$; (3) calculation of the injection gains $\{\mathbf{G}_k\}_{k=1,\ldots,N}$; (4) injection of the extracted details according to Eq. (4). Apart from the filter, the methods can differ in the application or not of a decimation step for $\mathbf{P}_L$. In the case of decimated approaches, it is possible, by properly designing the spatial filters, to compensate the aliasing of the HSR image through the fusion process [39].

The MRA family includes many popular pansharpening approaches. In the literature [13,29], three approaches have been compared for sharpening HS data. In the following, we will focus attention on these techniques, which will be included in our benchmark for fusing HS and PAN data in the real example in Section 4.

FIGURE 6 Flowchart of a generic multiresolution analysis pansharpening approach. HSR, higher spectral resolution.

2.3.1 Smoothing filter-based intensity modulation

A popular implementation of Eq. (4) consists in applying a linear time-invariant (LTI) low-pass filter (LPF) $h_{LP}$ to the PAN image $\mathbf{P}$ to get $\mathbf{P}_L$. Therefore, $\mathbf{P}_L = \mathbf{P} * h_{LP}$, in which * denotes the convolution operator. The smoothing filter-based intensity modulation (SFIM) algorithm [33] sets $h_{LP}$ to a simple box (i.e., an average) filter and exploits the HPM injection scheme (see the sketch after Section 2.3.2).

2.3.2 Laplacian pyramid

The resolution reduction can be obtained in more than one step in order to get the low-pass signal $\mathbf{P}_L$ at the original resolution of the HSR image. This is commonly referred to as pyramidal decomposition and dates back to the seminal work of Burt and Adelson [40]. Gaussian filters can be tuned to closely match the HSR sensors' modulation transfer function (MTF) [32], thus extracting all the required details to enhance the HSR image to the PAN spatial resolution. In order to properly design the filters to match the sensors' MTF, the standard deviation, which is the unique parameter that characterizes the whole Gaussian distribution in this case, is set starting from sensor-based information (i.e., the value of the amplitude response at the Nyquist frequency provided by the manufacturer). Both the additive and the multiplicative injection schemes [32,38] have been exploited for HS sharpening. They are usually referred to as MTF-generalized Laplacian pyramid (MTF-GLP) [32] and MTF-GLP with high pass modulation (MTF-GLP-HPM) [38], respectively.
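As an illustration of Eq. (4), the following sketch implements SFIM: a box filter yields P_L and the HPM scheme injects the details. A small epsilon guards the division, a practical detail not discussed in the original formulation; the names are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sfim_fusion(hsr_up, pan, size=5, eps=1e-12):
    """Eq. (4) with G_k = HSR_k / P_L (HPM) and P_L = P * h_LP,
    h_LP being a simple box (average) filter, as in SFIM."""
    p_l = uniform_filter(pan, size=size)   # low-pass (box-filtered) PAN
    # HSR_k + (HSR_k / P_L)(P - P_L) simplifies to HSR_k * (P / P_L):
    return hsr_up * (pan / (p_l + eps))[None, :, :]
```

Replacing the box filter with a Gaussian kernel matched to the sensor's MTF, as described in Section 2.3.2, turns this same scheme into an MTF-GLP-HPM-like fusion (up to the pyramidal decimation and interpolation steps, omitted here).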
2.4 A new generation of pansharpening approaches

The new generation of pansharpening approaches is based on superresolution paradigms or, generally speaking, on the application of constrained optimization algorithms to solve the ill-posed problem of pansharpening. These paradigms are related to the issue of reconstructing the high spatial resolution image by fusing its low spatial resolution versions. This inverse problem is usually strongly ill-posed, implying a nonunique solution. Therefore, various regularization methods, introducing different prior knowledge, have been proposed to stabilize the inversion [41]. A powerful and emerging approach relies upon the sparse representation of signals or compressed sensing theory [42,43]. The first seminal work for pansharpening is presented in Ref. [16], but it is not practicable, requiring a huge database of unavailable high spatial, high spectral images. Thus, several approaches have been proposed in the literature to overcome this issue, which employ standard representation matrices [44] or dictionaries constructed from a set of available PAN and HSR images [45]. A feasible option consists in deriving the dictionaries only from the data set at hand. In particular, different solutions comprise a dictionary built from (1) only the PAN image [17], (2) both the PAN and the original HSR image [46], and (3) a set of synthetic pansharpened images [47].

Another class of new generation approaches is represented by Bayesian methods [15]. This is based on a posterior distribution of the full resolution image given the observed HSR and PAN images. The posterior is composed of two factors: (1) the likelihood function, which is the probability density of the observed HSR and PAN images given the full resolution image, and (2) the prior probability density of the target image. The prior probability is of critical importance to cope with the usual ill-posedness of the pansharpening inverse problem. Variational techniques, see, e.g., Refs. [19,20], can be interpreted as a particular case of Bayesian methods, thus representing another class of new generation pansharpening approaches. In this case, the target image is estimated by maximizing the posterior probability density of the full resolution image. Examples of application of new generation approaches for HS pansharpening can be found in the comprehensive review [13]. However, most of the methods belonging to this class suffer, on one hand, from modeling inaccuracies and, on the other hand, from a high computational complexity that limits their applicability in practical cases, in particular with the growing number of spectral bands to be fused.

3. Quality assessment of pansharpening products

The absence of a reference image for assessing the performance is the main limitation for validating pansharpening products. To this aim, two main procedures for assessing pansharpening approaches have been proposed in the literature. Historically speaking, the first protocol that tries to overcome the problem of assessing pansharpening products dates back to Wald et al. [9]. The main idea is to spatially degrade the two images to fuse (i.e., HSR and PAN). Thus, spatial LTI low-pass filters are usually exploited. Once the two products to be fused are properly degraded, the original HSR image is used as the reference (target) image (or ground-truth). On one hand, this procedure is very accurate, generating a reference image for the performance evaluation; on the other hand, a hypothesis of invariance among scales of the performance of the fused products has to be made [3,5].
A further issue is also related to how to design the low-pass filters in order to have a proper degradation to get the new, artificially generated products to be fused (indeed, this can bias the assessment of the fusion methods). In the following, we will try to give an answer to this question, see Section 3.1. In order to deal with these issues, the performance assessment at full resolution (i.e., the original resolutions of the HSR and PAN images) has been proposed in the literature. On one hand, these approaches overcome the limitations of a validation at reduced resolution (i.e., Wald's protocol); on the other hand, these advantages are paid for by a reduction of the performance assessment accuracy. In Section 3.2, we will detail a recent approach used for the validation at full resolution. Due to the shortcomings of both quantitative evaluation procedures, a qualitative evaluation of the fused outcomes through visual inspection is still necessary to support the quantitative evaluation [4,5].

3.1 Wald's protocol

The procedure operating at reduced resolution is based on Wald's protocol [9]. This requires that:

1. Any fused synthetic image, once degraded to its original resolution, should be as identical as possible to the original image $\mathrm{HSR}_k$.
2. Any fused synthetic image should be as identical as possible to the image that the corresponding sensor would observe with the highest resolution.
3. The multi/hyperspectral set of synthetic images should be as identical as possible to the multi/hyperspectral set of images that the corresponding sensor would observe with the highest resolution.

Thus, the fusion of the HSR and PAN images at reduced resolution can easily verify the synthesis properties of Wald's protocol (see the second and third statements), thanks to the presence of the reference image (represented by the original HSR image). In particular, the degradation of the spatial resolution of the starting images is obtained by decimating (i.e., applying a low-pass filter and downsampling) by a sampling factor equal to the resolution ratio between the two images. Let us denote the reduced resolution versions of the HSR image and of the PAN image by HSR* and P*, respectively. To verify the requirements of this protocol, the choice of the degradation filters becomes crucial. Generally speaking, the filters are defined to ensure the consistency of the pansharpening process (the first Wald's statement). Since the pansharpened image (which should match as closely as possible the original reference image HSR), once degraded to its original resolution, should be identical to the original HSR image (whose role is played by its low spatial resolution version HSR*), it is straightforward that the resolution reduction of the HSR image has to be performed by employing a filter simulating the sensor's optical transfer function (OTF) (i.e., the Fourier transform of the point spread function, the impulse response of the optical system). The OTF consists of the phase transfer function and the MTF. The former can be safely neglected for our purposes, and the sensor's blur is thus modeled by the sole MTF. Practically, the degradation filter has to match the MTF of the HSR sensor [32]. In addition, the filter used for obtaining the PAN image, P*, has to be designed. An ideal filter is a widespread choice for degrading it [32]. This is usually the right choice for degrading the PAN data because these images are typically restored before being distributed.
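A sketch of this reduced resolution simulation follows. The relation between the Gaussian standard deviation and the amplitude response at Nyquist is one common derivation, not necessarily the exact filter design used by the authors, and the gain value in the usage comment is a placeholder; real values come from the sensor manufacturer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mtf_sigma(gain_nyquist, ratio):
    """Std of a Gaussian whose frequency response equals gain_nyquist at
    the Nyquist frequency of the grid decimated by `ratio` (one common
    derivation: exp(-2 pi^2 sigma^2 f^2) = gain at f = 0.5 / ratio)."""
    return ratio * np.sqrt(-2.0 * np.log(gain_nyquist)) / np.pi

def degrade_hsr(hsr, gains_nyquist, ratio):
    """Wald's protocol: MTF-matched Gaussian blur per band, then
    downsampling by the resolution ratio, to obtain HSR*."""
    blurred = np.stack([gaussian_filter(band, sigma=mtf_sigma(g, ratio))
                        for band, g in zip(hsr, gains_nyquist)])
    return blurred[:, ::ratio, ::ratio]

# E.g., degrade_hsr(hsr, gains_nyquist=[0.3] * hsr.shape[0], ratio=3)
# would simulate a reduced resolution data set with a scale ratio of 3.
```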
Several indexes have been proposed for comparing the fused product with the reference image according to Wald's protocol. Vectorial (i.e., jointly considering all the spectral bands) similarity indexes are usually exploited. The most used, implemented in the sketch below, are:

- The spectral angle mapper (SAM) [48] is a simple index that measures the image spectral distortion. It calculates the angle between the corresponding pixels of the fused and reference images in the space defined by considering each spectral band as a coordinate axis. Let $\mathbf{I}_{\{i\}} = [I_{1,\{i\}}, \ldots, I_{N,\{i\}}]$ and $\mathbf{J}_{\{i\}} = [J_{1,\{i\}}, \ldots, J_{N,\{i\}}]$ be the pixel vectors of the HSR images $\mathbf{I}$ and $\mathbf{J}$ with $N$ bands; the SAM between these two images is defined as

$$\mathrm{SAM}\left( \mathbf{I}_{\{i\}}, \mathbf{J}_{\{i\}} \right) = \arccos \left( \frac{\left\langle \mathbf{I}_{\{i\}}, \mathbf{J}_{\{i\}} \right\rangle}{\left\| \mathbf{I}_{\{i\}} \right\| \left\| \mathbf{J}_{\{i\}} \right\|} \right), \qquad (5)$$

in which $\langle \cdot, \cdot \rangle$ denotes the scalar product (or inner product) and $\| \cdot \|$ the vector $\ell_2$ norm. The global value of SAM for the whole image is obtained by averaging the single measures over all the pixels. The SAM is usually measured in degrees. The optimal value of the SAM index is 0. The higher the value of the SAM index, the greater the measured spectral distortion.

- The Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) [49] is a credited vectorial index accounting for spatial/radiometric distortions. It is defined as

$$\mathrm{ERGAS} = \frac{100}{R} \sqrt{ \frac{1}{N} \sum_{k=1}^{N} \left( \frac{\mathrm{RMSE}(\mathbf{I}_k, \mathbf{J}_k)}{\mu(\mathbf{I}_k)} \right)^2 }, \qquad (6)$$

where RMSE is the root mean square error, defined as

$$\mathrm{RMSE}(\mathbf{I}, \mathbf{J}) = \sqrt{ E\left[ (\mathbf{I} - \mathbf{J})^2 \right] }, \qquad (7)$$

$\mu$ represents the mean of the image, $R$ is the scale ratio between the PAN and the HSR data, and $E[\cdot]$ denotes the mean operator. Since the ERGAS is composed of a sum of RMSE values, its optimal value is 0. The higher the value of the ERGAS index, the greater the measured distortion.

- The Q4 or the Q2n indexes [50,51] are vectorial extensions, accounting for spectral distortion, of the Q-index [52]: to vector data with up to four bands, the Q4 index [50], and to vector data with a number of spectral bands greater than four, the Q2n index [51]. In practice, the Q4/Q2n indexes are based on modeling each pixel $\mathbf{I}_{\{i\}}$ as a quaternion:

$$\mathbf{I}_{\{i\}} = I_{\{i\},1} + i I_{\{i\},2} + j I_{\{i\},3} + k I_{\{i\},4}. \qquad (8)$$

The Q-index [52] (or universal image quality index) has been developed in the image processing literature to overcome some limitations of the RMSE for perceptual quality assessment [52]. Its physical interpretation becomes straightforward by writing its expression in the form

$$Q(\mathbf{I}, \mathbf{J}) = \frac{\sigma_{IJ}}{\sigma_I \sigma_J} \cdot \frac{2 \bar{I} \bar{J}}{\bar{I}^2 + \bar{J}^2} \cdot \frac{2 \sigma_I \sigma_J}{\sigma_I^2 + \sigma_J^2}, \qquad (9)$$

where $\sigma_{IJ}$ is the sample covariance of $\mathbf{I}$ and $\mathbf{J}$, $\bar{I}$ is the sample mean of $\mathbf{I}$, and $\sigma_I$ is the sample standard deviation of $\mathbf{I}$. Accordingly, it comprises, in order, an estimate of the correlation coefficient, the difference in the mean luminance, and the difference in the contrast. All the Q-indexes vary in the range [−1, 1], where 1 denotes the best fidelity to the reference.
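SAM and ERGAS, Eqs. (5)–(7), map directly onto array operations. The following is a minimal sketch, not a reference implementation: the array layout and names are assumptions, and Q4/Q2n is omitted since it requires the full hypercomplex machinery.

```python
import numpy as np

def sam_deg(img, ref, eps=1e-12):
    """Eq. (5): spectral angle per pixel, averaged and returned in degrees.
    img, ref: (N, rows, cols) fused and reference images."""
    num = (img * ref).sum(axis=0)
    den = np.linalg.norm(img, axis=0) * np.linalg.norm(ref, axis=0) + eps
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)).mean())

def ergas(img, ref, ratio):
    """Eq. (6), with R = `ratio` the PAN/HSR scale ratio."""
    rmse2 = ((img - ref) ** 2).mean(axis=(1, 2))   # squared RMSE, Eq. (7)
    mu2 = ref.mean(axis=(1, 2)) ** 2               # squared band means
    return (100.0 / ratio) * np.sqrt((rmse2 / mu2).mean())
```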
3.2 The quality without reference assessment

In order to perform the quality evaluation at the original (full) resolution, several quality with no reference (QNR) indexes have been proposed in the literature [8]. A powerful recent proposal combines the evaluation of the spatial quality of pansharpened images, i.e., the spatial distortion $D_S$ of the QNR protocol [11], and the spectral quality, measured by the spectral distortion index $D_\lambda$ proposed in Ref. [53]. The comprehensive index is called hybrid QNR (HQNR) [12]. It is defined as

$$\mathrm{HQNR} = (1 - D_\lambda)^{\alpha} (1 - D_S)^{\beta}. \qquad (10)$$

Namely, it is composed of the product, weighted by the coefficients $\alpha$ and $\beta$, of two separate values $D_\lambda$ and $D_S$, which quantify the spectral and the spatial distortion, respectively. The higher the HQNR index, the better the quality of the fused product. The ideal value is 1, reached when both the spectral distortion ($D_\lambda$) and the spatial distortion ($D_S$) are equal to 0.

$D_\lambda$ is inspired by the consistency (first) property of Wald's protocol. Thus, a low-pass filter matching the shape of the MTF of the corresponding spectral channel (usually the MTFs of HSR instruments have a Gaussian shape) is applied to the fused product to compare it, after downsampling, with the original HSR image. The similarity between the decimated pansharpened product and the original low spatial resolution HSR data is measured by means of the Q4/Q2n index [50,51]. Instead, $D_S$ is calculated by

$$D_S = \sqrt[q]{ \frac{1}{N} \sum_{i=1}^{N} \left| Q\left( \widehat{\mathrm{HSR}}_i, \mathbf{P} \right) - Q\left( \mathrm{HSR}_i, \mathbf{P}_{LP} \right) \right|^{q} }, \qquad (11)$$

where $\mathbf{P}_{LP}$ is a low-resolution (artificial) PAN image at the same resolution as the HSR image, obtained by filtering the original PAN image with a low-pass filter, and $q$ is usually set to 1 [11]. From a theoretical point of view, the perfect (even local) alignment between the interpolated version of the HSR and the PAN images should be assured to avoid the loss of meaning for this quality index. Unfortunately, in practice, small local misalignments between the HSR and PAN images can be observed (e.g., roofs of skyscrapers could be misaligned because the two images are often aligned on the ground).
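Once the two distortions are available, Eq. (10) reduces to one line. The sketch below only combines precomputed $D_\lambda$ and $D_S$ values (with alpha = beta = 1, a usual choice) and does not reimplement the underlying Q2n-based protocols.

```python
def hqnr(d_lambda, d_s, alpha=1.0, beta=1.0):
    """Eq. (10): HQNR = (1 - D_lambda)^alpha * (1 - D_S)^beta."""
    return (1.0 - d_lambda) ** alpha * (1.0 - d_s) ** beta

# Sanity check against Table 2 (MTF-GLP row): hqnr(0.0216, 0.0337) ~ 0.9455
```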
4. Sharpening of hyperspectral data

The different spatial resolutions of satellite sensors are due to trade-offs in the design of electro-optical systems. Such trade-offs are aimed at balancing aspects such as the signal-to-noise ratio (SNR), the physical dimensions of sensor arrays, and the transmission rate from the satellite to the ground stations. Indeed, the SNR usually decreases when spatial and spectral resolutions increase. Time delay integration is a technology that is able to increase the SNR for high-resolution PAN and MS sensors. Unfortunately, this solution is not feasible for HS instruments [8]; thus, for satellite instruments, the spatial resolution of HS sensors is expected to remain limited to tens of meters in the near future. The physical dimensions of the sensor arrays are also crucial for the SNR. Indeed, the SNR is reduced with the reduction of their physical dimensions. Although new technologies can be developed, see, e.g., efficient detectors based on solid state devices, the spatial resolution will be limited by the need of obtaining reasonably large swath widths. Last but not least, a limitation is given by the transmission rate. Indeed, because of the high spectral resolution of HS sensors, a huge volume of data to be stored on-board and transmitted to the ground stations is expected, with consequently strong power and memory requirements. This implies limits to the spatial resolution of HS systems. All these considerations advise the simultaneous acquisition of a PAN image and an HS image, and then fusing them to obtain a synthetic product with both high spatial and high spectral resolutions.

Thus, HS pansharpening represents a hot topic to be investigated. In fact, many researchers have focused attention on this issue, trying to give an answer to the question of how HS and PAN images can be fused in a proper way. Thus, a comprehensive review about the state-of-the-art for HS pansharpening has been proposed in Ref. [13]. Most of the presented approaches have been inherited from the MS pansharpening literature. Hence, a real example of fusion of HS and PAN data exploiting classical pansharpening methods is presented in Section 4.2. State-of-the-art pansharpening approaches have been used as benchmark. These are related to the CS and MRA classes. Although new generation methods have shown good performance in the literature of HS pansharpening, they are neglected in this section due to their usually huge computational burden. Before having a look at the real example, the new challenges in HS pansharpening are reviewed first in Section 4.1.

4.1 New challenges in hyperspectral pansharpening

HS pansharpening is more complicated than the classical pansharpening problem fusing MS data. The main reasons, which justify why this fusion process is more complex than classical MS pansharpening, are listed below:

- Whereas PAN and MS sensors acquire in almost the same spectral range, the spectral range of an HS sensor is usually wider than that of a PAN sensor. Indeed, the spectral range of a PAN sensor is close to the visible spectral range; instead, HS sensors often cover the visible, the NIR, and the SWIR spectrum ranges. Thus, a key point for HS pansharpening techniques is the injection model required to preserve the spectral information of the HSR data. Indeed, spatial details that are not available for several HS bands have to be inferred through this model, in particular when there is no overlap between the HS spectrum and the PAN spectrum. This difficulty already existed, to some extent, in classical MS pansharpening, but it is much more important in the case of HS pansharpening.
- The spatial scale ratio between HS and PAN can be greater than four (the typical case for pansharpening) and/or not a power of two (see the fusion example in Section 4.2). This implies that the application of some detail extraction approaches developed for classical MS pansharpening, such as the wavelet-based ones, is not straightforward.
- The format of the HS data is crucial. Unlike MS pansharpening, in which spectral radiance and reflectance are equivalent from a performance point of view, for HS pansharpening the presence of (1) absorption bands, (2) spectrally selective path radiance offsets, and (3) the strong decay of solar irradiance make it preferable to perform the fusion on a reflectance data product, if available.
- The adopted fusion approaches should be as simple as possible in order to limit the computational complexity due to the hundreds of spectral bands to be fused.
- Because of the number of spectral bands to be fused, the spectral distortion plays a crucial role. Thus, an HS pansharpening approach should be designed to minimize the spectral distortion.

4.2 Hyperspectral pansharpening: a real example

Several classical pansharpening approaches are compared on a data set acquired over the center of Paris (France) by the HS imager (Hyperion) and the ALI sensors on-board the Earth Observing-1 (EO-1) satellite. The Hyperion sensor is capable of resolving 220 spectral bands (from 0.4 to 2.5 µm) with a 30-m spatial resolution.
Instead, the ALI sensor acquires nine MS channels and a PAN image. The PAN camera has a spatial resolution of 10 m (the scale ratio between PAN and HS is 3) and a spectral coverage from 0.48 to 0.69 µm. Both sensors are mounted on the same platform, thus alleviating image coregistration issues. The swaths of ALI and Hyperion are partially overlapped in such a way that there is a part of the scene in which the PAN and HS data are simultaneously available. In the experiments, only the bands that overlap the spectral range of the PAN channel are exploited for the fusion (i.e., from band 14 to band 33). This data set is named Hyp-ALI from here on. The PCS [30], the GS [31], and the GSA [28] methods are considered in the CS class, whereas the SFIM [33], the Gaussian Laplacian pyramid with MTF-matched filter (MTF-GLP) [32], and the Gaussian MTF-matched filter with high pass modulation (MTF-GLP-HPM) injection model [38] are analyzed in the MRA class.

A first assessment at reduced resolution is provided to the readers in Fig. 7, in which GT indicates the ground-truth (reference) image corresponding to the original HS data before the simulation process according to Wald's protocol described in Section 3.1. Thus, both the original HS and PAN images are degraded to a lower spatial resolution. In particular, the original HS image is degraded in order to simulate a spatial resolution of 90 m and the PAN image is degraded to get a spatial resolution equal to 30 m (retaining the original scale ratio equal to 3). The filters used in this phase are detailed in Section 3.1. The quantitative results are reported in Table 1. The best results are provided by the MTF-GLP-based approaches. Focusing attention on this category, the best injection strategy is provided by the HPM, with slightly better results (see the overall quality index Q2n in Table 1) than the additive strategy, corroborating the outcomes in the pansharpening literature, see, e.g., Ref. [38]. The best CS approach is instead the GSA, which exploits a regression approach to build the intensity component, reducing the spectral distortion (see the SAM index in Table 1) with respect to the other compared approaches in the same class. The performance of the GSA is very close to the best MRA-based approaches, with a very good rendering (very close to the GT), see Fig. 7E. Within the MRA family, the advantages of using Gaussian MTF-matched filters with respect to the box filter (used by the SFIM approach) are straightforward. Indeed, the former are preferable compared to the latter thanks to (1) the capability to extract more spatial details and (2) a strong reduction of artifacts, see Fig. 7F–H.

FIGURE 7 Reduced resolution Hyp-ALI data set (Red = band 30, Green = band 20, Blue = band 14): (A) ground-truth; (B) EXP; (C) principal component substitution; (D) Gram–Schmidt; (E) Gram–Schmidt adaptive; (F) smoothing filter-based intensity modulation; (G) modulation transfer function-generalized Laplacian pyramid; (H) modulation transfer function-generalized Laplacian pyramid with high pass modulation.
TABLE 1 Reduced resolution assessment. Quantitative results on the Hyp-ALI data set. Best results among the compared fusion approaches are marked with an asterisk.

Algorithm     | Q2n     | SAM [°] | ERGAS
GT            | 1.0000  | 0.0000  | 0.0000
EXP           | 0.5283  | 0.9661  | 3.1941
PCS           | 0.7965  | 0.7977  | 2.2648
GS            | 0.7967  | 0.7976  | 2.2637
GSA           | 0.8877  | 0.7301  | 2.3373
SFIM          | 0.7997  | 0.7990  | 2.2411
MTF-GLP       | 0.8889  | 0.7188* | 1.7478
MTF-GLP-HPM   | 0.8930* | 0.7209  | 1.7170*

GS, Gram–Schmidt; GSA, GS adaptive; GT, ground-truth; MTF-GLP, modulation transfer function-generalized Laplacian pyramid; MTF-GLP-HPM, modulation transfer function-generalized Laplacian pyramid with high pass modulation; PCS, principal component substitution; SFIM, smoothing filter-based intensity modulation.

These results are corroborated by the full resolution assessment. The quantitative results using the HQNR index are reported in Table 2. Again, the best results are clearly obtained by the MTF-GLP approaches (HQNR indexes very close to each other). A strong reduction of the spectral distortion is generally shown by the MRA-based methods with respect to the CS-based techniques, see the $D_\lambda$ index in Table 2. Again, the unique CS approach that is able to reduce this kind of distortion is the GSA, which obtains intermediate performance, in between the PCS/GS and the MRA approaches. On the other hand, the MRA approaches slightly suffer from the spatial distortion, $D_S$, point of view, compared to CS-based methods such as PCS and GS. In particular, the SFIM approach obtains a very high value of $D_S$, see Table 2, corroborating the inability of box filters to properly extract spatial details.

TABLE 2 Full resolution assessment. Quantitative results on the Hyp-ALI data set. Best results are marked with an asterisk.

Algorithm     | Dλ      | DS      | HQNR
EXP           | 0.0486  | 0.3210  | 0.6460
PCS           | 0.1839  | 0.0321  | 0.7899
GS            | 0.1840  | 0.0318* | 0.7901
GSA           | 0.0771  | 0.0732  | 0.8553
SFIM          | 0.0368  | 0.0627  | 0.9028
MTF-GLP       | 0.0216  | 0.0337  | 0.9455*
MTF-GLP-HPM   | 0.0202* | 0.0375  | 0.9430

GS, Gram–Schmidt; GSA, GS adaptive; MTF-GLP, modulation transfer function-generalized Laplacian pyramid; MTF-GLP-HPM, modulation transfer function-generalized Laplacian pyramid with high pass modulation; PCS, principal component substitution; SFIM, smoothing filter-based intensity modulation.

The visual inspection of some close-ups of the fused products obtained at full resolution is shown in Fig. 8. These corroborate the quantitative outcomes.

FIGURE 8 Close-ups of the full resolution Hyp-ALI data set (Red = band 30, Green = band 20, Blue = band 14): (A) panchromatic; (B) EXP; (C) principal component substitution; (D) Gram–Schmidt; (E) Gram–Schmidt adaptive; (F) smoothing filter-based intensity modulation; (G) modulation transfer function-generalized Laplacian pyramid; (H) modulation transfer function-generalized Laplacian pyramid with high pass modulation.

5. Concluding remarks

With the increasing number of satellite missions and on-orbit sensors, data fusion approaches are getting more and more attention in the scientific community. Pansharpening is one of the main image fusion problems. In this chapter, the principal approaches for pansharpening and their classification have been presented. The two main assessment procedures have also been discussed. The last part of the chapter has been dedicated to the particularization of the pansharpening problem to the sharpening of HS data, i.e., to the fusion of a PAN image with HS (instead of MS) data. New challenges with regard to this problem have been pointed out, and a real example of the fusion of HS and PAN data (acquired by the Hyperion sensor and the ALI sensor, respectively) has been shown, exploiting a benchmark of classical state-of-the-art pansharpening techniques. The performance of the compared approaches has been evaluated using both assessment protocols.
The advantages of using MRA methods, and, in particular, the ones based on prior physical knowledge about the acquisition systems (i.e., spatial filters matched with the sensor's MTF), whatever the adopted injection rule (i.e., additive or HPM), have been pointed out in comparison with some widely used CS techniques. This gap in performance, verified both at reduced resolution and at full resolution, can be justified by considering that MRA methods are able to reduce the spectral distortion of the fused products, which represents a key point when HS data are fused.

References

[1] C. Souza Jr., L. Firestone, L.M. Silva, D. Roberts, Mapping forest degradation in the Eastern Amazon from SPOT 4 through spectral mixture models, Remote Sensing of Environment 87 (2003) 494–506.
[2] A. Mohammadzadeh, A. Tavakoli, M.J. Valadan Zoej, Road extraction based on fuzzy logic and mathematical morphology from pan-sharpened IKONOS images, Photogrammetric Record 21 (2006) 44–60.
[3] F. Laporterie-Déjean, H. de Boissezon, G. Flouzat, M.-J. Lefèvre-Fonollosa, Thematic and statistical evaluations of five panchromatic/multispectral fusion methods on simulated PLEIADES-HR images, Information Fusion 6 (2005) 193–212.
[4] L. Alparone, L. Wald, J. Chanussot, C. Thomas, P. Gamba, L.M. Bruce, Comparison of pansharpening algorithms: outcome of the 2006 GRS-S data fusion contest, IEEE Transactions on Geoscience and Remote Sensing 45 (2007) 3012–3021.
[5] G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A. Garzelli, G. Licciardi, R. Restaino, L. Wald, A critical comparison among pansharpening algorithms, IEEE Transactions on Geoscience and Remote Sensing 53 (2015) 2565–2586.
[6] C. Thomas, T. Ranchin, L. Wald, J. Chanussot, Synthesis of multispectral images to high spatial resolution: a critical review of fusion methods based on remote sensing physics, IEEE Transactions on Geoscience and Remote Sensing 46 (2008) 1301–1312.
[7] B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, M. Selva, Twenty-five years of pansharpening: a critical review and new developments, in: C.-H. Chen (Ed.), Signal and Image Processing for Remote Sensing, second ed., CRC Press, Boca Raton, FL, USA, 2012, pp. 533–548.
[8] L. Alparone, B. Aiazzi, S. Baronti, A. Garzelli, Remote Sensing Image Fusion, CRC Press, 2015.
[9] L. Wald, T. Ranchin, M. Mangolini, Fusion of satellite images of different spatial resolutions: assessing the quality of resulting images, Photogrammetric Engineering & Remote Sensing 63 (1997) 691–699.
[10] C. Thomas, L. Wald, Analysis of changes in quality assessment with scale, in: Proc. 9th Int. Conf. Inf. Fusion, 2006, pp. 1–5.
[11] L. Alparone, B. Aiazzi, S. Baronti, A. Garzelli, F. Nencini, M. Selva, Multispectral and panchromatic data fusion assessment without reference, Photogrammetric Engineering & Remote Sensing 74 (2008) 193–200.
[12] B. Aiazzi, L. Alparone, S. Baronti, R. Carlà, A. Garzelli, L. Santurri, Full scale assessment of pansharpening methods and data products, in: SPIE Remote Sensing, 2014, p. 924402.
[13] L. Loncan, S. Fabre, L.B. Almeida, J.M. Bioucas-Dias, L. Wenzhi, X. Briottet, G.A. Licciardi, J. Chanussot, M. Simoes, N. Dobigeon, J.Y. Tourneret, M.A. Veganzones, W. Qi, G. Vivone, N. Yokoya, Hyperspectral pansharpening: a review, IEEE Geoscience and Remote Sensing Magazine 3 (2015) 27–46.
[14] S. Baronti, B. Aiazzi, M. Selva, A. Garzelli, L. Alparone, A theoretical analysis of the effects of aliasing and misregistration on pansharpened imagery, IEEE Journal of Selected Topics in Signal Processing 5 (2011) 446–453.
[15] D. Fasbender, J. Radoux, P. Bogaert, Bayesian data fusion for adaptable image pansharpening, IEEE Transactions on Geoscience and Remote Sensing 46 (2008) 1847–1857.
[16] S. Li, B. Yang, A new pan-sharpening method using a compressed sensing technique, IEEE Transactions on Geoscience and Remote Sensing 49 (2011) 738–746.
[17] X.X. Zhu, R. Bamler, A sparse image fusion algorithm with application to pan-sharpening, IEEE Transactions on Geoscience and Remote Sensing 51 (2013) 2827–2836.
[18] M.R. Vicinanza, R. Restaino, G. Vivone, M. Dalla Mura, G. Licciardi, J. Chanussot, A pansharpening method based on the sparse representation of injected details, IEEE Geoscience and Remote Sensing Letters 12 (2015) 180–184.
[19] F. Palsson, J.R. Sveinsson, M.O. Ulfarsson, A new pansharpening algorithm based on total variation, IEEE Geoscience and Remote Sensing Letters 11 (2014) 318–322.
[20] X. He, L. Condat, J. Bioucas-Dias, J. Chanussot, J. Xia, A new pansharpening method based on spatial and spectral sparsity priors, IEEE Transactions on Image Processing 23 (2014) 4160–4174.
[21] B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis, IEEE Transactions on Geoscience and Remote Sensing 40 (2002) 2300–2312.
[22] R. Restaino, M. Dalla Mura, G. Vivone, J. Chanussot, Context-adaptive pansharpening based on image segmentation, IEEE Transactions on Geoscience and Remote Sensing 55 (2017) 753–766.
[23] B. Aiazzi, S. Baronti, F. Lotti, M. Selva, A comparison between global and context-adaptive pansharpening of multispectral images, IEEE Geoscience and Remote Sensing Letters 6 (2009) 302–306.
[24] A. Garzelli, Pansharpening of multispectral images based on nonlocal parameter optimization, IEEE Transactions on Geoscience and Remote Sensing 53 (2015) 2096–2107.
[25] J. Choi, K. Yu, Y. Kim, A new adaptive component-substitution-based satellite image fusion by using partial replacement, IEEE Transactions on Geoscience and Remote Sensing 49 (2011) 295–309.
[26] T.-M. Tu, S.-C. Su, H.-C. Shyu, P.S. Huang, A new look at IHS-like image fusion methods, Information Fusion 2 (2001) 177–186.
[27] T.-M. Tu, P.S. Huang, C.-L. Hung, C.-P. Chang, A fast intensity-hue-saturation fusion technique with spectral adjustment for IKONOS imagery, IEEE Geoscience and Remote Sensing Letters 1 (2004) 309–312.
[28] B. Aiazzi, S. Baronti, M. Selva, Improving component substitution pansharpening through multivariate regression of MS+Pan data, IEEE Transactions on Geoscience and Remote Sensing 45 (2007) 3230–3239.
[29] G. Vivone, R. Restaino, G. Licciardi, M. Dalla Mura, J. Chanussot, Multiresolution analysis and component substitution techniques for hyperspectral pansharpening, in: Proc. IEEE IGARSS, 2014, pp. 2649–2652.
[30] P.S. Chavez Jr., S.C. Sides, J.A. Anderson, Comparison of three different methods to merge multiresolution and multispectral data: Landsat TM and SPOT panchromatic, Photogrammetric Engineering & Remote Sensing 57 (1991) 295–303.
[31] C.A. Laben, B.V. Brower, Process for Enhancing the Spatial Resolution of Multispectral Imagery Using Pan-Sharpening, U.S. Patent # 6,011,875, 2000.
[32] B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, M. Selva, MTF-tailored multiscale fusion of high-resolution MS and pan imagery, Photogrammetric Engineering & Remote Sensing 72 (2006) 591–596.
[33] J.G. Liu, Smoothing filter based intensity modulation: a spectral preserve image fusion technique for improving spatial details, International Journal of Remote Sensing 21 (2000) 3461–3472.
[34] J. Núñez, X. Otazu, O. Fors, A. Prades, V. Palà, R. Arbiol, Multiresolution-based image fusion with additive wavelet decomposition, IEEE Transactions on Geoscience and Remote Sensing 37 (1999) 1204–1211.
[35] X. Otazu, M. González-Audícana, O. Fors, J. Núñez, Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods, IEEE Transactions on Geoscience and Remote Sensing 43 (2005) 2376–2385.
[36] R. Restaino, G. Vivone, M. Dalla Mura, J. Chanussot, Fusion of multispectral and panchromatic images based on morphological operators, IEEE Transactions on Image Processing 25 (2016) 2882–2895.
[37] G. Vivone, M. Simões, M. Dalla Mura, R. Restaino, J. Bioucas-Dias, G. Licciardi, J. Chanussot, Pansharpening based on semiblind deconvolution, IEEE Transactions on Geoscience and Remote Sensing 53 (2015) 1997–2010.
[38] G. Vivone, R. Restaino, M. Dalla Mura, G. Licciardi, J. Chanussot, Contrast and error-based fusion schemes for multispectral image pansharpening, IEEE Geoscience and Remote Sensing Letters 11 (2014) 930–934.
[39] B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, M. Selva, Advantages of Laplacian pyramids over "à trous" wavelet transforms, in: L. Bruzzone (Ed.), Proc. SPIE Image Signal Process. Remote Sens. XVIII, vol. 8537, 2012, pp. 853704-1–853704-10.
[40] P.J. Burt, E.H. Adelson, The Laplacian pyramid as a compact image code, IEEE Transactions on Communications 31 (1983) 532–540.
[41] J. Yang, J. Wright, T. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Transactions on Image Processing 19 (2010) 2861–2873.
[42] D.L. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (2006) 1289–1306.
[43] E.J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2006) 489–509.
[44] D. Liu, P.T. Boufounos, Pan-sharpening with multi-scale wavelet dictionary, in: Proc. IEEE IGARSS, 2012, pp. 2397–2400.
[45] C. Jiang, H. Zhang, H. Shen, L. Zhang, A practical compressed sensing-based pan-sharpening method, IEEE Geoscience and Remote Sensing Letters 9 (2012) 629–633.
[46] S. Li, H. Yin, L. Fang, Remote sensing image fusion via sparse representations over learned dictionaries, IEEE Transactions on Geoscience and Remote Sensing 51 (2013) 4779–4789.
[47] M. Cheng, C. Wang, J. Li, Sparse representation based pansharpening using trained dictionary, IEEE Geoscience and Remote Sensing Letters 11 (2014) 293–297.
[48] R.H. Yuhas, A.F.H. Goetz, J.W. Boardman, Discrimination among semi-arid landscape endmembers using the spectral angle mapper (SAM) algorithm, in: Proc. Summaries 3rd Annu. JPL Airborne Geosci. Workshop, 1992, pp. 147–149.
[49] L. Wald, Data Fusion: Definitions and Architectures – Fusion of Images of Different Spatial Resolutions, Les Presses de l'École des Mines, Paris, France, 2002.
[50] L. Alparone, S. Baronti, A. Garzelli, F. Nencini, A global quality measurement of pansharpened multispectral imagery, IEEE Geoscience and Remote Sensing Letters 1 (2004) 313–317.
[51] A. Garzelli, F. Nencini, Hypercomplex quality assessment of multi-/hyper-spectral images, IEEE Geoscience and Remote Sensing Letters 6 (2009) 662–665.
[52] Z. Wang, A.C. Bovik, A universal image quality index, IEEE Signal Processing Letters 9 (2002) 81–84.
[53] M.M. Khan, L. Alparone, J. Chanussot, Pansharpening quality assessment using the modulation transfer functions of instruments, IEEE Transactions on Geoscience and Remote Sensing 47 (11) (2009) 3880–3891.

Chapter 2.4
Unsupervised exploration of hyperspectral and multispectral images
Federico Marini(a,*) and José Manuel Amigo(b)
(a) Department of Chemistry, University of Rome La Sapienza, Roma, Italy; (b) Professor, Ikerbasque, Basque Foundation for Science; Department of Analytical Chemistry, University of the Basque Country, Spain; Chemometrics and Analytical Technologies, Department of Food Science, University of Copenhagen, Denmark
*Corresponding author. e-mail: federico.marini@uniroma1.it

1. Unsupervised exploration methods

Hyperspectral and multispectral images are normally complex data sets composed of a finite number of chemical compounds distributed over a surface. Depending on the type of spectroscopic method used for acquiring the spectral information per pixel, finding selective information for each compound is essential to understand the chemical information that the image contains. One method could be the simple exploration of an image by displaying the information gathered at different wavelengths. Let us take as an example the Raman image depicted in Fig. 1. This image is an emulsion of oil-in-water analyzed by modified Raman imaging (SA Explorer 1) [1,2]. The image was recorded using a laser at 633 nm excitation. The composition of this surface has been widely described elsewhere [1–4], to the point that it is now often used as a benchmark image. The surface is composed of a background phase and several drops of the oily phase. There are four spectral features in total, two being pure signatures and two mixtures. One of the main features of Raman spectroscopy is the fact that it almost always results in sharp peaks, some of them being selective for a specific chemical compound. In this case, the display of some selective wavelengths helps to a great extent to elucidate qualitatively the presence and distribution of some of the chemical compounds in the sample (Fig. 1, right). Even a simple display of a false color image generated by mapping hyperspectral channels onto false RGB channels can offer some qualitative information on the distribution of certain compounds (a minimal sketch of this kind of exploration is given at the end of this section). Nevertheless, this information must be taken with extreme caution, since it is merely qualitative and hidden compounds can be found in small peaks or even in peaks that are overlapped. Considering other types of images, wavelength exploration is even more cumbersome. For example, near-infrared (NIR) hyperspectral images are characterized by broad bands that tend to overlap with other bands coming from different compounds in the sample. Moreover, many times neither the spectral nor the spatial signatures contain selective information on one particular component. That is, the pixels are normally composed of mixtures of different compounds and the spectral signatures, therefore, are normally overlapped.
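Such a wavelength-by-wavelength exploration can be sketched in a few lines (NumPy/Matplotlib, with a hypothetical hypercube `cube` of shape (rows, cols, wavelengths); the channel indices are arbitrary and for illustration only):

```python
import numpy as np
import matplotlib.pyplot as plt

cube = np.random.rand(120, 120, 200)   # placeholder hyperspectral image

# Image at a single, possibly selective, wavelength channel.
plt.imshow(cube[:, :, 57], cmap="viridis")
plt.title("Channel 57")

# False RGB composite: three chosen channels mapped onto the R, G, B planes.
rgb = np.dstack([cube[:, :, i] for i in (150, 80, 20)])
rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min())   # rescale to [0, 1]
plt.figure()
plt.imshow(rgb)
plt.show()
```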
The main goals of exploratory data analysis, as introduced by Tukey [5], are to obtain the best insight into a data set, to uncover its underlying structure, and to detect outliers and anomalies, mostly by means of graphical approaches. It is evident how the multivariate nature of the involved signals (both in bulk spectroscopy and in hyperspectral imaging) makes a direct representation of the data cumbersome and suboptimal. Indeed, since the human eye is able to see at most three dimensions, inspection of a multivariate data set would require considering all the possible plots obtainable by combining two or three of the measured variables: this operation would be not only impossibly lengthy, but it would also provide a very partial picture of the system since, in each plot, the largest part of the overall variability would remain undisplayed. Based on this consideration, it is evident how the main problem in exploratory data analysis is to find a way to summarize the most salient features in the data in a few variables (ideally two or three), so that they could also be used for graphical display without a significant loss of information.

Unsupervised exploration methods are methods that give qualitative information on the compounds distributed over a surface in a simple and nonsupervised manner by using only the self-contained information of the sample.

FIGURE 1 Exploration of a Raman image of an emulsion [1,2]. Left, false color image of a Raman hyperspectral image. Top right, 30 random spectra taken from the image and, bottom right, the corresponding images obtained for some selected wavelengths.

These methods are mostly based on the differences/similarities in the spectral signatures of the pixels conforming the hyperspectral image and provide a first overview of the distribution of some compounds in the sample. Two big families are most used in the hyperspectral imaging (HSI) and multispectral imaging (MSI) framework: projection methods and clustering techniques. In this chapter, we explain how projection methods and clustering work when dealing with hyperspectral and multispectral images, highlighting their main benefits and drawbacks. Everything is illustrated with simple examples used as benchmark examples. All analyses were performed using HYPER-Tools [6] (freely downloadable from www.hypertools.org, last accessed April 2019).

2. Projection methods: Principal component analysis

In more rigorous geometrical terms, there is a necessity of finding a subset of suitable (i.e., relevant to the problem under investigation) directions in space onto which to project the data points, and the methods which operate such a transformation are called projection techniques or, focusing on the mathematical nature of the corresponding models, bilinear methods. Given the data matrix D, which in the case of hyperspectral imaging could correspond, e.g., to the unfolded data hypercube, the goal of projection methods is to look for F directions in space (characterized by their direction cosines, gathered in the matrix B), so that when the data are projected onto them according to

$$\mathbf{A} = \mathbf{D}\mathbf{B}, \qquad (1)$$

the corresponding scores A, i.e., the coordinates of the points in this new subspace, describe the system as relevantly as possible according to some prespecified criterion of interest.
Within this general framework, in the context of exploratory data analysis, the most used projection method is principal component analysis (PCA), which searches for the directions providing a representation of the data set as close as possible to the original matrix D, i.e., constituting the best F-dimensional fit of the data in a least squares sense. Due to its fundamental importance for data analysis, PCA and its applications to unsupervised analysis of hyperspectral images will be discussed in detail in the following section.

2.1 Basics of principal component analysis

PCA [7] is a projection method that looks for orthogonal directions in the multivariate space which account for as much as possible of the variability in the data, i.e., which provide the representation that best approximates the data in the least squares sense. Given a data cube D with two spatial dimensions (X and Y) and one spectral dimension (λ), the first step before applying any bilinear model is the unfolding of the data cube into a matrix D(X·Y, λ) (Fig. 2). The first principal component (PC1), whose direction cosines are called the loadings p₁, is then identified as follows. If the projection of the original data D onto the direction p₁ is indicated as D̂₁, then:

$\hat{\mathbf{D}}_1 = \mathbf{D}\mathbf{p}_1(\mathbf{p}_1^T\mathbf{p}_1)^{-1}\mathbf{p}_1^T = \mathbf{D}\mathbf{p}_1\mathbf{p}_1^T$   (2)

where the last equality holds since p₁ has unit norm. This principal component is calculated along the direction of maximum variance and is, therefore, the most important one. The second principal component is calculated in a similar manner, but not on the original matrix D; rather, on the "deflated" matrix resulting from subtracting the first principal component from the original matrix:

$\mathbf{D}_{new} = \mathbf{D} - \hat{\mathbf{D}}_1$   (3)

This operation is repeated for as many principal components as needed, in such a way that the matrix is decomposed into a series of orthogonal principal components that contain all the relevant information, together with a residual matrix E(X·Y, λ):

$\mathbf{D} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T + \ldots + \mathbf{E}$   (4)

giving the general PCA model:

$\mathbf{D} = \mathbf{T}\mathbf{P}^T + \mathbf{E}$   (5)

FIGURE 2 Graphical representation of a principal component analysis model of a hyperspectral sample containing two chemical compounds.

Once the corresponding scores and loadings are calculated, the final step in HSI and MSI is the refolding of the scores to obtain the so-called score surfaces (as seen in Fig. 2). Considering the previous example of the emulsion, Fig. 3 shows the PCA model of this hyperspectral image. As can be seen, the first three PCs account for 97.55% of the explained variance, showing the distribution of the three major compounds and leaving just 0.84% of the variance for the fourth PC. One of the main advantages of PCA is that, since it is an unsupervised exploratory method, there is no strong need to assess the exact number of PCs needed to explain all the sources of variance in the sample [8]. This choice is left to the analyst, who decides how many PCs are needed. In the previous example, a question could arise about whether the fourth PC is needed or not. The answer lies in the loadings. If the loading profile denotes explainable chemical changes (as might be the case here), then the PC contains relevant information. Otherwise, this information will go to the residuals of the sample.
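The whole procedure (unfolding, centering, bilinear decomposition, and refolding of the scores) can be condensed into a few lines of generic Python/NumPy code. This is only a sketch on a hypothetical cube array; PCA is computed here through the singular value decomposition, which yields the same model as the sequential deflation of Eqs. (2)-(5).

```python
import numpy as np

# Hypothetical hyperspectral cube: X x Y pixels, L spectral channels.
X, Y, L = 64, 64, 200
cube = np.random.rand(X, Y, L)          # stand-in for a real measurement

# 1. Unfold the cube into the matrix D(X*Y, L).
D = cube.reshape(X * Y, L)

# 2. Mean-center each spectral channel (for multispectral data, autoscaling,
#    i.e., dividing each centered column by its standard deviation, is often
#    used instead; see Section 2.2).
Dc = D - D.mean(axis=0)

# 3. PCA via SVD: Dc = U diag(s) Vt, so T = U diag(s) and P = Vt.T.
U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
F = 4                                   # number of PCs kept (analyst's choice)
T = U[:, :F] * s[:F]                    # scores
P = Vt[:F].T                            # loadings
expl_var = 100 * s[:F]**2 / np.sum(s**2)

# 4. Refold each score vector into an X-by-Y "score surface".
score_surfaces = T.reshape(X, Y, F)
```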
Another effect that can be observed in the composite image (using PC1, PC2, and PC3 as if they were the R, G, and B channels of a digital image) of the PCA model in Fig. 3 is the smoothing of the surface. The false color image looks noisier, while the composite image looks smoother. This is because random and instrumental noise, together with other minor artifacts on the surface, have been removed and left in the residuals, since they represent negligible sources of variance. This is one of the reasons why PCA is also called a variable reduction method and is used for hyperspectral data compression [8,9].

FIGURE 3 Principal component analysis (PCA) model of the emulsion sample. Top left, the false color image. Top right, the first four PCs with the corresponding explained variance. Bottom left, a composite image using the PC1, PC2, and PC3 surfaces as if they were the RGB channels. Bottom right, the loadings corresponding to the first four PCs. PC, principal component.

2.2 PCA in multispectral imaging

The application of PCA in MSI follows the same principles as in HSI. The main difference lies in the preprocessing and normalization steps, which must account for the fact that the images at different wavelengths are measured in a noncontinuous manner and can therefore be nonequidistant [9,10]. For this reason, variables are normalized using autoscaling instead of mean centering, which is the standard procedure in HSI [8,9]. As an example, Fig. 4 presents a PCA model of the MSI of a 10 euro banknote. This MSI was composed of 18 different wavelengths covering a range between 350 and 950 nm. The PCA model shows that different parts of the banknote are painted with different types of paints (some of them totally invisible in the NIR region).

FIGURE 4 Principal component analysis (PCA) model of a multispectral image of a 10 euro banknote. Top left, the true color (RGB) image. Bottom left, a composite image using the PC1, PC2, and PC3 surfaces as if they were the RGB channels. Middle, the first four PCs with the corresponding explained variance. Right, the loadings corresponding to the first four PCs. PC, principal component.

2.3 Common misunderstandings using PCA in HSI and MSI

The PCA model has some features that have been seen by several authors as either errors or drawbacks and have, therefore, discouraged researchers from using it. For this reason, we consider it fundamental to address here some of the common misunderstandings that might arise in the literature.

- Sign ambiguity: One of the major features of PCA is the sign ambiguity in the calculations, shown here as:

$\mathbf{t}_1\mathbf{p}_1^T = (-\mathbf{t}_1)(-\mathbf{p}_1^T)$   (6)

Obviously, the mathematical results of the left and right sides of the equation are the same. Nevertheless, apparently different results may be obtained when running PCA several times on the same sample (Fig. 5). This issue has been described as a big problem in certain publications. Nevertheless, far from being an issue, this is a mathematical artifact that has no impact whatsoever on the final result.

- Scores with negative concentration and loadings with negative spectral intensity: First of all, we must clarify that scores are not absolute values of concentrations and, analogously, loadings are not absolute values of intensity.
Obtaining negative score values for one component in a score surface does not mean that the component is less important than another with positive score values. The importance is relative to the origin, in such a way that scores of the same absolute value but with different sign have the same importance. The same applies to the loading profiles. This is logical, since the scores and loadings are calculated on data which are almost always at least mean-centered, and in any case in order to maximize a measure of variance calculated on an orthogonal basis. Moreover, the data are usually normalized prior to the analysis.

FIGURE 5 Sign ambiguity shown in the example of Fig. 2. Left shows the result obtained in the analysis in Fig. 2. Right shows the same result, but multiplied by −1. PC, principal component.

- One PC is not one chemical compound: One of the major mistakes when applying PCA in HSI and MSI is to assume that each PC belongs exclusively to one chemical compound. This is not true, since PCA is a variance-based method: the PCs are calculated to maximize the variance. Moreover, each PC is constrained to be orthogonal, and it is wrongly assumed that each chemical compound is an independent source of variance. A simple example can be found in the PCA model of the banknote presented in Fig. 4. A density scatter plot of PC1 versus PC2 shows four different groups of pixels, grouped this way due to their spectral similarity and, at the same time, dissimilarity (Fig. 6). Observing this distribution of pixels and retrieving the spatial information of their location, it can be seen that each one of those four groups represents a different area on the surface of the banknote, in such a way that, using the PC1 versus PC2 scatter plot and the corresponding loadings, we can study the chemical compounds in the banknote and the different combinations between them. For these reasons, it is absolutely mandatory to display the score surface together with the corresponding loading profile, to be able to elucidate, to some extent, the chemistry hidden in the samples.

FIGURE 6 PC1 versus PC2 density scatter plot of the multispectral image of the 10 euro banknote, with the selection of four different pixel regions in the scatter plot and their position in the sample. PC, principal component.
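A score-space selection such as the one in Fig. 6 can be reproduced, for instance, with the following sketch, which assumes the score matrix T and the image dimensions X and Y from the PCA sketch above; the thresholds defining the selected region are arbitrary placeholders, normally chosen by eye on the density plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# T (X*Y, F) are the scores from the earlier PCA sketch.
pc1, pc2 = T[:, 0], T[:, 1]

# Density scatter of PC1 versus PC2 (hexbin approximates the pixel density).
plt.hexbin(pc1, pc2, gridsize=60, bins='log')
plt.xlabel('PC1 scores'); plt.ylabel('PC2 scores')

# Select one group of pixels in score space (illustrative thresholds)
# and locate those pixels on the sample surface.
mask = (pc1 > 0.5) & (pc2 < 0.0)
plt.figure(); plt.imshow(mask.reshape(X, Y), cmap='gray')
plt.title('Pixels of the selected score-space region')
plt.show()
```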
2.4 Splitting the sample

One aspect that is particularly interesting when applying PCA to images is that the measured area might contain multiple sources of variance with different influence on the data. It can happen that one area of the sample is covered by a compound (or a combination of compounds) whose spectral influence is much higher pixel-wise than that of other minor compounds. In such cases, a common practice is to study the different areas separately, in order to fully understand the different sources of variability of the sample under study. As an example, Fig. 7 shows different PCA models performed on different areas of one single sample. The sample is a pharmaceutical tablet that contains an external cover, different layers of coatings, and a core composed of several chemical compounds. The HSI was taken in a wavelength range of 1000–2600 nm. Further information can be found elsewhere [11]. The PCA model performed on the whole surface can only barely differentiate between the core of the tablet and the exterior layers (Fig. 7A). Sequentially removing the different parts of the tablet, we can observe much clearer details of the different compounds in the different sections (Fig. 7B, C, and D).

3. Clustering

Another important aspect of the (unsupervised) exploratory analysis of multivariate data is clustering [12–14]. Clustering methods partition the available data into groups of objects (usually objects, although variables can also be clustered) sharing similar features. The idea behind clustering is that objects allocated to the same group are as similar as possible to one another and as different as possible from the individuals in the other groups. These groups are also called clusters. In this section, the most commonly adopted strategies for clustering of multivariate data will be presented, with special focus on their application in the context of image analysis. Clustering algorithms are usually divided into hierarchical and partitioning (nonhierarchical) strategies: the former, as the name suggests, operate by defining a hierarchy of clusters, ordered according to increasing similarity/dissimilarity, whereas the latter learn the group structure directly.

3.1 Hierarchical clustering

Hierarchical clustering methods [15] are based on building a hierarchy of clusters, which is often represented graphically by means of a tree structure called a dendrogram. This hierarchy results from the fact that clusters of increasing similarity are merged to form larger ones (agglomerative algorithms) or that larger clusters are split into smaller ones of decreasing dissimilarity (divisive procedures). Such an approach has the advantage of allowing exploration of the data at different levels of similarity/dissimilarity and providing a deeper insight into the relationships among samples (or variables, if those are the items to be clustered). Another big advantage is that, since they are based on defining a similarity index among the clusters, any kind of data (real-valued, discrete, binary) can be analyzed by these techniques.

FIGURE 7 Principal component analysis (PCA) models performed on a hyperspectral image of a tablet (further information in [11]). For every row the first four PCs and the corresponding loadings are shown. (A) PCA model of the whole surface. (B) PCA model of the coatings and the core of the tablet. (C) PCA model of only the core of the tablet. (D) PCA model of only the coatings of the tablet. PC, principal component.

In detail, in agglomerative clustering, one starts with single-object clusters (singletons) and proceeds by progressively merging the most similar clusters, until a stopping criterion (which could be a predefined number of groups k) is reached. In some cases, the procedure ends only when all the clusters are merged into a single one, which is when one aims at investigating the overall granularity of the data structure. On the other hand, divisive strategies start with a single cluster which is iteratively split into groups which are as dissimilar as possible from one another, until either a stopping criterion is met or every cluster is a singleton. In general, hierarchical procedures have the further advantage of not requiring the definition of the final number of clusters a priori, as instead occurs with partitioning strategies.
In this section, agglomerative procedures, which are the ones most frequently adopted, will be discussed in greater detail, but the same concepts can easily be generalized to the divisive ones. As briefly anticipated above, agglomerative hierarchical techniques start with a configuration in which each sample is a (singleton) cluster and proceed by recursively merging the most similar clusters until a termination criterion is met. Accordingly, two elements are of utmost importance when defining a hierarchical strategy: a measure of similarity/dissimilarity, which is usually related to a distance, and a way to generalize this measure so it applies to pairs of clusters (subsets of individuals) rather than to pairs of samples; the latter is usually called a linkage metric. The definition of the linkage metric is fundamental, as it determines the connectivity between the clusters and, in general, the shape of the tree. Using as a prototype of dissimilarity measure the Euclidean distance between two samples, $d_{ij}$:

$d_{ij} = \|\mathbf{x}_i - \mathbf{x}_j\| = \sqrt{\sum_{l=1}^{N_v}(x_{il} - x_{jl})^2}$   (7)

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the vectors of measurements on samples i and j, respectively, $x_{il}$ and $x_{jl}$ are their lth components, and $N_v$ is the total number of measured variables, the main linkage distances usually adopted in hierarchical clustering will be briefly discussed and compared.

In single linkage (nearest neighbor), the distance between two clusters is defined as the smallest distance between objects belonging to these clusters, i.e., the distance between the closest objects of the two individual groups:

$d(C_m, C_n) = \min_{i,j} d_{ij} \quad i \in C_m,\ j \in C_n$   (8)

where $d(C_m, C_n)$ is the distance between clusters m and n. This metric favors cluster separation and does not take into account the internal cohesion of the groups. On the other hand, if a small cluster is initially formed, it can lead to the progressive merging of one object at a time to this cluster, in what is called a chain effect.

Complete linkage (farthest neighbor), which defines the intercluster distance as the maximum distance between the samples in the two groups, i.e., as the distance of the farthest objects, behaves oppositely to single linkage. The distance between two clusters m and n is defined as:

$d(C_m, C_n) = \max_{i,j} d_{ij} \quad i \in C_m,\ j \in C_n$   (9)

Complete linkage produces clusters which are usually similar in size across the whole agglomeration process and, as a consequence, the resulting dendrogram appears more balanced.

A linkage metric intermediate between nearest and farthest neighbors is average linkage, also called UPGMA (unweighted pair group method using arithmetic averages). In this approach, the distance between two clusters m and n is calculated as the arithmetic average of the distances between all possible pairs of objects belonging to the different groups:

$d(C_m, C_n) = \frac{\sum_{i=1}^{N_m}\sum_{j=1}^{N_n} d_{ij}}{N_m N_n} \quad i \in C_m,\ j \in C_n$   (10)

where $N_m$ and $N_n$ are the number of objects in clusters m and n, respectively.

Differently from the cases described above, other linkage metrics, sometimes also called geometric approaches, assume that the clusters may be represented by their central points, e.g., the centroid or the median.
In the centroid method, also called UPGMC (unweighted pair group method using centroids), the distance between two clusters m and n is defined as the distance between their respective means (centroids) $\bar{\mathbf{x}}_m$ and $\bar{\mathbf{x}}_n$:

$d(C_m, C_n) = \|\bar{\mathbf{x}}_m - \bar{\mathbf{x}}_n\|$   (11)

where

$\bar{\mathbf{x}}_m = \frac{1}{N_m}\sum_{i \in C_m}\mathbf{x}_i$   (12)

and

$\bar{\mathbf{x}}_n = \frac{1}{N_n}\sum_{j \in C_n}\mathbf{x}_j$   (13)

Although rather straightforward and geometrically sound, in some cases this metric can lead to the paradox that the centroid of a new cluster resulting from the fusion of, say, clusters m and n, $C_{m\cup n}$, is closer to a third cluster than either m or n were, leading to an incongruence in the tree hierarchical structure. This drawback can be partially overcome by resorting to the median method, or WPGMC (weighted pair group method using centroids), in which the (pseudo)centroid of the cluster formed by merging clusters m and n is defined as the average of the centroids of the two groups, so that, in defining the next step of agglomeration, the dominance of the largest group is downweighted:

$\bar{\mathbf{x}}_{m\cup n} = \frac{\bar{\mathbf{x}}_m + \bar{\mathbf{x}}_n}{2}$   (14)

A last group of strategies focuses on optimizing the homogeneity within clusters and defines the agglomeration strategy as the one minimizing the decrease of homogeneity upon merging; this is normally measured in terms of the minimum increase of the within-cluster variance, as in the most commonly used of these approaches, i.e., Ward's method [16].

For algorithmic purposes, it is worth mentioning that all the hierarchical methods described in this section can easily be implemented through the so-called Lance–Williams dissimilarity update formula [17]. Indeed, agglomerative clustering is based on merging pairs of clusters defined at previous stages of the recursive procedure. When doing so, to proceed to the next steps it is necessary to calculate the distance between the newly formed group and the remaining clusters. The Lance–Williams formula allows this distance to be calculated straightforwardly, according to the following relation:

$d(C_{m\cup n}, C_s) = \alpha_m d(C_m, C_s) + \alpha_n d(C_n, C_s) + \beta d(C_m, C_n) + \gamma|d(C_m, C_s) - d(C_n, C_s)|$   (15)

where the coefficients $\alpha_m$, $\alpha_n$, $\beta$, and $\gamma$ define the agglomerative criterion. The values of these coefficients for the linkage metrics discussed above, together with the information on whether the Euclidean distance or its squared value is taken as the dissimilarity measure, are reported in Table 1.

Whatever the linkage metric chosen, as already anticipated, the main outcome of hierarchical methods is a tree structure representing the whole process of progressively merging the less dissimilar clusters. Such a tree is called a dendrogram and has the structure displayed in Fig. 8, for the case of 37 individuals. The bottom part of the plot represents the starting point of the analysis, where there are as many clusters as individuals (the abscissa does not have a numerical scale and just accommodates object indices, arranged so as to provide the best readability of the figure). As the agglomeration proceeds, pairs of clusters are merged to form new ones (the horizontal lines in the plot indicate these fusions), until, at the end of the process, there is only one big group containing all the objects. The ordinate of the plot is the distance at which the various mergings occur: at each stage of the procedure, this distance also represents the minimum distance between pairs of clusters.

TABLE 1 Parameters of the Lance–Williams update formula for the different agglomeration methods, together with the definition of the initial dissimilarity measure.

- Single linkage: α_m = 1/2; α_n = 1/2; β = 0; γ = −1/2; dissimilarity d_ij
- Complete linkage: α_m = 1/2; α_n = 1/2; β = 0; γ = 1/2; dissimilarity d_ij
- Average linkage (UPGMA): α_m = N_m/(N_m + N_n); α_n = N_n/(N_m + N_n); β = 0; γ = 0; dissimilarity d_ij
- Centroid (UPGMC): α_m = N_m/(N_m + N_n); α_n = N_n/(N_m + N_n); β = −N_mN_n/(N_m + N_n)²; γ = 0; dissimilarity d²_ij
- Median (WPGMC): α_m = 1/2; α_n = 1/2; β = −1/4; γ = 0; dissimilarity d²_ij
- Ward: α_m = (N_m + N_s)/(N_m + N_n + N_s); α_n = (N_n + N_s)/(N_m + N_n + N_s); β = −N_s/(N_m + N_n + N_s); γ = 0; dissimilarity d²_ij

FIGURE 8 Schematic representation of a dendrogram for a simulated data set involving 37 objects from three clusters with different within-group variance. Complete linkage was used as the metric, and the resulting hierarchical tree shows the progressive agglomeration of the groups from individual sample clusters up to the last step, when all objects are grouped into a single cluster.
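In practice, the whole agglomerative procedure can be run with standard scientific software. The following sketch, loosely mimicking the simulated example of Fig. 8, uses the hierarchical clustering routines of SciPy; the simulated data and the choice of complete linkage are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
# 37 simulated objects from three groups with different within-group spread.
data = np.vstack([rng.normal(0, 0.3, (12, 2)),
                  rng.normal(3, 0.8, (13, 2)),
                  rng.normal((0, 4), 0.5, (12, 2))])

# Agglomerative clustering; 'method' selects the linkage metric
# ('single', 'complete', 'average', 'centroid', 'median', 'ward').
Z = linkage(data, method='complete')

dendrogram(Z)                      # the hierarchical tree, as in Fig. 8
plt.ylabel('merging distance')
plt.show()

labels = fcluster(Z, t=3, criterion='maxclust')   # cut the tree at 3 clusters
```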
3.2 Partitional clustering methods

Partitional clustering algorithms [18] are the second group of clustering techniques presented in this chapter and are, by far, the ones most often used in the framework of image analysis, due to characteristics which make them more suitable for this kind of data. Indeed, given a data set constituted of N individuals, partitioning techniques split the data into a predefined number K of groups, so that each object is assigned (mapped) to a single cluster only and each cluster contains at least one sample ("crisp" or "hard" clustering conditions). These conditions can be mathematically defined by introducing the membership function U, i.e., a matrix representation of the partition: the generic element of U, indicated as $u_{ik}$, can only take values in [0,1], as it describes the membership degree of the ith object (characterized by the vector of measurements $\mathbf{x}_i$) to the kth cluster $C_k$. Accordingly, the matrix elements should satisfy the following rules:

$\sum_{i=1}^{N} u_{ik} > 0 \quad \forall k$   (16)

$\sum_{k=1}^{K} u_{ik} = 1 \quad \forall i$   (17)

(N and K being the total number of objects and clusters, respectively), so that a value of 0 means that the individual is not a member of the group, whereas a value of 1 means that it completely belongs to the cluster. Under these premises, the conditions for partitional clustering can be summarized as follows:

$u_{ik} = \begin{cases} 1 & \text{if } \mathbf{x}_i \in C_k \\ 0 & \text{otherwise} \end{cases}$   (18)

These algorithms operate by iteratively relocating objects among groups until a stopping criterion, usually expressed in terms of a minimum of a loss function, is met. Unfortunately, the obtained partition may turn out to be only locally optimal, as global optimality could only be guaranteed by an exhaustive search, which is computationally impractical. Iterative relocation methods, whose most famous and used member is K-means, identify prototypes as the most representative points of the clusters and define the loss function $J_m$ as the overall discrepancy in the partition, i.e., as the sum of an appropriate dissimilarity/distance between each point and the prototype of the cluster it has been assigned to. In K-means [19–22], the prototypes are defined as the cluster centroids $\mathbf{x}_{C_k}$ and the squared Euclidean distance is used as the dissimilarity measure in the loss function:

$J_m(U, \mathbf{x}_{C_k}) = \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ik}\|\mathbf{x}_i - \mathbf{x}_{C_k}\|^2$   (19)
Accordingly, the main steps of the algorithm can be summarized as follows [21]:

1. Define an initial partition of the data, by providing a first estimate of the centroids of the K clusters (e.g., by random selection or by less naïve initialization procedures)
2. (Re)calculate the memberships of all the data points based on the current centroids
3. Update the coordinates of the centroids for some or all the clusters based on the new memberships of the individuals
4. Repeat steps 2 and 3 until convergence (no changes in $J_m$ or U)

When looking at the algorithm, it is relatively straightforward to observe that, if the number of objects to be clustered is large, K-means is significantly faster, from a computational standpoint, than hierarchical approaches, and this is one of the main reasons why it is often preferred when dealing with images. Moreover, it produces convex-shaped clusters which are, in general, tighter than those resulting from hierarchical techniques. On the other hand, the main drawback is that the choice of the optimal value of K is not trivial, and it is normally made by trying different values and comparing the outcomes. However, comparing the quality of the obtained clusters may also not be very straightforward.
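In the imaging context, the algorithm is simply applied to the unfolded image matrix and the resulting labels are refolded into a segmentation map. A minimal sketch, assuming the matrix D and the dimensions X and Y from the earlier PCA example, could read:

```python
from sklearn.cluster import KMeans

# D is the unfolded image (X*Y pixels x L channels) from the PCA sketch above.
K = 4                                    # number of clusters, chosen by trial
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(D)

label_map = km.labels_.reshape(X, Y)     # cluster assignment per pixel
centroids = km.cluster_centers_          # K centroid "spectra" (L channels each)
```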
3.2.1 Fuzzy clustering

Partitional clustering algorithms may also be "fuzzified" to allow the possibility of individuals belonging to two or more groups with different degrees of membership [23–25]. This is accomplished by removing the constraint represented by Eq. (18) and allowing the elements of the matrix U describing the partition to take any possible value in the interval [0,1]. Accordingly, the matrix U is said to define K fuzzy sets, i.e., a partition in groups with a continuous degree of membership. Fuzzy clustering strategies, then, aim at finding not only the optimal coordinates of the cluster centroids, but also the best values of the membership functions. This is accomplished by minimizing a loss function which resembles the one of Eq. (19), the only difference being the introduction of the fuzzy exponent m (which is always >1):

$J_m(U, \mathbf{x}_{C_k}) = \sum_{i=1}^{N}\sum_{k=1}^{K} u_{ik}^m\|\mathbf{x}_i - \mathbf{x}_{C_k}\|_A^2$   (20)

In Eq. (20), the notation $\|\cdot\|_A^2$ generically indicates the norm induced by the metric A on the multivariate space, which almost always is the Euclidean norm. The role of the meta-parameter m is to modulate the extent of "fuzziness" in the partition: smaller membership values ("softer" clustering) are favored by increasing values of m. In any case, it is reported in the literature that a value of 2 should work for most applications.

Also in the case of fuzzy K-means, an iterative relocation algorithm is used to calculate the optimal partition, whose relevant steps are summarized below [25]:

1. Decide the number of clusters K, fix the value of the fuzzy coefficient m, and define the metric of the multivariate space, in order to be able to calculate the norm $\|\cdot\|_A^2$
2. Obtain a first estimate of the partition matrix U, e.g., by random selection or specific initialization approaches
3. Calculate the cluster centroids as $\mathbf{x}_{C_k} = \frac{\sum_{i=1}^{N} u_{ik}^m \mathbf{x}_i}{\sum_{i=1}^{N} u_{ik}^m}$
4. Update the membership functions according to $u_{ik} = \left(\sum_{s=1}^{K}\left(\frac{\|\mathbf{x}_i - \mathbf{x}_{C_k}\|_A}{\|\mathbf{x}_i - \mathbf{x}_{C_s}\|_A}\right)^{\frac{2}{m-1}}\right)^{-1}$
5. Repeat steps 3 and 4 until convergence, e.g., until $\|U^{iter} - U^{iter-1}\|_A^2 < \varepsilon$

3.2.2 Spatio-spectral fuzzy clustering

When applying clustering algorithms to a data set, partitioning is normally based only on the values of the measured variables; in the context of spectral imaging, this means that only the spectral signature of the pixels, here defined as $\mathbf{x}_i$, constitutes the basis for clustering. However, by doing so, the spatial relationship among the pixels is not taken into account when defining the grouping, and this could constitute a limitation in practical applications, since it is reasonable to think that objects which are close in space may possess similar clustering tendencies. To overcome this drawback, it is possible to modify the fuzzy K-means algorithm so that the spatial information also contributes to the definition of the partition [26]. In detail, the spatio-spectral fuzzy K-means algorithm follows the steps of the conventional fuzzy K-means procedure up to the update of the membership function $u_{ik}$ (step 4). After that, a spatial function is introduced to exploit the spatial information:

$h_{ik} = \sum_{j \in NB(\mathbf{x}_i)} u_{jk}$   (21)

where $NB(\mathbf{x}_i)$ defines a square window of pixels centered in $\mathbf{x}_i$ (usually having dimensions 3 × 3 or 5 × 5). The function $h_{ik}$ can be considered as a membership function estimated only on the basis of the spatial information and, analogously to $u_{ik}$, accounts for the probability of the ith pixel belonging to the kth cluster. Accordingly, the spatial function is then used to update the membership function through the relation:

$u_{ik}^{new} = \frac{u_{ik}^p h_{ik}^q}{\sum_{k=1}^{K} u_{ik}^p h_{ik}^q}$   (22)

where the exponents p and q govern the relative contribution of the spectral and spatial information, respectively. It should be stressed that, in cases where the image is relatively homogeneous, the introduction of the spatial information just strengthens the spectral-based partitioning, so that the resulting clusters are the same that would be obtained by standard fuzzy K-means. On the other hand, when there are noisy pixels, the introduction of the spatial contribution may allow the effect of noise, blur, and/or spikes to be downweighted, reducing the impact of misassignments and improving the segmentation.
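A compact NumPy implementation of the fuzzy K-means iteration (steps 1-5 above) and of the spatial correction of Eqs. (21) and (22) could look as follows. This is a didactic sketch rather than an optimized implementation: all names are illustrative, the Euclidean norm is assumed for the metric A, and D, X, and Y are the unfolded matrix and image dimensions of the earlier examples.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuzzy_kmeans(D, K, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy K-means on the unfolded image D (pixels x channels)."""
    rng = np.random.default_rng(seed)
    U = rng.random((D.shape[0], K))
    U /= U.sum(axis=1, keepdims=True)            # rows sum to one (Eq. 17)
    e = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        W = U ** m
        C = (W.T @ D) / W.T.sum(axis=1, keepdims=True)    # centroids (step 3)
        d = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2) + 1e-12
        # membership update (step 4): u_ik = d_ik^-e / sum_s d_is^-e
        U_new = 1.0 / (d**e * ((1.0 / d) ** e).sum(axis=1, keepdims=True))
        if np.linalg.norm(U_new - U) < tol:               # convergence (step 5)
            U = U_new
            break
        U = U_new
    return U, C

def spatial_update(U, X, Y, p=1, q=1, size=3):
    """Spatio-spectral correction of the memberships (Eqs. 21-22)."""
    # h_ik: sum of memberships over a size-by-size window around each pixel,
    # obtained by refolding each membership column into an X-by-Y map.
    H = np.stack([uniform_filter(U[:, k].reshape(X, Y), size=size,
                                 mode='nearest') * size**2
                  for k in range(U.shape[1])], axis=-1).reshape(X * Y, -1)
    U_new = (U ** p) * (H ** q)
    return U_new / U_new.sum(axis=1, keepdims=True)       # Eq. (22)
```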
3.3 Common misunderstandings using clustering

As with PCA, the main advantage (and, to some extent, risk) of clustering is the ease of applying any type of clustering and drawing quantitative conclusions from methodologies that are merely explorative. Therefore, it is also important to highlight here some common misunderstandings that might occur when applying clustering methods.

- Clustering (and also PCA) are not classification methods: There is a tendency to refer to clustering (and, by extension, PCA) as classification methods. In strict terms, the word "classification" involves supervision. That is, any method aimed at classifying needs a previous step of "training" and a further step of "validation." In clustering and PCA, those stages do not exist per se. It is true that PCA can be used as a class-modeling methodology, but in order to do so, it should be followed by the definition of a suitable classification criterion and proper validation; in any case, even when these steps are undertaken, the resulting approach should be named SIMCA (soft independent modeling of class analogy) [27].

- It is complicated to set the proper number of clusters: Since clustering is not a classification method (it does not rely on a priori knowledge of the existence of a specified number of categories) and clustering techniques are unsupervised approaches based mostly on distances, there might be groups of pixels with similar distances between them that could be considered either as independent clusters or as a single cluster. For example, Fig. 9 shows different K-means analyses of the banknote using different numbers of clusters in the calculations. Based on the information obtained in the PCA model of the banknote (Fig. 4), one can argue that four clusters could be enough. Moreover, the centroids make perfect chemical sense, and they are similar to the loadings previously obtained. Nevertheless, a K-means model of the same sample using five clusters shows that there might exist a fifth cluster (pink in the figure) associated with different features of the banknote (e.g., the star in the middle). There have been several proposals to find the correct number of clusters to use (e.g., the silhouette index [28] or, simply, PCA; a minimal sketch of such a scan is given after this list). Nevertheless, these parameters are merely informative, and care must be taken when they are applied.

FIGURE 9 K-means clustering of the banknote using four, five, and six clusters. Top, cluster assignation by colors. Bottom, the corresponding centroids.

- Clustering methods with pixels containing mixtures: A clear distinction must be made when applying clustering, depending on the type of information that one pixel can contain. If the pixel contains only the spectral signature of one chemical compound (a plastic, for instance), then using methods like K-means could be advisable (one pixel, one cluster identification). Nevertheless, if the pixel contains a mixture of chemical compounds, methods like K-means might lead to a misgrouping of pixels, since one pixel might belong to several groups at the same time. In that case, methods like fuzzy clustering might be advisable (one pixel, different probabilities of belonging to the different clusters). For instance, let us take the sample displayed in Fig. 10. This sample is a simple binary mixture of ibuprofen and starch. The sample was measured with an NIR hyperspectral camera working in the wavelength range of 1000–2000 nm. Further technical information can be found elsewhere [28]. Applying K-means with two clusters, it is obvious that the answer is suboptimal: there is ibuprofen and starch, but there is also a mixture area separating both compounds. Therefore, it can be considered that there might be three clusters (ibuprofen, starch, and the mixing area). The main problem is that the mixing area itself contains pixels with different degrees of mixture, in such a way that a K-means model with 10 clusters might also seem valid. Nevertheless, looking at the centroids obtained for the analysis with 10 clusters, it can be observed that they are all linear combinations of the pure spectra of ibuprofen and starch. In this case, since the pixels might contain a mixture of different chemical compounds, a more appropriate fuzzy clustering methodology with two clusters can be performed, obtaining not a straight one pixel-one cluster assignment, but a probability of each pixel belonging to each cluster (Fig. 10).
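As an example of such an informative (but not decisive) diagnostic, a silhouette scan over candidate values of K might be sketched as below, again assuming the unfolded image matrix D of the previous examples; the subsample size is an arbitrary choice to keep the pairwise distance computation affordable.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Scan a range of K and compare silhouette indices; treat the outcome as
# informative only, to be confronted with the chemical sense of the centroids.
for K in range(2, 8):
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(D)
    score = silhouette_score(D, labels, sample_size=2000, random_state=0)
    print(f'K = {K}: silhouette = {score:.3f}')
```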
4. Final observations

In the present chapter, some of the most used unsupervised exploration methods, namely projection techniques and clustering, have been revised and discussed with particular focus on their application in the context of HSI and MSI. It should be stressed that these approaches were chosen because they are the most popular, and therefore the most frequently used, but they are not the only ones. Information was also provided on how to use the discussed approaches and, in particular, on what their benefits and potential drawbacks are, trying to demystify some common misunderstandings about these techniques.

FIGURE 10 Cluster analysis of a mixture composed of ibuprofen and starch. Top, K-means models with 2, 3, and 10 clusters with the corresponding centroids. Bottom, fuzzy clustering model with two clusters and the corresponding centroids. Sample taken from J.M. Amigo, J. Cruz, M. Bautista, S. Maspoch, J. Coello, M. Blanco, Study of pharmaceutical samples by NIR chemical-image and multivariate analysis, TrAC Trends in Analytical Chemistry 27 (2008), https://doi.org/10.1016/j.trac.2008.05.010.

Indeed, especially when the final focus of the data analysis is some sort of prediction, be it of a qualitative (classification) or quantitative (calibration) nature, the role of exploratory data analysis is often underrated, and this is something that should be avoided. Exploratory analysis provides a wealth of information per se and gives a first insight into the data which can be very useful even when exploration itself is not the final goal. Unsupervised methods are a source of knowledge that can give extremely valuable information as a first approach to the analysis of any HSI and MSI samples, since they are hypothesis-free and allow the data to "talk for themselves." At the same time, one should always be conscious of the unsupervised nature of the models, which means, for instance, the lack of any predictive validation, and treat the information extracted from the data as it is, without being tempted into overinterpretation of the results obtained. Only by doing so can exploratory data analysis represent a powerful tool to inspect HSI and MSI data.

References

[1] J.J. Andrew, M.A. Browne, I.E. Clark, T.M. Hancewicz, A.J. Millichope, Raman imaging of emulsion systems, Applied Spectroscopy 52 (1998) 790–796, https://doi.org/10.1366/0003702981944472.
[2] J.J. Andrew, T.M. Hancewicz, Rapid analysis of Raman image data using two-way multivariate curve resolution, Applied Spectroscopy 52 (1998) 797–807, https://doi.org/10.1366/0003702981944526.
[3] A. de Juan, M. Maeder, T. Hancewicz, R. Tauler, Use of local rank-based spatial information for resolution of spectroscopic images, Journal of Chemometrics 22 (2008) 291–298, https://doi.org/10.1002/cem.1099.
[4] A. de Juan, M. Maeder, T. Hancewicz, R. Tauler, Local rank analysis for exploratory spectroscopic image analysis. Fixed size image window-evolving factor analysis, Chemometrics and Intelligent Laboratory Systems 77 (2005) 64–74.
[5] J.W. Tukey, Exploratory Data Analysis, Addison Wesley, Reading, MA, 1977.
[6] N. Mobaraki, J.M. Amigo, HYPER-Tools. A graphical user-friendly interface for hyperspectral image analysis, Chemometrics and Intelligent Laboratory Systems 172 (2018), https://doi.org/10.1016/j.chemolab.2017.11.003.
[7] A.K. Smilde, R. Bro, Principal component analysis (tutorial review), Analytical Methods 6 (2014) 2812–2831, https://doi.org/10.1039/c3ay41907j.
[8] J.M. Amigo, H. Babamoradi, S. Elcoroaristizabal, Hyperspectral image analysis. A tutorial, Analytica Chimica Acta 896 (2015) 34–51, https://doi.org/10.1016/j.aca.2015.09.030.
[9] M. Vidal, J.M. Amigo, Pre-processing of hyperspectral images. Essential steps before image analysis, Chemometrics and Intelligent Laboratory Systems 117 (2012) 138–148, https://doi.org/10.1016/j.chemolab.2012.05.009.
[10] J.M. Amigo, Practical issues of hyperspectral imaging analysis of solid dosage forms, Analytical and Bioanalytical Chemistry 398 (2010) 93–109, https://doi.org/10.1007/s00216-010-3828-z.
[11] C. Cairós, J.M. Amigo, R. Watt, J. Coello, S. Maspoch, Implementation of enhanced correlation maps in near infrared chemical images: application in pharmaceutical research, Talanta 79 (2009) 657–664, https://doi.org/10.1016/j.talanta.2009.04.042.
[12] D.L. Massart, L. Kaufmann, The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, Wiley, New York, NY, 1983.
[13] L. Kaufmann, P.J. Rousseeuw, Finding Groups in Data, Wiley, New York, NY, 1990.
[14] B.S. Everitt, S. Landau, M. Leese, D. Stahl, Cluster Analysis, fifth ed., Wiley, New York, NY, 2011.
[15] F. Murtagh, P. Contreras, Algorithms for hierarchical clustering: an overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 7 (2017) e1219, https://doi.org/10.1002/widm.1219.
[16] J.H. Ward Jr., Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association 58 (1963) 236–244.
[17] G.N. Lance, W.T. Williams, A general theory of classificatory sorting strategies. 1. Hierarchical systems, Computer Journal 9 (1967) 373–380.
[18] X. Jin, J. Han, Partitional clustering, in: C. Sammut, G.I. Webb (Eds.), Encyclopedia of Machine Learning, Springer, Boston, MA, 2011, https://doi.org/10.1007/978-0-387-30164-8_631.
[19] H. Steinhaus, Sur la division des corps matériels en parties, Bulletin de l'Académie Polonaise des Sciences 4 (1957) 801–804.
[20] S.P. Lloyd, Least squares quantization in PCM, Bell Telephone Laboratories Paper, 1957; published later as: S.P. Lloyd, Least squares quantization in PCM, IEEE Transactions on Information Theory 28 (2) (1982) 129–137.
[21] E.W. Forgy, Cluster analysis of multivariate data: efficiency versus interpretability of classifications, Biometrics 21 (1965) 768–769.
[22] J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, Berkeley, CA, 1967, pp. 281–297.
[23] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Functions, Plenum Press, New York, NY, 1981.
[24] J.C. Bezdek, R. Ehrlich, M. Trivedi, W.E. Full, Fuzzy clustering: a new tool for geostatistical analysis, Int. Syst. Means Decis. 2 (1981) 13.
[25] J.C. Bezdek, R. Ehrlich, W.E. Full, FCM: the fuzzy c-means clustering algorithm, Computers & Geosciences 10 (1984) 191–203.
[26] K.-S. Chuang, H.-L. Tzeng, S. Chen, J. Wu, T.-J. Chen, Fuzzy c-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics 30 (2006) 9–15, https://doi.org/10.1016/j.compmedimag.2005.10.001.
[27] S. Wold, M. Sjöström, SIMCA: a method for analyzing chemical data in terms of similarity and analogy, in: B.R. Kowalski (Ed.), Chemometrics: Theory and Application, ACS Symposium Series, vol. 52, American Chemical Society, Washington, DC, 1977, pp. 243–282, https://doi.org/10.1021/bk-1977-0052.ch012.
[28] J.M. Amigo, J. Cruz, M. Bautista, S. Maspoch, J. Coello, M. Blanco, Study of pharmaceutical samples by NIR chemical-image and multivariate analysis, TrAC Trends in Analytical Chemistry 27 (2008), https://doi.org/10.1016/j.trac.2008.05.010.

Chapter 2.5

Multivariate curve resolution for hyperspectral image analysis

Anna de Juan a,*

a Chemometrics group, Department of Chemical Engineering and Analytical Chemistry, Universitat de Barcelona (UB), Barcelona, Spain
*Corresponding author. e-mail: anna.dejuan@ub.edu

1. Hyperspectral images: Structures and related models

An image provides spatial information on a sample surface. We can go from simpler images, represented by a gray intensity value associated with each pixel, to others represented by the basic form of color, displayed in R(ed), G(reen), and B(lue) coordinates. These simple and pioneering images were focused on the plain representation of shape and color and were more linked to a visual description of objects. When chemistry enters into play, more sample characteristics become relevant. The beauty of images lies in the fact that the classical questions related to the composition of a sample, summarized by which components and how much of each of them, go together with spatial questions, such as where and how components are distributed in the sample. The measurement that joins together chemical and spatial information about the samples is the hyperspectral image. Thus, sample surfaces are divided into small areas (pixels) and a spectrum is recorded on each of them. In this way, images link the chemical information (spectrum) to the spatial structure of the samples (represented by the pixel areas) [1–4].

The typical way to represent a hyperspectral image (HSI) is as a cube with two spatial dimensions (represented by the pixel coordinates of a sample) and a third spectral dimension (see Fig. 1). To understand the behavior of this measurement, we should not forget that an HSI is formed by thousands of spectra that, as such, obey the bilinear Beer–Lambert law. Thus, any pixel spectrum can be described by the sum of the spectral signatures of the image constituents that form the sample, weighted by the concentration of each of these constituents in that particular pixel.
To formulate the Beer–Lambert law adapted to an HSI, the original image cube needs to be unfolded into a data table (or data matrix D) by stacking the image spectra one on top of each other. This image data table, D (sized nr. of pixels × nr. of spectral channels), is described by the bilinear model CS^T, where S^T (sized nr. of image constituents × nr. of spectral channels) is the matrix that contains the spectral signatures of the image constituents, i.e., the qualitative composition, and C (sized nr. of pixels × nr. of image constituents) is the matrix that contains the related concentration profiles for each of the spectral signatures, i.e., the abundance of a particular image constituent in each of the pixels of the image, the quantitative composition. In an image defined by a bilinear model, every image constituent is defined by a dyad of vectors, formed by the spectral signature of the constituent and the related concentration profile. The unfolded representation of an image misses the spatial description of the sample surface. To recover this information, it is sufficient to refold the concentration profile of every constituent into a 2D map recovering the original sample geometry. In this way, both quantitative information and spatial distribution are displayed in a single visual representation (see Fig. 1). Despite the fact that the image is displayed as a data cube, it is relevant to understand that the two spatial dimensions are only related to pixel position, and the image cube should never be mathematically described by products of three factor terms linked to the three dimensions of the cube.

FIGURE 1 Image cube and bilinear model.

Although most hyperspectral images have three dimensions, 4D or 5D images can also be encountered. Such a situation can happen because the spatial or the spectral information increases in complexity. Speaking about spatial complexity, it often happens that images are devoted to describing sample volumes. In this case, the term voxel is used instead of pixel, and the distribution of the components in the sample is studied in the three spatial dimensions, i.e., covering different depth slices of the sample. This provides a 4D image, with three spatial dimensions (linked to the x, y, and z pixel coordinates) and one spectral dimension. As in the case of the 3D image in Fig. 1, the three spatial dimensions indicate only position, and these images follow the Beer–Lambert law, represented by the bilinear model D = CS^T. To accommodate the bilinear model, the pixels of the different slices in these 4D images are unfolded into a single data table and described as in Fig. 1. The only different step is the refolding of the concentration profiles into maps, since every concentration profile will be block-wise refolded into the related maps of each of the original cube slices (see Fig. 2A) [3–5].
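The bilinear construction and the refolding step can be made concrete with a tiny simulated example; all sizes and arrays below are arbitrary placeholders.

```python
import numpy as np

# Toy bilinear image: F = 2 constituents on a 40x40 surface, L = 150 channels.
X, Y, L, F = 40, 40, 150, 2
rng = np.random.default_rng(1)
S = rng.random((L, F))            # pure spectral signatures (columns of S)
C = rng.random((X * Y, F))        # pixel-wise concentrations
D = C @ S.T                       # unfolded image: one mixture spectrum per row

# Refolding a column of C recovers the distribution map of that constituent.
maps = C.reshape(X, Y, F)         # one 2D map per constituent
```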
A different case happens when the increase in complexity comes from the spectral measurement. In Fig. 1, it is implicitly assumed that the spectral information is enclosed in a vector of numbers, and this happens most of the time when Raman, IR, MS, or UV spectra are used. However, in some instances, such as for excitation–emission fluorescence measurements (EEM), a 2D spectral measurement can be obtained per pixel. As a consequence, when a sample surface is imaged by EEM measurements, a 4D image is obtained, with two spatial dimensions and two spectral dimensions (excitation/emission). In this particular case, the mathematical model that describes the measurement is different. The original 4D hypercube needs to be unfolded into a data cube, where the 2D EEM landscapes of the different pixels are stacked one on top of the other. The image cube will be sized (nr. of pixels × nr. of excitation channels × nr. of emission channels) and will obey a trilinear model, i.e., every image constituent will be represented by a triad of profiles: the excitation spectrum, the emission spectrum, and the related concentration profile (see Fig. 2B) [6,7]. As in Fig. 1, the concentration profile will be refolded into the suitable 2D map to recover the spatial structure of the sample surface. In the framework of EEM fluorescence images, when different depths of the sample are imaged, a 5D image would be obtained, where three dimensions would be associated with the three pixel coordinates and two with the EEM fluorescence landscape. Again, since pixel coordinates only indicate position, an image cube would be obtained and a trilinear model would still hold to describe these measurements. As in Fig. 2A, the concentration profile would need to be block-wise refolded into the maps of the different 3D sample slices.

This introductory section was devoted to explaining the natural measurement models of hyperspectral images and, intentionally, we did not mention any data analysis tool. The ideal algorithms to treat HSI should adapt to these models and, at the same time, preserve the natural characteristics of the spectral signatures and concentration profiles of the image constituents. Multivariate curve resolution (MCR) is one of these methods and will be explained in detail in the next sections.

FIGURE 2 (A) Bilinear model of a 4D image formed by three spatial dimensions (x, y, and z) and one spectral dimension. (B) Trilinear model of a 4D image formed by two spatial dimensions (x and y) and two spectral (excitation/emission) dimensions.

2. Multivariate curve resolution: The direct link with the measurement model

When analyzing hyperspectral images, the sole starting information is the image itself. Although it is known that the measurement behaves according to the Beer–Lambert law, the raw image spectra do not provide a straight answer about the number of image constituents, the spectral identity of each of them, or their related concentration maps and, nevertheless, this tends to be the information sought by the scientist. Knowing that the Beer–Lambert law is formally a bilinear model, it is not surprising that the first and most common multivariate data analysis tool used to interpret data matrices coming from HSI information was principal component analysis (PCA) [1,8–11]. Indeed, PCA provides a bilinear model, expressed as:

$\mathbf{D} = \mathbf{T}\mathbf{P}^T + \mathbf{E}$   (1)

where T are the scores, P^T are the loadings, and E is the variance unexplained by the model. PCA aims at reproducing the original data with an optimal fit using a small number of components that do not repeat information among them. Components are orthogonal to each other and are sorted in decreasing order of explained variance. The description of an image using a bilinear model with a small number of components matches some of the requirements of the measurement, i.e., bilinearity and simplicity, since a few spectral signatures properly combined can reproduce any pixel spectrum in the image.
However, the reason why PCA is not the ultimate solution to recover the natural Beer–Lambert law of HSI is the condition of orthogonality imposed on the components (scores and loadings) retrieved. Indeed, when thinking of the spectral signatures of image constituents provided by IR, Raman, or fluorescence imaging systems, it is extremely unlikely that the correlation coefficient among all of them be zero. These signatures always show, to a major or minor extent, a similarity between them, and PCA cannot adapt to this characteristic. Likewise, other abstract bilinear decomposition tools, such as independent component analysis (ICA) [12–16], generally fail at retrieving the natural Beer–Lambert model of HSI because the bilinear components provided by the method are statistically independent, another characteristic rarely obeyed by the spectral signatures of image constituents. For this reason, PCA and ICA are known to provide useful information about relevant spatial and spectral features in images, but the components that these methods provide should never be associated in a straightforward manner with spectral signatures and concentration profiles of image constituents.

MCR is one of the tools that best adapts to the HSI measurement [3,4,17–20]. In common with PCA and ICA, it describes the data matrix of image spectra, D, with a bilinear model providing an optimal fit, expressed as the Beer–Lambert law:

$\mathbf{D} = \mathbf{C}\mathbf{S}^T + \mathbf{E}$   (2)

where C is the matrix of concentration profiles, S^T is the matrix of spectral signatures of the image constituents, and E is the variance unexplained by the model. However, MCR replaces strong mathematical requirements, such as orthogonality or statistical independence, with other constraints more adapted to the real chemical characteristics of the concentration profiles and spectral signatures of image constituents, such as nonnegativity. The selection of constraints that respect the real properties of the HSI measurement explains why MCR has grown in popularity for dealing with this kind of data. For a better understanding of the explanation above, the bilinear decomposition of the same Raman emulsion image by PCA and MCR is represented in Fig. 3. Whereas PCA scores and loadings present negative values (unacceptable in concentration maps and Raman signatures), MCR provides maps and spectral signatures that respect this condition. It is relevant to say that, since both methods aim at explaining the original HSI data optimally with the smallest possible number of components, PCA is often used before MCR to obtain an estimate of how many image constituents are present in an image. It is assumed that the number of components needed in a PCA model will be the same as the number of MCR components required.

FIGURE 3 Principal component analysis (PCA) model (top plot) and multivariate curve resolution (MCR) model (bottom plot) from a Raman emulsion image.

Thus, MCR designates the family of methods that provide bilinear decompositions describing the data with an optimal fit and with chemically meaningful components. Within this description, many algorithms can be considered.
Within the field of hyperspectral image analysis, the most common ones come from the remote sensing community, e.g., vertex component analysis [21] and simplex-based decomposition algorithms [22,23], or from the chemometric field, such as nonnegative matrix factorization [24–26] and multivariate curve resolution-alternating least squares (MCR-ALS) [3,4,27–32]. Almost all these methods, in a compulsory or flexible manner, use the nonnegativity constraint to derive the spectral signatures and concentration profiles of image constituents. Remote sensing algorithms sometimes use additional mathematical conditions, e.g., the assumption that the vertices of the simplex enclosing all image spectra correspond to the pure spectral signatures of components (endmembers), and some methods necessarily require the presence of pure pixel spectra for a correct performance [21–23]. Other chapters of this book describe these algorithms, known in the remote sensing community as unmixing methods, more extensively. To avoid duplication, this chapter will be devoted to the detailed description of the MCR-ALS method. The choice of this algorithm responds to its capability to adapt to single image analysis or image fusion scenarios and to the intensive work oriented to designing dedicated constraints that respond to the natural spectral and spatial characteristics of images [3–5].

3. Multivariate curve resolution-alternating least squares

MCR-ALS is an iterative curve resolution method based on the alternating optimization of the C and S^T matrices under the action of constraints [4,6,19,20]. As the name suggests, it is a least squares approach that involves, in each iterative cycle, the two following operations oriented to the least squares estimation of matrices C and S^T:

$\mathbf{C} = \mathbf{D}\mathbf{S}(\mathbf{S}^T\mathbf{S})^{-1}$   (3)

$\mathbf{S}^T = (\mathbf{C}^T\mathbf{C})^{-1}\mathbf{C}^T\mathbf{D}$   (4)

In each of these steps, matrices C and S^T are modified in a suitable way, introducing all the available information about the shape and behavior of both the spectral signatures and the concentration profiles of the image of interest. This information can vary depending on the spatial structure of the sample and the imaging platform used, as will be described later in this section. Likewise, when nothing else is specified, the least squares steps are carried out according to Eqs. (3) and (4) above, assuming random homoscedastic noise in the raw image measurement. When other kinds of noise are present, e.g., heteroscedastic noise proportional to the signal, and the noise structure is well known, weighted alternating least squares algorithms can be used that incorporate the noise-related information in the estimation of C and S^T [33–35]. However, routine practice shows that no significant differences are obtained between MCR-ALS and multivariate curve resolution-weighted least squares (MCR-WALS) unless the noise level is high, and superior results are only obtained for MCR-WALS when the noise information is specified in an accurate way. The main steps followed in an MCR-ALS analysis are listed below and will be explained in detail afterward:

1. Determination of the number of components in the raw image (D).
2. Generation of initial estimates of the C or S^T matrix.
3. Alternating least squares optimization of C and S^T under constraints until convergence is achieved.
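A bare-bones sketch of this alternating scheme, with nonnegativity imposed on both matrices through a nonnegative least squares solver, is given below. It is only a didactic illustration of Eqs. (3) and (4) under constraints (solving pixel by pixel and channel by channel is simple but slow), not a substitute for a full MCR-ALS implementation with convergence criteria and flexible constraint handling; all variable names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def mcr_als(D, S0, n_iter=50):
    """Minimal MCR-ALS with non-negativity on both C and S.

    D  : unfolded image (pixels x channels)
    S0 : initial spectral estimates (channels x components),
         e.g., obtained from a purest-variable method.
    """
    S = S0.copy()
    for _ in range(n_iter):
        # C-step, Eq. (3) under non-negativity, solved pixel by pixel.
        C = np.array([nnls(S, d)[0] for d in D])
        # S-step, Eq. (4) under non-negativity, solved channel by channel.
        S = np.array([nnls(C, d)[0] for d in D.T])
    return C, S            # the model reproduces D as C @ S.T

# A common figure of merit is the lack of fit of the residuals E = D - C S^T:
# lof = 100 * np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
```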
Before analyzing the raw image, a suitable preprocessing of the raw image spectra is required to avoid undesirable measurement artifacts that could compromise the bilinearity of the data and hinder retrieving correct results. Preprocessing in imaging has been addressed in many works and is not the main scope of this chapter. It involves essentially the same treatments that could be performed on classical spectroscopic data obtained with similar platforms [3]. Thus, as examples, Raman spectra will be corrected for the presence of cosmic peaks, and the often high fluorescence background will be suppressed by the asymmetric least squares method [36,37]; near-infrared spectra instead would be subject to a multiplicative scatter correction to eliminate background, or could be used in derivative form to suppress background and enhance spectral differences among image constituents [37,38]. Other preprocessing treatments linked to image measurements imply the detection of abnormal pixels caused by spectral saturation or null intensity (dead pixels). In these instances, the abnormal pixels are simply suppressed from the analysis or are replaced by estimates based on normal neighboring spectra. Sometimes preprocessing can also be oriented to decrease the dimensionality of images, and binning treatments or other compression methods can be used.

Once the image spectra are suitably preprocessed, they can be submitted to MCR-ALS analysis. The first step in MCR-ALS involves estimating the number of components needed in the MCR model. Such information can be known beforehand but, otherwise, can be inferred by using auxiliary algorithms, such as PCA (a minimal sketch of this estimation is given below). It is important to know that the number of components estimated in this way is not definitive and, often, several MCR models with different numbers of components are tested. The definitive MCR results are chosen on the basis of the model fit and the interpretability of the spectral signatures and distribution maps. As in many other data analysis tools, parsimony is preferred, and the smallest model that describes the data well is the best.

At this point, it is important to discuss the wide concept of component in HSI. A component is actually any entity that can provide a distinct spectral signature. This includes the classical association of component with chemical compound, but also other possibilities, such as polymorphic forms of the same substance, or biological tissues or cell compartments. In the biological context, a component is defined by a spectral signature, but it is known to be formed by a homogeneous mixture of many chemical compounds or biomolecules. The same wide concept of component applies to environmental images, when vegetation, asphalt, water, etc., i.e., landscape compartments, are designated as components.
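As mentioned above, PCA (here via the SVD) can provide a first estimate of the number of components. A minimal sketch, where the 1% explained-variance cutoff is purely illustrative and not a published default:

```python
# Hedged sketch: estimating the number of MCR components from the
# singular values of the preprocessed image matrix D.
import numpy as np

def estimate_n_components(D, min_explained=0.01):
    s = np.linalg.svd(D, compute_uv=False)
    explained = s ** 2 / (s ** 2).sum()   # variance captured per component
    return int((explained > min_explained).sum())
```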
Since MCR-ALS is based on an alternating least squares optimization, Eqs. (3) and (4) need to use the original image matrix, D, and either C or S^T to retrieve the counterpart matrix of the bilinear model. Before starting the HSI analysis, only D is available and an initial estimate of C or S^T is required. There are many possibilities to generate initial estimates, but some guidelines can be provided. A golden rule is starting with initial estimates that obey the natural properties of the concentration profiles or spectral signatures sought. In this way, the optimization will happen faster, because the starting point is closer to the final optimum and there is less chance of divergence problems. Such a rule discards, for instance, starting with profiles formed by sets of random numbers, and advises against the use of profiles that may have negative parts, e.g., PCA scores or loadings, if concentration profiles or positive spectral signatures need to be retrieved.

In the history of MCR, many auxiliary methods were designed to help in the task of initial estimate generation. Some of them, based on local rank analysis, were oriented to process analysis and relied on the continuous and sequential character that concentration profiles have in this context, e.g., chromatographic peaks elute one after the other or, in reaction systems, reagents turn into products [39,40]. This does not happen in HSI, where concentration profiles are linked to the unfolded pixel direction. In this case, the concentration profiles do not present the necessary sequential pattern and other methods are required to generate initial estimates. The best option in this context is offered by the methods of purest variable selection [41]. Algorithms of this kind, such as SIMPLISMA [42], the orthogonal projection approach (OPA) [43], or key set factor analysis (KSFA) [44], among others, find the most dissimilar rows or columns in a data set D. The big advantage of these algorithms is that they do not require ordered spectral or concentration patterns and, as such, they adapt to any kind of scenario, from process analysis to images or environmental data. Besides, they select rows or columns of the D matrix and, in doing so, provide an estimate that is clearly related to the final profiles sought.

Purest variable selection methods in HSI analysis can select the purest pixels (rows) of the image matrix D and provide an initial estimate of the spectral signatures for matrix S^T, or select the purest spectral channels of D (columns) and provide an initial estimate of the matrix of concentration profiles C. Although either option is potentially acceptable, the tendency in HSI is to use spectral estimates because they tend to provide more unmixed information, i.e., there are usually more chances to find pixels where only one component or a simple mixture of few components is present than spectral channels with specific information for a particular component [3].

Although purest variable selection methods offer clear advantages to generate initial estimates, it is worth advising against their use as MCR methods. The assumption that the purest spectra selected are the pure spectral signatures sought and that, as such, a single least squares step using the selected spectra and the original image, as in Eq. (3), will provide the concentration profiles of the pure image constituents is often erroneous. This will only be valid if there are pure pixels for each of the constituents of the image, and this is often unlikely and, in most cases, unproven by additional data analysis exploration.
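In the spirit of these purest-variable methods, the sketch below is a simplified stand-in (not SIMPLISMA, OPA, or KSFA themselves): it takes the pixel spectrum with the largest norm and then repeatedly adds the spectrum least explained by the span of those already selected, returning an initial estimate of S^T.

```python
# Simplified "purest pixel" selection for an initial S^T estimate.
# D is the (n_pixels, n_channels) unfolded image matrix.
import numpy as np

def purest_pixels(D, k):
    idx = [int(np.argmax(np.linalg.norm(D, axis=1)))]   # strongest spectrum
    for _ in range(k - 1):
        S = D[idx]                                      # spectra chosen so far
        coef, *_ = np.linalg.lstsq(S.T, D.T, rcond=None)
        residual = D - (S.T @ coef).T                   # unexplained by span(S)
        idx.append(int(np.argmax(np.linalg.norm(residual, axis=1))))
    return D[idx]                                       # initial S^T estimate
```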
Once initial estimates are available, the alternating least squares optimization of C and S^T can start. This is the core step of the MCR-ALS algorithm, and constraints play an essential role. Constraints have a double function: on the one hand, they encode the systematic information about the general properties of spectral signatures and concentration profiles to provide chemically meaningful solutions and, on the other hand, they drive the optimization process and limit the span of feasible solutions for the final profiles of the MCR bilinear model, i.e., they decrease the ambiguity associated with MCR solutions [27]. Constraints modify the shape of any calculated profile so that it fulfills a preset condition. Within the MCR framework, the application of constraints is always optional, and it can be done in a flexible way. Thus, the selected constraints can be different for profiles in C and S^T and for the different components of the system [3,29]. The most typical constraint applied in MCR methods is nonnegativity. It is always applied to concentration profiles, because concentrations can only be positive or null, and very often to spectral profiles, since many signal intensities are naturally positive, e.g., Raman or fluorescence intensities, or absorption [29,45].

In HSI, the use of constraints may seem more limited than in other areas, such as process analysis. Indeed, none of the constraints usually applied to evolving process profiles, based on sequential or monotonic characteristics, is useful [29]. This limitation, and the fact that HSI was seen for a long time mainly as large spectroscopic data sets, restricted the use of constraints to nonnegativity. Nowadays, the spatial dimension of images is taken into consideration, and this has led to the emergence of adapted and specific HSI constraints.

The spatial dimension of images was first used to encode adapted selectivity and local rank constraints [27,29]. This kind of constraint sets the absence of one or more components per pixel; such constraints are among the most important to reduce the ambiguity in MCR solutions. The way to set local rank constraints in HSI starts with an exploratory local rank analysis using the fixed size image window-evolving factor analysis (FSIW-EFA) algorithm [46]. FSIW-EFA works by performing PCA analyses on small 2D or 3D pixel windows that contain a pixel spectrum and all its spatial neighbors. Such an operation is done scanning all possible pixel windows across the whole image surface (see Fig. 4). This exhaustive analysis provides a local rank map that displays the number of overlapping components present in every pixel. To know the identity of the absent components in every pixel, reference spectral information of the image constituents and local rank information must be combined. The reference spectral information can come from pure variable selection methods or from a preliminary MCR resolution of the image using only nonnegativity. For every pixel, the correlation coefficients between the pixel spectrum and the reference spectra of the image constituents are calculated. Knowing the number of absent constituents in every pixel from the local rank map, the components with the lowest correlation coefficients with the pixel spectrum are set to be absent. As for any other constraint, the application is flexible, and only the pixels with a clear estimate of the rank (number of overlapping components) and a clear identification of the absent components are constrained [47].

FIGURE 4 (A) Fixed size image window-evolving factor analysis application to a hyperspectral image: principal component analysis (PCA) analyses and local rank map. (B) Combination of local rank and reference spectral information to obtain masks of absent components in pixels (in red). These absences are used as local rank constraints in multivariate curve resolution analysis.
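A schematic version of the windowed PCA scan underlying FSIW-EFA is sketched below; the window size and the relative singular value threshold used to count components are illustrative assumptions, not published defaults.

```python
# Sketch of a local rank map in the spirit of FSIW-EFA (simplified):
# an SVD on every small pixel window, counting singular values above a
# relative noise threshold. `cube` is (ny, nx, n_channels).
import numpy as np

def local_rank_map(cube, win=3, rel_threshold=0.05):
    ny, nx, nch = cube.shape
    h = win // 2
    rank = np.zeros((ny, nx), dtype=int)
    for i in range(h, ny - h):
        for j in range(h, nx - h):
            block = cube[i - h:i + h + 1, j - h:j + h + 1].reshape(-1, nch)
            s = np.linalg.svd(block, compute_uv=False)
            if s[0] > 0:
                rank[i, j] = int((s / s[0] > rel_threshold).sum())
    return rank   # number of overlapping components per pixel
```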
In the local rank constraints, the spatial concept of pixel neighborhood is used. There are other constraints that take into account characteristics related to the spatial distribution of components in the concentration map. To impose these constraints, the stretched pixel concentration profiles are refolded into the related maps, onto which the suitable spatial conditions are applied [48]. The possible characteristics exploited to set spatial constraints are the smoothness in maps, the preservation of edges to define objects more accurately, or the sparseness for maps known to belong to minor and scattered compounds [49–51]. Obviously, the application of these constraints has a close relationship with the spatial nature of the image analyzed. For instance, environmental images can benefit from edge-preserving constraints because landscape compartments are, by nature, well delimited. Instead, pharmaceutical powder mixtures, much less spatially structured, may accommodate sparseness constraints for a better spatial definition of minor compounds in formulations.

So far, the constraints described relate to properties of the spectral signatures or concentration profiles (maps). There are other constraints related to the model of the measurement. Indeed, Section 1 described that most hyperspectral images follow a bilinear model, as defined by MCR, but some of them obey trilinear models, such as EEM fluorescence images. There are algorithms that provide naturally trilinear models, such as PARAFAC [6] or trilinear decomposition [52], and they can be used to deal with the latter kind of images. However, MCR can still perform in an appropriate manner when trilinearity is used as a constraint [53,54]. To do that, the cube in Fig. 2B is unfolded into a data matrix sized nr. pixels x (nr. excitation channels x nr. emission channels) (see Fig. 5). Every row in the matrix contains the emission spectra corresponding to the different excitation wavelengths scanned. The trilinearity constraint is applied in the spectral dimension (S^T matrix) and forces a common shape onto all emission spectra related to the different excitation channels for a particular component. To do so, each component is constrained separately (see Fig. 5). Within an iteration, all emission spectra calculated for a particular component are arranged as columns in a single matrix, and a PCA analysis is performed. The first component includes all the necessary information: the score shows the common shape of all emission spectra, and the loading gives the scale of every emission spectrum, related to each excitation wavelength. The trilinear emission spectra are reconstructed as the product of the first score by the loading related to the suitable excitation wavelength. After MCR analysis, the concentration profiles are refolded into maps and a single emission spectral shape is obtained for every component. The excitation spectrum of every component is derived by integrating the area of the emission spectra at the different excitation wavelengths.
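In essence, the constraint amounts to a rank-1 approximation per component. A minimal sketch, with an SVD playing the role of the PCA step described above and an assumed array layout:

```python
# Trilinearity constraint for one component, sketched as a rank-1
# approximation. `St_comp` holds that component's emission spectra,
# one row per excitation channel: shape (n_excitation, n_emission).
import numpy as np

def trilinearity_constraint(St_comp):
    U, s, Vt = np.linalg.svd(St_comp, full_matrices=False)
    shape = Vt[0]                  # common emission shape (first "score")
    scale = s[0] * U[:, 0]         # per-excitation scale (first "loading")
    return np.outer(scale, shape)  # constrained, trilinear-consistent block
```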
Although this strategy does not offer any advantage over the use of natively trilinear methods, such as PARAFAC, it is particularly suitable when image fusion with other measurements is of interest, as will be seen in the next section [5].

Once the MCR optimization of concentration profiles and spectral signatures is finished, figures of merit related to the model fit, such as the lack of fit (%LOF) or the variance explained (Eqs. (5) and (6), respectively), can be calculated:

%LOF = 100 · sqrt( Σ_{i,j} e_ij² / Σ_{i,j} d_ij² )   (5)

%var. expl. = 100 · ( 1 − Σ_{i,j} e_ij² / Σ_{i,j} d_ij² )   (6)

where e_ij is the residual associated with the MCR reproduction of the related element, d_ij, in the original data set.

FIGURE 5 (A) Four-dimensional excitation-emission fluorescence measurement image structured as a data matrix. (B) Implementation of the trilinearity constraint in the S^T matrix of emission spectra signatures. PCA, Principal component analysis.

The optimization is usually finished when the model fit does not improve significantly between consecutive iterations (e.g., a difference of less than 0.1% in the lack of fit between consecutive iterations). Other possibilities are using a maximum number of iterations, or criteria based on the comparison of the shapes of the optimized profiles among iterations.

To complete the MCR-ALS analysis, it is advisable to estimate the ambiguity associated with the resolved profiles. Although image analysis does not tend to provide very ambiguous solutions, because the large amount and diversity of pixel compositions helps to obtain accurate concentration profiles and spectra, there are many methods with available software that inform on the presence or absence of ambiguity and on its extent when it exists [55–57]. Some of these methods, such as the so-called MCR bands, do not show limitations linked to the number of components of the system [57].

4. Image fusion (or multiset analysis)

Image fusion defines the scenario of working with several images altogether, coming from a single platform or from different ones [5]. There are many scientific problems that require acquiring and relating several images, e.g., in process monitoring, when images come from different depth layers of the same sample, or when the information of different spectroscopic platforms is complementary; but too often this task is done by analyzing images one by one and relating the final results. Possible reasons for the lack of real image fusion examples are the scarcity of available algorithms for this purpose and the presence of complex problems, such as the spatial congruence and the differences in spatial resolution when images from different platforms need to be fused.

Image fusion in the framework of MCR is called multiset analysis [20,54]. Indeed, there are many fields in which MCR-ALS works with multisets, also called augmented data matrices, formed by several data blocks that come from different experiments or different samples, or that can be monitored with different instrumental measurements. The only requirement to append data blocks to form a multiset is that they share at least one mode in common and some components. A multiset formed by data matrices that behave according to a bilinear model will also follow a bilinear model.
This means that the same bilinear decomposition methods applied to a single data set are also valid to interpret the information contained in multisets [20,54].

Multisets in hyperspectral image analysis can be formed by several images collected with the same platform, by images of the same sample collected with different platforms, or by both possibilities at the same time (see Fig. 6) [3–5]. When a multiset includes several images from the same platform, the spectral mode needs to be common, i.e., the images should have been scanned covering the same spectral range, and the multiset is built by appending the blocks of pixel spectra one on top of each other. This gives rise to a column-wise augmented matrix and to the related bilinear model, as expressed in Eq. (7):

[D1; D2; …; Dn] = [C1; C2; …; Cn] S^T + [E1; E2; …; En]   (7)

where Di is the data matrix that contains the pixel spectra of the ith image in the multiset and Ci is the related set of concentration profiles. In this case, the bilinear model related to the multiset [D1; D2; …; Dn] is formed by an augmented concentration matrix, [C1; C2; …; Cn], and a single S^T matrix, with spectral signatures valid for all images in the multiset. The maps for each image constituent in every image would be recovered by conveniently refolding the related Ci profiles. It is relevant to mention that multisets structured as column-wise augmented matrices can have a completely different pixel mode in every image, because only the spectral mode needs to be common. This offers a wealth of possibilities, since images with different sizes, geometries, and spatial resolutions can be analyzed together as long as the spectral measurement has been carried out in the same way. Besides, since the only information in common is the spectral mode, relevant spectroscopic information, such as pixel spectra from known compounds (embedding media, identified constituents, etc.), can also be part of the multiset if required [38]. The bilinear model also provides a single matrix S^T, with spectral signatures valid for all images in the multiset that are very well defined because of the amount and diversity of information contained in all images treated together.

FIGURE 6 Multiset structures and bilinear models for (A) several images obtained with the same spectroscopic platform and (B) a single image obtained with several platforms.

When a multiset is formed by images collected on the same sample by different spectroscopic platforms, the pixel mode needs to be common, i.e., there should be spatial congruency among the different images collected and the pixel size needs to be the same. This requirement often calls for preprocessing that involves spatial transforms and balancing spatial resolutions among images [5]. When the pixel mode is common, the blocks of pixel spectra of the different images are placed one beside the other, forming a row-wise augmented data matrix, and the multiset structure obeys the bilinear model expressed in Eq. (8):

[D1 D2 … Dn] = C [S1^T S2^T … Sn^T] + [E1 E2 … En]   (8)

The bilinear model is formed by a single C matrix, which will give rise to a single set of maps, and an extended S^T matrix, formed by as many Si^T blocks as spectroscopic techniques used in the different platforms. This kind of multiset benefits from the complementary information provided by the different techniques, which can help to differentiate much better among image constituents. In the most extreme case, different techniques can be sensitive to different components, and only the fused structure will provide a complete, reliable picture of all image constituents in the sample analyzed. A sketch of how these augmented structures can be assembled is shown below.
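As a small illustration (function names and array layout are this sketch's own), the two augmented structures of Fig. 6 can be assembled, and the augmented C of Eq. (7) refolded into per-image maps, as follows:

```python
# Sketch: assembling the multiset structures of Fig. 6 and refolding the
# augmented C matrix back into per-image maps.
import numpy as np

def column_wise_multiset(cubes):
    """cubes: list of (ny, nx, n_channels) images sharing the spectral mode."""
    D_aug = np.vstack([c.reshape(-1, c.shape[-1]) for c in cubes])
    shapes = [c.shape[:2] for c in cubes]
    return D_aug, shapes

def row_wise_multiset(mats):
    """mats: list of (n_pixels, n_channels_i) blocks sharing the pixel mode."""
    return np.hstack(mats)

def split_maps(C_aug, shapes):
    """Refold the augmented C of Eq. (7) into one map stack per image."""
    maps, start = [], 0
    for ny, nx in shapes:
        maps.append(C_aug[start:start + ny * nx].reshape(ny, nx, -1))
        start += ny * nx
    return maps
```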
A third possible scenario implies joining images from different samples acquired on different platforms. In this case, the images appended form a row- and column-wise augmented matrix. The related bilinear model is shown in Eq. (9):

[D11 D12 … D1L; D21 D22 … D2L; …; DK1 DK2 … DKL] = [C1; C2; …; CK] [S1^T S2^T … SL^T] + [E11 E12 … E1L; E21 E22 … E2L; …; EK1 EK2 … EKL] = Caug Saug^T + Eaug   (9)

In this case, both the C and S^T matrices are augmented, and the requirements and advantages of the previous multisets also hold.

Analyzing a multiset by MCR-ALS follows the same steps listed in Section 3 for a single image analysis, because the work is done on an (augmented) matrix. The determination of the number of components and the initial estimates should be performed on the multiset, and the constraints described previously are also applicable. In terms of flexibility in constraint application, multisets offer an additional turn of the screw, since constraints can be applied differently per block. Such optionality helps to respect the spatial and spectral characteristics of all the data blocks appended [28,54].

Working with multisets always provides more accurate and reliable results than analyzing the individual images one at a time. The advantages are linked to the use of more diverse information, which allows a better definition of the image constituents and, as a consequence, a reduction in the ambiguity associated with the resolved maps and spectral signatures [58]. Below, some comments on examples of multisets formed by images from the same platform and images from different platforms are provided.

4.1 Image multisets formed by images coming from the same platform

As mentioned above, the only requirement for multisets formed by images from the same platform is a common spectral mode. This simple condition makes this kind of multiset the most commonly found in research studies. There are scientific problems particularly suitable for this kind of multiset. Clear examples are formed by images that come from the same sample collected at different depths, as can be done in confocal Raman imaging [3,4]. This strategy provides a clear description of the sample in the three dimensions and helps solving interesting problems, such as knowing the sequence of use of inks in forensic document studies (see Fig. 7) [59].

FIGURE 7 Multivariate curve resolution results (maps and spectral signatures) obtained from a multiset analysis of ink images obtained at different depths in a document. The sequence of use of inks can be seen from the distribution maps (Pilot BPG is more dominant in the upper layers in the ink intersection and crosses over Pilot BAB).

Another example refers to images collected during process monitoring [60,61]. In these situations, having a bilinear model with a single set of spectral signatures provides consistency to the final solution, since maps at different depths, or maps from the evolution of a constituent in a process, are always related to the same spectral signature.
Besides, the complementary compositional information in the different layers (or process stages) helps to model all compounds more easily, since minor compounds in a particular image may have a more dominant presence in a related image of the multiset structure.

Sometimes multisets with common spectral information are formed by images of related samples, and it is relevant to find the link among all of them. This would be the example of multisets acquired on individuals of the same biological population, where common resolved spectral signatures are useful to define more clearly the fingerprint of tissues and compounds appearing in all individuals (useful to define general trends of the population) and can be distinguished from compounds of specific individuals (associated with natural biological variability) (see Fig. 2) [38,62]. Another paradigmatic example of multisets of related images refers to the use of HSI for quantitative purposes, where calibration and test images can be analyzed in the same multiset structure [38]. This last example will be studied in more detail in the next section.

4.2 Image multisets coming from different platforms

Often the expression image fusion is reserved for those situations in which the images analyzed together come from different platforms. This scenario is far more challenging than coupling images acquired with the same image acquisition system, because spectral and spatial differences among the different images need to be handled [5].

The major complexity linked to image fusion from different platforms comes from the fact that the pixel mode among the different techniques needs to be common. Achieving this condition implies surmounting problems linked to differences in spatial orientation and spatial resolution among images. When the spatial resolution of the images to be fused is the same, there are different algorithms oriented to the coregistration of images, and most of them rely on the selection and use of some reference points in the images to be aligned. Once these pixels are selected, suitable translation and rotation transformations are performed to obtain spatially congruent pixels among images [64]. The only drawback in this process is the delicate step of selecting reference pixels, which is simple in structured images with clear landmarks, such as environmental landscapes or some kinds of biological tissues, but can be less obvious when images have a poor spatial structure, e.g., mixtures of pharmaceutical powders. To overcome this problem, other procedures have been proposed that work with all pixels in the images. This strategy avoids the possible bias in the selection of reference pixels and provides more robust results in the optimization of shift and rotation parameters, because much more information is used [65]. A possible way to optimize the rotation and translation parameters among images is via a simplex-based algorithm that minimizes the cost function:

ssq(θ, dx, dy) = Σ_{i,j} [ A^r_{i,j}(x, y, α) − A^s_{i,j}(x + dx, y + dy, α + θ) ]²   (10)

where A_{i,j} represents any single image measurement (intensity, singular value, concentration, etc.) associated with a particular pixel with Cartesian coordinates i, j. A^r_{i,j} refers to the image taken as reference and A^s_{i,j} to the image to be spatially matched (dx, dy, and θ are the translations in the x and y directions and the rotation angle needed to achieve the optimal image matching, respectively).
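A minimal sketch of this optimization, using generic SciPy tools (a Nelder-Mead simplex search, with linear interpolation for the shift and rotation) rather than the software of Ref. [65]; the interpolation settings are assumptions:

```python
# Hedged sketch of Eq. (10): optimizing translation and rotation of a
# (e.g., binarized) map against a reference with a simplex search.
import numpy as np
from scipy.ndimage import rotate, shift
from scipy.optimize import minimize

def register(ref_map, moving_map):
    """Find (dx, dy, theta) minimizing the cost of Eq. (10)."""
    def ssq(params):
        dx, dy, theta = params
        m = rotate(moving_map, theta, reshape=False, order=1)
        m = shift(m, (dy, dx), order=1)
        return float(((ref_map - m) ** 2).sum())
    result = minimize(ssq, x0=np.zeros(3), method="Nelder-Mead")
    return result.x   # optimal (dx, dy, theta)
```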
There are many possible options in terms of the spatial information used to match images. When the sample has a clear contour, binarized sample contour maps may be a good reference. When this is not the case, binarized versions of score maps or unmixed distribution maps that are particularly comparable among images are also suitable. Once the images are appropriately matched, MCR-ALS can be applied to a multiset structure that will have the pixel mode in common and different spectral modes for each of the platforms appended.

Nevertheless, most imaging platforms present different spatial resolutions. An obvious solution is binning (downsampling) the images with the highest resolution to match the one with the lowest resolution and building a multiset like the one shown in Fig. 6. However, the ultimate aim would be performing image fusion while preserving the natural spatial resolution of each particular image, to obtain the maximum spectral and spatial detail about the samples. Recently, a solution for this problem has been found in the multiset analysis world by using the so-called incomplete multiset structures [5,66]. These structures are row- and column-wise augmented matrices with some missing data blocks and were originally used to address environmental problems. The translation of this concept to work with images with different spatial resolutions is represented in Fig. 8 [67]. For the example of two images, X1HR, with high spatial resolution, and X2LR, with low spatial resolution, a first regular multiset structure is built by appending X2LR to a downsampled version of the first image, X1LR. These two images with the same spatial resolution are first spatially matched and used to build a complete multiset with a common pixel dimension, [X1LR X2LR]. Afterward, the original image with the highest resolution (X1HR) is appended below its lowest spatial resolution version, [X1LR; X1HR]. The result is an incomplete multiset structure with three blocks, because the image with lower spatial detail (X2LR) does not have an equivalent with high spatial resolution.

To resolve this multiset structure, an adapted version of the MCR-ALS algorithm is employed, based on the idea that an incomplete multiset can be defined as a group of intersecting complete multisets with information in common. For the example in Fig. 8, the algorithm used optimizes simultaneously the profiles in the two complete multiset structures, expressed as follows:

[X1LR X2LR] = CLR [S1 S2]^T   (11)

[X1LR; X1HR] = [CLR; CHR] S1^T   (12)

After each iteration, an objective function including the two optimizations is used to drive the global model, defined as:

min( ‖ [X1LR X2LR] − CLR [S1 S2]^T ‖ + ‖ [X1LR; X1HR] − [CLR; CHR] S1^T ‖ )   (13)

The outcome of the algorithm is a set of pure extended spectral signatures including information from all connected platforms, [S1 S2]^T, associated with a related set of high spatial resolution maps, CHR. The only limitation of this approach appears when some image constituents are only present in the image with the lowest spatial resolution. In this case, maps with high spatial resolution cannot be retrieved for these constituents [67]. A more detailed description of this algorithm can be found in Ref. [67], where simulated examples and image fusion structures including MS/IR and Raman/IR images are presented.

FIGURE 8 Incomplete multiset used to couple images obtained from different spectroscopic platforms with different spatial resolutions.
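The combined objective of Eq. (13) is straightforward to express in code. The sketch below (matrix names follow the text; everything else is illustrative) only evaluates the cost that the adapted MCR-ALS algorithm minimizes:

```python
# Evaluating the combined objective of Eq. (13) for the incomplete
# multiset of Fig. 8. Shapes: X1_LR (p_LR, ch1), X2_LR (p_LR, ch2),
# X1_HR (p_HR, ch1); C_LR (p_LR, k), C_HR (p_HR, k); S1 (k, ch1), S2 (k, ch2).
import numpy as np

def fusion_objective(X1_LR, X2_LR, X1_HR, C_LR, C_HR, S1, S2):
    r1 = np.hstack([X1_LR, X2_LR]) - C_LR @ np.hstack([S1, S2])   # Eq. (11)
    r2 = np.vstack([X1_LR, X1_HR]) - np.vstack([C_LR, C_HR]) @ S1 # Eq. (12)
    return np.linalg.norm(r1) + np.linalg.norm(r2)                # Eq. (13)
```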
Fig. 9 shows the incomplete structure used in the study of the fusion of an IR image (spatial resolution 44 × 44 µm²) and a Raman image (spatial resolution 100 × 100 µm²) of a tonsil tissue sample. The final results obtained are the signatures of the different tissue and subtissue components of the sample and the distribution maps at the highest possible resolution.

FIGURE 9 Multivariate curve resolution results obtained from the analysis of an incomplete multiset formed by Raman and FT-IR images from a sample of tonsil tissue. FT-IR, Fourier-transform infrared.

When the spectroscopic direction is considered, different platforms can provide spectra that may show important differences in terms of the number of spectral channels. In such a case, the information of the techniques with a higher number of spectral readings will dominate the analysis. Tackling this problem may involve binning some of the techniques, or compression by variable selection or by using scores provided by auxiliary methods, such as PCA, or even MCR scores obtained on the different individual images. Another common operation to fuse the information of different techniques is the rescaling of the spectroscopic measurement so that the signal intensities of the different blocks are comparable. This operation implies dividing the full blocks by a suitable scaling factor that can be established visually or obtained otherwise, e.g., as the norm of each matrix block [5] (see the sketch below).
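A minimal sketch of this block scaling for row-wise fusion, using the Frobenius norm of each block as the scaling factor:

```python
# Sketch: row-wise fusion of technique blocks sharing the pixel mode, each
# rescaled by its Frobenius norm so that no technique dominates the fit.
import numpy as np

def fuse_scaled(blocks):
    """blocks: list of (n_pixels, n_channels_i) matrices, common pixel mode."""
    return np.hstack([B / np.linalg.norm(B) for B in blocks])
```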
Although the problems associated with the spatial mode are the most challenging and relevant, there is another situation that may cause difficulties in data fusion, linked to the characteristics of the spectroscopic techniques used. Indeed, the first section presented techniques providing 3D images, e.g., Raman and IR, and 4D images, e.g., EEM fluorescence. Coupling images that present a different number of spectroscopic dimensions per pixel is also a challenge. To build a multiset structure joining a 3D image (obeying a bilinear model) and a 4D image (obeying a trilinear model), both images need to be unfolded to form a data matrix (sized nr. pixels x nr. spectral channels). In the case of a 3D image, pixel spectra are simply located one on top of each other, as in Fig. 1; in the case of the 4D EEM image, unfolding takes place as in Fig. 2B (see Fig. 10). The analysis of the multiset in Fig. 10 by MCR-ALS would provide a bilinear model. To preserve the natural trilinear behavior of the 4D EEM image, and knowing that constraints can be applied differently in the Si^T blocks of the multiset structure, trilinearity is applied as a constraint only to the blocks of the emission spectra related to the EEM image. In this way, MCR-ALS solves the image fusion problem of 3D and 4D images by using a hybrid bilinear/trilinear model [5,63].

FIGURE 10 (A) Image fusion of 3D and 4D excitation-emission fluorescence images.

5. Use of resolved maps and spectral signatures: going beyond MCR

The main purpose of MCR when applied to hyperspectral image analysis is recovering the underlying Beer–Lambert law of the image measurement and providing concentration maps and spectral signatures of the image constituents. However, the MCR scores (C profiles) and loadings (S^T profiles) have very desirable properties that make them excellent starting information for other data analysis purposes. Indeed, MCR scores and loadings are noise-filtered, compressed representations of the compositional and structural information of the image constituents, respectively. Besides, each of these profiles is chemically meaningful and contains constituent-specific information. The use of MCR scores and loadings increases the performance and flexibility of data analysis tasks when compared with the straight use of raw image spectra as initial information [3,4]. Below, some outstanding examples of the further use of MCR concentration profiles and spectral signatures in different data analysis applications are described.

5.1 Use of MCR scores (concentration profiles or maps)

The matrix C of concentration profiles retrieved by MCR provides two different kinds of information: on the one hand, the rows of the matrix provide compressed, interpretable information on the composition of each particular pixel and, on the other hand, every column shows the relative variation of abundance of a particular constituent along the whole image. When each concentration profile is refolded into the related 2D map, the information on pixel relative abundance and spatial distribution of a particular constituent is obtained. This diversity of information means that different data analysis tasks focused on the compositional information (segmentation), on the relative abundance of the image constituents in pixels (quantitative analysis), or on the spatial distribution of constituents (heterogeneity studies and superresolution applications) may work very efficiently when using MCR scores as starting information.

Segmentation is a common task in hyperspectral image analysis and includes all unsupervised data analysis tools oriented to find classes of similar pixels, i.e., pixels with similar composition [3]. The outcome of this analysis is a segmentation map, where the pixel classes can be displayed, and the class centroids, which represent the average pixel of each class. Traditionally, raw spectra were used to perform image segmentation, because the shape and intensity of spectra are directly related to chemical composition. Class centroids were the mean spectra of all pixels belonging to the same class. The use of MCR scores for image segmentation offers a series of advantages. On the one hand, the compressed information speeds up the segmentation analysis, and the classes are more accurately defined because the concentration profiles are noise-filtered, in contrast with the raw spectra, which incorporate the experimental error. The interpretation of the characteristics of the class centroids also becomes easier, because the information defining every pixel consists of relative abundances of image constituents, i.e., straight compositional information. Therefore, the centroids obtained offer the average composition of every class. Fig. 11A shows the MCR maps and spectral signatures of the constituents of a kidney stone image; Fig. 11B shows the related segmentation maps and centroids when raw image spectra (top plot) and MCR scores (bottom plots) are used, respectively. Although the segmentation maps are similar using both kinds of information, the raw-spectra centroids are very similar to each other and make it difficult to define the characteristics of each class, whereas the MCR centroids provide clear information on the composition of every class, expressed as relative concentrations of the different image constituents [37].
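A minimal sketch of such a segmentation, clustering the MCR scores with k-means from scikit-learn (the number of classes is an illustrative choice and would be tuned in practice):

```python
# Hedged sketch: k-means segmentation on MCR scores (C) instead of raw
# spectra. `C` is (n_pixels, n_components); `shape` is the map geometry.
import numpy as np
from sklearn.cluster import KMeans

def segment_scores(C, shape, n_classes=4, seed=0):
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(C)
    seg_map = km.labels_.reshape(shape)   # segmentation map
    centroids = km.cluster_centers_       # average composition per class
    return seg_map, centroids
```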
There are additional advantages of using MCR scores in segmentation, linked to the fact that every profile in C contains compound-specific information. This allows omitting certain profiles from segmentation tasks, e.g., those related to modeled background contributions, or selecting only some of the chemical concentration profiles for segmentation if pixel similarities are to be expressed on the basis of only some particular image constituents. Preprocessing can also be used when the relative concentrations of the different image constituents are very unbalanced, e.g., autoscaling or normalization of each concentration profile before segmentation can be done to enhance the relevance of minor compounds in the image. All the strategies above are unthinkable when raw pixel spectra with mixed information of all image constituents are used [37].

FIGURE 11 (A) Maps and resolved spectra for a kidney stone Raman image; (B) segmentation maps and centroids obtained from raw image spectra and from multivariate curve resolution (MCR) scores.

To wrap up the use of MCR scores for segmentation, it is relevant to mention that segmentation can be done on a single image or on image multisets. When done on image multisets formed by images collected on different samples (see Fig. 6A and Eqs. (7) and (9)), segmentation is performed taking altogether the MCR scores from the augmented C matrix containing the Ci scores of every particular image. Such a strategy is very valuable when classes common to all images need to be distinguished from classes specific to a particular image. Such an idea is interesting in multisets formed by images from samples of individuals belonging to the same biological population: classes present in all samples refer to population trends, whereas classes specific to an individual are related to the natural biological variability within a population [68].

The best-known use of MCR scores is related to the quantitative analysis of image constituents [3,37]. To do that, an initial column-wise augmented multiset (see Fig. 6A) is built by appending together calibration and unknown images of samples containing the constituents of interest. Every concentration profile contains information associated with a particular image constituent, and this information will be used to build the related calibration model and make the suitable predictions. A different calibration model is built for each image constituent. At this point, it is relevant to mention that quantitative analysis can be performed at a bulk image level or at a local pixel level. Both tasks can be done using the information contained in the MCR scores, as shown schematically in Fig. 12 and in the sketch below. First, the MCR analysis is performed on the multiset containing the calibration and test samples, and compound-specific information is obtained in the concentration profiles. For a particular image constituent represented by an augmented concentration profile, the average concentration value of each image map (coming from the elements in the profile of the suitable Ci block) is computed. The MCR average concentration values for the calibration samples (in arbitrary units) are regressed against the real reference concentration values of the samples, and the calibration line is obtained.
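A hedged sketch of this calibration scheme, together with the two prediction levels discussed next (all names are this sketch's own):

```python
# Calibration and prediction from MCR scores, per constituent.
# `C_blocks_cal`: list of per-image MCR score blocks for one constituent;
# `y_ref`: reference concentrations of the calibration images.
import numpy as np

def calibration_line(C_blocks_cal, y_ref):
    x = np.array([c.mean() for c in C_blocks_cal])   # mean MCR score per image
    slope, intercept = np.polyfit(x, y_ref, 1)       # least squares line
    return slope, intercept

def predict_bulk(C_block_test, slope, intercept):
    return slope * C_block_test.mean() + intercept   # bulk image prediction

def predict_pixels(C_block_test, slope, intercept):
    return slope * C_block_test + intercept          # local, pixel-level map
```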
The prediction step can be done at two levels: (1) bulk image concentrations can be predicted for the test samples by submitting the average MCR concentration value of the related image maps to the calibration model, and (2) for any image, the real pixel concentration of an image constituent can be found by submitting the MCR pixel concentration values to the calibration model.

FIGURE 12 Use of multivariate curve resolution scores for quantitative image analysis at a bulk image and local pixel level.

This approach allows performing quantitative analysis on images with a different number of pixels and geometry, since calibration lines and bulk image predictions are based on the average image concentration values and not on total integrated areas under concentration profiles, as in other MCR applications.

In the analysis of many products, e.g., pharmaceutical formulations or feed products, the information on the amount of the different constituents in the product needs to be complemented by information on the heterogeneity of the mixture, which can be obtained from the sample images. Heterogeneity studies can also be performed at an individual constituent level using MCR scores. The definition of heterogeneity incorporates two different aspects, the so-called constitutional heterogeneity and the distributional heterogeneity [68]. Constitutional heterogeneity defines the scatter of pixel concentration values within an image and looks at the characteristics of each pixel individually, disregarding the properties of the pixel neighborhood. Such a heterogeneity contribution is easily described by histograms built from the different concentration values obtained in the resolved MCR profiles: the higher the standard deviation linked to the histogram, the higher the constitutional heterogeneity. Equally important is the distributional heterogeneity, which takes into account how uniformly the different constituents are distributed across the sample surface. Such a concept needs to be defined taking into account the properties of the pixel neighborhood. To do so, the starting information in the concentration profile of a particular constituent needs to be refolded into the concentration map to recover the spatial information. Indicators of image constituent heterogeneity can be obtained by approaches such as macropixel analysis, which analyzes the properties of small pixel concentration windows containing a pixel and its immediate concentric neighbors. Heterogeneity curves are obtained showing the change in the average variance of pixel neighborhood concentration values as a function of the size of the pixel neighborhood selected. Steeper decreases of the heterogeneity curves are related to lower distributional heterogeneities, i.e., to material more uniformly distributed. Fig. 13 shows the maps for three components of a pharmaceutical formulation image, i.e., starch, caffeine, and acetylsalicylic acid (AAS). Whereas the histograms of the three compounds are very similar in spread about the mean value (constitutional heterogeneity), AAS and starch are distributed in a much less uniform way than caffeine; hence the smoother decay of their heterogeneity curves (distributional heterogeneity).
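A minimal sketch of a macropixel heterogeneity curve as described above (window handling and sizes are illustrative):

```python
# Hedged sketch: the average variance of square pixel neighborhoods as a
# function of neighborhood half-width, computed on one refolded MCR
# concentration map `conc_map` of shape (ny, nx).
import numpy as np

def heterogeneity_curve(conc_map, max_half_width=5):
    ny, nx = conc_map.shape
    curve = []
    for h in range(1, max_half_width + 1):
        variances = [
            conc_map[i - h:i + h + 1, j - h:j + h + 1].var()
            for i in range(h, ny - h)
            for j in range(h, nx - h)
        ]
        curve.append(np.mean(variances))   # average macropixel variance
    return np.array(curve)                 # steeper decay = more uniform
```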
A completely different use of MCR scores is related to the application of superresolution strategies to hyperspectral images [69,70]. Superresolution was born in image processing to enhance the spatial detail of gray or RGB images. The concept behind it was obtaining a single image with higher spatial detail from the combination of the information contained in several images with lower spatial resolution, captured on the same surface slightly x- and/or y-shifted from one another by a subpixel motion step. After the suitable mathematical transformations, explained in detail in Refs. [71,72], a superresolved image with a pixel size equal to the subpixel motion step was obtained. Such an idea was valid for gray images and, when applied to RGB images, the superresolution step was separately applied to the red, green, and blue channels.

The superresolution concept is equally interesting for hyperspectral images to surmount the limitations of instrumental spatial resolution. However, the plain adaptation of the superresolution strategy to deal separately with the image frames coming from the different spectral channels is too computationally intensive and not viable. To solve the problem, a combination of MCR multiset analysis and the use of MCR scores for superresolution has been proposed [69]. Fig. 14 shows an example of superresolution applied to Fourier-transform infrared (FT-IR) images acquired on a HeLa cell. First of all, 36 low-resolution images, with a pixel size equal to 3.5 × 3.5 µm², were collected x- and/or y-shifted 0.6 µm from one another. These images were appended to form a multiset and MCR-ALS was applied. A single S^T matrix was obtained that defined the cell compartments very well, because it came from a high number of images with complementary information, together with an augmented C matrix, with the concentration profiles of each of the low spatial

FIGURE 13 Heterogeneity information obtained from multivariate curve resolution (MCR) maps of compounds in a pharmaceutical