<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">MS</journal-id><journal-title-group>
    <journal-title>Mechanical Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">MS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Mech. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">2191-916X</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/ms-15-87-2024</article-id><title-group><article-title>A convolutional neural-network-based diagnostic framework for industrial bearing</article-title><alt-title>A convolutional neural-network-based diagnostic framework for industrial bearing</alt-title>
      </title-group><?xmltex \runningtitle{A convolutional neural-network-based diagnostic framework for industrial bearing}?><?xmltex \runningauthor{B. Yu and C. Xie}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="no">
          <name><surname>Yu</surname><given-names>Bowen</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes">
          <name><surname>Xie</surname><given-names>Chunli</given-names></name>
          <email>xcl08@126.com</email>
        </contrib>
        <aff id="aff1"><institution>College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin 150040, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Chunli Xie (xcl08@126.com)</corresp></author-notes><pub-date><day>19</day><month>February</month><year>2024</year></pub-date>
      
      <volume>15</volume>
      <issue>1</issue>
      <fpage>87</fpage><lpage>98</lpage>
      <history>
        <date date-type="received"><day>1</day><month>October</month><year>2022</year></date>
           <date date-type="rev-recd"><day>11</day><month>November</month><year>2023</year></date>
           <date date-type="accepted"><day>22</day><month>December</month><year>2023</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2024 Bowen Yu</copyright-statement>
        <copyright-year>2024</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024.html">This article is available from https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024.html</self-uri><self-uri xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024.pdf">The full text article is available as a PDF file from https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d1e83">The problem of industrial bearing health monitoring and fault diagnosis has recently been a popular research topic. Extracting sufficient features from the input raw vibration signals and mapping them to the most likely fault labels is the essence of bearing fault diagnosis. This study proposes a novel framework for bearing defect diagnostics by merging dilated residual convolutional neural networks and attention mechanisms. In this framework, multiple parallel dilated convolutional networks can automatically learn rich fault features at each scale from vibration signals. Simultaneously, the attention approach boosts fault-related features and suppresses irrelevant ones, improving fault detection performance and generalization. According to the experimental results of two different bearing datasets, the framework achieves a higher accuracy and can accurately identify various types of faults.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Natural Science Foundation of Heilongjiang Province</funding-source>
<award-id>LH2021F002</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e95">With the advent of the era of big industrial data, mechanical equipment is becoming increasingly complex and intelligent. Industrial bearings are a basic component of mechanical equipment and also one with a high incidence of failure. Bearing failure can directly degrade the operating condition of mechanical equipment and pose significant safety risks. According to statistics, bearing failures account for 40 %–70 % of failures in electro-mechanical drive systems, resulting in substantial losses (Lessmeier et al., 2016). Therefore, real-time and accurate diagnostics of industrial bearings are critical for ensuring smooth operation and extending the equipment's life.</p>
      <p id="d1e98">Commonly used methods for industrial bearing diagnosis are based on oil pressure, infrared thermal imaging, vibroacoustic measurements, electric current, etc. (Thoppil et al., 2021). Since low-cost vibration sensors can conveniently collect a wide range of vibration fault information, vibration-signal-based diagnostic methods are the most widely adopted in health condition monitoring. The fault signals of industrial bearings are non-stationary and contain a lot of background noise (Lin et al., 2004), and defects rarely occur in isolation. It is therefore a great challenge for the diagnosis model to extract the effective fault information from the complex vibration signal while ensuring classification accuracy. Bearing fault diagnosis models usually have two major parts: feature extraction and fault classification (Chen et al., 2021). Feature extraction refers to extracting representative fault-related information from the raw data based on the technicians' signal processing knowledge and practical engineering experience; this information is usually divided into time-domain and frequency-domain information.</p>
      <p id="d1e101">A time-domain signal alone often cannot accurately express bearing fault information. It is therefore commonly transformed into the frequency or time–frequency domain using methods such as wavelet packet transforms (Yen and Lin, 2000), envelope analysis (Tsao et al., 2012), and empirical mode decomposition (Yu and Junsheng, 2006). Various machine learning algorithms are then used as classifiers on the extracted fault features, such as support vector machines (SVMs) (Yang et al., 2007), random forest methods (Roy et al., 2020), and <inline-formula><mml:math id="M1" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula> nearest neighbor (KNN) (Tian et al., 2015).</p>
      <p id="d1e111">Fault diagnosis models based on shallow machine learning techniques and manual feature extraction methods have shown excellent recognition accuracy. However, they have several clear drawbacks: (1) features must be extracted manually from the original vibration signal based on experience and signal processing knowledge, which is computationally intensive. Moreover, the diagnostic performance mainly<?pagebreak page88?> depends on the quality of feature extraction. (2) Feature extraction and classification are two independent processes, and this unsynchronized extraction and classification cannot meet the requirements of real-time diagnosis in the production system. (3) The diagnostic capability of shallow machine learning is limited in the face of massive and strongly dynamic data, making it difficult to adapt to complex working conditions (Jiao et al., 2020).</p>
      <p id="d1e115">The rapid development of deep learning has brought a new approach to bearing fault diagnosis. As a common deep learning model, the convolutional neural network (CNN) has achieved remarkable results in object recognition, image processing, and audio classification (Khan et al., 2020). Due to its multi-layer network structure, CNN has a strong adaptive feature learning ability, does not require any complicated manual extraction process, and can automatically learn fault feature representations from raw signal data.</p>
      <p id="d1e118">Zhu et al. (2019) transformed the original one-dimensional time-domain signal into a time–frequency map through a short-time Fourier transform and input it into a convolutional neural network to identify fault features. Wang et al. (2019) compared the classification performance of eight different time–frequency analysis methods on the AlexNet model.</p>
      <p id="d1e121">Mechanical equipment conditions are complicated and varied throughout the operation, and bearing failures come in various shapes and locations. As a result, the signal contains multiple scales of characteristics. Jiang et al. (2018) introduced multi-scale coarse-grained layers in CNNs to capture different granularity features by smooth shifting. Peng et al. (2020) offered a multi-branch, multi-scale CNN for learning rich and complementary defect information from wheel set bearings. Multi-scale convolutional networks have more layers and are susceptible to degradation. Liu et al. (2019) incorporated residual learning into CNNs to improve model training and prevent performance deterioration. Surendran et al. (2022) utilized a residual multi-scale CNN model (inception-resnet v2) to extract high-level fault characteristics and optimize the parameters using the sailfish algorithm.</p>
      <p id="d1e124">Motivated by the prior studies, we have developed a novel fault diagnostic framework, which incorporates multi-filter dilated CNNs, a residual convolutional neural network, and attention mechanisms. The framework enables automatic feature extraction and end-to-end detection for fault identification, enhancing CNN stability and generalization in complicated conditions. The major contributions of this paper are as follows: <list list-type="order"><list-item>
      <p id="d1e129">A multi-scale extraction module based on dilated CNNs is proposed. Dilated CNNs produce diverse receptive fields to capture different fault features by adjusting dilation rates.</p></list-item><list-item>
      <p id="d1e133">We design the residual connection module to transfer feature information between different layers, enabling data from shallow levels to flow into deeper layers and reducing information loss during transmission. Additionally, a wide convolutional kernel is used to capture long-term dependencies and mitigate noise interference.</p></list-item><list-item>
      <p id="d1e137">Multiple attention mechanisms are applied in the module. The attention module assigns different weights to captured fault features, thus enhancing representative features while suppressing irrelevant ones.</p></list-item><list-item>
      <p id="d1e141">The complete diagnostic framework is validated under various scenarios with bearing vibration signals, and the effect of different dilation rates and reduction ratios on the feature extraction capability of the model is examined. The experiments demonstrate that the framework performs well in complex situations.</p></list-item></list></p>
      <p id="d1e144">The following is the structure of this paper: Sect. 2 introduces the background knowledge of convolutional neural networks, Sect. 3 explains the diagnostic framework's components and its procedure, and Sect. 4 introduces the experimental datasets of Case Western Reserve University (CWRU) and Jiangnan University (JNU). Section 5 in this paper presents the experimental results under different tasks and their corresponding analyses. Conclusions are presented in Sect. 6.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Convolutional neural networks</title>
      <p id="d1e155">Convolutional neural networks encompass three key concepts: sparse interactions, parameter sharing, and equivariant representations (Goodfellow et al., 2016). As illustrated in Fig. 1, a typical convolutional neural network comprises convolutional, pooling, and fully connected layers.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e160">Convolutional neural network architecture.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f01.png"/>

      </fig>

<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Feature extraction</title>
      <p id="d1e176">The convolutional layer is the cornerstone of the CNN architecture. In this layer, the input data are scanned with convolutional kernels, which act as filters that detect local patterns in the data. Through the convolution operation, these filters capture hierarchical and abstract representations of the input. Activation functions map the inputs of neurons to their respective outputs; they are commonly nonlinear, enabling the network to learn complex nonlinear relationships within the data. Pooling layers simplify computation by condensing each local region of the feature map into a single value, thereby reducing the input dimensions. By executing convolution and pooling operations, a CNN systematically extracts discriminative fault-related features from vibration signals.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Classification</title>
      <p id="d1e188">After the feature extraction phase, the features refined through the convolution and pooling layers flow into<?pagebreak page89?> densely connected layers, which combine them into more abstract representations. Each neuron in these fully connected layers acts as a learned feature detector that responds to complex combinations of lower-level features. The final layer, typically equipped with the softmax activation function, transforms these features into probabilities that quantify the network's confidence in each potential class.</p>
      <p id="d1e191">Traditional models often incorporate multiple fully connected layers to capture intricate dependencies within the data. However, as convolutional neural networks grow in depth and complexity, fully connected layers lead to a surge in parameters, which increases the risk of overfitting and places higher demands on the performance of diagnostic equipment. To address this problem, Hinton et al. (2012) proposed the dropout method, which randomly omits units during training to reduce the co-adaptation between nodes and enables the network to acquire robust features that generalize better to unseen data. To further improve the model's resistance to overfitting and reduce the number of parameters in the training process, Lin et al. (2013) proposed global average pooling, which replaces the traditional fully connected layer by averaging each feature map into a single value. Figure 2 compares the operational mechanisms of global average pooling and fully connected layers.</p>
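      <p>As a rough illustration of why global average pooling reduces the parameter count, the following Python sketch compares two classifier heads. The feature-map shape (256 channels of length 128) and the 10 classes are assumed values chosen for illustration, not the configuration used in this study, and the sketch assumes that a single dense layer follows the pooling step.</p>
      <preformat preformat-type="code">
```python
# Illustrative sketch only: all sizes below are assumed, not taken from the paper.

def global_average_pooling(feature_maps):
    """Reduce each channel (a list of floats) to its mean value."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def dense_head_parameters(channels, length, classes):
    """Weights plus biases of a fully connected layer on the flattened maps."""
    return channels * length * classes + classes


def gap_head_parameters(channels, classes):
    """Weights plus biases of a dense layer applied after global average pooling."""
    return channels * classes + classes


channels, length, classes = 256, 128, 10  # hypothetical sizes
print(dense_head_parameters(channels, length, classes))  # 327690
print(gap_head_parameters(channels, classes))            # 2570
print(global_average_pooling([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]]))  # [2.0, 4.0]
```
      </preformat>
      <p>Under these assumed sizes, pooling each feature map to one value before classification shrinks the head by roughly the length of the feature map.</p>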

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e196">Comparison of fully connected layer and global average pooling.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f02.png"/>

        </fig>

<?xmltex \hack{\newpage}?>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methods</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Multi-filter dilated convolutional neural network</title>
      <p id="d1e223">The vibration signal gathered by the accelerometer is typically non-stationary, meaning that the signal's frequency component fluctuates with time and has a significant degree of uncertainty. It comprises complex feature information of various timescales and presents typical multi-scale characteristics. Meanwhile, bearing faults come in various shapes and sizes, and different types of faults produce distinct characteristic frequencies. Due to these factors, traditional convolutional neural networks with a fixed filter do not extract enough information for accurate fault diagnosis.</p>
      <p id="d1e226">We propose a multi-filter dilated convolution module that mines multi-scale information from the bearing vibration signals for feature extraction. The module's structure is depicted in Fig. 3. The module uses four parallel dilated convolution structures, each with a filter <inline-formula><mml:math id="M2" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> that slides along the vibration signal to produce multiple feature maps (Wang and Ji, 2018). The parallel structures provide several receptive fields over the vibration signal sequence, allowing each output of the local convolution stage to capture features at a different scale. The dilated convolution is defined as
            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M3" display="block"><mml:mrow><mml:mi>o</mml:mi><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>S</mml:mi></mml:munderover><mml:mi>f</mml:mi><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>⋅</mml:mo><mml:mi>s</mml:mi><mml:mo>]</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>w</mml:mi><mml:mo>[</mml:mo><mml:mi>s</mml:mi><mml:mo>]</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M4" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> is the dilation rate, <inline-formula><mml:math id="M5" display="inline"><mml:mi>S</mml:mi></mml:math></inline-formula> denotes the kernel size, and <inline-formula><mml:math display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula> indexes the positions within the kernel; the operation process of dilated convolution is depicted in Fig. 4a.</p>
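      <p>The dilated convolution described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions rather than the implementation used in this study: indices start at zero, no padding is applied, and the function names, the test signal, and the dilation rates 1, 2, 4, and 8 are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of a 1-D dilated convolution (zero-based indices, no padding).
# Names, the test signal, and the dilation rates are illustrative assumptions.

def dilated_conv1d(signal, kernel, dilation):
    """Each output sums signal[i + dilation * s] * kernel[s] over the kernel taps s."""
    span = dilation * (len(kernel) - 1)  # distance between first and last tap
    out_len = len(signal) - span
    return [
        sum(signal[i + dilation * s] * kernel[s] for s in range(len(kernel)))
        for i in range(out_len)
    ]


def receptive_field(kernel_size, dilation):
    """Number of input samples covered by one output sample."""
    return dilation * (kernel_size - 1) + 1


signal = [float(n) for n in range(10)]   # a simple ramp
kernel = [1.0, 0.0, -1.0]                # difference between first and last tap

print(dilated_conv1d(signal, kernel, 1))  # eight values of -2.0
print(dilated_conv1d(signal, kernel, 2))  # six values of -4.0
print([receptive_field(5, d) for d in (1, 2, 4, 8)])  # [5, 9, 17, 33]
```
      </preformat>
      <p>With the same kernel size, a larger dilation rate widens the receptive field without adding weights, which is what lets parallel branches respond to different timescales.</p>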

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e302">Multi-filter dilated convolutional neural network.</p></caption>
          <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f03.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e314">Dilated convolutional neural network and residual block.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f04.png"/>

        </fig>

      <p id="d1e323">Given the diverse output sizes stemming from different dilation rates, we employ padding techniques to ensure uniform output lengths. Subsequently, feature maps from distinct levels are connected along the channel dimensions (Chen and Shi, 2021). Each structure is equipped with 64 filters and a kernel size of 5, so we can get the output in Eq. (2).
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M6" display="block"><mml:mrow><mml:mi>o</mml:mi><mml:mo>=</mml:mo><mml:mfenced open="[" close="]"><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>-</mml:mo><mml:mi>h</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>∈</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mo>×</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></disp-formula>
          Because the feature maps produced by convolution differ substantially in how well they capture bearing fault features, we employ an attention mechanism to learn discriminative<?pagebreak page90?> features and disregard uninformative ones. First, the global temporal information in each feature map of length <inline-formula><mml:math id="M7" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula> is compressed into a channel descriptor using a global average pooling layer (Hu et al., 2018).
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M8" display="block"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>L</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>L</mml:mi></mml:munderover><mml:msub><mml:mi>o</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the channel-wise statistic, reflecting the global information of the <inline-formula><mml:math id="M10" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula>th feature map.</p>
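      <p>The padding step mentioned above can be made concrete with a short sketch. It assumes a stride of 1 and zero padding split as evenly as possible between both ends of the signal; the dilation rates 1, 2, 4, and 8 and the input length of 1024 are illustrative assumptions, whereas the kernel size of 5, the 64 filters per structure, and the four parallel structures follow the text.</p>
      <preformat preformat-type="code">
```python
# Sketch of length-preserving padding for dilated convolutions (stride 1 assumed).
# The dilation rates and the input length are hypothetical values.

def same_padding(kernel_size, dilation):
    """Zeros to add on the left and right so the output length equals the input length."""
    total = dilation * (kernel_size - 1)
    left = total // 2
    return left, total - left


def output_length(input_length, kernel_size, dilation):
    """Output length of a stride-1 dilated convolution after same_padding."""
    left, right = same_padding(kernel_size, dilation)
    return input_length + left + right - dilation * (kernel_size - 1)


for rate in (1, 2, 4, 8):
    print(rate, same_padding(5, rate), output_length(1024, 5, rate))

branches, filters_per_branch = 4, 64
print(branches * filters_per_branch)  # 256 channels after concatenation
```
      </preformat>
      <p>Because every branch then returns the same length, the four outputs can be stacked along the channel dimension without cropping.</p>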
      <p id="d1e451">To properly capture the correlation between channels in the channel-wise statistics, the next step is to fuse the feature map information of each channel through the fully connected layers (Ye and Yu, 2021).
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M11" display="block"><mml:mrow><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mi mathvariant="italic">δ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mi>z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M12" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> represents the ReLU activation function, <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> denote the fully connected layers, and <inline-formula><mml:math id="M15" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> is the sigmoid function, which compresses the dynamic range of the vector to [<inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>]. The channel-wise multiplication in Eq. (5) then rescales the original features along the channel dimension with the learned weights.
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M17" display="block"><mml:mrow><mml:mi>v</mml:mi><mml:mo>⋅</mml:mo><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mfenced open="[" close="]"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msub><mml:mi>s</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi>s</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:msub><mml:mi>s</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
          After extracting characteristics from each structure, they are combined across channels using concatenation.</p>
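      <p>The squeeze, excitation, and rescaling steps of Eqs. (3)–(5) can be traced on a toy example. The sketch below is illustrative only: the feature maps and the weight matrices are made-up values with a reduction ratio of 2, and bias terms are omitted to match the form of Eq. (4).</p>
      <preformat preformat-type="code">
```python
import math

# Toy sketch of channel attention following Eqs. (3)-(5).
# All numbers are hypothetical; biases are omitted as in Eq. (4).


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def channel_attention(feature_maps, f1, f2):
    """feature_maps: C lists of length L; f1 has C/r rows; f2 has C rows."""
    z = [sum(ch) / len(ch) for ch in feature_maps]      # Eq. (3): squeeze
    hidden = [relu(h) for h in matvec(f1, z)]           # first fully connected layer
    s = [sigmoid(a) for a in matvec(f2, hidden)]        # Eq. (4): excitation
    rescaled = [[s_c * x for x in ch] for s_c, ch in zip(s, feature_maps)]  # Eq. (5)
    return s, rescaled


maps = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]    # C = 4, L = 2
f1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]          # 4 channels reduced to 2
f2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]      # 2 expanded back to 4

weights, scaled = channel_attention(maps, f1, f2)
print([round(w, 3) for w in weights])
print(scaled)
```
      </preformat>
      <p>Each channel receives one weight between 0 and 1, so channels judged less relevant are attenuated rather than removed.</p>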
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Residual convolutional neural network</title>
      <p id="d1e599">The residual network introduces shortcut connections (He et al., 2016), which link earlier layers to later layers so that information can flow across distinct layers. Shortcut connections include identity and projection shortcuts, as given in Eqs. (6) and (7), respectively.

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M18" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E6"><mml:mtd><mml:mtext>6</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E7"><mml:mtd><mml:mtext>7</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub><mml:mi>x</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            where <inline-formula><mml:math id="M19" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M20" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> refer to the output and input, respectively, <inline-formula><mml:math id="M21" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> is the residual mapping, and <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> indicates the weights. The schematic diagram can be found in Fig. 4b.</p>
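      <p>The difference between the identity shortcut of Eq. (6) and the projection shortcut of Eq. (7) can be illustrated with a small sketch. For brevity it assumes that the residual mapping consists of two dense layers with a ReLU in between, standing in for the convolutional layers; all weights are made-up values.</p>
      <preformat preformat-type="code">
```python
# Sketch of identity and projection shortcuts, Eqs. (6) and (7).
# Dense layers stand in for convolutions; all weights are hypothetical.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_mapping(x, w1, w2):
    """F(x, W_i): two linear layers with a ReLU in between."""
    hidden = [max(0.0, h) for h in matvec(w1, x)]
    return matvec(w2, hidden)


def residual_identity(x, w1, w2):
    """Eq. (6): y = F(x, W_i) + x, valid when F keeps the dimension of x."""
    return [f + xi for f, xi in zip(residual_mapping(x, w1, w2), x)]


def residual_projection(x, w1, w2, ws):
    """Eq. (7): y = F(x, W_i) + W_s x, used when the dimensions differ."""
    return [f + p for f, p in zip(residual_mapping(x, w1, w2), matvec(ws, x))]


x = [1.0, -2.0]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2_same = [[1.0, 0.0], [0.0, 1.0]]               # keeps two channels
w2_wide = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # expands to three channels
ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]        # projects x to three channels

y_identity = residual_identity(x, w1, w2_same)
y_projection = residual_projection(x, w1, w2_wide, ws)
print(y_identity)    # [2.0, -2.0]
print(y_projection)  # [2.0, -2.0, 1.0]
```
      </preformat>
      <p>In both cases the input reaches the output unchanged or linearly projected, so the negative component that the ReLU suppresses inside the residual mapping is still present in the result.</p>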
      <p id="d1e711">We build a residual convolutional neural network, comprising convolutional layers with residual connections and a pooling layer, that learns to integrate feature information from the 1D signal. In the first convolutional layer, we use a wide kernel of size 32 to mitigate noise interference and capture global trend information more effectively (Liang and Zhao, 2021). The second layer employs filters with a kernel size of 7. These two layers have 128 and 256 filters, respectively, chosen to strengthen the feature extraction capability of the model.</p>
      <p id="d1e714">Due to the alteration in the number of channels across different layers, it is necessary to perform dimensional matching. This is accomplished via projection shortcuts as defined in Eq. (7), which employ a convolutional layer with a 1 <inline-formula><mml:math id="M23" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1 kernel size to facilitate this transition. Finally, applying the ReLU activation function introduces nonlinearity into the model, enhancing its capacity to model complex functions.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Discriminative enhanced module</title>
      <p id="d1e732">After the features are extracted by the multi-filter dilated and residual convolutional neural network, they are fused by element-wise addition. A discriminative enhanced module (DEM) is then applied to the extracted multi-level features to further deepen the model's ability to screen for critical features before entering the classification layer. The DEM module is shown in Fig. 5. It includes two branches: the spatial channel attention module and the channel attention module (Woo et al., 2018).</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e737">Discriminative enhanced module.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f05.png"/>

        </fig>

      <p id="d1e746">As for spatial information, we apply the convolutional operation to aggregate the compressed information from the channel dimensions (Roy et al., 2018).</p>
      <p id="d1e750">The channel attention technique produces two feature maps with complementary global information through average and maximum pooling, respectively. These two feature maps are then subjected to two separate convolution operations.
            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M24" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">out</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mfenced close="" open="("><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>⊗</mml:mo><mml:mfenced close=")" open="("><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>⊗</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mo>max⁡</mml:mo></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mfenced open="" close=")"><mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>⊗</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="italic">δ</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>⊗</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">avg</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
          where <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mo>max⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi mathvariant="normal">avg</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the global maximum pooling and global average pooling features, respectively, <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are the weights, <inline-formula><mml:math id="M29" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> is the ReLU activation, <inline-formula><mml:math id="M30" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> denotes the sigmoid activation, and <inline-formula><mml:math id="M31" display="inline"><mml:mo>⊗</mml:mo></mml:math></inline-formula> is the convolution operation.</p>
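      <p>Equation (8) can be traced on a toy input as follows. The sketch assumes that each convolution acts on a pooled descriptor of length 1, in which case it reduces to a matrix-vector product; the feature maps and the weights (with a reduction ratio of 2) are hypothetical values.</p>
      <preformat preformat-type="code">
```python
import math

# Toy sketch of the channel attention in Eq. (8).
# On length-1 pooled descriptors a convolution reduces to a matrix-vector product.
# Feature maps and weights are hypothetical.


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def shared_layers(descriptor, w1, w2):
    """Apply W1, then ReLU, then W2 to one pooled descriptor."""
    hidden = [max(0.0, h) for h in matvec(w1, descriptor)]
    return matvec(w2, hidden)


def dual_pool_channel_attention(feature_maps, w1, w2):
    """Sigmoid of the summed responses to the max- and average-pooled descriptors."""
    c_max = [max(ch) for ch in feature_maps]
    c_avg = [sum(ch) / len(ch) for ch in feature_maps]
    summed = [
        a + b
        for a, b in zip(shared_layers(c_max, w1, w2), shared_layers(c_avg, w1, w2))
    ]
    return [1.0 / (1.0 + math.exp(-t)) for t in summed]


maps = [[1.0, 3.0], [2.0, 2.0]]   # two channels of length 2
w1 = [[1.0, 1.0]]                 # reduces 2 channels to 1
w2 = [[1.0], [-1.0]]              # expands 1 back to 2

weights = dual_pool_channel_attention(maps, w1, w2)
print([round(w, 4) for w in weights])
```
      </preformat>
      <p>Sharing the same two weight matrices between the max-pooled and average-pooled branches, as written in Eq. (8), keeps the parameter count independent of the number of pooling branches.</p>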
      <p id="d1e901">In this way, the DEM adaptively reweights its input: it scores the features learned at various scales and enhances the key information.</p>
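      <p>As a minimal sketch of Eq. (8), and not the implementation used in this study, the following plain-Python example computes the channel weights for a toy feature map. It assumes that the convolutions with the shared weights act on the pooled channel vectors as 1 × 1 convolutions, which reduce to matrix products; the function names, the weight values, and the reduction ratio of 2 are illustrative assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def relu(v):
    """Element-wise ReLU (delta in Eq. 8)."""
    return [max(0.0, x) for x in v]


def sigmoid(v):
    """Element-wise sigmoid (sigma in Eq. 8)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def channel_attention(features, w1, w2):
    """Return one weight per channel, following Eq. (8).

    features: list of channels, each a list of values along the signal axis.
    w1: (C/r) x C matrix and w2: C x (C/r) matrix, shared by both branches
        (assumption: 1x1 convolutions written as matrix products).
    """
    c_max = [max(ch) for ch in features]            # global max pooling
    c_avg = [sum(ch) / len(ch) for ch in features]  # global average pooling
    branch_max = matvec(w2, relu(matvec(w1, c_max)))
    branch_avg = matvec(w2, relu(matvec(w1, c_avg)))
    return sigmoid([a + b for a, b in zip(branch_max, branch_avg)])


# Toy input: 4 channels, reduction ratio 2 (all values are illustrative).
features = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0], [-1.0, 0.0, 1.0], [2.0, 2.0, 5.0]]
w1 = [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
weights = channel_attention(features, w1, w2)
```
]]></preformat>
      <p>For this toy input, the third channel receives the largest weight and the fourth the smallest. In a trained network the two weight matrices would be learned rather than fixed by hand.</p>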
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Fault detection framework</title>
      <p id="d1e912">We present a framework for fault detection in Fig. 6. The framework adopts an end-to-end learning approach comprising four steps: signal acquisition and segmentation, model<?pagebreak page91?> architecture development, model training workflow, and fault classification.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e917">Fault detection framework.</p></caption>
          <?xmltex \igopts{width=367.040551pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f06.png"/>

        </fig>

      <p id="d1e926"><italic>Step 1: signal acquisition and segmentation.</italic> The acquisition system collects the vibration data of mechanical components in varying states. Next, the original signal is divided into segments of 1024 data points each, represented by
            <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M32" display="block"><mml:mrow><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:mfenced close="}" open="{"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>v</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Each segment of the vibration signal is labelled with a one-hot code during processing, so a bearing time-domain vibration dataset is represented as
            <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M33" display="block"><mml:mrow><mml:mfenced close="}" open="{"><mml:mrow><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:mfenced><mml:mo>,</mml:mo><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:mfenced><mml:mo>,</mml:mo><mml:mi mathvariant="normal">⋯</mml:mi><mml:mo>,</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          If too few samples are obtained, data augmentation techniques are used to increase the sample size; otherwise, augmentation is not necessary.</p>
      <p id="d1e1031"><italic>Step 2: model architecture development.</italic> Fault characteristics are extracted using multi-filter dilated and residual convolutional neural networks. The discriminative enhanced module then reweights the extracted features to emphasize the most discriminative ones. Finally, a global average pooling layer generates the feature vector, which is passed to a softmax layer that outputs the probability of each fault category:
            <disp-formula id="Ch1.E11" content-type="numbered"><label>11</label><mml:math id="M34" display="block"><mml:mrow><mml:mi mathvariant="normal">Softmax</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e1086"><italic>Step 3: model training workflow.</italic> The training data are fed to the model and propagated forward through each layer to obtain a prediction. The loss function then measures the discrepancy between the prediction and the target.<?pagebreak page92?> The error is back-propagated, and the trainable parameters are updated to minimize this loss.</p>
      <p id="d1e1091"><italic>Step 4: fault classification.</italic> The test data are fed to the trained model, which returns the fault category corresponding to each input signal.</p>
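      <p>The data handling of Step 1 and the output stage of Step 2 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the code used in the experiments: the signal is synthetic, an incomplete trailing segment is dropped, and the class index, the number of classes, and the logits are arbitrary examples.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

SEGMENT_LENGTH = 1024  # segment length stated in Step 1


def segment_signal(signal, length=SEGMENT_LENGTH):
    """Split a signal into consecutive, non-overlapping segments (Eq. 9).

    Assumption: an incomplete trailing segment is discarded.
    """
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]


def one_hot(class_index, num_classes):
    """Return a one-hot label vector for the given class index."""
    label = [0] * num_classes
    label[class_index] = 1
    return label


def softmax(logits):
    """Eq. (11); the maximum is subtracted first for numerical stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Synthetic stand-in for a measured vibration signal (hypothetical data).
signal = [math.sin(0.01 * t) for t in range(5000)]
segments = segment_signal(signal)

# Pair every segment with its label as in Eq. (10); class 3 of 10 is arbitrary.
dataset = [(seg, one_hot(3, 10)) for seg in segments]

# Step 4: the predicted category is the index of the largest probability.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```
]]></preformat>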
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Data description</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>CWRU dataset</title>
      <p id="d1e1112">The rolling bearing dataset from Case Western Reserve University (CWRU; <uri>http://csegroups.case.edu/bearingdatacenter</uri>, last access: September 2022) has been<?pagebreak page93?> used extensively and has been considered a benchmark (Smith and Randall, 2015) in recent years.</p>
      <p id="d1e1118">The vibration signals were collected from an accelerometer mounted on the drive end (DE) with a sampling frequency of 12 kHz. Three locations of failure were considered: inner-ring failure, outer-ring failure, and ball failure. Each location had three different fault diameters: 0.007, 0.014, and 0.021 in. Three fault locations times three fault diameters, together with the normal operating condition, therefore give 10 classes in total.</p>
      <p id="d1e1121">Datasets A, B, and C encompass vibration data from SKF 6205 bearings in three conditions: 1772 rpm (1 hp), 1750 rpm (2 hp), and 1730 rpm (3 hp), where bearing failures are seeded through electro-discharge machining, as shown in Table 1.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e1128">Description of CWRU dataset.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry namest="col1" nameend="col3" align="center">Load </oasis:entry>
         <oasis:entry colname="col4">Fault</oasis:entry>
         <oasis:entry colname="col5">Defect size</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry rowsep="1" namest="col1" nameend="col3" align="center">(hp, horsepower) </oasis:entry>
         <oasis:entry colname="col4">location</oasis:entry>
         <oasis:entry colname="col5">(inch)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">A</oasis:entry>
         <oasis:entry colname="col2">B</oasis:entry>
         <oasis:entry colname="col3">C</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">3</oasis:entry>
         <oasis:entry colname="col4">Normal</oasis:entry>
         <oasis:entry colname="col5">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">3</oasis:entry>
         <oasis:entry colname="col4">Inner race</oasis:entry>
         <oasis:entry colname="col5">0.007/0.014/0.021</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">3</oasis:entry>
         <oasis:entry colname="col4">Outer race</oasis:entry>
         <oasis:entry colname="col5">0.007/0.014/0.021</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">2</oasis:entry>
         <oasis:entry colname="col3">3</oasis:entry>
         <oasis:entry colname="col4">Ball</oasis:entry>
         <oasis:entry colname="col5">0.007/0.014/0.021</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{1}?></table-wrap>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>JNU dataset</title>
      <p id="d1e1273">Two types of roller bearings (N205 and NU205) were used in the Jiangnan University (JNU) bearing datasets (Li et al., 2013). The sampling frequency is 50 kHz, and the sampling duration is 20 s. The vertical vibration signals of the bearings were measured in four states: normal, defective inner ring, defective outer ring, and defective roller element. The signals were measured independently with an accelerometer, amplified through a signal conditioner, and recorded. We use the data obtained at 800 rpm; detailed information is given in Table 2.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e1279">Description of JNU dataset.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Speed</oasis:entry>
         <oasis:entry colname="col2">Fault location</oasis:entry>
         <oasis:entry colname="col3">Defect size</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">(rpm)</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">(mm)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">800</oasis:entry>
         <oasis:entry colname="col2">Normal</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">800</oasis:entry>
         <oasis:entry colname="col2">Inner race</oasis:entry>
         <oasis:entry colname="col3">0.3 <inline-formula><mml:math id="M35" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">800</oasis:entry>
         <oasis:entry colname="col2">Outer race</oasis:entry>
         <oasis:entry colname="col3">0.3 <inline-formula><mml:math id="M36" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">800</oasis:entry>
         <oasis:entry colname="col2">Roller element</oasis:entry>
         <oasis:entry colname="col3">0.5 <inline-formula><mml:math id="M37" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.15</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><?xmltex \gdef\@currentlabel{2}?></table-wrap>

</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Data augmentation</title>
      <p id="d1e1399">Training the model with a substantial volume of vibration signal data is critical to ensuring a robust model fit. To address the limited number of samples, we apply an overlapping data augmentation technique (Zhang et al., 2017), in which consecutive samples share part of their data points, as illustrated in Fig. 7. This method increases the amount of available training data and thereby improves our model's performance.</p>
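      <p>A minimal sketch of overlapping augmentation is given below for illustration. The stride of 256 points and the record length are hypothetical values chosen for the example, not settings reported in this study; only the sample length of 1024 points is taken from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def overlapping_segments(signal, length=1024, stride=256):
    """Cut windows of `length` points whose start index advances by `stride`.

    A stride smaller than `length` makes consecutive windows overlap,
    which yields more samples from the same record.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(signal) - length
    return [signal[s:s + length] for s in range(0, last_start + 1, stride)]


def segment_count(num_points, length, stride):
    """Number of windows produced for a record of `num_points` samples."""
    if num_points < length:
        return 0
    return (num_points - length) // stride + 1


record = list(range(10240))  # hypothetical record of 10 240 points

plain = overlapping_segments(record, length=1024, stride=1024)     # no overlap
augmented = overlapping_segments(record, length=1024, stride=256)  # 75 % overlap
```
]]></preformat>
      <p>With these example values, the same record yields 10 samples without overlap and 37 samples with a stride of 256 points.</p>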

      <?xmltex \floatpos{t}?><fig id="Ch1.F7"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e1404">Overlapping data augmentation.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f07.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Experimental validation and results</title>
      <p id="d1e1423">In this section, we validate various scenarios and analyze the corresponding outcomes.</p>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Experimental setting</title>
      <p id="d1e1433">The training set of A, B, and C comprises 2000 samples, with a separate validation set and test set, each containing 300 samples.</p>
      <p id="d1e1436">We use the cross-entropy loss as the loss function, defined as
            <disp-formula id="Ch1.E12" content-type="numbered"><label>12</label><mml:math id="M38" display="block"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>q</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>x</mml:mi></mml:munder><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mi>log⁡</mml:mi><mml:mi>q</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> indicates the true probability distribution and <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the predicted probability distribution.</p>
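      <p>As a small worked example of Eq. (12), the sketch below evaluates the loss for a one-hot target and two hypothetical predictions. The probability values and the clipping constant that guards against the logarithm of zero are illustrative assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Eq. (12): H(p, q) = -sum_x p(x) * log q(x).

    q is clipped at `eps` (an assumption) so that log(0) never occurs.
    """
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


target = [0, 0, 1, 0]                 # one-hot true distribution p
confident = [0.05, 0.05, 0.85, 0.05]  # hypothetical prediction q, mostly correct
uncertain = [0.40, 0.30, 0.20, 0.10]  # hypothetical prediction q, mostly wrong

loss_confident = cross_entropy(target, confident)
loss_uncertain = cross_entropy(target, uncertain)
```
]]></preformat>
      <p>For a one-hot target the sum reduces to the negative logarithm of the probability assigned to the true class, so the confident prediction gives the smaller loss.</p>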
      <p id="d1e1515">To accelerate training and reduce the risk of becoming trapped in poor local optima, we train the model with the Adam stochastic optimization algorithm, which dynamically adapts the learning rate of each parameter as the weights are updated on the training data. The dropout rate during training is 0.3.</p>
      <p id="d1e1518">The framework used for the experiments is TensorFlow 2.6.0, running on a computer with an Intel i5 11400 CPU and an RTX3060 12 GB GPU. To improve generalization, training uses TensorFlow callback functions for early stopping and an exponentially decaying learning rate schedule.</p>
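      <p>The two training aids mentioned above can be illustrated without any framework. The plain-Python sketch below mirrors the usual behaviour of an exponential-decay schedule and a patience-based early-stopping rule; it is not the TensorFlow code used in this study, and the initial learning rate, decay settings, patience, and validation losses are all hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Learning rate after `step` updates: initial_lr * decay_rate^(step / decay_steps)."""
    return initial_lr * decay_rate ** (step / decay_steps)


def early_stopping_epoch(val_losses, patience):
    """Return the epoch index at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


# Hypothetical settings: the rate shrinks by a factor of 0.9 every 1000 steps.
lr_after_1000_steps = exponential_decay(1e-3, 0.9, 1000, 1000)

# Hypothetical validation losses: no improvement after the third epoch.
stop_epoch = early_stopping_epoch([0.9, 0.6, 0.5, 0.52, 0.51, 0.53], patience=3)
```
]]></preformat>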
<sec id="Ch1.S5.SS1.SSS1">
  <label>5.1.1</label><title>The effect of dilation rates</title>
      <p id="d1e1529">If the kernel size of the filter is ks and the dilation rate is <inline-formula><mml:math id="M41" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula>, then the equivalent convolution kernel size ks<inline-formula><mml:math id="M42" display="inline"><mml:msup><mml:mi/><mml:mo>′</mml:mo></mml:msup></mml:math></inline-formula> is
              <disp-formula id="Ch1.E13" content-type="numbered"><label>13</label><mml:math id="M43" display="block"><mml:mrow><mml:msup><mml:mi mathvariant="normal">ks</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant="normal">ks</mml:mi><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="normal">ks</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
            When the dilation rate exceeds 1, the receptive field of the convolution kernel is enlarged according to Eq. (13). By configuring a set of dilation rates to establish multiple receptive fields, the module can capture signal features across a broad range of scales.</p>
      <p id="d1e1590">To find the optimal combination, we evaluated several sets of dilation rates. The set (<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>) corresponds to standard convolution, while the other sets tested are (<inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>), (<inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula>), (<inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">7</mml:mn></mml:mrow></mml:math></inline-formula>), and (<inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">9</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
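      <p>To make Eq. (13) concrete, the following sketch tabulates the equivalent kernel sizes for the dilation-rate sets listed above. The base kernel size of 3 is a hypothetical value chosen only for illustration; the kernel size actually used in the model is not implied by this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def equivalent_kernel_size(ks, d):
    """Eq. (13): equivalent kernel size for kernel size ks and dilation rate d."""
    return ks + (ks - 1) * (d - 1)


KERNEL_SIZE = 3  # hypothetical base kernel size

RATE_SETS = [
    (1, 1, 1, 1),  # standard convolution
    (1, 2, 3, 4),
    (1, 2, 4, 8),
    (1, 3, 5, 7),
    (1, 4, 6, 9),
]

# Map every set of dilation rates to its equivalent kernel sizes.
equivalent_sizes = {
    rates: [equivalent_kernel_size(KERNEL_SIZE, d) for d in rates]
    for rates in RATE_SETS
}

for rates, sizes in equivalent_sizes.items():
    print(rates, "->", sizes)
```
]]></preformat>
      <p>With a dilation rate of 1 the kernel size is unchanged, whereas larger rates widen the receptive field without adding weights.</p>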
      <?pagebreak page94?><p id="d1e1693">Validation was conducted using training set C, test set A, and test set B, while keeping all other variables constant and with an initial reduction ratio of 16. In the comparison of sets of dilation rates in Fig. 8, the combination of (<inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>) achieved the best results with an average diagnostic accuracy of 97.5 %, indicating it has the strongest feature extraction performance.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e1719">Results of different sets of dilation rates.</p></caption>
            <?xmltex \igopts{width=179.252362pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f08.png"/>

          </fig>

</sec>
<sec id="Ch1.S5.SS1.SSS2">
  <label>5.1.2</label><title>The effect of reduction ratios</title>
      <p id="d1e1736">The attention mechanism enhances the model's ability to discern fault features by adjusting weights, and the reduction ratio stands out as a crucial parameter. This ratio reduces computational complexity by decreasing the number of input channels, aiding the model in efficiently capturing inner correlations. An appropriate reduction ratio helps regulate weight ranges, concentrating on the most pertinent elements in the input and enhancing the model's expressiveness. Typically, the reduction ratio is a positive integer, with common values being 2, 4, 8, 16, etc. Comparative diagnostic results for the model with different reduction ratios are presented in Fig. 9. The optimal ratio for this diagnostic task is 4.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F9"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e1741">Results of different reduction ratios.</p></caption>
            <?xmltex \igopts{width=179.252362pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f09.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>Performance under different workloads</title>
      <p id="d1e1759">Due to production requirements and unforeseen external environmental factors, machines often operate under varying conditions, including speed, load, and temperature. These fluctuations impact the vibration frequency and amplitude of signals captured by accelerometers, leading to notable differences in signal characteristics. Consequently, these changes introduce interference, affecting the accuracy of classification. The variation in data distribution caused by these fluctuations significantly impacts the overall generalization performance of fault diagnosis models.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><?xmltex \def\figurename{Figure}?><label>Figure 10</label><caption><p id="d1e1764">Results of different workloads.</p></caption>
          <?xmltex \igopts{width=312.980315pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f10.png"/>

        </fig>

      <p id="d1e1773">A diagnostic model must therefore be robust enough to adapt to changes in operating conditions. In this section, the adaptability of the proposed method, ConvNet, DRSN-CW (Zhao et al., 2019), ResNet, and TICNN (Zhang et al., 2018) is tested under different workloads. Each model is trained on one workload and then applied to test sets from a different workload; the results are presented in Fig. 10.</p>
      <p id="d1e1777">TICNN, configured with a wide kernel of size 64, consistently achieves accuracy above 96 % for the first four workload pairs. However, its diagnostic accuracy drops sharply for C–A and C–B, to 78.06 % and 86.72 %, respectively. In contrast to TICNN, ConvNet employs small kernels in each layer and performs better only in the A–B scenario. This observation indicates that a wide kernel is instrumental in extracting crucial vibrational features from the signal while suppressing spurious feature interference.</p>
      <p id="d1e1780">DRSN-CW embeds soft thresholding within the architecture, achieving an average diagnostic accuracy of 96.78 % for the first four workload pairs. Nevertheless, it incorporates a limited number of filters, which may impede the extraction of intricate feature representations and reduce its capacity to distinguish between fault features. Under C–A and C–B, DRSN-CW achieves only slightly more than 80 % accuracy.</p>
      <p id="d1e1783">ResNet uses the same number of filters as DRSN-CW but falls behind it by over 9 % in the B–C scenario, indicating that soft thresholding helps DRSN-CW learn more robust feature representations. Our model achieves the best average diagnostic results in the multi-load domain adaptation tasks, demonstrating that it can learn more discriminative features of these defects and is more robust in complex scenarios.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F11" specific-use="star"><?xmltex \currentcnt{11}?><?xmltex \def\figurename{Figure}?><label>Figure 11</label><caption><p id="d1e1788">t-SNE visualization results.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f11.png"/>

        </fig>

      <p id="d1e1797">To intuitively observe the final classification results, we employ the t-SNE algorithm (Van Der Maaten and Hinton, 2008) to map abstract features under the A–C workload onto a two-dimensional plane, as illustrated in Fig. 11. Different colors represent the different bearing failures, visually demonstrating the degree of separation among failure characteristics. The model equipped with the DEM exhibits higher separation between classes of samples, underscoring its ability to prioritize fault-relevant information and enabling more accurate classification decisions.</p>
</sec>
<?pagebreak page95?><sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Performance with the JNU dataset</title>
      <p id="d1e1808">To assess the generalizability of the proposed model, validation experiments were performed using the JNU dataset. In order to gain a deeper understanding of the model's diagnostic capabilities, we introduced the confusion matrix to provide a detailed representation of the model's performance within each fault category.</p>
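      <p>For readers unfamiliar with the confusion matrix, the sketch below shows how such a matrix and its per-class rates can be computed. The labels and predictions are invented for illustration and are not results of this study; only the four class names follow the JNU fault categories.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows index the true class and columns the predicted class.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def row_normalize(matrix):
    """Convert counts to per-class rates so that each non-empty row sums to 1."""
    rates = []
    for row in matrix:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return rates


CLASSES = ["normal", "inner race", "outer race", "roller element"]

# Invented example data: 10 samples with two misclassifications.
y_true = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 3, 3, 3, 1]

counts = confusion_matrix(y_true, y_pred, len(CLASSES))
rates = row_normalize(counts)
```
]]></preformat>
      <p>The diagonal of the normalized matrix gives the fraction of each true class that is classified correctly, and the off-diagonal entries show which classes are confused with one another.</p>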

      <?xmltex \floatpos{t}?><fig id="Ch1.F12" specific-use="star"><?xmltex \currentcnt{12}?><?xmltex \def\figurename{Figure}?><label>Figure 12</label><caption><p id="d1e1813">Comparison of confusion matrices.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://ms.copernicus.org/articles/15/87/2024/ms-15-87-2024-f12.png"/>

        </fig>

      <p id="d1e1822"><?xmltex \hack{\newpage}?>As illustrated in Fig. 12, the four confusion matrices represent the diagnostic results of our model, TICNN, DRSN-CW, and ResNet. For outer-race faults, DRSN-CW misclassifies 12 % of the samples as roller element faults. For roller element faults, DRSN-CW confuses 2 % of the samples with inner-race faults and 8.2 % with outer-race<?pagebreak page96?> faults. TICNN accurately predicts all samples in the healthy state; however, it trails our model by 4 % in diagnosing inner-race faults and by 2 % in the other two fault scenarios. ResNet correctly predicts all inner-race and healthy-state samples. Nonetheless, its diagnostic accuracy diminishes by 8 % and 4 % for the remaining fault types.</p>
      <p id="d1e1827">Our proposed model achieves the best diagnostic performance across the bearing fault categories, correctly classifying all samples in three categories and attaining 94 % accuracy in identifying outer-race faults. The experimental outcomes show that our model performs convincingly on the JNU dataset, supporting its generalizability.</p>
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusion</title>
      <p id="d1e1839">This study offers a novel framework based on convolutional neural networks for detecting industrial bearing faults. The framework incorporates dilated convolution, a residual convolutional neural network, and attention mechanisms to obtain rich fault feature representations and adaptively enhance key information. It extracts multi-scale features from nonlinear vibration signals, overcoming the limited flexibility and feature extraction capacity of single-structure convolutional neural networks. Multiple types of experimental validation are performed on bearing datasets. The experimental results show that the proposed model considerably outperforms traditional CNNs in feature learning and classification ability.</p>
</sec>

      
      </body>
    <back><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d1e1846">The data in this study can be requested from the corresponding author.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e1852">BY conceptualized the work, decided on the methodology, and wrote the article. CX led the review and editing of the paper.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e1858">The contact author has declared that neither of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d1e1864">Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.</p>
  </notes><ack><title>Acknowledgements</title><?pagebreak page97?><p id="d1e1870">The authors would like to express their thanks for the open source data from Case Western Reserve University and Jiangnan University.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e1875">This research has been supported by the Natural Science Foundation of Heilongjiang Province (grant no. LH2021F002).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e1881">This paper was edited by Jeong Hoon Ko and reviewed by three anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><?label 1?><mixed-citation>Chen, W. and Shi, K.: Multi-scale Attention Convolutional Neural Network for time series classification, Neural Networks, 136, 126–140, <ext-link xlink:href="https://doi.org/10.1016/j.neunet.2021.01.001" ext-link-type="DOI">10.1016/j.neunet.2021.01.001</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><?label 1?><mixed-citation>Chen, X., Zhang, B., and Gao, D.: Bearing fault diagnosis base on multi-scale CNN and LSTM model, J. Intell. Manuf., 32, 971–987, <ext-link xlink:href="https://doi.org/10.1007/s10845-020-01600-2" ext-link-type="DOI">10.1007/s10845-020-01600-2</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><?label 1?><mixed-citation> Goodfellow, I., Bengio, Y., and Courville, A.: Deep learning, MIT press, ISBN 9780262035613, 2016.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><?label 1?><mixed-citation>He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image recognition, Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, Nevada, USA, 26 June–1 July 2016, IEEE, 770–778, <ext-link xlink:href="https://doi.org/10.1109/cvpr.2016.90" ext-link-type="DOI">10.1109/cvpr.2016.90</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><?label 1?><mixed-citation>Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R.: Improving neural networks by preventing co-adaptation of feature detectors, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1207.0580" ext-link-type="DOI">10.48550/arXiv.1207.0580</ext-link>, 3 July 2012.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><?label 1?><mixed-citation>Hu, J., Shen, L., and Sun, G.: Squeeze-and-excitation networks, Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, Utah, USA, 18–22 June 2018, IEEE, 7132–7141, <ext-link xlink:href="https://doi.org/10.1109/cvpr.2018.00745" ext-link-type="DOI">10.1109/cvpr.2018.00745</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><?label 1?><mixed-citation> Jiang, G., He, H., Yan, J., and Xie, P.: Multiscale convolutional neural networks for fault diagnosis of wind turbine gearbox, IEEE T. Ind. Electron., 66, 3196–3207, 2018.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><?label 1?><mixed-citation>Jiao, J., Zhao, M., Lin, J., and Liang, K.: A comprehensive review on convolutional neural network in machine fault diagnosis, Neurocomputing, 417, 36–63, <ext-link xlink:href="https://doi.org/10.1016/j.neucom.2020.07.088" ext-link-type="DOI">10.1016/j.neucom.2020.07.088</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><?label 1?><mixed-citation>Khan, A., Sohail, A., Zahoora, U., and Qureshi, A. S.: A survey of the recent architectures of deep convolutional neural networks, Artif. Intell. Rev., 53, 5455–5516, <ext-link xlink:href="https://doi.org/10.1007/s10462-020-09825-6" ext-link-type="DOI">10.1007/s10462-020-09825-6</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><?label 1?><mixed-citation>Lessmeier, C., Kimotho, J. K., Zimmer, D., and Sextro, W.: Condition monitoring of bearing damage in electromechanical drive systems by using motor current signals of electric motors: A benchmark data set for data-driven classification, PHM Society European Conference, Bilbao, Spain, 5–8 July 2016, 152–156, 2016.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><?label 1?><mixed-citation>Li, K., Ping, X., Wang, H., Chen, P., and Cao, Y.: Sequential fuzzy diagnosis method for motor roller bearing in variable operating conditions based on vibration analysis, Sensors, 13, 8013–8041, <ext-link xlink:href="https://doi.org/10.3390/s130608013" ext-link-type="DOI">10.3390/s130608013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><?label 1?><mixed-citation>Liang, H. and Zhao, X.: Rolling bearing fault diagnosis based on one-dimensional dilated convolution network with residual connection, IEEE Access, 9, 31078–31091, <ext-link xlink:href="https://doi.org/10.1109/access.2021.3059761" ext-link-type="DOI">10.1109/access.2021.3059761</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><?label 1?><mixed-citation>Lin, J., Zuo, M. J., and Fyfe, K. R.: Mechanical fault detection based on the wavelet de-noising technique, J. Vib. Acoust., 126, 9–16, <ext-link xlink:href="https://doi.org/10.1115/1.1596552" ext-link-type="DOI">10.1115/1.1596552</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><?label 1?><mixed-citation>Lin, M., Chen, Q., and Yan, S.: Network in network, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1312.4400" ext-link-type="DOI">10.48550/arXiv.1312.4400</ext-link>, 16 December 2013.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><?label 1?><mixed-citation>Liu, R., Wang, F., Yang, B., and Qin, S. J.: Multiscale kernel based residual convolutional neural network for motor fault diagnosis under nonstationary conditions, IEEE T. Ind. Inform., 16, 3797–3806, <ext-link xlink:href="https://doi.org/10.1109/tii.2019.2941868" ext-link-type="DOI">10.1109/tii.2019.2941868</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><?label 1?><mixed-citation>Peng, D., Wang, H., Liu, Z., Zhang, W., Zuo, M. J., and Chen, J.: Multibranch and multiscale CNN for fault diagnosis of wheelset bearings under strong noise and variable load condition, IEEE T. Ind. Inform., 16, 4949–4960, 2020.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><?label 1?><mixed-citation>Roy, A. G., Navab, N., and Wachinger, C.: Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks, IEEE T. Med. Imaging, 38, 540–549, 2018.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><?label 1?><mixed-citation>Roy, S. S., Dey, S., and Chatterjee, S.: Autocorrelation aided random forest classifier-based bearing fault detection framework, IEEE Sens. J., 20, 10792–10800, <ext-link xlink:href="https://doi.org/10.1109/jsen.2020.2995109" ext-link-type="DOI">10.1109/jsen.2020.2995109</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><?label 1?><mixed-citation>Smith, W. A. and Randall, R. B.: Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study, Mech. Syst. Signal Pr., 64, 100–131, <ext-link xlink:href="https://doi.org/10.1016/j.ymssp.2015.04.021" ext-link-type="DOI">10.1016/j.ymssp.2015.04.021</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><?label 1?><mixed-citation>Surendran, R., Khalaf, O. I., and Andres, C.: Deep learning based intelligent industrial fault diagnosis model, CMC-Comput. Mater. Con., 70, 6323–6338, <ext-link xlink:href="https://doi.org/10.32604/cmc.2022.021716" ext-link-type="DOI">10.32604/cmc.2022.021716</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><?label 1?><mixed-citation>Thoppil, N. M., Vasu, V., and Rao, C.: Deep learning algorithms for machinery health prognostics using time-series data: a review, J. Vib. Eng. Technol., 9, 1123–1145, <ext-link xlink:href="https://doi.org/10.1007/s42417-021-00286-x" ext-link-type="DOI">10.1007/s42417-021-00286-x</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><?label 1?><mixed-citation>Tian, J., Morillo, C., Azarian, M. H., and Pecht, M.: Motor bearing fault detection using spectral kurtosis-based feature extraction coupled with <inline-formula><mml:math id="M50" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-nearest neighbor distance analysis, IEEE T. Ind. Electron., 63, 1793–1803, <ext-link xlink:href="https://doi.org/10.1109/tie.2015.2509913" ext-link-type="DOI">10.1109/tie.2015.2509913</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><?label 1?><mixed-citation>Tsao, W.-C., Li, Y.-F., Du Le, D., and Pan, M.-C.: An insight concept to select appropriate IMFs for envelope analysis of bearing fault diagnosis, Measurement, 45, 1489–1498, <ext-link xlink:href="https://doi.org/10.1016/j.measurement.2012.02.030" ext-link-type="DOI">10.1016/j.measurement.2012.02.030</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><?label 1?><mixed-citation>Van der Maaten, L. and Hinton, G.: Visualizing data using t-SNE, J. Mach. Learn. Res., 9, 2579–2605, 2008.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><?label 1?><mixed-citation>Wang, J., Mo, Z., Zhang, H., and Miao, Q.: A deep learning method for bearing fault diagnosis based on time-frequency image, IEEE Access, 7, 42373–42383, <ext-link xlink:href="https://doi.org/10.1109/access.2019.2907131" ext-link-type="DOI">10.1109/access.2019.2907131</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><?label 1?><mixed-citation>Wang, Z. and Ji, S.: Smoothed dilated convolutions for improved dense prediction, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, London, United Kingdom, 19–23 August 2018, ACM, 2486–2495, <ext-link xlink:href="https://doi.org/10.1145/3219819.3219944" ext-link-type="DOI">10.1145/3219819.3219944</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><?label 1?><mixed-citation>Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S.: CBAM: Convolutional block attention module, Proceedings of the European conference<?pagebreak page98?> on computer vision (ECCV), Munich, Germany, 8–14 September 2018, Springer, 3–19, <ext-link xlink:href="https://doi.org/10.1007/978-3-030-01234-2_1" ext-link-type="DOI">10.1007/978-3-030-01234-2_1</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><?label 1?><mixed-citation>Yang, Y., Yu, D., and Cheng, J.: A fault diagnosis approach for roller bearing based on IMF envelope spectrum and SVM, Measurement, 40, 943–950, <ext-link xlink:href="https://doi.org/10.1016/j.measurement.2006.10.010" ext-link-type="DOI">10.1016/j.measurement.2006.10.010</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><?label 1?><mixed-citation>Ye, Z. and Yu, J.: AKSNet: A novel convolutional neural network with adaptive kernel width and sparse regularization for machinery fault diagnosis, J. Manuf. Syst., 59, 467–480, <ext-link xlink:href="https://doi.org/10.1016/j.jmsy.2021.03.022" ext-link-type="DOI">10.1016/j.jmsy.2021.03.022</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><?label 1?><mixed-citation>Yen, G. G. and Lin, K.-C.: Wavelet packet feature extraction for vibration monitoring, IEEE T. Ind. Electron., 47, 650–667, <ext-link xlink:href="https://doi.org/10.1109/ijcnn.1999.836202" ext-link-type="DOI">10.1109/ijcnn.1999.836202</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><?label 1?><mixed-citation>Yu, Y. and Junsheng, C.: A roller bearing fault diagnosis method based on EMD energy entropy and ANN, J. Sound Vib., 294, 269–277, <ext-link xlink:href="https://doi.org/10.1016/j.jsv.2005.11.002" ext-link-type="DOI">10.1016/j.jsv.2005.11.002</ext-link>, 2006.</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib32"><label>32</label><?label 1?><mixed-citation>Zhang, W., Peng, G., Li, C., Chen, Y., and Zhang, Z.: A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals, Sensors, 17, 425, <ext-link xlink:href="https://doi.org/10.3390/s17020425" ext-link-type="DOI">10.3390/s17020425</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><?label 1?><mixed-citation>Zhang, W., Li, C., Peng, G., Chen, Y., and Zhang, Z.: A deep convolutional neural network with new training methods for bearing fault diagnosis under noisy environment and different working load, Mech. Syst. Signal Pr., 100, 439–453, <ext-link xlink:href="https://doi.org/10.1016/j.ymssp.2017.06.022" ext-link-type="DOI">10.1016/j.ymssp.2017.06.022</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><?label 1?><mixed-citation>Zhao, M., Zhong, S., Fu, X., Tang, B., and Pecht, M.: Deep residual shrinkage networks for fault diagnosis, IEEE T. Ind. Inform., 16, 4681–4690, <ext-link xlink:href="https://doi.org/10.1109/tii.2019.2943898" ext-link-type="DOI">10.1109/tii.2019.2943898</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><?label 1?><mixed-citation>Zhu, Z., Peng, G., Chen, Y., and Gao, H.: A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis, Neurocomputing, 323, 62–75, <ext-link xlink:href="https://doi.org/10.1016/j.neucom.2018.09.050" ext-link-type="DOI">10.1016/j.neucom.2018.09.050</ext-link>, 2019.</mixed-citation></ref>

  </ref-list></back>
</article>
