Published Online

Deep Audio-visual Learning: A Survey
Hao Zhu, Man-Di Luo, Rui Wang, Ai-Hua Zheng, Ran He
doi: 10.1007/s11633-021-1293-0
Audio-visual learning, aimed at exploiting the relationship between audio and visual modalities, has drawn considerable attention since deep learning started to be used successfully. Researchers tend to leverage these two modalities to improve the performance of previously considered single-modality tasks or address new challenging problems. In this paper, we provide a comprehensive survey of recent audio-visual learning development. We divide the current audio-visual learning tasks into four different subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and challenges.
Advances in Deep Learning Methods for Visual Tracking: Literature Review and Fundamentals
Xiao-Qin Zhang, Run-Hua Jiang, Chen-Xiang Fan, Tian-Yu Tong, Tao Wang, Peng-Cheng Huang
doi: 10.1007/s11633-020-1274-8
Recently, deep learning has achieved great success in visual tracking tasks, particularly in single-object tracking. This paper provides a comprehensive review of state-of-the-art single-object tracking algorithms based on deep learning. First, we introduce basic knowledge of deep visual tracking, including fundamental concepts, existing algorithms, and previous reviews. Second, we briefly review existing deep learning methods by categorizing them into data-invariant and data-adaptive methods, based on whether they can dynamically change their model parameters or architectures. Then, we summarize the general components of deep trackers, which allows us to systematically analyze the novelties of several recently proposed deep trackers. Thereafter, popular datasets such as Object Tracking Benchmark (OTB) and Visual Object Tracking (VOT) are discussed, along with the performance of several deep trackers. Finally, based on observations and experimental results, we discuss three characteristics of deep trackers, i.e., the relationships between their general components, the exploration of more effective tracking frameworks, and the interpretability of their motion estimation components.
A Comprehensive Review of Group Activity Recognition in Videos
Li-Fang Wu, Qi Wang, Meng Jian, Yu Qiao, Bo-Xuan Zhao
doi: 10.1007/s11633-020-1258-8
Human group activity recognition (GAR) has attracted significant attention from computer vision researchers due to its wide practical applications in security surveillance, social role understanding, and sports video analysis. In this paper, we give a comprehensive overview of the advances in group activity recognition in videos during the past 20 years. First, we summarize and compare 11 GAR video datasets. Second, we survey group activity recognition methods, including those based on handcrafted features and those based on deep learning networks. To better understand the pros and cons of these methods, we compare various models from the past to the present. Finally, we outline several challenging issues and possible directions for future research. This review gives readers an overview of progress in group activity recognition to support future studies.
Research Article
Design and Analysis of a Novel 2T2R Parallel Mechanism with the Closed-loop Limbs
Hai-Rong Fang, Peng-Fei Liu, Hui Yang, Bing-Shan Jiang
doi: 10.1007/s11633-021-1294-z
This paper presents a novel four-degree-of-freedom (DOF) parallel mechanism with closed-loop limbs, which has two translational (2T) DOF and two rotational (2R) DOF. By connecting the proposed parallel mechanism in series with a guide rail, a 5-DOF hybrid robot system is obtained, which can be applied to composite material tape laying in the aerospace industry. The analysis in this paper focuses mainly on the parallel module of the hybrid robot system. First, the DOF of the proposed parallel mechanism are calculated based on screw theory. Then, the inverse kinematics and Jacobian matrix of the parallel mechanism are derived from the closed-loop vector equation. Next, the workspace, stiffness, and dexterity of the parallel mechanism are analyzed based on the constraint equations, the static stiffness matrix, and the Jacobian condition number. Finally, the correctness of the inverse kinematics and the high stiffness of the parallel mechanism are verified by kinematic and stiffness simulations, which lays a foundation for automatic composite material tape laying.
2D and 3D Palmprint and Palm Vein Recognition Based on Neural Architecture Search
Wei Jia, Wei Xia, Yang Zhao, Hai Min, Yan-Xiang Chen
doi: 10.1007/s11633-021-1292-1
Palmprint recognition and palm vein recognition are two emerging biometric technologies. In the past two decades, many traditional methods have been proposed for palmprint recognition and palm vein recognition and have achieved impressive results. In recent years, deep learning has gradually become the mainstream recognition technology in artificial intelligence because of its excellent recognition performance. Some researchers have tried to use convolutional neural networks (CNNs) for palmprint recognition and palm vein recognition. However, the architectures of these CNNs have mostly been designed manually by human experts, which is a time-consuming and error-prone process. To overcome the shortcomings of manually designed CNNs, neural architecture search (NAS) has become an important research direction in deep learning. The significance of NAS is that it addresses the parameter adjustment problem of deep learning models, a cross-disciplinary problem combining optimization and machine learning. NAS technology represents the future development direction of deep learning. However, NAS has not yet been well studied for palmprint recognition and palm vein recognition. In this paper, to investigate NAS-based 2D and 3D palmprint recognition and palm vein recognition in depth, we evaluate the performance of twenty representative NAS methods on five 2D palmprint databases, two palm vein databases, and one 3D palmprint database. Experimental results show that some NAS methods can achieve promising recognition results. Notably, among the evaluated NAS methods, ProxylessNAS achieves the best recognition performance.
Low-cost Position and Force Measurement System for Payload Transport Using UAVs
Daniel Ceferino Gandolfo, Claudio D. Rosales, Lucio R. Salinas, J. Gimenez, Ricardo Carelli
doi: 10.1007/s11633-021-1281-4
In recent years, multiple applications have emerged in the area of payload transport using unmanned aerial vehicles (UAVs). This has attracted considerable interest in the scientific community, especially for cases involving one or several rotary-wing UAVs. In this context, this work proposes a novel measurement system that can estimate the payload position and the force the payload exerts on the UAV. The measurement system is low-cost, easy to implement, and can be used in either indoor or outdoor environments (no sensorized laboratory is needed). The system is validated statically and dynamically. In the first test, the estimates obtained by the system are compared with measurements produced by high-precision devices. In the second test, the system is used in real experiments, and its performance is compared with that of known procedures. These experiments allowed us to draw conclusions on which future research can be based.
Fuzzy Tuned PID Controller for Envisioned Agricultural Manipulator
Satyam Paul, Ajay Arunachalam, Davood Khodadad, Henrik Andreasson, Olena Rubanenko
doi: 10.1007/s11633-021-1280-5
The implementation of image-based phenotyping systems has become an important aspect of crop and plant science research, which has shown tremendous growth over the years. Accurately determining features from images requires stable imaging and very precise processing. When a camera is mounted on a motor-driven mechanical arm, maintaining accuracy and stability becomes non-trivial. In particular, external camera shake caused by vibration, which may be induced by the manipulator's driving motor, is a major concern when capturing accurate images. Therefore, a stable active controller is required to sufficiently attenuate the vibration of the manipulator. However, there are very few reports of control algorithms being used in agricultural practice. Although many control strategies have been used to control vibration in manipulators for various applications, no control strategy with validated stability has been provided to control vibration in such an envisioned agricultural manipulator using simple, low-cost hardware while compensating for non-linearities. Therefore, in this work, proportional-integral-derivative (PID) control combined with type-2 fuzzy logic (T2-F-PID) is implemented for vibration control. The stability of the controller is established using Lyapunov analysis. A torsional actuator (TA) is applied to mitigate torsional vibration, which is a new contribution in the area of agricultural manipulators. Finally, to demonstrate the effectiveness of the controller, the vibration attenuation achieved with T2-F-PID is compared with that of conventional PD/PID controllers and a type-1 fuzzy PID (T1-F-PID) controller.
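The PID part of the controller described above can be illustrated with a minimal discrete-time PID update. This is only a sketch under stated assumptions: the gains `kp`, `ki`, `kd` and the sample time `dt` are fixed, hypothetical values, and the type-2 fuzzy gain tuning and the torsional actuator, which are the paper's actual contributions, are not modelled.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Textbook discrete PID controller with fixed gains (no fuzzy tuning)."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float) -> float:
        # Rectangular integration of the error over one sample period.
        self.integral += error * self.dt
        # Backward-difference approximation of the error derivative.
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; in a T2-F-PID scheme a fuzzy system would adapt them online.
pid = DiscretePID(kp=2.0, ki=1.0, kd=0.5, dt=0.1)
u = pid.update(1.0)  # approximately 2*1 + 1*0.1 + 0.5*10 = 7.1
print(u)
```

The large first output comes from the derivative term reacting to the step in error; in a fuzzy-tuned variant, the fuzzy system would typically adjust `kp`, `ki`, and `kd` to limit such effects.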
Fault Information Recognition for On-board Equipment of High-speed Railway Based on Multi-Neural Network Collaboration
Lu-Jie Zhou, Jian-Wu Dang, Zhen-Hai Zhang
doi: 10.1007/s11633-021-1298-8
Efficiently compiling statistics on the fault information of high-speed railway on-board equipment is of great significance, as it also improves the efficiency of fault analysis. Against this background, this paper presents an empirical exploration of named entity recognition (NER) for on-board equipment fault information. Based on the historical fault records of on-board equipment, a fault information recognition model based on multi-neural network collaboration is proposed. First, considering the characteristics of the recorded Chinese data, a method of constructing semantic features and additional features at character granularity is proposed. Then, the two feature representations are concatenated and passed into a gated convolutional layer to extract dependencies from multiple different subspaces and adjacent characters in parallel. Next, the local features are fed into a bidirectional long short-term memory (BiLSTM) network to learn long-term dependency information. On top of the BiLSTM, a sequential conditional random field (CRF) is used to jointly decode the optimal tag sequence of the whole sentence. The model is tested and compared with other representative baseline models. The results show that the proposed model not only accounts for the language characteristics of on-board fault records but also has clear advantages in fault information recognition performance.
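The final CRF step, jointly decoding the best tag sequence for a whole sentence, is conventionally done with Viterbi decoding. The sketch below assumes that per-character emission scores (for example, from a BiLSTM) and a tag-transition score matrix are already available as plain lists; the tag set (0 = O, 1 = B, 2 = I) and all scores in the example are hypothetical.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]: score of tag j at position t (e.g., a BiLSTM output).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # best[j]: best score of any tag path ending in tag j at the current position.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the score of arriving in tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical scores; the O -> I transition is strongly penalized.
emissions = [[0.0, 2.0, 0.0], [0.0, 0.0, 1.5], [1.0, 0.0, 0.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # ([1, 2, 0], 5.5), i.e. B I O
```

Decoding the whole sequence jointly is what lets the transition scores rule out label sequences, such as O followed by I, that per-character classification alone could produce.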
Identification and Classification of Driving Behaviour at Signalized Intersections Using Support Vector Machine
Soni Lanka Karri, Liyanage Chandratilak De Silva, Daphne Teck Ching Lai, Shiaw Yin Yong
doi: 10.1007/s11633-021-1295-y
When drivers approach a signalized intersection at the onset of the yellow signal, they enter a zone in which they are uncertain, assessing whether they are able to stop or cross the intersection. Any improper decision might therefore lead to a right-angle or back-end crash. To avoid a right-angle collision, drivers apply harsh braking to stop just before the intersection. However, this may lead to a back-end crash when the following driver encounters the leading driver's sudden stop. The situation becomes more complex when the traffic is heterogeneous, containing various types of vehicles. To mitigate this issue, this study's primary objective is to identify driving behaviour at signalized intersections based on driving features (parameters). The secondary objective is to classify the outcome of driving behaviour (safe stopping and unsafe stopping) at the signalized intersection using a support vector machine (SVM). Turning moments are used to identify the zones and label them accordingly for further classification. Using 50 instances split by a 70%−30% rule for training and testing, the classification achieved accuracies of 85% and 86%, respectively. Classification performance was further verified by random sampling using five-fold cross-validation with 30 iterations, which gave accuracies of 97% and 100% for training and testing, respectively. These results demonstrate that the proposed approach can help develop a pre-warning system to alert drivers approaching signalized intersections, thus reducing back-end crashes and other accidents.
A Novel Heterogeneous Actor-Critic Algorithm with Recent Emphasizing Replay Memory
Bao Xi, Rui Wang, Ying-Hao Cai, Tao Lu, Shuo Wang
doi: 10.1007/s11633-021-1296-x
Reinforcement learning (RL) algorithms have been shown to solve a variety of continuous control tasks. However, their training efficiency and performance limit further applications. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm, which contains a soft Q-function and an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, and the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods. We propose a new sampling technique that emphasizes recently experienced transitions to boost policy training. In addition, we integrate HAC with hindsight experience replay (HER) to deal with sparse-reward tasks, which are common in the robotic manipulation domain. Finally, we evaluate our methods on a series of continuous control benchmark tasks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in terms of training efficiency and performance, which validates its effectiveness.
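One simple way to emphasize recently experienced transitions is to draw each minibatch uniformly from only the newest part of the replay buffer, with that window shrinking over the updates of a training phase. The sketch below assumes this particular geometric schedule; the paper's own sampling rule may differ, and `eta`, `c_min`, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def recent_window_size(buffer_len: int, k: int, num_updates: int,
                       eta: float = 0.996, c_min: int = 2500) -> int:
    """Size of the most-recent window used for the k-th of num_updates updates.

    The window shrinks geometrically with k, so later updates in a phase
    concentrate on newer data (hypothetical schedule), but it never drops
    below c_min transitions (or the whole buffer, if that is smaller).
    """
    c_k = int(buffer_len * eta ** (k * 1000 / num_updates))
    return max(min(c_k, buffer_len), min(c_min, buffer_len))


def sample_recent(buffer: Sequence[T], batch_size: int, k: int,
                  num_updates: int, rng: random.Random) -> List[T]:
    """Uniformly sample a minibatch from the newest transitions only."""
    if not buffer:
        raise ValueError("cannot sample from an empty buffer")
    window = recent_window_size(len(buffer), k, num_updates)
    start = len(buffer) - window  # buffer is assumed to be ordered oldest-first
    return [buffer[rng.randrange(start, len(buffer))] for _ in range(batch_size)]


rng = random.Random(0)
buffer = list(range(10_000))  # stand-ins for stored transitions
early = sample_recent(buffer, 64, k=0, num_updates=100, rng=rng)    # whole buffer
late = sample_recent(buffer, 64, k=100, num_updates=100, rng=rng)   # newest 2500 only
```

Keeping a floor of `c_min` transitions is a design choice that stops the sampler from overfitting to a handful of the very latest transitions while still shifting probability mass toward recent experience.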
DLA+: A Light Aggregation Network for Object Classification and Detection
Fu-Tian Wang, Li Yang, Jin Tang, Si-Bao Chen, Xin Wang
doi: 10.1007/s11633-021-1287-y
An efficient convolutional neural network (CNN) plays a crucial role in various visual tasks such as object classification and detection. The most common way to construct a CNN is to stack identical convolution blocks or use complex connections. These approaches may be effective, but they cause explosive growth in parameter size and computation (Comp). We therefore present a novel architecture called “DLA+”, which aggregates features from different stages and, with a newly designed convolution block, achieves better accuracy while reducing computation sixfold compared with the baseline. We conduct classification and object detection experiments. On the CIFAR10 and VOC datasets, DLA+ achieves better precision and faster speed than other architectures. The lightweight network can even be deployed on low-performance devices such as drones and laptops.
PokerNet: Expanding Features Cheaply via Depthwise Convolutions
Wei Tang, Yan Huang, Liang Wang
doi: 10.1007/s11633-021-1288-x
Pointwise convolution is commonly used to expand or squeeze features in modern lightweight deep models. However, it accounts for most of the overall computational cost (usually more than 90%). This paper proposes a novel Poker module that expands features by taking advantage of cheap depthwise convolution. As a result, the Poker module can greatly reduce the computational cost while generating a large number of effective features to guarantee performance. The proposed module is standardized and can be employed wherever feature expansion is needed. By varying the stride and the number of channels, different kinds of bottlenecks are designed to plug the Poker module into the network, so a lightweight model can be easily assembled. Experiments conducted on benchmarks reveal the effectiveness of the proposed Poker module. Our PokerNet models reduce the computational cost by 7.1%−15.6% and achieve comparable or even higher recognition accuracy than previous state-of-the-art (SOTA) models on the ImageNet ILSVRC2012 classification dataset. Code is available at
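The claim that pointwise convolution dominates the cost of lightweight blocks can be checked by counting multiply-accumulate operations (MACs). This sketch assumes a depthwise-separable layer (a k×k depthwise convolution followed by a 1×1 pointwise convolution) with stride 1 and "same" padding; the layer sizes are hypothetical, and the Poker module itself is not modelled.

```python
def separable_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int = 3):
    """MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv (stride 1)."""
    depthwise = h * w * c_in * k * k      # one k x k filter per input channel
    pointwise = h * w * c_in * c_out      # dense 1 x 1 mixing across channels
    return depthwise, pointwise


dw, pw = separable_conv_macs(h=14, w=14, c_in=256, c_out=256)
share = pw / (dw + pw)
print(f"pointwise share: {share:.1%}")  # 256 / (9 + 256), printed as 96.6%
```

Because the pointwise term scales with c_in·c_out while the depthwise term scales with c_in·k², the pointwise share grows with channel width, which is why generating expanded features with cheap depthwise operations can cut overall cost.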
Fault Classification for On-board Equipment of High-speed Railway Based on Attention Capsule Network
Lu-Jie Zhou, Jian-Wu Dang, Zhen-Hai Zhang
doi: 10.1007/s11633-021-1291-2
Conventional troubleshooting methods for high-speed railway on-board equipment rely heavily on personnel experience and are therefore one-sided and inefficient. During high-speed train operation, on-board computers record numerous text-based on-board logs. Machine learning methods can help technicians correctly determine fault types from these logs. Therefore, a fault classification model for on-board equipment based on attention capsule networks is proposed. This paper presents an empirical exploration of applying a capsule network with dynamic routing to fault classification. A capsule network can encode the internal spatial part-whole relationships between various entities to identify fault types. Because the importance of each word in the on-board log and the dependencies between words have a significant impact on fault classification, an attention mechanism is incorporated into the capsule network to distill important information. Considering the imbalanced distribution of normal data and fault data in the on-board log, the focal loss function is introduced into the model to handle the class imbalance. Experiments are conducted on the on-board log of a railway bureau, and the model is compared with other baseline models. The experimental results demonstrate that our model outperforms the compared baseline methods, confirming its superiority and competitiveness.
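For reference, the focal loss in its commonly used binary form is FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The sketch below implements that standard definition for a single example; the paper applies focal loss inside a multi-class capsule network, so the binary setting and the default `alpha` and `gamma` values here are assumptions.

```python
import math


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one example.

    p: predicted probability of the positive class (e.g., "fault").
    y: true label, 1 for positive and 0 for negative.
    """
    eps = 1e-12
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# An easy example (p_t = 0.9) contributes far less than a hard one (p_t = 0.6).
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

With `gamma = 0` the expression reduces to α-weighted cross-entropy; increasing `gamma` shifts the training signal toward hard, often minority-class, examples such as rare fault logs.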
Skill Learning for Robotic Insertion Based on One-shot Demonstration and Reinforcement Learning
Ying Li, De Xu
doi: 10.1007/s11633-021-1290-3
In this paper, an efficient skill learning framework based on one-shot demonstration and reinforcement learning is proposed for robotic insertion. First, the robot action is composed of two parts: an expert action and a refinement action. A force Jacobian matrix is calibrated with only one demonstration, based on which stable and safe expert actions can be generated. The deep deterministic policy gradient (DDPG) method is employed to learn the refinement action, which aims to improve assembly efficiency. Second, an episode-step exploration strategy is developed, which uses the expert action as a benchmark and adjusts the exploration intensity dynamically. A safety-efficiency reward function is designed for compliant insertion. Third, to improve adaptability to different components, a skill saving and selection mechanism is proposed. Several typical components are used to train the skill models, and the trained models and force Jacobian matrices are saved in a skill pool. Given a new component, the most appropriate model is selected from the skill pool according to the force Jacobian matrix and used directly to accomplish insertion tasks. Fourth, a simulation environment is established under the guidance of the force Jacobian matrix, which avoids a tedious training process on real robotic systems. Simulations and experiments are conducted to validate the effectiveness of the proposed methods.
Designing an Intelligent Control Philosophy in Reservoirs of Water Transfer Networks in Supervisory Control and Data Acquisition System Stations
Ali Dolatshahi Zand, Kaveh Khalili-Damghani, Sadigh Raissi
doi: 10.1007/s11633-021-1284-1
In this paper, a hybrid neural-genetic fuzzy system is proposed to control the flow and height of water in the reservoirs of water transfer networks. These controls prevent potential water waste in the reservoirs and pressure drops in water distribution networks. The proposed approach combines an artificial neural network, a genetic algorithm, and a fuzzy inference system to improve the performance of supervisory control and data acquisition stations through a new control philosophy for the instruments and control valves in the reservoirs of water transfer networks. First, a multi-core artificial neural network model, including a multi-layer perceptron and a radial basis function network, is proposed to forecast the daily water consumption of a reservoir. A genetic algorithm is used to optimize the parameters of the artificial neural networks. Then, the online water height in the reservoir and the output of the artificial neural networks are used as inputs to a fuzzy inference system that estimates the reservoir inlet flow rate. Finally, the estimated inlet flow is translated into the input valve position using a transform control unit supported by a nonlinear autoregressive exogenous model. The proposed approach is applied to the Tehran water transfer network. The results show that the proposed approach significantly reduces the deviation of the reservoir height from the desired levels.
STRNet: Triple-stream Spatiotemporal Relation Network for Action Recognition
Zhi-Wei Xu, Xiao-Jun Wu, Josef Kittler
doi: 10.1007/s11633-021-1289-9
Learning comprehensive spatiotemporal features is crucial for human action recognition. Existing methods tend to model spatiotemporal feature blocks in an integrate-separate-integrate form, such as the appearance-and-relation network (ARTNet) and the spatiotemporal and motion network (STM). However, as blocks are stacked, the rear part of the network becomes poorly interpretable. To avoid this problem, we propose a novel architecture called the spatial temporal relation network (STRNet), which can learn explicit appearance, motion, and especially temporal relation information. Specifically, STRNet consists of three branches that separate the features into 1) an appearance pathway, to obtain spatial semantics; 2) a motion pathway, to reinforce the spatiotemporal feature representation; and 3) a relation pathway, to capture temporal relation details of successive frames and explore long-term representation dependency. In addition, rather than simply merging the multi-branch information, STRNet applies a flexible and effective strategy to fuse the complementary information from multiple pathways. We evaluate our network on four major action recognition benchmarks: Kinetics-400, UCF-101, HMDB-51, and Something-Something v1. STRNet achieves state-of-the-art results on the UCF-101 and HMDB-51 datasets, as well as accuracy comparable to the state-of-the-art methods on Something-Something v1 and Kinetics-400.
EDT Method for Multiple Labelled Objects Subject to Tied Distances
Andre Marasca, Andre Backes, Fabio Favarim, Marcelo Teixeira, Dalcimar Casanova
doi: 10.1007/s11633-021-1285-0
The success of new scientific areas can be assessed by their potential to contribute new theoretical approaches aligned with real-world applications. The Euclidean distance transform (EDT) has fared well on both counts, providing a sound theoretical basis for a number of applications, such as the medial axis transform, fractal analysis, skeletonization, and Voronoi diagrams. Despite its wide applicability, the discrete form of the EDT has interesting properties that have not yet been fully exploited in the literature. In this paper, we are particularly interested in the properties of 1) working with multiple objects/labels and 2) identifying and counting equidistant pixels/voxels from certain points of interest. In some domains (such as dataset classification, texture, and complexity analysis), the result of applying the EDT to multiple objects, together with their tied distances, may compromise performance. In this sense, we propose an efficient modification of the method presented in [1], which leads to a novel approach for computing the distance transform in a space with multiple objects and for counting equidistant pixels/voxels.
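To make the two properties concrete, the following brute-force reference computes, for every pixel of a labelled 2D grid, the Euclidean distance to the nearest object pixel and the set of labels attaining that distance, then counts pixels tied between two or more different labels. It runs in O(N·M) time for N pixels and M object pixels and is only a correctness baseline under these assumptions; it is not the efficient method of [1] or of this paper, and the tie definition used here (equidistant from different labels) is one possible reading.

```python
import math
from typing import List, Set, Tuple

Grid = List[List[int]]  # 0 = background, positive integer = object label


def multilabel_edt(grid: Grid) -> Tuple[List[List[float]], List[List[Set[int]]]]:
    """Brute-force EDT over a grid containing several labelled objects.

    Returns, for every pixel, the distance to the nearest object pixel and
    the set of labels that attain that distance (more than one means a tie).
    """
    seeds = [(r, c, v) for r, row in enumerate(grid)
             for c, v in enumerate(row) if v != 0]
    dist, owners = [], []
    for r, row in enumerate(grid):
        d_row, o_row = [], []
        for c, _ in enumerate(row):
            # Compare squared distances: they are integers, so ties are exact.
            best_sq, labels = None, set()
            for sr, sc, v in seeds:
                sq = (r - sr) ** 2 + (c - sc) ** 2
                if best_sq is None or sq < best_sq:
                    best_sq, labels = sq, {v}
                elif sq == best_sq:
                    labels.add(v)
            d_row.append(math.sqrt(best_sq) if best_sq is not None else math.inf)
            o_row.append(labels)
        dist.append(d_row)
        owners.append(o_row)
    return dist, owners


def count_tied_pixels(owners: List[List[Set[int]]]) -> int:
    """Number of pixels equidistant from two or more different labels."""
    return sum(1 for row in owners for labels in row if len(labels) > 1)


dist, owners = multilabel_edt([[1, 0, 0, 0, 2]])
print(dist[0][2], owners[0][2], count_tied_pixels(owners))  # 2.0 {1, 2} 1
```

Comparing integer squared distances, rather than floating-point square roots, is what makes tie detection exact; an efficient EDT that tracks ties would need the same care.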
Research on Voiceprint Recognition of Camouflage Voice Based on Deep Belief Network
Nan Jiang, Ting Liu
doi: 10.1007/s11633-021-1283-2
The problem of disguised voice recognition based on deep belief networks is studied. A hybrid feature extraction algorithm based on formants, Gammatone frequency cepstral coefficients (GFCC), and their differential coefficients is proposed to extract more discriminative speaker features from the original voice data. Using the mixed features as the model input, a disguised voice library is constructed. A disguised voice recognition model based on a deep belief network is proposed, and a dropout strategy is introduced to prevent overfitting. The model effectively addresses the problems of traditional Gaussian mixture models, such as insufficient modeling ability and low discrimination. Experimental results show that the proposed disguised voice recognition method fits the feature distribution better and significantly improves the classification performance and recognition rate.
Robust Optimal Higher-order-observer-based Dynamic Sliding Mode Control for VTOL Unmanned Aerial Vehicles
Yashar Mousavi, Amin Zarei, Arash Mousavi, Mohsen Biari
doi: 10.1007/s11633-021-1282-3
This paper investigates precise trajectory tracking of unmanned aerial vehicles (UAVs) capable of vertical take-off and landing (VTOL) subject to external disturbances. To this end, a robust higher-order-observer-based dynamic sliding mode controller (HOB-DSMC) is developed and optimized using the fractional-order firefly algorithm (FOFA). In the proposed scheme, the sliding surface is defined as a function of output variables, and the higher-order observer is used to estimate the unmeasured variables, which effectively alleviates the undesirable effects of the chattering phenomenon. A neighboring point close to the sliding surface is considered, and as the tracking error approaches this point, the second control is activated to reduce the control input. The stability of the closed-loop system is analyzed based on the Lyapunov stability theorem. To further evaluate the proposed scheme, various trajectory tracking tests are provided, in which accurate tracking and strong robustness are ensured simultaneously. Comparative simulation results validate the effectiveness of the proposed control strategy and its superiority over conventional sliding mode control (SMC) and integral SMC approaches.
A Spatial Cognitive Model that Integrates the Effects of Endogenous and Exogenous Information on the Hippocampus and Striatum
Jing Huang, He-Yuan Yang, Xiao-Gang Ruan, Nai-Gong Yu, Guo-Yu Zuo, Hao-Meng Liu
doi: 10.1007/s11633-021-1286-z
Reproducing the spatial cognition of animals with computational models that enable agents to navigate autonomously has attracted much attention. Many biologically inspired models of spatial cognition focus mainly on simulating the hippocampus and consider only the effect of external environmental information (i.e., exogenous information) on hippocampal coding. However, neurophysiological studies have shown that the striatum, which is closely related to the hippocampus, also plays an important role in spatial cognition, and that information inside animals (i.e., endogenous information) also affects hippocampal encoding. Inspired by this progress in neurophysiology, we propose a new spatial cognitive model that includes analogues of the hippocampus and striatum. This model takes into account how both exogenous and endogenous information affect the coding of the environment. We carried out a series of navigation experiments in a simulated water maze and compared our model with other models. Our model is self-adaptive and robust and achieves shorter navigation paths. We also discuss possible reasons for the results and how our findings may help us understand the real mechanisms of spatial cognition in animals.
Global FLS-based Consensus of Stochastic Uncertain Nonlinear Multi-agent Systems
Jia-Xi Chen, Jun-Min Li
doi: 10.1007/s11633-021-1279-y
Using graph theory, matrix theory, adaptive control, fuzzy logic systems, and other tools, this paper studies the leader-follower global consensus of two kinds of stochastic uncertain nonlinear multi-agent systems (MAS). First, fuzzy logic systems are used as feedforward compensators, instead of feedback compensators, to describe the uncertain nonlinear dynamics. Second, based on the network topology, all followers are divided into two categories: followers that can obtain the leader signal and followers that cannot. Third, based on the adaptive control method, distributed control protocols are designed for the two types of followers. Fourth, based on matrix theory and stochastic Lyapunov stability theory, the stability of the closed-loop systems is analyzed. Finally, three simulation examples are given to verify the effectiveness of the proposed control algorithms.
Optimal Policies for Quantum Markov Decision Processes
Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying
doi: 10.1007/s11633-021-1278-z
Markov decision processes (MDPs) offer a general framework for modelling sequential decision making where outcomes are random. In particular, they serve as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
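For orientation, the classical counterpart of these algorithms is backward induction on a finite-horizon MDP: V_H(s) = 0 and V_t(s) = max_a [R(s, a) + Σ_{s'} P(s' | s, a) V_{t+1}(s')]. The sketch below implements only this classical case with hypothetical dictionary-based inputs; it does not model the quantum states, super-operators, or measurements that qMDPs involve.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# transitions[s][a] is a list of (next_state, probability) pairs.
Transitions = Dict[State, Dict[Action, List[Tuple[State, float]]]]
Rewards = Dict[State, Dict[Action, float]]


def backward_induction(states: List[State], transitions: Transitions,
                       rewards: Rewards, horizon: int):
    """Optimal values and a time-dependent optimal policy for a finite horizon.

    values[t][s] is the optimal expected total reward from step t onward;
    policy[t][s] is an optimal action to take in state s at step t.
    """
    values: List[Dict[State, float]] = [dict() for _ in range(horizon + 1)]
    policy: List[Dict[State, Action]] = [dict() for _ in range(horizon)]
    values[horizon] = {s: 0.0 for s in states}  # no reward after the horizon
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_value = None, float("-inf")
            for a, outcomes in transitions[s].items():
                # Immediate reward plus expected optimal value of the next step.
                q = rewards[s][a] + sum(p * values[t + 1][s2] for s2, p in outcomes)
                if q > best_value:
                    best_action, best_value = a, q
            values[t][s] = best_value
            policy[t][s] = best_action
    return values, policy


# Hypothetical two-state example: moving from "low" to "high" costs 1.
states = ["low", "high"]
transitions = {"low": {"stay": [("low", 1.0)], "move": [("high", 1.0)]},
               "high": {"stay": [("high", 1.0)], "move": [("low", 1.0)]}}
rewards = {"low": {"stay": 0.0, "move": -1.0}, "high": {"stay": 2.0, "move": 0.0}}
values, policy = backward_induction(states, transitions, rewards, horizon=2)
```

In this toy example, paying 1 to move from "low" to "high" is worthwhile at step 0 but not at the final step, so the optimal policy depends on t, which is the defining feature of the finite-horizon case.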
A Regularized LSTM Method for Predicting Remaining Useful Life of Rolling Bearings
Zhao-Hua Liu, Xu-Dong Meng, Hua-Liang Wei, Liang Chen, Bi-Liang Lu, Zhen-Heng Wang, Lei Chen
doi: 10.1007/s11633-020-1276-6
Rotating machinery is important to industrial production. Any failure of rotating machinery, especially of rolling bearings, can lead to equipment shutdown and even more serious incidents. Therefore, accurate residual life prediction plays a crucial role in guaranteeing safe and reliable machine operation and reducing maintenance costs. To increase the forecasting precision of the remaining useful life (RUL) of rolling bearings, an approach combining the elastic net with a long short-term memory network (LSTM), referred to as E-LSTM, is proposed. The E-LSTM algorithm consists of an elastic net and an LSTM, and it takes temporal-spatial correlation into consideration to forecast the RUL through the LSTM. To address the over-fitting problem of the LSTM neural network during training, an elastic net based regularization term is introduced into the LSTM structure. In this way, changes in the output can characterize the bearing degradation mode well. Experimental results on real-world data demonstrate that the proposed E-LSTM method achieves higher stability and yields relevant values that are useful for bearing RUL forecasting. Furthermore, these results indicate that E-LSTM achieves better performance.
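An elastic net regularization term combines L1 and L2 penalties on the model weights. The sketch below assumes the common parameterization λ[α‖w‖₁ + (1 − α)/2 ‖w‖₂²] added to a mean-squared-error loss on RUL predictions; `lam` and `alpha` are hypothetical hyperparameter names, the weights are a flat list, and the LSTM itself is not shown.

```python
from typing import Sequence


def elastic_net_penalty(weights: Sequence[float], lam: float, alpha: float) -> float:
    """lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)


def regularized_mse(preds: Sequence[float], targets: Sequence[float],
                    weights: Sequence[float], lam: float = 1e-3,
                    alpha: float = 0.5) -> float:
    """Mean squared RUL error plus the elastic net penalty on the weights."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return mse + elastic_net_penalty(weights, lam, alpha)
```

Setting `alpha = 1` gives a pure L1 (lasso) penalty that encourages sparse weights, and `alpha = 0` gives a pure L2 (ridge) penalty; intermediate values trade off the two, which is the usual motivation for using the elastic net against over-fitting.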
Application of Machine Learning for Online Reputation Systems
Ahmad Alqwadri, Mohammad Azzeh, Fadi Almasalha
doi: 10.1007/s11633-020-1275-7
Internet users usually expect online venues to provide good purchasing recommendations. These can be provided by a reputation system that processes ratings to produce recommendations. Rating aggregation is a main part of a reputation system, producing global opinions about product quality. Frequently used naive methods do not consider consumer profiles in their calculations and cannot detect unfair ratings or trends emerging in new ratings. More sophisticated rating aggregation methods that use a weighted average technique focus on only one or a few aspects of consumers' profile data. This paper proposes a new reputation system that uses machine learning to predict the reliability of consumers from their profiles. In particular, we construct a new consumer profile dataset by extracting a set of factors that have a great impact on consumer reliability, which serve as input to machine learning algorithms. The predicted weight is then integrated with a weighted average method to compute the product reputation score. The proposed model has been evaluated on three MovieLens benchmark datasets using 10-fold cross-validation, and its performance has been compared with previously published rating aggregation models. The results are promising, suggesting that the proposed approach could be a potential solution for reputation systems, and the comparison demonstrated the accuracy of our models. Finally, the proposed approach can be integrated with online recommendation systems to provide better purchasing recommendations and improve the user experience in online shopping markets.
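The final aggregation step, a weighted average in which each rating counts in proportion to its consumer's predicted reliability, can be sketched as follows. This assumes the machine learning model has already produced one non-negative reliability weight per rating; `reputation_score` is a hypothetical name, not the authors' code.

```python
from typing import Sequence


def reputation_score(ratings: Sequence[float], reliabilities: Sequence[float]) -> float:
    """Weighted average of ratings, each weighted by the rater's predicted reliability."""
    if len(ratings) != len(reliabilities):
        raise ValueError("one reliability weight is needed per rating")
    total_weight = sum(reliabilities)
    if total_weight <= 0:
        raise ValueError("at least one rating must have positive weight")
    # Unreliable raters (weight near 0) barely move the score.
    return sum(r * w for r, w in zip(ratings, reliabilities)) / total_weight
```

With ratings of 5 and 1 and weights 0.9 and 0.1, the score is 4.6, whereas a naive unweighted mean would give 3.0.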
Research on Transfer Learning of Vision-based Gesture Recognition
Bi-Xiao Wu, Chen-Guang Yang, Jun-Pei Zhong
doi: 10.1007/s11633-020-1273-9
Gesture recognition has been widely used for human-robot interaction. A current problem in gesture recognition is that knowledge learned in existing domains is not used to discover and recognize gestures in new domains. For each new domain, a large amount of data must be collected and annotated, and training the algorithm does not benefit from prior knowledge, which leads to redundant computation and excessive time investment. To address this problem, this paper proposes a method for transferring gesture data between different domains. We use a red-green-blue (RGB) camera to collect images of the gestures, and use Leap Motion to collect the coordinates of 21 joint points of the human hand. We then extract a set of novel feature descriptors from the two differently distributed data sources for the study of transfer learning. The paper compares three classification algorithms, i.e., support vector machine (SVM), broad learning system (BLS) and deep learning (DL), and compares learning performance with and without the joint distribution adaptation (JDA) algorithm. The experimental results show that the proposed method can effectively solve the transfer problem between the RGB camera and Leap Motion. In addition, we found that when DL is used to classify the data, excessive training on the source domain may reduce recognition accuracy in the target domain.
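JDA reduces the distance between the source and target distributions, both marginally and per class (using pseudo-labels for the unlabeled target data). The sketch below only measures that distance, using the simplest case of maximum mean discrepancy (MMD) with a linear kernel, i.e., the squared distance between feature means; JDA itself additionally learns a projection, via an eigenvalue problem, that minimizes this quantity, which is not shown. Function names are hypothetical and features are plain lists of floats.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_vector(rows: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def linear_mmd(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Empirical MMD with a linear kernel: squared distance between the two feature means."""
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


def joint_distance(
    source: Sequence[Vector],
    source_labels: Sequence[int],
    target: Sequence[Vector],
    target_pseudo_labels: Sequence[int],
) -> float:
    """Marginal MMD plus one class-conditional MMD per class present in both domains."""
    total = linear_mmd(source, target)
    source_by_class: Dict[int, List[Vector]] = defaultdict(list)
    target_by_class: Dict[int, List[Vector]] = defaultdict(list)
    for x, y in zip(source, source_labels):
        source_by_class[y].append(x)
    for x, y in zip(target, target_pseudo_labels):
        target_by_class[y].append(x)
    # Target labels are pseudo-labels (e.g. from a classifier trained on the source).
    for c in source_by_class.keys() & target_by_class.keys():
        total += linear_mmd(source_by_class[c], target_by_class[c])
    return total
```

A large value indicates that features such as RGB-camera descriptors and Leap Motion descriptors are distributed differently, which is the gap a transfer method has to close.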
STEP AP 242 Managed Model-based 3D Engineering: An Application Towards the Automation of Fixture Planning
Remil George Thomas, Deepak Lawrence K., Manu R.
doi: 10.1007/s11633-020-1272-x
Fixture design and planning is one of the most important manufacturing activities, playing a pivotal role in deciding the lead time for product development. Fixture design, which affects part quality in terms of geometric accuracy and surface finish, can be enhanced by using the product manufacturing information (PMI) stored in the neutral STEP (standard for the exchange of product model data) file, thereby integrating design and manufacturing. The present paper proposes a fixture design approach that extracts geometry information from STEP application protocol (AP) 242 files of computer aided design (CAD) models to provide automatic suggestions of locator positions and clamping surfaces. The automatic feature extraction software “FiXplan”, developed in C#, is used to extract part feature, dimension and geometry information. The information in the STEP AP242 file is inferred using geometric reasoning techniques and is then used for fixture planning. The developed software is observed to be adept at identifying the primary, secondary, and tertiary locating faces and the locator position configurations of prismatic components. Structural analysis of the prismatic part under different locator positions was performed using the commercial finite element method software ABAQUS, and the optimal locator position was identified on the basis of minimum workpiece deformation. The area-ratio (area enclosed by the base locators divided by the workpiece base area) for the ideal locator configuration was found to be 33%. Experiments were conducted on a prismatic workpiece using a specially designed fixture for different locator configurations. The surface roughness and waviness of the machined surfaces were analysed using an Alicona non-contact optical profilometer. The best surface characteristics were obtained for the surface machined under the ideal locator positions having an area-ratio of 33%, thus validating the predicted numerical results. The efficiency, capability and applicability of the developed software are demonstrated for the finishing operation of a sensor cover, a typical prismatic component with applications in the naval industry, under different locator configurations. The best results were obtained under the proposed ideal locator configuration with an area-ratio of 33%.
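The area-ratio can be computed from the in-plane coordinates of the base locators with the shoelace formula. A minimal sketch, assuming a rectangular workpiece base and locator points listed in order around the enclosed polygon; the function names and the coordinates in the test are hypothetical, not taken from FiXplan.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula: area of a simple polygon whose vertices are listed in order."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def locator_area_ratio(locators: List[Point], base_length: float, base_width: float) -> float:
    """Percentage of a rectangular workpiece base enclosed by the base locators."""
    return 100.0 * polygon_area(locators) / (base_length * base_width)
```

For three base locators at (10, 10), (90, 10) and (50, 55) on a 100 × 60 base, the enclosed triangle has area 1800 and the ratio is 30%; the result can be compared directly with the 33% area-ratio reported for the ideal configuration.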
Delayed Teleoperation with Force Feedback of a Humanoid Robot
Viviana Moya, Emanuel Slawiñski, Vicente Mut
doi: 10.1007/s11633-020-1267-7
Teleoperation systems extend human capabilities to remote-controlled devices by providing the operator with conditions similar to those at the remote site through a communication channel that sends information from one site to the other. This article analyzes the benefits of force feedback applied to the bilateral teleoperation of a humanoid robot with time-varying delay. As the control scheme, we combine adaptive inverse dynamics compensation, balance control, and P+d-like controllers. Finally, a test is performed in which an operator simultaneously handles the locomotion (forward velocity and turning angle) and the arm of a simulated 3D humanoid robot to perform a pick-and-place task using two master devices with force feedback; indexes such as task completion time, coordination errors, path tracking error, and percentage of successful tests are reported for different time delays. The article concludes with a discussion of the results achieved.
Behavior-based Autonomous Navigation and Formation Control of Mobile Robots in Unknown Cluttered Dynamic Environments with Dynamic Target Tracking
Nacer Hacene, Boubekeur Mendil
doi: 10.1007/s11633-020-1264-x
While different species in nature have safely solved the problem of navigation in a dynamic environment, it remains a challenging task for researchers around the world. This paper addresses the problem of autonomous navigation in an unknown dynamic environment for a single three-wheeled omnidirectional mobile robot (TWOMR) and for a group of them. The robot has to track a dynamic target while avoiding dynamic obstacles and dynamic walls in an unknown and very dense environment. The approach adopts a behavior-based controller consisting of four behaviors: “target tracking”, “obstacle avoidance”, “dynamic wall following” and “avoid robots”. The paper considers the problem of kinematic saturation. In addition, it introduces a strategy for predicting the velocity of dynamic obstacles, which uses two successive ultrasonic sensor measurements to calculate the obstacle velocity expressed in the sensor frame. Furthermore, the paper proposes a strategy for dealing with dynamic walls, even when they have U-like or V-like shapes. The approach also handles formation control of a group of robots based on the leader-follower structure and behavior-based control, where the robots have to gather and maintain a given formation while navigating toward the target and avoiding obstacles and walls in a dynamic environment. The effectiveness of the proposed approaches is demonstrated via simulation.
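The velocity estimate from two successive ultrasonic readings amounts to a finite difference of obstacle positions in the sensor frame. A minimal sketch, assuming each reading gives a range `r` and a bearing `theta` (in radians) in a sensor frame that does not rotate between the two samples taken `dt` seconds apart; the result is therefore the obstacle's velocity relative to the sensor, and the function name is hypothetical.

```python
import math
from typing import Tuple


def obstacle_velocity(
    r1: float, theta1: float, r2: float, theta2: float, dt: float
) -> Tuple[float, float]:
    """Finite-difference obstacle velocity from two range/bearing readings in the sensor frame."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Convert each polar reading to Cartesian coordinates in the sensor frame.
    x1, y1 = r1 * math.cos(theta1), r1 * math.sin(theta1)
    x2, y2 = r2 * math.cos(theta2), r2 * math.sin(theta2)
    return (x2 - x1) / dt, (y2 - y1) / dt
```

An obstacle straight ahead whose range drops from 2.0 m to 1.5 m over 0.5 s yields a velocity of (-1.0, 0.0) m/s, i.e., it is approaching along the sensor axis.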
Robust Observer-based Control of Nonlinear Multi-Omnidirectional Wheeled Robot Systems via High Order Sliding-mode Consensus Protocol
M. R. Rahimi Khoygani, R. Ghasemi, P. Ghayoomi
doi: 10.1007/s11633-020-1254-z
This paper presents a novel observer-based controller for a class of nonlinear multi-agent robot models using a high order sliding mode consensus protocol. Demand for autonomous vehicles is growing in many applications, and omnidirectional wheeled robots have been suggested to meet this demand. They are flexible, fast, and autonomous, able to find the best direction, and can move along an arbitrary path at any time. Multi-agent omnidirectional wheeled robot (MOWR) systems consist of several similar or different robots with multiple different interactions between their agents, so MOWR systems have complex dynamics. Hence, designing a robust, reliable controller for nonlinear MOWR operation is considered an important challenge in control design. This work selects high order sliding mode control, a technique well suited to implementing robust controllers for complex nonlinear dynamic models. Furthermore, the proposed method ensures that all signals involved in the multi-agent system (MAS) are uniformly ultimately bounded and that the system is robust against external disturbances and uncertainties. A theoretical analysis based on candidate Lyapunov functions is presented to establish the stability of the overall MAS, the convergence of the observer and tracking errors to zero, and the reduction of chattering. To illustrate the performance of the methodology, the observer is applied to two nonlinear dynamic omnidirectional wheeled robots. The results demonstrate the good performance of the scheme.
Observer-based Multirate Feedback Control Design for Two-time-scale System
Ravindra Munje, Wei-Dong Zhang
doi: 10.1007/s11633-020-1268-6
Using a low sampling rate to design a discrete-time state feedback controller fails to capture information about the fast states of a two-time-scale system, while using a high sampling rate considerably increases the amount of computation. Thus, single-rate sampling has evident limitations for systems with slow and fast states. In this paper, multirate state feedback (MRSF) control for a linear time-invariant two-time-scale system is proposed. Here, multirate sampling refers to sampling the slow and fast states at different rates. First, a block-triangular form of the original continuous two-time-scale system is constructed. It is then discretized with a smaller sampling period, and feedback control is designed for the fast subsystem. Next, the system is block-diagonalized and represented equivalently as a system with a larger sampling period. Feedback control is then designed for the slow subsystem, and the overall MRSF control is derived. It is proved that the derived MRSF control stabilizes the full-order system. Because the slow and fast states are transformed states of the original system, they need to be estimated to realize the MRSF control. Hence, a sequential two-stage observer is formulated to estimate these states. Finally, the applicability of the design method is demonstrated with a numerical example, and the simulation results are compared with those of the single-rate sampling method. The proposed MRSF control and observer designs are found to reduce computation without compromising closed-loop performance.
Learning Deep RGBT Representations for Robust Person Re-identification
Ai-Hua Zheng, Zi-Han Chen, Cheng-Long Li, Jin Tang, Bin Luo
doi: 10.1007/s11633-020-1262-z
Person re-identification (Re-ID) is the task of retrieving images of a specific person across a network of non-overlapping cameras, and it has seen many breakthroughs recently. However, it remains very challenging under adverse environmental conditions, especially in dark areas or at night, due to the imaging limitations of a single visible light source. To handle this problem, we propose a novel deep red green blue (RGB)-thermal (RGBT) representation learning framework for single-modality RGB person Re-ID. Because prevalent RGB Re-ID datasets lack thermal data, we propose to use a generative adversarial network, trained on existing RGBT datasets, to translate labeled RGB person images into thermal infrared ones. The labeled RGB images and the synthetic thermal images make up a labeled RGBT training set, and we propose a cross-modal attention network that learns effective RGBT representations for person Re-ID by day and by night by leveraging the complementary advantages of the RGB and thermal modalities. Extensive experiments on the Market1501, CUHK03 and DukeMTMC-reID datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on all three datasets.