Large-scale multi-objective optimization problems (MOPs), which involve a large number of decision variables, have emerged from many real-world applications. While evolutionary algorithms (EAs) are widely acknowledged as a mainstream method for MOPs, most research progress and successful applications of EAs have been restricted to MOPs with small-scale decision variables. More recently, it has been reported that traditional multi-objective EAs (MOEAs) suffer severe performance deterioration as the number of decision variables increases. As a result, and motivated by the emergence of real-world large-scale MOPs, investigation of MOEAs in this direction has attracted much attention in the past decade. This paper reviews the progress of evolutionary computation for large-scale multi-objective optimization from two angles. From the perspective of the key difficulties of large-scale MOPs, scalability is analyzed by examining the performance of existing MOEAs and the challenges induced by the increase in the number of decision variables. From the perspective of methodology, large-scale MOEAs are categorized into three classes and introduced in turn: divide-and-conquer-based, dimensionality-reduction-based, and enhanced-search-based approaches. Several future research directions are also discussed.
One of the most significant challenges in the neuroscience community is to understand how the human brain works. Recent progress in neuroimaging techniques has confirmed that it is possible to decode a person's thoughts, memories, and emotions via functional magnetic resonance imaging (fMRI), since it can measure the neural activation of the human brain with satisfactory spatiotemporal resolution. However, the unprecedented scale and complexity of fMRI data have created critical computational bottlenecks that require new scientific analytic tools. Given the increasingly important role of machine learning in neuroscience, a great many machine learning algorithms have been proposed to analyze brain activities from fMRI data. In this paper, we provide a comprehensive and up-to-date review of machine learning methods for analyzing neural activities from three aspects: brain image functional alignment, brain activity pattern analysis, and visual stimuli reconstruction. In addition, online resources and open research problems on brain pattern analysis are provided for the convenience of future research.
The coronavirus global pandemic has spread faster and more severely than experts had anticipated. While this has presented a great challenge, researchers worldwide have shown ingenuity and dexterity in adapting technology and devising new strategies to combat the pandemic. However, implementing these strategies alone disrupts everyone's daily life. Hence, an intersection between these strategies and the technological advantages of robotics, artificial intelligence, and autonomous systems is essential for near-to-normal operation. In this review paper, different applications of robotic systems and various aspects of modern technologies, including medical imaging, telemedicine, and supply chains, are covered with respect to the COVID-19 pandemic. Furthermore, concerns over users' data privacy, job losses, and the legal aspects of implementing robotics are also discussed.
Objective image quality assessment (IQA), which can automatically and efficiently predict the perceived quality of images, plays an important role in various visual communication systems. The human eye is the ultimate evaluator of visual experience, so modeling the human visual system (HVS) is a core issue for objective IQA and visual experience optimization. Traditional models based on black-box fitting have low interpretability and struggle to guide experience optimization effectively, while models based on physiological simulation are hard to integrate into practical visual communication services due to their high computational complexity. To bridge the gap between signal distortion and visual experience, in this paper we propose a novel perceptual no-reference (NR) IQA algorithm based on structural computational modeling of the HVS. Following the mechanism of the human brain, we divide visual signal processing into a low-level visual layer, a middle-level visual layer, and a high-level visual layer, which conduct pixel information processing, primitive information processing, and global image information processing, respectively. Natural scene statistics (NSS) based features, deep features, and free-energy based features are extracted from these three layers. Support vector regression (SVR) is employed to aggregate the features into the final quality prediction. Extensive experimental comparisons on three widely used benchmark IQA databases (LIVE, CSIQ, and TID2013) demonstrate that our proposed metric is highly competitive with, or outperforms, state-of-the-art NR IQA measures.
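The final aggregation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, the RBF kernel settings, and the random stand-in features and scores are all assumptions; in the actual metric the three feature groups would come from the NSS, deep, and free-energy models.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-in features for the three visual layers; dimensions
# are assumed, not taken from the paper.
rng = np.random.default_rng(0)
n_images = 50
low_feats = rng.normal(size=(n_images, 36))    # NSS-based features (low level)
mid_feats = rng.normal(size=(n_images, 128))   # deep features (middle level)
high_feats = rng.normal(size=(n_images, 1))    # free-energy feature (high level)
X = np.hstack([low_feats, mid_feats, high_feats])
y = rng.uniform(0, 100, size=n_images)         # subjective quality scores (e.g., MOS)

# SVR with an RBF kernel maps the concatenated features to one quality score.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
pred = model.predict(X)
print(pred.shape)  # one predicted quality score per image
```

In practice the SVR hyperparameters would be tuned by cross-validation on the training split of each IQA database.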
With the rapid increase in the number of vehicles in urban areas, pollution from vehicle emissions is becoming more and more serious. Precise prediction of the spatiotemporal evolution of urban traffic emissions plays an important role in urban planning and policy making. Most existing methods focus on estimating vehicle emissions at historical or current moments, which cannot meet the demands of future planning. Recent work has begun to address the evolution of vehicle emissions at future moments using multiple emission-related attributes; however, these methods do not combine and utilize the different inputs effectively and efficiently. To address this issue, we propose a joint framework that predicts the future evolution of vehicle emissions from the GPS trajectories of taxis, using a multi-channel spatiotemporal network and the motor vehicle emission simulator (MOVES) model. Specifically, we first estimate spatial distribution matrices from GPS trajectories through map-matching algorithms. These matrices reflect attributes related to the traffic status of road networks, such as volume, speed, and acceleration. Then, our multi-channel spatiotemporal network efficiently combines three key attributes (volume, speed, and acceleration) through a feature-sharing mechanism and generates a precise prediction of them over the future period. Finally, we adopt the MOVES model to estimate vehicle emissions by integrating several traffic factors, including the predicted traffic states, the road network, and the statistical information of urban vehicles. We evaluate our model on the Xi'an taxi GPS trajectories dataset. Experiments show that our proposed network can effectively predict the temporal evolution of vehicle emissions.
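The input arrangement for a multi-channel network of this kind can be sketched as below. This is a generic illustration under assumed dimensions (a 32×32 grid over the road network) with random stand-in matrices; in the framework above, each matrix would be estimated from map-matched taxi GPS trajectories.

```python
import numpy as np

# Hypothetical 32x32 spatial grid over the road network; values are
# random stand-ins for the map-matched traffic attributes.
H = W = 32
volume = np.random.rand(H, W)   # traffic volume per grid cell
speed = np.random.rand(H, W)    # mean speed per grid cell
accel = np.random.rand(H, W)    # mean acceleration per grid cell

# A multi-channel spatiotemporal network consumes the three attributes as
# channels of a single tensor, so convolutional layers can share features
# across volume, speed, and acceleration.
x = np.stack([volume, speed, accel], axis=0)   # shape (3, H, W)
print(x.shape)
```

Stacking along the channel axis is what allows one shared convolutional trunk to learn cross-attribute features, rather than training three separate predictors.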
Attention deficit/hyperactivity disorder (ADHD) is a common disorder among children. ADHD often persists into adulthood unless proper treatments are provided to engage self-regulatory systems. Thus, there is a need for effective and reliable mechanisms for the early identification of ADHD. This paper presents a decision support system for the ADHD identification process. The proposed system uses both functional magnetic resonance imaging (fMRI) data and eye movement data. The classification processes contain enhanced pipelines and consist of pre-processing, feature extraction, and feature selection mechanisms. fMRI data are processed by extracting seed-based correlation features in the default mode network (DMN), and eye movement data by aggregating features of fixations and saccades. For classification using eye movement data, an ensemble model achieves 81% overall accuracy. For fMRI classification, a convolutional neural network (CNN) achieves 82% accuracy for ADHD identification. Both models are verified to avoid overfitting.
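The seed-based correlation feature extraction can be sketched as follows. This is a toy illustration with random data and an assumed seed definition (a fixed set of voxel indices standing in for a DMN seed region); a real pipeline would use preprocessed fMRI time series and an atlas-defined seed.

```python
import numpy as np

# Toy fMRI data: 100 time points x 500 voxels; the "seed" is a subset of
# voxels assumed to lie in the default mode network.
rng = np.random.default_rng(1)
ts = rng.normal(size=(100, 500))     # time series (time x voxels)
seed_idx = np.arange(20)             # hypothetical DMN seed voxels

seed = ts[:, seed_idx].mean(axis=1)  # mean seed time series

# Pearson correlation of the seed with every voxel yields a seed-based
# connectivity map, usable as a feature vector for classification.
ts_c = ts - ts.mean(axis=0)
seed_c = seed - seed.mean()
corr = (ts_c * seed_c[:, None]).sum(axis=0) / (
    np.linalg.norm(ts_c, axis=0) * np.linalg.norm(seed_c)
)
print(corr.shape)  # one correlation value per voxel
```

The resulting per-voxel correlation vector is what a downstream feature selection step would prune before classification.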
In exemplar-based image inpainting, two major problems commonly arise: unreasonable priority calculation and a patch lookup strategy that considers only color features. In this paper, we propose an image inpainting approach based on a structural-tensor edge intensity model. First, we use a progressive scanning inpainting method so that the filling order is not affected by the priority function. Then, we use the edge intensity model to build the patch similarity function for correctly identifying the local image structure. Finally, a balance operator restricts the excessive propagation of structural information to ensure correct structural reconstruction. The experimental results show that our approach is comparable, and even superior, to some state-of-the-art inpainting algorithms.
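A structure-tensor edge intensity measure of the kind used above can be sketched as follows. This is a generic sketch, not the paper's exact model: the local averaging uses a small box filter (a Gaussian window is more common), and the edge intensity is taken as the largest eigenvalue of the 2×2 structure tensor.

```python
import numpy as np
from scipy.ndimage import convolve

def edge_intensity(img, half_window=1):
    """Edge intensity from the 2x2 structure tensor of a grayscale image.

    Generic sketch: gradient products are averaged with a box filter,
    and the largest tensor eigenvalue measures local edge strength.
    """
    gy, gx = np.gradient(img.astype(float))
    k = 2 * half_window + 1
    box = np.ones((k, k)) / (k * k)
    # Locally averaged structure tensor components J = [[jxx, jxy], [jxy, jyy]].
    jxx = convolve(gx * gx, box)
    jyy = convolve(gy * gy, box)
    jxy = convolve(gx * gy, box)
    # Largest eigenvalue of J via trace/determinant.
    tr = jxx + jyy
    det = jxx * jyy - jxy ** 2
    return tr / 2 + np.sqrt(np.maximum(tr ** 2 / 4 - det, 0.0))

img = np.zeros((16, 16))
img[:, 8:] = 1.0          # vertical step edge
e = edge_intensity(img)
print(e[8, 8] > e[8, 0])  # the edge column scores higher than flat regions
```

A patch similarity function would then compare both color values and this edge intensity between the target patch and candidate source patches.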
Recently, with the development of convolutional neural networks (CNNs), deep learning methods have been applied in many real scenarios. In this paper, we introduce a camera-based basketball scoring detection (BSD) method combining CNN-based object detection with frame-difference-based motion detection. The proposed BSD method takes videos of the basketball court as input. First, the real-time You Only Look Once (YOLO) object detection model locates the position of the basketball hoop. Then, frame-difference-based motion detection determines whether any object is moving in the hoop area, which indicates the basketball scoring condition. The proposed BSD method runs in real time with satisfactory scoring detection accuracy. Our experiments on collected real-scenario basketball court videos demonstrate the accuracy of the proposed method. Furthermore, several intelligent basketball analysis systems based on the proposed method have been installed at multiple basketball courts in Beijing and provide good performance.
Perception and manipulation tasks for robotic manipulators handling highly cluttered objects have become increasingly in demand as modern industrial environments seek more efficient problem solving. However, most available methods for such cluttered tasks perform poorly, mainly because they cannot adapt to changes in the environment and the handled objects. Here, we propose a new, near real-time approach to suction-based grasp point estimation in a highly cluttered environment using an affordance-based approach. Compared to the state of the art, our method offers two distinctive contributions. First, we use a modified deep neural network backbone for semantic segmentation, classifying the pixels of the input red, green, blue, and depth (RGBD) image to produce an affordance map: a pixel-wise probability map representing the probability of a successful grasping action at each pixel. Second, we incorporate high-speed semantic segmentation into the system, which lowers the computational time of our solution. The approach does not require any prior knowledge or models of the objects, since it entirely removes the pose estimation and object recognition steps used by most current approaches and instead grasps first and recognizes later, making it object-agnostic. The system was designed for household objects, but it can be easily extended to any kind of objects provided that the right dataset is used for training the models. Experimental results show the benefit of our approach, which achieves a precision of 88.83%, compared to the 83.4% precision of the current state of the art.
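Once the network has produced an affordance map, grasp point selection reduces to picking the most promising pixel. The sketch below assumes a random stand-in map with one planted high-probability pixel and an assumed confidence threshold; it is not the paper's exact selection rule.

```python
import numpy as np

# Hypothetical affordance map: per-pixel probability of a successful
# suction grasp, as would be output by the segmentation network.
rng = np.random.default_rng(2)
affordance = rng.uniform(0, 0.5, size=(48, 64))
affordance[20, 30] = 0.97            # a clearly graspable spot (planted)

# The grasp point estimate is the pixel of maximum probability, gated by
# an assumed confidence threshold before commanding the suction gripper.
row, col = np.unravel_index(np.argmax(affordance), affordance.shape)
confident = affordance[row, col] > 0.5
print((row, col), confident)
```

Because the pipeline operates on the map alone, no object identity is needed at grasp time, which is what gives the approach its object-agnostic property.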
To handle objects whose surfaces lack sufficient texture features, and to meet the real-time requirements of augmented reality, a markerless augmented reality tracking registration method based on multi-modal template matching and point clouds is proposed. The method first adapts the linear, parallel, multi-modal LineMod template matching method with scale invariance to identify the texture-less target and obtain, as the key frame, the reference image most similar to the current viewpoint. From this we obtain the initial pose of the camera and solve the problem of re-initialization after tracking registration is interrupted. A point cloud-based method is then used to calculate the precise pose of the camera in real time. Because the traditional iterative closest point (ICP) algorithm cannot meet the real-time requirements of the system, a Kd-tree (k-dimensional tree) running on the graphics processing unit (GPU) replaces the nearest-point search of the original ICP algorithm, improving the speed of tracking registration. At the same time, the random sample consensus (RANSAC) algorithm removes erroneous point pairs to improve the accuracy of the algorithm. The results show that the proposed tracking registration method has good real-time performance and robustness.
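The Kd-tree correspondence step inside one ICP iteration can be sketched as follows. This is a CPU illustration using SciPy's `cKDTree` on toy point clouds (the method above runs the search on the GPU), and the outlier rejection shown is a simple median-based cutoff standing in for the RANSAC stage.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy point clouds: `target` is the model cloud, `source` is a copy
# shifted by a known small offset along x.
rng = np.random.default_rng(3)
target = rng.uniform(size=(200, 3))
source = target + np.array([0.05, 0.0, 0.0])

# Kd-tree replaces brute-force nearest-neighbour search in ICP: each
# source point is matched to its closest target point in O(log n).
tree = cKDTree(target)
dist, idx = tree.query(source)

# Simple outlier rejection (a stand-in for RANSAC): drop pairs whose
# distance is far above the median match distance.
keep = dist < 3 * np.median(dist)
pairs = (source[keep], target[idx[keep]])
```

With correspondences in hand, each ICP iteration estimates the rigid transform aligning the kept pairs and re-queries the tree; since the tree over the target cloud is built once, the per-iteration cost is dominated by the fast queries.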
Recently, video-based fire detection has become an important research topic in the field of machine vision. This paper proposes a fire detection method that combines a classification model and a target detection model in deep learning. First, depthwise separable convolution is used to classify fire images, which saves considerable detection time while maintaining detection accuracy. Second, the You Only Look Once version 3 (YOLOv3) target regression function outputs the fire position for images classified as fire, avoiding the accuracy loss of using YOLOv3 for both target classification and position regression. At the same time, the detection time spent on target regression for images without fire is greatly reduced. The experiments were conducted on a public network database: the detection accuracy reached 98% and the detection rate reached 38 fps. This method not only eliminates the manual extraction of flame characteristics and reduces the computational cost and the number of parameters, but also improves the detection accuracy and detection rate.
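The cost saving of depthwise separable convolution comes from splitting a standard convolution into a per-channel (depthwise) filter and a 1×1 (pointwise) channel mixer. The sketch below is a minimal NumPy reference with 'valid' padding and assumed shapes, not the classifier used in the paper.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise conv (one kernel per channel) then 1x1 pointwise conv.

    x: (C, H, W); dw_kernels: (C, k, k); pw_weights: (C_out, C).
    Minimal 'valid'-padding reference implementation, no stride or bias.
    """
    c, h, w = x.shape
    k = dw_kernels.shape[1]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise stage: each channel is filtered independently.
    dw = np.empty((c, oh, ow))
    for ci in range(c):
        for i in range(oh):
            for j in range(ow):
                dw[ci, i, j] = np.sum(x[ci, i:i + k, j:j + k] * dw_kernels[ci])
    # Pointwise stage: a 1x1 convolution mixes channels.
    return np.tensordot(pw_weights, dw, axes=([1], [0]))

x = np.random.rand(3, 8, 8)
out = depthwise_separable_conv(x, np.random.rand(3, 3, 3), np.random.rand(16, 3))
print(out.shape)  # (16, 6, 6)
```

For this example the factorization uses 3·3·3 + 16·3 = 75 weights versus 16·3·3·3 = 432 for a standard convolution with the same receptive field, which is the source of the parameter and time savings cited above.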