Volume 11, Number 2, 2014
Special Issue on Big Data (pp.119-152)
Recent progress in Web 2.0 applications has been accompanied by the rapid development of microblogging in China, which has become one of the most important channels for online communication, especially for sharing information. This paper presents an in-depth investigation of big data modeling and analysis of the microblog ecosystem in China, using a real dataset containing over 17 million records of Sina Weibo users. First, we present a detailed analysis of the geography, gender, authentication, education and age of the microblog users in this dataset. Then we analyze the distributions of numerical features, propose a user influence formula and calculate the influence of different kinds of microblog users. Finally, user content intention analysis is performed to reveal the topics that concern users most in their daily lives.
Uncertain data are common due to the increasing use of sensors, radio frequency identification (RFID), GPS and similar devices for data collection. The causes of uncertainty include measurement limitations, noise, inconsistent supply voltage, and delay or loss of data in transfer. To manage, query or mine such data, data uncertainty needs to be considered. Hence, this paper studies the problem of top-k distance-based outlier detection from uncertain data objects. In this work, an uncertain object is modelled by the probability density function of a Gaussian distribution. The naive approach to distance-based outlier detection uses a nested loop. This approach is very costly because the distance function between two uncertain objects is expensive to evaluate. Therefore, a populated-cells list (PC-list) approach to outlier detection is proposed. Using the PC-list, the proposed top-k outlier detection algorithm needs to consider only a fraction of the dataset objects and hence quickly identifies candidate objects for the top-k outliers. Two approximate top-k outlier detection algorithms are presented to further increase the efficiency of the top-k outlier detection algorithm. An extensive empirical study on synthetic and real datasets is also presented to demonstrate the accuracy, efficiency and scalability of the proposed algorithms.
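The nested-loop baseline mentioned in the abstract above can be illustrated with a minimal sketch. It assumes deterministic points and Euclidean distance as a stand-in for the paper's distance function between Gaussian uncertain objects, and it scores each object by its number of neighbours within a radius; the function name `top_k_outliers` and its parameters are hypothetical and not taken from the paper.

```python
import heapq
import math


def top_k_outliers(points, radius, k):
    """Naive nested-loop top-k distance-based outlier detection.

    Assumption: points are deterministic tuples and Euclidean distance
    stands in for the (expensive) distance between uncertain objects.
    An object with fewer neighbours within `radius` is more outlying.
    """
    scored = []
    for i, p in enumerate(points):
        # Inner loop: compare p against every other object in the dataset.
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        scored.append((neighbours, i))
    # Fewest neighbours first; ties are broken by the lower index.
    return [i for _, i in heapq.nsmallest(k, scored)]


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (0.2, 0.1), (9.0, 0.0),
]
print(top_k_outliers(points, radius=0.5, k=2))  # indices of the two isolated points
```

Every object is compared with every other object, so the cost grows quadratically with the dataset size; this is the cost that a pruning structure such as the PC-list is meant to reduce.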
Legacy system migration to the cloud brings both great challenges and great benefits, and it has attracted considerable academic research and many industrial applications. By analyzing the research achievements and application status, we divide the existing migration methods into three strategies according to the cloud service models. Different processes need to be considered for different migration strategies, and different tasks are involved accordingly. The similarities and differences between the migration strategies are discussed, and the challenges and future work for legacy system migration to the cloud are outlined. The aim of this paper is to provide an overview of legacy system migration to the cloud and to identify important challenges and future research directions.
This article describes a biologically inspired node generator for the path planning of serially connected hyper-redundant manipulators using probabilistic roadmap planners. The generator searches the configuration space surrounding existing nodes in the roadmap and uses a combination of random and deterministic search methods that emulate the behaviour of octopus limbs. The strategy consists of randomly mutating the states of the links near the end-effector, and mutating the states of the links near the base of the robot toward the states of the goal configuration. When combined with the small tree probabilistic roadmap planner, the method was successfully used to solve the narrow passage motion planning problem of a 17 degree-of-freedom manipulator.
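The mutation strategy described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: a configuration is a list of joint values ordered from base to end-effector, and the names `generate_node`, `n_base_links`, `step` and `noise` are hypothetical, since the abstract does not give the actual update rule or parameter values.

```python
import random


def generate_node(config, goal, n_base_links, step=0.25, noise=0.3, rng=random):
    """Create a new roadmap node near an existing configuration.

    Assumptions (not from the paper): joints are ordered base first,
    the first `n_base_links` joints move a fraction `step` toward the
    goal, and the remaining joints receive uniform random perturbations.
    """
    new_config = []
    for i, (q, q_goal) in enumerate(zip(config, goal)):
        if i < n_base_links:
            # Deterministic part: links near the base move toward the goal.
            new_config.append(q + step * (q_goal - q))
        else:
            # Random part: links near the end-effector are mutated randomly.
            new_config.append(q + rng.uniform(-noise, noise))
    return new_config


start = [0.0] * 17  # 17 joints, as in the manipulator mentioned above
goal = [1.0] * 17
node = generate_node(start, goal, n_base_links=5, rng=random.Random(0))
print(node[:5])
```

In a full planner, a candidate node like this would still have to pass collision checking before being added to the roadmap.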
This paper investigates the characteristics of a clinical dataset using a combination of feature selection and classification methods to handle missing values and to understand the underlying statistical characteristics of a typical clinical dataset. Typically, a large clinical dataset presents challenges such as missing values, high dimensionality, and unbalanced classes. These pose an inherent problem when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration of the dataset is carried out, and attributes with more than a certain percentage of missing values are eliminated from the dataset. Later, with the help of missing value imputation, feature selection and classification algorithms, prognostic and diagnostic models are developed. This paper has two main conclusions: 1) Despite the nature of clinical datasets and their large size, the choice of missing value imputation method does not affect the final performance. What is crucial is that the dataset is an accurate representation of the clinical problem; the particular method of imputing missing values is not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning proves to be more suitable for mining clinical data than unsupervised methods. It is also shown that non-parametric classifiers such as decision trees give better results than parametric classifiers such as radial basis function networks (RBFNs).
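The preprocessing described in the abstract above (eliminating attributes with too many missing values, then imputing the rest) can be sketched as follows. This is a minimal illustration that assumes numeric attributes, `None` as the missing-value marker, and mean imputation as one possible method; the function names and the 50% threshold are hypothetical and are not taken from the paper.

```python
def drop_sparse_columns(rows, max_missing_fraction):
    """Keep only columns whose fraction of missing values is within the limit."""
    n_cols = len(rows[0])
    keep = []
    for c in range(n_cols):
        missing = sum(1 for r in rows if r[c] is None)
        if missing / len(rows) <= max_missing_fraction:
            keep.append(c)
    return keep, [[r[c] for c in keep] for r in rows]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        # A column with no observed values cannot be imputed; leave it as None.
        means.append(sum(observed) / len(observed) if observed else None)
    return [
        [means[c] if r[c] is None else r[c] for c in range(n_cols)]
        for r in rows
    ]


records = [
    [1.0, None, 3.0],
    [2.0, None, None],
    [3.0, 5.0, 6.0],
    [None, None, 9.0],
]
kept, reduced = drop_sparse_columns(records, max_missing_fraction=0.5)
print(kept)                  # columns that survive the threshold
print(mean_impute(reduced))  # remaining gaps filled with column means
```

The imputed table can then be passed to whichever feature selection and classification algorithms are being compared.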
This paper deals with the problem of identifying piecewise autoregressive systems with exogenous input (PWARX) models based on a clustering solution. This problem involves estimating both the parameters of the affine sub-models and the hyperplanes defining the partitions of the state-input regression. The existing identification methods present three main drawbacks which limit their effectiveness. First, most of them may converge to local minima in the case of poor initialization because they are based on the optimization of nonlinear criteria. Second, they use simple and ineffective techniques to remove outliers. Third, most of them assume that the number of sub-models is known a priori. To overcome these drawbacks, we suggest the use of the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The results presented in this paper illustrate the performance of the proposed method in comparison with existing approaches. An application of the developed approach to an olive oil esterification reactor is also presented in order to validate the simulation results.
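To show why DBSCAN addresses the last two drawbacks listed in the abstract above, here is a minimal sketch of the standard DBSCAN procedure on plain 2-D points. It is not the paper's identification pipeline: the data, the `eps` and `min_pts` values and the function name `dbscan` are illustrative assumptions.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, with NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def region(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


data = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10),
    (50, 50),
]
print(dbscan(data, eps=1.5, min_pts=3))
```

The number of clusters is not supplied by the caller, and isolated points are labelled as noise rather than being forced into a cluster.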
H∞ proportional-integral-differential (PID) feedback for arbitrary-order delayed multi-agent systems is investigated to improve system performance. The closed-loop multi-input multi-output (MIMO) control framework with the distributed PID controller is first described for the multi-agent system in a unified way. Then, by using matrix theory, the prescribed H∞ performance criterion of the multi-agent system is shown to be equivalent to several independent H∞ performance constraints on single-input single-output (SISO) subsystems associated with the eigenvalues of the Laplacian matrix. Subsequently, for each subsystem, the set of PID controllers satisfying the required H∞ performance constraint is analytically characterized based on the extended Hermite-Biehler theorem. Finally, the three-dimensional set of decentralized H∞ PID control parameters is derived by finding the intersection of the H∞ PID regions for all the decomposed subsystems. The simulation results demonstrate the effectiveness of the proposed method.
In the agricultural context, the principal cause of serious accidents involving all-terrain vehicles (ATVs) is rollover. The most important parameter related to this risk is the ground slope. In this paper, we propose a structured observer to estimate the system states and the longitudinal tire forces using only wheel angular velocity measurements. The robust estimation is based on a second-order sliding mode observer. This estimation is then used to build a ground slope estimate. The algorithm is composed of two cascaded estimators. This structured estimation is then applied to the model of an agricultural vehicle, the G7 (Gregoire™), integrated in the driving simulation environment SCANeR™-Studio.
To address the time-varying characteristics of industrial processes, this paper introduces an adaptive subspace predictive control (ASPC) strategy with a time-varying forgetting factor, based on the original subspace predictive control (SPC) algorithm. The new method uses the model matching error to calculate the variable forgetting factor and applies it in constructing the Hankel data matrix, so that the data better represent changes in the system information. To eliminate the steady-state error, an incremental form of the control law is derived. Simulation results on a rotary kiln show that this control strategy achieves good control performance.
In this paper, the problem of stabilizing an unstable second-order delay system using a classical proportional-integral-derivative (PID) controller is considered. An extension of the Hermite-Biehler theorem, which is applicable to quasi-polynomials, is used to determine the complete set of stabilizing proportional-integral/proportional-integral-derivative (PI/PID) parameters. The range of admissible proportional gains is determined in closed form. For each proportional gain, the stabilizing set in the space of the integral and derivative gains is shown to be a triangle.
The problem of fault detection for a class of nonlinear impulsive switched systems is investigated in this paper. Fault detection filters are designed such that the augmented systems are stable and the residual error signal generated by the filters guarantees the H∞ performance with respect to disturbances and faults. Sufficient conditions for the design of fault detection (FD) filters are presented in terms of linear matrix inequalities. Moreover, the filter gains are characterized by the solution of a convex optimization problem. Finally, an example derived from a pulse-width-modulation-driven boost converter is given to illustrate the effectiveness of the FD design approach.