Article Contents
Wei-Ping Ma, Wen-Xin Li, Jin-Chuan Sun, Peng-Xia Cao. Saliency Detection via Manifold Ranking Based on Robust Foreground. International Journal of Automation and Computing. doi: 10.1007/s11633-020-1246-z

Saliency Detection via Manifold Ranking Based on Robust Foreground

Author Biography:
  • Wei-Ping Ma received the B. Eng. degree in electronic information science and technology from Xi′an University of Science and Technology, China in 2011, and the M. Eng. degree in communication and information system from Xi′an University of Science and Technology, China in 2015. Currently, she is a Ph. D. degree candidate in space electronics at Lanzhou Institute of Physics, China Academy of Space Technology (CAST), China. Her research interests include space electronic technology, computer vision, and intelligent robotics. E-mail: 498938802@qq.com (Corresponding author). ORCID ID: 0000-0002-2317-253X

    Wen-Xin Li received the M. Eng. degree in applied mathematics from Northwestern Polytechnical University, China in 1993, and the Ph. D. degree in automatic control from Northwestern Polytechnical University, China in 2011. Currently, he is a researcher at Lanzhou Institute of Physics, CAST. His research interests include space electronic technology, software reuse technology, system simulation and reconstruction technology. E-mail: lwxcast@21cn.com

    Jin-Chuan Sun received the B. Eng. degree in mechanical design, manufacturing and automation from Shandong University of Science and Technology, China in 2007, and the M. Eng. degree in micro-electro-mechanical system and nano technologies from Northwestern Polytechnical University, China in 2013. Currently, he is an engineer at Lanzhou Institute of Physics, CAST. His research interests include space electronic technology, structure design of space system, and intelligent robotics. E-mail: 824676828@qq.com

    Peng-Xia Cao received the B. Eng. degree in communication engineering from Hunan International Economics University, China in 2011, and the M. Eng. degree in circuits and systems from Hunan Normal University, China in 2015. Currently, she is a Ph. D. degree candidate in space electronics at Lanzhou Institute of Physics, CAST. Her research interests include space electronic technology, computer vision, and augmented reality. E-mail: 316657294@qq.com

  • Received: 2020-04-08
  • Accepted: 2020-07-07
  • Published Online: 2020-10-21

Figures (13)  / Tables (3)



Abstract: Graph-based manifold ranking saliency detection relies only on the boundary background to extract foreground seeds, which often yields poor saliency maps, so this paper proposes a method that obtains a robust foreground for manifold ranking. First, boundary connectivity is used to select the boundary background for manifold ranking, producing a preliminary saliency map, and a foreground region is acquired by binary segmentation of that map. Second, color boosting Harris corners are extracted from the original image and a filtered image to generate two different convex hulls, and their intersection gives the final convex hull. Finally, the foreground region and the final convex hull are combined to extract robust foreground seeds for manifold ranking and to obtain the final saliency map. Experimental results on two public image datasets show that the proposed method outperforms several classic methods on three evaluation indicators: the precision-recall curve, F-measure and mean absolute error.

    • Saliency detection is an intelligent information processing technology in the pre-processing stage of computer vision. It studies how a computer can simulate the human visual attention mechanism in an unknown scene to quickly and efficiently capture the most important and informative object. At present, much research has been conducted on the calculation of object saliency, and many algorithm models have been proposed and widely applied to numerous fields of computer vision, such as image segmentation[1, 2], object recognition[3] and image compression[4]. According to the information processing perspective, saliency detection methods can be divided into two approaches[5]: top-down (task-driven) methods and bottom-up (data-driven) methods. Top-down methods[6, 7] need to integrate specific prior knowledge, high-level semantic information and other aspects of human perception to complete saliency detection, and are therefore not universal. Bottom-up methods[8-10] pay more attention to features such as contrast, color and texture; they only need the low-level information of an image to quickly and easily detect the salient object. Meanwhile, their detection accuracy can be improved by combining prior knowledge such as the center, background and convex hull priors. In this paper, we adopt the bottom-up approach to complete the salient object detection task.

      In recent years, great progress has been made in the research of bottom-up saliency detection algorithms. Early models simulate the mechanism of human visual attention by combining multi-scale color, intensity and orientation saliency maps into a final saliency map, but they ignore the integrity of the object and thus produce poor detection results. Subsequently, contrast calculation[11-14], including global contrast and local contrast, has been widely used in saliency detection. As research deepened, prior information has been added on top of contrast calculation to improve the detection results, mainly including the background prior, foreground prior and center prior. Many of these methods are based on the background prior, usually taking the image boundaries as the basis of saliency detection. The geodesic saliency (GS) method proposed by Wei et al.[15] defines the geodesic distance from each superpixel block to the boundary as its saliency value: the greater the distance, the higher the saliency of the superpixel. However, when the salient object touches the image boundary, the detection result is not accurate enough. Zhu et al.[16] develop a more robust boundary connectivity prior method called background detection (BD). Based on the fact that the connection length between the object region and the image boundary is smaller than that between the background region and the image boundary, they define boundary connectivity to measure saliency. The manifold ranking (MR) method proposed by Yang et al.[17] views saliency detection as a ranking problem on a sparse closed-loop graph: taking the superpixels at the image boundary as background queries, it obtains the ranking score of each superpixel as its saliency value through the manifold ranking algorithm.
This method achieves good detection results with a low running time, but it is not accurate to define all image boundary superpixels as background queries, and the result is not ideal for images with a complex background. Some scholars put forward methods based on the foreground prior, which define the foreground directly to perform saliency detection. The key for this kind of method is whether the foreground is defined accurately enough. The convex hull is the most common tool for extracting foreground regions: saliency detection methods based on the convex hull regard the region inside the hull as foreground when calculating the saliency value of each superpixel. The method in [18] uses K-means clustering and color boosting Harris corners to form a convex hull containing a rough object region. In [19], Harris corner detection is used to obtain a convex hull, which is then expanded outward to contain the entire object as much as possible, and the foreground is extracted from it. In [20], the intersection of two convex hulls determined by Harris corner detection and boundary connectivity is computed to obtain a minimum convex hull that is closer to the salient object. The foreground obtained by the above methods usually contains not only the approximate object region but also part of the background region, resulting in inaccurate saliency detection. Methods based on the center prior assume that the salient object is usually located near the image center; this kind of method[21, 22] mainly highlights the salient region by increasing the saliency weight of the region near the center location. However, when this assumption does not hold, the salient object cannot be accurately highlighted, resulting in a poor detection result.
Considering the shortcomings of the above methods, some scholars combine multiple priors to further improve detection performance. In [23], the boundary background and the convex hull are used to generate a background-based saliency map and a foreground-based saliency map, which are then integrated into a final saliency map. Wang and Tian[24] utilize all three kinds of prior information: boundary background, boundary expansion and corner clustering are adopted to generate a background-based, a foreground-based and a center prior saliency map, and integrating the three yields a more accurate and smoother saliency map. Such methods alleviate the deficiencies of a single prior to a certain extent, but the increased computational complexity is inevitable.

      Since the foreground describes object characteristics more directly, the foreground prior is selected for saliency detection in this paper. To better suppress background noise and highlight the salient object, we propose an approach that combines a convex hull with an improved MR to define robust foreground seeds, thereby obtaining a more accurate foreground prior and enhancing the saliency detection effect.

    • Our approach is to extract an accurate foreground for manifold ranking to obtain a saliency map, of which robust foreground extraction is the key. The specific process of our approach is shown in Fig. 1.

      Figure 1.  Workflow of the proposed approach (Color versions of figures in this paper are available online.)

      Our first contribution is rejecting the assumption made in MR that all image boundary superpixels are background. We use boundary connectivity to select the boundary superpixels that belong to the image background as labeled background queries, and then compute the saliency values of all superpixels based on their relevance to those queries, getting the preliminary saliency map. Next, binary segmentation is applied to the preliminary saliency map to obtain a coarse foreground region. Our second contribution is generating a convex hull that contains the rough salient object by extracting color boosting Harris corners from the original image and a filtered image. Through the effective combination of the coarse foreground region and the above convex hull, robust foreground superpixels are found and regarded as foreground seeds. The saliency value of each superpixel in the image is then computed based on its relevance to these foreground seeds to acquire the final saliency map.

    • The simple linear iterative clustering (SLIC) algorithm applies K-means clustering to superpixel generation, with the advantages of a simple structure and few required parameters, effectively dividing an image into blocks of different shapes and sizes. Fig. 2 shows the effect of SLIC segmentation; each irregular image block is a superpixel. Viewing the superpixel as the basic unit, a saliency detection algorithm takes into account the spatial organization of image pixels, contains less redundancy, and reduces the complexity of subsequent image processing.

      Figure 2.  SLIC segmentation

    • Viewing all superpixels as nodes, we construct a closed-loop sparse graph ${{G}} = \left( {{{V}},{{E}}} \right)$ as shown in Fig. 3, where ${{V}} = \left\{ {{v_i}\left| {1 \le i \le N} \right.} \right\}$ is the set of all nodes, ${v_i}$ corresponds to a superpixel and $N$ is the number of superpixels, and ${{E}} = \left\{ {{e_{ij}}\left| {1 \le i,j \le N} \right.} \right\}$ is the set of edges, where ${e_{ij}}$ is the edge connecting nodes ${v_i}$ and ${v_j}$. In the graph model, each node is connected not only to its adjacent nodes but also to the nodes that share a common boundary with its adjacent nodes. To further improve the ranking results, the nodes located at the four image boundaries are connected to each other, i.e., any pair of boundary nodes is considered adjacent. The weight of the edge ${e_{ij}}$ between two connected nodes ${v_i}$ and ${v_j}$ is defined by

      Figure 3.  Graph model

      ${w_{ij}} = \exp \left( { - \left\| {{c_i} - {c_j}} \right\|/{\sigma ^2}} \right)$

      (1)

      where ${c_i}$ and ${c_j}$ denote the average color of all pixels contained in node ${v_i}$ and node ${v_j}$ in the CIE-Lab color space, and $\sigma $ is a constant that controls the strength of the weight.
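As a minimal illustration (not part of the paper's implementation), the edge weight in (1) can be computed from the mean CIE-Lab colors of two superpixels; the value of $\sigma$ below is only an assumed placeholder:

```python
import numpy as np

def edge_weight(c_i, c_j, sigma=0.1):
    """Edge weight of (1): w_ij = exp(-||c_i - c_j|| / sigma^2).

    c_i, c_j: mean CIE-Lab colors of two connected superpixels.
    sigma controls the strength of the weight (0.1 is an assumed value).
    """
    c_i, c_j = np.asarray(c_i, float), np.asarray(c_j, float)
    return float(np.exp(-np.linalg.norm(c_i - c_j) / sigma ** 2))
```

Identical colors give a weight of 1, and the weight decays quickly as the color difference grows, so strongly similar superpixels dominate the ranking propagation.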

    • In this section, we exploit the boundary prior by taking the superpixels on each image boundary that belong to the background as labeled background queries; the saliency value of each superpixel is expressed by its ranking result based on those queries. When calculating the preliminary saliency map, MR does not fully consider the situation in which the object extends to the image boundary, and its treatment of all image boundary superpixels as background queries is too idealized, leading to incomplete results or even failure of saliency detection. Therefore, our work uses the concept of boundary connectivity from [16] and selects the boundary superpixels with high background probability as background queries. For a superpixel ${p_i}$ located at the boundary, its boundary connectivity can be written as

      $B\left( {{p_i}} \right) = \frac{{Len\left( {{p_i}} \right)}}{{\sqrt {Area\left( {{p_i}} \right)} }}$

      (2)

      where $Area\left( {{p_i}} \right)$ represents the total area of superpixel ${p_i}$, and $Len\left( {{p_i}} \right)$ is the length of the perimeter of ${p_i}$ touching the image boundary. Generally, the boundary connectivity of a background region is much larger than that of a salient region; the higher the boundary connectivity of a superpixel, the greater the probability that it belongs to the background. The background probability of superpixel ${p_i}$ estimated from the boundary connectivity is

      $P({p_i}) = 1 - \exp \left( { - \frac{{B\left( {{p_i}} \right)}}{{2\sigma _{bnd}^2}}} \right)$

      (3)

      where $ {\sigma _{bnd}} \in [0.5,2.5]$.

      For all image boundary superpixels, their background probabilities are represented by the vector ${{bgP}} = \left[ bg{P_1}, bg{P_2}, \cdots ,bg{P_n} \right]^{\rm{T}}$, where $n$ is their number. We use the threshold $th = k \cdot \min \left( {{{bgP}}} \right)$ to select background queries, excluding the superpixels whose background probabilities are less than the threshold; the parameter $k \ge 1$ controls the number of background queries. Besides the superpixels belonging to the salient region, some superpixels belonging to the background may also be excluded in the selection process, but these excluded background superpixels are highly similar to the selected background queries, so the ranking results are not affected. Fig. 4 compares the image background queries with and without selection. The salient object of the input image shown in Fig. 4(a) clearly has some superpixels on the image boundary. After our selection, Fig. 4(b) retains only the boundary superpixels belonging to the background. In contrast, all boundary superpixels are retained in Fig. 4(c), and this misjudgment of the background reduces the accuracy of saliency detection methods based on the background prior.
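The query selection built on (2) and (3) can be sketched as follows (a simplified NumPy version; the values of k and sigma_bnd are illustrative, the paper only requires $k \ge 1$ and ${\sigma _{bnd}} \in [0.5, 2.5]$):

```python
import numpy as np

def background_queries(len_bnd, area, k=1.5, sigma_bnd=1.0):
    """Select boundary superpixels that likely belong to the background.

    len_bnd[i]: perimeter length of boundary superpixel i touching the image boundary.
    area[i]:    total area of boundary superpixel i.
    Returns the indices of the kept queries and the background probabilities.
    """
    len_bnd, area = np.asarray(len_bnd, float), np.asarray(area, float)
    B = len_bnd / np.sqrt(area)                      # boundary connectivity, (2)
    bgP = 1.0 - np.exp(-B / (2.0 * sigma_bnd ** 2))  # background probability, (3)
    th = k * bgP.min()                               # threshold th = k * min(bgP)
    return np.flatnonzero(bgP >= th), bgP            # exclude superpixels below th
```

A boundary superpixel with a short boundary-touching perimeter (e.g., one cut by a salient object at the border) gets a low connectivity and is excluded from the queries.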

      Figure 4.  Background queries

    • As a general ranking algorithm, manifold ranking uses the intrinsic manifold structure of the dataset to rank nodes according to their similarity to the queries; the ranking results are propagated to the remaining nodes until the propagation becomes stable. The algorithm is described as follows.

      Given a dataset ${{X}} = \left\{ {{x_1},{x_2},\cdots,{x_n}} \right\}$, which includes some labeled data points (queries) and the remaining points to be ranked according to their relevance to the queries, let ${{f}}:{{X}} \to {{\bf{R}}^n}$ denote a ranking function that assigns a ranking score ${f_i}$ to each point ${x_i}$; ${{f}}$ is treated as a ranking vector ${{f}} = {\left[ {{f_1},{f_2},\cdots,{f_n}} \right]^{\rm{T}}}$. Let the indication vector ${{y}} = {\left[ {{y_1},{y_2},\cdots,{y_n}} \right]^{\rm{T}}}$ represent the markers of the dataset ${{X}}$, i.e., ${y_i} = 1$ means ${x_i}$ is a labeled query, otherwise ${y_i} = 0$. The affinity matrix ${{W}} = {\left[ {{w_{ij}}} \right]_{n \times n}}$ denotes the weights of the edges connecting pairs of data points. Let ${{D}} = {\rm{diag}}\left( {{d_{11}},{d_{22}},\cdots,{d_{nn}}} \right)$ be a diagonal matrix with ${d_{ii}} = \sum\limits_j {{w_{ij}}}$. The optimal ranking of all data points is obtained by solving the following optimization problem:

      ${{{f}}^ * } = \arg \mathop {\min }\limits_{{f}} \frac{1}{2}\left( {\sum\limits_{i,j = 1}^n {{w_{ij}}{{\left\| {\frac{{{f_i}}}{{\sqrt {{d_{ii}}} }} - \frac{{{f_j}}}{{\sqrt {{d_{jj}}} }}} \right\|}^2}} + u\sum\limits_{i = 1}^n {{{\left\| {{f_i} - {y_i}} \right\|}^2}} } \right)$

      (4)

      where the parameter $u$ controls the trade-off between the smoothness constraint (the first term in (4)) and the fitting constraint (the second term in (4)). Setting the derivative of the above function to zero, the optimal solution of the ranking function can be written as

      ${{{f}}^ * } = {\left( {{{I}} - \alpha {{S}}} \right)^{ - 1}}{{y}}$

      (5)

      where ${{I}}$ is the identity matrix, $\alpha = 1/\left( {1 + u} \right)$, and ${{S}} = {{{D}}^{ - 1/2}}{{W}}{{{D}}^{ - 1/2}}$ is the normalized Laplacian matrix of ${{W}}$. Replacing the normalized Laplacian in (5) with the unnormalized one yields another ranking function:

      ${{{f}}^ * } = {\left( {{{D}} - \alpha {{W}}} \right)^{ - 1}}{{y}}.$

      (6)

      According to the experiments in [17], the function in (6) achieves better performance, so we adopt (6) in the following work.
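A direct sketch of the adopted ranking function (6), assuming the affinity matrix W and the indicator vector y have been built as described above:

```python
import numpy as np

def manifold_rank(W, y, alpha=0.99):
    """Closed-form ranking of (6): f* = (D - alpha*W)^{-1} y.

    W: (n, n) symmetric affinity matrix of the graph.
    y: (n,) indicator vector, 1 for query nodes and 0 otherwise.
    alpha = 1/(1+u); 0.99 follows the setting reported in [17].
    """
    W = np.asarray(W, float)
    D = np.diag(W.sum(axis=1))  # degree matrix, d_ii = sum_j w_ij
    # Solve (D - alpha*W) f = y instead of forming the inverse explicitly.
    return np.linalg.solve(D - alpha * W, np.asarray(y, float))
```

Nodes strongly connected to the query receive larger ranking scores, which is exactly the relevance used as the saliency value in the following sections.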

      When using manifold ranking to compute the preliminary saliency map of an input image, the graph node set ${{V}}$ of Section 2.1.2 is viewed as the dataset ${{X}}$; the boundary superpixels selected by the method in Section 2.1.3 are regarded as background queries, and four corresponding indicator vectors are generated, one per boundary. According to (6), the background queries are used to calculate the relevance of every node in the image; the ranking scores ${{f}}_{}^ *$ of all nodes are taken as saliency values, where the ranking score ${{f}}_{}^ * \left( i \right)$ can be viewed as the sum of the relevances of the $i$-th node to all background queries. Four saliency maps are constructed in this way and then integrated to acquire the preliminary saliency map based on boundary priors.

      Taking the top image boundary as an example, the nodes with high background probability are viewed as background queries and the other nodes in the image are unlabeled. The indicator vector ${{{y}}_t}$ is therefore known, and substituting it into (6) ranks all nodes in the image, giving ${{f}}_t^ * $, an $N$-dimensional vector whose elements are the relevances of the nodes to all background queries of the top boundary. Next, the vector is normalized to the range between 0 and 1, and the saliency map based on the top boundary queries can be written as

      ${{{S}}_t}\left( i \right) = 1 - \overline {{{{f}}_t}^ * } \left( i \right)$

      (7)

      where $i$ is a graph node (a superpixel in the image) and $\overline {{{{f}}_t}^ * }$ indicates the normalized vector. Thus, the higher the similarity of a superpixel to an image boundary, the more likely it is to be background, and the smaller its saliency value.

      Similarly, viewing the bottom, left and right image boundaries as background queries, we calculate the saliency values of all nodes, obtaining three more saliency maps ${{{S}}_b}\left( i \right)$, ${{{S}}_l}\left( i \right)$ and ${{{S}}_r}\left( i \right)$. Finally, the four saliency maps are integrated by multiplication to generate the preliminary saliency map:

      ${{{S}}_{pre}}\left( i \right) = {{{S}}_t}\left( i \right) \times {{{S}}_b}\left( i \right) \times {{{S}}_l}\left( i \right) \times {{{S}}_r}\left( i \right).$

      (8)
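Given the four ranking-score vectors, (7) and (8) amount to a per-boundary normalization, complement and element-wise product, which can be sketched as:

```python
import numpy as np

def preliminary_saliency(rank_scores):
    """Integrate the four boundary-based maps as in (7) and (8).

    rank_scores: dict with keys 'top', 'bottom', 'left', 'right', each an
    (N,) vector of ranking scores f* obtained from (6) with that boundary's
    background queries.
    """
    S = np.ones(len(rank_scores['top']), dtype=float)
    for side in ('top', 'bottom', 'left', 'right'):
        f = np.asarray(rank_scores[side], float)
        # Normalize the scores to [0, 1] (small epsilon guards a flat vector).
        f_norm = (f - f.min()) / (f.max() - f.min() + 1e-12)
        S *= 1.0 - f_norm  # (7) for this boundary, then the product of (8)
    return S
```

A superpixel highly relevant to all four boundary query sets ends up with a saliency value near zero, while a superpixel dissimilar to every boundary keeps a high value.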
    • Saliency detection based on the background prior can highlight the salient region brightly, but cannot effectively suppress background noise. Therefore, we intend to realize saliency detection directly by extracting the foreground region.

      For foreground extraction, the most common and simple tool is the convex hull[19, 25, 26], the convex polygon of smallest area that surrounds all the feature points detected in the input image. The classical way of generating a convex hull is Harris corner detection. However, the traditional Harris corner detection algorithm only considers the gray-scale information of the input image and ignores the color information, so it is not robust for images with complex backgrounds: it detects more feature points belonging to the background region, leading to poor saliency detection. The color boosting Harris corner detection algorithm[18] comprehensively considers the brightness and color information of the input image. Compared with Harris corner detection, it detects more stable feature points, which mostly concentrate on the boundary or surrounding area of an object. Thus, we use color boosting Harris corners to estimate image corners and outline points, from which the convex hull is roughly calculated. First, we detect the color boosting Harris feature points of the original image and join them to form a convex hull that roughly encloses the salient object; the green line in Fig. 5(b) is this first convex hull. However, as Fig. 5(b) shows, the first convex hull still contains considerable background. To further reduce the background region, we generate the convex hull of the filtered image, because background texture has less impact on the convex hull of the filtered image[27]. Here, mean filtering is used to obtain the filtered image, then color boosting Harris corner detection is performed on it to acquire the second convex hull, shown as the yellow line in Fig. 5(c). In the end, the final minimum convex hull is obtained by calculating the intersection of these two convex hulls, shown as the red line in Fig. 5(d). It is clear that the final minimum convex hull is closer to the foreground region, and the redundant background inside it is effectively reduced.
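The geometric part of this step can be sketched in pure Python. Corner detection itself is not reproduced here; we assume the two color boosting Harris point sets are given, build each convex hull with Andrew's monotone chain, and intersect the two convex polygons with Sutherland-Hodgman clipping:

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping: intersection of two CCW convex polygons."""
    def inside(p, a, b):  # p on the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def meet(p, q, a, b):  # intersection point of lines p-q and a-b
        d1 = (q[0] - p[0], q[1] - p[1])
        d2 = (b[0] - a[0], b[1] - a[1])
        t = ((a[0] - p[0]) * d2[1] - (a[1] - p[1]) * d2[0]) / \
            (d1[0] * d2[1] - d1[1] * d2[0])
        return (p[0] + t * d1[0], p[1] + t * d1[1])

    out = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        inp, out = out, []
        if not inp:
            break
        s = inp[-1]
        for e in inp:
            if inside(e, a, b):
                if not inside(s, a, b):
                    out.append(meet(s, e, a, b))
                out.append(e)
            elif inside(s, a, b):
                out.append(meet(s, e, a, b))
            s = e
    return out
```

In the pipeline above, the first point set would come from the original image and the second from the mean-filtered image; the clipped polygon plays the role of the final minimum convex hull.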

      Figure 5.  Convex hull

    • In our approach, foreground seed selection is foundational. The convex hull (see Fig. 6(b)) acquired by the method in Section 2.2 is regarded as foreground queries to compute the saliency values of the nodes in the original image; the detection results are shown in Fig. 6(d). At the same time, we view the foreground region (see Fig. 6(c)) extracted from the preliminary saliency map in Section 2.1 as foreground queries; the manifold ranking results are shown in Fig. 6(e). Comparing the first two examples: in the first image the convex hull captures the salient region more accurately than the foreground region does, so the saliency map based on the convex hull suppresses the background better. In contrast, in the second image the convex hull contains more background than the foreground region, so the saliency map based on the convex hull is less capable of suppressing the background. For the convex hulls and foreground regions of the third and fourth images, the salient region is extracted incompletely. Observing the detection results of these two images, it is obvious that the result is precise and complete only when the convex hull or the foreground region is close to the salient object.

      Figure 6.  Saliency detection. (a) Original image; (b) Convex hull; (c) Foreground region; (d) The saliency map based on the convex hull; (e) The saliency map based on the foreground region.

      Based on the above observations, we propose combining the convex hull and the foreground region to obtain more accurate foreground seeds. As in [25], the region CFR is defined as the intersection of the convex hull region (CR) and the foreground region (FR), i.e., $CFR = CR \cap FR$, and the foreground seeds (${F_{seeds}}$) are selected according to

      ${F_{seeds}} =\begin{cases} {FR}, & {\rm{if}}\;\dfrac{{\left| {CFR} \right|}}{{\left| {FR} \right|}} > {t_2} \\ {FR}, & {\rm{if}}\;\dfrac{{\left| {CFR} \right|}}{{\left| {FR} \right|}} < {t_1}\;{\rm{and}}\;\dfrac{{\left| {CFR} \right|}}{{\left| {CR} \right|}} < {t_1} \\ {CR}, & {\rm{if}}\;\dfrac{{\left| {CFR} \right|}}{{\left| {CR} \right|}} > {t_2}\\ {CFR}, & {\rm{otherwise}} \end{cases}$

      (9)

      where ${t_1}$ and ${t_2}$ are fixed thresholds, set to 0.4 and 0.8, respectively. According to the selected foreground seeds, the indicator vector ${{y}}$ is determined and substituted into (6) to obtain the final saliency map.
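The selection rule (9) can be written directly over superpixel index sets; the helper name below is our own:

```python
def select_foreground_seeds(CR, FR, t1=0.4, t2=0.8):
    """Seed-selection rule of (9).

    CR, FR: sets of superpixel indices inside the convex hull region and
    the coarse foreground region; t1, t2 follow the paper (0.4 and 0.8).
    """
    CR, FR = set(CR), set(FR)
    CFR = CR & FR  # intersection, CFR = CR ∩ FR
    if FR and len(CFR) / len(FR) > t2:
        return FR           # FR lies almost entirely inside CR: trust FR
    if FR and CR and len(CFR) / len(FR) < t1 and len(CFR) / len(CR) < t1:
        return FR           # the two regions barely overlap: fall back to FR
    if CR and len(CFR) / len(CR) > t2:
        return CR           # CR lies almost entirely inside FR: trust CR
    return CFR              # otherwise keep only the common part
```

The rule keeps whichever region the overlap ratios indicate is reliable, and falls back to the intersection when neither region dominates.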

    • To test the effectiveness of the proposed method, comparative experiments are performed on two saliency datasets, ASD and ECSSD. The ASD dataset contains 1000 images and is often used in saliency detection experiments; the difference between object and background in ASD is obvious. The ECSSD dataset also contains 1000 images, with a more complex image structure than ASD, so saliency detection on ECSSD is usually more difficult. Each image in the two datasets has a manually labeled ground truth (GT). We compare our scheme with six methods that have good saliency detection effects, in terms of both visual quality and objective indicators: frequency-tuned (FT)[28], hierarchical saliency (HS)[22], GS[15], BD[16], MR[17] and the method in [25].

    • Fig. 7 shows the saliency maps generated by the proposed method and six representative comparison methods, the first four images are from the ASD dataset, and the last four images are from the ECSSD dataset.

      Figure 7.  Comparison of the proposed method with seven previous methods

      For the image with a simple background shown in the first row, the detection results of the other methods contain more or less background, while the proposed method suppresses this background noise better and maintains good robustness. In the second image, part of the salient object is reflected, and the comparison methods highlight this reflection; the proposed method handles it effectively because of its more accurate foreground definition. The third image contains many salient objects; our method deals with the background noise well while preserving the integrity of the salient region as much as possible. When the colors of the salient object and the background are particularly similar, as in the fourth image, our method can still accurately detect the salient region, whereas the other methods highlight the leaf belonging to the background. In both the 5th and 7th images, the salient object touches the image boundary, yet the proposed method still detects it well. The salient object in the 6th image is very small, and its contrast with the complex background is very low; nevertheless, our method effectively suppresses the background information, and the detected salient region is closer to the ground truth than those of the comparison methods. For the 8th image, with a complex salient object and a complex background, the detection ability of the proposed method is also the best among all the methods. On the whole, the visual effect of the proposed method is better than that of the other methods. MR performs better than FT, HS, GS and BD, and the method in [25] and the proposed method are superior to MR, since both are derived from improvements of MR. In addition, the foreground seeds of the proposed method are more accurate than those of the method in [25], so the proposed method suppresses background noise more effectively, highlights the salient object region more strongly and clearly, and produces saliency maps closer to the ground truth than the other six methods.

      It should be noted that the proposed method is not applicable to all kinds of images. When the object color is very close to the background color, the proposed method performs as poorly as the other comparison methods, as shown in Fig. 8.

      Figure 8.  Failure saliency detection cases of the proposed method and other comparison methods

      The reason for the poor detection is as follows. The saliency value of a superpixel equals its color feature similarity to all query nodes, and the foreground superpixels are used as the query nodes. If the color of a background superpixel is very close to that of the object superpixels, its color feature similarity to the query nodes is high, so its saliency value is close to that of the object region and it is likely to be treated as an object superpixel. Background and object superpixels are therefore poorly differentiated, resulting in a bad detection effect. To address the saliency detection of such images, additional features such as position and texture will be combined to calculate the saliency values of superpixels in future work.

    • To make an objective quantitative comparison, the three indicators most commonly used in the field of saliency detection are adopted to evaluate the performance of the proposed method: the precision-recall (P-R) curve, the F-measure and the mean absolute error (MAE).

      When drawing the P-R curve, the saliency maps of all input images are first quantized to the range [0, 255]. Binary segmentation is then performed on these saliency maps with each of the 256 thresholds from 0 to 255 to obtain binary images. Under each threshold, the binary images of all detected images are compared with the ground truth to compute the average precision and average recall, yielding 256 precision-recall pairs from which the P-R curve is drawn. The precision is the ratio of the overlapping area of the salient object regions in the binary image and the GT to the salient object area of the binary image; the recall is the ratio of the same overlapping area to the salient object area of the GT. They are calculated as

      $Precision = \frac{{\left| {{S_p} \cap {G_p}} \right|}}{{\left| {{S_p}} \right|}}$

      (10)

      $Recall = \frac{{\left| {{S_p} \cap {G_p}} \right|}}{{\left| {{G_p}} \right|}}$

      (11)

      where ${S_p}$ is the salient object region in the binary image, and ${G_p}$ is the salient object region in the GT. Ideally, both precision and recall are high, but they constrain each other: the recall may decrease as the precision increases. When both are demanded, the F-measure is used to measure the overall performance of a saliency detection method.
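      The P-R curve procedure above can be sketched as follows; this is a minimal NumPy version assuming the saliency maps are uint8 arrays in [0, 255] and the GTs are binary masks:

```python
import numpy as np

def pr_curve(sal_maps, gts):
    """Average precision and recall at each of the 256 thresholds.

    sal_maps: list of saliency maps quantized to [0, 255] (uint8 arrays)
    gts:      list of binary ground-truth masks of the same shapes
    """
    precisions, recalls = [], []
    for t in range(256):
        p, r = [], []
        for s, g in zip(sal_maps, gts):
            binary = s >= t                  # binary segmentation at threshold t
            overlap = np.logical_and(binary, g).sum()
            p.append(overlap / max(binary.sum(), 1))  # |Sp ∩ Gp| / |Sp|
            r.append(overlap / max(g.sum(), 1))       # |Sp ∩ Gp| / |Gp|
        precisions.append(np.mean(p))        # average over the dataset
        recalls.append(np.mean(r))
    return np.array(precisions), np.array(recalls)
```

Plotting recall against precision over the 256 pairs gives the P-R curve; the `max(…, 1)` guards avoid division by zero when a binarized map or a GT is empty.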

      The F-measure is a comprehensive assessment computed as the weighted harmonic mean of precision and recall, representing their overall performance. When calculating the F-measure, the saliency map is binarized with an adaptive threshold equal to twice its mean saliency value, and the precision, recall and F-measure are then computed; (12) shows the formula of the F-measure.

      $F {\text{-}} measure = \frac{{(1 + \alpha ) \times Precision \times Recall}}{{\alpha \times Precision + Recall}}$

      (12)

      where we set $\alpha = 0.3$ to emphasize the precision. A histogram is usually used to present the precision, recall and F-measure. The higher the F-measure value, the better the detection effect of the saliency detection method.
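      A compact sketch of the adaptive-threshold F-measure, assuming a real-valued saliency map and a binary GT mask as NumPy arrays:

```python
import numpy as np

def f_measure(sal, gt, alpha=0.3):
    """Binarize at twice the mean saliency (adaptive threshold),
    then compute the weighted harmonic mean of precision and recall."""
    binary = sal >= 2.0 * sal.mean()
    overlap = np.logical_and(binary, gt).sum()
    precision = overlap / max(binary.sum(), 1)
    recall = overlap / max(gt.sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + alpha) * precision * recall / (alpha * precision + recall)
```

With `alpha=0.3` the measure weights precision more heavily than recall, matching the convention stated above.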

      Since the P-R curve only reflects whether the saliency value of the object region is higher than that of the background region, the MAE is introduced to measure the difference between the saliency map and the GT when evaluating the detection results:

      $MAE = \frac{1}{{WH}}\sum\limits_{i = 1}^W {\sum\limits_{j = 1}^H {\left| {S({p_{ij}}) - G({p_{ij}})} \right|} } $

      (13)

      where $W$ and $H$ are the width and height of the image, ${p_{ij}}$ is a pixel of the image, and $S({p_{ij}})$ and $G({p_{ij}})$ are the values of that pixel in the saliency map and the GT, respectively. A small MAE value indicates a small difference between the saliency map and the GT.
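      Equation (13) translates directly into code; this sketch assumes both maps are normalized to [0, 1]:

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and the GT,
    both given as arrays with values in [0, 1]."""
    return float(np.abs(sal.astype(float) - gt.astype(float)).mean())
```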

      From the P-R curves on the ASD dataset, the proposed method is very close to BD, MR and the method in [25]; it is difficult to tell from Fig. 9(a) which is better. On the ECSSD dataset, the proposed method is slightly better. The F-measure provides the comprehensive assessment of precision and recall.

      Figure 9.  P-R curve

      Compared with the other six methods, the F-measure of the proposed method is higher in Figs. 10(a) and 10(b), because the foreground seeds it selects capture the salient object more accurately and directly. Especially on the ECSSD dataset with complex backgrounds, the precision of our method is significantly higher than that of the other methods.

      Figure 10.  F-measure

      Observing Figs. 11(a) and 11(b), the MAE of our method improves on that of MR and the method in [25]. From the overall data, the MAE of our method is only slightly inferior to that of BD on the ASD dataset. On the ECSSD dataset, the MAE values of the proposed method and BD are equivalent and better than those of the other five methods, indicating that the proposed method suppresses background noise better under a complex background. It is worth noting that the proposed method improves on MR and the method in [25] in all three indicators: P-R curve, F-measure and MAE.

      Figure 11.  MAE

    • The key to the saliency detection is the foreground seeds for manifold ranking. To obtain more accurate foreground seeds, the proposed method makes two improvements on the basis of MR. First, among all boundary superpixels, only those belonging to the background are used as background queries for the first-stage saliency detection. Second, the foreground region obtained by binary segmentation of the preliminary saliency map is combined with the convex hull algorithm to obtain foreground seeds for the second-stage saliency detection. Here, we discuss the function of these two parts according to the experimental results on the ASD and ECSSD datasets.

      To examine the first part of the proposed method, all boundary superpixels (condition 1) and the selected boundary superpixels (condition 2) are used as background queries for the first-stage saliency detection, respectively. The preliminary saliency maps under both conditions are obtained, and the average saliency value is used as the threshold for binary segmentation to acquire the FR. Since the FR is used to determine the foreground seeds in the second-stage detection, the similarity between the FR and the object region should be examined; it is defined as

      $R = \frac{{\left| {FR \cap {G_p}} \right|}}{{\left| {{G_p}} \right|}}$

      (14)

      where $FR$ is the foreground region and ${G_p}$ is the salient object region in the GT, as defined above.

      As shown in Fig. 12, when the object extends to the image boundary, condition 2 yields a more accurate salient object; if no object pixel appears on the boundary, the two conditions give the same result. In Table 1, the average R value of condition 2 is higher than that of condition 1, because the background queries of the proposed method contain less incorrect background information. The function of the first part of the proposed method is therefore to obtain an FR more similar to the object, which helps determine more precise foreground seeds for the second-stage saliency detection.

      Dataset    Condition 1    Condition 2
      ASD        0.7817         0.7847
      ECSSD      0.5798         0.5877

      Table 1.  R values of the preliminary saliency map in two conditions

      Figure 12.  Preliminary saliency maps

      The second part of the proposed method uses the CFR as foreground seeds for the second-stage saliency detection. To illustrate its function, the CFR and the FR are each used as foreground seeds for manifold ranking, and the P-R curves and MAE values are used to assess the impact on the detection results. As shown in Table 2 and Fig. 13, the MAE values with CFR are smaller on both datasets, and the P-R curves with CFR are higher than those with FR. These experimental data demonstrate that the CFR, extracted by combining the convex hull with the FR, is closer to the object region, so the CFR provides more accurate foreground seeds than the FR. On the whole, both parts of the proposed method aim to extract a region closer to the object as foreground seeds for manifold ranking, so as to improve the saliency detection effect as much as possible.

      Dataset    CFR       FR
      ASD        0.0714    0.0848
      ECSSD      0.1773    0.1865

      Table 2.  MAE values of different foreground seeds

      Figure 13.  P-R curve

    • To further examine the efficiency of the proposed method, the average detection time per image on the ASD and ECSSD datasets is calculated; the average time of each method is shown in Table 3. The programs run on a laptop with an Intel(R) Core(TM) i3-3217U CPU and 4 GB of memory.

      Methods          Time (s)    Code
      FT               1.72        Matlab
      HS               0.74        C++
      GS               0.57        Matlab
      BD               0.59        Matlab
      MR               0.63        Matlab
      Method in [25]   1.51        Matlab
      Ours             1.65        Matlab

      Table 3.  Running time of different methods

      Among all the methods, MR, BD, GS and HS have the shortest running times. The proposed method builds on MR, and its additional time is mainly spent on selecting the foreground seeds, so it is slightly slower than MR and comparable in efficiency to FT and the method in [25]. Combined with the other indicators, the proposed method still has advantages.

    • A saliency detection method based on robust foreground seeds is proposed in this paper. Using the segmentation result of the preliminary saliency map, a rough foreground region is obtained, which is then combined with the intersection of the two convex hulls in the original and filtered images to determine robust foreground seeds. Manifold ranking based on these foreground seeds produces a prominent saliency map with good visual effects. The experimental results show that the proposed method directly defines a more accurate foreground rather than relying only on the background prior, so the detection result is much closer to the ground truth; moreover, the final saliency map maintains relatively high accuracy on complex backgrounds. A further research direction is how to make full use of the integration of higher-level multiple features to select more accurate foreground seeds and determine the salient object location, so as to improve the accuracy of saliency detection under complex backgrounds.
