Unsupervised image segmentation aims at grouping different semantic patterns in an image without the use of human annotation. Similarly, image clustering searches for groupings of images based on their semantic content without supervision. Classically, both problems have captivated researchers as they drew from sound mathematical concepts to produce concrete applications. With the emergence of deep learning, the scientific community turned its attention to complex neural network-based solvers that achieved impressive results in those domains but rarely leveraged the advances made by classical methods. In this work, we propose a patch-based unsupervised image segmentation strategy that bridges advances in unsupervised feature extraction from deep clustering methods with the algorithmic machinery of classical graph-based methods. We show that a simple convolutional neural network, trained to classify image patches and iteratively regularized using graph cuts, naturally leads to a state-of-the-art fully-convolutional unsupervised pixel-level segmenter. Furthermore, we demonstrate that this is the ideal setting for leveraging the patch-level pairwise features generated by vision transformer models. Our results on real image data demonstrate the effectiveness of our proposed methodology.
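A minimal sketch (not the authors' implementation) of the graph-cut regularization step for the two-label case: given per-patch foreground probabilities produced by a patch classifier, a Potts-regularized labeling on the patch grid can be solved with PyMaxflow. The probability map and the smoothness weight below are illustrative assumptions.

```python
import numpy as np
import maxflow  # PyMaxflow

def regularize_with_graph_cut(probs, smoothness=1.0, eps=1e-6):
    """probs: (H, W) foreground probabilities on the patch grid."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(probs.shape)
    # Pairwise Potts smoothness term on the 4-connected patch grid.
    g.add_grid_edges(nodes, smoothness)
    # Terminal capacities encode the unary costs (negative log-probabilities).
    g.add_grid_tedges(nodes, -np.log(1.0 - probs + eps), -np.log(probs + eps))
    g.maxflow()
    return g.get_grid_segments(nodes)  # boolean (H, W) patch label map

# Hypothetical usage: `probs` would be the softmax output of the patch CNN.
probs = np.random.rand(32, 32)
patch_labels = regularize_with_graph_cut(probs, smoothness=2.0)
```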
Clustering data objects into homogeneous groups is one of the most important tasks in data mining. Spectral clustering is arguably one of the most prominent clustering algorithms, as it is appealing for its theoretical soundness and adaptable to many real-world data settings. For example, mixed data, where the data is composed of numerical and categorical features, is typically handled via numerical discretization, dummy coding, or similarity computation that takes into account both data types. This paper explores a more natural way to incorporate both numerical and categorical information into the spectral clustering algorithm, avoiding the need for data preprocessing or the use of sophisticated similarity functions. We propose adding extra nodes corresponding to the different categories the data may belong to and show that it leads to an interpretable clustering objective function. Furthermore, we demonstrate that this simple framework leads to a linear-time spectral clustering algorithm for categorical-only data. Finally, we compare the performance of our algorithms against other related methods and show that they provide a competitive alternative in terms of performance and runtime.
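A minimal sketch of the extra-node idea for mixed data under assumptions of mine (an RBF affinity on the numerical features, unit-weight edges from each sample to the category levels it takes, and a standard normalized spectral embedding followed by k-means on the sample rows); scikit-learn >= 1.2 is assumed for the one-hot encoder.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import OneHotEncoder

def mixed_spectral_clustering(X_num, X_cat, n_clusters, gamma=1.0, cat_weight=1.0):
    n = X_num.shape[0]
    # Sample-to-category incidence matrix: one extra node per category level.
    B = OneHotEncoder(sparse_output=False).fit_transform(X_cat)
    m = B.shape[1]
    W = np.zeros((n + m, n + m))
    W[:n, :n] = rbf_kernel(X_num, gamma=gamma)     # sample-sample affinities (numerical part)
    W[:n, n:] = cat_weight * B                     # sample-category edges (categorical part)
    W[n:, :n] = cat_weight * B.T
    # Standard normalized spectral clustering on the augmented graph.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_sym = np.eye(n + m) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = eigh(L_sym)
    emb = vecs[:n, :n_clusters]                    # keep the sample rows of the embedding
    return KMeans(n_clusters, n_init=10).fit_predict(emb)
```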
Spectral Clustering is one of the most traditional methods for solving segmentation problems. Based on Normalized Cuts, it aims at partitioning an image using an objective function defined by a graph. Despite their mathematical attractiveness, spectral approaches are traditionally neglected by the scientific community due to their practical issues and underperformance. In this paper, we adopt a sparse graph formulation based on the addition of extra nodes to a simple grid graph. While the grid encodes the pixel spatial disposition, the extra nodes account for the pixel color data. Applying the original Normalized Cuts algorithm to this graph leads to a simple and scalable method for spectral image segmentation, with an interpretable solution. Our experiments also demonstrate that our proposed methodology outperforms both traditional and modern unsupervised segmentation algorithms on both real and synthetic data.
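A minimal sketch of the sparse construction, with details assumed by me (4-connected grid with unit weights, color nodes obtained by uniform RGB quantization, and the standard spectral relaxation of Normalized Cuts); the paper's exact weighting may differ.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def grid_plus_color_ncut(image, n_segments=4, bins_per_channel=4, color_weight=1.0):
    """image: float RGB array in [0, 1] of shape (H, W, 3); returns a label map."""
    h, w, _ = image.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols = [], []
    # Unit-weight edges between 4-connected grid neighbours (both directions).
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows += [a.ravel(), b.ravel()]
        cols += [b.ravel(), a.ravel()]
    # Extra nodes: one per quantized RGB bin, connected to the pixels it contains.
    q = np.minimum((image.reshape(n, 3) * bins_per_channel).astype(int), bins_per_channel - 1)
    color_node = n + (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    rows += [np.arange(n), color_node]
    cols += [color_node, np.arange(n)]
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    data = np.ones(rows.size)
    data[-2 * n:] = color_weight                     # last 2n entries are the pixel-color edges
    m = n + bins_per_channel ** 3
    W = sp.csr_matrix((data, (rows, cols)), shape=(m, m))
    # Normalized Cuts relaxation: leading eigenvectors of D^-1/2 W D^-1/2.
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = eigsh(S, k=n_segments + 1, which='LA')
    order = np.argsort(vals)[::-1]
    emb = vecs[:n, order[1:]]                        # drop the trivial eigenvector, keep pixel rows
    return KMeans(n_segments, n_init=10).fit_predict(emb).reshape(h, w)
```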
The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the GI0 distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need for quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we propose a neural network-based estimation framework that first learns how to predict underlying parameters of GI0 samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields lower estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real-time pixel-wise roughness estimation for high resolution real SAR imagery.
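A minimal sketch of the synthetic-training idea for the single-parameter case; the log-moment features, network size and parameter ranges below are illustrative assumptions, not the paper's design.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def sample_gi0(alpha, L, n):
    """GI0 product model: unit-mean Gamma speckle times inverse-gamma texture."""
    gamma = -alpha - 1.0                               # unit-mean texture (requires alpha < -1)
    speckle = rng.gamma(L, 1.0 / L, size=n)
    texture = gamma / rng.gamma(-alpha, 1.0, size=n)
    return speckle * texture

def features(z):
    lz = np.log(z)
    # First log-moments of the sample: mean, variance, third central moment.
    return [lz.mean(), lz.var(), ((lz - lz.mean()) ** 3).mean()]

# Build a synthetic training set over a range of roughness values.
L, window = 1, 512
alphas = rng.uniform(-15.0, -1.5, size=3000)
X = np.array([features(sample_gi0(a, L, window)) for a in alphas])
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500).fit(X, alphas)

# Estimate the roughness of an unseen sample.
alpha_hat = model.predict([features(sample_gi0(-4.0, L, window))])[0]
```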
Image Segmentation is one of the core tasks in Computer Vision, and solving it often depends on modeling the image appearance data via the color distributions of each of its constituent regions. Whereas many segmentation algorithms handle this dependence on appearance models using alternation or implicit methods, we propose here a new approach to directly estimate them from the image without prior information on the underlying segmentation. Our method uses local high-order color statistics from the image as input to a tensor factorization-based estimator for latent variable models. This approach is able to estimate models in multiregion images and automatically output the region proportions without prior user interaction, overcoming the drawbacks of a prior attempt at this problem. We also demonstrate the performance of our proposed method in many challenging synthetic and real imaging scenarios and show that it leads to an efficient segmentation algorithm.
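A heavily simplified sketch of the moment/tensor idea, with assumptions of mine: gray levels stand in for full color, triples of pixels drawn inside small windows approximate same-region samples, and a generic non-negative CP decomposition from tensorly replaces the paper's estimator.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

def appearance_from_local_triples(gray, n_regions, n_bins=16, n_triples=200000, win=5, seed=0):
    """gray: float image in [0, 1]; returns per-region gray-level distributions and proportions."""
    rng = np.random.default_rng(seed)
    q = np.minimum((gray * n_bins).astype(int), n_bins - 1)
    h, w = q.shape
    ys = rng.integers(0, h - win, n_triples)
    xs = rng.integers(0, w - win, n_triples)
    offs = rng.integers(0, win, (n_triples, 3, 2))        # three pixels per small window
    T = np.zeros((n_bins, n_bins, n_bins))
    a = q[ys + offs[:, 0, 0], xs + offs[:, 0, 1]]
    b = q[ys + offs[:, 1, 0], xs + offs[:, 1, 1]]
    c = q[ys + offs[:, 2, 0], xs + offs[:, 2, 1]]
    np.add.at(T, (a, b, c), 1.0)
    T /= T.sum()                                          # empirical third-order local statistic
    weights, factors = non_negative_parafac(tl.tensor(T), rank=n_regions, n_iter_max=500)
    weights = np.ones(n_regions) if weights is None else np.asarray(weights)
    # Each rank-1 component approximates (proportion) x (p x p x p) for one region.
    mass = weights * np.prod([np.asarray(f).sum(axis=0) for f in factors], axis=0)
    proportions = mass / mass.sum()
    models = np.asarray(factors[0])
    models = models / models.sum(axis=0, keepdims=True)   # one distribution per column/region
    return models, proportions
```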
Image segmentation algorithms often depend on appearance models that characterize the distribution of pixel values in different image regions. We describe a new approach for estimating appearance models directly from an image, without explicit consideration of the pixels that make up each region. Our approach is based on novel algebraic expressions that relate local image statistics to the appearance of spatially coherent regions. We describe two algorithms that can use the aforementioned algebraic expressions to estimate appearance models directly from an image. The first algorithm solves a system of linear and quadratic equations using a least squares formulation. The second algorithm is a spectral method based on an eigenvector computation. We present experimental results that demonstrate the proposed methods work well in practice and lead to effective image segmentation algorithms.
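A short worked sketch of the kind of algebraic relation involved, for the two-region grayscale case. Assuming both pixels of a nearby pair come from the same region, the pair statistic C and the histogram h satisfy C = w p1 p1^T + (1-w) p2 p2^T and h = w p1 + (1-w) p2, hence C - h h^T = w(1-w)(p1 - p2)(p1 - p2)^T is rank one. The code recovers the difference direction from the top eigenvector and resolves the remaining mixing weight with a simple nonnegativity heuristic, a simplification of mine rather than either of the paper's two algorithms.

```python
import numpy as np

def estimate_two_region_models(gray, n_bins=32, offset=1):
    """gray: float image in [0, 1]; returns mixing weight w and two appearance models."""
    q = np.minimum((gray * n_bins).astype(int), n_bins - 1)
    a, b = q[:, :-offset].ravel(), q[:, offset:].ravel()      # horizontally adjacent pairs
    C = np.zeros((n_bins, n_bins))
    np.add.at(C, (a, b), 1.0)
    C = (C + C.T) / (2.0 * a.size)                            # symmetric pair statistic, sums to 1
    h = C.sum(axis=1)                                         # its marginal: the histogram
    lam, vecs = np.linalg.eigh(C - np.outer(h, h))
    v = vecs[:, -1] * np.sqrt(max(lam[-1], 0.0))              # = sqrt(w(1-w)) * (p1 - p2)
    best = None
    for w in np.linspace(0.05, 0.95, 91):
        for sign in (1.0, -1.0):
            d = sign * v / np.sqrt(w * (1.0 - w))
            p1, p2 = h + (1.0 - w) * d, h - w * d
            neg = -(np.minimum(p1, 0).sum() + np.minimum(p2, 0).sum())
            if best is None or neg < best[0]:
                best = (neg, w, np.clip(p1, 0, None), np.clip(p2, 0, None))
    _, w, p1, p2 = best
    return w, p1 / p1.sum(), p2 / p2.sum()
```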
This paper presents a novel hierarchical nuclei segmentation algorithm for isolated and overlapping cervical cells based on a narrow band level set implementation. Our method applies a new multiscale analysis algorithm to estimate the number of clusters in each image region containing cells, which then serves as the input to a narrow band level set algorithm. We assess the nuclei segmentation results on three public cervical cell image databases. Overall, our segmentation method outperformed six state-of-the-art methods in terms of the number of correctly segmented nuclei, and the Dice coefficient reached values equal to or higher than 0.90. We also carried out classification experiments using features extracted from our segmentation results, and the proposed pipeline achieved the highest average accuracy values, 0.89 and 0.77, for the two-class and three-class problems, respectively. These results demonstrate the suitability of the proposed segmentation algorithm for integration into decision support systems for cervical cell screening.
The nuclei and cytoplasm segmentation of cervical cells is a well studied problem. However, current segmentation algorithms are not suitable for clinical practice due to their high computational cost or their inability to accurately segment highly overlapping cells. In this paper, we propose a method that is capable of segmenting both the cytoplasm and the nucleus of each individual cell in a clump of overlapping cells. The proposed method consists of three steps: 1) cellular mass segmentation; 2) nucleus segmentation; 3) cytoplasm identification based on an active contour method. We carried out experiments on both synthetic and real cell images. The performance evaluation showed that the proposed method was less sensitive than two other existing algorithms to increases in the number of cells per image and in the overlapping ratio. It also achieved a promisingly low processing time and, hence, has the potential to support expert systems for cervical cell recognition.
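A minimal sketch of the three-step pipeline with stand-ins of mine (Otsu thresholding for the cellular mass, intensity percentiles for the nuclei, and scikit-image's morphological Chan-Vese in place of the paper's active contour); parameters are illustrative.

```python
import numpy as np
from skimage import filters, measure, morphology, segmentation

def segment_clump(gray):
    """gray: float image in [0, 1] with cells darker than the background (assumed)."""
    # 1) Cellular mass: global Otsu threshold plus small-object removal.
    mass = gray < filters.threshold_otsu(gray)
    mass = morphology.remove_small_objects(mass, 200)
    # 2) Nuclei: dark, compact blobs inside the cellular mass.
    nuclei = (gray < np.percentile(gray[mass], 10)) & mass
    nuclei = morphology.remove_small_objects(nuclei, 30)
    nuclei_labels = measure.label(nuclei)
    # 3) Cytoplasm of each cell: evolve one active contour per nucleus inside the mass.
    cytoplasms = []
    for region in measure.regionprops(nuclei_labels):
        init = morphology.binary_dilation(nuclei_labels == region.label, morphology.disk(5))
        ls = segmentation.morphological_chan_vese(gray * mass, 100, init_level_set=init)
        cytoplasms.append((ls > 0) & mass)
    return mass, nuclei_labels, cytoplasms
```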
We introduce a new spectral method for image segmentation that incorporates long-range relationships for global appearance modeling. The approach combines two different graphs: a sparse graph that captures spatial relationships between nearby pixels and a dense graph that captures pairwise similarity between all pairs of pixels. We extend the spectral method for Normalized Cuts to this setting by combining the transition matrices of the Markov chains associated with each graph. We also derive an efficient method that uses importance sampling to sparsify the dense graph of appearance relationships. This leads to a practical algorithm for segmenting high-resolution images. The resulting method can segment challenging images without any filtering or pre-processing.
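A minimal sketch of combining the two random walks, with details assumed by me: a 4-connected grid for the spatial chain, color affinities to a uniformly sampled pixel subset as a crude stand-in for the importance-sampled appearance chain, and a fixed mixing weight alpha.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs
from sklearn.cluster import KMeans

def combined_walk_segmentation(image, n_segments=4, n_samples=200, alpha=0.5, sigma=0.1, seed=0):
    """image: small float RGB array in [0, 1]; returns a label map."""
    h, w, _ = image.shape
    n = h * w
    colors = image.reshape(n, 3)
    idx = np.arange(n).reshape(h, w)
    # Sparse spatial chain: row-normalized 4-connected grid graph.
    r, c = [], []
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        r += [a.ravel(), b.ravel()]
        c += [b.ravel(), a.ravel()]
    r, c = np.concatenate(r), np.concatenate(c)
    Ws = sp.csr_matrix((np.ones(r.size), (r, c)), shape=(n, n))
    Ps = sp.diags(1.0 / np.asarray(Ws.sum(axis=1)).ravel()) @ Ws
    # Sparsified appearance chain: color similarity to a random subset of pixels.
    rng = np.random.default_rng(seed)
    samples = rng.choice(n, n_samples, replace=False)
    Wa = np.exp(-((colors[:, None, :] - colors[samples][None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
    Wa /= Wa.sum(axis=1, keepdims=True)
    Pa = sp.csr_matrix((Wa.ravel(), (np.repeat(np.arange(n), n_samples), np.tile(samples, n))),
                       shape=(n, n))
    # Combine the two Markov chains and embed pixels with the leading eigenvectors.
    P = alpha * Ps + (1 - alpha) * Pa
    vals, vecs = eigs(P, k=n_segments + 1, which='LM')
    order = np.argsort(-np.abs(vals))
    emb = np.real(vecs[:, order[1:]])                 # drop the stationary eigenvector
    return KMeans(n_segments, n_init=10).fit_predict(emb).reshape(h, w)
```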
Shape analysis is a key task in computer vision, and multiscale descriptors can significantly enhance shape characterization. However, these descriptors often rely on parameter adjustments to configure a meaningful set of scales that can enable shape analysis. Parameter adjustment in large image datasets is often done on a trial-and-error basis, and an alternative solution to mitigate such a limitation is the use of metaheuristic optimization. The main contribution of this paper is a strategy that supports the automatic parameter adjustment of a multiscale descriptor within a metaheuristic optimization algorithm, in which the choice of the cost function strongly influences the performance of the shape description and is closely related to the problem domain, i.e., the image dataset. Our research considers synthetic data in a prior evaluation of the cost functions that optimize the scale parameters of the Normalized Multiscale Bending Energy (NMBE) descriptor through the Simulated Annealing (SA) metaheuristic. The cost functions that drive this metaheuristic are the Silhouette (SI), Davies-Bouldin (DB) and Calinski-Harabasz (CH) indices. We conduct content-based image retrieval and classification experiments to assess the optimized descriptor using three healthcare image datasets: Amphetamine Type Stimulants (ATS) pills (Illicit Pills), pills from the National Library of Medicine (NLM Pills) and hand alphabet gestures (Hands). We also provide segmentation masks for Illicit Pills to guarantee reproducibility. We report the results of tests using a state-of-the-art method based on a deep neural network, Inception-ResNet-v2. The optimized NMBE with SI and DB achieved competitive values above 94% in terms of both Mean Average Precision (MAP) and Accuracy (ACC) for Illicit Pills and NLM Pills. The precision-recall curves demonstrate that it outperforms Inception-ResNet-v2 for both of these datasets.
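A minimal sketch of the optimization loop, with simplifications of mine: a basic simulated annealing schedule, the Silhouette index as the cost, and a mean-squared-curvature multiscale descriptor standing in for the full NMBE. It assumes `contours` is a list of (N, 2) boundary point arrays and `labels` their class labels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.metrics import silhouette_score

def multiscale_bending_energy(contour, scales):
    """contour: (N, 2) closed boundary; returns one mean-squared-curvature value per scale."""
    feats = []
    for s in scales:
        x = gaussian_filter1d(contour[:, 0], s, mode='wrap')
        y = gaussian_filter1d(contour[:, 1], s, mode='wrap')
        dx, dy = np.gradient(x), np.gradient(y)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        k = (dx * ddy - dy * ddx) / np.clip((dx ** 2 + dy ** 2) ** 1.5, 1e-9, None)
        feats.append(np.mean(k ** 2))
    return np.array(feats)

def anneal_scales(contours, labels, n_scales=8, iters=500, t0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    scales = rng.uniform(1.0, 20.0, n_scales)              # initial scale set

    def cost(sc):                                          # maximize class separability
        X = np.array([multiscale_bending_energy(c, sc) for c in contours])
        return -silhouette_score(X, labels)

    cur = best = cost(scales)
    best_scales = scales.copy()
    for i in range(iters):
        t = t0 * (1.0 - i / iters)                         # linear cooling schedule
        cand = np.clip(scales + rng.normal(0, 1.0, n_scales), 0.5, 40.0)
        c = cost(cand)
        if c < cur or rng.random() < np.exp(-(c - cur) / max(t, 1e-6)):
            scales, cur = cand, c
            if c < best:
                best, best_scales = c, cand.copy()
    return best_scales
```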
SAR image segmentation plays a central role in geoscience and remote sensing of the environment. Recently, methodologies that apply traditional segmentation algorithms to maps of statistical information extracted from the SAR image, rather than to the raw data itself, have shown promising results. Nonetheless, the application of more powerful segmentation methods to these maps is constrained by the lack of adequate statistical models for such data. In this letter, we present a level-set-based algorithm that embodies much of the data statistics without assuming any prior model for the data. We also evaluate its performance on both real and synthetic SAR images.
Algorithms for retinal vessel segmentation are powerful tools in automatic tracking systems for early detection of ophthalmological and cardiovascular diseases, and for biometric identification. In order to create more robust and reliable systems, the algorithms need to be accurately evaluated to certify their ability to emulate specific human expertise. The main contribution of this paper is an unsupervised method to detect blood vessels in fundus images using a coarse-to-fine approach. Our methodology combines Gaussian smoothing, a morphological top-hat operator, and vessel contrast enhancement for background homogenization and noise reduction. Statistics of spatial dependency and probability are then used to coarsely approximate the vessel map with an adaptive local thresholding scheme. The coarse segmentation is then refined through curvature analysis and morphological reconstruction to reduce pixel mislabeling and better estimate the retinal vessel tree. The method was evaluated in terms of its sensitivity, specificity and balanced accuracy. Extensive experiments were conducted on the DRIVE and STARE public retinal image databases. Comparisons with state-of-the-art methods revealed that our method outperformed most recent methods in terms of sensitivity and balanced accuracy, with averages of 0.7819 and 0.8702, respectively. The proposed method also outperformed state-of-the-art methods when evaluating only pathological images, which is a more challenging task; for this set of images, it achieved averages of 0.7842 and 0.8662 for sensitivity and balanced accuracy, respectively. Visual inspection also revealed that the proposed approach effectively addressed the main image distortions by reducing mislabeling of central vessel reflex regions and false-positive detection of pathological patterns. These improvements indicate the ability of the method to accurately approximate the vessel tree with reduced visual interference from pathological patterns and vessel-like structures. Therefore, our method has the potential to support expert systems in the screening, diagnosis and treatment of ophthalmological diseases, and, furthermore, personal recognition based on retinal profile matching.
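A minimal sketch of a coarse-to-fine pipeline in the same spirit, with details assumed by me: the green channel, CLAHE for contrast enhancement, a black top-hat to highlight dark vessels, local thresholding for the coarse map, and small-object/hole removal standing in for the curvature-based refinement.

```python
import numpy as np
from skimage import exposure, filters, morphology

def coarse_to_fine_vessels(rgb, block_size=41, offset=-0.02, min_size=60):
    """rgb: uint8 fundus image; returns the coarse and refined vessel maps."""
    green = rgb[:, :, 1].astype(float) / 255.0
    smooth = filters.gaussian(green, sigma=1.0)                     # noise reduction
    enhanced = exposure.equalize_adapthist(smooth)                  # vessel contrast enhancement
    tophat = morphology.black_tophat(enhanced, morphology.disk(8))  # bright response on dark vessels
    # Coarse map: adaptive local thresholding of the top-hat response (offset tunes sensitivity).
    coarse = tophat > filters.threshold_local(tophat, block_size, offset=offset)
    # Refinement stand-in: drop small spurious components and fill small holes.
    fine = morphology.remove_small_objects(coarse, min_size=min_size)
    fine = morphology.remove_small_holes(fine, area_threshold=20)
    return coarse, fine
```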
Shape description often relies on parameter adjustment in order to configure a meaningful scale that enables a computer vision task. Instead of manual interaction, which is prohibitive for large datasets, an alternative solution for supporting a multiscale methodology is to apply metaheuristic optimization. Nevertheless, the cost function assigned to the optimization process is an open question that we fully address in this paper. Our investigation describes the influence of the cost function on the performance of an optimized multiscale shape descriptor using three distinct clustering metrics: the Silhouette, Davies-Bouldin and Calinski-Harabasz indices. To this end, we optimize the scale parameters of the Normalized Multiscale Bending Energy descriptor using the Simulated Annealing metaheuristic; both classification and retrieval experiments are conducted using a synthetic shape dataset (Kimia 99), two real plant leaf datasets (ShapeCN and Swedish) and the National Library of Medicine (NLM) pill image dataset (NLM Pills). Using the Bulls-eye ratio and the Accuracy measure, the performance evaluation showed that the descriptor optimized with the Calinski-Harabasz cost function underperformed the other functions on datasets with a high level of dissimilarity between classes. In particular, for NLM Pills, where each class has a well-defined pattern and differences within pill classes are quite small, the Normalized Multiscale Bending Energy descriptor did not benefit from the optimization methodology. We also present a qualitative assessment of the cluster arrangements produced by the Self-Organizing Map (SOM), which reinforces that the three cost functions perform differently within the optimized shape descriptor.
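The three candidate cost functions are available directly in scikit-learn; a sketch of evaluating one descriptor configuration, where X is assumed to hold one NMBE feature vector per shape and y the shape classes:

```python
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

def descriptor_costs(X, y):
    """Clustering indices measuring how well the descriptor separates the shape classes."""
    return {
        'silhouette': silhouette_score(X, y),                 # higher is better
        'davies_bouldin': davies_bouldin_score(X, y),         # lower is better
        'calinski_harabasz': calinski_harabasz_score(X, y),   # higher is better
    }
```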
Noisy image segmentation is one of the most important and challenging problems in computer vision. In this paper, we propose a level set segmentation technique inspired by the classic Otsu thresholding method. The front propagation of the proposed level-set-based method embeds a cost function that takes into account first-order statistical moments. In order to deal with highly noisy images, we also added a morphological step to our algorithm, which made the final segmentation more robust and efficient. Tests were carried out on images artificially contaminated with Gaussian and Salt & Pepper noise patterns. The results showed that our methodology outperformed the classic Otsu thresholding algorithm and an active contour based technique in terms of the Error of Segmentation (EoS), Rate of False Positive (RFP), Rate of False Negative (RFN) and Dice evaluation measures. In addition, the designed algorithm attained a lower average computational time compared to the active contour based method.
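A minimal sketch in the same spirit, with stand-ins of mine: scikit-image's morphological Chan-Vese front propagation, whose energy is likewise built from first-order region statistics, followed by a morphological opening/closing step for noisy inputs.

```python
from skimage import img_as_float, morphology, segmentation

def otsu_like_level_set(gray, n_iter=150, cleanup_radius=2):
    """gray: grayscale image; returns a binary segmentation."""
    img = img_as_float(gray)
    init = segmentation.checkerboard_level_set(img.shape, 6)        # generic initialization
    # Front propagation driven by the two regions' first-order statistics (means).
    ls = segmentation.morphological_chan_vese(img, n_iter, init_level_set=init)
    # Morphological cleanup step for heavily noisy images.
    footprint = morphology.disk(cleanup_radius)
    cleaned = morphology.binary_opening(ls.astype(bool), footprint)  # remove salt-like speckles
    cleaned = morphology.binary_closing(cleaned, footprint)          # fill pepper-like gaps
    return cleaned
```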
Image segmentation can be applied to a broad class of problems. However, it is not usually a simple task for synthetic aperture radar (SAR) images due to the presence of speckle. Given the importance of SAR images in remote sensing problems, this letter introduces a simplified and general methodology to achieve SAR image segmentation by using the estimated roughness parameters of SAR data modeled by the GI0 and GA0 distributions, instead of directly processing the speckled images. In this letter, we adopt the log-cumulants method for the roughness parameter estimation. The performance was evaluated in terms of the error of segmentation and the cross-region fitting measure for synthetic and real SAR images, respectively. With regard to synthetic images, we performed Monte Carlo experiments which confirmed the suitability of SAR image segmentation by means of roughness parameters. The results showed that the methodology provides a feasible input to SAR image segmentation algorithms, including thresholding-based methods. The proposed approach achieved satisfactory results for the most interesting and critical study case, i.e., single-look images, which are markedly affected by speckle.
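A minimal sketch of log-cumulant roughness estimation for the GI0 model (the window size and the block-wise map are illustrative choices of mine). The second sample log-cumulant satisfies kappa_2 = psi'(L) + psi'(-alpha), so alpha is recovered by inverting the trigamma function:

```python
import numpy as np
from scipy.special import polygamma
from scipy.optimize import brentq

def roughness_from_window(z, L):
    k2 = np.var(np.log(z))                       # second sample log-cumulant
    target = k2 - polygamma(1, L)                # should equal psi'(-alpha)
    if target <= polygamma(1, 1e3):              # outside the invertible range
        return np.nan
    minus_alpha = brentq(lambda a: polygamma(1, a) - target, 1e-3, 1e3)
    return -minus_alpha

def roughness_map(image, L, win=15):
    h, w = image.shape
    out = np.full((h // win, w // win), np.nan)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = image[i * win:(i + 1) * win, j * win:(j + 1) * win]
            out[i, j] = roughness_from_window(block.ravel() + 1e-12, L)
    # Feed this map (not the speckled image) to a segmentation or thresholding method.
    return out
```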
Synthetic aperture radar (SAR) image segmentation is an important task in image processing. However, classic segmentation techniques are inadequate due to the presence of speckle noise. In this paper, we present a methodology for SAR image segmentation that uses a matrix of Rényi entropy values. This matrix is derived from SAR data that follows the GA0 model and serves here as the input to segmentation methods. For performance evaluation of the proposed methodology, we employ the error of segmentation, the cross-region fitting index, the Dice measure, as well as the rates of false positives and negatives. Tests were performed on synthetic and real SAR images, and the entropy matrix improved the results regardless of the number of looks. Otsu's global thresholding method produced good segmentation results when applied to this matrix.
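A minimal sketch of the entropy-map-then-threshold idea, with a simplification of mine: a histogram-based Rényi entropy per window instead of the closed-form entropy of the fitted GA0 model, followed by scikit-image's Otsu threshold.

```python
import numpy as np
from skimage.filters import threshold_otsu

def renyi_entropy(window, q=0.5, bins=32):
    """Histogram-based Rényi entropy of order q: (1/(1-q)) * log(sum p_i^q)."""
    counts, _ = np.histogram(window, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return np.log(np.sum(p ** q)) / (1.0 - q)

def entropy_map_segmentation(sar_image, win=15, q=0.5):
    h, w = sar_image.shape
    H = np.zeros((h // win, w // win))
    for i in range(H.shape[0]):
        for j in range(H.shape[1]):
            H[i, j] = renyi_entropy(sar_image[i * win:(i + 1) * win, j * win:(j + 1) * win], q)
    return H, H > threshold_otsu(H)              # entropy matrix and its Otsu segmentation
```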