Image segmentation studies

Source: http://blog.sina.com.cn/s/blog_40884e9b0100v6qu.html

Image segmentation is one of the primary steps in image analysis for object identification. The main aim is to recognise homogeneous regions within an image as distinct units belonging to different objects. The segmentation stage is not concerned with the identity of the objects; they can be labelled later. The segmentation process can be based on finding the maximum homogeneity in grey levels within the regions identified.



There are several issues related to image segmentation that require detailed review. One of the common problems is choosing a suitable approach for isolating different objects from the background; segmentation does not perform well if the grey levels of different objects are quite similar. Image enhancement techniques seek to improve the visual appearance of an image. They emphasise the salient features of the original image and simplify the task of image segmentation. The type of operator chosen has a direct impact on the quality of the resultant image: an ideal operator enhances the boundary differences between the objects and their background, making the segmentation task easier. Issues related to segmentation involve choosing good segmentation algorithms, measuring their performance, and understanding their impact on the scene analysis system.



Segmentation techniques

We review primarily those studies that are based on finding object regions in grey-level images. We also mention a couple of studies that deal with colour segmentation, to highlight how it has been used for outdoor scene analysis. Image segmentation has been approached from a wide variety of perspectives[1]. Our summary covers histogram thresholding, edge-based segmentation, tree/graph-based approaches, region growing, clustering, probabilistic or Bayesian approaches, neural networks for segmentation, and other approaches.



Histogram Thresholding

Ohlander[2] proposed a thresholding technique that is very useful for segmenting outdoor colour images. It is based on constructing colour and hue histograms. The picture is thresholded at its most clearly separated peak, and the process iterates for each segmented part of the image until no separate peaks are found in any of the histograms. The criterion for separating peaks was that the ratio of peak maximum to peak minimum be greater than or equal to two. Textured areas were separated from uniform regions by using a Sobel operator and marking regions that contain large edge activity (more than 25 edge pixels in a 9x9 pixel window). These were called 'busy' areas. The original aim was not to segment inside these busy areas based on thresholding, but it is sometimes necessary to do so, as in the case of segmenting skylines.
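The peak-separation rule above can be sketched as a simple check on a histogram. The function below is an illustrative reconstruction, not the paper's code; the function name and the index arguments are assumptions:

```python
def clearly_separated_peak(hist, peak, left_min, right_min, ratio=2.0):
    """Ohlander-style criterion: the peak maximum must be at least
    `ratio` times the deeper of the two flanking minima.
    All indices are positions in the histogram."""
    peak_height = hist[peak]
    valley = max(hist[left_min], hist[right_min])  # the worse (higher) valley
    return valley > 0 and peak_height / valley >= ratio

# Toy histogram: a tall peak at index 2 flanked by shallow minima.
h = [5, 2, 40, 3, 6]
print(clearly_separated_peak(h, peak=2, left_min=1, right_min=3))  # True: 40/3 >= 2
```

A peak that fails this test (e.g. a maximum of 40 next to a valley of 30) would not be thresholded, and that branch of the recursion would terminate.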



Cheriet et al.[3] presented a general recursive approach for image segmentation by extending Otsu's method[4]. This approach has been applied to document images, specifically for segmenting bank cheques. It segments the brightest homogeneous object from a given image at each recursion, leaving the darkest homogeneous object for last. The method is developed without any constraints on the number of objects in the image and is based on discriminant analysis. The thresholding operation is regarded as the partitioning of the pixels of an image into two classes: object and background. At each iteration, the histogram of the image is computed and the largest peak is separated from the rest of the image. The process continues until no peaks are left in the histogram. Like Otsu's method, the proposed method only performs well on images with two classes, but by applying it iteratively one can segment more than two objects. The technique also facilitates the extraction of information from document images and preserves the topological properties of the extracted information, which is used for further recognition. In this study, a set of human visual criteria was defined, and each criterion was given a weight. The system was trained on 220 bank cheques, and 505 different cheques were used for testing. Results of the performance analysis indicate that the recognition rate varied between 93% and 100% for different sets of test samples. The authors concluded that the method gives good results when the target object is the darkest object in a given image; when it is not, the method fails to segment properly.
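Otsu's method, which the recursive approach above extends, can be sketched in a few lines. This is the standard histogram formulation (maximise between-class variance over all candidate thresholds), not the authors' cheque-segmentation implementation:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Otsu's method: choose the threshold that maximises the
    between-class variance of the two resulting pixel classes."""
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist.astype(float) / hist.sum()        # grey-level probabilities
    omega = np.cumsum(p)                       # class-0 (background) probability
    mu = np.cumsum(p * np.arange(bins))        # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

# Bimodal toy image: dark object on a bright background.
img = np.array([[10, 12, 11, 200], [13, 11, 210, 205], [12, 198, 202, 199]])
t = otsu_threshold(img)
print(10 < t < 200)  # True: the threshold falls between the two modes
```

Applied recursively to each segmented part, as Cheriet et al. describe, this single two-class step can separate more than two objects.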



In a number of applications, histogram thresholding is not possible simply because the histogram may be unimodal. In some cases the images may be of such quality that no preprocessing can improve the contrast between objects sufficiently, and hence one may not obtain two or more peaks in the histogram for selecting segmentation thresholds. Unimodal distributions are typically obtained when the image consists mostly of a large background area with small but significant regions; this often happens in medical imaging applications. Similarly, in aerial scenes with many different objects, the histogram may have only one peak because of the vast range of intensities for each object and the overlap between them. Bhanu and Faugeras[5] propose a gradient relaxation algorithm for solving this problem and compare it to the non-linear probabilistic relaxation algorithm proposed by Rosenfeld[6]. The process is based on assigning each pixel an initial probability of occurrence and then applying gradient relaxation using a compatibility function that takes into account the relationship between a pixel and its eight neighbours. The relaxation process is iterative: pixel values are changed so that the histogram is no longer unimodal and thresholds can be easily detected. Compared to Rosenfeld's method, the authors claim that this technique provides better control through parameters that the user can set. The user can therefore control the degree of smoothing at each iteration as well as the initial assignment of probabilities. An extension of the gradient relaxation approach is presented in Bhanu and Parvin[7]. This segmentation technique is based on recursive split and merge and has a number of advantages. The method avoids heuristics and arbitrary measures for partitioning the image, and hence does not require a robust merging step to remove boundary artefacts. In addition, it does not require a detailed peak location and selection procedure.



Li et al.[8] propose that two-dimensional histograms of an image are more useful for finding segmentation thresholds than grey-level information in one dimension alone. In 2D histograms, the grey level of each pixel is used together with the local grey-level average of its neighbourhood. The authors show that applying the Fisher linear discriminant to the histogram yields an optimal projection in which the data clusters are better defined and hence easier to separate by choosing appropriate thresholds. The experimental results show that the proposed technique has the same computational demands as one-dimensional techniques but gives better segmentation results.
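The 2D histogram itself is straightforward to construct: one axis is the pixel grey level, the other the mean grey level of its neighbourhood. The sketch below assumes a 3x3 neighbourhood and edge-replicated borders, which are implementation choices rather than details from the paper:

```python
import numpy as np

def histogram_2d(image, levels=256):
    """2D histogram of (pixel grey level, 3x3 neighbourhood mean),
    the representation used by 2D thresholding methods."""
    img = image.astype(float)
    padded = np.pad(img, 1, mode="edge")
    # 3x3 local mean via shifted sums (no external filtering library).
    local_mean = sum(
        padded[i:i + img.shape[0], j:j + img.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    hist, _, _ = np.histogram2d(
        img.ravel(), local_mean.ravel(),
        bins=levels, range=[[0, levels], [0, levels]]
    )
    return hist

img = np.array([[0, 0, 255], [0, 0, 255], [0, 0, 255]], dtype=float)
h2d = histogram_2d(img)
print(h2d.sum() == img.size)  # True: every pixel lands in exactly one 2D bin
```

Homogeneous pixels fall near the diagonal of this histogram, while edge and noise pixels fall off it, which is what makes the clusters easier to separate.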



Edge based segmentation

Ahuja et al.[9] describe how pixel neighbourhoods can be used for image segmentation. For each pixel, its neighbours are first identified in a window of fixed size. A feature vector is formed either from the individual grey values of these neighbours or from the average grey levels in windows of size 1x1, 3x3 and 5x5; the paper uses both representations. The aim is to identify a weight matrix which, multiplied with these vectors, yields a discriminant value that classifies the pixel into one of two classes. Two sets of experiments were performed on FLIR images. The first experiment used 10 images of size 64x64 pixels, each containing a single tank, with two choices of feature vector for every pixel. The segmentation was done using the super-slice algorithm. The features were averaged over different sized windows (3x3 and 5x5 pixels) and classification was performed using Fisher's linear decision rule. In the second experiment, a separate set of 25 images was chosen with light objects on a dark background. Ten images were used for training; 12 points were chosen from the background and 12 from the interior of the object in each image, and all of these samples were used to train the classifiers. Testing was done on the remaining 15 images with some added noise. The authors stated that, for the noisy data, using the pixel grey level as the only feature gave better results than using the neighbourhood grey-level properties. They proposed that the error rate for noisy images depends primarily on the number of features used, and concluded that the grey levels of the pixel and its neighbours are a good set of features to use in pixel classification.



Prager[10] proposed a set of algorithms to perform segmentation of natural scenes through boundary analysis. The goal is to locate the boundaries of objects in a scene correctly. First, the raw images are pre-processed by smoothing and noise removal. Second, the edge representation is generated: differentiation finds the edge strength at each point in the image, and suppression then removes the multiple edges formed by spatial differentiation of boundaries. Third, the edges are joined into line segments and features are computed, including the length, contrast, frequency, mean, variance and location of each line segment. Fourth, post-processing removes unwanted line segments and builds a confidence value for each remaining segment. The output of the system is a set of line segments with a list of attributes, such as length and confidence. The author concluded that the advantages of this approach are twofold: the effects of each stage are well defined, and any stage can be omitted or replaced by a more sophisticated variant if desired.
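The differentiation step (edge strength at each point) is commonly implemented with a gradient operator. The sketch below uses a Sobel operator, which is one standard choice rather than necessarily the one Prager used:

```python
import numpy as np

def sobel_edge_strength(image):
    """Edge-strength map from horizontal and vertical Sobel
    responses: strength = sqrt(gx^2 + gy^2) at every pixel."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    img = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h, j:j + w]   # shifted copy of the image
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)  # gradient magnitude = edge strength

# A vertical step edge: strength peaks on the boundary columns.
img = np.array([[0, 0, 100, 100]] * 4, dtype=float)
strength = sobel_edge_strength(img)
print(strength[1, 1] > strength[1, 0])  # True: stronger response near the step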



Perkins[11] uses an edge-based technique for image segmentation. It is acknowledged that edge-based segmentation has not been very successful because small gaps allow merging of dissimilar regions. To avoid these problems, the paper proposes an expansion-contraction technique in which edge regions are expanded to close gaps and then contracted after the separate regions have been labelled. The size of the expansion is controlled so that small regions are not engulfed by this process. The process uses a Sobel filter to produce edge strengths and directions at every point. The edges are thinned and the result is automatically thresholded, leaving only ridges. The ridges separate regions of different intensity, but there may be small gaps. Segmentation is performed by expanding active edge regions, labelling the segmented uniform-intensity regions, and then contracting the edge regions. Results are shown for landscape pictures and for pictures of electronic circuits.
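The expansion and contraction phases correspond to binary dilation and erosion of the edge mask. The following is a minimal sketch with a 3x3 structuring element, written from the description above rather than from Perkins' implementation:

```python
import numpy as np

def dilate(mask, steps=1):
    """Binary dilation with a 3x3 structuring element: expand the
    edge mask to close small gaps (the 'expansion' phase)."""
    m = mask.astype(bool)
    for _ in range(steps):
        p = np.pad(m, 1, mode="constant")
        out = np.zeros_like(m)
        for i in range(3):
            for j in range(3):
                out |= p[i:i + mask.shape[0], j:j + mask.shape[1]]
        m = out
    return m

def erode(mask, steps=1):
    """Binary erosion: contract the expanded edges back (the
    'contraction' phase) after regions have been labelled."""
    m = mask.astype(bool)
    for _ in range(steps):
        p = np.pad(m, 1, mode="constant", constant_values=True)
        out = np.ones_like(m)
        for i in range(3):
            for j in range(3):
                out &= p[i:i + mask.shape[0], j:j + mask.shape[1]]
        m = out
    return m

# An edge line with a one-pixel gap: dilation closes it.
edges = np.zeros((5, 5), dtype=bool)
edges[2, [0, 1, 3, 4]] = True          # gap at column 2
closed = dilate(edges)
print(bool(closed[2, 2]))              # True: the gap has been bridged
```

Labelling the uniform regions between dilation and erosion is what prevents the closed gap from re-opening into a leak between regions.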



Chan et al.[12] developed a new adaptive thresholding algorithm for image segmentation using variational theory. The method is a heuristic algorithm consisting of seven steps. First, the image is smoothed with an average filter and the grey-level gradient magnitude is derived from it. A thresholding and thinning algorithm is applied to the gradient magnitude to find the object boundary points. The image is then sampled at these boundary points to obtain local thresholds, a threshold surface is interpolated from the sampled local thresholds, and the image is segmented by the threshold surface. Noise is then removed from the segmented image by a variational method. The method is performed iteratively, and the iteration is stopped using an error threshold or a limit on the number of iterations. The authors presented results on an image with 16 objects and a simulated background, and also on some handwriting images. The results were found to be more satisfactory when there is a large number of objects in the image, as they yield a better threshold surface. The method is demonstrated successfully on segmenting images for an OCR application.



Tree/graph based approaches

Cho and Meer[13] proposed a new approach for segmentation that is derived from the consensus of a set of different segmentation outputs on one input image. Instead of statistics characterising the spatial structure of the local neighbourhood of a pixel, statistics collected for every pair of adjacent pixels are used for determining local homogeneity. Several initial segmentations are derived from the same input image by changing the probabilistic component of the hierarchical Region Adjacency Graph (RAG) pyramid based technique. From the ensemble of these initial segmentations, a co-occurrence probability is derived for every adjacent pixel pair, which captures global information about the image at the local (pixel) level. The final segmentation of the input image is obtained by processing the co-occurrence probability field with the same RAG pyramid technique. Pixel pairs with high co-occurrence probability are grouped together based on the consensus about local homogeneity. This technique can also be used to extract high-confidence homogeneous regions from the co-occurrence probability field. Bayesian networks were then used to extract features from images: the variance of the width of the region, the ratio of average width to length, and the average grey level. Post-processing of over-segmented images is then done based upon a priori information about the sought features. The RAG of the final segmentation provides the spatial relationships between regions and can be used for further interactive analysis of the image. This segmentation method is completely unsupervised. Experiments were performed on an aerial image and on images of a boat, the Pentagon, and a house.



Yeung et al.[14] proposed the segmentation of video by clustering and graph analysis. The method is also extended to the Scene Transition Graph (STG) representation for the analysis of temporal structures extracted from a video. First, similar shots in a video are recognised to reduce the amount of information to be processed. The dissimilarity between two shots is then defined based upon the dissimilarity indices for all image pairs in the two shots, and automatic segmentation of scenes and story units is performed. The images are segmented based on the presentation of the featured video sequence along the timeline and the visual contents. The video shots are then clustered together based upon visual similarities of image contents and the temporal dynamics shown within individual shots (termed time-constrained clustering). An STG is then drawn and analysed, and used to represent compactly the structure of shots and the temporal flow of the story. Some new attributes are defined from the STG representation for further analysis. It is assumed that two consecutive scenes in a video presentation do not share significant similarities in visual qualities. Cluster labels are given to shots in different scenes. The experiment was performed on a sequence of shots to extract story units. The video was successfully decomposed into a hierarchy of story units, each of which consisted of clusters of similar shots. The methodology also provided a means of non-linear access to a featured programme, which in turn facilitated browsing of video contents.



Region growing

A range of image segmentation algorithms are based on region growing. We review some relevant studies that have used region growing algorithms. Region growing algorithms take one or more pixels, called seeds, and grow the regions around them based upon a homogeneity criterion. If an adjoining pixel is similar enough to the seed, it is merged into the same region. The process continues until every pixel in the image has been assigned to a region.
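The basic loop just described can be sketched as a flood fill from a seed. This is a minimal illustration with 4-connectivity and a fixed grey-level tolerance; real systems use richer, adaptive region statistics:

```python
from collections import deque

def region_grow(image, seed, tol=10):
    """Grow a region from `seed` by absorbing 4-connected neighbours
    whose grey level is within `tol` of the seed value."""
    h, w = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_val) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

img = [
    [10, 12, 90, 92],
    [11, 13, 91, 95],
    [10, 11, 93, 90],
]
left = region_grow(img, (0, 0))
print(len(left))  # 6: the homogeneous dark block on the left
```

Growing one region per seed and repeating until no unassigned pixels remain gives a complete segmentation.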



Chang and Li[15] proposed a region-growing framework for image segmentation, known as the Fast Adaptive Segmentation (FAS) algorithm. The process is guided by regional feature analysis; no parameter tuning or a priori knowledge about the image is required. The image is first divided into many small primitive regions that are assumed to be homogeneous. These primitive regions are then merged to form larger regions until no more merges are possible. Two regions are merged if they pass the homogeneity test and if the edge connecting them is weak. The focus of this study is on how different merge criteria affect the quality of segmentation and the processing time. The experiments designed to evaluate the merge criteria are based on four important aspects of segmentation output: region mergeability, boundary accuracy, merge rejections, and the number of iterations required. In these experiments, 300 images of size 50x50 pixels were used. Each image had two equal-sized regions sharing a simple straight 50-unit boundary. The data for the two regions was randomly generated from a pair of normal distributions with different means and variances. The best results, using the fast merge method, give a correct classification rate of 86% (with fewer than 5 regions in the image). The authors concluded that the algorithm automatically computes segmentation thresholds based on local feature analysis, is robust, and produces high quality segmentation on a wide range of textured and grey scale images. The framework can also be easily adapted to different image applications by substituting suitable features. The main limitations of the algorithm are, however, the limited applicability of the adaptive homogeneity test to very small regions and the order dependency of its segmentation results.



For region growing, seeds can be selected automatically or manually. Automated selection can be based on finding pixels of interest, e.g. the brightest pixel in an infra-red image can serve as a seed pixel, or seeds can be determined from the peaks in an image histogram. Alternatively, seeds can be selected manually for every object present in the image. Adams and Bischof[16] studied the effectiveness of the seeded region growing (SRG) approach for segmenting greyscale images where the seeds are selected manually. The method segments an image into different regions using a set of seeds. Each seeded region is a connected component comprising one or more points, and is represented by a set S. The set of immediate neighbours bordering the pixel is calculated. The neighbours are then examined, and if they intersect any region from set S, a measure d (the difference between a pixel and the intersected region) is computed. If a neighbour intersects more than one region, it is assigned to the region for which the difference measure d is minimal. The new state of the regions then constitutes the input to the next iteration. This process continues until all of the image pixels have been assimilated into regions: at each iteration, the pixel that is most similar to a region it borders is appended to that region. The SRG algorithm is inherently dependent on the order in which image pixels are processed; one implication is that raster-order and anti-raster-order processing do not lead to the same tessellation. The algorithm was applied to images with different types of objects in them. The authors concluded that the method is fairly robust, quick, and parameter free, except for its dependency on the order of pixel processing.



Mehnert and Jackway[17] improved the seeded region-growing algorithm by making it independent of the pixel processing order and more parallel. Their study presents a novel technique called Improved Seeded Region Growing (ISRG). The ISRG algorithm retains the advantages of SRG, such as fast execution, robust segmentation and no parameters to tune, while also being pixel-order independent. If more than one pixel in the neighbourhood has the same minimum similarity measure, all of them are processed in parallel. No pixel can be labelled and no region can be updated until all other pixels with the same priority have been examined. If a pixel cannot be labelled because it is equally likely to belong to two or more adjacent regions, it is labelled 'tied' and takes no part in the region growing process. After all of the pixels in the image have been labelled, the pixels labelled 'tied' are independently re-examined to see whether the ties can be resolved. To resolve the ties, an additional assignment criterion is imposed as a post-processing step, such as assigning a tied pixel to the largest neighbouring region or to the neighbouring region with the largest mean. The algorithm was tested on images of man-made objects such as a car, an aeroplane, and buildings. The authors concluded that the ISRG algorithm produces consistent segmentations because it does not depend on the order of pixel processing. Parallel processing ensures that pixels with the same priority are processed in the same manner simultaneously.



Basu[18] developed general sets of semantics for region detection and used them to describe a number of image models. The semantic set is established empirically, based on simple and intuitive properties of a region, and can be extended to include new semantics without altering the conceptual and computational framework of the method. In the case of region detection, each pixel of a digitised image is chosen as a primitive. An ideal region is obtained from a group of pixels of the given image by approximating the grey-level values of these pixels with a linear or quadratic approximation scheme. A set of attributes is calculated for each pixel of the group belonging to the given image as well as for each pixel of the ideal region. In particular, four attributes are used: contrast, total variation, global average grey-level value, and representative grey-level value. The distance between the group of pixels and the ideal region is obtained as a numerical measure of dissimilarity between them, and pixels are assigned to regions according to this distance measure. The author concluded that there is a noticeable improvement in the results when semantic information is added. There is no optimal set of attributes that can be used for all images; any image has to be tested with a number of different attribute-value combinations to obtain the best possible segmentation results. The scheme is suitable for the class of problems in which fine structural detail of an image is not needed. The author demonstrated better results on natural objects as a consequence of using semantic information than other known segmentation methods.



Beaulieu and Goldberg[19] proposed a hierarchical stepwise optimisation algorithm for region merging, which produces a hierarchical decomposition of the image. The algorithm starts with an initial partition of the image into a number of regions. At each iteration, the two segments that minimise the criterion for merging one segment into another are merged. In this stepwise optimisation, the algorithm searches the whole image context before merging two segments and finds the optimal pair; the most similar segments are therefore merged first. The algorithm gradually merges segments and produces a sequence of partitions that reflects the hierarchical structure of the image. This is also termed agglomerative clustering. The authors concluded that the results obtained from the final grouping are far better than those of conventional non-hierarchical methods such as k-means clustering. A further advantage over non-hierarchical methods is that there is no need to specify seed points.
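The stepwise-optimisation idea (always merge the globally most similar pair, not merely a locally acceptable one) can be sketched on a 1-D signal. This is a simplified illustration: adjacency is along a line and the merge cost is the increase in within-segment squared error, a common choice but not necessarily the paper's criterion:

```python
def stepwise_merge(values, n_segments):
    """Hierarchical stepwise optimisation on a 1-D signal. Each
    segment is stored as (sum, count); at every iteration the
    adjacent pair with the globally minimal merge cost is merged."""
    segs = [(v, 1) for v in values]  # start: one segment per sample

    def merge_cost(a, b):
        (sa, na), (sb, nb) = a, b
        # Extra squared error from pooling two segments:
        #   na*nb/(na+nb) * (mean_a - mean_b)^2
        return na * nb / (na + nb) * (sa / na - sb / nb) ** 2

    while len(segs) > n_segments:
        costs = [merge_cost(segs[i], segs[i + 1]) for i in range(len(segs) - 1)]
        i = costs.index(min(costs))          # globally best pair merges first
        sa, na = segs[i]
        sb, nb = segs[i + 1]
        segs[i:i + 2] = [(sa + sb, na + nb)]
    return [round(s / n, 2) for s, n in segs]  # segment means

print(stepwise_merge([10, 11, 12, 90, 91, 200], 3))  # [11.0, 90.5, 200.0]
```

Stopping the merging at different counts yields the sequence of partitions, i.e. the hierarchy, described above.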



In some studies, edge or gradient information has been used in combination with region growing for image segmentation. Gambotto[20] proposed an algorithm that combines region growing and edge detection for segmenting images. The method is iterative and uses both approaches in parallel. The algorithm starts with an initial set of seeds located inside the true boundary of the region. Pixels adjacent to the region are iteratively merged with it if they satisfy a similarity criterion; a second criterion, based on the average gradient over the region boundary, is used to stop the growth. The last stage refines the segmentation. The analysis is based on cooperation between the region growing algorithm and the contour detection algorithm. Since the growing process is performed by adding segments to a region, some pixels belonging to a neighbouring region may be misclassified; a nearest neighbour rule is then used to locally reclassify these pixels. The algorithm was tested on a number of x-ray images. The authors concluded that the method uses both the statistical model of the region and the average gradient computed over its boundary. The system behaves similarly to the snake class of algorithms, but it can also be used to segment closed regions having unknown and complex shapes.



Hojjatoleslami and Kittler[21] proposed a new region growing approach for image segmentation which uses gradient information to specify the boundary of a region. The method has the capability of finding the boundary of a relatively bright/dark region in a textured background. The method relies on a measure of contrast of the region which represents the variation of the region grey-level as a function of its evolving boundary during segmentation. This helps to identify the best external boundary of the region. The application of a reverse test using a gradient measure then yields the highest gradient boundary for the region being grown. The unique feature of the approach is that in each step at most one candidate pixel will exhibit the required properties to join the region. The growing process is directional so that the pixels join the grown region according to a ranking list and the discontinuity measurements are tested pixel by pixel. The authors have performed a number of experiments on synthetic and real images. The authors concluded that the results are more reliable and consistent than conventional thresholding methods. The algorithm is also insensitive to a reasonable amount of noise. The main advantage of the algorithm is that no a priori knowledge is needed about the regions.



Lu and Xu[22] proposed a region growing technique for texture segmentation in which a two-dimensional autoregressive model is used for texture representation. An artificial neural network is adopted to estimate the parameters of the autoregressive model and to compute the local texture properties of regions during segmentation. The procedure consists of two stages: an initial stage and a refining stage. At the initial stage, the image is partitioned into disjoint blocks in the form of windows of a fixed size. A block is considered part of the internal area of a region if it has the same texture class as its four neighbouring blocks in the horizontal and vertical directions; otherwise it is considered undetermined. At the refining stage, all initial blocks are extended in parallel. Each undetermined block is divided into sub-blocks of equal size, and each sub-block is compared with its neighbouring blocks using the local information obtained by the neural network. The process terminates when each pixel in the image has been assigned to one of the initial regions. The algorithm was tested on 38 different textures from the Brodatz album and provided accurate segmentation with little computational time.



He and Chen[23] propose a resonance algorithm for image segmentation that is closely related to seeded region growing. They stress that the resonance-based algorithm is much more robust to illumination changes than conventional algorithms that work directly on grey scale images. The basic idea of resonance is the distribution of energy from a source to the other elements in a system. This can be illustrated by considering each pixel in the image as a mass connected to a platform by a spring, with the image containing many such mass-spring pairs immersed in water. When an external force acts on one pair, there is a certain amount of displacement. If the external force matches the natural frequency of the mass, the mass resonates and the vibrations spread to other mass-spring pairs, which oscillate as a result. The effect, however, only lasts within a circle of a certain radius and becomes weaker away from the source. It is proposed that a seed-based segmentation algorithm based on this theory should be used on feature images rather than the original grey-level images. The experiments are performed on Brodatz texture images of size 512x512 pixels: the grey-level average is extracted within windows of size 8x8 and a feature image is generated. Since averaging is illumination dependent, such a feature allows the illumination robustness of the algorithm to be determined experimentally. The results show better performance of the resonance algorithm compared to fuzzy c-means and histogram analysis segmentation methods.



Singh and Al-Mansoori[24] compared region growing and gradient based techniques for detecting regions of interest in digital mammograms. These regions of interest form the basis for applying shape and texture techniques to detect cancerous masses. The study also proposes a two-stage method in which gradient based techniques are applied first, followed by region growing, which yields fewer regions for analysis. First, the quality of the image is improved by histogram equalisation and fuzzy enhancement techniques, whose utility is compared using three quantitative measures. After the enhancement step, the images are subjected to region growing or gradient operations (masking) for segmentation. The segmented image is then analysed to estimate the regions of interest, and the results are compared against the radiologist's previously known diagnosis. The authors concluded that region growing segmentation gives fewer regions for analysis than gradient based techniques, without compromising quality.



Clustering

Image segmentation can be performed effectively by clustering image pixels. Cluster analysis allows the partitioning of data into meaningful subgroups and can be applied for image segmentation or classification purposes. Clustering analysis either requires the user to provide seeds for the regions to be segmented, or uses non-parametric methods to find the salient regions without the need for seed points. Clustering is commonly used in a range of applications such as image segmentation and unsupervised learning[25]. A number of issues related to clustering are worth studying, including how many clusters are best and how to determine the validity of clusters. In a number of segmentation techniques, such as fuzzy c-means clustering, the number of clusters present in the image has to be specified in advance. Several techniques that do not require such initialisation have been proposed in the literature (see [26] for a discussion of the limitations of traditional clustering techniques and a description of non-parametric clustering that does not need prior initialisation).
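As a concrete example of clustering-based segmentation, grey levels can be clustered with k-means (Lloyd's algorithm). The sketch below seeds the centres on intensity quantiles for determinism; this initialisation is an implementation choice, and the number of clusters k must still be given in advance, as noted above:

```python
import numpy as np

def kmeans_segment(image, k=2, iters=20):
    """Segment by clustering grey levels with k-means: assign each
    pixel to the nearest centre, then move each centre to the mean
    of its assigned pixels, and repeat."""
    pixels = image.astype(float).ravel()
    centres = np.quantile(pixels, np.linspace(0, 1, k))  # deterministic seeding
    for _ in range(iters):
        labels = np.argmin(np.abs(pixels[:, None] - centres[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean()
    return labels.reshape(image.shape), centres

img = np.array([[10, 12, 200], [11, 13, 205], [12, 10, 198]])
labels, centres = kmeans_segment(img, k=2)
print(np.allclose(np.sort(centres), [11.33, 201.0], atol=0.5))  # True
```

Fuzzy c-means differs mainly in replacing the hard nearest-centre assignment with graded memberships.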



Kurita[27] developed an efficient agglomerative clustering algorithm for region growing. This algorithm is a typical example of hierarchical clustering. It starts with an initial partition of a given image into N segments and sequentially reduces the number of segments by merging the best pair of segments, among all possible pairs, in terms of a given criterion. The merging process is repeated until the required number of segments is obtained. The algorithm uses sorted linked lists to maintain the neighbouring relations of segments and a heap structure to store the dissimilarities of all possible pairs of segments. A set of experiments was performed on a monochrome image and some range images of size 128x128 pixels. The number of regions turned out to be around 100 and the computation time was about 6 seconds. In the case of range images, each planar region was correctly classified into one segment but curved regions were classified into several patches. The author concluded that the algorithm provides a stepwise optimisation approach to region growing.
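The merging loop can be sketched as follows, with mean-intensity difference as a stand-in merging criterion. This toy version uses an exhaustive pair search; Kurita's algorithm instead maintains neighbour lists and a heap of pair dissimilarities for efficiency.

```python
def agglomerative_merge(segments, target):
    """Greedily merge the most similar pair of segments (by mean
    intensity) until only `target` segments remain."""
    segs = [list(s) for s in segments]
    while len(segs) > target:
        best = None
        # exhaustive search for the best pair under the criterion
        for i in range(len(segs)):
            for j in range(i + 1, len(segs)):
                d = abs(sum(segs[i]) / len(segs[i]) - sum(segs[j]) / len(segs[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        segs[i] = segs[i] + segs[j]   # merge j into i
        del segs[j]
    return segs
```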



Frigui and Krishnapuram[28] have addressed three major issues associated with conventional partitional clustering: sensitivity to initialisation, difficulty in determining the number of clusters, and sensitivity to noise and outliers. Their proposed method, Robust Competitive Agglomeration (RCA), starts with a large number of clusters to reduce the sensitivity to initialisation and determines the actual number of clusters by a process of competitive agglomeration. This method combines the advantages of hierarchical and partitional clustering techniques. To overcome the sensitivity to outliers, concepts from robust statistics are incorporated. Overlapping clusters are handled with the use of fuzzy membership. To handle doubtful regions, and to further reduce the sensitivity to initialisation, the algorithm uses finite rejection. The agglomerative property makes it relatively insensitive to initialisation and local minima effects. The algorithm was tested on noisy data sets with synthetic ellipsoidal clusters and linear clusters, and on a number of real range images. The results showed that RCA can provide robust estimates of the prototype parameters even when the clusters vary significantly in size and shape and the data are contaminated with noise.



Pauwels et al.[26] investigated non-parametric clustering for image segmentation. They propose a robust and versatile method that is able to handle unbalanced and highly irregular clusters using intermediate level processing. First, a density image is constructed by convolving the image data with a Gaussian density kernel. The nearest neighbours for each pixel are determined in a neighbourhood comprising one percent of the total number of pixels in the image. After convolution, candidate clusters are identified using gradient ascent: each point is linked to the point of highest density among its neighbours (possibly itself). This process results in an overestimation of the number of clusters, carving up the data set into a collection of relatively small clumps centred on local maxima. A hierarchical family of derived clusters is then constructed by using the data density to systematically merge neighbouring clumps. The order of merging is established by comparing the density values at neighbouring maxima with the density at the saddle-point (defined as the point of maximal density among the boundary points). The validity of a given cluster is then quantified by two mathematically defined indices: isolation and connectivity. Using this clustering algorithm, it becomes possible to extract image regions that are salient and semantically meaningful. The authors concluded that the algorithm produced a smooth version of the inverse equalisation algorithm without destroying any information in the image. The method can be used on greyscale images such as natural scenes, faces, and handwriting, and was analysed on a number of synthetic and real images. The segmented regions were found to be homogeneous and the objects were easily identified from the image background.
One advantage of the algorithm is that it automatically determines the number of objects in the image (the optimal number of clusters), so no a priori knowledge about the number of objects is needed.
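The mode-seeking step can be sketched in one dimension as follows: estimate a kernel density at each data point, link each point to the densest point within a fixed radius (possibly itself), and follow the links to a local maximum. This is a simplification of Pauwels et al.'s method; the function name, bandwidth, and neighbourhood rule are illustrative.

```python
import numpy as np

def density_modes(points, bandwidth=1.0):
    """Link each point to its highest-density neighbour (itself
    included); points that chain to the same local maximum form
    one candidate cluster."""
    pts = np.asarray(points, dtype=float)
    # Gaussian kernel density estimate at every data point
    d2 = (pts[:, None] - pts[None, :]) ** 2
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)
    # each point links to the densest point within the bandwidth
    parent = np.empty(len(pts), dtype=int)
    for i in range(len(pts)):
        neigh = np.where(np.abs(pts - pts[i]) <= bandwidth)[0]
        parent[i] = neigh[np.argmax(density[neigh])]
    # follow parent links to a fixed point (a local maximum)
    labels = parent.copy()
    while not np.array_equal(labels, parent[labels]):
        labels = parent[labels]
    return labels
```

As the text notes, the number of resulting modes is determined by the data (and the bandwidth), not specified in advance.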



Ohm and Ma[29] proposed a feature based cluster segmentation method for image sequences. The algorithm analyses specific features from the image sequence and checks their reliability and evidence locally in order to build segments that are probably part of one object. The segmentation procedure is essentially a clustering procedure that takes into account different features such as colour and motion, in an approach similar to vector quantisation. Different weights are applied to different features according to their reliability. Each pixel-feature vector is compared to a set of cluster-feature vectors and classified to a feature class to generate clusters. The set of cluster-feature vectors is updated for each image in the sequence and is also used for the segmentation of the next image. The labels associated with different clusters remain identical across different images of the same scene. After a scene change, the whole process is restarted with a completely new set of feature vectors computed from the first frame after the change, and hence the whole sequence is segmented. The algorithm was tested on a sequence of flower garden images, in all frames of which the segmentation and tracking were done automatically. The authors concluded that the algorithm is a hybrid combination of a block-recursive and a pixel-recursive technique and produces a dense vector field. The algorithm also has the capability to set flexible weights for different feature components automatically.



Ng[30] describes an extension to the conventional k-means algorithm that modifies the splitting rule in order to control the number of cluster members. By adding suitable constraints to the mathematical program formulation, the author developed an approach that allows the k-means paradigm to efficiently cluster data sets with a fixed number of elements in each cluster. The main objective of this algorithm is the minimisation of an objective function, usually taken as a function of the deviations of all patterns from their respective cluster centres. The algorithm minimises this objective function iteratively, starting from an arbitrary initial cluster membership. A distributed pruning technique is used to reduce the number of clusters in parallel. The results show that as the amount of pruning increases, the data between partitions become more skewed, i.e. the similarity between a sample and its membership in two or more clusters increases. The algorithm was applied to a Printed Circuit Board (PCB) insertion problem.



The comparison of various clustering techniques and the study of their behaviour is important. It is important that an optimal clustering technique is able to choose the correct number of clusters. Dubes and Jain[31] detail comparisons of three classes of clustering techniques. A total of eight programs are compared, covering minimised squared error clustering, hierarchical clustering and graph theoretic clustering. Important issues related to the use of clustering programs, such as user options, computational cost, inputs and outputs, and comparing programs that yield different outputs, are considered. They propose that different clustering algorithms can be compared on the basis of four factors: the manner in which the next clustering is chosen or updated; the criterion for creating new clusters; the criterion for deleting and merging clusters; and the initialisation procedure. Approaches that rank clustering programs by their performance on a standard data set with a performance criterion such as minimising squared error are criticised. One way of categorising such programs is to cluster their performances. A set of nine criteria is defined for comparing clustering programs based on the manner in which clusters are formed, the structure of the data, and the sensitivity of the clustering technique to changes in data that do not essentially alter its nature. The different clustering programs are compared on the basis of these criteria using handwritten character data, and a similarity matrix for these methods is shown.



The validity of clusters is important to study. Yarman-Vural and Ataman[32] critique several areas of clustering methodology including the definition of clusters, determination of the number of clusters, heuristic partitional clustering algorithms, and the effect of noise on determining accurate clusters. Cluster validity criteria, including the maximum likelihood information criterion and the sum of squared errors, are discussed; the first is found to be better when the number of clusters changes. The paper also proposes a method of improving conventional clustering algorithms so that they perform better on noise-contaminated data. The ISODATA algorithm is modified to show improved performance on an artificial Gaussian mixture data set with up to 20% noise contamination. Similarly good performance is achieved in image analysis applications. Dubes[33] investigates the utility of the Davies-Bouldin index for cluster validity and compares it with a proposed improved Hubert statistic. Results of a Monte Carlo study are reported, varying parameters such as dimensionality, number of patterns, sampling window, and the number and spread of clusters. The new index is found to be superior, showing more consistent performance when these parameters are varied.



Zahid et al.[34] proposed a new heuristic method for fuzzy cluster validity. Its principle is based on the evaluation of the fuzzy separation and fuzzy compactness of clusters. The main aim of the study is to analyse the natural structure present in the data set. This evaluation, which measures the quality of fuzzy clustering, assigns a real number to the outputs of the fuzzy clustering algorithm used. Two kinds of outputs are considered, and for each a combined fuzzy separation and fuzzy compactness criterion is evaluated. Maximising the resulting criterion yields a good clustering, and the maximum value identifies the right number of clusters. The first function calculates a fuzzy separation to fuzzy compactness ratio by considering the geometrical properties and membership degrees of the data; the second evaluates this ratio using only the properties of the membership degrees. The aim of the classification phase is to generate a well-defined fuzzy c-partition that is as close as possible to the natural structure of the data. Four numerical examples were used by the authors to illustrate the use of the proposed algorithm as a validity function. The results obtained indicate that the proposed method provides better answers than other cluster validity functions, especially with overlapping fuzzy clusters.



Probabilistic and Bayesian approaches

Haddon and Boyce[35] use a co-occurrence based approach to image segmentation, making use of region and boundary information in parallel for improved performance on a sequence of images. The authors examined image segmentation by unifying region and boundary information using co-occurrence matrices, which were used to generate the feature space. The analysis was performed in the context of an ensemble of images. An initial segmentation is made based on the location of the intensities of each pixel and its neighbours in the co-occurrence matrix. Each pixel is then associated with a tuple specifying whether it belongs to a given region or is a boundary pixel. This tentative segmentation is then refined by relaxation labelling, which ensures local consistency of pixel labelling during segmentation by minimising the entropy of local neighbourhoods (see [36,37] for details of relaxation labelling for 2D and 3D images). If a pixel does not belong to the boundary, it is assigned to one of the regions. This classification is entirely uni-dimensional in the co-occurrence direction and contains no explicit local consistency. Consistency for regions and boundaries was obtained by assuming that boundaries are no wider than one pixel. The algorithm was applied to synthetic images and to a series of infrared greyscale images of natural scenes taken by a low flying aircraft. The authors found that segmentation worked reasonably well for most areas of the images, with the majority of real edges and boundaries correctly located, and concluded that the technique is robust and performs well even in poor imaging conditions. The algorithm is less effective if the clusters in the co-occurrence space overlap substantially, due to the imposition of local consistency. Since the technique uses global information in a local context, it was possible to adapt it to varying image characteristics, i.e. variation in colour and texture. The difference between the co-occurrence matrices generated by a sequence of images is negligible if the image content and conditions do not change significantly.



Haddon and Boyce[38] provide a further extension of the above methodology in a later study. They suggest two intrinsically similar techniques for image segmentation and edge detection based on co-occurrence matrices. The first technique defines transforms that enhance the difference between typical and atypical image features. The second uses the location of region distributions in an edge co-occurrence matrix to define the location of the corresponding boundary distributions. The techniques are analysed by labelling the matrices. The labelled matrix can be used to segment the regions of an image and to simultaneously detect prominent regions, so image segmentation and edge detection are done at the same time. The operator used to generate the edge co-occurrence matrix effectively generates a lookup position within the labelled matrix: the operator is convolved with the neighbourhood of a pixel, which is then replaced by the label of the co-occurrence matrix at the lookup position. This process is repeated with the same operator for a variety of operator orientations. Correlation techniques and conjugate iteration schemes are used to determine the location and extent of the distributions in the co-occurrence matrix. The algorithm is applied to FLIR imagery and multispectral data, and the majority of its components can be performed in parallel. The results are significantly improved compared with the previous algorithm, with segmentation and edge detection performed simultaneously.



Haddon and Boyce[39] used the co-occurrence method to segment a sequence of FLIR images into its major regional components. These regions are subjected to texture classification. The co-occurrence matrices are temporally smoothed to ensure consistency of segmentation across the sequence. The edge co-occurrence matrices are decomposed using discrete Hermite functions because their underlying structure is Gaussian, due to the Gaussian noise present in the images; once this structure is removed, what remains is the texture structure of the image. Neural networks (MLPs) are then used to classify the images using the coefficients of the Hermite functions. A co-occurrence matrix is a multidimensional histogram in which each element (i, j) is the frequency with which two events i and j co-occur in a specific relationship to each other. Edge co-occurrence matrices emphasize the intensity difference between two pixels; the authors used the Canny edge operator to form the matrices. A sequence of 300 FLIR images from a low flying aircraft approaching a bridge was obtained and classified. These images are first segmented into their major regions using temporally smoothed, averaged co-occurrence matrices. A co-occurrence matrix is then formed for every major region, and the matrices are decomposed using orthogonal Hermite functions. The higher order functions describe the texture of the region and form an input vector of 121 features. Every third data sample was taken to form the validation set; training and validation sets are disjoint. MLP networks were trained until the validation error started to increase, to avoid over-training. Additionally, principal components analysis was used to extract the most descriptive of those 121 features.
Very good results were obtained with both a single hidden layer network (architecture 25x10x5), with 96.9% training and 98.3% validation accuracy, and a two hidden layer network (architecture 25x5x8x5), with 94.8% training and 88.7% validation accuracy. The major regions classified were grass, trees, sky, river reflecting trees and river reflecting sky; the main problems arose with the reflecting classes. PCA results were not as good as those using the full 121 features. The techniques described in this paper appear to be robust to noise. Overall, 93.4% of the image area was correctly classified.
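A plain grey-level co-occurrence matrix of the kind underlying these studies can be computed as in this sketch, for a single pixel offset; the edge co-occurrence variants used by the authors differ in detail, and the function name and parameters here are our own.

```python
import numpy as np

def cooccurrence(img, di=0, dj=1, levels=4):
    """Count how often grey levels i and j co-occur at the given
    pixel offset; element (i, j) of the result is that count."""
    img = np.asarray(img)
    M = np.zeros((levels, levels), dtype=int)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + di, c + dj
            if 0 <= r2 < rows and 0 <= c2 < cols:
                M[img[r, c], img[r2, c2]] += 1
    return M
```

In practice a library routine (e.g. scikit-image's `graycomatrix`) would normally be used instead of this loop.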



Haddon et al.[40] developed an algorithm for automatic segmentation and classification of images using a co-occurrence based approach with Hermite functions. The basic approach consists of four steps: segmentation, texture analysis, initial classification and labelling, and spatio-temporal classification. In the segmentation phase, the key regions are separated from each other and from the remainder of the scene using co-occurrence matrices and edge detection simultaneously. Texture classification of the segmented regions is performed using discrete Hermite functions: the co-occurrence matrices are decomposed to calculate features of the segmented regions. The result of texture analysis is a low order feature vector that describes the texture of a segmented region. The features that carry discriminative information between classes are selected and used as inputs to a multi-layer neural network classifier. Local consistency of interpretation is achieved both spatially within an image and temporally within a sequence of images. A relaxation labelling approach is used to ensure consistent labelling of pixels; the compatibility coefficients take into account the current image and its predecessor in the sequence, and the results of relaxation labelling are applied to the previous image for classification. The authors evaluated the results on a sequence of FLIR (Forward Looking InfraRed) images taken from a low flying aircraft and also performed snow profile analysis. They found that the algorithm is generally applicable to a wide range of imaging applications.



Haddon and Boyce[41] presented a method of ensuring consistency of interpretation in the classification of segmented images, both spatially within an image and temporally through a sequence of FLIR images taken from a low-flying aircraft. It is important to maintain consistency over time in video analysis: inconsistencies in segmentation between frames of a sequence are unacceptable in autonomous vehicle navigation applications. The image is first segmented using a co-occurrence based technique, after which the regions are classified based on texture. Co-occurrence matrices are also used to describe the texture of each region, and are decomposed using discrete Hermite functions. Features selected by linear discriminant analysis and principal components analysis were fed into an MLP neural network for classification. In enforcing spatio-temporal consistency, the output values of the neural network are taken into consideration: two or more roughly equal outputs indicate that the corresponding classes are considered equally likely. Consistency is enforced only when there is insufficient evidence for a region class. It is enforced using relaxation labelling; however, instead of applying it in the classical way (looking at the image pixel by pixel), the authors treat a region as a single entity with both spatial and temporal neighbours. Adjacent regions, both spatially and temporally, are considered as neighbours, and these neighbours influence the probability that a region belongs to a particular class when the result of the neural network is insufficient. These techniques were applied to a 12 second, 300 frame sequence of infrared images with five classes: trees, grass, sky, river reflecting trees and river reflecting sky. The authors report very good results after the initial classification of first-frame regions; regions with equal membership in all classes were left white. After consistency enforcement, all regions are classified and only a few misclassification errors are found.



Comer and Delp[42] proposed a new algorithm for the segmentation of textured images using a multiresolution Bayesian approach. The algorithm uses a Multiresolution Gaussian Auto-Regressive (MGAR) model for the pyramid representation of the observed image, and effectively uses larger neighbourhoods to perform the segmentation than single resolution algorithms. The observed data, a multiresolution representation of the observed image, is obtained using Gaussian pyramid decomposition. The nodes in a binary tree then index the data and an MGAR model is used: the value of a random variable is predicted as a linear combination of the values of the random variables at previous nodes. A multiscale Markov Random Field model is used for the class label pyramid. The optimisation criterion used for the segmentation minimises the expected number of misclassified nodes in the multiresolution lattice. The Expectation Maximisation (EM) algorithm is used for parameter estimation. The method also assigns a cost to an incorrect segmentation based on the number of incorrectly classified pixels. The algorithm was applied to two different images, with three pyramid levels considered, i.e. the original image was examined at three different resolutions. The algorithm performed well on both images, segmenting them into regions of homogeneous texture, and segmented the images better than the EM and Multiresolution Posterior Marginal techniques.
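The Gaussian pyramid decomposition underlying such multiresolution representations can be sketched with a separable binomial kernel (a common approximation to a Gaussian; this is illustrative, not the authors' exact filter):

```python
import numpy as np

def gaussian_pyramid(img, n_levels=3):
    """Build a pyramid by repeatedly smoothing with a small binomial
    (Gaussian-like) kernel and subsampling by two."""
    levels = [np.asarray(img, dtype=float)]
    kernel = np.array([0.25, 0.5, 0.25])
    for _ in range(n_levels - 1):
        top = levels[-1]
        # separable smoothing: rows first, then columns
        sm = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode='same'), 1, top)
        sm = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode='same'), 0, sm)
        levels.append(sm[::2, ::2])   # subsample by two in each direction
    return levels
```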



Neural network segmentation

Campbell et al.[43] proposed an automatic segmentation and classification method for outdoor images using neural networks. First, the images are segmented using Self-Organising Feature Maps (SOFM) based on the texture and colour information of the objects; SOFMs of 64x64 nodes gave the best segmentation. A set of 28 features is then extracted from each region, including average colour, position, size, rotation, texture (Gabor filters) and shape (using principal components). Classification is then performed using a Multi Layer Perceptron with 28 input nodes and 11 output nodes. Training is performed on 7000 regions and testing on an independent set of 3000 samples. Over 80% of regions were classified correctly using Learning Vector Quantisation and 91.9% using the Multi Layer Perceptron.



Papamarkos et al.[44] have developed an image segmentation procedure using self-organising maps. The use of these neural network paradigms is considered equivalent to multithresholding, where the output of the network defines a number of homogeneous clusters. One interesting note from this paper is the technique used for finding the optimal number of thresholds, or in other words the number of segmented regions. Assuming that the grey level distributions within each region are Gaussian, it is suggested that the linear combination of the probability density functions of the individual regions should match the overall density of the global histogram of the original image. For different numbers of segmented regions, one can then find which result minimises the error.
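For comparison, the classical single-threshold analogue of such criterion-driven thresholding is Otsu's method, which picks the grey level maximising the between-class variance of the histogram. The sketch below illustrates that criterion; it is not the SOM-based method of the paper.

```python
def otsu_threshold(hist):
    """Pick the grey level that maximises between-class variance
    (equivalently, minimises within-class variance) for a
    two-region split of the histogram."""
    total = sum(hist)
    total_mean = sum(g * h for g, h in enumerate(hist)) / total
    best_t, best_between = 0, -1.0
    w0 = cum = 0.0
    for t in range(len(hist) - 1):
        w0 += hist[t]          # mass of class 0 (levels <= t)
        cum += t * hist[t]     # first moment of class 0
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = cum / w0, (total_mean * total - cum) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

The Papamarkos et al. criterion generalises this idea: instead of variance, the error between the histogram and a mixture of per-region Gaussians is minimised over the number of regions.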



Other approaches

Medioni and Yasumoto[45] proposed a fractal dimension method for image segmentation. The authors computed the fractal dimension of textures in the frequency domain by approximating the log power spectrum by a straight line. The difference of the logarithms of the expected values of the two dipole statistics yields a single texture feature. This procedure approximates the fractal dimension, which relates the expected statistics of the grey-level difference to the statistics of the distance vector. The main drawback of this method is that segmentation based on a single feature extracted from images is not always possible.
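The slope-fitting step can be sketched as follows. For a 2-D fractal Brownian surface the power spectrum falls off as a power law f^(-beta), and the fractal dimension relates to the spectral slope roughly as D = (8 - beta)/2; the helper below is an illustration that works on a precomputed 1-D radial spectrum, with names of our own choosing.

```python
import numpy as np

def spectral_slope(power, freqs):
    """Fit a straight line to the log power spectrum; for fractal
    (1/f-like) textures the slope relates to the fractal dimension."""
    logf, logp = np.log(freqs), np.log(power)
    slope, _intercept = np.polyfit(logf, logp, 1)
    return slope
```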



Pentland[46] addressed the problem of fractal based description of natural scenes, considering two problems: (i) representing natural shapes such as mountains, trees, and clouds, and (ii) computing such descriptions from image data. It has been observed that many natural objects, such as leaves and snowflakes, have a fractal nature, which makes it impossible to measure them exactly in an image. For example, when measuring the length of a coastline, whatever the size of the measuring tool selected, all curves smaller than the tool will be missed. Standard notions of length and area do not produce consistent measurements for many natural shapes: the basic metric properties of these shapes vary as a function of the fractal dimension. Fractal dimension, therefore, is a necessary part of any consistent description of such shapes, and fractals have to be used in their measurement. The characterisation of image texture by means of a fractal surface model makes it possible to describe images in a manner that is stable over transformations of scale and linear transforms of intensity. The authors demonstrate how natural images can be segmented using this method, and suggest that such segmentation is more stable under scaling of the images than thresholding techniques on image intensity. Additionally, a comparison with correlation and co-occurrence techniques has been made using a mosaic of 8 natural textures constructed by Laws[47]; on these textures, the authors demonstrate that the fractal methodology yields better results than known segmentation algorithms.



Xu et al.[48] present a segmentation algorithm based on partitioning the image into arbitrarily shaped connected regions so as to minimise the sum of grey level variations over all partitioned regions, under the constraints that each partitioned region has a specified number of pixels and that two adjacent regions have significantly different average grey levels. A minimum spanning tree is used to construct these partitions, so the segmentation problem reduces to a tree partitioning problem. The effect of various levels of noise contamination on the segmentation results is evaluated on the basis of how the noise affects the tree representation and tree partitioning.
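Tree partitioning of this kind can be sketched with a Kruskal-style procedure: adding edges in order of increasing grey-level difference until k components remain is equivalent to cutting the k-1 heaviest edges of the minimum spanning tree. This is an illustrative simplification that ignores the authors' region-size constraints.

```python
def mst_partition(values, edges, k):
    """Kruskal-style grouping: union edges in order of increasing
    grey-level difference until only k components remain --
    equivalent to cutting the k-1 heaviest MST edges."""
    parent = list(range(len(values)))

    def find(x):
        # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_comp = len(values)
    for a, b in sorted(edges, key=lambda e: abs(values[e[0]] - values[e[1]])):
        ra, rb = find(a), find(b)
        if ra != rb and n_comp > k:
            parent[ra] = rb
            n_comp -= 1
    return [find(i) for i in range(len(values))]
```

Here `values` are pixel grey levels and `edges` the adjacency of a pixel lattice; both names are ours.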



In some studies, topological maps have been used as a structural method for image segmentation. Brequelaire and Brun[49] proposed a novel technique for image segmentation using topological maps and inter-pixel representation. The authors describe a data structure, called a model of discrete maps, that allows the system to store and process regions of any size or form and helps in the efficient computation of the parameters required by the segmentation algorithm. There are two aspects to a segmented image representation: the geometrical aspect, which describes the shape of regions, and the topological aspect, which describes the neighbourhoods and inclusions of regions. A topological map is defined as a partition of an oriented surface into a finite set of points. A discrete map is based on both geometric and topological levels of representation co-operating together. The discrete map data structure is then used to design segmentation algorithms: the authors describe a recursive split and merge algorithm in which segmentation is performed in a first step and then refined recursively in a second step to improve its quality. Results are provided for the Lena test image, with images shown for various iterations.



Ojala and Pietikäinen[50] presented an unsupervised texture segmentation method using feature distributions. The proposed algorithm uses distributions of local binary patterns and pattern contrasts for measuring the similarity of adjacent image regions during the segmentation process. Texture information is measured with a method based on local binary patterns and contrast (LBP/C). The segmentation method consists of three phases: hierarchical splitting, agglomerative merging and pixel-wise classification. First, hierarchical splitting is used to divide the image into regions of roughly uniform texture. Then, agglomerative merging is performed to merge similar adjacent regions until a stopping criterion is met. A pixel-wise classification is then performed to improve the localisation. The authors use a non-parametric likelihood test as a pseudo-metric for comparing feature distributions. The authors tested the method on four texture mosaics and two greyscale images of natural scenes. The method performed very well in these experiments. The authors concluded that the method is not sensitive to the selection of parameter values and does not require any a priori knowledge about the number of textures or regions in the image. The method can also be easily generalised to utilise other texture features, multiscale information or colour features.
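The basic 8-neighbour LBP code (without the contrast measure) can be computed as in this sketch; note that bit-ordering conventions vary between implementations, and the one below is arbitrary.

```python
def lbp_code(patch):
    """8-neighbour local binary pattern of the centre pixel of a
    3x3 patch: each neighbour >= centre contributes one bit."""
    centre = patch[1][1]
    # clockwise from the top-left neighbour
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    return sum((1 << i) for i, n in enumerate(neighbours) if n >= centre)
```

A histogram of such codes over a region gives the feature distribution compared during the merging phase.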



Yoshimura and Oe[51] proposed a segmentation algorithm for texture images using genetic algorithms that automatically determines the optimum number of segmentation areas. The authors divide the original image into many small rectangular regions and extract texture features from the data using a two dimensional autoregressive model, together with other features such as fractal dimension, mean, and variance. Three types of evolutionary segmentation method are used: the first uses genetic algorithms (GAs); the second uses GAs and Self Organising Maps (SOM); the third uses GAs and SOM while considering the optimum number of segmentation areas in an image. Various experiments were performed on a set of simulated images with 3 or 4 different real textures. The authors found that the third method performed best and that the algorithms were visually accurate compared with other conventional techniques. The first method produced good results but requires the number of regions to be specified beforehand; the second selects the number of regions automatically; and the third finds the optimum number of regions with homogeneous texture. The authors concluded that the methods are effective for the segmentation of images that contain similar texture fields.



Perner[52] presents a methodology for a case based reasoning image segmentation system. Segmentation is performed by looking up a case base for similar cases and using the segmentation parameters associated with the matched case. Images are first matched for similarity on the basis of features such as image moments. Similarity is determined using a measure proposed by Tversky[53] for non-image information and a weighted similarity measure for image data. If ideal segmentation parameters are known for reference images, then by matching new images to these reference images the same segmentation parameters can be reused. The system is shown to perform well on brain CT segmentation.


Segmentation evaluation

Kitchen and Rosenfeld[54] discuss the various issues related to under- and over-segmentation of images. The problems are attributed to the fact that context information is not used in segmentation algorithms. For example, a difference in brightness that is not significant in some contexts (such as a fluctuation in a textured background) may well be significant in others (such as part of a low-contrast border between an object and its surroundings). Under-segmentation is considered a much more serious problem, as it is easier to recover true segments by merging after over-segmentation than to split a heterogeneous region. In some situations, segmentation evaluation is based on how well the segmented image corresponds to the ground-truth segmentation in terms of pixel-by-pixel differences. In most scene analysis applications these differences need to be minimised, and hence the segmentation criteria are harsh, penalising algorithms severely if they do not segment accurately. However, in some applications, such as medical imaging, it may be enough that the segmented region overlaps with the true region (e.g. if the segmented region of interest overlaps with a tumour, the segmentation can be classed as successful). Hence segmentation can be considered acceptable even if it only partially detects the true borders or region. For measures of how to evaluate segmentation in such cases, refer to [55].
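The contrast between the strict pixel-by-pixel criterion and the lenient overlap criterion can be made concrete. A sketch with hypothetical helper names (label maps are lists of lists of integer labels):

```python
def pixel_error_rate(seg, truth):
    """Strict criterion: fraction of pixels where segmentation and
    ground truth disagree."""
    rows, cols = len(truth), len(truth[0])
    wrong = sum(1 for r in range(rows) for c in range(cols)
                if seg[r][c] != truth[r][c])
    return wrong / (rows * cols)

def overlaps(seg, truth, label=1):
    """Lenient criterion: does the segmented region of interest overlap
    the true region at all (e.g. a detected lesion touching a tumour)?"""
    return any(seg[r][c] == label and truth[r][c] == label
               for r in range(len(truth)) for c in range(len(truth[0])))

truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
seg = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 0]]
# A shifted detection: poor pixel-wise score, yet it does hit the target.
print(pixel_error_rate(seg, truth), overlaps(seg, truth))
```

The shifted segmentation misclassifies 4 of 9 pixels, so a strict criterion penalises it heavily, while the overlap test still counts it as a successful detection.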



The evaluation of image segmentation programs is an important field of study. Evaluation can be categorised as supervised or unsupervised, depending on whether a priori information is available. A number of different approaches have been adopted. Some qualitative guidelines were developed by Haralick and Shapiro[56]. Quantitative techniques of evaluation have been used by several authors. Weszka and Rosenfeld[57] used a busyness measure and classification error as performance criteria. Levine and Nazif[58] defined a set of parameters for unsupervised segmentation including region uniformity, region contrast, line contrast and line connectivity. Sahoo et al.[59] used the uniformity criterion of Levine and Nazif together with a shape measure computed from the gradient values and the selected threshold value. In supervised segmentation, errors are measured against a reference: the differences between the reference and the segmentation output determine how well the algorithm performs. The simplest measure for supervised segmentation is the probability of error, as shown in Yang et al.[60]; this criterion can be used for finding optimal threshold values, and the error can be decomposed into an under-merging and an over-merging error[58]. This error measure may not, however, give full information on the quality of segmentation. Yasnoff et al.[61] have in addition proposed the pixel distance error, in which the positions of misclassified pixels are taken into account; two error measures, the percentage area misclassified and the pixel distance error, are used.
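The two supervised measures above can be sketched in a few lines. This is an illustrative form, not Yasnoff et al.'s exact definition: for the pixel distance error it measures, for each misclassified pixel, the distance to the nearest ground-truth pixel of the label it was given (assuming every assigned label occurs somewhere in the ground truth).

```python
import math

def segmentation_errors(seg, truth):
    """Return (percentage area misclassified, mean pixel distance error).

    The distance term captures *where* mistakes occur: a misclassified
    pixel far from any true pixel of its assigned class is penalised
    more than one just outside the correct border.
    """
    rows, cols = len(truth), len(truth[0])
    misclassified = [(r, c) for r in range(rows) for c in range(cols)
                     if seg[r][c] != truth[r][c]]
    percent_wrong = 100.0 * len(misclassified) / (rows * cols)
    dist_total = 0.0
    for (r, c) in misclassified:
        label = seg[r][c]
        # Distance to the nearest ground-truth pixel of the assigned label.
        dist_total += min(math.hypot(r - r2, c - c2)
                          for r2 in range(rows) for c2 in range(cols)
                          if truth[r2][c2] == label)
    mean_dist = dist_total / len(misclassified) if misclassified else 0.0
    return percent_wrong, mean_dist

# One pixel of the background is mislabelled as object, one pixel away
# from the true object border.
truth = [[0, 0, 1, 1]]
seg = [[0, 1, 1, 1]]
print(segmentation_errors(seg, truth))
```

Two segmentations with the same misclassification percentage can thus receive very different distance scores, which is precisely the extra information the positional measure adds.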



Another approach to measuring the quality of segmentation is based on the differences between feature values computed from an ideally segmented image and from an actual segmented image. These can be any object features, e.g. shape features such as eccentricity, or texture features. This approach has been detailed by Zhang[62] and Zhang and Gerbrands[63]. Yang et al.[60] use shape features including area, circularity, orientation and elongation of segmented objects for the comparison of three segmentation methods; the under-merging, over-merging and ultimate measure of accuracy criteria are evaluated for comparing the algorithms considered. Wang et al.[64] compare several approaches to texture image segmentation: k-means, fuzzy k-means, fuzzy Adaptive Resonance Theory (ART2) and fuzzy Kohonen Self-Organising Feature Mapping (SOFM). In their tests, five features are used: energy, entropy, correlation, inertia, and homogeneity. After segmenting the images, these features are extracted using a modified Spatial Grey Level Dependence Method developed by Wang et al.[65], chosen because it is about three times faster than the original SGLDM of Haralick. The co-occurrence features are calculated in four directions, at 0, 45, 90 and 135 degrees with respect to the horizontal axis. These features are selected because of: (i) strong within-class variance; (ii) strong between-class separation; and (iii) low sensitivity to small sample size. Experiments were performed on two images with different textures, of size 150x360 pixels and 100x100 pixels. The experiments showed some limitations of the k-means algorithm, such as its slow processing and instability. Clustering errors were reduced by SOFM; a further advantage of the SOFM algorithm is that the size of the neighbourhood is adjusted automatically owing to its self-organising nature. The numerical results showed that the fuzzy approaches are better than conventional k-means algorithms and that SOFM provides the best approach for texture image segmentation.
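The co-occurrence features named above come from a grey-level co-occurrence matrix. A minimal sketch (standard definitions, not the faster variant of Wang et al.[65]) computes the normalised matrix for one direction and four of the five features; correlation is omitted for brevity:

```python
import math
from collections import Counter

def cooccurrence(image, dr, dc):
    """Normalised grey-level co-occurrence matrix for pixel offset (dr, dc).

    The 0-degree direction corresponds to (0, 1); 45, 90 and 135 degrees
    map to (-1, 1), (-1, 0) and (-1, -1) respectively.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def glcm_features(p):
    """Energy, entropy, inertia (contrast) and homogeneity from a GLCM."""
    energy = sum(v * v for v in p.values())
    entropy = -sum(v * math.log(v) for v in p.values() if v > 0)
    inertia = sum((i - j) ** 2 * v for (i, j), v in p.items())
    homogeneity = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())
    return energy, entropy, inertia, homogeneity

img = [[0, 0, 1, 1],
       [0, 0, 1, 1]]
p = cooccurrence(img, 0, 1)  # horizontal (0-degree) direction
print(glcm_features(p))
```

A full feature vector would concatenate these values over the four directions, exactly as in the comparison study above.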



Hoover et al.[66,67] proposed a methodology for evaluating range image segmentation techniques and created a framework for their comparison. Experiments were performed on data acquired with a laser range finder or by a triangulation technique. Images are segmented using whichever method is to be evaluated. The authors developed a tool with which a human operator creates a ground-truth segmentation, assigning three different labels to regions: shadow pixels, noise pixels and cross-edge pixels. The machine segmentation is then compared to the ground truth. Every region in the ground-truth and machine-segmented images is mapped to one of five possibilities: correct detection, over-segmentation, under-segmentation, missed and noise. The first three categories apply to mappings of regions between the ground-truth and machine-segmented images; a missed classification applies to a region in the ground-truth image, and a noise classification applies to a region in the machine-segmented image.
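The region mappings in this framework are decided by an overlap tolerance. A simplified sketch of the correct-detection test (regions as sets of pixel coordinates; the tolerance value and the symmetric formulation here are illustrative, not the paper's full definition):

```python
def is_correct_detection(gt_region, ms_region, tolerance=0.8):
    """Hoover-style test: a ground-truth region and a machine-segmented
    region count as a correct detection when their overlap covers at
    least `tolerance` of the pixels of *each* region.

    Regions are sets of (row, col) tuples.
    """
    overlap = len(gt_region & ms_region)
    return (overlap >= tolerance * len(gt_region)
            and overlap >= tolerance * len(ms_region))

gt = {(0, 0), (0, 1), (1, 0), (1, 1)}
ms = {(0, 0), (0, 1), (1, 0)}          # misses one pixel of the region
print(is_correct_detection(gt, ms))     # fails at tolerance 0.8
print(is_correct_detection(gt, ms, tolerance=0.7))
```

Over- and under-segmentation arise when the tolerance is met only by a set of machine regions against one ground-truth region, or vice versa; regions matching nothing at the tolerance become missed or noise.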


Comparative studies and reviews

Haralick and Shapiro[56] state that: "As there is no theory of clustering, there is no theory of image segmentation" (p. 100). They categorise segmentation algorithms as: measurement space guided spatial clustering, single linkage region growing schemes, hybrid linkage region growing schemes, centroid linkage region growing schemes, spatial clustering schemes, and split and merge schemes. The paper details the basics of these schemes with examples.



Similarly, Fu and Mui[68] state that: "Almost all image segmentation techniques proposed so far are ad hoc in nature. There are no general algorithms that will work for all images" (p. 4). Their survey categorises image segmentation studies as follows: feature thresholding or clustering, edge detection techniques, and region extraction techniques. The authors recommend that one of the fruitful areas of further investigation lies in combining spatial and semantic information with edge detection and thresholding or clustering techniques to perform image segmentation. A number of review studies on image segmentation are available, including [56,68,69,70,71,72].



References



1. T. Pavlidis, Algorithms for graphics and image processing, Springer, Berlin, 1982.

2. R.B. Ohlander, Analysis of natural scenes, PhD Thesis, Carnegie Institute of Technology, Dept. of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 1975.

3. M. Cheriet, J. N. Said and C. Y. Suen, A recursive thresholding technique for image segmentation, IEEE Transactions on Image Processing, vol. 7, no. 6, pp. 918-920, 1998.

4. N. Otsu, A threshold selection method from grey level histograms, IEEE Transactions on Systems, Man and Cybernetics, vol. 8, pp. 62-66, 1978.

5. B. Bhanu and O.D. Faugeras, Segmentation of images having unimodal distributions, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 4, no. 4, pp. 408-419, 1982.

6. A. Rosenfeld, Scene labelling by relaxation operations, IEEE Transactions on Systems, Man and Cybernetics, vol. 6, no. 6, pp. 420-433, 1976.

7. B. Bhanu and B.A. Parvin, Segmentation of natural scenes, Pattern Recognition, vol. 20, no. 5, pp. 487-496, 1987.

8. L. Li, J. Gong and W. Chen, Gray-level image thresholding based on Fisher linear projection of two-dimensional histogram, Pattern Recognition, vol. 30, no. 5, pp. 743-749, 1997.

9. N. Ahuja, A. Rosenfeld and R.M. Haralick, Neighbour gray levels as features in pixel classification, Pattern Recognition, vol. 12, pp. 251-260, 1980.

10. J.M. Prager, Extracting and labeling boundary segments in natural scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 1, pp. 16-27, 1980.

11. W.A. Perkins, Area segmentation of images using edge points, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 1, pp. 8-15, 1980.

12. F.H.Y. Chan, F. K. Lam and H. Zhu, Adaptive thresholding by variational method, IEEE Transactions on Image Processing, vol. 2, no. 3, pp. 168-174, 1998.

13. K. Cho and P. Meer, Image segmentation from consensus information, Computer Vision and Image Understanding, vol. 68, no. 1, pp. 72-89, 1997.

14. M. Yeung, B.L. Yeo and B. Liu, Segmentation of video by clustering and graph analysis, Computer Vision and Image Processing, vol. 71, no. 1, pp. 94-109, 1998.

15. Y.L. Chang and X. Li, Adaptive image region growing, IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 868-873, 1994.

16. R. Adams and L. Bischof, Seeded region growing, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 6, 1994.

17. A. Mehnert and P. Jackway, An improved seeded region growing algorithm, Pattern Recognition Letters, vol. 18, pp. 1065-1071, 1997.

18. S. Basu, Image segmentation by semantic method, Pattern Recognition, pp. 497-511, 1987.

19. J.M. Beaulieu and M. Goldberg, Hierarchy in picture segmentation: a stepwise optimisation approach, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 2, pp. 150-163, 1989.

20. J.P. Gambotto, A new approach to combining region growing and edge detection, Pattern Recognition Letters, vol. 14, pp. 869-875, 1993.

21. S.A. Hojjatoleslami and J. Kittler, Region growing: a new approach, CVSSP Technical Report TR-6/95, University of Surrey, Department of Electronic and Electrical Engineering, 1995.

22. S.W. Lu and H. Xu, Textured image segmentation using autoregressive model and artificial neural network, Pattern Recognition, vol. 28, no. 12, pp. 1807-1817, 1995.

23. H. He and Y.Q. Chen, Unsupervised texture segmentation using resonance algorithm for natural scenes, Pattern Recognition Letters, vol. 21, pp. 741-757, 2000.

24. S. Singh, R. Al-Mansoori, Identification of regions of interest in digital mammograms, Journal of Intelligent Systems, vol. 10, no. 2, pp. 183-217, 2000.

25. A.K. Jain and R.C. Dubes, Algorithms for clustering data, Prentice Hall, Englewood Cliffs, N.J., 1988.

26. J. Pauwels and G. Frederix, Finding salient regions in images, Computer Vision and Image Understanding, vol. 75, pp. 73-85, 1999.

27. T. Kurita, An efficient agglomerative clustering algorithm using a heap, Pattern Recognition, vol. 24, no. 3, pp. 205-209, 1991.

28. H. Frigui and R. Krishnapuram, A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 450-465, 1999.

29. J.R. Ohm and P. Ma, Feature-based cluster segmentation of image sequences, Proc. IEEE International Conference on Image Processing, pp. 178-181, 1997.

30. M.K. Ng, A note on constrained k-means algorithm, Pattern Recognition, vol. 33, pp. 515-519, 2000.

31. R.C. Dubes and A.K. Jain, Clustering techniques: the user's dilemma, Pattern Recognition, vol. 8, pp. 247-260, 1976.

32. F. Yarman-Vural and E. Ataman, Noise, histogram and cluster validity for Gaussian-mixtured data, Pattern Recognition, vol. 20, no. 4, pp. 385-401, 1987.

33. R.C. Dubes, How many clusters are best?-an experiment, Pattern Recognition, vol. 20, no. 6, pp. 645-663, 1987.

34. N. Zahid, M. Limouri and A. Essaid, A new cluster validity for fuzzy clustering, Pattern Recognition, vol. 32, pp. 1089-1097, 1999.

35. J.F. Haddon and J.F. Boyce, Image segmentation by unifying region and boundary information, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 10, pp. 929-948, 1990.

36. M.W. Hansen and W.E. Higgins, Relaxation methods for supervised image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 9, pp. 949-961, 1997.

37. A.R.R. Hummel and S. Zucker, Scene labelling by relaxation operations, IEEE Transactions on Systems, Man and Cybernetics, vol. 6, pp. 420-433, 1976.

38. J.F. Haddon and J.F. Boyce, Co-occurrence matrices for image analysis, Electronics and Communication Engineering Journal, pp. 71-83, 1993.

39. J.F. Haddon and J.F. Boyce, Texture classification of segmented regions of FLIR images using neural networks, Proc. 1st International Conference on Image Processing, 1994.

40. J.F. Haddon, M. Schneebeli and O. Buser, Automatic segmentation and classification using a co-occurrence based approach, Proceedings of Digital Image Technologies II, Techniques and Civil Engineering Applications, 1997.

41. J.F. Haddon and J.F. Boyce, Integrating spatio-temporal information in image sequence analysis for the enforcement of consistency of interpretation, Digital Signal Processing, special issue on image analysis and information fusion, 1998.

42. M.L. Comer and E.J. Delp, Segmentation of textured images using a multi-resolution Gaussian autoregressive model, IEEE Transactions on Image Processing, vol. 8, no. 3, pp. 408-420, 1999.

43. N.W. Campbell, B.T. Thomas and T. Troscianko, Automatic segmentation and classification of outdoor images using neural networks, International Journal of Neural Systems, vol. 8, no. 1, pp. 137-144, 1997a.

44. N. Papamarkos, C. Strouthopoulos and I. Andreadis, Multithresholding of colour and gray-level images through a neural network technique, Image and Vision Computing, vol. 18, pp. 213-222, 2000.

45. G.G. Medioni and Y. Yasumoto, A note on using fractal dimension for segmentation, Proc. IEEE Computer Vision Workshop, pp. 25-30, 1984.

46. A. Pentland, Fractal-based description of natural scenes, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 661-674, 1984.

47. K.I. Laws, Textured image segmentation, USCIPI Rep. 940, Image Processing Institute, University of Southern California, Los Angeles, 1980a.

48. Y. Xu, V. Olman and E.C. Uberbacher, A segmentation algorithm for noisy images: design and evaluation, Pattern Recognition Letters, vol. 19, pp. 1213-1224, 1998.

49. J.P. Braquelaire and L. Brun, Image segmentation with topological maps and inter-pixel representation, Journal of Visual Communication and Image Representation, vol. 9, no. 1, pp. 62-79, 1998.

50. T. Ojala and M. Pietikäinen, Unsupervised texture segmentation using feature distributions, Pattern Recognition, vol. 32, pp. 477-486, 1999.

51. M. Yoshimura and S. Oe, Evolutionary segmentation of texture image using genetic algorithms towards automatic decision of optimum number of segmentation areas, Pattern Recognition, vol. 32, pp. 2041-2054, 1999.

52. P. Perner, An architecture for a CBR image segmentation system, Engineering Applications of Artificial Intelligence, vol. 12, pp. 749-759, 1999.

53. A. Tversky, Features of similarity, Psychological Review, vol. 84, no. 4, pp. 327-350, 1977.

54. L. Kitchen and A. Rosenfeld, Scene analysis using region-based constraint filtering, Pattern Recognition, vol. 17, no. 2, pp. 189-203, 1984.

55. M. Kallergi, G.M. Carney and J. Gaviria, Evaluating the performance of detection algorithms in digital mammography, Medical Physics, vol. 26, no.2, pp. 267-275, 1999.

56. R.M. Haralick and L.G. Shapiro, Survey- image segmentation techniques, Computer Vision Graphics and Image Processing, vol. 29, pp. 100-132, 1985.

57. J.S. Weszka and A. Rosenfeld, Threshold evaluation techniques, IEEE Transactions on Systems, Man and Cybernetics, vol. 8, pp. 622-629, 1978.

58. M.D. Levine and A.M. Nazif, An experimental rule based system for testing low level segmentation techniques, Multicomputers and Image Processing Algorithms and Programs, Academic Press, pp. 149-160, 1982.

59. P.K. Sahoo, S. Soltani, A.K.C. Wong and Y.C. Chen, A survey of thresholding techniques, Computer Vision, Graphics and Image Processing, vol. 41, pp. 233-260, 1988.

60. L. Yang, F. Albregtsen, T. Lønnestad and P. Grøttum, A supervised approach to the evaluation of image segmentation methods, in Computer Analysis of Images and Patterns, V. Hlavac and R. Sara (eds.), pp. 759-765, Springer, Berlin, 1995.

61. W.A. Yasnoff, J.K. Mui and J.W. Bacus, Error measures for scene segmentation, Pattern Recognition, vol. 9, pp. 217-231, 1977.

62. Y.J. Zhang, Evaluation and comparison of different segmentation algorithm, Pattern Recognition Letters, vol. 18, pp. 963-974, 1997.

63. Y.J. Zhang and J.J. Gerbrands, Segmentation evaluation using ultimate measurement accuracy, Proc. SPIE vol. 1657, Image Processing Algorithms and Techniques III, pp. 449-460, 1992.

64. Z. Wang, A. Guerriero and M.D. Sario, Comparison of several approaches for the segmentation of texture images, Pattern Recognition Letters, vol. 17, pp. 509-521, 1996.

65. Z. Wang, G. Sylos Labini, M.D. Sario and R. Magnuolo, Unsupervised texture image segmentation by improved neural network ART2, Proc. CIRFFSS'94 AIAA/NASA Conference on Intelligent Robots for Factory, Field, Service, and Space, Houston, TX, 1994.

66. A. Hoover, G. Jean-Baptiste, X. Jiang, P.J. Flynn, H. Bunke, D.B. Goldof, K. Bowyer, D.W. Eggert, A. Fitzgibbon and R.B. Fisher, An experimental comparison of range image segmentation algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 673-689, 1996.

67. A. Hoover, G. Jean-Baptiste, D.B. Goldof and K. Bowyer, A methodology for evaluating range image segmentation techniques, Proc. 2nd IEEE Workshop on Applications for Computer Vision, 1994.

68. K.S. Fu and J.K. Mui, A survey on image segmentation, Pattern Recognition, vol. 13, pp. 3-16, 1981.

69. R.M. Haralick, Image segmentation survey, in Fundamentals in Computer Vision, O. D. Faugeras (ed.), pp. 209-224, Cambridge University Press, Cambridge, 1983.

70. N.R. Pal and S.K. Pal, A review on image segmentation techniques, Pattern Recognition, vol. 26, pp. 1277-1294, 1993.

71. P.K. Sahoo, S. Soltani, A.K.C. Wong and Y.C. Chen, A survey of thresholding techniques, Computer Vision, Graphics and Image Processing, vol. 41, pp. 233-260, 1988.

72. S. Singh and K.J. Bovis, Medical image segmentation in digital mammography, in Advanced Algorithmic Approaches to Medical Image Segmentation, J. Suri, K. Setarehdan and S. Singh (eds.), Springer, London, 2001.

 

Original post: https://www.cnblogs.com/xiangshancuizhu/p/2377314.html