Bayesian clustering with AutoClass explicitly recognises uncertainties in landscape classification

J. Angus Webb, Nicholas R. Bond, Stephen R. Wealands, R. Mac Nally, G.P. Quinn, P.A. Vesk, Michael R. Grace

    Research output: Contribution to journal › Article

    20 Citations (Scopus)

    Abstract

    Clustering of multivariate data is a commonly used technique in ecology, and many approaches to clustering are available. The results from a clustering algorithm are uncertain, but few clustering approaches explicitly acknowledge this uncertainty. One exception is Bayesian mixture modelling, which treats all results probabilistically, and allows comparison of multiple plausible classifications of the same data set. We used this method, implemented in the AutoClass program, to classify catchments (watersheds) in the Murray Darling Basin (MDB), Australia, based on their physiographic characteristics (e.g. slope, rainfall, lithology). The most likely classification found nine classes of catchments. Members of each class were aggregated geographically within the MDB. Rainfall and slope were the two most important variables that defined classes. The second-most likely classification was very similar to the first, but had one fewer class. Increasing the nominal uncertainty of continuous data resulted in a most likely classification with five classes, which were again aggregated geographically. Membership probabilities suggested that a small number of cases could be members of either of two classes. Such cases were located on the edges of groups of catchments that belonged to one class, with a group belonging to the second-most likely class adjacent. A comparison of the Bayesian approach to a distance-based deterministic method showed that the Bayesian mixture model produced solutions that were more spatially cohesive and intuitively appealing. The probabilistic presentation of results from the Bayesian classification allows richer interpretation, including decisions on how to treat cases that are intermediate between two or more classes, and whether to consider more than one classification. The explicit consideration and presentation of uncertainty makes this approach useful for ecological investigations, where both data and expectations are often highly uncertain.
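    The membership probabilities described in the abstract can be illustrated with a minimal sketch. It is not AutoClass and not the authors' analysis: it assumes a single continuous variable modelled as a Gaussian within each class, and the class weights, means, standard deviations and the 0.9 threshold are hypothetical values chosen only for illustration. It shows how a case lying between two classes receives split membership, while a case near a class centre is assigned almost entirely to that class.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def membership_probabilities(x, classes):
    """Posterior class membership for one case via Bayes' rule.

    classes is a list of (weight, mean, sd) tuples; the returned
    probabilities sum to 1.
    """
    joint = [w * normal_pdf(x, m, s) for (w, m, s) in classes]
    total = sum(joint)
    return [j / total for j in joint]


def is_ambiguous(probs, threshold=0.9):
    """Flag a case whose most likely class falls below the threshold."""
    return max(probs) < threshold


# Hypothetical two-class mixture on one variable (e.g. annual rainfall, mm).
classes = [(0.5, 400.0, 50.0), (0.5, 800.0, 50.0)]

core = membership_probabilities(420.0, classes)  # close to the first class
edge = membership_probabilities(600.0, classes)  # midway between classes

print("core case:", core, "ambiguous:", is_ambiguous(core))
print("edge case:", edge, "ambiguous:", is_ambiguous(edge))
```

    Reporting the full probability vector, rather than only the most likely class, is what lets intermediate cases be identified and treated separately.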
    Original language: English
    Pages (from-to): 526-536
    Number of pages: 11
    Journal: Ecography
    Volume: 30
    Issue number: 4
    DOI: 10.1111/j.2007.0906-7590.05002.x
    Publication status: Published - 2007


    Cite this

    Webb, J. Angus; Bond, Nicholas R.; Wealands, Stephen R.; Mac Nally, R.; Quinn, G.P.; Vesk, P.A.; Grace, Michael R. / Bayesian clustering with AutoClass explicitly recognises uncertainties in landscape classification. In: Ecography. 2007; Vol. 30, No. 4. pp. 526-536.
    @article{acc69a54dcaf4a22879ea8e86ffadc5d,
    title = "Bayesian clustering with AutoClass explicitly recognises uncertainties in landscape classification",
    author = "Webb, {J. Angus} and Bond, {Nicholas R.} and Wealands, {Stephen R.} and {Mac Nally}, R. and Quinn, G.P. and Vesk, P.A. and Grace, {Michael R.}",
    year = "2007",
    doi = "10.1111/j.2007.0906-7590.05002.x",
    language = "English",
    volume = "30",
    pages = "526--536",
    journal = "Ecography",
    issn = "0906-7590",
    publisher = "Wiley-Blackwell",
    number = "4",
    }

