Head / tail breaks

Head / tail breaks is a clustering algorithm for data with heavy-tailed distributions such as power laws and lognormal distributions. A heavy-tailed distribution can be described simply as the scaling pattern of far more small things than large ones, or alternatively as numerous smallest things, a very few largest ones, and some in between. The classification divides things into large ones (called the head) and small ones (called the tail) around the arithmetic mean or average, and then recursively continues the division process on the head until the notion of far more small things than large ones no longer holds. [1] Head / tail breaks is not just for classification, but also for visualization of big data by keeping the head, since the head is self-similar to the whole. Head / tail breaks can be applied not only to vector data such as points, lines and polygons, but also to raster data such as a digital elevation model (DEM).

Motivation

Head / tail breaks is mainly motivated by the inability of conventional classification methods, such as equal intervals, quantiles, geometric progressions, standard deviation, and natural breaks (commonly known as Jenks natural breaks optimization), to reveal the underlying scaling pattern of far more small things than large ones. Note that the notion of far more small things than large ones refers not only to geometric properties, but also to topological and semantic properties. In this connection, the notion should be interpreted as far more unpopular (or less-connected) things than popular (or well-connected) ones, or far more meaningless things than meaningful ones.

Method

Given a variable X that follows a heavy-tailed distribution, there are far more small x than large ones. Take the average of all xi to obtain the first mean m1. Then calculate the second mean m2 for those xi greater than m1. In the same recursive way, we may obtain m3, and so on, until the values above the latest mean are no longer a minority. For simplicity, assume there are three means, m1, m2, and m3. This classification leads to four classes: [minimum, m1], (m1, m2], (m2, m3], (m3, maximum]. In general, it can be represented as a recursive function as follows:

 Recursive function Head / tail Breaks (data):
     Break the input data (around the mean or average) into the head and the tail;
     // the head: data values greater than the mean
     // the tail: data values less than the mean
     if (length(head) / length(data) <= 40%):
         Head / tail Breaks (head);
 End Function
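
The recursion above can be sketched in Python as a simple loop. This is a minimal illustration rather than a reference implementation: the function names `head_tail_breaks` and `classify`, the `threshold` parameter, and the sample data are all assumptions made for the example. It also assumes one particular reading of the stopping rule, namely that a split is accepted only while the head is a non-empty minority; other implementations may always record the first mean.

```python
from bisect import bisect_left


def head_tail_breaks(values, threshold=0.4):
    """Return the successive means (m1, m2, ...) used as class breaks.

    Assumption: a split is accepted only while the head is a non-empty
    minority, i.e. at most `threshold` of the data being split.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        m = sum(data) / len(data)
        head = [v for v in data if v > m]
        if not head or len(head) / len(data) > threshold:
            break  # head is empty or no longer a minority
        breaks.append(m)
        data = head  # continue with the head only
    return breaks


def classify(values, breaks):
    """Assign each value a class index; intervals are closed on the right."""
    return [bisect_left(breaks, v) for v in values]


# Hypothetical sample: seven small values and three large ones.
sample = [1, 1, 1, 1, 1, 1, 1, 8, 10, 30]
breaks = head_tail_breaks(sample)
labels = classify(sample, breaks)
print(breaks)  # [5.5, 16.0]
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 2]
```

Here the two means give three classes, [1, 5.5], (5.5, 16] and (16, 30], matching the right-closed intervals described above.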

The resulting number of classes is referred to as ht-index, an alternative index to fractal dimension for characterizing the complexity of fractals or geographic features: the higher the ht-index, the more complex the fractals. [2]

Threshold or its sensitivity

The criterion for stopping the iterative classification process is that the remaining data (i.e., the head part) are no longer heavy-tailed, or simply that the head part is no longer a minority (i.e., the proportion of the head is no longer below a threshold such as 40%). Jiang et al. (2013) [3] suggested a threshold of 40%, as in the pseudocode above (i.e., head ≤ 40%). Sometimes a larger threshold, for example 50% or more, can be used, as Jiang and Yin (2014) [2] noted in another article: “this condition can be relaxed for many geographic features, such as 50 percent or even more”. However, the percentage of all heads should on average remain below 40% (or 41% or 42%), indicating far more small things than large ones. This sensitivity issue deserves further research.

Rank-size plot and RA index

A good tool for displaying the scaling pattern, or the heavy-tailed distribution, is the rank-size plot, a scatter plot that displays a set of values against their ranks. With this tool, a new index [4] termed the ratio of areas (RA) in a rank-size plot was defined to characterize the scaling pattern. The RA has been used successfully in estimating traffic conditions. However, the index can only serve as a complement to the ht-index, because it is ineffective at capturing the scaling structure of geographic features.

Other indices based on the head / tail breaks

In addition to the ht-index, the following indices are also derived from the head / tail breaks.

  • CRG-index. It is developed as a more sensitive ht-index to capture the slight changes of geographic features. [5] In contrast to the ht-index, which is an integer, CRG-index is a real number.
  • Unified metrics. Two unified metrics (UM1 and UM2) were proposed in an AAAG paper [6] for characterizing the fractal nature of geographic features. One addresses the question “What are these small (or large) things?”, the other the question “I know there are far more small things than large ones, but how many more?”

Applications

Instead of more or less similar things, there are far more small things than large ones. Given the ubiquity of this scaling pattern, head / tail breaks has been applied to map generalization, cognitive mapping and even the perception of beauty. [3] [7] [8] It also helps visualize big data: the visualization strategy is to recursively drop the tail. [9] In addition, it helps to delineate cities and towns from sources such as social media geolocation data and nighttime images.

Characterizing the imbalance

As the head / tail breaks method can be used iteratively to obtain the head parts of a data set, it actually captures the underlying hierarchy of the data set. For example, if we divide the array (19, 8, 7, 6, 2, 1, 1, 1, 0) with the head / tail breaks method, we get two head parts, i.e., the first head part (19, 8, 7, 6) and the second head part (19). Together with the original array, these two head parts form a three-level hierarchy:

the 1st level (19),

the 2nd level (19, 8, 7, 6), and

the 3rd level (19, 8, 7, 6, 2, 1, 1, 1, 0).

The number of levels of the above-mentioned hierarchy characterizes the imbalance of the example array, and this number of levels has been termed the ht-index. [2] With the ht-index, we are able to compare the degrees of imbalance of two data sets. For example, the ht-index of the example array (19, 8, 7, 6, 2, 1, 1, 1, 0) is 3, and the ht-index of another array (19, 8, 8, 8, 8, 8, 8, 8, 8) is 2. Therefore, the degree of imbalance of the former array is higher than that of the latter.
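
The two ht-index values quoted above can be checked with a short Python sketch. The function name `ht_index` and its `threshold` parameter are assumptions made for this illustration. Note that the first head of the example array holds 4 of 9 values (about 44%), so the sketch assumes a relaxed threshold of 50% rather than 40% in order to reproduce the three levels.

```python
def ht_index(values, threshold=0.5):
    """Count hierarchy levels: 1 plus the number of accepted head parts.

    Assumption: a head part is accepted only if it is non-empty and holds
    at most `threshold` of the data it was split from.
    """
    levels = 1
    data = list(values)
    while len(data) > 1:
        m = sum(data) / len(data)
        head = [v for v in data if v > m]
        if not head or len(head) / len(data) > threshold:
            break
        levels += 1
        data = head
    return levels


a = [19, 8, 7, 6, 2, 1, 1, 1, 0]
b = [19, 8, 8, 8, 8, 8, 8, 8, 8]
print(ht_index(a))  # 3
print(ht_index(b))  # 2
```

With a strict 40% threshold, this particular sketch would stop at the first split of the first array, which illustrates the sensitivity of the result to the threshold.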

Figure: The left panel contains 50,000 natural cities, which can be put into 7 hierarchical levels; it looks like a hairball. The right panel shows only the 4 top levels, with the 3 lowest levels dropped, so that the scaling pattern of far more small cities than large ones emerges. The right pattern (the part remaining after the tails are dropped) is self-similar to the whole (the left pattern); it thus reflects the underlying structure of the left one and enables us to see the whole.
Figure: The scaling pattern of the US ground surface is distorted by natural breaks, but revealed by head / tail breaks.

Delineating natural cities

The term ‘natural cities’ refers to human settlements, or human activities in general on Earth’s surface, that are naturally or objectively defined and delineated from massive geographic information based on the head / tail division rule, a non-recursive form of head / tail breaks. [10] [11] Such geographic information could come from various sources, such as massive numbers of street junctions [11] and street ends, a massive number of street blocks, nighttime imagery and social media users’ locations. In contrast to conventional cities, the adjective ‘natural’ is explained not only by the sources of natural cities, but also by the approach used to derive them. Natural cities are derived from a meaningful cutoff, the mean of a massive number of units extracted from the geographic information. [9] These units vary according to the type of geographic information, for example block areas for street blocks or pixel values for nighttime images. A natural cities model has been created using ArcGIS ModelBuilder. [12] It follows the same process used to derive natural cities from location-based social media, [10] namely, building a huge triangulated irregular network (TIN) from the point features and treating the triangles that are smaller than the mean value as the natural cities.
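
The head / tail division rule described above amounts to a single split around the mean. The following Python sketch illustrates only that step; the function name `head_tail_division` and the triangle sizes are made-up for the example, and building the TIN or merging adjacent small triangles into city polygons is left out entirely.

```python
def head_tail_division(sizes):
    """Split unit indices once around the mean size (no recursion).

    Returns the mean, the indices of units smaller than the mean, and the
    indices of the remaining units.
    """
    m = sum(sizes) / len(sizes)
    small = [i for i, s in enumerate(sizes) if s < m]
    large = [i for i, s in enumerate(sizes) if s >= m]
    return m, small, large


# Hypothetical triangle sizes from a TIN: many small ones, a few large ones.
triangle_sizes = [1, 2, 1, 3, 2, 1, 50, 40]
cutoff, small, large = head_tail_division(triangle_sizes)
print(cutoff)  # 12.5
print(small)   # [0, 1, 2, 3, 4, 5]
print(large)   # [6, 7]
```

In this reading, the triangles indexed by `small` would be the candidates for natural cities.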

Color rendering DEM

Current color renderings for a DEM or density map are essentially based on conventional classifications, so they disproportionately exaggerate high elevations or high densities. As a matter of fact, there are not so many high elevations or high-density locations. [13] It was found that coloring based on head / tail breaks is more favorable than coloring based on other classifications. [14]

Software implementations

The following implementations are available under Free / Open Source Software licenses.

  • HT calculator : a WinForms application for calculating one or more head / tail breaks metrics.
  • HT in JavaScript : a JavaScript implementation for applying head / tail breaks on a single data array.
  • HT Mapping tool : a function in the free plug-in Axwoman 6.3 to ArcMap 10.2 that conducts geo-data symbolization automatically based on the head / tail breaks classification.
  • HT in Python : Python and JavaScript code for the head / tail breaks algorithm. It works well for choropleth map coloring.
  • pysal.esda.mapclassify : Python classification schemes for choropleth mapping, including head / tail breaks map classification.
  • smoomapy 0.1.9 : Brings smoothed maps through python.
  • Ht-index calculator : A PostgreSQL function for calculating ht-index (also see [15] ).
  • RA calculator : Software for calculating the ratio of areas (RA) in a rank-size plot (also see [4] ).

References

  1. Jiang, Bin (2013). “Head / tail breaks: A new classification scheme for data with a heavy-tailed distribution”, The Professional Geographer, 65 (3), 482-494.
  2. Jiang, Bin and Yin, Junjun (2014). “Ht-index for quantifying the fractal or scaling structure of geographic features”, Annals of the Association of American Geographers, 104 (3), 530-541.
  3. Jiang, Bin; Liu, Xintao and Jia, Tao (2013). “Scaling of geographic space as a universal rule for map generalization”, Annals of the Association of American Geographers, 103 (4), 844-855.
  4. Gao, Peichao; Liu, Zhao; Tian, Kun; Liu, Gang (2016-03-10). “Characterizing Traffic Conditions from the Perspective of Spatial-Temporal Heterogeneity”. ISPRS International Journal of Geo-Information. 5 (3): 34. doi: 10.3390/ijgi5030034.
  5. Gao, Peichao; Liu, Zhao; Xie, Meihui; Tian, Kun; Liu, Gang (2016-10-01). “CRG Index: A More Sensitive Ht-Index for Enabling Dynamic Views of Geographic Features”. The Professional Geographer. 68 (4): 533-545. doi: 10.1080/00330124.2015.1099448. ISSN 0033-0124.
  6. Gao, Peichao; Liu, Zhao; Liu, Gang; Zhao, Hongrui; Xie, Xiaoxiao (2017-06-02). “Unified Metrics for Characterizing the Fractal Nature of Geographic Features”. Annals of the American Association of Geographers. 0 (0): 1-17. doi: 10.1080/24694452.2017.1310022. ISSN 2469-4452.
  7. Jiang, Bin (2013b). “The image of the city out of the underlying scaling of city artifacts or locations”, Annals of the Association of American Geographers, 103 (6), 1552-1566.
  8. Jiang, Bin and Sui, Daniel (2014). “A new kind of beauty out of the underlying scaling of geographic space”, The Professional Geographer, 66 (4), 676-686.
  9. Jiang, Bin (2015). “Head / tail breaks for visualization of city structure and dynamics”, Cities, 43, 69-77.
  10. Jiang, Bin and Miao, Yufan (2015). “The evolution of natural cities from the perspective of location-based social media”, The Professional Geographer, 67 (2), 295-306.
  11. Long, Ying (2016). “Redefining Chinese city system with emerging new data”, Applied Geography, 75, 36-48.
  12. Ren, Zheng (2016). “Natural cities model in ArcGIS”, http://www.arcgis.com/home/item.html?id=47b1d6fdd1984a6fae916af389cdc57d .
  13. Jiang, Bin (2015). “Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity”, GeoJournal, 80 (1), 1-13.
  14. Wu, Jou-Hsuan (2015). “Examining the new kind of beauty using the human being as a measuring instrument”, http://www.diva-portal.org/smash/get/diva2:805296/FULLTEXT01.pdf .
  15. “A PostgreSQL function for calculating the ht-index (PDF Download Available)”. ResearchGate. doi: 10.13140/rg.2.1.3041.0324. Retrieved 2017-08-08.