Monday, October 3, 2016

Visualization of U.S. Airlines by Using Gephi



Visualization of U.S. Airlines by Using Gephi








MEASUREMENTS


1.      Average Clustering Coefficient
The clustering coefficient (Watts-Strogatz), when applied to a single node, is a measure of how complete the neighborhood of a node is. When applied to an entire network, it is the average clustering coefficient over all of the nodes in the network.
The clustering coefficient, along with the mean shortest path, can indicate a "small-world" effect. For the clustering coefficient to be meaningful it should be significantly higher than in version of the network where all of the edges have been shuffled.
The neighborhood of a node, u, is the set of nodes that are connected to u. If every node in the neighborhood of u is connected to every other node in the neighborhood of u, then the neighborhood of u is complete and will have a clustering coefficient of 1. If no nodes in the neighborhood of u are connected, then the clustering coefficient will be 0.

2.      Average Path Length
The average graph-distance between all pairs of nodes. Connected nodes have graph distance 1. The diameter is the longest graph distance between any two nodes in the network. (i.e. How far apart are the two most distant nodes). Two measures derive from the distance: Betweenness Centrality and Closeness Centrality.

3.      Betweenness Centrality
Node Betweenness Centrality measures how often a node appears on shortest paths between nodes in the network. Suppose that I want to influence you by sending you information, or make a deal to exchange some resources. But, in order to talk to you, I must go through an intermediary. For example, let's suppose that I wanted to try to convince the Chancellor of my university to buy me a new computer. According to the rules of our bureaucratic hierarchy, I must forward my request through my department chair, a dean, and an executive vice chancellor. Each one of these people could delay the request, or even prevent my request from getting through. This gives the people who lie "between" me and the Chancellor power with respect to me. To stretch the example just a bit more, suppose that I also have an appointment in the school of business, as well as one in the department of sociology. I might forward my request to the Chancellor by both channels. Having more than one channel makes me less dependent, and, in a sense, more powerful.

4.      Closeness Centrality
The average distance from a given node to all other nodes in the network. The second reason why actor A is more powerful than the other actors in the star network is that actor A is closer to more actors than any other actor. Power can be exerted by direct bargaining and exchange. But power also comes from acting as a "reference point" by which other actors judge themselves, and by being a center of attention who's views are heard by larger numbers of actors. Actors who are able to reach other actors at shorter path lengths, or who are more reachable by other actors at shorter path lengths have favored positions. This structural advantage can be translated into power. In the star network, actor A is at a geodesic distance of one from all other actors; each other actor is at a geodesic distance of two from all other actors (but A). This logic of structural advantage underlies approaches that emphasize the distribution of closeness and distance as a source of power.
Now consider the circle network in terms of actor closeness. Each actor lies at different path lengths from the other actors, but all actors have identical distributions of closeness, and again would appear to be equal in terms of their structural positions. In the line network, the middle actor (D) is closer to all other actors than are the set C,E, the set B,F, and the set A,G. Again, the actors at the ends of the line, or at the periphery, are at a disadvantage.

5.      Connected Components
Determines the number of connected components in the network.
§  On directed graphs     : detect strongly and weakly connected components.
§  On undirected graphs : detect only weakly connected components.

6.      Deggre
The degree of a node is the number of edges that are adjacent to the node.

7.      Degree Power Law
It measures how closely the degree distribution of a network follows a power-law scale. An alpha value between 2 and 3 implies a power law.

8.      Diameter
The maximal distance between all pairs of nodes.

9.      Eigenvector Centrality
A measure of node importance in a network based on a node's connections. The closeness centrality measure described above is based on the sum of the geodesic distances from each actor to all others (farness). In larger and more complex networks than the example we've been considering, it is possible to be somewhat misled by this measure. Consider two actors, A and B. Actor A is quite close to a small and fairly closed group within a larger network, and rather distant from many of the members of the population. Actor B is at a moderate distance from all of the members of the population. The farness measures for actor A and actor B could be quite similar in magnitude. In a sense, however, actor B is really more "central" than actor A in this example, because B is able to reach more of the network with same amount of effort.
The eigenvector approach is an effort to find the most central actors (i.e. those with the smallest farness from others) in terms of the "global" or "overall" structure of the network, and to pay less attention to patterns that are more "local." The method used to do this (factor analysis) is beyond the scope of the current text. In a general way, what factor analysis does is to identify "dimensions" of the distances among actors. The location of each actor with respect to each dimension is called an "eigenvalue," and the collection of such values is called the "eigenvector." Usually, the first dimension captures the "global" aspects of distances among actors; second and further dimensions capture more specific and local sub-structures.
The UCINET Network>Centrality>Eigenvector routine calculates individual actor centrality, and graph centralization using weights on the first eigenvector.  A limitation of the routine is that it does not calculate values for asymmetric data.  So, our measures here are based on the notion of "any connection."

10.  Graph Density
Measures how close the network is to complete. A complete graph has all possible edges and density equal to 1.

11.  HITS
Hyperlink-Induced Topic Search (HITS) (also known as Hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The HITS metric determines two values for a page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. Description

Actually computes two different scores: hubs and authority. The authority score indicates the value of the page (node) itself and hubs estimates the value of the links outgoing from the page (node). Hits is an iterative algorithm at each iteration:
§  Update the authority value of each node to be the sum of the hub values for every node it has a link into.
§  Update the hub values for each node to be the sum of the authority values that it has a link into.
§  Normalize the hub and authority scores for all nodes by normalizing each value by the system sum for each value.
§  Repeat these steps (assumingly until the values no longer fluctuate).

12.  Modularity
Measures how well a network decomposes into modular communities. A high modularity score indicates sophisticated internal structure. This structure, often called a community structure, describes how the the network is compartmentalized into sub-networks. These sub-networks (or communities) have been shown to have significant real-world meaning. Randomizing the algorithm can produce a better decomposition resulting in a higher modularity score, however randomizing will increase computation time.

13.  PageRank
An iterative algorithm that measures the importance of each node within the network. The metric assigns each node a probability that is the probability of being at that page after many clicks. The page rank values are the values in the eigenvector that has the highest corresponding eigenvalue of a normalized adjacency matrix A'. The standard adjacency matrix is normalized so that the columns of the matrix sum to 1.



REFERENCES


Watts, D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks. Nature 393:440-442. PDF
Albert, R., and Barabasi, A.-L. (2002) Statistical mechanics of complex networks. Review of Modern Physics 74:47-97. PDF
Newman, M.E.J. (2003) The structure and function of complex networks. SIAM Review 45:167-256. PDF
Ulrik Brandes (2001). A faster algorithm for betweenness centrality
Robert Tarjan, Depth-First Search and Linear Graph Algorithms, in SIAM Journal on Computing 1 (2): 146–160 (1972)
Kleinberg, Jon (1999). "Authoritative sources in a hyperlinked environment". Journal of the ACM 46 (5): 604–632. doi:10.1145/324133.324140. PDF
Page, Lawrence; Brin, Sergey; Motwani, Rajeev and Winograd, Terry (1999). The PageRank citation ranking: Bringing order to the Web. PDF

No comments: