Visualization of U.S.
Airlines by Using Gephi
MEASUREMENTS
1.
Average
Clustering Coefficient
The
clustering coefficient (Watts-Strogatz), when applied to a single node, is a
measure of how complete the neighborhood of a node is. When applied to an entire
network, it is the average clustering coefficient over all of the nodes in the
network.
The
clustering coefficient, along with the mean shortest path, can indicate a
"small-world" effect. For the clustering coefficient to be meaningful
it should be significantly higher than in version of the network where all of
the edges have been shuffled.
The
neighborhood of a node, u, is the set of nodes that are connected to u. If
every node in the neighborhood of u is connected to every other node in the
neighborhood of u, then the neighborhood of u is complete and will have a
clustering coefficient of 1. If no nodes in the neighborhood of u are
connected, then the clustering coefficient will be 0.
2.
Average
Path Length
The
average graph-distance between all pairs of nodes. Connected nodes have graph
distance 1. The diameter is the longest graph distance between any two nodes in
the network. (i.e. How far apart are the two most distant nodes). Two measures
derive from the distance: Betweenness Centrality and Closeness Centrality.
3.
Betweenness
Centrality
Node
Betweenness Centrality measures how often a node appears on shortest paths
between nodes in the network. Suppose
that I want to influence you by sending you information, or make a deal to
exchange some resources. But, in order to talk to you, I must go through an
intermediary. For example, let's suppose that I wanted to try to convince the
Chancellor of my university to buy me a new computer. According to the rules of
our bureaucratic hierarchy, I must forward my request through my department
chair, a dean, and an executive vice chancellor. Each one of these people could
delay the request, or even prevent my request from getting through. This gives
the people who lie "between" me and the Chancellor power with respect
to me. To stretch the example just a bit more, suppose that I also have an
appointment in the school of business, as well as one in the department of
sociology. I might forward my request to the Chancellor by both channels.
Having more than one channel makes me less dependent, and, in a sense, more
powerful.
4.
Closeness
Centrality
The
average distance from a given node to all other nodes in the network. The second reason why actor A is more
powerful than the other actors in the star network is that actor A is closer to
more actors than any other actor. Power can be exerted by direct bargaining and
exchange. But power also comes from acting as a "reference point" by
which other actors judge themselves, and by being a center of attention who's
views are heard by larger numbers of actors. Actors who are able to reach other
actors at shorter path lengths, or who are more reachable by other actors at
shorter path lengths have favored positions. This structural advantage can be
translated into power. In the star network, actor A is at a geodesic distance
of one from all other actors; each other actor is at a geodesic distance of two
from all other actors (but A). This logic of structural advantage underlies
approaches that emphasize the distribution of closeness and distance as a
source of power.
Now
consider the circle network in terms of actor closeness. Each actor lies at
different path lengths from the other actors, but all actors have identical
distributions of closeness, and again would appear to be equal in terms of
their structural positions. In the line network, the middle actor (D) is closer
to all other actors than are the set C,E, the set B,F, and the set A,G. Again,
the actors at the ends of the line, or at the periphery, are at a disadvantage.
5.
Connected
Components
Determines
the number of connected components in the network.
§ On
directed graphs : detect strongly and
weakly connected components.
§ On
undirected graphs : detect only weakly
connected components.
6.
Deggre
The
degree of a node is the number of edges that are adjacent to the node.
7.
Degree
Power Law
It
measures how closely the degree distribution of a network follows a power-law
scale. An alpha value between 2 and 3 implies a power law.
8.
Diameter
The
maximal distance between all pairs of nodes.
9.
Eigenvector
Centrality
A
measure of node importance in a network based on a node's connections. The closeness centrality measure described
above is based on the sum of the geodesic distances from each actor to all
others (farness). In larger and more complex networks than the example we've
been considering, it is possible to be somewhat misled by this measure.
Consider two actors, A and B. Actor A is quite close to a small and fairly closed
group within a larger network, and rather distant from many of the members of
the population. Actor B is at a moderate distance from all of the members of
the population. The farness measures for actor A and actor B could be quite
similar in magnitude. In a sense, however, actor B is really more
"central" than actor A in this example, because B is able to reach
more of the network with same amount of effort.
The
eigenvector approach is an effort to find the most central actors (i.e. those
with the smallest farness from others) in terms of the "global" or
"overall" structure of the network, and to pay less attention to
patterns that are more "local." The method used to do this (factor
analysis) is beyond the scope of the current text. In a general way, what
factor analysis does is to identify "dimensions" of the distances
among actors. The location of each actor with respect to each dimension is
called an "eigenvalue," and the collection of such values is called
the "eigenvector." Usually, the first dimension captures the
"global" aspects of distances among actors; second and further
dimensions capture more specific and local sub-structures.
The
UCINET Network>Centrality>Eigenvector routine calculates individual actor
centrality, and graph centralization using weights on the first
eigenvector. A limitation of the routine
is that it does not calculate values for asymmetric data. So, our measures here are based on the notion
of "any connection."
10.
Graph
Density
Measures
how close the network is to complete. A complete graph has all possible edges
and density equal to 1.
11.
HITS
Hyperlink-Induced
Topic Search (HITS) (also known as Hubs and authorities) is a link analysis
algorithm that rates Web pages, developed by Jon Kleinberg. The HITS metric
determines two values for a page: its authority, which estimates the value of
the content of the page, and its hub value, which estimates the value of its
links to other pages. Description
Actually
computes two different scores: hubs and authority. The authority score
indicates the value of the page (node) itself and hubs estimates the value of
the links outgoing from the page (node). Hits is an iterative algorithm at each
iteration:
§ Update
the authority value of each node to be the sum of the hub values for every node
it has a link into.
§ Update
the hub values for each node to be the sum of the authority values that it has
a link into.
§ Normalize
the hub and authority scores for all nodes by normalizing each value by the
system sum for each value.
§ Repeat
these steps (assumingly until the values no longer fluctuate).
12.
Modularity
Measures
how well a network decomposes into modular communities. A high modularity score
indicates sophisticated internal structure. This structure, often called a
community structure, describes how the the network is compartmentalized into
sub-networks. These sub-networks (or communities) have been shown to have significant
real-world meaning. Randomizing the algorithm can produce a better decomposition
resulting in a higher modularity score, however randomizing will increase
computation time.
13.
PageRank
An
iterative algorithm that measures the importance of each node within the
network. The metric assigns each node a probability that is the probability of
being at that page after many clicks. The page rank values are the values in
the eigenvector that has the highest corresponding eigenvalue of a normalized
adjacency matrix A'. The standard adjacency matrix is normalized so that the
columns of the matrix sum to 1.
REFERENCES
Watts,
D.J., Strogatz, S.H.(1998) Collective dynamics of 'small-world' networks.
Nature 393:440-442. PDF
Albert,
R., and Barabasi, A.-L. (2002) Statistical mechanics of complex networks.
Review of Modern Physics 74:47-97. PDF
Newman,
M.E.J. (2003) The structure and function of complex networks. SIAM Review
45:167-256. PDF
Ulrik
Brandes (2001). A faster algorithm for betweenness centrality
Robert
Tarjan, Depth-First Search and Linear Graph Algorithms, in SIAM Journal on
Computing 1 (2): 146–160 (1972)
Kleinberg,
Jon (1999). "Authoritative sources in a hyperlinked environment".
Journal of the ACM 46 (5): 604–632. doi:10.1145/324133.324140. PDF
Page,
Lawrence; Brin, Sergey; Motwani, Rajeev and Winograd, Terry (1999). The
PageRank citation ranking: Bringing order to the Web. PDF
No comments:
Post a Comment