Is counting the frequency a vertex appears in paths between all vertexes a valid way to determine “centrality”?

$begingroup$


I have a blog with 15+ years of posts. There are over 7,000 of them. (It's Gadgetopia, if you're curious.)
I'm auditing it in preparation for a big purge, and I'm accumulating some metrics so I can "score" posts for retention or not.



One of the metrics I'm interested in is what I'm calling "link centrality."



I linked between posts a lot over the years. The posts are woven into each other – one post will link to another, which will link to three more, which each link to five more, at least one of which links back to the first post, etc.
I have a database table that tracks the links between posts. A scheduled job parses each post, pulls out all the intra-site links, and enters a record for each one. We'll call this the "link table." The link table has two columns – source and target – so it only tracks one "hop."
The link table can tell me that post #1234 linked to post #5678. And another record in the table might tell me that post #5678 linked to post #9012. And so on. In this sense, every link from one post to another starts a "chain" or "path" of links.



I got to wondering how to determine what posts were most central in these chains, so I got in my head that I would use the data in the link table to set up a network map of these relationships, and run some metrics on them.



My methodology, using QuickGraph:



  • I created a vertex for each distinct post that appeared in either column of the link table (as either a source or a target). I figured that any post appearing in this table was a node on the map. And, if a post didn't appear in this table, then, by definition, it wasn't a part of any path (it was orphaned/isolated from all other posts on the site).

  • I created a directed edge for each link from one post to another post.

  • For each combination of posts/vertices in the map (approx. 1.6 million combinations), I computed the shortest possible path between them. Note that some came up null, because there was no path.

  • I iterated the edges/links of all these paths and recorded the destination vertex/post for each.

  • I counted those vertices/posts up to determine what I hope is some measure of "link centrality."

The theory in my head was this: if you're moving between two posts on the site that are part of the link graph – meaning they're somehow "plugged into" other posts – then the posts you "run over" the most on all these paths are probably pretty important.
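The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the QuickGraph code actually used: the `links` rows are made up, the helper names `shortest_path` and `path_counts` are hypothetical, and an unweighted breadth-first search stands in for whatever shortest-path routine was run. When several shortest paths tie, the search keeps whichever one it finds first.

```python
from collections import Counter, deque

def shortest_path(adj, source, target):
    """Return one shortest directed path as a list of vertices, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk the parent pointers back to the source.
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj.get(v, ()):
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None  # target is not reachable from source

def path_counts(links):
    """Count how often each post is the destination of an edge on a shortest path."""
    adj = {}
    vertices = set()
    for source, target in links:
        adj.setdefault(source, []).append(target)
        vertices.update((source, target))
    counts = Counter()
    for s in vertices:
        for t in vertices:
            if s == t:
                continue
            path = shortest_path(adj, s, t)
            if path is not None:
                counts.update(path[1:])  # destination vertex of every edge
    return counts

# Hypothetical link table: (source, target) rows.
links = [(1, 2), (2, 3), (3, 1), (4, 2)]
counts = path_counts(links)
```

In this toy table, post 4 is never counted because nothing links to it, while post 2 collects the highest count because every path leaving post 4 has to pass through it. Running one search per source instead of one per pair would give the same counts with far less work.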



One post in particular, for example, appears about 1,600 times in paths between linked posts. This post is certainly foundational to the site (it's this one) – I linked to it from many posts over the years, and those linking posts were also full of links to other posts. Additionally, its sheer age (circa 2007) makes it more likely to appear in more paths.



My question



Have I accomplished anything here that I couldn't have accomplished by just counting inbound and outbound links from my link table?










$endgroup$











  • $begingroup$
    As a complete sidenote: Do you really need to remove content? What do you actually gain with this? E.g., if the content you remove is clogging searches or similar, there are almost certainly better ways to deal with this. Link rot is one of the plagues of the Internet.
    $endgroup$
    – Wrzlprmft
    Mar 25 at 20:36










  • $begingroup$
    I agree, which is why I'm a huge believer in 410 Gone over 404 Not Found. I've already removed several posts which were causing problems. Being a blog, a lot of my older posts are just links to things that don't exist anymore, so I'm part of the link rot; I just need to break the chain earlier.
    $endgroup$
    – Deane
    Mar 25 at 21:00















graph-theory network






edited Mar 25 at 20:07 by Wrzlprmft
asked Mar 25 at 19:32 by Deane











1 Answer
$begingroup$

There is an entire armada of centrality measures for networks.
The one you reïnvented (unless I am misunderstanding something) is called the betweenness centrality.
The number of outgoing or ingoing links is called the degree (out-degree and in-degree, respectively).
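For reference, the usual textbook definitions can be written down as follows. The symbols are introduced here for illustration and do not come from the question: $E$ is the set of directed links, $\sigma_{st}$ is the number of shortest paths from $s$ to $t$, and $\sigma_{st}(v)$ is the number of those that pass through $v$.

$$\deg^-(v) = \left|\{u : (u,v) \in E\}\right|, \qquad \deg^+(v) = \left|\{w : (v,w) \in E\}\right|$$

$$C_B(v) = \sum_{\substack{s \neq v \neq t \\ s \neq t}} \frac{\sigma_{st}(v)}{\sigma_{st}}$$

Pairs with no path from $s$ to $t$ contribute nothing to the sum. As far as I can tell from the description, the count in the question differs in two details: it uses a single shortest path per pair instead of weighting all of them, and it also counts a vertex when it is the endpoint of a path, so the numbers will not match a library's betweenness output exactly.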



Which centrality is better suited for you depends a lot on your question and data, and even for a given scenario, there is no clear way to say which centrality is best.
On the other hand, many centralities will arrive at the same most important nodes for many networks, so it often does not really matter which one you choose.



However, as your goal is not exactly finding the most important nodes, but pruning the least important ones, many centrality measures and details thereof are irrelevant for you.
Also, you have the problem that pruning changes your network and thus many centrality measures.



Instead, I would probably follow an iterative approach and remove nodes that fail to meet some minimal relevance criteria, which do not improve through pruning, for example:



  • Remove nodes without any link pointing to them (zero in-degree). By contrast, you probably do not want to remove nodes whose only connection to the rest of the network is a link from a very important node to them.

  • Remove small clusters of posts that have no ingoing connections from the rest of the network.

  • Remove (most of) long one-way “culs-de-sac”, i.e., outgoing chains of nodes looking like this: •→•→•→•→•→•→• (with the last node having no outgoing links at all).

Of course, what your criteria are strongly depends on what you wish to achieve and how much content you wish to get rid of.
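The first criterion could be sketched like this in Python. It is a rough illustration under assumptions of mine rather than a recommended implementation: the function name `prune_zero_in_degree`, the `max_rounds` parameter and the sample rows are all hypothetical, and self-links are ignored so that a post cannot keep itself alive.

```python
def prune_zero_in_degree(links, max_rounds=1):
    """Drop posts that no remaining post links to, for up to max_rounds rounds.

    links: iterable of (source, target) pairs from the link table.
    Returns (kept_posts, kept_links, removed_posts).
    """
    links = set(links)
    posts = {p for link in links for p in link}
    links = {(s, t) for s, t in links if s != t}  # ignore self-links
    removed = set()
    for _ in range(max_rounds):
        linked_to = {t for _, t in links}
        orphans = posts - linked_to  # zero in-degree in the current graph
        if not orphans:
            break
        removed |= orphans
        posts -= orphans
        # Links leaving a removed post disappear with it.
        links = {(s, t) for s, t in links if s not in orphans}
    return posts, links, removed

# Hypothetical link table: 4 -> 1 -> 2 <-> 3
sample = [(1, 2), (2, 3), (3, 2), (4, 1)]
one_round = prune_zero_in_degree(sample, max_rounds=1)
many_rounds = prune_zero_in_degree(sample, max_rounds=10)
```

With one round only post 4 goes. With more rounds post 1 follows, because removing post 4 took away its only inbound link. On a graph without cycles, unlimited rounds would eventually remove every post, which is why the number of rounds is a parameter here.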






$endgroup$












  • $begingroup$
    I'm going to pick this as the answer because you validated that my analysis is a real thing. "Betweenness Centrality" is exactly what I did, accidentally as it was.
    $endgroup$
    – Deane
    Mar 26 at 15:29










  • $begingroup$
    I got to wondering if my results would change if the graph were undirected. I did one-way links because hyperlinks are one-way. But what if I made the links two-way? Would that change my results markedly?
    $endgroup$
    – Deane
    Mar 26 at 17:49










  • $begingroup$
    @Deane: It’s very likely that results would change. Note that I recommend against this for practical reasons: You are throwing away relevant information. Also see my first bullet point.
    $endgroup$
    – Wrzlprmft
    Mar 26 at 18:31











answered Mar 25 at 20:32









Wrzlprmft











  • I'm going to pick this as the answer because you validated that my analysis is a real thing. "Betweenness Centrality" is exactly what I did, accidentally as it was.
    – Deane
    Mar 26 at 15:29










  • I got to wondering if my results would change if the graph were undirected. I did one-way links because hyperlinks are one-way. But what if I made the links two-way? Would that change my results markedly?
    – Deane
    Mar 26 at 17:49










  • @Deane: It’s very likely that results would change. Note that I recommend against this for practical reasons: You are throwing away relevant information. Also see my first bullet point.
    – Wrzlprmft
    Mar 26 at 18:31
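A small illustration of the information lost by making links two-way, as a sketch under assumed names (`in_degree`, `make_two_way`, and the toy link set are all made up for this example): a page that nobody links to has zero in-degree in the directed graph, but acquires incoming links as soon as every link is mirrored, so the zero-in-degree criterion can no longer single it out.

```python
def in_degree(edges):
    """Count incoming links per node for a set of (source, target) pairs."""
    counts = {n: 0 for edge in edges for n in edge}
    for _, dst in edges:
        counts[dst] += 1
    return counts


def make_two_way(edges):
    """Add the reverse of every link, i.e. treat the graph as undirected."""
    return set(edges) | {(dst, src) for src, dst in edges}


links = {("orphan", "hub"), ("hub", "a"), ("a", "hub")}
print(in_degree(links)["orphan"])                # 0
print(in_degree(make_two_way(links))["orphan"])  # 1
```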
































