一、什么是PageRank
PageRank,中文一般叫佩奇排名或网页排名,是利用网页简单的超链接来计算网页的分值,从而给网页进行排名的一种算法,以Google公司创办人Larry Page之姓来命名。Google用它来体现网页的相关性和重要性,在搜索引擎优化操作中是经常被用来评估网页优化的成效因素之一。
它的思想是模拟一个悠闲的上网者,上网者首先随机选择一个网页打开,然后在这个网页上呆了几分钟后,跳转到该网页所指向的链接,这样无所事事、漫无目的地在网页上跳来跳去,PageRank就是估计这个悠闲的上网者分布在各个网页上的概率。
二、neo4j实现PageRank算法例子:创建一个属性图
CREATE (home:Page {name:'Home'}), (about:Page {name:'About'}), (product:Page {name:'Product'}), (links:Page {name:'Links'}), (a:Page {name:'Site A'}), (b:Page {name:'Site B'}), (c:Page {name:'Site C'}), (d:Page {name:'Site D'}), (home)-[:LINKS {weight: 0.2}]->(about), (home)-[:LINKS {weight: 0.2}]->(links), (home)-[:LINKS {weight: 0.6}]->(product), (about)-[:LINKS {weight: 1.0}]->(home), (product)-[:LINKS {weight: 1.0}]->(home), (a)-[:LINKS {weight: 1.0}]->(home), (b)-[:LINKS {weight: 1.0}]->(home), (c)-[:LINKS {weight: 1.0}]->(home), (d)-[:LINKS {weight: 1.0}]->(home), (links)-[:LINKS {weight: 0.8}]->(home), (links)-[:LINKS {weight: 0.05}]->(a), (links)-[:LINKS {weight: 0.05}]->(b), (links)-[:LINKS {weight: 0.05}]->(c), (links)-[:LINKS {weight: 0.05}]->(d);
1:属性图如下
在这里插入图片描述
2:给这个属性图起一个名字
CALL gds.graph.create( 'myGraph', 'Page', 'LINKS', { relationshipProperties: 'weight' })
3:可以评估一下这个属性图所占内存
CALL gds.pageRank.write.estimate('myGraph', { writeProperty: 'pageRank', maxIterations: 20, dampingFactor: 0.85})YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
4:现在可以调用GDS中的PageRank算法
CALL gds.pageRank.stream('myGraph')YIELD nodeId, scoreRETURN gds.util.asNode(nodeId).name AS name, scoreORDER BY score DESC, name ASC
5:还有stats、mutate和write执行模式,此处不在赘述 6:下面执行带权重的算法计算
CALL gds.pageRank.stream('myGraph', { maxIterations: 20, dampingFactor: 0.85, relationshipWeightProperty: 'weight'})YIELD nodeId, scoreRETURN gds.util.asNode(nodeId).name AS name, scoreORDER BY score DESC, name ASC
7:这里约束条件tolerance设为0.1,低于此值,算法结束,结果返回
CALL gds.pageRank.stream('myGraph', { maxIterations: 20, dampingFactor: 0.85, tolerance: 0.1})YIELD nodeId, scoreRETURN gds.util.asNode(nodeId).name AS name, scoreORDER BY score DESC, name ASC
8:阻尼系数dampingFactor:也就是概率值改变会引起中心度值不同,这里变为0.05,看起来和默认的0.85差不多,源于图节点比较少,还不足以引起质变
CALL gds.pageRank.stream('myGraph', { maxIterations: 20, dampingFactor: 0.05})YIELD nodeId, scoreRETURN gds.util.asNode(nodeId).name AS name, scoreORDER BY score DESC, name ASC
9:个性化PageRank算法,其实就是制定一个特定的起点,重点关注这个指定点的中心度
MATCH (siteA:Page {name: 'Site A'})CALL gds.pageRank.stream('myGraph', { maxIterations: 20, dampingFactor: 0.85, sourceNodes: [siteA]})YIELD nodeId, scoreRETURN gds.util.asNode(nodeId).name AS name, scoreORDER BY score DESC, name ASC
文章转载自Neo4j图数据库,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




