Cronycle Topics & Influencer Communities

Cronycle is an information workflow application powered by Right Relevance (a subsidiary of Cronycle), a topical information search and relevance platform. Topics and influencers (per topic) form the backbone of the search and relevance technology:

  1. Topics (over 50 thousand), including metadata like related topics and semantics like synonyms and acronyms.
  2. Topical influencers (over 2.5M), each with a score and rank.

Topics are identified by algorithmically mining over 10M unstructured documents on the web and leveraging Wikipedia and Right Relevance topical graph neighborhood techniques. Relationships and semantics are derived from this process with manual corrections and injections for the last mile.

Topical influencer mining is fully algorithmic and primarily graph-based. The methodology leverages ML, semantic analysis and NLP on unstructured data at scale and involves a two-level proprietary people rank (a custom PageRank for social graphs):

Stage 1. A global PR (people rank) reduces a graph of ~300M nodes to ~6M (for now) globally ranked influencers. This is a first-level reduction with no topical context, and we don’t expose its scores.

Stage 2. The ~6M connected nodes from stage 1 are partitioned across our ~50K structured topics, using the unstructured data assigned to each node. This yields ~50K per-topic sub-graphs, and a secondary PR is applied within each one to determine the topic score of every node in it. This secondary PR score is normalized to calculate the Right Relevance topic score and to rank influencers for every structured topic in our platform.

Our custom PR algorithm is derived from Google PageRank but is specialized for social graphs (instead of links/webpages), with many important differences applicable to social networks.
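To make the two stages concrete, here is a minimal sketch in Python. It uses plain textbook PageRank by power iteration rather than the proprietary people rank, and everything in it is an assumption made for illustration: the function names, the toy follower graph, the `keep` cut-off, and the choice to normalize each topic so that its top node scores 100.

```python
from collections import defaultdict


def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; edges are (follower, followed)."""
    nodes = sorted(nodes)
    if not nodes:
        return {}
    n = len(nodes)
    members = set(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        # Ignore edges that leave the (sub-)graph being ranked.
        if src in members and dst in members:
            out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def two_stage_rank(edges, node_topics, keep):
    # Stage 1: rank the whole graph and keep only the top `keep` nodes.
    everyone = {v for edge in edges for v in edge}
    global_rank = pagerank(edges, everyone)
    survivors = sorted(global_rank, key=lambda v: (-global_rank[v], v))[:keep]

    # Stage 2: partition survivors by topic, then rank inside each sub-graph.
    topic_members = defaultdict(set)
    for node in survivors:
        for topic in node_topics.get(node, ()):
            topic_members[topic].add(node)
    scores = {}
    for topic, group in topic_members.items():
        local = pagerank(edges, group)
        top = max(local.values())
        # Assumed normalization: the top node in each topic scores 100.
        scores[topic] = {v: round(100 * r / top) for v, r in local.items()}
    return scores


# Hypothetical toy graph: each pair means "first account follows second".
edges = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
         ("bob", "cat"), ("eve", "dan")]
node_topics = {"bob": ["machine learning"],
               "cat": ["machine learning", "big data"],
               "dan": ["machine learning"]}
scores = two_stage_rank(edges, node_topics, keep=3)
```

On this toy graph `ann` and `eve` are dropped in stage 1, `bob` tops the ‘machine learning’ sub-graph, and `cat` carries a separate score in ‘big data’.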

The Right Relevance score of an expert/influencer for a topic represents that influencer’s authority within the topical community, for example ‘machine learning’. This per-topic measure of influence is termed ‘topical influence’, and the topical communities formed are termed ‘Tribes’.

Once we have the scored and ranked influencer community for a particular topic (e.g. machine learning, behavioral science, big data, emergency medicine, oil and gas, angularjs, social media marketing), we mine the web for content. The numeric influence from topics and influencers is inductively applied to this content to measure relevance, and it forms a critical part of the search. We download ~600K articles daily, drawn from ~2M websites every month. Topical content and information are available in the form of articles, videos and conversations.
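One simple way to picture influence being carried over to content is sketched below. The actual relevance formula is not described here, so the rule used (an article's relevance for a topic is the sum of the topic scores of the influencers who shared it) is purely an assumption, as are the function name, the account names and the example URLs.

```python
def article_relevance(shares, influencer_scores):
    """Rank articles by the summed topic scores of the accounts that shared them."""
    relevance = {}
    for influencer, url in shares:
        # Accounts outside the topical community contribute nothing.
        score = influencer_scores.get(influencer, 0)
        relevance[url] = relevance.get(url, 0) + score
    # Highest relevance first; ties are broken alphabetically by URL.
    return sorted(relevance.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical topic scores and (account, shared URL) pairs.
topic_scores = {"alice": 90, "bob": 40}
shares = [
    ("alice", "https://example.com/a"),
    ("bob", "https://example.com/b"),
    ("bob", "https://example.com/a"),
    ("carol", "https://example.com/c"),
]
ranked = article_relevance(shares, topic_scores)
```

Here the article shared by both scored influencers ranks first, and the one shared only by an account outside the community ranks last.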

Points to note:

  • We dampen noisy signals such as follower count and tweet count, and focus much more on the topical network itself.
  • Each influencer can be part of multiple topical sub-graphs (aka communities) and have a different score and rank within each. This is exposed in our apps via scored tags.
  • Other, non-structured topics work via free-form search, but the relevance may not be of the same quality. This shows up as a score of ‘10’, which, though probably a poor way to signal it, means we didn’t find a community for the topic.
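As an illustration of what dampening a noisy signal can look like, the sketch below compresses raw counts logarithmically. The choice of a base-10 logarithm and the function name are assumptions for the example; the dampening actually used is not specified here.

```python
import math


def dampen(count):
    """Compress a raw count logarithmically so large values grow slowly."""
    return math.log10(1 + count)


# Hypothetical follower counts roughly 1,000x apart.
small_account = dampen(999)
large_account = dampen(999_999)
```

With this kind of dampening, a roughly 1,000-fold gap in raw followers shrinks to about a 2-fold gap, leaving more room for the topical network to decide the ranking.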

Both topics and influencer graphs are mined and built algorithmically at scale with ever-increasing quality after every iteration.

2017 Insights Analysis – GDPR

After four years of preparation and debate about GDPR, the EU Parliament approved the regulation in April 2016 to replace an outdated data protection directive from 1995. Today, we have five months to go until the General Data Protection Regulation (GDPR) enforcement deadline in May 2018. After that, non-compliant organisations can face fines of up to €20 million or 4% of their global annual turnover, whichever is greater. In case you are ever in doubt about the time frame, there is a live countdown timer on the EU GDPR website to remind you.
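The "whichever is greater" rule is easy to misread, so here is the arithmetic as a short Python sketch. The function name and the example turnover figures are made up for illustration; the result is the ceiling on a fine, not a prediction of what a regulator would actually impose.

```python
def max_gdpr_fine(annual_turnover_eur):
    """Upper limit on a fine: EUR 20 million or 4% of turnover, whichever is greater."""
    four_percent = annual_turnover_eur * 4 / 100
    return max(20_000_000, four_percent)


# Hypothetical companies with EUR 100 million and EUR 1 billion global turnover.
smaller_cap = max_gdpr_fine(100_000_000)
larger_cap = max_gdpr_fine(1_000_000_000)
```

For the smaller company 4% of turnover is only €4 million, so the €20 million figure applies; for the larger one the 4% figure (€40 million) takes over.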

 

You may be wondering why the regulation was agreed in the first place. There are two key takeaways, as summarised by IT Pro:

  • The EU wants to give people more control over how their personal data is used, bearing in mind that many companies like Facebook and Google swap access to people’s data for use of their services. The current legislation was enacted before the internet and cloud technology created new ways of exploiting data, and the GDPR seeks to address that. By strengthening data protection legislation and introducing tougher enforcement measures, the EU hopes to improve trust in the emerging digital economy.
  • Secondly, the EU wants to give businesses a simpler, clearer legal environment in which to operate, making data protection law identical throughout the single market (the EU estimates this will save businesses a collective €2.3 billion a year).

 

Our newest collaboration between Cronycle and Right Relevance means we can produce insights reports on hot topics, analysing the conversations at any point in time. As GDPR is a key focus for us (and others), we started with it and launched our report this week, which you can view here.

Flock graph for GDPR Report 2017

Our report examines all online conversations from November 15th to December 4th, along with Right Relevance topics, topical communities and articles data. All that data allows us to plot impressive graphs of interactions, with clear communities forming along the lines of nationality and business type. The pale blue cluster, for example, centres on the French data commissioner, CNIL: the accounts orbiting it include French firms and governmental departments.

 

Our overall finding is that the discussion about GDPR is driven by fear of failing to become compliant, across all kinds of users. Just a glance at our groupings of top trending terms gives a flavour of the keywords, which focus on guides and webinars offering clear guidance on compliance. Discussion of the more positive side of GDPR, such as greater protection for user information or ethical innovation under the new regulations, appears to be less central at this time.

Using Right Relevance’s data, we can also produce a list of flocks: the accounts which have the most influence in our specific field during our specific period of research. Rather than measuring long-term power, flocks are a snapshot of the key players at a given moment. They included the British and French data commissioners (the ICO and CNIL), tech journalists, privacy experts like Max Schrems, and trade groups. Conspicuously missing from the table below? Members of Parliament from Britain or France, the countries from which most traffic on GDPR came.

What these flocks show is that it’s not just follower count which gives accounts importance: Laura Kayali (@LauKaya), a Brussels-based reporter, tops our list but has only 1,524 followers, compared to over 37,000 for the ICO (@ICOnews).

Our report also discusses important metrics which are often not covered elsewhere, such as betweenness centrality: how well does an account act as a node for the overall network? Whilst high page rank and betweenness centrality (being a connector here) can be interlinked, that’s not always the case: @LauKaya has a high page rank, but is not a key connector, for example.
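For readers who want to see what betweenness centrality measures, below is a minimal Python sketch of the textbook Brandes algorithm for unweighted, undirected graphs. It is not the code behind the report, and the graph (two tight clusters joined through a single account `x`) is a made-up example.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness centrality (Brandes) for an undirected graph."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in centrality.items()}


def undirected(pairs):
    """Build a symmetric adjacency map from a list of connections."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical network: two fully connected clusters bridged by account "x".
left = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d")]
right = [("e", "f"), ("e", "g"), ("e", "h"), ("f", "g"), ("f", "h"), ("g", "h")]
graph = undirected(left + right + [("d", "x"), ("x", "e")])
centrality = betweenness(graph)
```

In this toy network `x` has only two connections yet sits on every shortest path between the clusters, so it has the highest betweenness, while `a` has three connections inside its cluster and a betweenness of zero. That is the same kind of gap between being well connected and being a connector.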

 

Let us know if you have any thoughts or feedback, as we are looking to produce a report on the GDPR topic at least once a month to keep us all in the loop on the conversations.

 

View the full report