On the importance of filtering information

In this second post in my series on information management, I explore filtering as an important part of making knowledge flows valuable for individuals and enterprises (including cross-enterprise communities). It follows my previous post, which reviewed articles on the value of knowledge flows for individuals and the enterprise, including their potential “dark side.” I will also discuss how properly designed and properly used filters can counter that dark side.

Old problem, new scale, filter design and forms of filtering

In a 2008 interview, Clay Shirky contended that “there is no such thing as information overload, there’s only filter failure.” He makes the valid point that every innovation in media distribution has created an “information overload” problem, starting as early as the Library of Alexandria, established circa 250 BC. The problem continued with the emergence of printing in Venice at the end of the 15th century, and has only accelerated in our day with digital technology and the internet. Shirky doesn’t deny the new scale of the information firehose, and actually sees at-scale filtering as a potential solution for tackling it (“the only group that can catalog everything is everybody”). Ultimately, Shirky believes that designing better filters is how you tackle the growing firehose.

The Great Library of Alexandria by O. Von Corven - Tolzmann, Don Heinrich, Alfred Hessel and Reuben Peiss

JP Rangaswami, on his wonderfully titled blog “Confused of Calcutta,” committed in 2013 to writing about filtering and produced six must-read posts on the subject. The blog is no longer online, but I was compelled to save some of this valuable work (and I’m glad I did). In the first post of the series, “Filtering: Seven Principles,” Rangaswami outlines what he calls “starter set” principles:

1. Filters should be built such that they are selectable by subscriber, not publisher.

2. Filters should intrinsically be dynamic, not static.

3. Filters should have inbuilt “serendipity” functionality.

4. Filters should be interchangeable, exchangeable, even tradeable.

5. The principal filters should be by choosing a variable and a value (or range of values) to include or exclude.

6. Secondary filters should then be about routing.*

7. Network-based filters, “collaborative filtering,” should then complete the set.

However quickly digital media is moving, these seven principles are as valuable today as they were then, and I predict that they will stand for a good while longer as all true principles should. 
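
To make Principles 5 and 6 a little more concrete, here is a minimal Python sketch of my own. It is not taken from Rangaswami’s posts: the names `FieldFilter`, `apply_filters` and `route`, the sample articles, and the destination labels are all hypothetical. It assumes each item is a simple dictionary and that the subscriber, not the publisher, supplies the list of filters (Principle 1).

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class FieldFilter:
    """Hypothetical Principle 5 filter: a variable plus values to include or exclude."""
    variable: str
    values: FrozenSet[Any]
    include: bool = True

    def accepts(self, item: Dict[str, Any]) -> bool:
        matches = item.get(self.variable) in self.values
        return matches if self.include else not matches


def apply_filters(items: Iterable[Dict[str, Any]],
                  filters: List[FieldFilter]) -> List[Dict[str, Any]]:
    """Keep only the items accepted by every filter the subscriber selected."""
    return [item for item in items if all(f.accepts(item) for f in filters)]


def route(item: Dict[str, Any],
          rules: List[Tuple[str, FieldFilter]],
          default: str = "reading-list") -> str:
    """Hypothetical Principle 6 step: return the first destination whose rule matches."""
    for destination, rule in rules:
        if rule.accepts(item):
            return destination
    return default


# Made-up sample data, for illustration only.
ARTICLES = [
    {"title": "Filter failure", "topic": "filtering", "source": "blog"},
    {"title": "Celebrity gossip", "topic": "entertainment", "source": "tabloid"},
    {"title": "Collaborative filtering 101", "topic": "filtering", "source": "journal"},
]

# The subscriber chooses these; nothing is imposed upstream.
subscriber_filters = [
    FieldFilter("topic", frozenset({"filtering"})),
    FieldFilter("source", frozenset({"tabloid"}), include=False),
]
routing_rules = [("research-board", FieldFilter("source", frozenset({"journal"})))]

kept = apply_filters(ARTICLES, subscriber_filters)
destinations = {article["title"]: route(article, routing_rules) for article in kept}
```

Because each filter is a small, self-describing value, it could in principle be swapped or shared between subscribers, which is the spirit of Principle 4. This sketch does not attempt the network-based filtering of Principle 7.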

Curation vs filtering information: working with the dark side of algorithmic filtering

One of the questions in the comment section of JP Rangaswami’s series asks about the difference between machine filtering and human filtering. He answered with one of his mantras: “machines filter and humans curate.” Tim Kastelle’s blog looking at Five forms of filtering makes a similar distinction. According to Kastelle, “the five forms of filtering break into two categories: judgment-based or mechanical.”  The three judgment-based ones are naive filtering, expert filtering and network filtering, all of which are human-centric. The two mechanical ones are algorithmic and heuristic, which are both machine-centric. Kastelle acknowledges that as the firehose becomes larger, humans (even with the aid of human networks and expert networks) will be unable to cope without resorting to the algorithm and the machine. 

I think it’s really important at this stage to fully understand and emphasize Rangaswami’s first principle, which is a principle of human choice: machine filters should be human-selected as part of curation, and they should not be applied without the user’s knowledge. It’s all the more important to point this out given that, over the past several years, a number of researchers and experts have warned that algorithmic filtering leads to the formation of epistemic bubbles and echo chambers (a recent article by C. Thi Nguyen discussing the difference between the two is worth a read).

This alarming media narrative is combined with an equally alarming assessment of the unemployment outlook in the face of rapid progress in artificial intelligence and automation. The combination creates a perfect storm for the return of the Luddites** (Shirky was rightly already critical of the Luddite mentality back in 2008) and may feed an apprehension about AI that gets in the way of engaging with it objectively. Such narratives fail to consider the value employees bring to the filtering process by applying judgment, sense-making and adaptation to the context and audience, as well as how they want to augment their own capability with automation in this activity (more on that later).

I became aware of algorithmic filtering with Eli Pariser’s 2011 book The Filter Bubble (he had coined the term a year or so earlier). Pariser had started to notice that Google search results were personalized to his taste, as was his Facebook newsfeed. He rightly worried about the impact this personalization would have on one’s exposure to differing viewpoints and whether the lack of diverse insights would isolate individuals and groups in intellectual silos, creating dire consequences not only within enterprises but in terms of broader civic discourse in society as well.   

The extent and potential impact of the civic discourse problem was explored by Zeynep Tufekci in the August 2014 blog post  What happens to #Ferguson affects Ferguson. Ferguson is a town in Missouri, and #Ferguson was a hashtag used widely on social media referring to an incident in the town where an unarmed black man named Michael Brown was shot by a police officer.  According to some witnesses at the time, Michael Brown died with his hands in the air shouting “don’t shoot,” and this early narrative led to a firestorm of protests and controversy (there has since been video evidence that contradicts this witness account, but it was not made known to the wider public until much later). Tufekci became aware of the fact that during the tumultuous early days when #Ferguson was trending on Twitter, she had seen no mention of it in her personal Facebook newsfeed. This seemed significant given that huge swaths of Americans get their news from social media. According to Pew Research, 36% of Americans get their news from Facebook, 23% from YouTube, 15% from Twitter, and 11% from Instagram, and all of these social media platforms use algorithms to determine what content to present to each individual user. Tufekci’s point—that algorithms have consequences—was perfectly illustrated by the way the high-profile and much talked about Ferguson event was shown to some people and virtually hidden from others by platforms that people rely upon for their news.  She concluded with the following: 

“But keep in mind, Ferguson is also a net neutrality issue. It’s also an algorithmic filtering issue. How the internet is run, governed and filtered is a human rights issue.”

Both Pariser and Tufekci convincingly showed the “dark side” of algorithmic filtering. However, it is equally important to highlight that the publisher-controlled algorithmic filtering they were warning about broke two of Rangaswami’s filtering principles: Facebook was applying publisher-level filters (breach of Principle 1) and stifling serendipity (breach of Principle 3). In my mind, the question is more about human agency over computational filtering, particularly as the firehose grows beyond what individuals or expert networks can handle and as AI becomes more recursive (and better) with its latest advances in unsupervised learning.
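
What respecting Principles 1 and 3 might look like can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a description of how any platform works: the function `filter_with_serendipity`, its `serendipity_rate` parameter and the sample posts are hypothetical. The subscriber passes in the filter, and a small, subscriber-set share of items that fail the filter is still let through at random.

```python
import random
from typing import Any, Callable, Dict, Iterable, List, Optional

Post = Dict[str, Any]


def filter_with_serendipity(items: Iterable[Post],
                            subscriber_filter: Callable[[Post], bool],
                            serendipity_rate: float = 0.1,
                            seed: Optional[int] = None) -> List[Post]:
    """Keep items the subscriber asked for, plus a random share of the rest.

    serendipity_rate is the probability (0.0 to 1.0) that an item rejected
    by the subscriber's own filter is shown anyway.
    """
    rng = random.Random(seed)  # a seed makes a run repeatable
    feed = []
    for item in items:
        if subscriber_filter(item):
            feed.append(item)            # Principle 1: the subscriber's choice
        elif rng.random() < serendipity_rate:
            feed.append(item)            # Principle 3: built-in serendipity
    return feed


# Made-up sample data, for illustration only.
POSTS = [
    {"title": "Designing better filters", "topic": "filtering"},
    {"title": "Local election results", "topic": "politics"},
    {"title": "Echo chambers explained", "topic": "filtering"},
    {"title": "Breaking news from another city", "topic": "news"},
]


def wants_filtering(post: Post) -> bool:
    return post["topic"] == "filtering"


strict_feed = filter_with_serendipity(POSTS, wants_filtering, serendipity_rate=0.0)
open_feed = filter_with_serendipity(POSTS, wants_filtering, serendipity_rate=1.0)
```

With the rate set to zero the feed contains only what the filter matches, which is the sealed bubble; any rate above zero gives off-topic items a chance to surface. The important design point is that both the filter and the rate are arguments controlled by the reader, not values fixed by the publisher.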

Choosing your computational filters (and having a diverse enough choice of them) is a skill that employees will have to learn and practice in order to improve their filtering. This requires judgment and work, but it’s necessary for benefiting from advances in computational filtering and staying relevant in your work.

Why filter information?

As eloquently explained in the third episode of the Copyblogger podcast series dedicated to curation, one reason is that the ability to curate ideas from multiple sources is what separates successful professionals from everyone else. What differentiates writers also applies more broadly: Jon Reed of Diginomica contends that the enterprise professional needs curation to establish topic authority while simultaneously bringing fresh ideas and content to the company. Curation is, at a basic level, essential to learning and core to solving complex problems. In that sense it is not surprising that academia was amongst the first to give itself the infrastructure and tools to filter its knowledge base and to understand the value of filtering for the benefit of the overall community.

The second macro reason for filtering is, again, the critical need to deal with that ever-expanding information firehose: as more humans and devices get connected online, more publishing happens, requiring more filtering as part of processing.  

In equal measure, as the rate of technological change accelerates, enterprises are becoming increasingly aware that the ability of their employees to aggregate and filter information (both internal and external) is key to driving innovation and avoiding the fate of former market leaders like Kodak and Blackberry. In his book Outside Insight: Navigating a World Drowning in Data, Jorn Lyseggen recounts the fall of Blackberry. In the first quarter of 2009, Blackberry controlled 55% of the professional US market and 20% globally. With the arrival of touch screen smartphones, Blackberry’s market lead was threatened, but they missed all the signs, and as a result, by the end of 2013, their global market share of the business segment had plummeted to 0.6%. 

In summary, in a world where more devices and people get connected and the cost of publishing content and notifications continues to decrease, filtering becomes a greater necessity for gaining relevant insights. This truth applies whether you’re an individual, a community, or an enterprise. The latter two will need a context-filtered information base from which to perform collaborative filtering and sense-making. Establishing a filtering set that avoids a garbage-in garbage-out impasse is absolutely critical. 

Agency and sensemaking

In an age of even more powerful algorithmic filtering, the importance and centrality of individual judgment must be emphasized in order to avoid the trap of epistemic bubbles and echo chambers. In a recent Deloitte Insights report, John Hagel et al. encourage enterprises to foster their employees’ “enduring human capabilities” to create the new value that the market demands. The same principle applies to filtering information, with humans firmly managing computational agency by designing the optimal interface with relevant filtering algorithms. This constitutes the beginning of sensemaking, which I will cover in my next post.

Learn more about Cronycle’s ability to discover, filter, make sense of and share information.

Footnotes:

*Routing is a particularly important topic in fostering adoption in enterprises and solving the “in the flow” part of “learning in the flow of work.” I will return to it in a later post.

**The term Luddite is now used to refer to someone who is opposed to new technologies or new ways of working. It originates from a secret oath-based organization of textile workers in England whose radical faction destroyed machinery in the 19th century in protest against industrialization of their craftsmanship. 
