The disappearing web: Information decay is eating away our history

GigaOM
Mathew Ingram

The ability to distribute real-time information through social networks like Twitter is a powerful thing, but a new study points out one downside of this phenomenon: much of the content that gets linked to eventually disappears.

One of the characteristics of the modern media age, at least for anyone who uses the web and social media a lot, is that we are surrounded by vast clouds of rapidly changing information, whether it’s blog posts or news stories or Twitter and Facebook updates. That’s great if you like real-time content, but there is a not-so-hidden flaw: you can’t step into the same stream twice, as Heraclitus put it. In other words, much of that information may (and probably will) disappear as new information replaces it, and small pieces of history wind up getting lost. According to a recent study, which looked at links shared through Twitter about news events like the Arab Spring revolutions in the Middle East, this could be turning into a substantial problem.
 
The study, which MIT’s Technology Review highlighted in a recent post by the Physics arXiv blog, was done by a pair of researchers in Virginia, Hany SalahEldeen and Michael Nelson. They took a number of major news events from the past three years (including the Egyptian revolution, Michael Jackson’s death, the elections and related protests in Iran, and the outbreak of the H1N1 virus) and tracked the links that were shared on Twitter about each. Following the links to their ultimate source showed that an alarming number of them had simply vanished.
 
After two and a half years, 30 percent had disappeared
 
In fact, the researchers said that within a year of these events, an average of 11 percent of the material that was linked to had disappeared completely (and another 20 percent had been archived), and after two-and-a-half years, close to 30 percent had been lost altogether and 41 percent had been archived. Based on this rate of information decay, the authors predicted that more than 10 percent of the information about a major news event will likely be gone within a year, and the remainder will continue to vanish at the rate of 0.02 percent per day.
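
To make the arithmetic of that prediction concrete, here is a minimal sketch that reads it as a straight-line estimate. It is not the researchers' own model: the function name, the 11 percent first-year figure (taken from the measured average quoted above), the even ramp during the first year, and the reading of "0.02 percent per day" as 0.02 percentage points of the original links are all assumptions made for illustration.

```python
def estimated_fraction_lost(days_since_event: float,
                            first_year_loss: float = 0.11,
                            daily_loss_after: float = 0.0002) -> float:
    """Rough share of linked material lost `days_since_event` days after an event.

    Assumptions (not from the study itself):
    - losses build up evenly during the first 365 days to `first_year_loss`;
    - after that, a further 0.02 percentage points of the original
      links disappear each day (`daily_loss_after`).
    """
    if days_since_event < 0:
        raise ValueError("days_since_event must be non-negative")
    if days_since_event <= 365:
        # Assumed even ramp up to the first-year loss figure.
        return first_year_loss * days_since_event / 365
    lost = first_year_loss + daily_loss_after * (days_since_event - 365)
    # A share of links cannot exceed 100 percent.
    return min(lost, 1.0)


if __name__ == "__main__":
    for years in (1, 2, 2.5):
        share = estimated_fraction_lost(years * 365)
        print(f"{years} year(s): about {share:.1%} of links lost")
```

On this straight-line reading, the estimate for two and a half years comes out at roughly 22 percent, which is lower than the close-to-30-percent figure the researchers measured, so treat it as a rough illustration of the quoted rate rather than a reproduction of their results.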

It’s not clear from the research why the missing information disappeared, but it’s likely that in many cases blogs have simply shut down or moved, or news stories have been archived by providers who charge for access (something that many newspapers and other media outlets do to generate revenue). But as the Technology Review post points out, this kind of information can be extremely valuable in tracking how historical events developed, such as the Arab Spring revolutions. The researchers note that those events were the original impetus for their study, since they were trying to collect as much data as possible for the one-year anniversary of the uprisings.
 
Other scientists, and particularly librarians, have also raised red flags in the past about the rate at which digital data is disappearing. The National Library of Scotland, for example, recently warned that key elements of Scottish digital life were vanishing into a “black hole,” and asked the government to fast-track legislation that would allow libraries to store copies of websites. Web pioneer Brewster Kahle is probably the best known digital archivist as a result of his Internet Archive project, which keeps copies of websites dating back to the early days of the web (Kahle also has a related project called the Open Library).
 
Getting access to social data is not easy
 
Although the Virginia researchers didn’t deal with it as part of their study, a related problem is that much of the content that gets distributed through Twitter (not just websites that are linked to in Twitter posts, but the content of the posts themselves) is difficult and/or expensive to get to. Twitter’s search is notoriously unreliable for anything older than about a week, and access to the complete archive of your tweets is only provided to those who can make a special case for needing it, such as Andy Carvin of National Public Radio (who is writing a book about the way he chronicled the Arab Spring revolutions).
 
As my colleague Eliza Kern noted in a recent post, an external service called Gnip now has access to the full archive of Twitter content, which it will provide to companies for a fee. And Twitter-based search and discovery engine Topsy also has an archive of most of the full “firehose” of tweets (although it focuses primarily on content that is retweeted a lot) and provides that to companies for analytical purposes. But neither can be linked to easily for research or historical archiving purposes. The Library of Congress also has an archive of Twitter’s content, but it isn’t easily accessible and it’s not clear whether new content is being added.
 
Twitter has talked about providing a service that would let users download their tweets at some point, but it hasn’t said when such a thing would be available. Even if users did create their own archive in this way (or by using tools like ThinkUp from former Lifehacker editor Gina Trapani), it would be difficult to link those archives in a way that would provide the kind of connected historical information the Virginia study is describing. And it’s not just Twitter: there is no easy way to get access to an archive of Facebook posts either, although users in Europe can request access to their own archive as a result of a legal ruling there.
 
For better or worse, much of the content flowing around us seems to be just as insubstantial as the clouds that it is hosted in, and the existing tools we have for trying to capture and make sense of it simply aren’t up to the task. The long-term social effects of this digital amnesia remain to be seen.
--------------------------------------------------------------

 

Thanks to mediabistro.com
