The ephemeral nature of web and social media content, especially blogs (with their comments, online discussions, etc.), leaves it at substantial risk of being lost. Memory institutions (libraries, museums, archives) and other organisations are researching ways to ensure the long-term preservation and reuse of web content.
Webternity offers an answer: we have developed an exciting system to harvest, preserve, manage and reuse web content. The system performs intelligent harvesting, retrieving and parsing hypertext along with all associated content (images, linked files, etc.) from websites. The parser renders the captured content as structured data, expressed in XML, in accordance with our data model.
As a result, semantic entities are carved out of web content at an unprecedented micro-level. Author names, comments, subjects, tags, categories, dates, links and many other elements are expressed within a hierarchical structure. This content is imported into the Webternity repository (based on CERN’s Invenio platform), a public-facing web archiving system that provides facilities to preserve, view, interrogate and reuse the content in fine detail.
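To illustrate what "a hierarchical structure" of semantic entities can look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`post`, `author`, `tags`, `comments`, etc.), the `post_to_xml` function and the sample data are all hypothetical; this is not Webternity's actual data model, only an assumed example of a parsed blog post serialised as nested XML.

```python
import xml.etree.ElementTree as ET


def post_to_xml(post: dict) -> str:
    """Serialise a parsed blog post as hierarchical XML (hypothetical schema)."""
    root = ET.Element("post", url=post["url"])
    ET.SubElement(root, "title").text = post["title"]
    ET.SubElement(root, "author").text = post["author"]
    ET.SubElement(root, "date").text = post["date"]

    # Each tag becomes its own child element under <tags>.
    tags = ET.SubElement(root, "tags")
    for tag in post["tags"]:
        ET.SubElement(tags, "tag").text = tag

    # Comments are nested entities with their own author, date and body.
    comments = ET.SubElement(root, "comments")
    for c in post["comments"]:
        comment = ET.SubElement(comments, "comment")
        ET.SubElement(comment, "author").text = c["author"]
        ET.SubElement(comment, "date").text = c["date"]
        ET.SubElement(comment, "body").text = c["body"]

    return ET.tostring(root, encoding="unicode")


# Hypothetical sample post, as a harvester might extract it from a blog page.
example = {
    "url": "https://example.org/2013/05/hello",
    "title": "Hello, archive",
    "author": "Alice",
    "date": "2013-05-01",
    "tags": ["archiving", "blogs"],
    "comments": [
        {"author": "Bob", "date": "2013-05-02", "body": "Great post!"},
    ],
}

if __name__ == "__main__":
    print(post_to_xml(example))
```

Nesting comments inside the post (rather than flattening them) keeps each comment's author and date attached to it, which is what makes queries at the level of individual entities possible.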
Our services include:
- Web archiving support and consultancy: We help our clients create their own web and social media archiving centres by installing, customising and maintaining the Webternity platform.
- Cloud-based web and social media archiving: Our clients can create and manage their own web archiving repositories through our online platform.
- On-demand web content analytics: We help our clients draw valuable insights from their repositories on the topics that interest them.
Visit http://webternity.eu to learn more about our web archiving solution.