Content Representation With A Twist

Thursday, June 21, 2007

Tagging The Links Might Reveal (The) Content Of Linked Web Pages

The 'link payload' is a set of tags assigned to a link. Users tag themselves with these tags when they click the link, thus generating a cloud of tags that describes the possible interests of the individual clicking user. Such a payload could be applied by a company that hosts its own site on its own server, or at least on a server it can set up the way it prefers.

In my previous example of such a link payload, I focused on a news company that provides a news overview consisting of nothing but the headlines of the news articles, each linking to its article. That forces users who want to read an article to click the link, thus tagging themselves with the tags assigned to that link. If the company provided the full text by RSS feed, it would never learn the tag cloud the users would generate and reveal about themselves by clicking several such links.
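To make the self-tagging a bit more tangible, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the `LINK_TAGS` table, the link paths, the visitor ID and the `record_click` helper are hypothetical names, not part of any existing system.

```python
from collections import Counter, defaultdict

# Hypothetical link payloads: link path -> tags assigned to that link.
LINK_TAGS = {
    "/news/1": ["politics", "election"],
    "/news/2": ["sports", "football"],
    "/news/3": ["politics", "economy"],
}

# One tag cloud (tag -> number of clicks carrying that tag) per user.
user_clouds = defaultdict(Counter)

def record_click(user_id, link):
    """Add the clicked link's tags to the user's tag cloud and return the cloud."""
    user_clouds[user_id].update(LINK_TAGS.get(link, []))
    return user_clouds[user_id]

record_click("visitor-42", "/news/1")
cloud = record_click("visitor-42", "/news/3")
print(cloud.most_common(1))  # [('politics', 2)]
```

After two clicks the visitor's cloud already leans towards "politics", which is exactly the information a full-text RSS feed would never deliver.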

Learning the interests of a current visitor in real time might make it possible to pick better-fitting ads to present.
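How such a pick could work is open; one simple possibility is to score each ad by how strongly its own tags show up in the visitor's tag cloud. The sketch below assumes exactly that. The `pick_ad` function, the ad names and the sample cloud are made up for illustration.

```python
from collections import Counter

def pick_ad(user_cloud, ads):
    """Return the name of the ad whose tags weigh most in the user's tag cloud.

    user_cloud: Counter mapping tag -> click count.
    ads: dict mapping ad name -> set of tags (a hypothetical ad inventory).
    """
    def score(ad_name):
        # A Counter returns 0 for tags the user never clicked.
        return sum(user_cloud[tag] for tag in ads[ad_name])
    return max(ads, key=score)

visitor_cloud = Counter({"politics": 2, "economy": 1, "sports": 1})
ads = {
    "newspaper-subscription": {"politics", "economy"},
    "football-tickets": {"sports", "football"},
}
chosen = pick_ad(visitor_cloud, ads)
print(chosen)  # newspaper-subscription
```

A plain sum of click counts is the crudest scoring imaginable; it is only meant to show that the tag cloud is enough input to make a choice at all.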

Apart from that immediate advantage of tag-based user tracking on a single site, what about the web as a whole? Apart from user tracking, a (tagged) headline link to a news article page reveals another particle of content, even without any chance to track the user: the link's tags tag the linked page too. If multiple links point to the same page, a cloud of tags accumulates for that page. In other words, the page gets content in addition to the text written on it: the tag cloud. Since that content is not present on the page itself, I'd call it content assigned to that page rather than content actually there. For short, maybe something like "content [assigned] to a page" instead of the familiar "content of/on a page".
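A small sketch of how such assigned content could accumulate, assuming the tagged links have already been collected somehow. The link triples, the page names and the `assigned_content` function are hypothetical and serve only as illustration.

```python
from collections import Counter, defaultdict

# Hypothetical collected links: (source page, target page, tags on the link).
links = [
    ("siteA/overview", "news.example/article-7", ["election", "politics"]),
    ("siteB/blog", "news.example/article-7", ["politics", "scandal"]),
    ("siteC/digest", "news.example/article-9", ["football"]),
]

def assigned_content(links):
    """Accumulate, per target page, the tags of all links pointing to it."""
    clouds = defaultdict(Counter)
    for _source, target, tags in links:
        clouds[target].update(tags)
    return clouds

page_clouds = assigned_content(links)
print(page_clouds["news.example/article-7"])
```

Note that the text of the target pages is never read here: the tag cloud of a page comes from the links pointing to it alone.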
 

One question is whether content can be mined from the web directly, without processing the text presented on the web, in a way similar to determining tags for links, for users, and for web pages.

One goal of processing the tag cloud assigned to a web page (or any other item, of course) might be to gather a MOM net, a condensed form of content: a multi-level directed graph in which each node stores distinct content. I see that it might be helpful to go into more depth on this, explaining what a MOM net actually is, what it stores, and how it does so.

I'll keep that in the back of my head for another post to come.

      
Updates:
none so far
