Is it link payload? Or is it something like content, or a set of features that link-clicking web users reveal about themselves?
With a tool within reach that might mine immediately processable content from the web, namely the reorganizer module of MOM, I keep wondering how to actually mine the web.
Just this minute, I am skimming a news web site that, on its overview page, provides only the headlines of the articles. Not the least preview, not even a snippet of text hinting at what the linked article might deal with, or where it might go into depth. So a human can say: if you click on that link, you are probably interested in the topic spotlighted by the headline. Or, since I know how crudely headlines are sometimes put together, there's a chance you clicked only to get an idea of what the heck the article might deal with. There's also the chance you'd click a link accidentally, but let's skip that possibility for now.
What I noticed a minute ago, while skimming that list of headlines, was that reducing the headline's words to their base forms (e.g. by stemming), ideally keeping just the nouns, might suffice to tag the links. Assuming people only click links they are interested in, the mirror image holds too: any link clicked reveals the topics the user is interested in -- the tags peel off the link and adhere to the person who clicked it. In other words: by clicking the link, the users tag themselves. -- Track what the user clicks over time, and you'd get not only a cloud of tags which you can link to a user; by actually linking them to the user and applying reorganization, it becomes simple to learn the user's interests. Add counting -- no, not of the links, as you might do for plain web site statistics, but of the tags the users tag themselves with -- and you might get a rather specific profile of the user. -- Cover a broad cloud of topics, and thus a broad cloud of tags, and your users' profiles would become even sharper.
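To make the idea a bit more tangible, here is a minimal Python sketch of "users tagging themselves by clicking". It is not MOM's reorganizer, just an illustration under loud assumptions: the names `headline_tags` and `ClickProfiler` are made up for this example, the stop-word list is a toy, and the plural stripping is only a crude stand-in for real stemming or noun extraction.

```python
import re
from collections import Counter, defaultdict

# Toy stop-word list (assumption); a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for", "is", "are", "with"}


def headline_tags(headline):
    """Turn a headline into a set of crude tags."""
    tags = set()
    for word in re.findall(r"[a-z]+", headline.lower()):
        if word in STOP_WORDS:
            continue
        # Naive plural stripping, standing in for a real stemmer.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        tags.add(word)
    return tags


class ClickProfiler:
    """Counts, per user, the tags of every headline that user clicked."""

    def __init__(self):
        self.profiles = defaultdict(Counter)

    def record_click(self, user_id, headline):
        # The tags "peel off" the link and stick to the user.
        self.profiles[user_id].update(headline_tags(headline))

    def top_interests(self, user_id, n=3):
        # Most frequent tags first; unknown users yield an empty list.
        return self.profiles.get(user_id, Counter()).most_common(n)


profiler = ClickProfiler()
profiler.record_click("alice", "Robots learn to cook")
profiler.record_click("alice", "New robot cooks dinner")
print(profiler.top_interests("alice", 2))
```

After these two clicks, "robot" and "cook" have each been counted twice for the hypothetical user "alice", while every other tag was counted once. Counting tags rather than links is the whole point: two different links contributed to the same interest.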
And, in the back of my head, there's still Google's advertising system. If each page Google puts ads on has to be 'enriched' with a handful of tags, then by visiting that page the users tag themselves with those tags. If Google manages to assign that set of tags to you individually, Google might get quite a good impression of your interests.
Updates: none so far
Content Representation With A Twist