Content Representation With A Twist

Friday, June 29, 2007

New release of MOM SSC

New version is out!
  • Now with test cases in place for all classes of the framework,
  • heavily reworked HandledSet class and
  • documentation for HandledSet too -- with a peek at the principles of reorganization.
      
Updates:
none so far

Thursday, June 28, 2007

Why Do I Approach Developing MOM The Way Made Visible By MOM SSC?

The MOM Simple Set Core (MOM SSC) is the most recent implementation of MOM. MOM is a trinity of research, development and a project driving both of them ahead. At its core, MOM is the Model of Meaning plus research based on that model, aiming at representing every kind of content bare of words and tagging, based only on graphs and bare input sensors, such as 'light given', 'oxygen here', 'soft ground'. -- However, since 'there is a red light under that passenger seat, calmly blinking' is a somewhat more complex piece of content, and that content cannot yet be built up purely by graph, MOM currently accepts crutches -- labels or pieces of software that signal a certain event being given, e.g. 'web browser cannot render that page correctly'. As MOM improves, such crutches shall get replaced by the more flexible (and error-resistant) representation of content offered by MOM.

There are several promised benefits of that. Getting content available without words implies the chance to render content into any language of the world. Getting there without tagging implies the chance that the machine knows of the content represented -- instead of just dealing with it while remaining unaware of what it means. That in turn implies the chance to load content ("knowledge") into any sort of machine, such as traffic lights or vacuum cleaners or cars. Loading the knowledge might be quite a bit quicker than needing to train any sort of neural network AI. -- MOM is not after implementing any sort of artificial intelligence but heads for getting the content available. Call it a [content-addressable] memory.

The error-resistant representation of content mentioned before originates from another core part of MOM, the recognition. -- Yes, that's right. MOM found recognition to be a part of memory, not of any sort of intelligence. It's an automatic process which, however, might be supportable by training [link: "is it learning?"]: weighting the graph's edges. [It's clear to me that humans can improve their recognition, but I am not sure whether the causes of learning equal those of improving the recognition abilities of a MOM net, hence the differentiation.] The core of MOM's recognition, and the cause of its error resistance, is that while the MOM net defines every possible feature of an item, for recognition not every such feature must be given, only a few. -- Which, by the way, matches a claim recently posted by Chris Chatham: Only a few of the features of a known item suffice for a correct recognition of that item, because there are only the already known items out there: To discern all the items being similar, you don't need that many different features. But wait for the day you encounter an in fact new item! -- You'd get it wrong, in any case. Remember the days when you were familiar with dogs as the only kind of pet animal? Then, encountering the first pet cat, you likely named it 'dog', didn't you? The same goes for any kind of flip picture, like the one in which you can see either a beautiful young woman or a rather old one. -- To get back to Chatham: On the issue of change blindness he claimed "[...] the brain is 'offloading' its memory requirements to the environment in which it exists: why bother remembering the location of objects when a quick glance will suffice?"
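
To make that partial-feature idea a bit more concrete, here is a minimal sketch in Ruby -- not the MOM SSC recognizer, just an illustration; the items, features and edge weights are made up, and a real MOM net carries no such labels:

    # Minimal sketch of recognition by partial features over weighted edges.
    # Items, features and weights are invented for illustration only.
    ITEMS = {
      'pet dog' => { 'four feet' => 1.0, 'fur' => 1.0, 'barks' => 2.0, 'tail' => 0.5 },
      'pet cat' => { 'four feet' => 1.0, 'fur' => 1.0, 'purrs' => 2.0, 'tail' => 0.5 }
    }

    # Score every known item by summing the weights of the features actually
    # given -- only a few features need to be present, not all of them.
    def recognize(given_features)
      ITEMS.map do |item, features|
        [item, given_features.inject(0.0) { |sum, f| sum + (features[f] || 0.0) }]
      end.max_by { |_, score| score }
    end

    p recognize(['four feet', 'purrs'])   # => ["pet cat", 3.0]
    p recognize(['four feet', 'fur'])     # ambiguous cues: both items score equally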


Along with research, MOM is a project of development. I am used to programming, hence casting MOM into software is the clearest way to go. MOM, cast into software, allows for verifying the model. Also, over time, a full implementation of MOM might result, making all the chances MOM offers available in practice.

For example, the MOM Simple Set Core (MOM SSC) originally was only after implementing the MOM net, i.e. the functionality to maintain (parts of) a MOM net in computer memory (RAM). That is achieved now. Now, going further ahead, MOM SSC aims at implementing the reorganizer. That's the part of MOM which shrinks the graph while keeping the same content -- even revealing content which was only implicit beforehand.

Former versions of MOM parts were implemented using Perl. For reasons of readability, Ruby was chosen for MOM SSC. Since the theoretical work on the reorganizer it has been clear that the reorganizer modifies the MOM net, hence challenges the strengths of the recognizer. To be able to make the recognizer perform well even on reorganized MOM nets, I have now begun to implement the reorganizer. Having it in place, research on the recognition might go into depth. Especially since having a reorganizer in place implies being able to automatically test the quality of recognition: Recognition on the reorganized net should provide the same results as recognition performed on the original net. The fine part is, neither reorganization nor recognition needs any labels for the nodes (i.e.: no mark-up/tagging).
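
That automatic quality check boils down to an equivalence test. Here is a sketch of its shape in Ruby's Test::Unit -- the recognizer and the 'reorganizer' in it are trivial stand-ins I made up just to keep the example self-contained; the real MOM SSC classes and methods will differ:

    require 'test/unit'

    class ReorganizationEquivalenceTest < Test::Unit::TestCase
      # Toy net: item => list of features (a real MOM net is unlabelled).
      ORIGINAL = {
        'pet cat' => ['four feet', 'fur', 'purrs', 'tail'],
        'pet dog' => ['four feet', 'fur', 'barks', 'tail']
      }

      # Toy recognizer: pick the item whose features cover most of the cues.
      def recognize(net, cues)
        net.max_by { |_, features| (features & cues).size }.first
      end

      # Toy 'reorganizer': changes the representation (sorted feature lists)
      # without changing the content it represents.
      def reorganize(net)
        net.inject({}) { |out, (item, features)| out.merge(item => features.sort) }
      end

      def test_recognition_survives_reorganization
        cues = ['four feet', 'purrs']
        assert_equal recognize(ORIGINAL, cues), recognize(reorganize(ORIGINAL), cues)
      end
    end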


The upcoming milestone of the MOM SSC sub-project might be to implement the core of the reorganizer, accompanied by a full duck typing approach for the MOM SSC classes, and/or by addressing all the chances for improvement which accumulated over time since the beginnings of MOM SSC. -- The core of the reorganizer is to detect and replace sub-networks of the MOM graph that occupy (far) more nodes/edges than necessary to represent a piece of content. The replacement would reduce these sub-networks to just as many nodes/edges as are actually needed to represent the content.
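
As a toy illustration of that detect-and-replace idea -- not the actual MOM SSC algorithm, just the smallest case I can think of -- nodes that point to exactly the same set of target nodes are redundant for representing the content and can be merged into one:

    # Merge nodes whose outgoing edges point to the same targets.
    # edges: { node => [target, target, ...] }
    def reorganize(edges)
      merged = {}
      edges.group_by { |_, targets| targets.sort }.each do |targets, nodes|
        survivor = nodes.map { |node, _| node }.min   # keep one representative
        merged[survivor] = targets
      end
      merged
    end

    net = {
      'a' => ['x', 'y'],
      'b' => ['x', 'y'],   # same targets as 'a', hence redundant
      'c' => ['y', 'z']
    }
    p reorganize(net)      # => {"a"=>["x", "y"], "c"=>["y", "z"]}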

      
Updates:
none so far

Tuesday, June 26, 2007

Reorganizing Tags -- For What Benefit?

Having in sight to get over the core MOM reorganization obstacle and get reorganization implemented, as well as having noticed a possible benefit of having only//just//at least a reorganizer at hand (i.e. without any recognizer) [aside from the benefit of becoming able to develop a more sophisticated recognizer then], I began thinking about whether there might be a chance to make some profit by providing the MOM reorganizer as a web service.
 

Still unknowing of any profitable such web service, I ended up looking up 'tagging' in Wikipedia. Which might be worth a read, the same goes for the German variant of that very article [for those of you comfortable with that language].

      
Updates:
none so far

Monday, June 25, 2007

homework to do: learn the vocabulary of neuro(-bio-)logy, provide reliable..rock-solid definitions

The recent posting on familiarity, recognition, creation of new neurons, their offshoots, self, and the brain leaves me another set of things to do as homework:
  • Get my reliable definitions for the topics I am dealing with here online, publicly -- such as for neuron, axon, cell division, dendrite.
  • Learn the proper vocabulary for the items I don't know by name, such as rank growth, what somebody is aware of or somehow knows exists (if that's what is called 'knowledge', that is rather too undiscerning for my purposes), the nerve cell's core body, and others.
And all that although the latest piece of homework is not yet finished.

      
Updates:
none so far

questions regarding familiarity, recognition, creation of new neurons, their offshoots, self, and the brain

Saturday morning I awoke while I was scratching my head. I noticed the sound it made. I thought something like -- why does it make that sound it makes? That well-known sound. Then I started to wonder -- which finally made me wake up. Thinking is always a good setting [for me] to get ripped out of the sweetest (and the most horrible) dreams, and so it was this time.

So, fine, scratching my head makes a sound. A familiar sound. One I know really well. Do I? That sound is so familiar I most often don't even notice it. -- That was what I noticed next: Why didn't I notice it before? How many times may I have scratched my head up to now? And only now do I ask that question. Curious.

Might it be that as soon as we are familiar with a situation/thing we stop asking further questions on that matter? Might this be the cause of why children [apparently] ask about everything? Is their familiarity [with the world] so sparse that recognition cannot kick in? Or might it be that recognition itself yields results that are too vague [for the children]: i.e. it results in 1..many nodes which get stimulated to a similar degree, so automatic ("intuitive") recognizing, which results in a single most probable [represented] item being recognized, cannot take place? Therefore, the child has to find that single most probable item consciously, actively? They support recognition by asking grown-ups? And by that support they make distinct edges become weighted as more important? [I assume that equals <learning>. The body is able to move an eyelid or a leg by a nerve pulse -- why not move or even grow a neuron's dendrite or axon by basic will?]

If the child, by the approach of weighting single edges as more important, does not achieve the wanted result, maybe because, after a while, all the edges get weighted equally again [hence the confusion gets as strong as in the time before any weighting at all], what happens then? Does the child decide -- read: does the child decide, as well as: does the child decide -- to create one or more new neurons?

Or is this decision made by "the brain"? Or does it cause the creation of new nerve cells without any kind of decision-making, i.e. automatically? Or is it just any single nerve cell which initiates cell division? Or is it not even that single neuron which 'initiates' cell division, but it plainly begins to divide itself, caused by some external conditions, e.g. biological or chemical ones, which in turn might get caused because an obviously needed nerve cell is not in place? Might these biological or chemical conditions get caused because neighbouring cells feel some stress and secrete some hormones?

Or might the reason for new neurons to be created be some neurobiological condition, though? Maybe nerve cells divide when any of their offshoots -- axons, dendrites -- has grown a "too large" tree//braid//knob? And this rank growth then divides itself from the remainder of the very nerve cell?

Or might it be that at some point there's no place left on the main body of a neuron where other neurons can immediately dock, hence dendrites start to grow? Or the docking nerve cells begin to grow axons, since these might fit between all the other dockers? Or is it that the nerve cell gets divided when there's no place left on its core body?
 

PS: I doubt there is any bird's eye view instance which decides whether or not to set up any new edge or cell (node). In other words, I doubt "the brain" decides that..anything at all that takes place within the brain itself//brain body, i.e. I doubt there is any other instance in the brain but 'self' that makes any decisions regarding the brain itself.

      
Updates:
none so far

Sunday, June 24, 2007

Made the ancient postings accessible by tags

Tagged the early postings of this blog to make it provide more benefit, e.g. by enabling every reader to pick a topic by its tag -- and then get all the postings on that very topic I have posted here so far.

A positive side effect of that effort is that also for me, in future, it might become easier to set up backward references to earlier posted articles and already tackled subjects. Such as for the still under-construction posting on the weaknesses of traditional notion organizing systems MOM overcomes.


      
Updates:
none so far

Weaknesses of Traditional Notion Organizing Systems

Weaknesses of traditional notion organizing systems:
  • Traditional notion organizing systems aim at providing experts --
    1. people who already know the professional vocabulary and the items the vocabulary refers to
    2. people who are trained in using notion organizing systems
    -- with a tool to find matching terms.
  • The concept of notion that traditional notion organizing systems make use of implies the real item too. Hence, in reality, notion organizing systems aren't organizing notions only, but also the items those notions refer to. -- That might be a reason for the omnipresent preference of is a relationships over has a ones:
  • Traditional notion organizing systems prefer is a relationships over has a ones: A car is a motorized vehicle, but a motorized vehicle is clearly not a part of a car. Therefore, a has a relationship established from car to motorized vehicle would be wrong.
So, because traditional notion organizing systems presuppose the notion to imply the material item too, they require at least two different kinds of relationships to organize a professional vocabulary: the already mentioned is a and has a relationships.

In my opinion, they go wrong when//with implying the item in the notion instead of just dealing with the idea of the item and the vocabulary for the item: The idea, i.e. the immaterial variant, of a motorized vehicle of course is a part of, or at least is included in, the idea of a car.


      
Updates:
none so far

Friday, June 22, 2007

"big wet transistors" and "spaghetti wiring"

Doing the homework I caused myself sent me back to the "10 Important Differences Between Brains and Computers" article of Chris Chatham I cross-read earlier today. His reader Jonathan points out several weaknesses in Chris Chatham's argumentation. Although I consider him mostly right with his objections, I consider it nitpicking, mostly. In the end, I don't see the point he's about to make. Jonathan's arguing "[...] there must be some level of modularity occurring in the brain. My gut instinct is telling me here that a brain based completely on spaghetti wiring just wouldn't work very well..." obviously does not take into consideration that the single neurons themselves might be the entities of the brain that do the processing and that constitute memory -- memory and processing in one. On this point, I am far from sharing his arguments.

Another interesting point the reader Kurt van Etten brings up: "[...] (I do think a lot of writers equate neurons with big wet transistors)," -- Hm, I learned electrical engineering during my IT support assistant training, and every now and then I ponder how to cast MOM nodes into hardware, but when doing so, I primarily think of the content storable by such a node. That I might make use of a transistor for that is a minor detail. -- I haven't thought that far yet, but I don't presume that utilizing transistors is the only way to cast a MOM node into hardware. Anyway, it is interesting to learn how the major part of the people occupied with the topic might imagine a single neuron. ... Right now, I think that imagination might be a bit too simplified and might lack this or that important property of a real neuron; hence anyone reducing their image of a single neuron to that simplicity might miss this or that important condition or might fail to get this or that insight, just because of a too restricted ("simplified") look at the matter.
 

... Well, I got up to comment number 18, but that one might need some deeper consideration. Hence, I now take a break and might continue pondering that #18 comment later.

While reading the comments I opened some more links provided there, mostly via the commenters' names linking to their sites.
      
Updates:
20070623.12-42h CEST: added a headline to the posting

Homework to do

Yesterday's findings result in a lot of homework to do.

First of all, as a left-over from a former post, I have to make clear what a MOM net is, what it stores and how it does so. The recent posting on the issue of finding a heater repair part by using a thesaurus is a first step there.

Second, I reviewed Chris Chatham's posting on differences between brains and computers. But I missed that there are a lot of reader comments I haven't read yet. A to-do. Also, the insight that 'content representation' still seems to refer to marking up content by key words or thesaurus terms demands making clear my point of view on that topic, and why I chose the term content representation rather than any other. And, in turn, what I refer to by "marking up content by key words" demands an explanation. Things to do.

Third, what I have already begun, is to tidy up the blog. I think the postings might be much more useful if casual readers can dive in at any point, without needing to know what I wrote about before. Therefore, from now on, key concepts shall be clarified tersely (hence the heater posting) in separate postings. And any new posting referring to these concepts shall point there instead of going into detail in every new posting -- which might disturb too much -- you as well as myself, when developing a thought.

Let's see how it works out.

      
Updates:
none so far

An Issue about Getting a Replacement for a Heater Part by Using a Thesaurus

One of the most ancient questions that led to MOM was an issue a former information science teacher of mine presented in one of the lectures he gave to us: He drafted the case of someone who had an issue with their heater and had to find the replacement part by using a thesaurus. Actually, the task was that the guy should look up the matching word in the thesaurus to order the part by mail.
 

Simply put, a thesaurus is a vocabulary with the aim of ordering that vocabulary. It helps experts to find the right words, much like the 'thesaurus' embedded in any major text writing program does for lay people. The thesaurus relates the items it deals with to each other to bring them into a hierarchy: A mouse is a mammal, and a mammal is a vertebrate. The cat is a mammal too, so the cat node is placed next to the mouse one. A tiger is a cat too, as well as the lion and the panther, so they get placed as children of the cat node.

The core item the thesaurus deals with is a trinity of the item, the word for that item and the thought/imagination of the item. It's called a notion. The notion is an abstraction of an item, thus in every case immaterial. Regarding the state of matter, the notion focuses on the imaginary part but never loses sight of the material part or of the word. The thesaurus orders its words by looking at both the words and the real items.

In a first step, words synonymous to each other get collected into a set. The most familiar one of those synonyms gets picked and is declared to be the descriptor of that set. If the descriptor is referred to, implicitly the whole set is meant -- or, thought along with it.

In a second step, the thesaurus goes ahead to order the items, identified by their descriptors. The most widely used relationship between descriptors might be the is a relationship, just as demonstrated above: The cat is a mammal, the mammal is a vertebrate, and so on. Alternatively, another common relationship applied by thesauri is the has a relationship. It states that the vertebrate has a spine, and a human has a head, has a torso, has a pair of arms and also has a pair of legs and feet. However, the is a relationship is used much more than the has a one. The reason for that might be that the resulting network of relationships and descriptors would otherwise quickly become a rather dense graph, hard to maintain.

Aside from these two kinds of relationship, any creator of a thesaurus can set up any kind of relationship they might imagine -- such as associative relationships, relating notions to each other that cannot be brought together by any other kind of relationship, like cat and cat food. The available relationships may vary from thesaurus to thesaurus, as their developers might have chosen different kinds of relationships to use.
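
To make those two steps a bit more tangible, here is a tiny sketch in Ruby; the vocabulary, the synonym sets and the relationships are made up and do not follow any standard thesaurus format:

    # Step 1: synonym sets; the first entry is picked as the descriptor.
    SYNONYMS = {
      'cat'    => ['cat', 'domestic cat', 'felis catus'],
      'mammal' => ['mammal', 'mammalian']
    }

    # Step 2: relationships between descriptors.
    IS_A  = { 'cat' => 'mammal', 'mammal' => 'vertebrate' }
    HAS_A = { 'vertebrate' => ['spine'], 'cat' => ['four feet', 'tail', 'fur'] }

    # Walking the is-a chain upward gives the usual thesaurus hierarchy ...
    def broader_terms(descriptor)
      chain = []
      while (broader = IS_A[descriptor])
        chain << broader
        descriptor = broader
      end
      chain
    end

    p broader_terms('cat')   # => ["mammal", "vertebrate"]

    # ... while the has-a relations would let a lay person start from the
    # visible parts of an item instead of from its name.
    p HAS_A['cat']           # => ["four feet", "tail", "fur"]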
 

For the case of the heater, we assumed a thesaurus effectively consisting of is a relationships only, since that seems to be the most common setup of a thesaurus.

A thesaurus consisting of is a relationships only helps an expert to quickly find the words they already know. A lay person, on the other hand, usually gets stuck in that professional slang rather quickly, as they are unable to discern one notion from the other. Thesauri traditionally don't aim at assisting lay people, so the definitions they provide for descriptors are barely more than a reminder -- as said, of something the thesaurus developers assume the user already knows. If the thesaurus provides that definition text at all. During my course of studies I learned that thesauri resemble deserts of words, providing definitions as rarely as deserts have oases.

So, the answer to the heater question is: Restricted to a thesaurus, the guy won't find the replacement part for his heater.

And the amazing part my information science teacher pointed out, too, was that going to the nearest heating supplies shop would solve the problem within a minute -- it would suffice for the guy to describe the missing part by its look.
 

That miracle stuck with me. I came to the point of wondering why not set up a kind of thesaurus that would prefer has a over is a relationships, and asked the teacher about that. I was pointed to the issues of putting that into practice: How to manage such a heavily wired graph?
 

Well, that's a matter of coping with machines, so I went along, although I wasn't going to get any support from that teacher. On the other hand, I had been familiar with programming since 1988 -- so what? ... And over time, MOM evolved.
 
 
Update: While tagging all the rather old postings which were already in this blog back when blogger.com didn't offer post tagging yet, I noticed I had presented another variant of the issue earlier, then related to a car replacement part.

      
Updates:
20070624: added reference to the car repair example

Some articles on content representation

Since nearing the first milestone of MOM SSC, I thought it might make some sense to connect to others occupied with content representation. I technoratied for "content representation" (including the quotation marks) and found several postings apparently totally unrelated to content representation. Also, today "content representation" seems to primarily mean "mark-up", e.g. by terms provided by a thesaurus or the like. However, I found one posting that attracted me, pointing to another one which in turn pointed me to 10 Important Differences Between Brains and Computers by Chris Chatham, posted on March 27, 2007.

Number one on his list of differences is nothing new -- "Brains are analogue; computers are digital" -- therefore skipped.

Number two reveals a new buzzword for describing MOM, "content-addressable memory", and describes it as follows: "the brain uses content-addressable memory, such that information can be accessed in memory through 'spreading activation' from closely related concepts. [...]" When I read it first, I thought, oh, there might be someone with a similar concept in mind to MOM. On second look, I realized that claim likely originates just from psychology. The review continues the above quote with "For example, thinking of the word 'fox' may automatically spread activation [...]", which points a bit into neurology. I wonder how the claim that "thinking of a word" or "thinking of a fox" or even "thinking of the word 'fox'" spins off activation can be proven. I mean, that would imply someone proved "the word 'fox'" and a neuron to be equal, since the neuron is the instance sending activation to other neurons. -- However, I share the opinion that a neuron represents an item, but I am just not aware of a proof for that. If you have such a proof at hand, I'd be really glad if you could point me to the source. (Just since it'd support my own claims.)

Aside from that, I don't share the idea that thinking of a word might immediately stimulate content related to "memories related to other clever animals" [as my source, the above linked article, continues]. I think it at least takes thinking of the fox itself instead of just the word "fox". And, to finish the quoted sentence, it ends with "fox-hunting horseback riders, or attractive members of the opposite sex."

Back to MOM: taking "content-addressable memory" as a label for it is actually fitting: Chris Chatham continues his second difference with "The end result is that your brain has a kind of 'built-in Google,' in which just a few cues (key words) are enough to cause a full memory to be retrieved." Well, that's exactly what MOM is after: To pick up matching "memories" by just a few cues. -- The way Chris Chatham describes the issue is pretty close to the original issue that led me to figuring out MOM: A guy whose heater got damaged and who must find the spare part by utilizing a thesaurus. The thesaurus mostly consists of abstraction relationships between the item names listed there. And rather often, there is no definition provided for the items -- thesaurus makers seem to presume you're a specialist in that field or you wouldn't make use of a thesaurus at all. However, restricted to that tool, if that tool is restricted mainly to abstraction relationships, you cannot find the part you need to repair the heater. But what if you removed all the is a (i.e. abstraction) relationships and set up a "kind of thesaurus" consisting of has a relationships only? -- That way, you'd find the spare part as quickly as your in-mind "Google" might do. At least if you've got another tool at hand that jumps you over all the crap of temporarily unnecessary details, like the knowledge that -- let's switch the example to a pet cat -- the four feet, torso, tail, neck and head that belong to the cat also belong to any quadruped animal. Such as a pet dog, or also a pet hamster.

Differences #3–#9 I either was already familiar with, or they became clear to me over the time I developed the Model of Meaning, e.g. the claim provided by "Difference #6: No hardware/software distinction can be made with respect to the brain or mind". That's rather clear, but I am not going to explain it here, since this posting is just a note to myself (and anyone who might be interested) that there is a posting around which, by its content, is close to MOM.
 

Difference #10 at first glance looked unfamiliar to me -- "Brains have bodies" -- but although I wasn't aware of those change blindness findings, the claim "that our visual memories are actually quite sparse" quickly brought me back to what I already know (well, strongly believe; I lack the laboratories to prove my theoretical findings by dissecting mice). It's rather clear that "the brain is 'offloading' its memory requirements to the environment in which it exists: why bother remembering the location of objects when a quick glance will suffice?"

      
Updates:
none so far

Thursday, June 21, 2007

Tagging The Links Might Reveal (The) Content Of Linked Web Pages

The 'link payload' -- tags assigned to a link; the users tag themselves with these tags when they click the link, thus generating a cloud of tags of possible interests applying to the individual clicking user -- could be applied by a company that hosts its own site on its own server -- or at least on a server it can set up the way it prefers.

In my previous example for such a link payload, I focused on a news company that provides a news overview consisting of nothing but the news articles' headlines linking to the articles. That forces users willing to read an article to click the link, thus tagging themselves with the tags assigned to that link. -- If the company provided the full text by RSS feed, they'd never learn the tag cloud the users would generate/reveal about themselves by clicking several such links.

Learning the interests of a current visitor in realtime might allow picking more fitting ads to present.

Aside from that immediate advantage of tag-based user tracking on a single site, what about the web? Aside from user tracking, a (tagged) headline link to a news article page reveals another particle of content, even without the chance to track the user at all: the link's tags tag the linked page too. -- If there are multiple links pointing to that very page, a cloud of tags for that page accumulates. In other words, the page gets content aside from the text written on that page: the tag cloud. Since that content is not present on the page, I'd call it content assigned to that page, not actually there. For short, maybe something like "content [assigned] to a page" instead of the familiar "content of/on a page".
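
A small sketch of that accumulation in Ruby -- the links, targets and tags are made up for illustration:

    # Every tagged link pointing to a page contributes its tags to that
    # page's accumulated tag cloud.
    links = [
      { :target => 'http://example.com/article-1', :tags => ['heater', 'repair'] },
      { :target => 'http://example.com/article-1', :tags => ['repair', 'spare part'] },
      { :target => 'http://example.com/article-2', :tags => ['tagging'] }
    ]

    page_clouds = Hash.new { |hash, page| hash[page] = Hash.new(0) }
    links.each do |link|
      link[:tags].each { |tag| page_clouds[link[:target]][tag] += 1 }
    end

    p page_clouds['http://example.com/article-1']
    # => {"heater"=>1, "repair"=>2, "spare part"=>1}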
 

One question is whether content can be mined from the web immediately, without processing the text presented on the web -- mined in a way like determining tags for links, for users, for web pages.

One goal of processing the tag cloud assigned to a web page (or any other item, of course) might be to gather a MOM net, a condensed form of content: a multi-level directed graph storing distinct content in each of its nodes. I see it might be helpful to go into more depth on this, explaining what a MOM net actually is, what it stores and how it does so.

I keep that in the back of my head for another post to come.

      
Updates:
none so far

Link payload to get an impression of user interests

Is it link payload? Or something like content, or a set of features, the link-clicking web users reveal about themselves?

Having a tool in reach that might mine immediately processable content from the web, the reorganizer module of MOM, I keep wondering how to actually mine the web.

Just this minute, I am skimming a news web site that, on its overview page, provides the headlines of the articles only. Not the least preview, not even a snippet of text hinting at what the linked article might deal with and where it might dig into depth. So, a human can say: If you click on that link, you might be interested in the topic spotlighted by the headline. Or, since I know the sometimes crudely set-up headlines, there's a chance you clicked only to get an idea what the heck the article might deal with. There's also the chance you'd click a link accidentally, but let's skip that possibility for now.

What I noticed a minute before, when I was skimming that headline list, was that converting the headlines' words to nouns (e.g. by stemming) might suffice to tag the links. Assuming people would click only links they'd be interested in, in reverse, any such link clicked reveals the topics the user is interested in -- the tags peel off the link and adhere to the person who clicked it. In other words: By clicking the link, the users tag themselves. -- Track what the user clicks over time, and you'd get not only a cloud of tags which you can link to a user, but by actually linking them to the user and applying reorganization, it's simple to learn the interests of a user. Add counting of the -- no, not of the links, as you might do for plain web site statistics, but instead -- add counting of the tags the users tag themselves with, and you might get a rather specific profile of the user. -- Cover a broad cloud of topics, thus a broad cloud of tags, and your users' profiles would become even sharper.
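
A rough sketch of those two steps in Ruby -- the headlines are invented, and the crude keyword extraction merely stands in for real stemming/noun detection:

    # Turn a headline into tags by dropping stop words (a stand-in for stemming).
    STOPWORDS = %w[the a an of to in on and for]

    def tags_for(headline)
      headline.downcase.split(/\W+/).reject { |w| w.empty? || STOPWORDS.include?(w) }
    end

    # The headlines of the links one user clicked.
    clicked_headlines = [
      'Heater repair prices on the rise',
      'Finding spare parts for heater repair'
    ]

    # Count the tags the user tags themselves with by clicking those links.
    profile = Hash.new(0)
    clicked_headlines.each do |headline|
      tags_for(headline).each { |tag| profile[tag] += 1 }
    end

    p profile
    # => {"heater"=>2, "repair"=>2, "prices"=>1, "rise"=>1, "finding"=>1, "spare"=>1, "parts"=>1}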
 

And, in the back of my head, there's still Google's advertising system. If each page Google puts ads on had to be 'enriched' with a handful of tags, then by visiting such a page the users would tag themselves with those tags. If Google manages to assign that set of tags to you individually, Google might get quite a good impression of your interests.

      
Updates:
none so far

Sunday, June 17, 2007

cluster your feed news: MOM reorganizer vs RSS feed news

The chance to reveal the topics several different sources deal with by applying a reorganizer also implies the chance to cluster RSS feed news by topic: Instead of approaching that issue with traditional information science procedures, the tags of the fetched articles could alternatively be looked up (retrieve the original article, pick its tags) and run through a reorganizer.

That might make it easier to skip news items of usually valuable feeds when they cover topics completely outside one's interests.

      
Updates:
none so far

basic MOM peer scheme reworked

The recently recognized chance to build a useful MOM application without the need for a recognizer demands a slight modification of the basic MOM peer scheme:


      
Updates:
none so far

MOM's reorganization could reveal topics/theme complexes

Currently, I am preparing to implement MOM's reorganization capabilities [figure: parts of MOM already implemented / parts currently under development]. Today, some time amidst that, I noticed that MOM could provide a service already with only the reorganization functionality in place: Based on popular tagging, MOM [actually, its reorganizer] could reveal topics that different sources (e.g. flickr photos or blog entries) deal with, so far unrecognized. -- The background:

Problem:
I've got lots of papers which are tagged. They deal with several different topics, on and off over time. There might be far later papers dealing with similar topics as far earlier ones. -- Using the tags alone and doing that task intellectually, I might have a rather hard time: there are too many distinct tags to keep track of.

Approach for solution:
Reorganization could be applied: It might detect clouds of tags that belong together and 'mark' them by pooling them into separate new -- but as yet unnamed -- 'tags' (= MOM nodes). Each new tag then points to every paper dealing with the topic the tag represents. -- That reduces the workload to be performed intellectually to finding appropriate names for the newly created, initially unnamed, tags. And, of course, to tagging all those papers beforehand. (A small sketch of that pooling idea follows below.)

Benefit:
That does not only apply to my private issue of getting clear about what topic I touched with MOM at what time, but also to any other unordered collection -- e.g. papers collected in preparation of a diploma thesis..any scientific work, maybe even a law library..any library..all literature ever written.
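
Here is that pooling idea as a small Ruby sketch -- not the actual MOM reorganizer, just a toy with made-up papers and tags: tags that occur on exactly the same set of papers get pooled into one new, yet unnamed tag:

    papers = {
      'paper-1' => ['thesaurus', 'is-a', 'descriptor'],
      'paper-2' => ['thesaurus', 'is-a', 'descriptor', 'heater'],
      'paper-3' => ['tagging', 'web'],
      'paper-4' => ['tagging', 'web', 'adsense']
    }

    # For every tag, collect the papers it occurs on.
    occurrences = Hash.new { |hash, tag| hash[tag] = [] }
    papers.each { |paper, tags| tags.each { |tag| occurrences[tag] << paper } }

    # Tags sharing the same set of papers form one cloud and get pooled.
    counter = 0
    occurrences.group_by { |_, on_papers| on_papers.sort }.each do |on_papers, tags|
      next if tags.size < 2   # a single tag is nothing to pool
      puts "unnamed-tag-#{counter}: pools #{tags.map { |tag, _| tag }.inspect}, points to #{on_papers.inspect}"
      counter += 1
    end
    # unnamed-tag-0: pools ["thesaurus", "is-a", "descriptor"], points to ["paper-1", "paper-2"]
    # unnamed-tag-1: pools ["tagging", "web"], points to ["paper-3", "paper-4"]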

      
Updates:
none so far

Monday, June 11, 2007

Tag users

Google demands that its adsense users choose some trigger keywords/phrases. If they match a Google user's search, the advertiser's ad might get displayed.

If the user clicks any such ad, they show interest in the topic of the ad, outlined by the trigger keywords. -- So, if one (or Google itself) were after figuring out the interests of a user, Google could take these keywords and tag the user with them. -- The Google cookie one gets practically everywhere on the web, expiring in 2037, provides a unique handle for the user for a long time. (And I really wouldn't be surprised if they tracked 90% of all web users, individually.)

On the other hand, whenever a user enters a page featuring Google content, the user might be identified by their cookie. If Google is about to place ads on that site, then by the cookie ID Google could immediately figure out the fields of interest of the visiting user. Also, it could immediately update the user's record, since by the visit to that page Google learns another bit about that user: they're likely interested in the topics dealt with on the page just entered. -- Which either the advertising real estate provider reveals by tagging its own pages, or which Google might determine itself by traditional information science approaches.

Then, Google could select an ad that matches the user's profile and render it into the page currently being put out/the page the user is waiting to finish rendering.
 

Is there a way to mimic that approach to gather tags of tags?

      
Updates:
none so far

Tuesday, June 05, 2007

What is it that MOM SSC is after?

MOM SSC is after getting out a basic MOM peer that can determine hidden content from a given MOM graph and recognize the presence of items from varying and unpredictable patterns of those items' features being given, e.g. signalled by sensors.

All of that in a strictly reviewable way, maybe even a layman-compatible one. It avoids every kind of black box, such as self-organized neural networks, especially ones featuring loops.

      
Updates:
20070606.10-02h CEST: changed the last sentence to make more clear what I meant.

docs are live!

The docs put into the MOM Simple Set Core framework yesterday go live today. Here's a preview. Later, when gna.org's cron has updated the MOM SSC website, the whole docs will get their own place. Until then, I'll keep the preview link mentioned beforehand.

Matching the improved documentation, there's a new tag of the framework, now labeled v0.2.2.

Update: Okay, gna's cron made it to the MOM SSC site, thus the docs are live and available at their official URL. Therefore I removed the preview link above.

      
Updates:
20070605.14-46h CEST: mentioned the official site went live & removed the preview link.

Sunday, June 03, 2007

new release: documentation improved, dotty output added

New version is out! It mainly features improved documentation of all the files of the framework, but also introduces better MOM net output for the MOMedgesSet and MOMnet classes -- for the latter even a simple dotty output method.

See here a sample MOM net created by MOMnet and rendered by dot: Two-layer MOM nets often contain hidden, i.e. not explicit (i.e. implicit), content. Having a generator for them available constitutes the chance to develop a detector for such implicit content and to make it explicit. A mechanism that takes both of these steps is known as reorganization. -- Which might become implemented next.
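
For the curious, what such a dotty output boils down to -- a stand-alone sketch, not the actual MOMnet output method:

    # Emit a dot digraph for a small two-layer net given as a list of edges.
    def to_dot(edges)
      lines = edges.map { |from, to| "  \"#{from}\" -> \"#{to}\";" }
      "digraph mom_net {\n" + lines.join("\n") + "\n}\n"
    end

    two_layer_net = [['cat', 'four feet'], ['cat', 'fur'],
                     ['dog', 'four feet'], ['dog', 'fur']]
    puts to_dot(two_layer_net)
    # Pipe the output through `dot -Tpng -o net.png` to render the image.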
      
Updates:
none so far