Model of Meaning

Content Representation With A Twist

Wednesday, July 04, 2012

Progressing in preparing MOM for release

Ever since I got my cancer diagnosis back in November, I have wanted to make sure that at least MOM survives.

A friend of mine, Kerschek, came to help, and now we're making good progress in gathering all the sources I've written on MOM.

Currently, Kerschek is collecting and importing old blogs and parts of my intranet wiki into a common source, and I am sifting through old MOM source repositories, starting in 2006. I'm using git for the latter and intend to put it on some free open-source hoster once I'm done. Can you recommend any?

Thursday, March 15, 2012

One important issue I've had all along is the question of how to fill MOM's memory graph. I might just have stumbled upon a (possible) solution: the open database of Metaweb.

The content to use would be something like ... here.

Friday, September 03, 2010

How to algorithmically determine valid synonyms?

Currently, I am working on a project unofficially called "read and let read", addressing the point that reading is slow and machines are far quicker than humans. The focus is on web feeds (RSS, Atom) and the question of whether a given feed is interesting to the user, i.e. whether or not they should subscribe to it.

How can a machine tackle this? The idea is to look at the keywords the feed postings are tagged with: how much do they match the user's interests?

Trying this myself, I took the tags of this blog's tag cloud as my interests and ran them against some of the more interesting feeds I read. The results are encouraging, but the actual match values are tiny: yes, the software detects matches; no, they usually range below 2%, often even below 1%.
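The core of that matching can be sketched in a few lines of Ruby. This is only an illustration of the idea, not the actual code: the tag lists are made up, and I'm assuming a plain case-insensitive set intersection as the scoring formula.

```ruby
# Hypothetical sketch: match a user's interest tags against a feed's tags.
def match_score(interests, feed_tags)
  interests = interests.map(&:downcase)
  feed_tags = feed_tags.map(&:downcase)
  shared = interests & feed_tags
  # Share of the feed's tags that hit an interest, as a percentage.
  100.0 * shared.size / feed_tags.size
end

interests = ["semantics", "tagging", "ruby", "graphs"]
feed_tags = ["gadgets", "phones", "reviews", "tagging", "video"]
puts match_score(interests, feed_tags)  # => 20.0
```

With realistic tag clouds (dozens of interests, thousands of feed tags), exact string overlap like this is precisely what produces the tiny sub-2% scores described above.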

One issue here might be synonymy: there may be greater matches between my interests and the feeds' topics, but the two of us may speak different languages: Engadget might simply use different words for the same things. So the dumb, unknowing machine does not know there is a match. To fix this, I'm now looking for an algorithm that determines synonyms for each word of a given set of keywords. (For the impatient among you: there is a resource named WordNet that can come to help here.)

Determining valid synonyms based on a single given word will likely bring up synonyms that match a different meaning of that word: besides "canine", "trestle" is also a synonym for "dog", and the software would surely come up with that.

Looking into my old university books on this issue, I found they all implicitly presumed a human would look for the synonym. But no, here it'd be a machine, and it won't be able to detect the meaning shift intuitively, won't be able to skip nonsensical synonyms.

Looking further, I found some postings about Google starting to include synonyms in searches. So there indeed is some kind of algorithm around that determines synonyms based on a small set of given keywords. The remaining question: has that algorithm been published, and what does it look like?
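One way such an algorithm might work can be sketched as follows: let the other keywords of the set act as context, and keep only the word senses whose synonyms overlap with that context. Everything here is an assumption for illustration: the tiny synonym table just stands in for a real resource like WordNet, and the overlap heuristic is mine, not Google's.

```ruby
# Hypothetical sketch of context-filtered synonym expansion.
SYNONYMS = {
  "dog"   => [
    ["canine", "hound"],       # sense 1: the animal
    ["trestle", "sawhorse"]    # sense 2: the support frame
  ],
  "hound" => [["dog", "canine"]],
  "leash" => [["lead", "tether"]]
}

# Keep only synonyms from senses that overlap the other keywords'
# synonym sets -- a crude way to let context pick the meaning.
def filtered_synonyms(word, context)
  senses = SYNONYMS.fetch(word, [])
  context_words = context.flat_map { |w| SYNONYMS.fetch(w, []).flatten } + context
  picked = senses.select { |sense| (sense & context_words).any? }
  picked = senses if picked.empty?  # no signal: fall back to all senses
  picked.flatten.uniq
end

p filtered_synonyms("dog", ["hound", "leash"])  # => ["canine", "hound"]
```

With "hound" and "leash" as context, the "trestle" sense gets filtered out; a lone "dog" with no context would still return all senses.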

Monday, October 12, 2009

applying tags implicitly to a photo collection

No news here for a long while. However, today I read liw's piece on tag hierarchies for photos. I'd like to comment on some of his key points:

First of all, liw runs into some of the key issues of dealing with tags rather than notions: he mentions aliases and raises the question of whether tags should be allowed to have more than one name. -- I'll ask another question: what about different tags sharing the same name? That's called homonymy. The other case, "aliases", is synonymy.

Another thing he's asking for: what about translations for tags? -- Well, there is the issue that some words in one language don't have a true translation in another. A little-known but clear example is this English/German pair: "to round out" -- "abrunden" ("to remove sharp edges" // "to round in"). Also, all the loanwords any language has are a sign that there is not yet a matching translation for a foreign word. Think "kindergarten" in English.

Second, given that a translation equals the original meaning, what else would it be but a synonym?

In my opinion, this cries for using notions rather than words.
 

Beyond the synonyms, liw raises another question: what about descriptions for the tags? This could be particularly useful if you shared tag hierarchies and a receiving person did not know a certain tag.

Let me point out some usability/convenience issues here: First, how can sharing user A know which of their tags they'd need to describe so that receiving user B fully gets what a particular tag is about? Second, why would user A want to describe any of the tags they already know? What would be their benefit, their reason for doing so? And would the receiver actually make much use of tags they are unfamiliar with? If not, would it be worthwhile to exercise all that effort to describe the tags?

Finally, descriptions are probably the worst source of content for machines: the content is stuck in sentences, machines are illiterate, and for the foreseeable future that isn't going to change. The effort you put into tag descriptions is unlikely to come back out for machines to process -- and, thought a bit further, not for humans to take advantage of either, since machines cannot exploit knowledge buried in sentences incomprehensible to them.

Therefore, I'd like to push a little further what liw already incorporated in his approach to tagging: implicity. liw's idea is to imply one tag by another. (It's great to see liw is is-a/has-a agnostic in his sample tag hierarchies. -- Shift it to the notion level, and you get what I have been proposing with the model of meaning (MOM) all along.) So, instead of burying precious content in sentences machines cannot comprehend, just extend the idea of tagging: show the context tags of the given tag a user might not know. The tags neighbouring it would likely give a good clue of what the tag in question is about.
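The neighbour lookup itself is trivial. A minimal Ruby sketch, reusing liw's sample hierarchy (the edge-list representation and the function name are my assumptions):

```ruby
# Hypothetical sketch: instead of a prose description, show a tag's
# neighbours in the hierarchy as a clue to its meaning.
EDGES = [
  ["Location", "Europe"],   # is-a
  ["Europe", "Finland"],    # has-a
  ["Finland", "Helsinki"]   # has-a
]

# Collect every tag directly connected to the given tag,
# regardless of edge direction.
def context_tags(tag)
  EDGES.flat_map { |a, b|
    next [b] if a == tag
    next [a] if b == tag
    []
  }.uniq
end

p context_tags("Finland")  # => ["Europe", "Helsinki"]
```

A receiver unfamiliar with "Finland" would see "Europe" and "Helsinki" next to it, which is arguably a better clue than a hand-written description, and one a machine can process too.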
 

I just mentioned liw's is-a/has-a agnosticism in his sample hierarchies. (I refer to his clear line of has-a in "Europe - Finland - Helsinki" vs. is-a in "Location - Europe" in the same hierarchy.) There's one last thing I'd like to point out: once you allow has-a relationships, you'll likely get a network rather than a clean hierarchy.
 


MOM -- my model of meaning -- might be a resource for addressing the issues liw points out. However, MOM currently is not under active development, and although I'd very much like to assist in implementing that tag implicity system -- or MOM --, I am currently occupied with finding employment, so I won't be of much help here. However, I'd be glad to see somebody implement it or parts of it. Also, I might be able to check some of my own sources in several languages into GitHub as a resource to draw from. -- wrs_

Sunday, March 01, 2009

progressing...

...given that you've got time and the right focus/objective, suddenly a knot unties and flow comes...

It's amazing in how short a period I get (rudimentary versions of) the major components of this project done, after having had it on my table for such a long time. This time it's the recognizer: I just implemented a primitive variant of it -- originally in less than twenty lines of code.

Operating on tags found in the Debian package descriptions database, it can help you quickly determine the Debian package(s) that suit you best, just by entering a few sensible tags.
 

Here are some examples:


The program reads from a list of tags known to exist within the Debian package descriptions database. Five of those tags get picked; that's the first line of each output, presented like an array. The next few lines tell how many packages had how many of the picked tags. [The dotted line is minimalist debug output.] And finally, you get the list of results, i.e. the packages that probably suit the user's needs best, i.e. share the most tags with the query. The last line is a gimmick: it just tells the length of the results line in bytes.

To avoid overwhelming the user, the number of results provided is restricted to five.
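The core idea can be sketched in Ruby: rank items by how many query tags they share and cap the result list. The package records below are made up for illustration; the real program runs on the Debian package descriptions database.

```ruby
# Hypothetical sketch of the recognizer's core: score each package by
# the number of tags it shares with the query, return the best matches.
PACKAGES = {
  "git"  => ["vcs", "devel", "cli"],
  "vim"  => ["editor", "cli", "devel"],
  "gimp" => ["graphics", "gui", "editor"],
  "curl" => ["network", "cli", "download"]
}

def recognize(query_tags, limit = 5)
  scored = PACKAGES.map { |name, tags| [name, (tags & query_tags).size] }
  scored.reject { |_, score| score.zero? }      # drop non-matches
        .sort_by { |name, score| [-score, name] } # best score first, ties by name
        .first(limit)                            # cap the result list
        .map(&:first)
end

p recognize(["cli", "devel"])  # => ["git", "vim", "curl"]
```

The counting pass corresponds to the "how many packages had how many of the picked tags" lines of the output described above; `limit` is the five-result cap.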
 

Now, the next question is where to get more collections like the Debian one, where you have items and a few tags per item. Are you aware of any? Could you give directions? Pointers? Hints? -- I'd be glad!

      
Updates:
none so far

Tuesday, February 17, 2009

A hack makes reorganization work

Progressed further, one important step taken: reorganization now basically works. That's the prerequisite for introducing quality control to recognition.

Below are two renderings of basically the same net: one rendered before, the other after reorganization.

pre-reorg


post-reorg

The net is broken because this reorg is a hack, and the generator I hacked together for it is one too. (25) pointing to (22) and (15) pointing to itself can be seen as indicators of the generator's brokenness.

However, what's important is that reorganization works.

Thursday, October 23, 2008

output of a MOM net stimulation

No, the MOM project is not gone. Because of a horrible 14-hour workday (incl. 4 hours of commuting), I simply lack the time to work on MOM. However, this night I found some time to twiddle around with it and developed a very simple kind-M net generator in Ruby which features node salting, dotty output, and progression over time. Below, you find a screenshot of its output, showing three different states of the same MOM net.

The top one is almost the initial state. All nodes were set to a value of two and thus were active. Additionally, the net was salted: some nodes were randomly picked and their values increased by a random amount. That's why there are floating-point numbers in the graph at all.

To keep the node shapes small but still be able to watch the values, I added helper nodes. They display the values and link to their respective real nodes. These helper nodes obviously aren't related to anything but the nodes they serve as labels for.

Note: despite the bright box in the bottom-left corner of each graph, the screenshots show immediately consecutive states: top is state 1, middle is state 2, bottom is state 3.
 

The center graph shows the net after one iteration. What happened to each node by then: every successor node (top level) got its value divided by 2.0; every active predecessor node stimulated its successors by 1.0; and every predecessor node -- active or not -- got its value divided by 2.0 as well.
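One such step could look like this in Ruby. The node structure, the activity threshold of 2.0 (matching the initial node value of two), and the ordering of stimulation vs. decay are all my assumptions here, not necessarily what the generator does:

```ruby
# Hypothetical sketch of one stimulation step as described above.
Node = Struct.new(:value, :successors)

ACTIVE_THRESHOLD = 2.0

def step(nodes)
  # Stimulation is computed from the pre-step values ...
  boosts = Hash.new(0.0).compare_by_identity
  nodes.each do |node|
    next unless node.value >= ACTIVE_THRESHOLD
    node.successors.each { |succ| boosts[succ] += 1.0 }
  end
  # ... then every node decays by half and receives its boost.
  nodes.each { |node| node.value = node.value / 2.0 + boosts[node] }
end

a = Node.new(2.0, [])
b = Node.new(2.0, [a])   # b is an active predecessor of a
step([a, b])
p [a.value, b.value]  # => [2.0, 1.0]
```

In this toy run, the stimulated node `a` stays at 2.0 while the unstimulated `b` halves each step, which mirrors how the unsalted nodes in the screenshots decay from 2 toward 0.5.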

You may notice a color change: nodes that are still active are colored green, as are their edges to successor nodes. Nodes that are no longer active have turned orange. You might also notice a different edge style: a dotted line means the predecessor node may have changed, but at the time of the screenshot it didn't affect its successor. That's what it's for: in case some node turns green but doesn't affect its successor, the dotted line makes that clear.
 

Another step in time (bottom graph), and you notice the color of most of the nodes has faded further. The links between label nodes and real nodes are solid as always. The value of the originally unsalted nodes is down to 0.5.
 

Why this new verve? -- I just thought it'd be a good idea to present MOM and share it with others to improve it together, rather than aiming and aiming for a perfect outcome and getting there only slowly, for lack of time.




Well, actually Nathan Sobo's presentation of Treetop inspired me.
