Content Representation With A Twist

Showing posts with label recognition. Show all posts

Sunday, March 01, 2009

progressing...

...given that you've got time and the right focus/objective, suddenly a knot opens and flow comes...

It's amazing how quickly I'm getting done (rudimentary versions of) the major components of this project that sat on my table for such a long time. This time it's the recognizer: I just implemented a primitive variant of it -- originally in less than twenty lines of code.

Operating on tags found in the Debian package descriptions database, it can help you quickly determine the Debian package(s) that suit you best, just by entering a few meaningful tags.
 

Here are some examples:


The program reads from a list of tags known to exist within the Debian package descriptions database. Five of those tags get picked; that's the first line of each output, presented like an array. The next few lines tell how many packages had how many of the picked tags. [The dotted line is minimalist debug output.] And finally, you get the list of results, i.e. the packages that probably suit the user's needs best, i.e. share the most tags with the query. The last line is a gimmick: it just tells the length of the results line in bytes.

To avoid overwhelming the user, the number of results provided is restricted to five.
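The core of such a tag-based recognizer fits in a few lines of Ruby. This is a minimal sketch, not the original program: the package-to-tags table is made-up sample data standing in for the Debian package descriptions database, and `recognize_packages` is a hypothetical name.

```ruby
# Hypothetical sample data; the real program reads the Debian
# package descriptions database.
PACKAGES = {
  "mutt"        => %w[mail console text],
  "thunderbird" => %w[mail gui x11],
  "vim"         => %w[editor console text],
}

# Count, for each package, how many of the given tags it carries,
# and return the (at most) five best matches, best first.
def recognize_packages(query_tags, packages = PACKAGES, limit = 5)
  packages.map     { |name, tags| [name, (tags & query_tags).size] }
          .reject  { |_, hits| hits.zero? }
          .sort_by { |_, hits| -hits }
          .first(limit)
end
```

For example, `recognize_packages(%w[mail console])` ranks `mutt` first, since it shares both given tags with the query.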
 

Now, the next question is where to get more collections like the Debian one, where you have items and a few tags for each item. Are you aware of any? Could you give directions? Pointers? Hints? -- I'd be glad!

      
Updates:
none so far

Tuesday, February 17, 2009

A hack makes reorganization work

Progressed further. One important step taken: reorganization now basically works. That's the prerequisite for introducing quality control to recognition.

Below are two renderings of basically the same net: one rendered before, the other after reorganization.

pre-reorg


post-reorg

The net is broken because this reorg is a hack, and the generator I therefore hacked together is one too. Node (25) pointing to (22) and node (15) pointing to itself can be seen as indicators of the generator's brokenness.

However, what's important is that reorganization works.

Thursday, October 23, 2008

output of a MOM net stimulation

No, the MOM project is not gone. Because of a horrible 14-hour workday (incl. 4 hours of commuting), I simply lack the time to work on MOM. However, this night I found some time to twiddle around with it and developed a very simple kind-M net generator in Ruby which features node salting, dotty output and progression over time. Below, you find a screenshot of its output. This screenshot shows three different states of the same MOM net.

The top one is almost the initial state. All nodes were set to a value of two, thus were active. Additionally, the net was salted. That means: Some nodes were randomly picked and their values were increased by a random value. That's why there are floating point numbers in the graph at all.

To keep the node shapes small but stay able to watch the values, I added helper nodes. Those display the values and link to their respective real node. These helper nodes obviously aren't related to anything but the nodes they serve as labels for.

Note: despite the bright box in the bottom left corner of each graph, the screenshots show immediately consecutive states: top is state 1, middle is state 2, bottom is state 3.
 

The center graph shows the net after one iteration. What happened to each node by then was: every successor node (top level) got its value divided by 2.0. Every active predecessor node stimulated its successors by 1.0, and every predecessor node -- active or not -- got its value divided by 2.0 too.
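One iteration can be sketched as a step function in Ruby. A minimal sketch under a stated assumption: a node counts as active when its value is at least 2.0 (matching the initial state, where every node starts at 2 and is active); the net layout is made up for illustration.

```ruby
ACTIVE_AT = 2.0  # assumption: activation threshold

# net: { node_id => { value: Float, successors: [node_id, ...] } }
def step(net)
  stim = Hash.new(0.0)
  # active nodes (judged by their pre-step values) stimulate successors
  net.each_value do |n|
    next unless n[:value] >= ACTIVE_AT
    n[:successors].each { |s| stim[s] += 1.0 }
  end
  # every node decays, then receives its accumulated stimulation
  net.each { |id, n| n[:value] = n[:value] / 2.0 + stim[id] }
  net
end
```

With a plain node feeding another, two steps take an unsalted source node from 2.0 to 1.0 to 0.5, matching the fading of the originally unsalted nodes in the bottom graph.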

You may notice a color change: nodes that are still active are colored green, as are their edges to successor nodes. Nodes that aren't active anymore have turned orange. You might also notice a different edge style now: a dotted line means the predecessor node may have changed, but at the time of the screenshot it didn't affect its successor. That's so that, in case some node turns green but didn't affect its successor, the dotted line makes that clear.
 

Another step in time (bottom graph), and you notice the color of most of the nodes has faded out further. The links between label nodes and real nodes are solid as always. The values of the originally unsalted nodes are down to 0.5.
 

Why this new verve? -- I just thought it'd be a good idea to present MOM and share it with others to improve it together, rather than aiming and aiming for a perfect outcome but approaching it only slowly because of lack of time.




Well, actually Nathan Sobo's presentation of Treetop inspired me.

      
Updates:
none so far

Sunday, January 20, 2008

Build a new 'programming' language that doesn't instruct computers but tells them what to make sure of?

Just reading the latest news on a security hole in Winamp, still having in mind how our programming trainees tend to assume things and base their programming on those assumptions -- instead of making quite sure --, and having worked on knowledge representation for a rather long time, with the Winamp issue a thought popped into my mind:

As long as there are minds out there trying to make anyone's programs do things they were not intended for -- and that might be the case for a pretty long time --, programming might at first glance look like it has looked for decades: instruct the computer what to do and in which order to do it. But at second glance, keeping in mind the people who try to abuse programs, and the other people who make it easy for them by assuming instead of making sure, all that programming business, in my eyes, looks like it is converting into knowledge work rather than a lining-up of building blocks: the kind of knowledge work that makes sure things are the way we'd assume them to be. So the whole program might become a sort of building where each single building block was not only lined up but verified too. Then, in fact, the whole building consists of knowledge rather than of building blocks of assumptions.
 

The majority of my achievements in knowledge representation was to figure out two fundamental concepts, aside from a minor but even more fundamental one: recognition asks "How to recognize items by a given subset of their features?", while reorganization asks how to reorganize a given graph of knowledge representation to make it less matter/energy consuming while still representing the very same content. The minor one was how to store content in graphs at all. It's very basic but important nevertheless.

A long while ago, I wondered whether there might be a reason to base any kind of computer-instructing language on that effort. But back then I didn't see any such reason, and I made no further effort to figure one out.

However, having come to the point today of seeing secure programs as buildings of certainties, there in fact might be a reason to convert my efforts into a new computer-instructing language.

      
Updates:
none so far

Monday, August 20, 2007

removed: Sidebar element "Objective"

Objective

A common quality of today's information technology, aiming to become able to identify items, is to mark up every single item. -- The Model of Meaning heads for building the foundation to manage without any such markup.
The approach is about content representation in the literal sense of the term.

      
Updates:
none so far

Tuesday, June 26, 2007

Reorganizing Tags -- For What Benefit?

Having in sight getting over the core MOM reorganization obstacle and getting reorganization implemented, and having noticed a possible benefit of having at least a reorganizer at hand (i.e. even without any recognizer) [aside from the benefit of then becoming able to develop a more sophisticated recognizer], I began thinking about whether there might be a chance to make some profit by providing the MOM reorganizer as a web service.
 

Still not knowing of any profitable such web service, I ended up looking up 'tagging' on Wikipedia. That article might be worth a read, as might the German variant of it [for those of you comfortable with that language].

      
Updates:
none so far

Monday, June 25, 2007

questions regarding familiarity, recognition, creation of new neurons, their offshoot, self, and the brain

Saturday morning I awoke while I was scratching my head. I noticed the sound it made. I thought something like: why does it make that sound it makes? That well-known sound. Then I started to wonder -- which finally woke me. Thinking is always a good setting [for me] to get ripped out of the sweetest (and the most horrible) dreams, and so it was with this one.

So, fine, scratching my head makes a sound. A familiar sound. One I know really well. Do I? That sound is so familiar that most often I don't even notice it. -- That was what I noticed next: why didn't I notice it before? How many times may I have scratched my head up to now? And only now do I ask that question. Curious.

Might it be that as soon as we are familiar with a situation/thing, we stop asking further questions on that matter? Might this be the cause of why children [apparently] ask about everything? Is their familiarity [with the world] so sparse that recognition cannot kick in? Or might it be that recognition itself yields too vague results [for the children]: i.e. it results in 1..many nodes which get stimulated to a similar degree, so automatic ("intuitive") recognition, which results in a single most probable [represented] item recognized, cannot take place? Therefore, the child has to find that single most probable item consciously, actively? They support recognition by asking grown-ups? And by that support they make distinct edges become weighted as more important? [I assume that equals <learning>. The body is able to move a lid or a leg by a nerve pulse -- why not move or even grow a neuron's dendrite or axon by basic will?]

If the child, by the approach of weighting single edges as more important, does not achieve the wanted result -- maybe because, after a while, all the edges get weighted equally again [hence the confusion gets as strong as it was before any weighting at all] -- what happens then? Does the child decide [with emphasis on either 'child' or 'decide'] to create one or more new neurons?

Or is this decision made by "the brain"? Or does it cause the creation of new nerve cells without any kind of decision-making, i.e. automatically? Or is it just a single nerve cell which initiates cell division? Or is it not even that single neuron which 'initiates' cell division, but it plainly begins to divide itself, caused by some external conditions, e.g. biological or chemical ones, which in turn might be caused by an obviously needed nerve cell not being in place? Might these biological or chemical conditions be caused by neighbouring cells feeling some stress and excreting some hormones?

Or might the reason for new neurons to be created be some neurobiological condition after all? Maybe nerve cells divide when any of their offshoots -- axons, dendrites -- has grown a "too large" tree//braid//knot? And this rank growth divides itself from the remainder of the very nerve cell?

Or might it be that at some point there's no place left on the main body of a neuron where other neurons can immediately dock, hence dendrites start to grow? Or the docking nerve cells begin to grow axons, since these might fit between all the other dockers? Or is it that the nerve cell gets divided when there's no place left on its core body?
 

PS.: I doubt there is any bird's-eye-view instance which decides whether or not to set up any new edge or cell (node). In other words, I doubt "the brain" decides that -- or anything at all that takes place within the brain itself//the brain body; i.e. I doubt there is any instance in the brain but 'self' that makes any decisions regarding the brain itself.

      
Updates:
none so far

Sunday, February 18, 2007

To Keep One's Insights Secret is the Best Chance to Get Stuck.

I am working on a model to represent content in a manner free of words. There are two other main parts tightly related to it: To recognize content by a variable set of given features. And, to reorder the stored content so that implicitly represented content becomes explicit while the already explicit part becomes more straight at the same time.

There is one main thing I avoided during my approach: black boxes. Things that are based on people's beliefs rather than on proven facts. Things that are overly complex, hence may only be estimated, not proven.

I avoided two common approaches: utilizing artificial intelligence and linguistics.

Representing The Content

On dealing with thesauri and classifications, I noticed that those ontologies force abstraction relationships between the mentioned notions. Therefore, I thought about the alternative: to force the partial (part-whole) relationship. What would result if all the notions had to be represented via partial relationships? -- A heavily wired graph. -- Originally, thesauri and classifications were fostered manually. Therefore, it looked clear to me that no one would like to foster a densely wired graph to keep track of notions.

Nevertheless, I continued the quest. There is software available nowadays, hence why keep the chance out of mind?

Over time, I came to the insight that notions might mainly be constituted by sets of features. There -- I had "words" represented. More precisely: the items which are treated as the "templates" for the notions. ... I began to develop an algorithm to recognize items by their features -- varying sets of features -- effectively getting rid of the need for globally unique identifier numbers, "IDs" for short. I had a peer-to-peer network in mind which should automatically exchange item data sets. That demanded a way not to identify but at least to recognize items by their features.

Since items can be part of other items -- like a wheel can be part of a car, and a car part of a car transporter -- I switched from sets to graphs. -- To make clear that the edges used by my model are not simple graph edges, and also to differentiate them from classification/thesaurus relationships, I call them "connections".

Then I noticed similarities to neurological structures, mainly neurons, axons, dendrites. Also, since I noticed that the tools developed so far could not represent a simple "not", e.g. "a not-red car", I began to experiment with meaning-modifying edges. Keeping in mind that there is probably no magical higher instance providing the neurons with knowledge -- for example the knowledge that one item is the opposite of another -- I kept away from connections injecting new content/knowledge; in this case, knowledge about how to interpret an antonym relationship. I strove for the smallest set of variations of connections.


However, even without content-modifying connections, the tag phenomenon common to the web could take great benefit: the connections between the items make clear which item is an implication of which other(s). Applied to tagging, users would not need to mention the implications anymore: the implications would be available via the underlying ontology. (And the ontology could be enhanced by a central site, by individual users, or by a peer-to-peer network of users.)

Recognizing The Content

Having that peer-to-peer network in mind, I needed a way to identify items stored at untrusted remote hosts. I noticed that collecting sets of features, which themselves would be considered items, meant nothing but a definition of the items by their features. Different peers -- precisely: users of the peer software -- might define the same items by different features, which might leave only some of the locally known/set features matching those set remotely. -- However, most of the time that's enough to recognize an item by its features: some features point to a single item only. These features are valuable: they are specific to that very item. If one of these specific features is given, most probably the item the feature points to is meant.

But usually, every feature points to multiple items at once. Most probably, every item a feature points to was chosen reasonably, i.e. the item the feature points to is neither randomly chosen nor complete trash. Thus, a simple count might be enough: how many given features point to which items? How great is the quota of features pointing to a particular item, compared to the total number of incoming feature connections? -- The incoming feature connections I call stimulations.

There's one condition applied to recognition: if a node gets a particular number of stimulations, e.g. two, that very node will be considered "active" itself, hence stimulating its successor nodes as well. For a basic implementation of recognition, this approach is enough. A more sophisticated kind of recognition also considers nodes that were merely stimulated, but not activated.
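This activation condition can be sketched in Ruby: given features stimulate their successors, and any node reaching the threshold becomes active and stimulates its own successors in turn. A sketch under stated assumptions -- the threshold of two and the graph shape below are illustrative, not taken from the actual implementation:

```ruby
THRESHOLD = 2  # stimulations needed for a node to become active

# edges: { node => [successor, ...] }; returns stimulation counts
def recognize(given_features, edges)
  stim = Hash.new(0)
  frontier = given_features          # given features count as active
  until frontier.empty?
    newly_active = []
    frontier.each do |node|
      (edges[node] || []).each do |succ|
        stim[succ] += 1
        newly_active << succ if stim[succ] == THRESHOLD
      end
    end
    frontier = newly_active          # propagate one level further
  end
  stim
end
```

For instance, with <four legs> and <tail> both pointing to <animal>, and <animal> pointing to <cat> and <dog>, the given features <four legs>, <tail>, <meows> activate <animal> and then <cat>, while <dog> ends up merely stimulated.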


However, having recognition at hand -- even at this most basic level -- would finally allow the above approach to tagging to leave the implications alone: giving only a handful of features would be enough to determine the item meant. Also, applied to online search, the search engine could determine the x most probably meant items and include them in the search.

Besides these, I see one big chance: currently, if a person gets hold of a physical item they do not know and cannot identify, they need to have it recognized by someone else. Usually your vendor -- to whom you have to move the part. Some parts are heavy; others cannot be moved easily. And you are hindered from using common tools to identify the object: search engines don't operate on visual appearance, and information science tools like thesauri and classifications fail completely, simply because they prefer abstraction relationships over partial ones.

Using software able to recognize items by features would overcome such issues completely, and independently of the kind of feature: it would not matter whether the feature was a word -- i.e. a name or label -- or a color, shape, taste, smell or other. And, unlike with relational databases, there would be no need for another table dimension for each feature -- just a unified method to attach new features to items.

Reorganizing The Content

Also directly related to that peer-to-peer network in mind: peers exchanging item data sets -- e.g. nodes plus context (connections and attached neighbor nodes) -- could result in heavy wiring, superfluous chains of single-feature-single-item definitions, and lots of unidentified implicit item definitions. That needs to be avoided.

Since the chains can often just be cut down, I concentrated on the cases of unidentified implicit definitions. For simplicity, I imagined a set of features pointing to sets of items. Some of the features might point to the same items in common, e.g. the features <four legs>, <body>, <head>, and <tail> in common would point to the items <cat>, <dog>, and <elephant>. You might notice that <cat>, <dog>, and <elephant> are all animals, and also that all these animals feature a body, four legs, a head, and a tail. Thus, <animal> is one implication of this set of features and items. The implication is not mentioned explicitly, but it's there.

Consequently, the whole partial network could be replaced by another one, mentioning the <animal> node as well: <four legs>, <body>, <head>, and <tail> would become features of the new <animal> node, and <animal> itself would become common feature of <cat>, <dog>, and <elephant>.

By that, the network would become more straight (since the number of connections needed would be reduced from number of features * number of items to only number of features + number of items), hence also more lightweight. Also, the implied items would become visible.
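The rewrite itself can be sketched in Ruby. A minimal sketch, applicable only in the clean case where all features share exactly the same successor set; `reorganize` and the node names are illustrative:

```ruby
# Replace a full features-x-items wiring by one intermediate node.
def reorganize(edges, new_node)
  items = edges.values.first
  # only applicable when every feature points to the very same items
  return edges unless edges.values.all? { |succ| succ == items }
  rewired = edges.keys.map { |f| [f, [new_node]] }.to_h
  rewired[new_node] = items
  rewired
end

before = {
  "four legs" => %w[cat dog elephant],
  "body"      => %w[cat dog elephant],
  "head"      => %w[cat dog elephant],
  "tail"      => %w[cat dog elephant],
}
after = reorganize(before, "animal")
```

The connection count drops from 4 * 3 = 12 to 4 + 3 = 7, and the implied <animal> node has become explicit.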

While this approach makes the implications visible, it opens two new doors: one, the identified implications cannot be named without the help of a human -- at least not easily. (Recognition could do the honours, but I skip that here.) The second issue is that the newly introduced node ("<animal>") conflicts with recognition. For example: if there was another node, e.g. <meows>, directly pointing to the <cat> node, then after the reorganization a given set of only <meows> and <head> would not result in <cat> anymore, since each of the given features would, yes, stimulate its successor nodes, but not activate them. -- Actively receiving stimulations from predecessor nodes could be a solution, but I am not yet quite sure. As mentioned initially, this is a work in progress.


However, reorganization would automate identification of implications. People could provide labels for the added implication nodes. -- Which induces another effect of the model.

Overcome the Language Barrier

I mentioned that I kept away from connections injecting knowledge unreachable to the software. That's not all: I strive to completely avoid any kind of unreachable content that needs an external mind/knowledge store to provide the interpretation of such injected content/knowledge. Hence, I also avoided operating on the basis of labels for the items.

Instead, all the items and features (whereby "features" is just another name for items located a level below the items initially considered) get identified by [only locally] unique IDs. The names of the items I'd store somewhere else, so that item names in multiple languages could point to the very same item. That helps with localization of the system, but it also opens the chance to overcome an issue dictionaries cannot manage: there are languages that do not provide an immediate translation for a given word -- because the underlying concept is different. The English term "to round out" and the German "abrunden" are such an example: in fact, the German variant takes an external point of view ("to round by taking something away"), while the English obviously takes an internal one.
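Keeping the labels outside the net can be sketched as a separate store mapping node IDs to per-language names. The IDs and names here are made up for illustration:

```ruby
# { node_id => { language => label } } -- the net itself never sees labels
LABELS = Hash.new { |h, k| h[k] = {} }

def label(node_id, lang, name)
  LABELS[node_id][lang] = name
end

# Find the node a label in a given language points to (nil if unknown).
def lookup(lang, name)
  LABELS.find { |_, names| names[lang] == name }&.first
end

label(17, :en, "cat")    # hypothetical node ID 17
label(17, :de, "Katze")
```

Both `lookup(:en, "cat")` and `lookup(:de, "Katze")` resolve to the same node, without the node itself carrying any word.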

By sticking not to labels but to items/notions, the model offers the chance to label the appropriate, maybe even slightly mismatching, nodes: the need to label exactly the same -- but probably not exactly matching -- node is gone. -- In a word: I think that in different cultures many notions differ slightly from similar ones of other cultures, but each culture labels only its own notions, ignoring the slightly different points of view of the other cultures. -- This notions/labeling issue I imagine as puddles: each culture has its own puddle for each of its notions. From the point of view of different languages, some puddles overlap greatly, maybe even totally, but some have no intersection at all. Those are the terms having no counterpart in the other language.

In the long term, I consider the approach of building upon notions/items -- not upon words -- a real chance to overcome the language barrier.

Conclusion

Besides the option of dropping the custom of labeling different items by the same name (as thesauri tend to do to reduce fostering effort) and the possible long-term chance of overcoming the language barrier, I see three main benefits for the web, mainly for tagging and online search:
  1. Tagging could be reduced to the core tags; all implications could be left to the underlaying ontology.
  2. Based on the same ontology, during search, sets of keywords could be examined to "identify" the items most probably meant. The same approach could be applied by the peer to peer network to exchange item data sets.
  3. Finally, reorganization would keep the ontology graph lightweight and therefore ease the fostering. Also, the auto-detection of implications would support users in keeping their tags clear and definite. That could reduce blur in tag choice, thus increasing precision in search/search results.


      
Updates:
none so far