Content Representation With A Twist

Tuesday, July 31, 2007

Finished a first piece of reorganization

Just finished: one core part of the reorganization -- finding large replaceable partial networks. I figure that might rid me of those duplicate feed news items, as having this functionality available might enable me to sort news by topic. ... Which makes me ponder wrapping this bit into a Rails site. ;-)

Updates:
none so far

Friday, July 27, 2007

chance for a MOM application: filtering old news out of RSS feeds

Development on the MOM SSC framework, and especially the implementation of one core part of the reorganizer, has lagged because I am still after getting a job (and dealing with other issues). Apparently, that search distracts more than actually having a job would.

However, I defend the time to read my feeds. But there I found a problem -- too much interesting news and too many repetitions of the same topic. I survived one Apple keynote season, and I endured the Vista market introduction. But when yet another wave of iPhone hype arrived, I began to feel nagged.

Now, as the iPhone wave gets prolonged by iPhone hacks, and as no one can hide from that Harry Potter hype, I am really getting annoyed. -- As the Model of Meaning provides the logic to detect similarities, I want a tool that identifies old news and variants of already known news, such as the latest iPhone hack or Potter p2p share.

Another way, besides looking up and dealing with the tags of feed entries, might be to take the words of any set of two or more articles and look for the sets of words they share. A more brute-force (and less MOM-like) approach would be to take word neighbourhoods (word sequences) into consideration. -- On the other hand, the tool-to-be could use WordNet to include synonyms in the comparison when looking for similarities between texts.
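
A minimal sketch of the shared-word-set and word-neighbourhood ideas, in Python (not MOM code): tokenization is plain word splitting, the 0.5 threshold is a made-up value rather than anything tuned, and the WordNet synonym expansion is left out here.

    import re

    # Two articles likely report the same news when their word sets overlap
    # strongly (Jaccard similarity); word neighbourhoods are the brute-force
    # variant. WordNet-based synonym normalization could precede either.

    def words(text):
        """Lowercased set of words in an article body."""
        return set(re.findall(r"\w+", text.lower()))

    def neighbourhoods(text, n=3):
        """Set of word sequences of length n (the brute-force variant)."""
        ws = text.lower().split()
        return set(tuple(ws[i:i + n]) for i in range(len(ws) - n + 1))

    def jaccard(a, b):
        """Set overlap: |intersection| / |union|, in [0, 1]."""
        return len(a & b) / len(a | b) if (a or b) else 0.0

    def is_known_news(item, seen, threshold=0.5):
        """True if `item` resembles any already seen article text."""
        return any(jaccard(words(item), words(old)) >= threshold for old in seen)

Filtering a feed would then just mean dropping entries for which is_known_news returns True.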

For that reason, I now see how I can get through with the aforementioned reorganizer core -- the one that actually detects similarities in order to save edges, i.e. storage -- logically in terms of edges as well as "physically" in terms of disk space.

Updates:
20070731: linked the word "lagged" to the most recent release posting

Friday, July 13, 2007

Positive hits in the "content representation" search results

Correct hits in the Google search for the term "content representation" (as opposed to hits that contained "content <something other than whitespace, such as punctuation> representation"): I went through the results from the end (page 79) towards the start, since I presumed more false hits the nearer to the end of the tail. But there were few false hits there.

The above results I picked from pages 79 and 78 only -- and I already learned a lesson: it might make more sense to apply some kind of clustering here instead of walking through the list manually. Even the intellectual check of whether there is anything in between "content" and "representation" -- to filter out false hits -- can be done by software.
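
For illustration, that mechanical check might look like this in Python; the snippet strings below are made-up inputs standing in for result snippets, not anything I actually ran against Google.

    import re

    # A hit only counts when nothing but whitespace separates the two words;
    # "content, representation" and the like are false hits.
    EXACT_PHRASE = re.compile(r"\bcontent\s+representation\b", re.IGNORECASE)

    def is_true_hit(snippet):
        return bool(EXACT_PHRASE.search(snippet))

    snippets = [
        "a model for content representation on the web",  # true hit
        "content, representation and meaning",            # false hit
    ]
    true_hits = [s for s in snippets if is_true_hit(s)]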

I'd like to learn the most often used terms (besides "content representation"), and, with the help of that clustering/visualization, I want a chance to ignore obvious false hits.

That demands using -- and getting my hands on -- the Google API.

Updates:
none so far

wanted: tag cloud for the other pages mentioning the term "content representation"

I'd like to learn what all these 91,900 search results related to content representation might be about. (Curiously, I wonder where I left the article directly pointing to that search result -- when it was still "only" 88,900.)

To learn that quickly, first I need to decide whether to inspect the pages manually or "mechanically". Then, I'd need to learn how to use the Google API to quickly get all the hits -- which actually end at page 78, which in fact means not 90-thousand-plus search results but only a "small" number of 788 hits.
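
The paging part could be as simple as the following Python sketch; note that google_search is a hypothetical stand-in for whatever the real Google API offers (I still have to learn its actual interface), and ten results per page is an assumption -- only the paging logic is the point.

    # `google_search(query, start)` is a HYPOTHETICAL stand-in for the real
    # Google API call; it is assumed to return up to `page_size` hits
    # beginning at offset `start`, and an empty list once the ~788 reachable
    # hits are exhausted.

    def fetch_all_hits(query, google_search, page_size=10):
        hits, start = [], 0
        while True:
            page = google_search(query, start=start)
            if not page:
                break
            hits.extend(page)
            start += page_size
        return hits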

However, since I'd like to redo this search every now and then, and as I might like to run the search against sites like CiteSeer as well, it might be worth the effort to develop a small program which helps me determine the content of all the pages. -- A tag cloud and toying around with precision and recall might contribute a bit to the visualized cloud. -- The cloud terms' sizes could visualize quantity in recall, while precision might be indicated by color encoding, e.g. blue .. green .. yellow .. orange .. red, like on maps, where high precision might be indicated by red and low precision by blue.
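
A sketch of that encoding in Python, assuming precision (in [0, 1]) and recall counts are already computed per term; the hue walk from blue (240 degrees) down to red (0 degrees) and the font-size scaling are arbitrary choices of mine.

    import colorsys

    def precision_to_color(precision):
        """Map precision in [0, 1] to a hex color: blue .. green .. yellow
        .. orange .. red, by walking the HSV hue from 240 to 0 degrees."""
        hue = (1.0 - precision) * 240.0 / 360.0
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        return "#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255))

    def term_font_size(recall_count, base_pt=10, pt_per_hit=2):
        """The more often a term shows up, the larger it appears."""
        return base_pt + pt_per_hit * recall_count

    # precision_to_color(0.0) -> "#0000ff" (blue),
    # precision_to_color(1.0) -> "#ff0000" (red)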

There's a tag cloud generator available among Debian's Perl libraries. I already modified it, and it's available on demand. -- However, I'd prefer to have some place on the web to put my version. Is there any repository out there for that library?

Updates:
none so far

A hunger for analysis and play(ing), after cramming data into one's memory

After speed-reading a book on project management, my mind is starving for an analytical task to do. Not necessarily on the stuff just read/learnt, but on anything.

Might it be possible that being confronted with a bold set of news results in a bold number of newly available neurons -- ones that kind of want/need to be stored somewhere, to get wired in somewhere, or wired in better in case they are already wired in somewhat? Does that task press so urgently because it would feel unpleasant otherwise?

-- Indeed, the motivation behind that hunger for analysis might in fact be to give (any)thing a trial, to experiment.

Updates:
none so far

Thursday, July 12, 2007

Every beginning is hard.

The interesting question about "Every beginning is hard." is: why? -- Not for what reason, but from what origin? By what cause? What is it, on a neurological basis, that makes beginnings so hard? Is there any way to keep beginnings from being hard?

Updates:
none so far

Sunday, July 01, 2007

other models of meaning

Maybe worth a skim: search results on "Model of Meaning". (Searching for "Content Representation with a Twist" hasn't found anything so far, neither on Google nor on Yahoo -- although Yahoo's crawler has visited the MOM development project page over at gna.org.)

Updates:
none so far