Content Representation With A Twist

Sunday, March 01, 2009

progressing...

...given you've got the time and the right focus/objective, suddenly a knot opens and the flow comes...

It's amazing how quickly I am getting (rudimentary versions of) the major components of this project done, after having had it on my table for such a long time. This time it's the recognizer. I just implemented a primitive variant of it -- originally in fewer than twenty lines of code.

Operating on tags found in the Debian package descriptions database, it can help you quickly determine the Debian package(s) that suit you best, just by entering a few tags that describe what you are looking for.
 

Here are some examples:


The program reads from a list of tags known to exist within the Debian package descriptions database. Five of those tags get picked; they form the first line of each output, presented like an array. The next few lines tell how many packages had how many of the picked tags. [The dotted line is minimalist debug output.] Finally, you get the list of results, i.e. the packages that probably suit the user's needs best because they share the most tags with the query. The last line is a gimmick: it just tells the length of the results line in bytes.

To avoid overwhelming the user, the number of results shown is limited to five.
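
For readers who want to play with the idea, here is a minimal sketch in Python of ranking by shared tags, as described above. It is not the original recognizer code: the `recognize` function, the toy `PACKAGES` data and the tie-breaking by package name are all assumptions made for illustration. It counts how many query tags each package carries, builds the "how many packages had how many tags" tally, and returns at most five packages.

```python
from collections import Counter

# Hypothetical toy data standing in for the Debian tag database:
# package name -> set of tags attached to that package.
PACKAGES = {
    "pkg-a": {"editor", "text", "gtk"},
    "pkg-b": {"editor", "text", "console"},
    "pkg-c": {"game", "gtk"},
    "pkg-d": {"editor", "gtk", "text", "spell"},
    "pkg-e": {"network", "console"},
}


def recognize(query_tags, packages, limit=5):
    """Rank packages by the number of tags they share with the query.

    Returns a pair (histogram, results): histogram maps a number of
    shared tags to the number of packages with that overlap, and results
    lists up to `limit` package names, best match first.
    """
    query = set(query_tags)
    # Number of query tags each package carries.
    overlap = {name: len(tags & query) for name, tags in packages.items()}
    # Packages sharing no tag with the query are not candidates at all.
    overlap = {name: count for name, count in overlap.items() if count > 0}
    histogram = Counter(overlap.values())
    # Most shared tags first; ties broken by name (an assumption).
    ranked = sorted(overlap, key=lambda name: (-overlap[name], name))
    return histogram, ranked[:limit]


histogram, results = recognize(["editor", "text", "gtk"], PACKAGES)
for shared, count in sorted(histogram.items(), reverse=True):
    print(f"{count} package(s) share {shared} tag(s)")
print(results)
```

The cut-off is just a slice of the ranked list, so changing `limit` is enough to show more or fewer results.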
 

Now, the next question is where to get more collections like the Debian one, where you have items and a few tags for each item. Are you aware of any? Could you give directions? Pointers? Hints? -- I'd be glad!

      
Updates:
none so far