swestrup
When LiveJournal user _sps_ and I talked about searching in NWO (a proposed replacement for the WWW), we wanted to integrate it into the basic fabric of web surfing, so that every site that wanted a local search wouldn't have to implement it itself. Now, NWO is very, very, very much NOT a monolithic design, and it would fully support companies like Google and AltaVista that wanted to specialize in providing NWO search-system components. In fact, we rather assumed that such components would rapidly proliferate once NWO took off.

Anyway, that is just the background. So, when I was using Google and thinking "What's wrong with this service?", I was imagining it in terms of what Google over NWO should provide.

What occurred to me was a combination of the Google, DMOZ and Northern Light search engines.

See, Google is really good at indexing a farkload of stuff and has primitive (but reasonably good) general relevancy metrics. What it lacks is any sort of relevancy calculation relative to your current search. If I am looking for information on a consumer product (I've just been googling for the GA-7VM400AMF motherboard), it ranks hits by the popularity of the result pages. Most of these pages are for stores. Well, I've already got one™, so they are of no use. What I want are hardware reviews that give the specs for the onboard UniChrome graphics processor. But every sales page uses the words "review" and "UniChrome" with respect to this product, so my attempts to narrow down the search have failed miserably.

A close inspection of the sales pages that Google returns reveals that they all use the same stock phrases cribbed from VIA's product announcement for the KM400A chip. Now, Northern Light will notice something like that, and will group hits into categories by similarities in the results. This is done automagically for you, and helps in weeding things out, but its categorization has obvious failings. In fact, it had such failings that Northern Light no longer provides a free search service, and is trying to sell search-engine licenses instead.
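
Northern Light's own clustering method isn't described here, so as a stand-in, here is a minimal sketch of one standard way to spot pages built from the same stock phrases: break each hit into overlapping five-word "shingles" and group hits whose shingle sets overlap heavily (Jaccard similarity). The function names, the five-word shingle size and the 0.5 threshold are all hypothetical choices for illustration.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_hits(hits: dict[str, str], threshold: float = 0.5, k: int = 5) -> list[set[str]]:
    """Group hit URLs whose page text shares many k-word shingles.

    Union-find makes grouping transitive: if A~B and B~C, all three
    end up in one group even when A and C are not directly similar.
    """
    parent = {url: url for url in hits}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    sigs = {url: shingles(text, k) for url, text in hits.items()}
    for a, b in combinations(hits, 2):  # O(n^2) pairs: fine for one page of hits
        if jaccard(sigs[a], sigs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for url in hits:
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up hit texts: two store blurbs reusing the same stock phrase, one review.
    hits = {
        "shop-a": "the km400a chipset delivers integrated unichrome graphics for mainstream desktops buy now",
        "shop-b": "the km400a chipset delivers integrated unichrome graphics for mainstream desktops in stock",
        "review": "we benchmarked the onboard graphics and found 2d speed acceptable but 3d weak",
    }
    for group in group_hits(hits):
        print(sorted(group))
```

In the demo, the two store blurbs share six of their ten distinct shingles (Jaccard similarity 0.6), so they fall into one group, while the review stands alone and can be kept when the store group is pruned.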

Then we have DMOZ. There, the categorization of hits is done carefully by hand, by someone who cares about the results. Thus, it's one of the best places to search for 'Singularity'. On the other hand, its contents are categorized by only a (relatively) few editors, so one will have no luck AT ALL trying to look up aerodynes there.

Now, the thing that NWO would bring to this mix is a framework supporting heuristic learning, informed by anonymized collaborative feedback mechanisms. What this means is that when NWO is asked to do a search, it will use the search-bot(s) ranked highest in terms of seeming to provide the information that the user wants. It will then take the highest-ranked classifiers and filters and use them to categorize the search results. Upon presentation, the user will be able to prune whole categories that seem irrelevant, or reclassify things based on what he's looking for. He can play with the categorization criteria and invent new categories to sort things into.

This activity will not only let the user quickly eliminate dead ends and expand upon interesting search avenues, but will also be carefully monitored by NWO so that it can determine whether it did a good job. NWO will use these statistics to update the rankings of its various bots, and to tweak their parameter lists so that they perform the exact same search (and, hopefully, similar ones as well) more efficiently and more precisely than they did last time. Then it will anonymize its tweaks and upload them to a distributed feedback database, where they will be statistically combined with the results from every other user. Thus, if anyone ELSE ever performs that search, or a similar one, they should get better results than the last person did. Over time, this will cause more and more of the web to be indexed in an increasingly semantic manner.
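
NWO was never specified at the code level, so the following is only a hypothetical sketch of the feedback loop described above, under two simplifying assumptions: a bot's quality is the fraction of its hits the user did not prune, smoothed with an exponential moving average, and the shared database combines anonymized tweaks by plain averaging. `BotRanking`, `FeedbackDatabase`, the query key and the parameter names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BotRanking:
    """Running quality score per search bot (hypothetical NWO component)."""
    alpha: float = 0.2  # weight given to the newest piece of feedback
    scores: dict[str, float] = field(default_factory=dict)

    def best(self, bots: list[str]) -> str:
        # Unseen bots start at a neutral 0.5 so they still get a chance.
        return max(bots, key=lambda b: self.scores.get(b, 0.5))

    def record(self, bot: str, kept: int, pruned: int) -> None:
        """Fold one search's outcome into the bot's score.

        The reward is the fraction of that bot's hits the user did NOT prune.
        """
        total = kept + pruned
        if total == 0:
            return
        reward = kept / total
        old = self.scores.get(bot, 0.5)
        self.scores[bot] = (1 - self.alpha) * old + self.alpha * reward


class FeedbackDatabase:
    """Combines anonymized parameter tweaks from many clients by averaging them."""

    def __init__(self) -> None:
        # (query_key, param) -> [sum of deltas, number of uploads]
        self._acc: dict[tuple[str, str], list] = defaultdict(lambda: [0.0, 0])

    def upload(self, query_key: str, deltas: dict[str, float]) -> None:
        # Only the query key and numeric deltas are stored: no user identifier.
        for param, delta in deltas.items():
            entry = self._acc[(query_key, param)]
            entry[0] += delta
            entry[1] += 1

    def consensus(self, query_key: str) -> dict[str, float]:
        """Mean delta for each parameter tweaked under this query key."""
        return {p: s / n for (q, p), (s, n) in self._acc.items() if q == query_key}
```

With `alpha = 0.2`, a previously unseen bot (neutral score 0.5) whose hits the user kept 8 times out of 10 moves to 0.56, nudging it ahead of untried bots for the next similar search. Plain averaging is just the simplest stand-in for the statistical combination the post describes.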

Now, the above is a very rough sketch of the idea, and as presented it has obvious drawbacks. For instance, not everyone will want their searches optimized the same way. Well, abstract user models get uploaded too, so that an NWO client can download the parameters that seem best suited to its model of its current user's desires. Moreover, its modeling of its user, its success in gathering stats, and its tweaking of its models are all subject to the exact same heuristic feedback system, so it should get better at all of these as time and technology advance.
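
One hypothetical reading of "download the parameters that seem best suited to its model of its current user" is a nearest-neighbour lookup: represent each abstract user model as a vector of preference weights and fetch the parameter set uploaded for the most similar model, here measured by cosine similarity. The feature layout and `pick_parameters` are assumptions, not part of any NWO design.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_parameters(
    local_model: list[float],
    shared: list[tuple[list[float], dict[str, float]]],
) -> dict[str, float]:
    """Return the parameter set uploaded for the abstract user model
    most similar to the local user's model."""
    if not shared:
        raise ValueError("no shared user models to choose from")
    _, params = max(shared, key=lambda entry: cosine(local_model, entry[0]))
    return params
```

For instance, with made-up features [wants reviews, wants prices, wants specs], a local model of [1.0, 0.0, 0.9] is far closer to a "researcher" model of [0.9, 0.1, 0.8] than to a "shopper" model of [0.1, 0.9, 0.2], so the client would fetch the researcher's parameters.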

Now couple all of this with the fact that the proposed NWO web-page format is semantically tagged (rather than formatting-tagged, as HTML is), so that the search engine will be able to look at the word "Aerodyne" on a page and know whether it's a reference to the science-fiction concept, the company, a make of airplane, or something else. Searching should be a whole lot nicer under the New Web Order!
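
The NWO page format isn't specified here, so this is a minimal sketch assuming each tagged term on a page carries a sense identifier alongside its surface text. The `Term` and `Page` structures and sense labels like `"sf-concept"` are hypothetical. The point is that the search filters on the sense, which plain keyword matching over HTML cannot see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str   # the word as it appears on the page
    sense: str  # hypothetical sense identifier supplied by the page's semantic tags


@dataclass
class Page:
    url: str
    terms: list[Term]


def pages_mentioning(pages: list[Page], word: str, sense: str) -> list[str]:
    """URLs of pages that use `word` (case-insensitively) in the given sense."""
    word = word.lower()
    return [
        p.url
        for p in pages
        if any(t.text.lower() == word and t.sense == sense for t in p.terms)
    ]


if __name__ == "__main__":
    # Three made-up pages that a keyword search for "aerodyne" could not tell apart.
    pages = [
        Page("sf-glossary", [Term("Aerodyne", "sf-concept")]),
        Page("company-site", [Term("Aerodyne", "company")]),
        Page("aircraft-db", [Term("aerodyne", "aircraft-type")]),
    ]
    print(pages_mentioning(pages, "aerodyne", "sf-concept"))  # ['sf-glossary']
```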
