


Don't search harder, search smarter

According to a filing with the U.S. Patent and Trademark Office, patent 7,158,961, Google is working on deploying a "similarity-engine." This similarity-engine compares documents and websites to detect redundancy.

A common problem for search engine users is receiving similar results during a search. Many of the websites returned will have either identical information or "roughly the same" information. With a similarity-engine in place, Google will be able to return the most relevant information while hiding or discarding repetitive data.

Google's patent filing claims:
From the search engine's perspective, one problem in cataloging the large number of available web pages is that multiple ones of the web documents are often identical or nearly identical. Separately cataloging similar documents is inefficient and can be frustrating for the user if, in response to a request, a list of nearly identical documents is returned. Accordingly, it is desirable for the search engine to identify documents that are similar or "roughly the same" so that this type of redundancy in search results can be avoided.

Google's similarity-engine project is not particularly earth-shattering. According to earlier reports, IBM, Hitachi and Visage Inc. are among the companies that have filed for similar inventions. In fact, over 15 patents for similarity-engines have been filed over the last 10 years.

According to Google, the similarity-engine will be based on creating vectors and calculating differences and sums over them. Using hashes and what Google calls "sketches," its engine will be able to compare differences in text as well as images. The similarity-engine will take an object, create a vector for it, and compare that vector to the vector of another object.
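
The idea of hashing an object's elements into a vector, condensing that vector into a "sketch," and comparing sketches can be illustrated with a small example. The code below is only a minimal sketch of one well-known approach of this kind (a simhash-style bit vote), and it is not taken from the filing. The 64-bit width, the use of SHA-256, and the names element_hash, sketch and similarity are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, List

BITS = 64  # sketch width; an assumption chosen for this example


def element_hash(element: str) -> int:
    """Map one element (for example, a word) to a stable 64-bit integer."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(elements: Iterable[str]) -> int:
    """Sum +1/-1 votes per bit position, then keep only the sign of each sum."""
    totals: List[int] = [0] * BITS
    for element in elements:
        h = element_hash(element)
        for bit in range(BITS):
            totals[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, total in enumerate(totals):
        if total > 0:
            result |= 1 << bit
    return result


def similarity(a: int, b: int) -> float:
    """Fraction of bit positions on which two sketches agree (0.0 to 1.0)."""
    differing = bin(a ^ b).count("1")
    return 1.0 - differing / BITS


if __name__ == "__main__":
    page_a = "google files patent for a similarity engine".split()
    page_b = "google files a patent for its similarity engine".split()
    print(similarity(sketch(page_a), sketch(page_b)))
```

Two objects whose elements mostly overlap tend to produce sketches that agree on most bits, so a search engine could, under these assumptions, treat results above some agreement threshold as "roughly the same" without comparing the full documents.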

Further into Google's filing, the search giant also describes the use of its similarity-engine in other applications. Besides web documents, the engine can be used to compare regular text documents, spreadsheets, presentations and other commonly used office productivity data.

"The concepts described could also be implemented based on any object that contains a series of discrete elements," the filing emphasized.
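
To illustrate what "a series of discrete elements" can mean in practice, the example below compares arbitrary sequences (words, spreadsheet cells, pixel values) by breaking them into overlapping runs and measuring how many runs they share. This swaps in a plain Jaccard overlap of shingles, which is a different and simpler technique than whatever the filing describes; the function names and the run width of 2 are assumptions.

```python
from typing import Hashable, Sequence, Set, Tuple


def shingles(items: Sequence[Hashable], width: int = 2) -> Set[Tuple[Hashable, ...]]:
    """Return the set of overlapping runs of `width` consecutive elements."""
    if len(items) < width:
        # A short sequence contributes itself as a single run, if non-empty.
        return {tuple(items)} if items else set()
    return {tuple(items[i:i + width]) for i in range(len(items) - width + 1)}


def jaccard(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Shared runs divided by total distinct runs; 1.0 means identical sets."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    row_a = ["Q1", 1200, "Q2", 1350, "Q3", 1400]
    row_b = ["Q1", 1200, "Q2", 1350, "Q3", 1425]
    print(jaccard(row_a, row_b))
```

Because the functions only require that elements be hashable, the same comparison applies unchanged to text tokens, spreadsheet rows or any other ordered data.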


Comments



cool
By mendocinosummit on 1/6/2007 8:36:53 PM , Rating: 2
Hopefully you have the option to turn it off and also know when it is on.




RE: cool
By Lord Evermore on 1/6/2007 9:06:18 PM , Rating: 2
You're given a notice when Safe Search is on, and there's always (or at least for a long time) been a notice if they've hidden any similar sites (though it shows up at the end of the results on the last page so you don't know until you've looked at them all). Is this a new development of that feature, or just finally patenting what they've been doing all along?

There are many times this would be useful, and probably lots of times you'd want to turn it off. I certainly get royally pissed when I have to look at 13 pages of results that are identical copies of press releases or "syndicated" style articles, or the same discussions shared among different sites, and none of them are actually relevant to what I'm looking for. But then other times, you might be looking for different comments on an article, and those might all be filtered since the main article is the same.


RE: cool
By TheeKat06 on 1/7/2007 1:05:50 AM , Rating: 3
Good point. I totally agree with you on that.
It's a great tool when you're looking for a variety of results, but when you need to get different points of view on the same topic, well, that could become a downside.


RE: cool
By vdig on 1/8/2007 11:28:09 AM , Rating: 1
Definitely needs to be toggleable. Still, this is a feature I look forward to. I have seen many web pages, and have had my share of web pages that trick me into viewing the same blasted page numerous times. Quite a few were links to adult sites, even though I was not searching for adult sites to begin with. I have since learned to avoid them by identifying telltale signs in the paragraph in Google's excerpt.

This tool will knock down how many pages I need to go through to get past the trap websites and get to the content I desire. That is a great positive in my book, and I hope this is implemented as soon as possible.


This will be great
By S3anister on 1/8/2007 2:30:58 PM , Rating: 2
but like most Google products, stay in beta forever.




RE: This will be great
By S3anister on 1/8/2007 2:31:30 PM , Rating: 2
sorry, don't know why it double posted.






Copyright 2014 DailyTech LLC. - RSS Feed | Advertise | About Us | Ethics | FAQ | Terms, Conditions & Privacy Information | Kristopher Kubicki