July 10, 2007

10,000 Pairs of Eyes

In June, Google held a conference on scalability in Seattle. Marissa Mayer (unrelated to Tim) of Google commented on the 10,000 human evaluators Google has. Their function? Manually go through the search results and rate them.

Question: How do they tell if they have bad results?

Answer: By tracking uptime for various servers to make sure a bad one isn’t causing problems in their main focus. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results. Which keywords they type in wasn’t revealed.

Damn.

Now, it shouldn’t come as a big surprise that Google search has lots of different users who vary in age, sex, location, education, expertise, etc. According to Google’s data, the only factor that really influences how different users view search relevance is their location.

A key point in the session was the distinction between novice search users and expert search users. Novice users typically type queries in natural language, while expert users use keyword searches.

Example Novice and Expert Search User Queries

NOVICE QUERY: Why doesn’t anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington

NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area

On average, it takes a new Google user just 1 month to go from typing novice queries to being a search expert. This means that there is little payoff in optimizing the site to help novices since they become search experts in such a short time frame. So don’t waste your time there.

When asked why Google showed only 10 search results per page years ago, when Yahoo! was displaying 20, Mayer responded that they didn’t actually do any testing at the time; they showed 10 because that is what Alta Vista did. When they did test more results, they found that fewer people used Google … as it would take 0.4 seconds to show 10 results while it would take 0.9 seconds to show 25. That lag time caused the abandonment.

It was also discussed that Google is looking to improve search by not just searching for what you type, but also searching for what you mean:

User Query = Google Will Also Try This Query
unchanged lyrics van halen = lyrics to unchained by van halen
how much does it cost for an exhaust system = cost exhaust system
overhead view of bellagio pool = bellagio pool pictures
distance from zurich switzerland to lake como italy = train milan italy zurich switzerland

Now I, of course, love the first query. “Come on, Dave, Give Me a Break…” That deserves a video break.

That show was in Oakland, an hour north of where I grew up in San Jose. I can still recognize high school classmates when they show shots of the crowd.

Alright, back to Google … the belief is that the next big revolution is a search engine that understands what you want because it knows you. This means personalization is the next big frontier. A couple of years ago, the tech media was full of reports that a bunch of Stanford students had figured out how to make Google five times faster. This was actually incorrect. The students had figured out how to make PageRank calculations faster, which doesn’t really affect the speed of obtaining search results since PageRank is calculated offline. However, this was still interesting to Google, and the students’ company was purchased. It turns out that making PageRank faster means that they can now calculate multiple PageRanks in the time it used to take to calculate a single PageRank (e.g. country-specific PageRank, personal PageRank for a given user, etc.). The aforementioned Stanford students now work on Google’s personalized search efforts.
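To make the "multiple PageRanks" idea a bit more concrete, here is a minimal sketch of textbook power-iteration PageRank with a swappable teleport vector. This is not Google's implementation or the Stanford students' speedup; the function name, the toy graph, and the `teleport` parameter are all made up for illustration. The point is only that the same offline calculation, run with a different teleport vector, gives a different (e.g. per-user or per-country) ranking.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank.

    links:    dict mapping each page to the list of pages it links to.
    teleport: optional dict of page -> weight; biasing it toward a user's
              favorite pages yields a "personal" ranking (an assumption
              for illustration, not Google's actual method).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    if teleport is None:
        teleport = {p: 1.0 / n for p in pages}
    else:
        total = sum(teleport.values())
        teleport = {p: teleport.get(p, 0.0) / total for p in pages}

    rank = dict(teleport)
    for _ in range(iterations):
        # Every page gets its share of the "random jump" probability.
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back via the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


# Hypothetical four-page web.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

global_rank = pagerank(toy_web)
# Same graph, but all random jumps land on page "d".
personal_rank = pagerank(toy_web, teleport={"d": 1.0})
```

Each extra ranking costs another full run over the link graph, which is why a faster calculation matters even though none of this happens at query time.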

Speaking of personalization, iGoogle has become their fastest growing product of all time. Allowing users to create a personalized page and then opening up the platform to developers such as Caleb to build gadgets lets them learn more about their users. Caleb’s collection of gadgets garners about 30 million daily page views on various personalized homepages.

Q&A

Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However, they do focus on it for their voice search product, and they do believe that it is unfortunate that users have to adapt to Google’s keyword-based search style.

Q: How do the observations that are data mined about users’ search habits get back into the core engine?
A: Most of it happens offline, not automatically. Personalized search is an exception, and this data is uploaded periodically into the main engine to improve the results specific to that user.

Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.

Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.

Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in the search interface than their average user.

Q: Why did they switch to showing Google Finance before Yahoo! Finance when showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics, but since Google Finance shipped they decided to show their service first. This is now a standard policy for Google search results that contain links to other services.

Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various servers to make sure a bad one isn’t causing problems. In addition, they have 10,000 human evaluators who are always manually checking the relevance of various results.

Q: How do they deal with spam?
A: Lots of definitions for spam; bad queries, bad results and email spam. For keeping out bad results they do automated link analysis (e.g. examine excessive number of links to a URL from a single domain or set of domains) and they use multiple user agents to detect cloaking.
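As a rough illustration of the link analysis part of that answer, here is a small sketch that flags a URL when most of its inbound links come from one domain. It is only a toy: the function name, the thresholds, and the sample URLs are all assumptions for the example, and nothing in the session described how Google actually scores this.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def suspicious_targets(links, max_share=0.8, min_links=5):
    """Flag target URLs whose inbound links mostly come from one domain.

    links:     iterable of (source_url, target_url) pairs.
    max_share: hypothetical threshold; flag when a single domain supplies
               at least this fraction of a target's inbound links.
    min_links: ignore targets with too few links to judge.
    """
    inbound = defaultdict(Counter)
    for source, target in links:
        inbound[target][urlparse(source).netloc] += 1

    flagged = {}
    for target, domains in inbound.items():
        total = sum(domains.values())
        top_domain, count = domains.most_common(1)[0]
        if total >= min_links and count / total >= max_share:
            flagged[target] = (top_domain, count, total)
    return flagged


# Made-up crawl data: one page is propped up almost entirely by one domain.
sample_links = [
    (f"http://spamfarm.example/page{i}", "http://shop.example/pills")
    for i in range(9)
]
sample_links.append(("http://blog.example/post", "http://shop.example/pills"))
sample_links += [
    ("http://a.example/", "http://news.example/story"),
    ("http://b.example/", "http://news.example/story"),
    ("http://c.example/", "http://news.example/story"),
]

flagged = suspicious_targets(sample_links)
```

Extending this to a "set of domains", as the answer mentions, would mean grouping related domains before counting; that grouping is left out here.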

Q: What percent of the Web is crawled?
A: They try to crawl most of it, except what is behind sign-ins and product databases. For product databases they now have Google Base, and they encourage people to upload their data there so it is accessible to Google.

Q: When will I be able to search using input other than text (e.g. find this tune or find the face in this photograph)?
A: We are still a long way from this. In academia, we now have experiments that show 50%-60% accuracy but that’s a far cry from being a viable end user product. Customers don’t want a search engine that gives relevant results half the time.

Filed under Google by Jerry West
