Query Google

NO LONGER FUNCTIONING

Query Google was a very nice program that let you automatically and rapidly Google a list of words and retrieve the hit counts. It was great for obtaining frequency values of words for use by researchers in linguistics and psycholinguistics.  My colleagues and I used it in writing these papers.

To tell you the truth:  I can understand Google's interest in forbidding mass searches of this kind, and indeed they eventually did crack down, so the program no longer works.

If you learn new ways to do auto-Googling, particularly with Google's permission, I'd be curious to learn about them.  My email is listed below.

Bruce Hayes
Department of Linguistics
UCLA
bhayes@humnet.ucla.edu

Everything below this line is obsolete and listed for historical documentation only.


See the Update section below.


Run Query Google
Learn About Query Google:
Purpose
What you need to run Query Google
Specifying words to query
Running the program
Choosing the target language
Details
Questions and comments


Purpose of Query Google

The purpose of Query Google is to permit the linguist to obtain the corpus frequencies of a chosen set of items from the Web, using the well-known Google search engine.  The linguist prepares a list of forms for which (s)he wants to know the frequencies, and the program rapidly queries Google, obtaining the frequencies of each form.


What you need to run Query Google

Since Query Google is a Web applet, it should (in principle) be able to run on any type of computer.  However, you do need to have the Sun Java Virtual Machine installed on your computer.  If you don't have this software installed (test:  try running Query Google and see what happens), you can obtain it for free from Sun Microsystems.


Specifying words to query

You should place the words you want to query in an input file, which should be a plain text file.  

To create a plain text file, use the "Save As Text" option in your word processor, or use a simple text editor like Notepad (PC) or SimpleText (Mac) that comes with your computer.  Each line of the text file should consist of a single word whose frequency you want Query Google to find.  If you want the frequency of a phrase of more than one word, be sure to put quotation marks ("") around the whole phrase.
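The input format just described can be sketched in a few lines of Python.  This is only an illustration, not part of Query Google itself: the function names are hypothetical, the sample terms are the ones used in the example file on this page, and skipping blank lines is an assumption about how a reader of such files might reasonably behave.

```python
def read_query_list(text):
    """Return the non-empty lines of an input file as query terms."""
    terms = []
    for line in text.splitlines():
        term = line.strip()
        if term:  # assumption: blank lines are simply ignored
            terms.append(term)
    return terms


def is_phrase(term):
    """True when the term is wrapped in double quotes (a phrase query)."""
    return len(term) >= 2 and term.startswith('"') and term.endswith('"')


def needs_quotes(term):
    """True when a line holds several words but is not quoted as a phrase."""
    return not is_phrase(term) and len(term.split()) > 1


# Sample contents of a plain text input file, one item per line.
sample = 'Saussure\nJakobson\nBloomfield\nSapir\n"Noam Chomsky"\n'

terms = read_query_list(sample)
problems = [t for t in terms if needs_quotes(t)]

print(terms)
print(problems)  # lines that should have been quoted, if any
```

A check like needs_quotes is handy before a long run, since a multi-word line without quotation marks would not be searched as a phrase.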

Here is an example of a legal input file:

Saussure
Jakobson
Bloomfield
Sapir
"Noam Chomsky"


Running Query Google

To run Query Google, click on this link or on the Run Query Google link above.  Wait for a bit, while the Java programming language gets itself ready.  After a few seconds, you will see the program window, which looks like this:

Picture of the Query Google interface

Click on the Choose button.  A new window, Choose Input File, will pop up, which will permit you to navigate through the folders on your computer and locate your input file.  Once you've found it, click on the file name to choose it.  Then click Choose, and you will be returned to the main Query Google window.

Now, click Do Word Count.  Query Google will now ask Google to find the frequencies of the words (or phrases) in your file.  Currently, on a fast Internet connection, it can cover about 25 words or phrases per second.  A "Working..." window will pop up and show you how far Query Google has gotten.  When it tells you "Done", click on Close.

The results of Query Google are stored in a text file, in the same folder as your original list of forms.  For example, if your input file is called YourFileName.txt, then the output file will be called Results for YourFileName.txt.
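If you process many input files, the naming rule above is easy to reproduce in a script.  Here is a minimal Python sketch; the function name results_path is hypothetical, and the only thing taken from Query Google is the "Results for " prefix and the fact that the output sits in the same folder as the input.

```python
from pathlib import Path


def results_path(input_path):
    """Return the output path: same folder, name prefixed by 'Results for '."""
    p = Path(input_path)
    return p.with_name("Results for " + p.name)


print(results_path("YourFileName.txt"))
```

With a folder in the path, for instance results_path("lists/YourFileName.txt"), the folder part is kept and only the file name changes.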

Here is an example.  On reading the input file shown above, Query Google (run in July 2003) returned:

Saussure 103000
Jakobson 57600
Bloomfield 803000
Sapir 61800
"Noam Chomsky" 182000

Since these counts are rather large, Google rounded them off to the nearest hundred or coarser (each value above has three significant digits).  For lower numbers, you get the exact value (for example, querying "Leonard Bloomfield" returned exactly 1790).
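To use such results in further analysis, you would typically read them back into a program.  The Python sketch below assumes that each output line is the item followed by whitespace and then the count, as in the listing above; the exact separator Query Google wrote is not documented here, and the function name parse_results is hypothetical.

```python
def parse_results(text):
    """Map each queried item to its hit count."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST run of whitespace, so that quoted phrases
        # containing spaces stay in one piece.
        term, count = line.rsplit(None, 1)
        counts[term] = int(count)
    return counts


# The July 2003 results quoted above.
july_2003 = '''Saussure 103000
Jakobson 57600
Bloomfield 803000
Sapir 61800
"Noam Chomsky" 182000
'''

counts = parse_results(july_2003)
most_frequent = max(counts, key=counts.get)
print(most_frequent, counts[most_frequent])
```

Splitting from the right is the one design choice that matters here: a plain split on spaces would break "Noam Chomsky" into two pieces.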


Choosing the target language

Just like Google itself, Query Google permits you to restrict your search to a particular language.  You choose this by checking the box you want on the right side of the Query Google interface.

Google will probably be adding more languages in the future.  You can find the current set of languages, along with the special Google codes, at this Web page.  If your target language is not among the check boxes, look up both its name and its code and type them in the Add Custom Language text box of the Query Google interface.


Details

Query Google provides two methods for automatic Google queries.  The recommended method is the one labeled "HTTP" on the Query Google interface (upper left corner).  This method is selected for you automatically if you don't make a choice.

If for some reason the "HTTP" method doesn't work, try the alternative "API" method.  To do this, you must first obtain a (free) authorization code ("license key"; click on link to get one from Google).  With this method, you'll have to enter the license key in the window near the upper left corner of the interface.   Query Google will remember this key the next time you use it, so long as you answer "yes" to the question "Save settings?" when you exit the program.

The "API" method currently limits you to 1000 queries per day.
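The two figures given on this page (about 25 items per second for the "HTTP" method, and 1000 queries per day for the "API" method) make it easy to estimate how long a word list would take.  The Python sketch below is just that arithmetic; the constant and function names are hypothetical, and the 25-per-second rate is the approximate value quoted above for a fast connection, not a guarantee.

```python
import math

API_QUERIES_PER_DAY = 1000   # limit stated above for the "API" method
HTTP_ITEMS_PER_SECOND = 25   # approximate rate quoted for the "HTTP" method


def api_days_needed(n_items):
    """Days needed under the API limit, at one query per item."""
    return math.ceil(n_items / API_QUERIES_PER_DAY)


def http_seconds_needed(n_items):
    """Rough running time in seconds for the HTTP method."""
    return n_items / HTTP_ITEMS_PER_SECOND


n = 2500  # size of a hypothetical word list
print(api_days_needed(n), "days by API")
print(http_seconds_needed(n), "seconds by HTTP")
```

For a list of 2500 items this comes to 3 days by the "API" method against about 100 seconds by the "HTTP" method, which is why the latter was the recommended one.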


Questions and Comments

Please direct questions and comments to Bruce Hayes at bhayes@humnet.ucla.edu.


Update

I'm not sure how many searches you can perform with this utility.  In April 2007 I tried a rather lengthy search and was (very politely) blocked by Google.  A similar search today, however, worked just fine.

Various academics I know who have contacts at Google have suggested that, if you do get blocked, it would be worth your while to make courteous inquiries to Google, indicating that your purpose is research; such inquiries are likely to get a sympathetic hearing.  So if the search you want to conduct is larger than what Google currently permits, I suggest you pursue this route.  I would appreciate hearing what you learn; my email address is given near the top of this page.

--Bruce Hayes, Department of Linguistics, UCLA, February 1, 2007.  

