FAQ (Fletcher-Anticipated Questions)
Exploring Words and Phrases from the British National Corpus*

What are...?

n-grams
on this site, n-grams are sequences of n words (words as defined below). In this database, n can be any number from 1 to 6, i.e. from individual words up to six-word phrases. Only words and phrases occurring at least five times in the BNC are included. Relatively frequent n-grams are typically familiar building blocks of English; such recurrent n-grams are also known as lexical bundles, lexical chains or clusters. Shorthand forms like 1-gram, 2-gram, 3-gram etc. specify the value of n; some prefer unigram, bigram, trigram etc. In information retrieval and computational linguistics, the term n-gram more frequently means "sequence of n characters".
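A minimal sketch of how a word n-gram table with the limits described above (n from 1 to 6, minimum frequency 5) could be built from a list of tokens. It assumes the text has already been split into word tokens; how the actual database was compiled (for example, whether n-grams may cross sentence boundaries) is not described here, and count_ngrams is a hypothetical helper.

```python
from collections import Counter

def count_ngrams(tokens, max_n=6, min_freq=5):
    """Count every n-gram of length 1..max_n in a token list and keep
    only those that occur at least min_freq times."""
    counts = Counter()
    for n in range(1, max_n + 1):
        # slide a window of width n across the token sequence
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}

tokens = ("at the end of the day " * 5).split()
bundles = count_ngrams(tokens)
print(bundles[("the",)])                                  # 10
print(bundles[("at", "the", "end", "of", "the", "day")])  # 5
print(("day", "at") in bundles)                           # False: only 4 occurrences
```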
phrase-frames
sets of phrases (n-grams) that are identical except for one word, dubbed the "wildword" and represented by the wildcard sign *. For example, at the * of is a phrase-frame with variants like at the start of, at the end of, at the heart of etc. Phrase-frames are useful tools for discovering phraseological patterns. Guidelines for choosing between n-grams and phrase-frames are given in the tutorials. Parallel to 3-gram etc., this site uses 3-frame etc. as shorthand for "phrase-frame of three words", and p-frame is a handy stand-in for phrase-frame.
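Phrase-frames can be derived from an n-gram table by replacing each position in turn with * and grouping the n-grams that share the resulting pattern. The sketch below is a hypothetical illustration, not the site's own code; the requirement of at least two distinct variants per frame (min_variants) is an assumption.

```python
from collections import defaultdict

def phrase_frames(ngram_counts, min_variants=2):
    """Group n-grams that are identical except for one position.

    ngram_counts maps word tuples to frequencies; the result maps each frame
    (a tuple containing one "*") to {wildword: frequency}."""
    frames = defaultdict(dict)
    for gram, freq in ngram_counts.items():
        if len(gram) < 2:  # a one-word "frame" would be just "*"
            continue
        for i, word in enumerate(gram):
            frame = gram[:i] + ("*",) + gram[i + 1:]
            frames[frame][word] = freq
    # keep only frames with enough distinct fillers in the wildword slot
    return {f: v for f, v in frames.items() if len(v) >= min_variants}

counts = {
    ("at", "the", "start", "of"): 7,
    ("at", "the", "end", "of"): 12,
    ("at", "the", "heart", "of"): 5,
}
for frame, variants in phrase_frames(counts).items():
    print(" ".join(frame), variants)
# at the * of {'start': 7, 'end': 12, 'heart': 5}
```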
words
lexical units as identified by the BNC's CLAWS parser with POS tags, including "multiword units". "Fused forms" are split up into morphemes, each tagged as a separate word token. Orthographic variants of the same lexeme (database / data-base, realise / realize) appear as different lexical units. Compound nouns written as separate words, with a space instead of a hyphen, are split into their components, so data base is treated as two lexical units.
multiword units
phrases that function grammatically as single words, e.g. the conjunction so that or the preposition in spite of, receive a single POS tag, so they are treated here as single words. To make this obvious in search results, they are displayed with underscores instead of spaces: so_that, in_spite_of. To search for multiword units, you must enter them in a single query field and use underscores, not spaces. Since spaces separate alternative word forms within a field, the word-form filter in spite of matches in OR spite OR of. List of multiword units.
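The two rules just described (underscores join a multiword unit into one entry; spaces separate alternatives) can be modeled in a few lines. The function names are hypothetical and only mimic the described behavior of the query form; they are not the site's code.

```python
def to_multiword_entry(phrase):
    """Join the parts of a multiword unit with underscores for a single query field."""
    return "_".join(phrase.split())

def field_alternatives(field_value):
    """Split a word-form field into alternatives: spaces mean OR."""
    return field_value.split()

def field_matches(word, field_value):
    """True if a word token satisfies the field (any alternative may match)."""
    return word in field_alternatives(field_value)

print(to_multiword_entry("in spite of"))            # in_spite_of
print(field_alternatives("in spite of"))            # ['in', 'spite', 'of']
print(field_matches("in_spite_of", "in spite of"))  # False
print(field_matches("in_spite_of", "in_spite_of"))  # True
```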
fused forms
multiple morphemes written without a space in English, such as cannot, he'd, George's, are "de-fused" by the parser into can not, he 'd, George 's. Different POS tags clarify whether 'd stands for had or would, and whether 's comes from is or has or else marks a possessive. List of fused forms.
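A rough sketch of the "de-fusing" step, using a regular expression over the patterns listed above. It is an approximation under stated assumptions, not the CLAWS tokenizer: it handles only straight apostrophes, adds 've, 'll and 'm beyond the examples given here, does not assign the POS tags that distinguish 'd (had/would) or 's (is/has/possessive), and does not reproduce irregular cases such as can't. The name defuse is hypothetical.

```python
import re

# Clitic endings: 's, 'd, 're, n't from the examples above, plus 've, 'll, 'm
# (an assumption). Straight apostrophes only.
_CLITIC = re.compile(r"^(.+?)(n't|'s|'d|'re|'ve|'ll|'m)$", re.IGNORECASE)

def defuse(word):
    """Split a fused form into separate tokens, roughly as described above."""
    if word.lower() == "cannot":
        return [word[:3], word[3:]]       # cannot -> can not
    m = _CLITIC.match(word)
    if m:
        return [m.group(1), m.group(2)]   # he'd -> he 'd, won't -> wo n't
    if len(word) > 1 and word.endswith("'"):
        return [word[:-1], "'"]           # parents' -> parents '
    return [word]                         # not a fused form

for w in ["cannot", "he'd", "George's", "don't", "won't", "parents'", "table"]:
    print(w, "->", " ".join(defuse(w)))
```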
filters
query conditions that narrow the matching dataset by "filtering out" unwanted items. Filtering can be done by words, POS codes and/or frequency, and multiple forms can be specified either to include in or to exclude from the dataset.
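Filtering can be sketched as a function over a list of n-gram records. Everything here is hypothetical: the record layout (NGram), the per-position include/exclude sets (SlotFilter) and the frequency range are assumptions chosen to mirror the description above, not the site's implementation.

```python
from dataclasses import dataclass

@dataclass
class SlotFilter:
    """Include/exclude sets for one position (an empty include set allows anything)."""
    include: frozenset = frozenset()
    exclude: frozenset = frozenset()

    def accepts(self, value):
        return (not self.include or value in self.include) and value not in self.exclude

@dataclass
class NGram:
    words: tuple
    tags: tuple
    freq: int

def apply_filters(items, word_filters=None, tag_filters=None, min_freq=0, max_freq=None):
    """Keep n-grams that pass every positional word/tag filter and the frequency range."""
    word_filters = word_filters or {}
    tag_filters = tag_filters or {}
    kept = []
    for item in items:
        if item.freq < min_freq or (max_freq is not None and item.freq > max_freq):
            continue
        # a filter on a position the n-gram does not have rejects it
        ok = all(pos < len(item.words) and f.accepts(item.words[pos])
                 for pos, f in word_filters.items())
        ok = ok and all(pos < len(item.tags) and f.accepts(item.tags[pos])
                        for pos, f in tag_filters.items())
        if ok:
            kept.append(item)
    return kept

items = [
    NGram(("at", "the", "end", "of"), ("PRP", "AT0", "NN1", "PRF"), 12),
    NGram(("at", "the", "start", "of"), ("PRP", "AT0", "NN1", "PRF"), 7),
    NGram(("in", "the", "end"), ("PRP", "AT0", "NN1"), 9),
]
no_in = apply_filters(items, word_filters={0: SlotFilter(exclude=frozenset({"in"}))})
print([" ".join(i.words) for i in no_in])     # ['at the end of', 'at the start of']
frequent = apply_filters(items, min_freq=8)
print([" ".join(i.words) for i in frequent])  # ['at the end of', 'in the end']
```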
POS-tags
"Words" in the corpus are tagged with one of 57 three-character "Part Of Speech" codes; this list of POS codes explains them and gives examples of how they are applied. The w&pdb database permits searching for specific combinations of POS codes, specified either by choosing from a list or by entering them directly; wildcards can be used to match groups of related codes. Occasionally the code UNC (unclassified) is overused, for example for the ai of ain't, which is ambiguous but could be assigned manually to the proper form of BE or HAVE.
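To illustrate how wildcards can select groups of related codes, the sketch below uses Python's fnmatch module with shell-style * and ?. The wildcard syntax actually accepted by the w&pdb query form is not specified here, so treat it as an assumption; the tag list is a small illustrative subset of the BNC's C5 tagset, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase

# A small illustrative subset of the three-character BNC (C5) tags
C5_SUBSET = ["AJ0", "AT0", "NN0", "NN1", "NN2", "NP0", "PRF", "PRP",
             "VBB", "VBZ", "VHD", "VVD", "VVG", "UNC"]

def expand_pattern(pattern, tagset=C5_SUBSET):
    """List the tags that a shell-style wildcard pattern (* or ?) matches."""
    return [tag for tag in tagset if fnmatchcase(tag, pattern)]

def tags_match(tags, patterns):
    """True if each tag of an n-gram matches the pattern at the same position."""
    return len(tags) == len(patterns) and all(
        fnmatchcase(t, p) for t, p in zip(tags, patterns))

print(expand_pattern("NN?"))  # ['NN0', 'NN1', 'NN2']
print(expand_pattern("V*"))   # ['VBB', 'VBZ', 'VHD', 'VVD', 'VVG']
print(tags_match(("PRP", "AT0", "NN1", "PRF"), ["PRP", "AT0", "NN?", "PRF"]))  # True
```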

Why...?

Why do you only support Internet Explorer?
In this initial phase, the time required to develop and test for multiple browsers would detract from building the database and user interface. Webmasters report that over 85% of Website visitors use Internet Explorer (IE), and even more have access to IE on their machines. When this Website is stable and fully documented, I will strive for cross-browser compatibility. Incidentally, the compact and capable Opera 7 browser supports most of the IE features on this site, and most functions should also work in Netscape 6 and higher.
Why do I see no change in the results pane after editing the query parameters?
When you change any of the query parameters, you must click the "Query" button to start a new query. (The "Next" button in the results pane continues fetching subsequent chunks of the dataset from your previous query.)
Why do I only see the page heading in the results pane, but no results appear?
The current server can be very slow:  you may have to wait up to 90 seconds for results, and the server or your browser may "time out" while you are waiting. Some suggestions are...
Why do results show no matches for a phrase that must be in the BNC?
This question has many possible answers:
Why are there no phrase-frames matching my query even though I find several variants in the database?
Why can't I save results pages with the "Save Page" or "Save Data" buttons?
These buttons require the ActiveX file system component and work only with the Windows version of Internet Explorer 5.x and later. In this browser, your security settings will prevent saving pages unless you have enabled ActiveX components to run either automatically or after prompting (in which case you will be nagged for permission each time). It is potentially unsafe to allow every site to run arbitrary components on your computer. The best solution is to add this site to the browser's "Trusted Sites" list (Tools menu > Internet Options... > Security tab: click the "Trusted sites" icon, then the "Sites" button, add this site to the list, uncheck "Require server verification...", then click "OK"). On this site, ActiveX is used exclusively to save Web pages. Users with security concerns are encouraged to verify this by inspecting the JavaScript function savepage( ) in the script file BNCresults.js.
Why can't I find common phrases like of course, in spite of?
Such "multiword units" are treated by the BNC's CLAWS parser as single words. Enter them in a single word field and replace the spaces with _ (underscore): of_course, in_spite_of. Complete list of multiword units.
Why can't I find contractions like don't, they're or possessives like children's, parents'?
Such "fused forms" are treated by the BNC's CLAWS parser as separate words. Enter each part in a separate word field: do n't, they 're, children 's, parents '. Note that "altered" forms like won't, ain't are segmented as wo n't, ai n't; the exception can't is segmented as can n't, parallel to cannot > can not. Complete list of fused forms.