Challenges to Responsible Research (2)
Representativity of Web as Corpus
Much ill-formed or fragmentary language
Domain only a rough clue to provenance
Numbers vs. Statistics
Search engines report the number of pages matching
a query, not the number of actual citations
One page may contain alternate usages
Narrower filters may eliminate some pages
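
The gap between page counts and citation counts can be illustrated with a small sketch. It assumes a toy in-memory corpus in which each string stands in for one web page. The sample texts, the phrases, and the helper names `page_hits` and `occurrences` are hypothetical and not drawn from any real search engine.

```python
import re

# Hypothetical mini-corpus: each string stands in for one web page.
pages = [
    "different from the rest, and different from before",
    "different than expected, though some say different from",
    "nothing relevant here",
]

def page_hits(phrase, docs):
    """Count pages containing the phrase at least once (a hit-count analogue)."""
    return sum(1 for d in docs if phrase in d.lower())

def occurrences(phrase, docs):
    """Count every occurrence of the phrase across all pages (actual citations)."""
    pattern = re.compile(re.escape(phrase))
    return sum(len(pattern.findall(d.lower())) for d in docs)

def pages_with_both(a, b, docs):
    """Count pages that contain both alternate usages."""
    return sum(1 for d in docs if a in d.lower() and b in d.lower())

for phrase in ("different from", "different than"):
    print(phrase, "pages:", page_hits(phrase, pages),
          "occurrences:", occurrences(phrase, pages))
print("pages with both usages:",
      pages_with_both("different from", "different than", pages))
```

In this toy data the page count and the occurrence count disagree for one phrase, and one page is counted toward both usages, so ratios of hit counts need not match ratios of actual use.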