The Google Ngram data is a large dataset of ngrams. Each 5-gram record consists of five consecutive words together with how often they occurred in a given year, e.g. "lisa has gone to school 234 1978". There are approximately ten languages, each with about 700 ngram files that may expand to 10 GB each. I would like to extract the ngrams that contain pronouns (e.g. he, she) and measure their valence. This would make it possible to measure how groups are evaluated across different languages, contexts, and times.
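To make the goal concrete, here is a minimal Python sketch of the filtering and scoring step. It assumes each line is whitespace-separated in the layout of the example above (five words, then the count, then the year); the actual files may order or delimit the columns differently, so `parse_line` would need adjusting. The `VALENCE` dictionary is a hypothetical toy lexicon standing in for a real valence word list, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption); a real study would load a published valence word list.
VALENCE = {"good": 1.0, "happy": 1.0, "bad": -1.0, "sad": -1.0}
PRONOUNS = {"he", "she"}


def parse_line(line):
    """Parse a line assumed to be: five words, count, year (as in the example above)."""
    parts = line.split()
    if len(parts) != 7:
        return None  # skip lines that do not match the assumed layout
    words = [w.lower() for w in parts[:5]]
    try:
        count, year = int(parts[5]), int(parts[6])
    except ValueError:
        return None
    return words, count, year


def pronoun_valence(lines):
    """Return {(pronoun, year): count-weighted mean valence of lexicon words in the ngram}."""
    totals = defaultdict(float)
    weights = defaultdict(int)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        words, count, year = parsed
        scores = [VALENCE[w] for w in words if w in VALENCE]
        if not scores:
            continue  # no scored word, so this ngram carries no valence signal
        score = sum(scores) / len(scores)
        for pronoun in PRONOUNS.intersection(words):
            totals[(pronoun, year)] += score * count
            weights[(pronoun, year)] += count
    return {key: totals[key] / weights[key] for key in totals}


sample = [
    "she is a good person 10 1978",
    "she is a bad person 30 1978",
    "he was happy and good 5 1980",
    "lisa has gone to school 234 1978",
]
result = pronoun_valence(sample)
```

Because `pronoun_valence` only iterates over its input once, an open file object can be passed in place of the list, so a large file is read line by line rather than loaded into memory.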