Corpus export files often come in formats that require some modification before you can import them into a spreadsheet program or read them into R as a data frame. The aim of the concordances package is to automate this process. All you need is a corpus export file, and concordances will (try to) convert it for you. Currently it can handle export files from
In addition, the functions last_left and first_right let you extract the last n words of the left context and the first n words of the right context. The function export is a convenient wrapper for write.table that exports concordances as tab-separated UTF-8 files without text qualifiers (i.e. without quote characters around fields). For KWIC concordances this is often the safest option, because they can contain unmatched scare quotes, which lead to parsing errors with typical CSV export settings. Tabs, by contrast, are rare (though not unheard of) in concordance lines, and most of the functions in this package try to remove them.
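The idea behind last_left and first_right can be sketched in base R. The helpers last_n_words() and first_n_words() below are hypothetical illustrations, not the package's own functions, and they assume that words are separated by whitespace and that each context is a single character string.

```r
# Hypothetical helpers illustrating the idea behind last_left() and
# first_right(); this is NOT the package's implementation.
# Assumption: words are separated by whitespace.

# Return the last n words of each string in `context`.
last_n_words <- function(context, n = 1) {
  vapply(strsplit(trimws(context), "\\s+"), function(words) {
    paste(tail(words, n), collapse = " ")
  }, character(1))
}

# Return the first n words of each string in `context`.
first_n_words <- function(context, n = 1) {
  vapply(strsplit(trimws(context), "\\s+"), function(words) {
    paste(head(words, n), collapse = " ")
  }, character(1))
}

# Toy left and right contexts
left  <- c("this is the left context of", "another left")
right <- c("and the right context follows", "side here")

last_n_words(left, 2)    # "context of"   "another left"
first_n_words(right, 2)  # "and the"      "side here"
```

Contexts shorter than n words are returned whole, since head() and tail() simply return all available elements.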
getCWB depends on the data.table package, which speeds up the handling of large files considerably. By default, getCWB therefore returns data.table objects; if you set dt = FALSE, it returns an ordinary R data frame instead. All other functions return ordinary R data frames.
You can install concordances from GitHub with:
# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("hartmast/concordances")
The functions currently differ considerably in their arguments, in how they work, and in their reliability. I’ll try to optimize them in the near future. In principle, however, every function requires only one argument: the path to the file you want to read in.
Note that on Windows, backslashes in file paths have to be doubled (escaped) inside R strings, e.g.
getCWB("path\\to\\file.txt") # do not run
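Alternatively, R also accepts forward slashes in paths on Windows, and file.path() joins path components with "/" on every platform, so no escaping is needed. A minimal sketch (the path components are placeholders):

```r
# Build the path from its components; the separator is "/" on all
# platforms, including Windows, so no backslashes need to be escaped.
p <- file.path("path", "to", "file.txt")
p  # "path/to/file.txt"

# getCWB(p)  # do not run
```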
If you want to open the resulting data frames in a spreadsheet, e.g. for annotating them, you can export them with export() or write.table():
# read in text
myText <- getCWB("path/to/file.txt")
# export text
export(myText)
# export(myText) is equivalent to:
write.table(myText, "myText.tsv", sep = "\t", row.names = FALSE, quote = FALSE,
            fileEncoding = "UTF-8")
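Because the exported file has no quote characters around fields, it should be read back with quote handling switched off. The sketch below uses a toy data frame containing an unmatched scare quote (the column names Left, Key and Right are made up for illustration and are not necessarily what getCWB returns), writes it with the write.table settings shown above, and reads it back with read.delim(quote = ""). With read.delim's default quote = "\"", the stray quote would instead be treated as the start of a quoted field.

```r
# Toy concordance with an unmatched scare quote in the left context.
# Column names are illustrative only.
kwic <- data.frame(
  Left  = c('he called it "progress', "nothing special"),
  Key   = c("and", "here"),
  Right = c("moved on", "either"),
  stringsAsFactors = FALSE
)

# Write it the same way as above, to a temporary file
f <- tempfile(fileext = ".tsv")
write.table(kwic, f, sep = "\t", row.names = FALSE, quote = FALSE,
            fileEncoding = "UTF-8")

# quote = "" disables quote handling, so the stray " is kept as a literal character
back <- read.delim(f, quote = "", fileEncoding = "UTF-8",
                   stringsAsFactors = FALSE)

identical(back$Left, kwic$Left)  # TRUE
```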