concordances.Rmd
Corpus export files often come in formats that need to be modified before you can import them into a spreadsheet program or read them into R as a data frame. The aim of concordances is to automate this process. All you need is a corpus export file, and concordances will (try to) convert it for you. It can currently handle the export files covered by the functions getCWB, getCOSMAS, getNSE, and getWACKY. (getWACKY will sooner or later be merged with getNSE.)
getCWB depends on the package data.table, which speeds up the handling of large files considerably. By default, getCWB therefore returns data.table objects; if you set dt = FALSE, it returns an ordinary R data frame instead. All other functions return R data frames.
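If you already have a data.table and need an ordinary data frame later on, you can also convert it after the fact. The following is a minimal sketch that assumes data.table is installed; the toy object and its column names (Left, Key, Right) are made up for illustration and are not the actual output format of getCWB.

```r
# Toy stand-in for a concordance; the column names are hypothetical.
if (requireNamespace("data.table", quietly = TRUE)) {
  kwic_dt <- data.table::data.table(
    Left  = c("this is a", "here is another"),
    Key   = c("test", "test"),
    Right = c("sentence", "one")
  )

  # Convert to an ordinary data frame (this makes a copy; kwic_dt is unchanged).
  kwic_df <- as.data.frame(kwic_dt)

  # Compare the classes of the two objects.
  print(class(kwic_dt))
  print(class(kwic_df))
}
```

Setting dt = FALSE when calling getCWB avoids the extra conversion step.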
You can install concordances from GitHub with:
# Install devtools first if it is not available yet
if (!is.element("devtools", installed.packages())) {
  install.packages("devtools")
}
devtools::install_github("hartmast/concordances")
The functions currently differ considerably in their arguments, in the way they work, and in their reliability. I’ll try to optimize them in the near future. In principle, however, each function has only one obligatory argument: the path to the file that you want to read in.
library(concordances)
getCWB("path/to/file.txt") # do not run
Note that on Windows machines, backslashes in file paths have to be doubled, because a single backslash is an escape character in R strings (forward slashes work as well), e.g.
getCWB("path\\to\\file.txt") # do not run
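One way to sidestep the separator question is to build the path with file.path and to check that the file exists before reading it. This is only a sketch: the folder and file names are placeholders, and the existence check is an extra precaution, not something the package requires.

```r
# Placeholder folder and file name, for illustration only.
corpus_file <- file.path("path", "to", "file.txt")

# Check up front so that a wrong path is reported clearly.
if (file.exists(corpus_file)) {
  kwic <- concordances::getCWB(corpus_file)
} else {
  message("No such file: ", corpus_file)
}
```

file.path joins its arguments with forward slashes, which R accepts on Windows as well.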
If you want to open the resulting data frames in a spreadsheet, e.g. for annotating them, you can easily export them using write.table:
myText <- getCWB("path/to/file.txt")
# Tab-separated, without row names or quoting, encoded as UTF-8
write.table(myText, "myText.tsv", sep = "\t", row.names = FALSE, quote = FALSE,
            fileEncoding = "UTF-8")
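After annotating the file in a spreadsheet, you will probably want to read it back into R. The sketch below shows the round trip with a toy data frame; its column names are hypothetical, and a temporary file stands in for myText.tsv. Because the file was written with quote = FALSE, it is read with quote = "" so that apostrophes and quotation marks in the corpus text are treated as ordinary characters.

```r
# Toy stand-in for a concordance; the column names are hypothetical.
kwic <- data.frame(
  Left  = c("this is a", "it's just"),
  Key   = c("test", "test"),
  Right = c("sentence", "a \"quoted\" one"),
  stringsAsFactors = FALSE
)

# Export as in the example above, but to a temporary file.
out_file <- tempfile(fileext = ".tsv")
write.table(kwic, out_file, sep = "\t", row.names = FALSE, quote = FALSE,
            fileEncoding = "UTF-8")

# Read the file back in; quote = "" disables quote handling.
kwic_back <- read.delim(out_file, quote = "", fileEncoding = "UTF-8",
                        stringsAsFactors = FALSE)

# Check that nothing was lost on the way.
print(all.equal(kwic, kwic_back))
```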