Violence in the Brothers Grimm's Fairy Tales: A Corpus-Based Approach

The purpose of this article is to carry out a corpus-based study of the presence of violence in a selection of eight tales by the Brothers Grimm by looking at terms that can be said to belong to the semantic field of violence. More specifically, the study analyses in detail the frequency of the words cut, dead and blood in those eight tales. These words were chosen for their possible connection to violence after a quantitative analysis of word frequencies across the whole main corpus. My initial hypothesis is that a corpus-based study of those eight tales would support my intuition that the Brothers Grimm collection contains a high proportion of violence compared with a wider variety of texts. The study involved, first, an analysis of the frequencies of the lexical units in the Brothers Grimm corpus and, second, a comparison of the results of the frequency test with two reference corpora: the British National Corpus and the Cobuild Concordancer. The comparison seems to indicate a higher-than-average percentage of words related directly or indirectly to violence.


Introduction
Kinder- und Hausmärchen (known in English as Children's and Household Tales or the Grimms' collection) is a classic of children's literature. As children, we have all been fascinated by characters such as Snow-White, Cinderella and so on, and we have all dreamt of becoming one of them. Tatar (2004: xv), one of the world's renowned authorities on folklore and fairy tales, states in her preface to The Annotated Brothers Grimm that "the fairy tales collected in Germany almost two centuries ago by the Brothers Grimm continue to have a powerful hold on our culture. Adapted, revised, rescripted and bowdlerized, they greet us at the movies such as Pretty Woman or Working Girl, at the opera with Hansel and Gretel or La Cenerentola, and in advertisements for everything from Chanel to chocolate, and in visual media as often as in print".
The purpose of this article is to carry out a corpus-based study of the presence of violence in a selection of eight tales by the Brothers Grimm, which make up the main corpus of the present study, by looking at terms that can be said to belong to the semantic field of violence. More specifically, this study analyses the frequency of the words cut, dead and blood. These words were chosen for their possible connection to violence after a pilot survey of word frequencies across the whole main corpus. For the present analysis, I understand violence as a verbal or non-verbal action that results in physical or psychological harm to a person, object, place or animal. My initial hypothesis was that a corpus-based study of those eight tales would support my intuition that the Brothers Grimm collection contains a high proportion of violence compared with a wider variety of texts. The study involved, first, an analysis of the frequencies of the lexical units in the Brothers Grimm corpus and, second, a comparison of the results of the frequency test with two reference corpora: the British National Corpus and the Cobuild Concordancer. The comparison supports the presence of a higher-than-average percentage of words related directly or indirectly to violence.
If my hypothesis is confirmed, the implication is that, yet again, children are being exposed to violence through products that are supposedly aimed at them.
Although this has often been affirmed and argued in numerous studies (e.g. Tatar 1987, 1992, 2004; Haase 2008; Zipes 1991), my contribution may be justified by the new perspective it brings to the content of the Brothers Grimm's tales. That is, although many studies have paid attention to the violent content of these tales, none, to my knowledge, has supported its argument empirically. The aim of this study is therefore to fill a gap in the already numerous studies devoted to the Brothers Grimm's tales. A corpus-based approach will provide a better understanding of such tales and make it possible for educators to deal with them more adequately. Labelling those readings more empirically and classifying in detail the lexical units belonging to the semantic field of violence can help us understand the type of violence we are dealing with and, in addition, deal with it more effectively.
More specifically, this study will focus on the frequency of occurrence of some of the lexical units referring to cruelty and violence found in eight of the 210 tales which make up the Brothers Grimm's collection. These eight tales will be analysed using ConcApp in order to test whether the content of violence in these tales is higher than average. The analysis will provide a frequency list of the lexical items in the Grimm corpus, so as to highlight the percentages of some previously selected words related to cruelty and violence found in the tales. After that, I will compare these results with two other corpora: the British National Corpus and the Cobuild Concordancer. The results will be displayed in different graphics to make the data easier to understand and visually clearer.

Statement of hypothesis
My main hypothesis here is that a corpus-based approach will contribute to a better understanding of such tales and will make it possible for educators to deal with them more adequately. Labelling the readings more empirically and classifying in detail the lexical units belonging to the semantic field of violence can help us understand the type of violence we are dealing with and deal with it more effectively.

Violence in the Brothers Grimm's Fairy Tales
As stated above, the aim of the present study is to investigate the presence of violence in eight tales from the Brothers Grimm's fairy tales collection using a corpus-based approach, in order to achieve an objective and empirical classification.
In my study, I first reviewed the different approaches which have been taken to research the presence of violence in the Brothers Grimm's fairy tales, and found that some of the most widely recognised scholars of fairy tales have provided many examples which confirm the high presence of violence in the Grimms' tales. Tatar, for instance, confirms that "in fairy tales, nearly every character - from the most hardened criminal to the Virgin Mary - is capable of cruel behaviour" (1987: 3-4), referring to the Grimms' tales. Interestingly, tales like Hansel and Gretel, The Boy Who Went Forth to Learn What Fear Was and The Juniper Tree, which all belong to the Grimms' collection, were included by Warner (1998: 4) in a book described by the author herself as a book about fear. Beyond violence itself, the fact that the Brothers Grimm's tales were among the titles recommended by the Nazis had a historical consequence: after World War II, the Allied forces thought that the Grimms' fairy tales had contributed to Nazi atrocities and savagery. In fact, as Haase argues, the Nazis promoted German folk education and saw the folktales as a means to their racial and political ends (2008: 407-408). For these reasons, their books were forbidden in England and America since, according to Haase, it was confirmed that these fairy tales were "profoundly repressive, fuelled prejudices and xenophobia, and glorified cruelty and militarism" (2008: 408). Violence in the Grimms' tales is not only about killing or hurting, but also about indoctrinating male children to learn fixed roles which have much in common with the psychological or physical ill-treatment of women by means of sexist and racist attitudes (Zipes 1991: 47). All the data mentioned above make us question how it is possible to find so many scenes of violence in a collection of tales aimed at children. Tatar provides us with the answer to this question: the Brothers Grimm had other scholars in mind when they published the tales (2004: xvi). The collection was born as a piece of philological research focused on collecting old stories from the oral tradition, aimed at adults, in order to preserve German identity rather than to produce a collection of tales for children. In fact, the Grimms' collection came to be aimed at children due, at least in part, to marketing reasons (Alcantud 2009).

A corpus-based study of violence in the Brothers Grimm's Fairy Tales
Corpus linguistics focuses on the importance of studying patterns of real language use. More concretely, this discipline studies language through large collections of authentic (written and spoken) texts. These texts are called corpora and are used "to derive empirical knowledge about language" (Koteyko 2006: 144-145). The authors cited above have highlighted the particular significance of violence in the Brothers Grimm's collection from different approaches: socio-political, psychological and so on. Very little, however, has been investigated about the role played by violence within fairy tales from the point of view of a corpus-based approach. To fill this gap, I take as a starting point the work of Stubbs (1996), who explains how computer-assisted analyses may provide a substantial and well-documented alternative to the use of intuitive data, as well as a new understanding of form-meaning relations. Stubbs establishes nine principles for this kind of analysis: (1) linguistics is essentially a social science and an applied science; (2) language should be studied in actual, attested, authentic instances of use, not as intuitive, invented, isolated sentences; (3) the unit of study must be whole texts; (4) texts and text types must be studied comparatively across text corpora; (5) linguistics is concerned with the study of meaning: form and meaning are inseparable; (6) there is no boundary between lexis and grammar: lexis and grammar are interdependent; (7) much language use is routine; (8) language in use transmits the culture; (9) Saussurean dualisms are misconceived (1996: 24-44).
The point of departure of my study is Stubbs's second and fourth principles, that is to say, that texts must be studied comparatively across text corpora using authentic samples. As Stubbs (1996: 45) argues when discussing Sinclair's work, it shows in a precise and concrete way how a large corpus and an associated technology create a viewpoint which can lead to innovations in linguistic description and theory. "The essential vision underlying corpus linguistics is that computer-assisted analysis of language gives access to data which were previously unobservable, but which can now profoundly change our understanding of language". Thus, using computer-assisted analysis in my study of the Brothers Grimm's tales will enable me to test my hypothesis in an innovative and objective way.
The present study will consider the frequency of use of some words related to cruelty and violence. Following Stubbs's proposal (1996), I will take into account two factors in the study of the tales. The first is the study of the collocations of those words, extracted after the frequency test, which could be suspected of having some relation to cruel or violent situations. The second is the comparison of my findings with the same words in the reference corpora, in order to study the percentages of use of those words in violent and cruel situations. As stated above, a quantitative and empirical investigation of the Grimms' collection will be carried out using three distinct corpora. The first consists of the eight tales, which make up a corpus of 17,416 words. An electronic version, checked against a printed version (Grimm, 2009), was used to make computer-based corpus research easier. These tales are:

9. The Twelve Brothers. Twelve brothers are obliged to leave the castle where they live, and their little sister has to find them after suffering a fatal spell which makes her remain mute.
11. Little Brother and Little Sister. Two children who are seriously ill-treated by their stepmother have to escape from their house. The boy suffers a terrible spell.
15. Hansel and Grethel. Two children, brother and sister, are abandoned by their father and kidnapped by a mean witch who tries to eat them.
16. The Three Snake-Leaves. A man who is willing to do anything to bring back his dead wife, and the hard price he must pay.
21. Cinderella. A girl who has no mother suffers when her father marries a widow with two daughters of her own. She is seriously ill-treated.
40. The Robber Bridegroom. A young bride is married to a murderer.
46. Fitcher's Bird. A wizard takes the role of a beggar and catches pretty girls.
53. Little Snow-White. A girl whose mother has died sees how her stepmother tries to kill her on several occasions simply out of envy.
Secondly, I used the Cobuild Concordancer corpus. In order to compare the frequency percentages of some violence-related words from my main corpus (the Grimms' tales), I typed in some simple queries (the words cut, dead and blood) and obtained a display of 50 concordance lines chosen at random from the corpus, which were used for comparison purposes. Thirdly, I used the British National Corpus. I typed in the same queries as in the Cobuild Concordancer (cut, dead and blood), and the search result showed the total frequency in the corpus and up to 50 examples, also used for comparison purposes. I first carried out the quantitative analysis of frequency by identifying all occurrences of a node word (word form or lemma) in a corpus, together with its raw frequency. After that, I kept a record of the collocates of this node which occur in a window of defined size (i.e. four words to the left and right). Then I counted the frequency of joint occurrence of the node and each collocate; in this case, the frequency of each collocation when related to violence, expressed as a percentage. Finally, I compared these results with those obtained by the same process in the two reference corpora.
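The node-and-collocate procedure described above can be illustrated with a minimal Python sketch. It is not the routine used by ConcApp or by the reference corpora; the function names tokenize and collocates are hypothetical, the sample text joins two of the tale fragments quoted in the Results section, and the window is allowed to cross sentence boundaries, which a stricter analysis might not permit.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated forms stay whole)."""
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

def collocates(tokens, node, span=4):
    """Count every word found within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        counts.update(left + right)
    return counts

# Two fragments quoted in this article, joined only for illustration.
sample = ("they will cut thee to pieces without mercy. "
          "he stabbed it, and cut out its heart and took it to the Queen")
tokens = tokenize(sample)

print("raw frequency of node:", tokens.count("cut"))
print("collocates within four words:", collocates(tokens, "cut").most_common())
```

Deciding which of the collocations found in this way count as violent remains a manual step; the sketch only produces the counts on which that judgement is based.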

Methodology
In this section I describe the design of my study. I present the computational tool used to carry it out and indicate the steps which were followed. The ConcApp concordancing program was selected for this task. According to the ConcApp web page, it "is a free and user-friendly text analysis program. It offers concordances, collocations and word frequency statistics. It can also be used to edit text files". ConcApp was used to obtain the frequency lists as well as the concordance lists for my tales corpus.
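As an illustration of what a word frequency list of this kind contains, the following Python sketch counts each word form and expresses it as a percentage of all tokens. It is an assumption-laden approximation and not ConcApp's own procedure: the tokenisation rule and the function name frequency_list are hypothetical, and a different tokeniser would give slightly different totals and percentages.

```python
import re
from collections import Counter

def frequency_list(text):
    """Return (word, count, percent) rows; percent is the word's share of all tokens."""
    tokens = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    return [(word, n, 100.0 * n / total) for word, n in counts.items()]

# Toy text built from a fragment quoted in this article, for illustration only.
rows = frequency_list("the witch was cast into the fire and the fire burnt")

by_alphabet = sorted(rows, key=lambda row: row[0])
by_frequency = sorted(rows, key=lambda row: (-row[1], row[0]))

for word, n, pct in by_frequency:
    print(f"{word:<10}{n:>4}{pct:>9.4f}%")
```

The same rows can be sorted alphabetically or by frequency, which are the two orderings a frequency list is usually consulted in.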
The corpus selected for the present study has already been presented: the Brothers Grimm's tales. The method used for this analysis is based on computational corpus linguistics and comparative analysis. Comparative analysis implies carrying out an empirical analysis of the main corpus, which involves a computational analysis of the frequencies of the lexical units in the Brothers Grimm corpus. For that reason I generated a list of all the words in it, ordered both alphabetically and by percentage of frequency, in which it is possible to observe the lexical units used most often in the tales. Analysing this kind of information provided me, firstly, with a study of the most frequently used words, not taking into account function words. It enabled me to check the concordances of some words which have a high frequency of use in the Grimm corpus and are related to violent and cruel situations. In other words, it allowed me to find out when these words are used in the Grimm corpus in relation to violence or cruelty.
Secondly, it provided me with a comparative study of the same selected words. That is to say, after finding the percentages of use in violent or cruel situations in the Grimm corpus, I compared them with the percentages of use in the same situations obtained when typing the same words into the two reference corpora, the Cobuild Concordancer and the British National Corpus. These two corpora provided me with a random sample of 50 examples containing the words queried, so it was possible to obtain the percentages of use of the selected words in violent situations. By doing this, and after comparing all the results, it was possible to test my hypothesis.
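The arithmetic behind these percentages is simple and can be sketched in Python. The sketch assumes that each concordance line has already been read and labelled by hand as violent or not; the function names sample_lines and percent_violent are hypothetical, and the random draw only imitates the 50-line samples returned by the reference corpora. The counts in the usage example are the ones reported in the Results section for fire (7 of 16), pieces (15 of 17) and heart (13 of 25).

```python
import random

def sample_lines(lines, k=50, seed=0):
    """Draw up to k concordance lines at random (seeded so the draw can be repeated)."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

def percent_violent(labels):
    """Percentage of manually labelled lines marked True (violent or cruel context)."""
    if not labels:
        raise ValueError("no labelled lines")
    return 100.0 * sum(labels) / len(labels)

# (violent uses, total uses) as reported in this article's Results section.
reported = {"fire": (7, 16), "pieces": (15, 17), "heart": (13, 25)}

for word, (violent, total) in reported.items():
    labels = [True] * violent + [False] * (total - violent)
    print(f"{word:<8}{percent_violent(labels):6.2f}% of {total} lines")
```

Because the labelling is a human judgement, two readers may arrive at different percentages for the same sample; the code only makes the final calculation explicit.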

Results
My first step was to generate a list of the most frequently used words, not taking into account function words. By doing this I was able to check those which were directly or indirectly related to violent and cruel situations. I worked with just the first 200 words of the frequency list of the tales corpus. This list was filtered, since it is a well-known fact that the words most frequently used in any kind of text are those with a mainly grammatical meaning: pronouns, prepositions, articles and some others. I decided to exclude them from my list mainly because they exert no influence on the final results of my research (cf. Pérez Paredes 2002).
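This filtering step can be sketched in Python as follows. The function-word list shown is a short illustrative one and is an assumption: the article does not state which list of grammatical words was applied, and the function name content_words is hypothetical.

```python
from collections import Counter

# Illustrative function-word list only; the list actually used is not given in the article.
FUNCTION_WORDS = {
    "the", "and", "a", "an", "to", "of", "in", "into", "by", "at", "with",
    "she", "he", "it", "they", "her", "his", "was", "that", "but", "i", "you",
}

def content_words(tokens, top_n=200, function_words=FUNCTION_WORDS):
    """Take the top_n most frequent words, then drop those on the function-word list."""
    top = Counter(tokens).most_common(top_n)
    return [(word, n) for word, n in top if word not in function_words]

# Toy token list for illustration only.
tokens = "the witch was cast into the fire and the fire burnt the witch".split()
print(content_words(tokens))
```

Note that the filter is applied after the cut-off at the first 200 words, as in the procedure described above, so the resulting list is shorter than 200.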
Thus, after all the function words had been removed, the list was composed of 89 words (see table 1). Considering these results, it can be observed that there are many verbs in past tenses, which shows that these are not just simple stories but stories with plenty of action in them. Besides, and more relevant to the present study, there are some words directly related to violence: dead and blood. The high frequency of these lexical units in comparison with other words thus highlights a relation between the tales and topics of violence and cruelty. One interesting finding is that some words which at first seem to have no relation to the semantic field of violence or cruelty turn out, on closer inspection of their concordances within the tales corpus, to be clearly related to it. This is the case of words like pieces, fire and heart. They are amongst the most frequent words in this corpus (pieces is used 17 times, heart 25 and fire 16). If we look at their concordance lists, we can observe the high percentage of cases in which these words are used in relation to violence (fig. 1 and fig. 2).
If we study the concordances of the word fire in depth, 7 of the 16 examples in figure 1 are direct examples of violent or cruel situations. See, for example, line 2, "by wild beast, but the witch was cast into the fire and miserably burnt […]", or line 13, "And when she was bound fast to the stake, and the fire was licking at her clothes with its red tongue".
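A concordance of this kind (keyword in context) can be approximated with a few lines of Python. This is a sketch under stated assumptions and not a reproduction of ConcApp's output: the function name kwic and the context width are hypothetical, and the sample text joins the two fragments quoted above.

```python
import re

def kwic(text, node, width=30):
    """Return keyword-in-context lines: `width` characters either side of each hit."""
    text = " ".join(text.split())  # collapse line breaks so each hit prints on one line
    pattern = re.compile(rf"\b{re.escape(node)}\b", flags=re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}}[{match.group(0)}]{right:<{width}}")
    return lines

# Fragments quoted in this article, joined only for illustration.
sample = ("the witch was cast into the fire and miserably burnt. "
          "And when she was bound fast to the stake, and the fire was licking "
          "at her clothes with its red tongue")

for line in kwic(sample, "fire"):
    print(line)
```

Because the pattern matches on word boundaries, a query such as blood would also match the first element of blood-red; whether such compounds are counted with the node word or separately is a decision the analyst has to make explicitly.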
On the other hand, if we study the word pieces (figure 2), which appears 17 times, almost every occurrence is related to violence: 15 out of 17 examples.
Examples of this can be seen in line 2, "and therein lay human beings, dead and hewn to pieces, and hard by was a block of wood […]", or line 16, "Have thee in their power, they will cut thee to pieces without mercy, will cook thee […]". Some other examples can be seen in lexical units belonging to the semantic field of 'parts of the body'. The word heart, for instance, provided me with interesting data: in many cases this word is used in relation to a violent or cruel situation (figure 3).
Of the 25 times this word is used in the text, 13 are related to violence. See, for example, line 6, "Came running by he stabbed it, and cut out its heart and took it to the Queen […]", or line 12, "believing that she had eaten Snow-White's heart, could not but think she was the first […]". It has to be taken into account that heart is not a word related per se to violence.
These tables are a clear indicator of the content of the tales corpus, but my intention is to go one step further in the study of the most frequent words. My next step is to examine the frequencies of three words selected from the list which, at first sight, are related to violence: cut, blood and dead. I used ConcApp to find their frequency percentages. These percentages can be considered high, bearing in mind that we are working with a list composed of the 89 most frequent words in the tales corpus.
Table 2 confirms once again the remarkable presence of violence in these tales, since these words are intrinsically related to this semantic field. But are all these examples used violently in the corpus? To answer this question, my next step is to work out, by means of a concordance analysis, the percentage of occurrences in which these words are used in relation to violence and cruelty within the tales. Figures 7, 8, 9, 10 and 11 show screenshots of the concordances of these words in my corpus.
Let us now look at how often these words appear in violent or cruel situations. As we can see in table 3, in a high percentage of cases (100% for dead) these words are used in violent situations. These are very high percentages, and they might already confirm my hypothesis, which stated the presence of a high percentage of violence in the Grimms' tales. However, having studied the tales corpus, my next step is to look for the same words in the two corpora proposed for comparison: the British National Corpus (BNC) and the Cobuild Concordancer. The reason for this comparison is to determine whether the percentages of use of these words are higher or lower than those found in the reference corpora.
The next step in the analysis involves generating examples of the use of these three words in a larger corpus, with the intention of comparing them with the corpus of the Brothers Grimm's tales.
First, a random selection of 50 solutions was taken from the British National Corpus and from the Cobuild Concordancer. The results are illustrated in tables 4 and 5 below. If we combine all the percentages of use in violent situations in a single table and transform these data into a graph in order to obtain visual evidence of the difference in percentages (see graphic 1), we can observe that the percentages of violent use in the Brothers Grimm corpus are considerably higher than those obtained in the two reference corpora used for comparison.

Conclusions
In this research into the Brothers Grimm's fairy tales, I have studied a corpus composed of eight of their tales by means of a twofold computational analysis. The study was carried out in two steps: first, an analysis of the frequencies of the lexical units in the Brothers Grimm corpus; second, a comparison with the British National Corpus (BNC) and the Cobuild Concordancer. The comparison revealed a high percentage of use in the tales of words related directly or indirectly to violence.
The computational study provided confirmation of a high percentage of violence in the Brothers Grimm's tales. I established the frequency of use and the percentages of some words related to cruelty and violence found in the tales: cut + parts of the body, dead and blood. After that, I compared these results with two other corpora, the British National Corpus and the Cobuild Concordancer, and found a percentage of violence in the tales high enough to lead us to think of them as not aimed at children. It is widely recognised that violence can be found where you least expect it (cartoons, sports, readings, movies, etc.), and the presence of violence in the Grimms' collection is another key example.
The true aim of this article was to demonstrate that classical tales aimed at children can be relabelled in an objective and empirical way by means of a corpus-based approach, taking into account the evolution of society. In this sense, it seemed worthwhile to take the Brothers Grimm's fairy tales collection and detect the presence of violence in it following Stubbs's line of study on corpus analysis. Thus, my point of departure was to demonstrate empirically a higher-than-average presence of actions related to violence and cruelty within the Grimm corpus. Having done so, it has become apparent that any lexical unit which might be controversial (in other words, any topic which might be wholly or partly unsuitable for children) can be detected.
Almost no research on the Brothers Grimm's fairy tales has been carried out by means of a computer-based approach. My results seem to point to the possibility of relabelling classical readings according to a standard, empirical and objective list of conventions on literature aimed at children.
The present research is just a starting point for what I hope will become a thorough insight into the Brothers Grimm's fairy tales. I have only covered eight of the 209 tales taken from the original version but, as observed, the presence of violent behaviour is a characteristic of almost all of them. The results are interesting enough to draw the public's attention to a more in-depth and multidisciplinary study of the whole collection, which I hope to continue. I am aware of the limitations of the literature review, which I hope to extend by including the perspectives of other disciplines such as psychology and the social sciences; these, I am sure, will give me a richer insight into the interpretation of the tales themselves and into how they are, or should be, used in today's society.
To conclude, I want to make it clear that the historical and literary quality of the Brothers Grimm collection is not being questioned in this article. The only point here is to consider the possibility that this collection does not suit the scale of values that we are trying to instil in our children in the society in which we live today.
Figures 4, 5 and 6 show detailed screenshots of ConcApp with the frequency percentage lists. As can be seen, there are 19 examples of the word cut, which represent 0.1089% of the whole corpus. As for the word blood, there are 18 examples (0.0774%), to which should be added the two examples of the word blood-red (0.0086%) and the four examples of the word bloody (0.0172%). Finally, I found 19 examples of the word dead (0.1089%) plus one example of the word deadly (which is not representative), as illustrated in table 2 below.

Table 2. Frequency percentages of cut, blood and dead.

Table 3. Percentages of frequency