ETAS Journal Editors’ Choice Number 33 (January 2019)

Claudia Harsch: How external tests can support teaching and learning in the foreign language classroom. 

ETAS Journal, Volume 34, Number 1 (Winter 2016), pp. 29-32

The issue of testing and assessment is often a source of intense discussion because people are so personally invested in it, with – in an ideal world – teachers painstakingly setting principled, meaningful and differentiated tasks, and students preparing and performing as best they can. The article I have chosen compares and contrasts the purposes and characteristics of internal ‘classroom-based assessment’ (generally conducted by the teacher) with those of external assessments (set by exam boards). It also explores aspects of formative and summative assessment, as well as both the limitations of ‘teaching to the test’ and the hoped-for benefit of positive washback.

First, the author outlines the cornerstones of classroom-based assessment, noting its clear usefulness for diagnostic purposes and the ease with which it can be aligned with the relevant curriculum, whilst also discussing its limitations. She goes on to examine the characteristics of external tests, highlighting their obvious advantage over most classroom-based assessment: such tests are designed by experts and then piloted and revised, which helps to ensure quality and also comparability with performances on other such tests.

The article then explores how external tests can provide positive washback by helping teachers and students to identify areas to work on, and how exam objectives can provide a structured approach to a general language learning course with no standardised final assessment. It concludes with a plea for practitioners to engage both critically and constructively with formalised external assessment in order to shape contemporary classroom assessment positively.

The author demonstrates wide-ranging knowledge of her field and addresses key points in the necessary depth without overwhelming the reader with superfluous detail. She also refers to – and helpfully describes – a number of relevant publications and professional bodies involved in assessment, providing a useful jumping-off point for ELT practitioners interested in exploring the topic further.

Lynn Williams Leppich

How external tests can support teaching and learning in the foreign language classroom

Claudia Harsch

The relationship between external tests and the foreign language classroom is complex and not without controversy. While external tests have been criticised for their potentially negative impact on teaching and learning, this article argues that there are a number of ways in which external tests can support teaching and learning if they are used responsibly within their scope and purposes. 


I will first highlight some of the key differences between assessment done by teachers in the classroom and external assessment by standardized tests. Building on this comparison, I will use three examples from very different contexts to illustrate how external tests can influence learning and teaching in a positive way: how external tests can enhance learner diagnosis, how proficiency certification can inform admissions decisions and language support in higher education, and how test-preparation courses could be designed to foster language development alongside test preparation. The article closes with a brief look at how international test quality standards can inform teacher-led test development as part of enhancing assessment literacy within the teaching profession.

The use of external tests in foreign language education has been gaining momentum, whether for certification purposes, for comparison within an educational system, or for educational monitoring. This development has not gone unchallenged. Scholars such as Cumming (2009a) and Fulcher (2009) have warned against the possible unwanted effects of external tests on learner development and pedagogical contexts. These effects can arise because pedagogical contexts generally differ from testing contexts in a number of respects. These include the contextual constraints on tests (such as limited time, individuals having to work alone, or the need to restrict responses to one correct and objective solution); the orientation of teaching towards the curriculum as opposed to the orientation of tests towards external criteria; the level of standardization; the focus on either the processes or the products of learning; and the potential for feeding results back into the teaching and learning process (e.g. Norris, Brown, Hudson & Bonk, 2002). Nevertheless, the two worlds, the world of teaching and the world of testing, to use Dlaska and Krekeler’s (2009) terms, have points of contact where external tests can inform teaching and lead to positive washback. Some of these points of contact will be explored in this paper. First, I will look at the characteristics of assessments developed by teachers in the foreign language classroom before exploring the world of external tests. Finally, I will illustrate how external tests can be used to inform teaching and learning in a range of different ways.

Classroom-based assessment

Assessment in the classroom is traditionally done by teachers, who set objectives and find suitable means of assessing their learners’ development and achievements. They may do this in formative and summative ways. Formative here means that the main purpose of the assessment is to feed the results back into the teaching and learning process (e.g. Brooks, 2002; Wood, 2007), whereas summative refers to assessments employed after a certain period of teaching and learning, to gain insight into learners’ attainment (e.g. Council of Europe, 2001, chapter 9).

The focus of assessments developed by teachers can be either on processes or on products. Usually, the content of the assessment is very closely linked to the teaching content and aligned with the curriculum, so that the teacher can monitor what students have taken on board. Such tests can also help the teacher to see where to go next. Students can be assumed to be familiar with the content and the formats used because they have seen similar tasks before. In general, it is then straightforward for the teacher to feed results back into the classroom, linking assessment with teaching and learning. The interested reader is referred to Cumming (2009b) or Rea-Dickins (2001) for more detailed discussions of the requirements of classroom-based assessment.

The instruments developed for the classroom context are usually informal ones, which have not been pretested or formally validated and calibrated in the way that would be expected for a professionally developed test. While this is understandable given the contextual constraints, it does mean that such assessment results are hard to compare across contexts, as they only have meaning in the particular context for which they were developed. For example, since the relative difficulty of such informal tests is not known, it is meaningless to compare results from one class with results from another class that took a different informal exam. Nor can meaningful comparisons be made between results obtained in different years if different uncalibrated exams are used. Since most informal classroom tests take the curriculum as their point of reference, it is also difficult to interpret the results with reference to external criteria such as educational achievement standards or proficiency frameworks, for instance the Common European Framework of Reference for Languages (CEFR).

External tests

It is precisely in the area of comparability, be it in terms of difficulty or of criteria, that external tests can offer the greatest benefits. External tests serve a variety of purposes, such as certification, school evaluation, or international educational monitoring. Common to all such tests is that they are developed by institutions for specific purposes in a standardized way. This means that they are comparable within one institutional context and that they adhere to international quality requirements. Good general overviews of the test development process can be found in Alderson, Clapham, and Wall (1995) or Bachman and Palmer (2010). Quality requirements are expressed in so-called quality standards or codes of ethics, which have been developed and published by a number of testing associations and test institutes as one step towards establishing ‘good practice’ in the profession (e.g. Association of Language Testers in Europe, 2001; European Association for Language Testing and Assessment, 2006; International Language Testing Association, 2007). The most important quality criteria concern the reliability and validity of a test, i.e. whether the test instrument yields consistent results, and whether it measures what it is supposed to measure. The instruments are usually pretested with a relatively small number of learners to check their reliability, and they are validated, using quantitative and qualitative methods, to examine to what extent they measure what they are supposed to measure.

Often, external tests are aligned to what are called achievement standards or proficiency frameworks. This allows test providers to report results not only in terms of test scores but also in a qualitative way, describing what a learner at a certain level can do. It also makes it possible to compare results across test instruments, exams, and educational contexts.

Perhaps the most prominent proficiency framework at present is the Common European Framework of Reference for Languages (CEFR, Council of Europe, 2001), which describes learner proficiency at six ascending levels. While its use in test development is viewed critically by some (e.g. Fulcher, 2004; Weir, 2005), it has been employed by all major international test providers, and in many educational contexts across Europe and beyond, to characterise tests, educational standards and exams in terms of what they measure and the demands they place on test takers. International tests that have been linked to the CEFR using a variety of methods include those developed by Oxford University Press, Cambridge Assessment, the Educational Testing Service, Pearson Language Assessments, and Trinity College London. School exams and curricula developed in a range of countries, including Austria, Germany, and Hungary, have also been linked to the framework.

The focus on communicative demands and communicative tasks as encountered in ‘real-world’ contexts is one of the key features of the CEFR and of tests of communicative proficiency that are based on needs analyses, that is, analyses of which language activities are most relevant for a particular group of test takers. This allows the results of such proficiency tests to be interpreted in terms of relevant ‘real-world’ communicative skills. These skills, in turn, can serve as learning objectives for potential test takers, that is, for foreign language learners.

Influence of external tests on learning and teaching

There are a number of ways in which external tests can influence language learning and teaching; this influence is called washback, and it can be positive as well as negative. Good accounts of washback can be found in Chen, Watanabe, and Curtis (2004) or Wall (1997). Negative washback occurs, for instance, when a test leads teachers to narrow the teaching content to match the test content rather than covering a comprehensive curriculum (‘teaching to the test’). Another symptom of negative washback is a focus on test-taking strategies rather than on developing language proficiency and communicative skills. In light of these potential dangers, I would like to stress the teacher’s (and, more generally, the test user’s) professional responsibility to consider such dangers carefully and to use external tests sensibly within their scope and limitations.

Since much has been published on these matters (e.g. Cumming, 2009a; Fulcher, 2009; Shohamy, 2001), I will now pick out three areas in which external tests could lead to positive washback and inform teaching and learning. These are:

1. how tests can enhance learner diagnosis
2. how proficiency certification can inform admissions decisions and language support
3. how test-preparation courses could be designed to aim at language development.

When diagnosing learners, teachers often look for an external, validated, and calibrated instrument. One such tool has been provided by the DIALANG project (Alderson & Huhta, 2005), which offers calibrated items that operationalize the CEFR proficiency levels for the four skills in a range of languages. The online tool includes learner self-assessment and provides a detailed profile of learner proficiency in relation to the CEFR. DIALANG can be used across educational settings and independently of curricula and course requirements, providing learners and teachers with a detailed diagnosis of areas of strength and weakness. This diagnosis can help learners formulate learning objectives and can inform teaching.

Another area of positive influence can be found in the field of proficiency certification. Professional contexts as well as higher education institutions often require internationally recognised language certificates, which are usually provided in a standardized way by large testing institutions. As outlined above, almost all such certificates are nowadays linked to the CEFR, reporting learner proficiency in terms of CEFR levels. It has to be stressed that the quality of the procedures used to link a test to the CEFR is crucial (see Council of Europe, 2009). Test users are well advised to look carefully at the research-based evidence published by the test provider.

As long as the test has been developed in line with international quality requirements, the results can be informative for the teaching profession. Take, for example, international students entering universities. In the UK and in large parts of Europe, the minimum language requirements are stated in terms of CEFR levels and have to be demonstrated by an international language test certificate. The reported profile allows admissions offices and departments to decide whether a student is eligible for direct entry or needs additional language support. The score profile and its qualitative description (often referring to CEFR criteria and levels) can help the receiving institution to tailor the language support it provides to the student’s needs. Thus, pre-sessional and in-sessional language support courses can draw on students’ proficiency profiles in order to formulate learning objectives to be addressed in class. These objectives can in turn be linked to the CEFR levels and to internal assessment during pre-sessional and in-sessional classes, so that students’ progress during these courses can be tracked and reported in meaningful ways.

The last area I would like to address here is the washback of test preparation courses on language learning and teaching. A detailed account of this effect in relation to one particular test is given by Green (2006, 2007). Test preparation courses, by their very nature, are designed to teach to the test. However, this need not mean reducing teaching to test-taking strategies. Instead, a course may be built around a test’s communicative activities with the aim of developing the learners’ proficiency.

Building this kind of course is probably only possible if the test is well designed, carefully developed, and based on a needs analysis. The test should include communicative tasks which come close to the communicative reality that the test aims to reflect (e.g. Bachman, 1991). If this is the case, the test objectives can serve equally well as course objectives, and test tasks can be employed in the language classroom not only to illustrate the test demands but also to stimulate language development. For example, in a test of languages for academic purposes, learners might be invited to explain, justify, and discuss their solutions to a problem. Depending on the difficulties learners encounter in such exchanges, the teacher can design lessons around the emerging issues. Pedagogical tasks can be developed which aim at the test’s objectives but are not restricted by standardized testing constraints, such as requiring individuals to work on items in a predetermined amount of time or restricting responses to one correct and objective solution. For pedagogical purposes, tasks can be used to stimulate cooperation and learner discussion, can be approached in a wider range of communicative ways (relevant to the test), and can integrate the range of skills addressed in the test. In this way, tasks based on the test demands can foster language development while at the same time preparing learners for the test. Ellis (2003) and Willis and Willis (2007) are helpful sources of information on frameworks for task design.


The possibilities for using external tests to inform teaching and learning are by no means limited to the three areas outlined above. As long as one acknowledges a test’s intended purpose and scope and, critically, applies professional expertise when using external tests, these instruments can provide additional information for learners and teachers that informal assessment alone cannot provide. They offer a view of learner proficiency in relation to external criteria such as the CEFR ‘Can-Do’ statements, which can be used for comparison, for diagnosis, and for aligning teaching and learning with external standards and proficiency frameworks.

There remains one area I would like to highlight with regard to the positive impact of standardized tests on classroom-based assessment practice: the application of quality standards. A substantial body of publications exists that aims to promote ‘good practice’ in the development of classroom-based assessments; examples include Davidson and Lynch (2002) and Hughes (2002). Making use of these publications in pre- and in-service teacher training is one way of developing assessment literacy in the teaching profession, a need that has been identified in recent studies (e.g. Hasselgren, Carlsen, & Helness, 2004; Huhta, Hirvelä, & Banerjee, 2005). In this way, ‘good practice’ developed in international high-stakes contexts can positively inform assessment practices in the language classroom.

Editor’s Note: This article was originally published as part of the Oxford English Test by Oxford University Press. Reprinted here with the kind permission of the author and Oxford University Press.

References

Alderson, J. C., & Huhta, A. (2005). The development of a suite of computer-based diagnostic tests based on the Common European Framework. Language Testing, 22(3), 301-320.

Association of Language Testers in Europe. (2001). ALTE principles of good practice for ALTE examinations. Retrieved from

Bachman, L. F. (1991). What does language testing have to offer? TESOL Quarterly, 25, 671-704.

Brooks, V. (2002). Assessment in secondary schools: The new teacher’s guide to monitoring, assessment, reporting, and accountability. Buckingham, UK: Open University Press.

Chen, L., Watanabe, Y., & Curtis, A. (Eds.). (2004). Washback in language testing: Research contexts and methods. London, UK: Lawrence Erlbaum Associates.

Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge, UK: Cambridge University Press.

Council of Europe. (2009). Relating language examinations to the Common European Framework of Reference for Languages (CEFR). A Manual. Strasbourg, France: Language Policy Division.

Cumming, A. (2009a). Language assessment in education: Tests, curricula and teaching. Annual Review of Applied Linguistics, 29, 90-100.

Cumming, A. (2009b). What needs to be developed to facilitate classroom-based assessment? TESOL Quarterly, 43, 515-519.

Davidson, F., & Lynch, B. (2002). Testcraft: A teacher’s guide to writing and using language test specifications. New Haven, Connecticut: Yale University Press.

Dlaska, A., & Krekeler, C. (2009). Sprachtests. Leistungsbeurteilungen im Fremdsprachenunterricht evaluieren und verbessern. Hohengehren, Germany: Schneider.

Ellis, R. (2003). Task-based language teaching and learning. Oxford, UK: Oxford University Press.

European Association for Language Testing and Assessment. (2006). Guidelines for good practice in language testing and assessment. Retrieved from

Fulcher, G. (2004). Deluded by artifices? The Common European Framework and harmonization. Language Assessment Quarterly, 1(4), 253-266.

Fulcher, G. (2009). Test use and political philosophy. Annual Review of Applied Linguistics, 29, 3-20.

Green, A. (2006). Watching for washback: Observing the influence of the International English Language Testing System academic writing test in the classroom. Language Assessment Quarterly, 3(4), 333-368. 

Green, A. (2007). Washback to learning outcomes: A comparative study of IELTS preparation and university pre-sessional language courses. Assessment in Education, 14(1), 75-97.

Hasselgren, A., Carlsen, S., & Helness, H. (2004). European survey of language testing and assessment needs. Report: Part one – general findings. Retrieved from

Hughes, A. (2002). Testing for language teachers (2nd ed.). Cambridge, UK: Cambridge University Press.

Huhta, A., Hirvelä, T., & Banerjee, J. (2005). European survey of language testing and assessment needs. Report: Part two – regional findings. Retrieved from

International Language Testing Association. (2007). ILTA guidelines for practice. Retrieved from content&view=article&id=182:ilta-code-of-ethics-and-guidelines-for-practice-as-pdf-files&catid=3

Norris, J. M., Brown, J. D., Hudson, T. D., & Bonk, W. (2002). Examinee abilities and task difficulty in task-based second language performance assessment. Language Testing, 19(4), 395-418.

Rea-Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18(4), 429-462.

Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests. London, UK: Longman.

Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Encyclopaedia of Language and Education 7: Language testing and assessment (pp. 291-302). Dordrecht, Netherlands: Kluwer Academic.

Weir, C. J. (2005). Limitations of the Common European Framework for developing comparable examinations and tests. Language Testing, 22(3), 281-300.

Willis, D., & Willis, J. (2007). Doing task-based teaching. A practical guide to task-based teaching for ELT training courses and practising teachers. Oxford, UK: Oxford University Press.

Wood, D. (2007). Formative assessment. Edinburgh, UK: Sage.

About the Author

Claudia Harsch is a professor at the University of Bremen, specialising in language learning, teaching, and assessment. She has worked in Germany and in the UK, and is active in teacher training worldwide. Her research interests focus on areas such as language assessment, educational evaluation and measurement, intercultural communication, and the implementation of the CEFR. Claudia is the current president of the European Association for Language Testing and Assessment.