Journal of Pure and Applied Mathematics


Nikolay P Takuchev*
 
Trakia University, Stara Zagora, Bulgaria, Email: nnnpppttt@gmail.com
 
*Correspondence: Nikolay P Takuchev, Trakia University, Stara Zagora, Bulgaria, Email: nnnpppttt@gmail.com

Received: 04-Jul-2023, Manuscript No. puljpam-23-6569; Editor assigned: 06-Jul-2023, Pre QC No. puljpam-23-6569 (PQ); Reviewed: 10-Jul-2023, QC No. puljpam-23-6569 (Q); Revised: 15-Jul-2023, Manuscript No. puljpam-23-6569 (R); Accepted: 28-Jul-2023; Published: 31-Jul-2023, DOI: 10.37532/2752-8081.23.7(4).260-271

Citation: Takuchev NP. Semantic approach of knowledge assessment through tests of multiple choice type–theory, confirming experiment, and an adaptive algorithm. J Pure Appl Math. 2023; 7(4):260-271.

This open-access article is distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC) (http://creativecommons.org/licenses/by-nc/4.0/), which permits reuse, distribution and reproduction of the article, provided that the original work is properly cited and the reuse is restricted to noncommercial purposes. For commercial reuse, contact reprints@pulsus.com

Abstract

Knowledge assessment through tests is an objective and effective technology, widely used in modern education. Multiple-choice tests, consisting of dichotomous items (a question with one correct answer and one or more incorrect answers, called distractors), are widely used in modern education to assess students' knowledge. Test developers face the problem of converting the number of correct answers into a numerical grade for the knowledge of the assessed. Usually, the number of correct answers is converted into a grade based on the evaluators' inner sense of fair evaluation.

Objective: In the present work, a model for knowledge evaluation through dichotomous tests is proposed, based on the so-called semantic branch of Information Theory.

Results: A critical opinion is given of the Classical Test Theory and the modern Item Response Theory as tools for knowledge assessment. Some concepts in these theories leave the feeling that more could be desired of them with respect to assessing knowledge. A new, entirely different approach to knowledge assessment by tests is proposed in the paper. In the proposed information model, the process of knowledge assessment is considered as an information process with information transfer. The information is generated by a source (the assessed), who has a goal – to get as close as possible to the error-free solution of the test. The information, in the form of an information signal (the answers that the assessed gives to the test), is directed to the recipient – the assessor. The assessor evaluates the value (importance) of this information signal, which is a measure of the knowledge of the assessed. The value of the information signal is measured by the progress of the examinee towards reaching the goal. Formulas are obtained linking the value of the information signal with a numerical grade of the knowledge of the assessed. In particular, evaluation formulas are derived for the six-score scale (used in Bulgaria) for tests of the most used types – with items with 3, 4, and 5 answers. However, detailed assessment requires answering a large number of items (an items bank, included in the test at the stage of development), which increases the time for the examination. The examination time could be shortened with an adequate algorithm that reduces the number of items asked according to the answers of the examinee, without deteriorating the quality of the examination and assessment. An adaptive algorithm of knowledge assessment is proposed, based on analytical expressions, which can be integrated into computer tests in order to shorten the examination process by reducing the number of items asked, depending on the examinee's previous answers. The adaptive algorithm reduces the number of items that the examinee answers, compared to the number of items in the bank. The grade that the examinee receives for his/her knowledge of the examined topic differs from the "exact" grade (the one he/she would receive after solving a test with all items in the bank) by a value not exceeding a given tolerance. The grade is calculated from (1) the number of items in the items bank; (2) the number of items the examinee has answered, which are a part of all items in the items bank; and (3) the relative number of correct answers.

Key Words

Assessment of knowledge; Semantic approach in Information theory; Information value (importance); Tests of the multiple-choice type; Adaptive algorithm.

Introduction

Tests have been used since antiquity as a tool for assessing knowledge – for selecting candidates for the Chinese administration. Their main advantage is their applicability as a technology for rapid assessment of knowledge. Since the beginning of the 20th century, psychological tests have been developed, initially in the USA. The so-called Classical Test Theory (CTT, discussed for example in [1]) began to develop in the 1940s.

In the last few decades, there has been growing worldwide use of a certain type of test – dichotomous tests, composed of items (tasks) with identical structure, each consisting of a question with two types of logically mutually exclusive answers: one correct and several incorrect answers. The incorrect answers are misleading (distractors) and serve to reduce the probability of guessing the correct answer.

After the 1980s, the so-called Item Response Theory (IRT) [1-7] was developed, which claims to be applicable to the analysis and evaluation of all types of tests, in particular those for knowledge assessment. IRT offers several types of dependencies (models) of the probability of a correct answer to a given item on the intellectual qualities of the evaluated, in particular their knowledge of the subject under examination. The models differ in the number of parameters included in them, which are subject to determination after testing a group answering the test. As a result of the testing, the values of the parameters in the models are determined and it is assessed which of the models is the most suitable for assessing the verifiable qualities of the group. In the author's modest opinion, in the particular case of knowledge assessment, there is still much to be desired from IRT. For example, the difficulty of a test item is an individual feeling of the evaluated, yet in the models of IRT the difficulty is a parameter of the test item, independent of the knowledge of the evaluated person; i.e., according to IRT models, the item is equally difficult to answer correctly for both the knowledgeable and the ignorant. Also unconvincing is the IRT solution to the problem of the individual's tendency to accidentally guess the correct answer, which is included in some models as a parameter of the test item, and not as the individual's tendency to use or not use guessing when solving a test. In this regard, Ivailo Partchev [3] commented in the chapter "Guessing and the 3PL model": "Items never guess–people do".

Objective

The above-discussed theories, CTT and IRT, evaluate the knowledge of the individual relatively – they use the evaluation of the knowledge of a group of individuals in one or another form (as a sample or the entire population).

The present paper describes a new informational approach to knowledge assessment through a test, completely different from the concepts set in CTT and IRT. The process of knowledge assessment is regarded as a special case of an information process related to the generation, transmission, and perception of information. Within the frame of the described informational approach, the knowledge of the individual is assessed absolutely – as a feature of the individual – and it is not necessary to access the knowledge of a group of individuals in order to assess this feature.

Material and Methods

Information System

A system with an information process in it is hereinafter referred to as an information system. In the particular case of knowledge assessment, the information system consists of:

1. Source of the information signal. In this case, it is the examinee (the evaluated), who has a goal when sending the information signal.

2. An information signal directed to the recipient.

3. Recipient of the information signal. In this case, it is the examiner (the evaluator).

The examinee is a generator of the information signal to be perceived by the evaluator. An information signal in the case of the assessment process means the overall presentation of the knowledge that the examinee provides to the evaluator – depending on the signal carrier, this may be a written work, an oral presentation, or computer test answers.

By submitting the information signal, the evaluated has a specific goal – he/she strives to be as close as possible to achieving it – to answer correctly all the items in the test. Progress towards the goal is measured according to the criteria developed by the evaluator. They can be:

1. Applied directly by a person-evaluator, based on a general assessment of the information signal submitted by the evaluated – the so-called holistic assessment, based on the professional experience of the evaluator. The assessment criteria remain unclear in this case.

2. Clearly pre-formulated criteria, with which the evaluated is already familiar at the stage of preparation for the exam. In particular, the criteria may be embedded in a computer test.

The evaluator receives the information signal and assesses the value of the received information, measuring the progress of the examinee towards achieving his/her goal. The value of the information signal is maximal when the goal is reached. Partial progress towards the goal corresponds to a partial value of the information signal. The evaluator uses the value of the information signal as a measure of the knowledge of the evaluated.

Semantic branch of information theory

The main branch of information theory is related to solving the problems of machine transmission and coding of information, while the human problems of conscious generation, transmission, perception, understanding, and subjective evaluation of information remain outside the objectives of researchers working in this branch.

A group of researchers, mainly Russian scientists, works in the so-called "semantic" (meaningful) branch of information theory.

No literature data have been found on the application of the semantic branch of information theory to the problem of knowledge assessment.

Results

Definition of the value of information signal applicable to the problem of knowledge assessment

The approach for assessing knowledge discussed below, like all methods used for assessing knowledge, is indirect (as far as direct reading of thoughts is not possible) – the knowledge of the evaluated on a given topic is judged by his/her answers – his/her information signal. The information value of this signal is assessed through the progress of the evaluated towards reaching his/her goal – the correct answer to all items in the test. Progress is measured according to the criteria set by the evaluator. The value of the information signal is a characteristic of the knowledge of the evaluated on the topic. The value of the information signal ("knowledge") in the proposed information approach is obtained through "ignorance" – by assessing the progress towards the goal of an examinee without any knowledge of the examined topic.

A probabilistic approach is used for the assessment of the value of the information signal described below.

The proposed analytical expression for the value of the information signal is based on two computable probabilities of random progress towards achieving the goal. For tests consisting of items with one correct answer, an indicator of progress towards reaching the goal is the number of correct answers that the evaluated has selected.

In principle, the accidental achievement of the mentioned goal is not impossible. If the number of items in the test is large and the answers in each item are a finite number, usually 2, 3, 4, or 5, the probability of accidentally guessing the correct answers to all items is very small but greater than zero. In this case, "guessing" means that the examinee randomly chooses the answers to the items in the test, without having any knowledge of them that would affect his/her choice. This happens, for example, if the examinee does not understand the language in which the test is written.

Figure 1 shows the probability that an ignorant examinee accidentally gives a certain number of correct answers in a test containing 20 items, each with four (for the ignorant) equally probable answers, one of which is correct. This probability decreases rapidly, and the probability of accidentally guessing all items in the test is of the order of 10⁻¹⁴. For comparison, the probability of choosing six specific numbers out of 49 (as in lottery games) is 7.2×10⁻⁸, which is comparable to accidentally guessing 16 correct answers in the mentioned test.

Figure 1: The probability P of an ignorant person randomly progressing to the goal – correct answers to all items in a test with 20 items – decreases rapidly with the progress towards the goal. The relative change in probability ΔP/P is also shown, as well as the change in the value V of the information signal (see below)

The greater the progress of the evaluated towards achieving the goal, i.e. the more questions he/she answered correctly, the less likely it is to be accidental, and the more likely it is to be the result of available knowledge. Hereinafter, "ignorance" (the probability of accidental progress towards the goal) is used as a measure of the value (importance) of the information signal, which in turn is a characteristic of "available knowledge". A distinction must be made between the concepts of "available knowledge" – a quality of the individual that cannot be measured directly – and "value of the information signal" – a measurable quantity, a characteristic of available knowledge. The greater the value of the information signal, the lower the probability of accidental achievement of a certain progress towards the goal; in particular, the value of the information signal is maximal at maximum progress, i.e. when it is sufficient to achieve the ultimate goal – correct answers to all questions in the test.

When solving a test by chance, the ignorant encounters correct answers too. The average number of correct answers that a large group of ignorant examinees would receive when solving a test simultaneously and independently (or that one ignorant examinee without memory would receive when repeating the test many times) indicates the most likely (probable) progress towards the goal in the absence of knowledge. Accordingly, the information value of a signal leading to the most probable progress is zero. To each number of correct answers (progress towards the goal) corresponds a certain probability for the ignorant to achieve this number by chance. The probability reaches its maximum at the most probable random progress, after which it decreases rapidly and monotonically to its minimum at the maximum progress (when all randomly selected answers are correct); i.e., the curve of the dependence of the probability of a random number of correct answers on the number of correct answers has a maximum for a number of correct answers greater than zero. The most probable random progress depends on the number of items in the test and the number of answers per item.

If the examinee deliberately tries to avoid the correct answers, then he/she has set an "anti-goal", the reaching of which also requires knowledge.

In a test with n items, each number of correct answers (progress towards the goal) corresponds to the probability of achieving it by chance, according to the scheme in Table 1. As the number of correct answers increases from 0 to n, the corresponding probability initially increases, reaches a maximum Pmax, and then monotonically decreases to P(n). Its change ΔP (when the number of correct answers changes by one) is a negative function of the number of correct answers. The relative change ΔP/P of the probability is also a negative function of the number of correct answers, but it changes to a lesser extent (0 to −46.78 in the example of Figure 1 and Table 2).

TABLE 1 The number of correct answers in the test (progress towards the goal) and the probability of achieving it by chance

Number of correct answers (progress towards the goal):   0      1      …      m      …      k      …      n
Probability:                                             P(0)   P(1)   …      Pmax   …      P(k)   …      P(n)

Below, the relative change in the probability ΔP/P is taken as a measure of the change in the value ΔV of the information signal. The two dependencies have opposite signs, i.e. the decrease in the probability of random progress towards the goal corresponds to a proportional increase in the value of the information signal. The analytical expression of this definition of the change of the information signal value is:

\Delta V = -\frac{\Delta P}{P} \qquad (1)

The result of an experiment described below confirms the correctness of the choice of definition (1).

After integration,

V = -\ln P + \text{const} \qquad (2)

The value of the constant can be determined by the condition that the most probable random progress towards the goal corresponds to the zero value of the information signal, i.e. V = 0 at P = Pmax. Therefore, from (2):

\text{const} = \ln P_{\max} \qquad (3)

and from (2) and (3) for the value of the information signal in the case of knowledge assessment follows:

V = \ln\frac{P_{\max}}{P} \qquad (4)

The formula is applicable for a number of correct answers equal to or greater than the most probable random progress towards the goal, i.e. for each number of correct answers k in the interval m ≤ k ≤ n.

The value of the information signal is an additive characteristic – the total value for independent tests is the sum of the values for each of them separately.

In the proposed definition of the value of the information signal in knowledge assessment, the probabilities are computable values applicable to knowledge assessment through tests. In the model of the value of the information signal, there is a clear criterion for zero value of the information signal, i.e. for when the assessed has no knowledge of the topic of the exam. The zero value corresponds to a calculable value – the maximum probability of accidental progress towards the goal. This turns the set of estimates of information signal values into a scale of relations (a ratio scale) – the most informative type of measurement scale (with a natural zero, relations between the values are meaningful). For comparison, the Celsius temperature scale is a scale of intervals (without a natural zero) – a less informative scale, in which relations between temperatures are not allowed (it is incorrect to say that 2°C is twice as great a temperature as 1°C). The Kelvin temperature scale is a scale of relations (with a natural zero, and it is correct to say that 2 K is twice as great a temperature as 1 K).

The probabilities in the proposed formula (4) for the value of information are computable quantities, and the value of the information signal has a natural zero.

Guessing in solving a test for knowledge assessment is a problem in the analysis of tests through CTT and IRT. In the proposed information approach to the assessment of knowledge through a test, guessing is integrated and taken into account in the assessment process – the examinee is free to guess – and no special measures or sanctions against guessing are required.

Assessment of a test through the semantic information approach

The most technological type of test is a multiple-choice test, consisting of items with the same number of answers, only one of which is correct. If the ignorant solves the multiple-choice test by guessing, then the Bernoulli formula [8] is applicable for calculating the probability Pn(k) of randomly guessing k correct answers out of a total of n items in the test:

P_n(k) = \binom{n}{k} p^k q^{n-k} \qquad (5)

where p denotes the probability of accidentally guessing the correct answer to a particular test item. If all the items in the test are of the same type, this probability is the same for all items. q denotes the probability of an accidental choice of an incorrect answer from the answers in the item. The sum of the probabilities of a random choice of an answer among the answers of the item is p + q = 1.
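
The numerical behavior of (5) can be checked directly. Below is a minimal sketch in Python (an illustration added here, not part of the original study) for the running example of n = 20 items with 4 answers each (p = 1/4). It computes the exact binomial values; the printed values in Table 2 below appear to follow the de Moivre–Laplace approximation (8) introduced later, so the extreme tail differs slightly from these exact numbers.

```python
# Exact probabilities from the Bernoulli formula (5) for a test with
# n = 20 items, 4 answers each (p = 1/4, q = 3/4). Illustrative sketch.
from math import comb

def p_random(k: int, n: int = 20, p: float = 0.25) -> float:
    """Probability of guessing exactly k of n items by pure chance."""
    q = 1.0 - p
    return comb(n, k) * p ** k * q ** (n - k)

print(p_random(5))   # most probable random progress (np = 5): ~0.202
print(p_random(20))  # all items guessed: 0.25**20 ~ 9.1e-13
```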

The most probable number of correct answers randomly chosen by the ignorant is the integer closest to np [8]. To obtain the maximum probability Pmax, corresponding to the most probable number in random progress to the goal, k in (5) is replaced by np. For Pmax we get:

P_{\max} = \binom{n}{np} p^{np} q^{n-np} = \binom{n}{np} p^{np} q^{nq} \qquad (6)

Substituting (5) and (6) in (4), the following expression is obtained for the value of the information signal:

V = \ln\frac{\binom{n}{np} p^{np} q^{nq}}{\binom{n}{k} p^{k} q^{n-k}} \qquad (7)

This exact expression (7) for the value of the information signal is harder to use in calculations. An easier-to-apply formula for Pn(k) is given by the de Moivre–Laplace formula, which approximates (5) the more accurately, the more items are included in the test. For Pn(k), expressed by the de Moivre–Laplace formula [8], we obtain:

P_n(k) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}} \qquad (8)

Since the most probable progress k in the random movement to the goal is equal to np, the exponential factor in the above formula is 1 when calculating the maximum probability, and the maximum probability is equal to the coefficient in front of the exponent in (8).

After substituting (8) and the corresponding expression for the maximum probability in (7), the value of the information signal becomes:

V = \frac{(k-np)^2}{2npq} \qquad (9)

The maximum value of the information signal is reached when k = n,

V_{\max} = \frac{(n-np)^2}{2npq} = \frac{nq}{2p} \qquad (10)
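
For a numerical feel of the approximation, the sketch below (an illustration, not from the original paper) compares the exact value (7) with the approximate value (9) for the running example n = 20, p = 1/4. The two agree well near the most probable progress and diverge in the extreme tail, where (9) reaches its maximum nq/(2p) = 30 while the exact (7) gives about 26.1; the agreement improves for longer tests.

```python
# Comparison of the exact information value (7) with the de Moivre-Laplace
# approximation (9), for n = 20, p = 1/4 (np = 5, npq = 3.75). Illustrative.
from math import comb, log

n, p = 20, 0.25
q = 1.0 - p
m = round(n * p)  # most probable random progress, here 5

def v_exact(k: int) -> float:
    """Formula (7): V = ln(Pmax/Pn(k)) with binomial probabilities (5), (6)."""
    p_k = comb(n, k) * p ** k * q ** (n - k)
    p_max = comb(n, m) * p ** m * q ** (n - m)
    return log(p_max / p_k)

def v_approx(k: int) -> float:
    """Formula (9): V = (k - np)^2 / (2npq)."""
    return (k - n * p) ** 2 / (2 * n * p * q)

for k in (10, 15, 20):
    print(k, round(v_exact(k), 2), round(v_approx(k), 2))
# -> 10 3.02 3.33 / 15 10.99 13.33 / 20 26.13 30.0
```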

The relative value is the ratio of the value of the information signal corresponding to k correct answers to the maximum information value. The relative value is a characteristic of the relative progress of the evaluated towards the goal, i.e. of the available knowledge of the examinee relative to the knowledge needed to achieve the goal. For the relative value we get:

\frac{V}{V_{\max}} = \left(\frac{k-np}{nq}\right)^2 \qquad (11)

The available knowledge corresponding to the achieved progress towards the goal can be evaluated according to formula (11) with values between 0 (at k = np) and 1 (at k = n). This scale is a scale of ratios, as explained above.

Educational systems around the world use numerical grading scales to assess knowledge, in which grades vary within traditionally defined ranges from the minimum grade ϑmin to the maximum ϑmax.

The traditional scale would also be a relations scale if the same available knowledge is expressed, on the one hand, as a ratio of grades of the traditional scale and, on the other, as the relative value of the information signal (11).

The ratio in the grades on the traditional scale would be:

1. with a numerator: the difference between the grade ϑ for the achieved progress towards the goal and the minimum possible grade ϑmin, i.e. ϑ − ϑmin,

2. with a denominator: the difference r between the maximum and minimum grades, r = ϑmax − ϑmin.

After equating this ratio with (11), the following expression is obtained:

\frac{\vartheta - \vartheta_{\min}}{r} = \left(\frac{k-np}{nq}\right)^2 \qquad (12)

from which the grade ϑ follows:

\vartheta = \vartheta_{\min} + r\left(\frac{k-np}{nq}\right)^2 \qquad (13)

In particular, for the six-score scale used in Bulgaria ϑmin=2, ϑmax=6, i.e. r=4.

The most commonly used tests consist of items with 3, 4, and 5 answers. For them, for Bulgaria, formula (13) takes the forms:

\vartheta = 2 + \left(\frac{3k-n}{n}\right)^2 \ (3\ \text{answers}), \qquad \vartheta = 2 + \frac{4}{9}\left(\frac{4k-n}{n}\right)^2 \ (4\ \text{answers}), \qquad \vartheta = 2 + \left(\frac{5k-n}{2n}\right)^2 \ (5\ \text{answers}) \qquad (14)
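
As a quick check, the sketch below (illustrative, not from the original) evaluates formula (13) for the Bulgarian six-score scale; for a 20-item test with 4 answers per item it reproduces the grades of Table 2 below, e.g. 3.14 for 13 correct answers.

```python
# Grade formula (13)/(14) for the six-score scale (theta_min = 2, r = 4),
# applicable for k >= np. Illustrative sketch.
def grade(k: int, n: int, answers: int, theta_min: float = 2.0, r: float = 4.0) -> float:
    """theta = theta_min + r * ((k - n*p) / (n*q))**2, with p = 1/answers."""
    p = 1.0 / answers
    q = 1.0 - p
    return theta_min + r * ((k - n * p) / (n * q)) ** 2

print(round(grade(13, 20, 4), 2))  # -> 3.14, as in Table 2
print(round(grade(20, 20, 4), 2))  # -> 6.0 at maximum progress
```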

The reliability of the evaluation by the derived formulas can be judged from the data in Table 2 below. The dependences of the value of the information signal and of the grade on the six-score scale on the progress towards the goal are shown in Figure 2.

Figure 2: Dependences of the value of the information signal and of the numerical grade on the six-score scale on the progress towards the goal, according to the data in Table 2

TABLE 2 Test parameters for the example of a test with 20 items with 4 equally probable answers, one of which is correct. The grade is from the "six-score" scale used in Bulgarian education. The data from the table are presented graphically in Figure 1.

k – progress towards the goal (number of correct answers), k ≥ 5; P – probability of randomly choosing k correct answers

k     P            ΔP = P(k) − P(k−1)   ΔP/P     V = ln(Pmax/P)   Grade ϑ
5     2.06×10⁻¹                                                   2
6     1.80×10⁻¹    −2.57×10⁻²           −0.14    0.13             2.02
7     1.21×10⁻¹    −5.94×10⁻²           −0.49    0.53             2.07
8     6.20×10⁻²    −5.88×10⁻²           −0.95    1.20             2.16
9     2.44×10⁻²    −3.76×10⁻²           −1.54    2.13             2.28
10    7.35×10⁻³    −1.71×10⁻²           −2.32    3.33             2.44
11    1.70×10⁻³    −5.65×10⁻³           −3.33    4.80             2.64
12    3.00×10⁻⁴    −1.40×10⁻³           −4.66    6.53             2.87
13    4.05×10⁻⁵    −2.59×10⁻⁴           −6.39    8.53             3.14
14    4.20×10⁻⁶    −3.63×10⁻⁵           −8.65    10.80            3.44
15    3.34×10⁻⁷    −3.87×10⁻⁶           −11.60   13.33            3.78
16    2.03×10⁻⁸    −3.13×10⁻⁷           −15.44   16.13            4.15
17    9.45×10⁻¹⁰   −1.93×10⁻⁸           −20.47   19.20            4.56
18    3.37×10⁻¹¹   −9.11×10⁻¹⁰          −27.03   22.53            5.00
19    9.21×10⁻¹³   −3.28×10⁻¹¹          −35.60   26.13            5.48
20    1.93×10⁻¹⁴   −9.02×10⁻¹³          −46.78   30.00            6.00
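
Table 2 can be reproduced with a few lines of code. The sketch below (an illustration) computes the probability column with the de Moivre–Laplace approximation (8), which spot checks suggest is the formula behind the printed values (e.g. P(5) ≈ 0.206 and P(20) ≈ 1.93×10⁻¹⁴ both match (8), while the exact binomial (5) gives 0.202 and 9.1×10⁻¹³).

```python
# Reproduction of Table 2 via the Gaussian approximation (8), the value
# formula (4)/(9), and the grade formula (14) for 4-answer items. Illustrative.
from math import sqrt, pi, exp, log

n, p = 20, 0.25
q = 1.0 - p
npq = n * p * q

def p_dml(k: int) -> float:
    """Formula (8): de Moivre-Laplace approximation of Pn(k)."""
    return exp(-((k - n * p) ** 2) / (2 * npq)) / sqrt(2 * pi * npq)

p_max = p_dml(5)  # the most probable random progress is np = 5
for k in range(5, 21):
    v = log(p_max / p_dml(k))                     # formula (4), equals (9)
    theta = 2 + 4 * ((k - n * p) / (n * q)) ** 2  # formula (14), 4 answers
    print(f"k={k:2d}  P={p_dml(k):.2e}  V={v:5.2f}  grade={theta:.2f}")
```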

The probability of an erroneous evaluation when using a scale of the type presented in Table 2 is very small if the criterion used in practice for a successfully solved test is applied – the test is taken successfully if the grade is at least 3.00 (13 correct answers out of 20 possible). As can be seen from Table 2, the probability of accidentally hitting 13 correct answers is of the order of 10⁻⁵, i.e. only about one out of a hundred thousand ignorant examinees evaluated by the test from the example in Table 2 would erroneously receive a grade of 3.14.

An experiment with an examination of a group of students simultaneously with computer tests and human experts

An experiment was conducted whose purpose was to check to what extent the assessment of knowledge by a computer test coincides with the human expert assessment of knowledge, if the assessment algorithm in the computer test is based on formula (14).

In the experiment, a group of 82 students answered in writing 20 open-ended (without given answers) physics questions, immediately after which they solved a computer test with the same questions, but of a closed type – with 4 answers each.

The written responses were scored independently by 10 physics teachers. The mean grades of the written papers were compared with the computer grades. The teachers' average grades varied between 4.43 and 5.08, and the average of all the teachers' grades – the "exact" grade – was 4.87. The average grade on the computer test was 4.76. Only two of the teachers had a better match of their average grade to the "exact" grade. I.e., the grade from the computer test is unbiased relative to the "exact" one – the computer test has no systematic error; it neither increases nor decreases the average grade relative to the "exact" one.

In Figure 3, for each student in the experiment, the relationship between the computer grade and the corresponding average teacher grade of the written work is shown graphically.

Figure 3: Correlation between computer test grades and their corresponding mean teacher grades of written works

By differentiating the linear model of the obtained dependence between the grades from the computer test and the expert evaluations of the written works, it turns out that the change in the computer grade is almost equal to the change in the teachers' grade – the slope coefficient (0.962) is very close to the ideal value (1.000). I.e., computer evaluations obtained by formula (14) are consistent with the human evaluation of knowledge. Therefore, the choice of (1) as the definition of the value of the information signal leads to an objective evaluation of knowledge, adequate to the human one.

Minimum number of items in a dichotomous test for a scale with non-degenerate scores

Traditionally, the interval r of grades in the scale is divided into equal sub-intervals (scores). Scales with different numbers of scores, in some cases dozens, are used in evaluation practice around the world. Scores also have a verbal expression; for example, a six-score scale may include the scores "poor", "satisfactory", "medium", "good", "very good", and "excellent".

The grade from the computer test (14) can be compared to a point that falls in one of the scores on the scale. The distribution of point grades on the scale depends on the number of items in the test and the number of answers in each item. The density of point grades is uneven. As the grade increases, the density of the point grades decreases according to a quadratic law, i.e. the highest grades are the most distant from one another on the scale.

In traditional assessment practice, the final result of the knowledge assessment is presented through the score in which the point grade falls.

If the items in the test are few, due to the uneven distribution of the grades in the scale interval, it may happen that the scale has more scores than possible point grades, i.e. the scale has "empty" scores that do not correspond to any point grade. The term "degenerate" is used for such a scale below. The use of degenerate scales is not logically justified. Below, a criterion is derived for the minimum number of items in a test so that the scale is non-degenerate.

The difference between the point grades Δϑ depends on the difference Δk in the number of correct answers in the test and can be calculated by differentiating (13):

\Delta\vartheta = \frac{2r(k-np)}{n^2 q^2}\,\Delta k \qquad (15)

where the notations are the same as in formula (13).

If the difference between the correct answers Δk is fixed at its minimum value 1 (Δkmin = 1), it follows from (15) that the difference between neighboring point grades Δϑ grows linearly with the ratio τ between the number of correct answers and the number of items in the test. The difference between the point grades reaches a maximum Δϑmax at τmax = 1. From (15), for the maximum difference between neighboring point grades, Δϑmax = 2r/(qn) is obtained. If the number n of items in the test is small, Δϑmax is large.

If the number of scores in the scale is N and the scale is uniform – with equal-sized scores – the interval of grades for one score is ΔB = r/N. A situation can arise in which Δϑmax > 2ΔB, i.e. there is a blank score between two point grades (the scale is degenerate for this test). Such a situation is the result of the traditional choice of the number of scores in the scale in a given education system and of the assumption that the intervals of grades corresponding to all scores are of the same size. The point grades themselves are not the cause of this degeneration.

Therefore, in order for the scale to be non-degenerate with respect to a specific test with n items in it, it is necessary to meet the criterion:

n \ge \frac{N}{q} \qquad (16)

For knowledge assessment in the universities in Bulgaria, a five-score scale is used (N = 5), with scores "poor", "medium", "good", "very good", "excellent". For the scale to be non-degenerate for tests with 3, 4, and 5 answers per item, the minimum number nmin of items in the test, calculated from (16), is given in Table 3:

TABLE 3 The minimum number nmin of items in the test

Answers per item   q     N/q    nmin
3                  2/3   7.33   8
4                  3/4   6.67   7
5                  4/5   6.2    7
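
Assuming the criterion reconstructed in (16), the minimum test length is the smallest integer n with n ≥ N/q; the sketch below (illustrative) reproduces the nmin column of Table 3 for the five-score scale. The intermediate values printed in the table for 3 and 5 answers (7.33 and 6.2) differ slightly from N/q = 7.5 and 6.25, but the resulting nmin column is the same.

```python
# Minimum number of items for a non-degenerate scale, criterion (16):
# n >= N/q with q = 1 - 1/answers. Illustrative sketch for N = 5 scores.
from math import ceil

def n_min(answers: int, scores: int = 5) -> int:
    q = 1.0 - 1.0 / answers
    return ceil(scores / q)

for a in (3, 4, 5):
    print(a, n_min(a))  # -> 8, 7, 7, matching the nmin column of Table 3
```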

Requirements for test items

The conditions above, under which formula (5) is valid, impose additional requirements on the test items, which must be observed in the preparation of the tests in order for the obtained formulas for assessment of knowledge by a test to be applicable. Many of these requirements are met in the most widely used tests. The more the test complies with these requirements, the more applicable the resulting assessment formula (14) is to that test.

1. The answers must be constructed in such a way that they appear to be equally meaningful, so that the assessed cannot guess which is the correct answer by side signs such as a difference in length or shape between the correct and incorrect answers. In particular, if the correct answer contains a list of concepts, distractors can be constructed as lists containing incorrect concepts cyclically mixed with the correct concepts. The symmetry in the answers equalizes the probabilities for the assessed to guess the correct answer. For example, if an item has the question "Which colors are in the USA flag?" with the correct answer "White, blue, red", the distractors can be constructed cyclically: "Green, white, blue", "Red, green, white", and "Blue, red, green".

2. When composing tests, certain words or phrases are more often used to compose distractors and should be avoided – for example, words like "only" and expressions like "All answers are correct". The examined knows this and can use it to eliminate some of the answers, increasing the likelihood of guessing the correct answer.

3. There should be no items in the test with partially correct distractors. For example, for the question "Specify the vegetables:" the test compiler has indicated "cabbage, lettuce, tomato" as the correct answer and "cabbage, lettuce, apple" as a distractor, i.e. the distractor is a partially correct answer (vegetables are also indicated in it) and the item is not dichotomous. In this case, the question should be modified as follows: "Indicate the answer containing only vegetables:". The suggestive word "only", often used to construct distractors, moved from the distractor to the question, improves the logical structure of the item, making it dichotomous.

4. Assessors often tend to give more weight to a certain item than to others, multiplying the correct answer by a coefficient (greater than 1). The information content of all items in the test (with the same number of answers) is the same – for example, for items with four answers it is 2 bits – so it is equally easy to guess the correct answer of the "more valuable" item as of the "less valuable" one. In case of an accidental correct answer to an item with a high weight coefficient, the evaluated person would be unfairly overestimated, and for an accidental incorrect answer to such an item, the evaluated person would be unfairly punished. I.e., it is inadmissible to assign a higher value to individual items than to others in a test of the discussed type by assigning weight coefficients to them. The problem could be solved at the stage of test development. The topic to be evaluated is broken down by the test developer (evaluator) into separate concepts. A rank (weighting factor) is assigned to each of them, showing its importance for the evaluated topic. Then, for each concept, a number of items proportional to its rank is developed (see substantive validation of the test [1]).

An adaptive algorithm

The detailed assessment of knowledge through a test requires testing with a large number of items, answering which would take considerable time. The term "items bank" is used below for all items included in the test at the stage of its development. If a computer version of the test is used, the exam can be shortened in time by reducing the number of items assigned during the exam in comparison with their number in the items bank. Reducing the time simply by reducing the number of assigned items hides risks of an unacceptable reduction of the detail of the test and of the accuracy of the assessment of knowledge on the topic. The detail and accuracy of the assessment will not be compromised if:

1. The items in the bank meet the requirement for substantive validation of the test in relation to the tested topic, i.e. all items test the knowledge on this topic. For such a test, dropping some of the items does not significantly reduce the detail of the assessment [1].

2. For the purposes of the examination, the items are randomly drawn from the items bank and submitted sequentially one after the other, with only one of them visible on the screen. The next item appears after the answer to the previous one. The examinee cannot return to previous items, even just to see them.

3. The assessment of the shortened version of the test remains within the permissible deviation from the exact assessment – the one that the examinee would have received if he/she had solved the test with all the items from the bank.

In order to reduce the time of the exam in compliance with the above conditions, an adaptive assessment algorithm is needed which, depending on the frequency of the correct answers given by the examinee, changes the number of items set during the exam. The aim is to reduce the time for the exam in as many cases as possible – for the exam to end before all the items of the bank have been exhausted – without negatively affecting the accuracy of the assessment of the knowledge of the examinee.

The paper proposes an adaptive algorithm of the type described above, based on the semantic information model for knowledge assessment through a test containing dichotomous items with an equal number of answers. The test should be designed so that the answers to each item seem equally likely to an examinee who does not know the correct answers. With τ = k/n denoting the ratio between the number of correct answers k and the number n of items in the test, formula (13) takes the form:

\vartheta = \vartheta_{\min} + r\left(\frac{\tau - p}{q}\right)^2 \qquad (17)

in which the remaining notations are the same as in formula (13).

From (17) the opposite task can be solved too: to determine the number of correct answers with which a specific grade is achieved when solving the test:

\tau = p + q\sqrt{\frac{\vartheta - \vartheta_{\min}}{r}}, \quad \text{i.e.} \quad k = n\left(p + q\sqrt{\frac{\vartheta - \vartheta_{\min}}{r}}\right), \qquad \tau \ge p \qquad (18)
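
The inverse task (18) yields the threshold used in step 1 of the adaptive algorithm below; a minimal sketch (illustrative, assuming the reconstruction above):

```python
# Inverse task (18): the number of correct answers needed for a target grade.
# With theta = 2.99 (upper limit of "poor") it yields the 12.46 threshold
# used by the adaptive algorithm below. Illustrative sketch.
from math import sqrt

def k_for_grade(theta: float, n: int, answers: int,
                theta_min: float = 2.0, r: float = 4.0) -> float:
    """Formula (18): k = n * (p + q * sqrt((theta - theta_min) / r))."""
    p = 1.0 / answers
    q = 1.0 - p
    return n * (p + q * sqrt((theta - theta_min) / r))

print(round(k_for_grade(2.99, 20, 4), 2))  # -> 12.46
```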

The numerical grade corresponds to a point on the axis on which the grading scale can be plotted, so the term "point grade" is also used below. The final result of the exam is often the score (the sub-interval of the rating scale) in which the point grade falls.

Table 4 presents, as an example, all possible values of the point grades for a test with a bank of N = 20 items with 4 answers each – for n = 8 to n = 20 and k ≥ 5 (τ ≥ 1/4, according to the constraint in (18)). The traditional criterion for successfully passing a test applied in Bulgaria is a grade of at least 3.00. Table 4 shows that the test is not solved successfully if the examinee, solving a test with all 20 items in the bank, has chosen no more than 12 correct answers (for which the grade is 2.87). I.e., if no correct answer has been selected after the examinee has answered 8 items, the adaptive algorithm can terminate the test at the earliest. This number of items is the difference between the number of items in the bank (20 in the example) and the maximum number of correct answers (12 in the example) for which the criterion for passing the exam is not met in the case of a test with all items in the bank.

TABLE 4 Point grades ϑ corresponding to each of the admissible combinations of the number k of correct answers and the number n of drawn items in a test with a bank of 20 items. On a darker background are shown the grades that do not meet the criterion for successfully passing the exam – a grade of at least 3.00

ϑ values; rows: n (number of drawn items), columns: k (number of correct answers)

n\k    5     6     7     8     9     10    11    12    13    14    15    16    17    18    19    20
8      3.00  3.78  4.78  6.00
9      2.66  3.23  3.98  4.90  6.00
10     2.44  2.87  3.44  4.15  5.00  6.00
11     2.30  2.62  3.06  3.62  4.30  5.09  6.00
12     2.20  2.44  2.79  3.23  3.78  4.42  5.16  6.00
13     2.13  2.32  2.59  2.95  3.39  3.92  4.53  5.22  6.00
14     2.08  2.23  2.44  2.73  3.10  3.53  4.04  4.62  5.27  6.00
15     2.05  2.16  2.33  2.57  2.87  3.23  3.66  4.15  4.70  5.32  6.00
16     2.03  2.11  2.25  2.44  2.69  3.00  3.36  3.78  4.25  4.78  5.36  6.00
17     2.01  2.08  2.19  2.35  2.56  2.81  3.12  3.48  3.88  4.34  4.84  5.40  6.00
18     2.01  2.05  2.14  2.27  2.44  2.66  2.93  3.23  3.59  3.98  4.42  4.90  5.43  6.00
19     2.00  2.03  2.10  2.21  2.36  2.54  2.77  3.04  3.34  3.69  4.07  4.49  4.96  5.46  6.00
20     2.00  2.02  2.07  2.16  2.28  2.44  2.64  2.87  3.14  3.44  3.78  4.15  4.56  5.00  5.48  6.00

Let a test contain a bank of N items. In the process of adaptive testing, the examinee responds to a certain number n ≤ N of items randomly withdrawn from all items in the bank (a non-repeated sample). To some of the items (or to all), the examinee gives correct answers.

Let kN denote the number of correct answers that the examinee would give, depending on his/her knowledge of the assessed topic, when solving all N items in the bank. The ratio τN = kN/N, substituted in (17), would give the exact grade for the knowledge of the examinee. If in the course of the test the examinee answers n < N randomly drawn items (a sample) from all items in the bank and the number of correct answers given is denoted by k, then the ratio is τ = k/n. The ratio τ is a sample estimate of the exact ratio τN. In the test process, k, n, and τ are known after each answered item, but the exact ratio τN remains unknown, because kN is unknown.

If the difference between the ratios τn (known) and τN (unknown) is negligible (statistically insignificant) for a number n < N of randomly withdrawn items, the test grade calculated after substituting τn in (17) will not differ significantly from its exact value calculated with τN (if its value were known). If at a given time during the test a non-repeated sample of n items has been withdrawn randomly and the difference between τn and τN has become less than a limit value – the maximum deviation Δτmax – then this difference is negligible, the grade may be calculated from (17) with τn instead of τN without a significant loss of accuracy, and the adaptive algorithm can finish the exam. The maximum deviation Δτmax is calculated by the formula [8]:

\Delta\tau_{\max} = t_{\alpha,\,n-1}\sqrt{\frac{\tau(1-\tau)}{n}\left(1-\frac{n}{N}\right)} \qquad (20)

where tα,n−1 denotes the critical point of Student's t-criterion for significance level α and n−1 degrees of freedom.

The values of tα,n−1 for a significance level of 0.05 and a different number n of items in the test are given in Table 5 [9, 10].

TABLE 5 The values of tα,n−1 for a significance level of 0.05 and a different number n of items in the test [4]

n            7     8     9     10    11    12    13    14    15    16    17    18    19    20
t(0.05,n−1)  2.45  2.37  2.31  2.26  2.23  2.20  2.18  2.16  2.14  2.13  2.12  2.11  2.10  2.09

n            21    22    23    24    25    26    27    28    29    30    40    60    120   ∞
t(0.05,n−1)  2.09  2.08  2.07  2.07  2.06  2.06  2.06  2.05  2.05  2.05  2.02  2.00  1.98  1.96

After n < N test items have been answered, the maximum deviation Δτn max of τn from its corresponding exact value τN can be calculated from formula (20) and Table 5.
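
Numerically, assuming the reconstruction of (20) above, the maximum deviation can be computed with the Student t critical points of Table 5 (two-sided, α = 0.05); a sketch:

```python
# Maximum deviation (20) for a non-repeated sample of n items from a bank
# of N; scipy's t.ppf supplies the critical points of Table 5. Illustrative.
from math import sqrt
from scipy.stats import t

def delta_tau_max(k: int, n: int, N: int, alpha: float = 0.05) -> float:
    """Formula (20): t_{alpha,n-1} * sqrt(tau*(1 - tau)/n * (1 - n/N))."""
    tau = k / n
    t_crit = t.ppf(1 - alpha / 2, n - 1)  # e.g. 2.26 for n = 10, as in Table 5
    return t_crit * sqrt(tau * (1 - tau) / n * (1 - n / N))

print(round(delta_tau_max(5, 10, 20), 3))  # -> 0.253
```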

Figure 4 shows the dependence of τ on the number of correct answers k for tests with different numbers n of items, each representing a non-repeated sample from a bank of test items of volume N = 20. Each such test corresponds to a line segment marked in the figure with τn. The conclusions below are made using this example. It can be seen that as the number of correct answers increases, the straight sections corresponding to the tests with different numbers of items move away from each other, i.e. the difference between τn and τN also increases (in the example, τN coincides with τ20).

Figure 4: Dependence of the ratio τ = k/n on the number of correct answers k in tests in which the n items are a non-repeated sample withdrawn from a bank of 20 items

Figure 5 shows graphically, with an arrow, the maximum deviation Δτmax of τ13,11 (for a test with 13 items and 11 correct answers) in the direction of its exact value τN, which is a point on the segment τ20 (corresponding to a test with the maximum number of items, N = 20).

Figure 5: Graphical representation of the deviation of the ratio τ13,11 at the point k = 11 from the exact ratio τN and from the ratio τNk

If for a given k the absolute value of the distance between τn and τN is less than Δτmax, i.e. the condition is fulfilled:

\left|\tau_{n} - \tau_{N}\right| < \Delta\tau_{\max} \qquad (21)

then the ratio τnk is a statistically unbiased estimate of the exact ratio τN and the difference between them is statistically insignificant. I.e., if this condition is met, the test grade can be calculated with the value of τnk substituted in formula (17). The grade thus obtained would differ statistically insignificantly from the exact grade, although it was obtained through a sample from the bank of items. As far as τN is unknown, the fulfillment of condition (21) cannot be verified in this form.

This condition can be transformed into the condition:

\left|\tau_{nk} - \tau_{Nk}\right| < \Delta\tau_{nk\,\max} \qquad (22)

where τnk and τNk denote the values of the ratios τ and τN for k correct answers. In condition (22) the difference τnk − τNk contains computable quantities, i.e. it is known. But the maximum deviation Δτnk max, for which the ratios τnk and τNk differ statistically insignificantly, is unknown. Geometric considerations were used to determine it.

The line segment τ20 in Figure 5 represents a test with the full number N = 20 of items. For k correct answers, τN = k/N. The change ΔτN corresponding to a change Δk determines the slope of this line relative to the abscissa. At Δk = 1,

\Delta\tau_{N} = \frac{\Delta k}{N} = \frac{1}{N} \qquad (23)

The same slope is equal to tan α, where α is the angle between the abscissa and the segment τN (τ20 in the example). It follows from (23) that this angle is:

\alpha = \arctan\frac{1}{N} \qquad (24)

The same angle lies between Δτmax and Δτnk max (Figure 5), i.e.

\Delta\tau_{\max} = \Delta\tau_{nk\,\max}\cos\alpha \qquad (25)

From (24) and from [10] it follows:

\cos\alpha = \frac{1}{\sqrt{1 + 1/N^2}} \qquad (26)

and

\Delta\tau_{nk\,\max} = \Delta\tau_{\max}\sqrt{1 + \frac{1}{N^2}} \qquad (27)

For example, at N = 20 the expression (1 + 1/N²)^{1/2} = 1.001249; it decreases with increasing N and can be assumed to be 1 without noticeably affecting the accuracy of the calculation. I.e., condition (22) can be converted to:

\left|\tau_{nk} - \tau_{Nk}\right| < \Delta\tau_{\max} = t_{\alpha,\,n-1}\sqrt{\frac{\tau_{nk}(1-\tau_{nk})}{n}\left(1-\frac{n}{N}\right)} \qquad (29)

In condition (29) all quantities are computable. If it is fulfilled, the difference τnk – τNk is statistically insignificant.

Figure 6 shows the dependence of three of the differences τ − Δτmax on the number of correct answers: τ13 − Δτ13 (for a sample test of 13 of all 20 items), τ16 − Δτ16 (a sample test of 16 of all 20 items), and τ19 − Δτ19 (a sample test of 19 of all 20 items). The notations are the same as in Figure 5. The straight line τ20 is also shown in Figure 6. The differences τnk − Δτnk max are curved lines that approach the straight line τ20 at the points for which condition (29) is satisfied. At these points, the difference between the sample τn and the exact τN, as well as between the sample grade and the exact grade, is insignificant. For example, the figure shows that if, after answering 13 items, the examinee has answered correctly only 7 of them, the difference between the ratios τ13,7 and τ20,7 is negligible. The adaptive algorithm could terminate the test after the 13th answer with a grade of 2.59 (Table 4). Similarly, if after answering 16 items the examinee has given 10 correct answers, the ratio τ16,10 differs negligibly from τ20,10 and the test can be terminated on the 16th item with a grade of 3.00 (score "satisfactory"). For a test with 19 items and 16 correct answers, the adaptive algorithm should terminate the test with a grade of 4.49 (score "good").

Figure 6: Dependence of three of the differences τn − Δτn max on the number of correct answers: τ13 − Δτ13 (for a test with a sample of 13 of all 20 items), τ16 − Δτ16 (a test with 16 of 20 items), and τ19 − Δτ19 (a sample test of 19 of 20 items). The same figure shows the straight line τ20

The examples analyzed above help to clarify the logical sequence underlying an adaptive algorithm that reduces the number of items assigned in the test. It consists of the following 7 successive steps of calculations (a sketch implementation in code follows the list):

1. Calculate the number of correct answers at the upper limit of the score "poor" for a test with all N items in the bank. In the case of the six-score scale discussed above, the upper limit for the score "poor" is 2.99 (3.00 means a passed exam); after replacing the grade in (17) with this value and solving for k, 12.46 correct answers are calculated, which is the maximum number of correct answers for the score "poor". It is rounded to the nearest whole number – 12.

2. Calculate the "terminal" difference between the number of items in the bank and the maximum number of correct answers for the score "poor". In the example, this difference is 8.

3. The examinee solves a sequence of randomly drawn items whose number is equal to the "terminal" difference. In the example, these are 8 items.

4. The current number of items n and the number of correct answers k are used to calculate τnk.

5. After reaching a number of items equal to the "terminal" difference, after each next answer it is checked whether the number of incorrect answers has reached the "terminal" difference. If it has, the test is terminated with a grade in the score "poor", as this is the grade that the examinee would receive even if his/her test included all items from the bank.

6. Δτnk max is calculated from formula (20), and τnk − Δτnk max is calculated too.

7. The fulfillment of criterion (29) is checked.

a. If the criterion is not met, the next item is randomly submitted and the algorithm proceeds to step 4.

b. If the criterion is met, the point grade ϑ corresponding to τnk = k/n is calculated according to formula (17), as well as the score in which it falls. The test is completed and is terminated after n answered items out of all N items in the bank.
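
The seven steps can be collected into a short driver. The sketch below is an illustrative reading of the algorithm under the formulas reconstructed above ((17), (18), (20), (29)), not the author's original implementation; answer_seq is a hypothetical sequence of True/False flags for correct/incorrect answers to the randomly drawn items.

```python
# Illustrative sketch of the adaptive algorithm (steps 1-7), assuming the
# reconstructed formulas (17), (18), (20), and (29).
from math import sqrt, floor
from scipy.stats import t

def theta(tau: float, p: float, q: float, theta_min: float = 2.0, r: float = 4.0) -> float:
    """Grade formula (17); theta_min is returned for tau below p."""
    return theta_min if tau <= p else theta_min + r * ((tau - p) / q) ** 2

def adaptive_test(answer_seq, N: int = 20, answers: int = 4, alpha: float = 0.05):
    """Returns (items asked, grade) for a sequence of True/False answers."""
    p = 1.0 / answers
    q = 1.0 - p
    # Steps 1-2: maximum number of correct answers still graded "poor"
    # (formula (18) at the upper limit 2.99) and the "terminal" difference.
    k_poor = floor(N * (p + q * sqrt((2.99 - 2.0) / 4.0)))  # 12 in the example
    terminal = N - k_poor                                   # 8 in the example
    k = 0
    for n, correct in enumerate(answer_seq, start=1):
        k += int(correct)            # step 4: current n, k, tau = k/n
        if n < terminal:
            continue                 # step 3: ask at least 'terminal' items
        if n - k >= terminal:        # step 5: score "poor" is already certain
            return n, theta(k / N, p, q)
        if n == N:                   # bank exhausted: tau = k/N is exact
            return n, theta(k / N, p, q)
        tau = k / n                  # steps 6-7: criterion (29) via (20)
        d_max = t.ppf(1 - alpha / 2, n - 1) * sqrt(tau * (1 - tau) / n * (1 - n / N))
        if abs(tau - k / N) < d_max:
            return n, theta(tau, p, q)
    raise ValueError("answer sequence ended before the test could terminate")
```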

In the example of a test with a bank of 20 dichotomous items with 4 answers each, it can be calculated from the above formulas that the test is terminated (two of these cases are replayed in the usage sketch after the list):

1. with a score "poor" due to the number of incorrect answers reaching the terminal difference:

1.a. after the 8th item with 0 correct answers,

1.b. after the 9th item with 1 correct answer,

1.c. after the 10th item with 2 correct answers,

1.d. after the 11th item with 3 correct answers,

1.e. after the 12th item with 4 correct answers,

1.f. after the 13th item with less than 7 correct answers,

1.g. after the 14th item with less than 8 correct answers,

1.h. after the 15th item with less than 9 correct answers,

1.i. after the 16th item with less than 10 correct answers,

1.j. after the 17th item with less than 11 correct answers,

1.k. after the 18th item with less than 12 correct answers,

1.l. after the 19th item with less than 12 correct answers,

1.m. after the 20th item with less than 13 correct answers,

2. with a score "poor" due to a fulfilled criterion, but for a grade below 3.00:

2.a. after the 10th item with 5 correct answers,

2.b. after the 11th item with 5 correct answers,

2.c. after the 12th item with 5 or 6 correct answers,

2.d. after the 13th item with 5 or 6 correct answers,

2.e. after the 14th item with 7 correct answers,

2.f. after the 15th item with 8 correct answers,

2.g. after the 16th item with 9 correct answers,

2.h. after the 17th item with 10 correct answers,

3. with a score of "satisfactory" as a result of a fulfilled criterion:

3.a. after the 17th item with 11 correct answers,

3.b. after the 18th item with 12 or 13 correct answers,

3.c. after the 19th item with 12 or 13 correct answers,

3.d. after the 20th item with 13 or 14 correct answers,

4. with a score of "good" as a result of a fulfilled criterion:

4.a. after the 19th item with 14, 15, or 16 correct answers,

4.b. after the 20th item with 15 or 16 correct answers,

5. with a score of "very good" as a result of a fulfilled criterion:

5.a. after the 20th item with 17, 18, or 19 correct answers,

6. with a score of "excellent":

6.a. after the 20th item with 20 correct answers.
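
For illustration, two of the cases above can be replayed with the adaptive_test sketch defined earlier; the answer sequences are hypothetical examinee behaviors, not experimental data.

```python
# Hypothetical answer sequences replaying cases 1.a and 2.a above.
all_wrong = [False] * 20
n_used, g = adaptive_test(all_wrong)
print(n_used, round(g, 2))   # -> 8 2.0 : terminated by the terminal difference

five_right_then_wrong = [True] * 5 + [False] * 15
n_used, g = adaptive_test(five_right_then_wrong)
print(n_used, round(g, 2))   # -> 10 2.44 : criterion (29) met, grade below 3.00
```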

Conclusions

In the present article, a new mathematical model based on the semantic branch of information theory and probability theory is offered to the reader interested in the objective assessment of knowledge. The model offers a definition of the parameter "value (importance)" of the information signal as a measure of the knowledge of the evaluated. The information values form the most informative type of scale – a scale of relations – allowing absolute knowledge assessment, without the need to compare this knowledge with an external standard such as the knowledge of other subjects. The model offers formulas convenient for inclusion in the test algorithm and suitable for assessment with computer tests.

Unlike IRT, which was created with the ambition to be applicable in all areas of life in which tests are applicable, the proposed model is suitable only for information systems with a goal, as the systems for assessing knowledge are. The model is an alternative to IRT in several aspects: 1. a different paradigm; 2. it solves the problem of guessing, for which there is no convincing solution in IRT; 3. it uses the most informative type of scale – a scale of relations with an absolute zero – for the information value, while IRT and CTT assume that grades form a less informative scale, the interval scale, with no absolute zero. IRT offers several models with different numbers of parameters, the values of which are calculated through an optimization procedure from the data of a group solving the test. I.e., the evaluation with the obtained IRT model is relative – it depends on the specifics of the group – while the evaluation with the proposed model is absolute and depends only on the knowledge of the evaluated.

An adaptive algorithm is proposed, which reduces the number of items set in the test whenever possible, depending on the alternation of correct and incorrect answers of the examinee in the testing process. The algorithm saves time in the testing process without changing the grade from the one the examinee would receive if he/she answered all the items in the test bank. The analysis shows that the adaptive algorithm saves time mainly when testing those without knowledge, who receive the score "poor". Exam practice shows that this type of examinee is the most hesitant, and their exams are time-consuming. Therefore, a quick preliminary computer test with an adaptive algorithm in its software, as the first part of the examination process, would weed out the unprepared and would make this process shorter in time without loss of accuracy in the assessment.

References

 