The idea of test validity is primarily concerned with the 'fundamental honesty' of the test—honesty in the sense of doing what one claims to do. It is a fundamental concern for the link between the goal established and the efforts made, the methods utilized, and what these efforts and means accomplish. More specifically, validity relates to how well a tool measures what it is supposed to measure.
According to Goode and Hatt, a measuring instrument (scale, test, etc.) has validity when it genuinely measures what it promises to measure. The topic of validity is complicated and crucial in development research since it is here, more than anywhere else, that the nature of reality is called into question.
It is possible to study reliability without inquiring into the nature and meaning of one's variables, but the same is not true of validity. Validity is not an issue when measuring certain physical traits and relatively simple attributes of people. A pre-school child's anthropometric measures, such as head and chest circumference, can be taken with a measuring tape marked in centimeters or inches.
Suppose, however, that a child development extension professional wishes to study the relationship between malnutrition and the intellectual development of pre-school children. There are no rulers to measure the degree of malnutrition, nor are there any scales or clear-cut physical attributes to measure intellectual development. In such instances, it is necessary to devise indirect methods of measuring these properties. These methods are frequently so indirect that the validity of the measurement and of its results is called into question.
Every measuring instrument, to be useful, must have some indication of validity. There are four approaches to the validation of measuring instruments −
Logical Validity / Face Validity
Jury Opinion
Known-Group
Independent Criteria
Logical validation is one of the most commonly used approaches. It refers to theoretical or common-sense analysis which concludes simply that, the items being what they are, the nature of the continuum cannot be other than what it is stated to be. Logical validation, also known as face validity, is nearly always employed since it arises naturally from the careful definition of the continuum and the selection of items.
A measure with logical/face validity focuses directly on the type of behavior in which the tester is interested. For example, the ability to solve mathematical problems is tested by success in solving a sample of such problems, and reading speed is measured by how much of a passage a person reads with understanding in a given time. There is a limitation, however: relying only on logical and common-sense validation is not prudent. Such claims of validity are at best plausible and seldom definitive. Satisfactory use of a measuring instrument requires more than logical validity.
Jury opinion is an extension of the logical validation approach, except that, in this case, the reasoning is confirmed by a group of specialists in the field in which the measuring instrument is used. For example, if a scale to evaluate mental retardation in pre-school children is developed, a jury comprising psychologists, psychiatrists, pediatricians, clinical psychologists, social workers, and teachers may be formed to establish the scale's validity. However, there is a limitation. Experts are also human, and this method can establish nothing more than logical validity. As a result, jury validation is only marginally superior to logical validation.
In the known-group approach, the scale is validated by comparing the scores of groups already known to differ on the quality being measured. However, there is a limitation. Other differences between the groups, in addition to their known religious practice, might account for the differences in the scale scores.
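The known-group idea, comparing the scale scores of two groups already known to differ, can be sketched in a few lines. This is a minimal illustration rather than a prescribed procedure: the function name `known_group_gap`, the choice of a Welch t statistic, and the scores are all assumptions made up for this example.

```python
from math import sqrt
from statistics import mean, variance


def known_group_gap(group_a, group_b):
    """Return (mean difference, Welch t statistic) for two groups of scale scores."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Standard error of the difference, without assuming equal variances.
    std_error = sqrt(variance(group_a) / len(group_a)
                     + variance(group_b) / len(group_b))
    return mean_a - mean_b, (mean_a - mean_b) / std_error


# Hypothetical scale scores, invented for illustration only.
high_group = [18, 20, 17, 19, 21]  # group known to show the quality strongly
low_group = [11, 9, 12, 10, 8]     # group known to show it weakly

diff, t_stat = known_group_gap(high_group, low_group)
print(f"mean difference = {diff}, Welch t = {t_stat:.2f}")
```

A large gap in the expected direction is consistent with validity, but it does not rule out the limitation noted above: the groups may also differ in other ways that drive the scores.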
Validation against independent criteria is ideal in theory, but its practice is typically problematic. A criterion measure should have four characteristics. They are listed in descending order of importance −
Relevance − We consider a criterion to be relevant if standing on the criterion measure corresponds to the scale scores.
Bias-free − This means that the measure should be one on which everyone has an equal chance of scoring well. Biasing factors include differences in the quality of equipment or working conditions for factory workers and differences in the quality of teaching received by students enrolled in different classes.
Reliability − If the criterion score fluctuates from day to day, so that a person who does well one week may perform poorly the next, or a person who receives a good rating from one supervisor receives a poor rating from another, then there is no way to create a test that will predict that score. A measure that is entirely unstable in itself cannot be predicted by anything else.
Availability − Finally, when selecting a criterion measure, we are constantly confronted with practical issues of convenience and availability.
Any choice of a criterion measure must take practical limits into account. However, when the independent criteria are good, this approach becomes a powerful tool and may be the most effective validation procedure.
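As a rough illustration of validating against an independent criterion, the sketch below correlates scale scores with a criterion measure and also checks how stable the criterion is from one week to the next. The helper `pearson_r`, the variable names, and every score are hypothetical and chosen only for this example; the Pearson correlation is one common way to express such relationships, not the only one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (spread_x * spread_y)


# Hypothetical data, invented for illustration only.
scale_scores = [10, 12, 15, 18, 20]   # scores on the scale being validated
criterion_week1 = [3, 4, 5, 7, 8]     # criterion ratings, first week
criterion_week2 = [4, 4, 6, 6, 8]     # same criterion, following week

criterion_stability = pearson_r(criterion_week1, criterion_week2)
validity_coefficient = pearson_r(scale_scores, criterion_week1)
print(f"criterion stability  = {criterion_stability:.2f}")
print(f"validity coefficient = {validity_coefficient:.2f}")
```

If the stability figure is low, a low validity coefficient says little about the scale, since an unstable criterion cannot be predicted by any test.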
A large number of factors influence the validity of an evaluation tool. Gronlund (1981) has suggested the following factors −
Every test is made up of items. A careful examination of the test items will reveal whether the test appears to measure the subject-matter content and mental functions that the instructor wishes to assess. The following factors in the test itself can prevent test items from functioning as intended and reduce validity.
Unclear Directions − If the directions do not make clear to the student how to respond to the items, whether guessing is permitted, and how to record the answers, validity will suffer.
Reading Vocabulary and Sentence Structure That Are Too Difficult − Vocabulary and sentence structure that are too complicated for the students taking the test may interfere with measuring the intended aspects of student performance, reducing validity.
Inappropriate Level of Difficulty of the Test Items − The tool's validity suffers when the test items have an inappropriate level of difficulty. For example, in criterion-referenced tests, failing to match the difficulty specified by the learning outcome reduces validity.
Poorly Constructed Test Items − Test items that unintentionally provide clues to the answer tend to measure students' alertness in detecting clues as well as the aspects of student performance the test is meant to measure, which lowers validity.
Ambiguity − Ambiguous statements in test items cause misinterpretation, conflicting interpretations, and confusion. Ambiguity may occasionally confuse stronger students more than weaker ones, so that the item discriminates in a negative direction. As a result, the test's validity is compromised.
Test Items Unsuitable for the Outcomes being Measured − It is common to try to evaluate complex types of achievement, such as understanding and thinking skills, with test forms that are only adequate for testing factual information. Doing so reduces the validity of the results.
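The negative discrimination mentioned under ambiguity can be made concrete with a simple upper-lower discrimination index, one common item-analysis statistic. The sketch below is an assumption-laden illustration: the function name `discrimination_index`, the group sizes, and the responses are hypothetical, and the source does not prescribe this particular index.

```python
def discrimination_index(upper_correct, lower_correct):
    """Proportion correct in the upper group minus proportion correct in the lower group.

    Each argument is a list of 1 (correct) or 0 (incorrect) for one item,
    for students in the top-scoring and bottom-scoring groups on the whole test.
    """
    if not upper_correct or not lower_correct:
        raise ValueError("both groups must contain at least one response")
    p_upper = sum(upper_correct) / len(upper_correct)
    p_lower = sum(lower_correct) / len(lower_correct)
    return p_upper - p_lower


# Hypothetical responses, invented for illustration only.
clear_item = discrimination_index([1, 1, 1, 0], [0, 1, 0, 0])
ambiguous_item = discrimination_index([0, 1, 0, 0], [1, 1, 0, 1])

print(f"clear item:     D = {clear_item:+.2f}")
print(f"ambiguous item: D = {ambiguous_item:+.2f}")
```

A positive value means stronger students answer the item correctly more often than weaker students; a negative value is the warning sign that the item may be confusing the students it should favor.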
In achievement testing, the functioning content of test items cannot be determined merely by examining the form and content of the test. What the students have already been taught about solving a problem must be considered before that problem is included in the test. Tests of complex learning outcomes are valid only if the test items function as intended. If the students have prior experience with the solution to a problem contained in the test, such a test is no longer a valid instrument for measuring the more complex mental processes, and its validity suffers as a result.
The test administration and scoring procedure may also affect the validity of the interpretation of the results. For example, in teacher-made tests, factors such as insufficient time to finish the test, unfair assistance to specific students, cheating during the examination, and unreliable scoring of essay answers may reduce validity. Similarly, in standardized tests, failure to follow the standard directions and time limits, unauthorized help to students, and errors in scoring would diminish validity. Whether it is a teacher-made test or a standardized test, adverse physical and psychological conditions at the time of testing may affect validity.
Certain personal characteristics affect students' responses to the test situation and make the interpretation of the test inaccurate. Students who are emotionally upset, lack motivation, or are afraid of the test situation may not respond as they normally would, which may impair validity. Response set also influences test results. A response set is a consistent tendency to follow a certain pattern in responding to test items, so a pupil's score is affected by his or her test-taking habits.
It has previously been stated that validity is always specific to a particular group. Age, gender, ability level, educational background, and cultural background are all factors that affect test results. As a result, the nature of the validation group should be stated in the test manual.
Another crucial consideration when calculating the validity coefficient is the nature of the criterion used. For example, scores on a scientific aptitude test are likely to predict achievement in an environmental studies course relatively accurately, because the two draw on similar abilities. Other things being equal, the greater the similarity between the performance measured by the test and the performance represented in the criterion, the larger the validity coefficient.
The degree to which a test measures what it claims to measure is called its validity. A test is valid if the inferences drawn from it are appropriate, meaningful, and useful. Events outside the laboratory, maturation, testing effects, regression effects, selection, and mortality (loss of subjects) are all threats to an experiment's internal validity. Problems arising from generalizing to other subjects, times, or settings are examples of threats to external validity. Experimenter bias can be reduced by keeping the experimenter unaware of the conditions or purpose of the experiment and by standardizing the procedure as far as feasible.