Announcement: Schedule of the Institutional-Level Doctoral Thesis Evaluation Committee Meeting for PhD Candidate Nguyễn Thị Mai Hữu, Cohort QH2014

Dec 31, 2020 in Announcements

The University of Languages and International Studies, Vietnam National University, Hanoi respectfully announces the schedule of the institutional-level Doctoral Thesis Evaluation Committee meeting for PhD candidate Nguyễn Thị Mai Hữu, cohort QH2014, majoring in Theory and Methodology of English Language Teaching, as follows:

Thesis title: An investigation into the cognitive validity of the speaking section of the Vietnamese Standardized Test of English Proficiency (VSTEP.3-5) (Vietnamese title: Nghiên cứu giá trị xác thực đối với quá trình tư duy của thí sinh thi phần thi nói bài thi chuẩn hóa đánh giá năng lực tiếng Anh từ bậc 3 đến bậc 5 theo khung năng lực ngoại ngữ 6 bậc dùng cho Việt Nam (VSTEP.3-5))

Major: Theory and Methodology of English Language Teaching
Code: 9140231.01
Candidate: Nguyễn Thị Mai Hữu
PhD cohort: QH.2014
Supervisor 1: Prof. Dr. Nguyễn Hòa
Supervisor 2: Prof. Dr. Fred Davidson
Time: 14:30, Wednesday, 3 February 2021
Venue: Thesis Defense Room, Room 101, Building A3, University of Languages and International Studies, Vietnam National University, Hanoi

Thesis summary in Vietnamese: please see here!

Thesis summary in English: please see here!

INFORMATION ON THE DOCTORAL THESIS

Full name of PhD candidate: Nguyễn Thị Mai Hữu
Sex: Female
Date of birth: 22/10/1978
Place of birth: Hanoi
PhD candidate admission decision No. 2019/QĐ-ĐHNN, dated 31 December 2014, issued by the Rector of the University of Languages and International Studies.
Changes during the training period:

Extension of the thesis defense deadline by 2 years, from 2017 to 2019

Thesis title: An investigation into the cognitive validity of test takers' thinking processes in the speaking section of the standardized test of English proficiency from Level 3 to Level 5 under the 6-level Foreign Language Proficiency Framework for Vietnam (VSTEP.3-5)
Major: English Language Teaching Methodology
Code: 9140231.01
Scientific supervisor: Professor Nguyễn Hòa

Supervisor: Professor Fred Davidson

Summary of the new findings of the thesis:

By applying Weir's socio-cognitive framework for language test development and validation, the study investigated the cognitive processes of VSTEP.3-5 test takers and provided validity evidence for the speaking section. The validity evidence falls into three groups: the cognitive processes built into the test, the calibration of cognitive demands across the different proficiency levels of the test, and the similarities and differences between test takers' cognitive processes when taking the VSTEP.3-5 speaking test and when speaking English outside the test.

First, the cognitive processes that test takers may experience when taking the VSTEP.3-5 speaking test were examined at both the test development and the test administration stages. The cognitive validity evidence for the VSTEP.3-5 speaking test was built on Levelt's (1989, 1999) model of speech production, later adopted by Weir (2005) in the socio-cognitive model for language test development and validation. The model comprises the following phases: conceptualization, grammatical encoding, phonological encoding, phonetic encoding, articulation, and self-monitoring. Although Weir's socio-cognitive model was not the theoretical framework used when the VSTEP.3-5 speaking section was developed, its cognitive processes provide a sound framework for investigating the validity of the test. All the processes described in the model can be found in the VSTEP.3-5 speaking section, reflecting the development team's effort to address the cognitive demands that test takers may face during the test. In addition, applying the model to establish cognitive validity evidence for the speaking section revealed several issues:

(1) Colloquialisms are mentioned only at the C2 level of the CEFR and the CEFR-VN; however, they appear in the descriptors of bands 9 and 10 of the VSTEP.3-5 speaking rating scale.

(2) Pauses and hesitation appear in the descriptors of almost all bands of the VSTEP.3-5 rating scale; however, there is no explanation of how such pauses and hesitation differ from one band to the next.

(3) The cognitive load on the interlocutor and the assessor when interviewing and rating test takers is described in the CEFR and the CEFR-VN, the theoretical frameworks used in designing the VSTEP.3-5 speaking test; however, this load is not mentioned in the test specifications or in the speaking examiner training manual.

Second, the descriptors of the VSTEP.3-5 speaking rating scale are appropriately ordered by increasing difficulty from band 1 to band 10, with the following exceptions:

(1) The descriptors of certain bands of the rating scale need closer attention, namely Vocabulary bands 4, 5, 9, 10 and Fluency bands 6 and 7.

(2) A significant difference was found between the speaking scores and the overall scores of students at the C1 level. The analysis indicates that bands 9 and 10 of all speaking criteria should be investigated further and that oral examiners should be informed of this scoring pattern.

(3) Certain descriptors differ markedly in difficulty from the rest, including Grammar bands 3, 4, 6, 7; Vocabulary bands 4, 5, 9, 10; Discourse Management bands 3, 9, 10; Pronunciation bands 7, 8; and Fluency bands 6, 7.

Overall, adjacent descriptors that correspond to the same proficiency level are of relatively similar difficulty. In addition, a close examination of the descriptors shows that it is hard to distinguish between the performances corresponding to adjacent bands of the same criterion. Another issue is that the descriptors of Vocabulary bands 9 and 10 require the ability to use idiomatic expressions and colloquialisms, which the CEFR and the CEFR-VN do not mention for the corresponding proficiency level. Furthermore, bands 9 and 10 of all criteria appear to be set at a higher level of cognitive demand than all the other bands. Some bands, such as those for Vocabulary, describe abilities that are not part of the test construct, yet these descriptors still appear in the rating scale.

Third, the comparison of test takers' cognitive processes when taking the VSTEP.3-5 speaking test and when speaking English outside the test showed that:

(1) Test takers went through all stages of the cognitive process when taking the VSTEP.3-5 speaking test. All five stages were observed: conceptualization, phonological encoding, grammatical encoding, phonetic encoding and articulation, and self-monitoring.

(2) More than half of the VSTEP.3-5 test takers went through similar speaking processes under test and non-test conditions. Test takers reported experiencing all five stages when speaking English outside the test: conceptualization, phonological encoding, grammatical encoding, phonetic encoding and articulation, and self-monitoring.

(3) Those who said that the cognitive processes were not the same believed that they could speak better than they did in the test. They stated that they had only limited access to information about the VSTEP.3-5 speaking section, including the speaking rating scale and sample tests. Test takers who prepared better for the test tended to score higher than those who did not prepare.

(4) One situation in Part 2 of the test was more difficult for one group of test takers than for the others. This suggests that test forms should be piloted more thoroughly.

Practical applicability:

First, although the cognitive processes that test takers may go through when taking the VSTEP.3-5 have been addressed at both the development and the administration stages of the test, certain issues have been identified:

(1) Colloquialisms are mentioned only at the C2 level of the CEFR, equivalent to Level 6 of the CEFR-VN; however, such descriptors appear in bands 9 and 10 of the VSTEP.3-5 speaking rating scale. To make examiners fully aware of this, information about the C2 descriptors used in the VSTEP.3-5 speaking rating scale for Level 4 should be added to the examiner training materials. Another way to handle these descriptors is to remove them from the rating scale. This would require revising the VSTEP.3-5 test specifications and, in turn, amending Decision No. 729/QĐ-BGDĐT of 11 March 2015 issued by the Ministry of Education and Training of Vietnam.

(2) Pauses and hesitation appear in the descriptors of most bands of the VSTEP.3-5 rating scale; however, no explanation is given of how such pauses and hesitation differ across bands. Further explanation should therefore be added to the speaking examiner training materials so that examiners are not confused by signs of pausing or hesitation when rating test takers' performances.

(3) The cognitive demands on speaking examiners when interviewing and rating are described in the CEFR and the CEFR-VN, the theoretical frameworks used in designing the VSTEP.3-5 speaking test; however, this load is not mentioned in the test specifications or the examiner training materials. The cognitive load that speaking examiners experience when interviewing and rating test takers may affect the fairness of their interviewing and scoring, and therefore the test takers' performances and scores. We strongly recommend that the factors affecting examiners' cognitive load when interviewing and rating VSTEP.3-5 speaking test takers be included in the speaking examiner training materials, so that examiners are aware of these issues when interviewing and rating. One specific suggestion: speaking examiners are currently given no time to complete their scoring between consecutive test takers, and allowing about one extra minute for scoring between two consecutive test takers would probably help them rate more effectively.

(4) Adjacent descriptors corresponding to the same proficiency level are of relatively equal difficulty. In addition, a close examination of the descriptors and their difficulty shows that it is hard to distinguish between the abilities corresponding to adjacent bands of the same criterion in the VSTEP.3-5 rating scale. This may make it difficult for examiners to place test takers in the correct bands. To help examiners place test takers' speaking performances accurately, quantified features of each band should be included in the speaking examiner training materials, or sample audio and/or video performances should be provided for every band of every criterion in the rating scale. Another option is to merge the descriptors of bands that correspond to the same English proficiency level, which would reduce the VSTEP.3-5 rating scale to 5 bands. This would simplify examiners' rating work and would probably improve the reliability of the results.

(5) All descriptors of bands 9 and 10 are significantly more difficult than all descriptors of band 8 of the VSTEP.3-5 rating scale, which may lead to a low proportion of test takers being rated at the C1 level compared with the other proficiency levels assessed by the test. To make the speaking scores of these test takers more reliable, examiners should be given these data during training. One possible explanation for the notably high difficulty of the descriptors at these bands is that examiners tend to be reluctant to award the highest scores. To address this, we recommend that the highest band of the rating scale reflect performance at the CEFR C2 level; examiners would then see clearly what is expected at CEFR C2 and would place test takers in the CEFR C1 bands with greater reliability.

(6) Test takers reported that they had only limited access to information about the VSTEP.3-5 speaking section, including the speaking rating scale and sample tests, and those who prepared better for the test tended to score higher than those who did not. VSTEP.3-5 test administration organizations should give test takers access not only to the test format but also to the rating scale, sample performances, and sample tests, so that test takers clearly understand what they are expected to do at each proficiency level and can prepare better for the test.

(7) One situation in Part 2 of the test appeared to be more difficult for one group of test takers than for the others. This suggests that test forms should be piloted more thoroughly. We strongly recommend that all test forms be developed strictly according to the 12 stages of test development issued by the Ministry of Education and Training of Vietnam with Circular 23/2017/TT-BGDĐT, under which each test should be pretested twice. In addition, to ensure that scores are analyzed properly and with high reliability, especially for the speaking and writing sections, applications such as FACETS and R, which can run the Rating Scale Model and/or the Partial Credit Model, should be used to analyze test scores.

Second, this cognitive validity study of the VSTEP.3-5 speaking test again demonstrates that Weir's socio-cognitive model provides a comprehensive theoretical framework for language test development and validation. In particular, for validating a specific aspect of a language test such as cognitive validity, Weir's socio-cognitive model not only helps develop a systematic validity argument for a particular test but also helps identify problems the test may face at both the development and validation stages. We therefore strongly recommend that Weir's socio-cognitive model be applied when a new language test is developed and validated.

In addition, although Weir's socio-cognitive framework has proved highly applicable to language test development and validation, when it is applied to validating a foreign language test such as VSTEP.3-5, the specific features of learning English as a foreign language need to be studied carefully so that the framework can be applied more effectively. In the case of cognitive validity, the cognitive processes that test takers may go through when taking the test follow Levelt's (1988, 1989) processes of speech production for first-language speakers and Weir's (2005) processes of speech production for second-language speakers, and the two models are almost identical. However, investigating the cognitive validity of the VSTEP.3-5 speaking test revealed the need for an updated cognitive model of speech production for speakers of a foreign language, because some VSTEP.3-5 speaking test takers thought in Vietnamese, their first language. Thinking in the first language while speaking a foreign language affects speaking performance to some extent in the test condition, and possibly in non-test conditions as well, especially for test takers at CEFR B1 and below.

Further research directions:

First, Decision No. 729/QĐ-BGDĐT of 11 March 2015, issued by the Ministry of Education and Training of Vietnam, may need to be amended with respect to the VSTEP.3-5 test specifications. To convince researchers and policy makers to change the Decision, further studies are needed to provide clearer evidence of the need to revise the test specifications and of which parts of the specifications should be revised.

Second, this study was conducted with a small data set (288 test takers). To obtain data of higher validity and reliability, similar studies with larger data sets are encouraged.

Third, Weir's socio-cognitive framework has so far proved highly applicable to cognitive validity studies; however, the existing models of cognitive processing in speech production were developed only for first- and second-language speakers. A model that takes into account the features of foreign language acquisition should be developed to provide a more complete basis for establishing the cognitive validity of a foreign language test, especially when the test is designed to assess test takers at CEFR B1 and lower proficiency levels. Future studies could therefore develop a cognitive model for foreign language speakers, either by adapting Levelt's model to foreign language speakers or by developing a new model, since Levelt's first-language speaking model is so far the only model applicable to language test development and validation.

Thesis-related publications:

1. Nguyen Thi Mai Huu (2019). Stimulated recall – A practical data collection technique for cognitive studies in language testing. Journal of Foreign Languages Studies, 58/2019, 38-50.

2. Nguyen Thi Mai Huu (2020). Application of the socio-cognitive framework in language test development and validation. 2020 International graduate research symposium and 10th East Asia Chinese teaching forum, Volume 1, 727-735.

3. Phuong Tran, Hoa Nguyen, Trang Dang, Minh Nguyen, Lan Nguyen, Tuan Huynh, Ha Do, Huu Nguyen, Fred Davidson (2015). A validation study on the newly-developed Vietnam standardized English proficiency test. 37th Language Testing Research Colloquium, 142.

4. Victoria Clark, Nguyen Thi Mai Huu, Jessica Wu, Jamie Dunlea, Richard West (2016). Test centers and standardized testing: Challenges, issues and benefits. 4th British Council New Directions in English language assessment conference, 19-20.

5. Nathan Carr, Quynh Nguyen, Huu Nguyen, Yen Nguyen, Thao Nguyen (2016). Systematic support for a communicative standardized proficiency test in Vietnam. 4th British Council New Directions in English language assessment conference, 26-27.

6. Huu Nguyen (2018). An Investigation into the Cognitive Validity of the Speaking Section of the Vietnam’s Standardized Test of English Proficiency. 40th Language Testing Research Colloquium, 114.

7. Huu Nguyen (2019). An investigation into the cognitive processes reflected in the VSTEP speaking scoring rubrics. 7th British Council New Directions in English language assessment conference, 22.

8. Barry O’Sullivan, Josep Lo Bianco, Huu Nguyen, Mitsuharu Ota, Yoshinori Watanabe (2019). Language Assessment Policy. 7th British Council New Directions in English language assessment conference, 18.

30 December 2020

PhD candidate

Nguyễn Thị Mai Hữu

INFORMATION ON DOCTORAL THESIS

Full name: Nguyen Thi Mai Huu
Sex: Female
Date of birth: 22/10/1978
Place of birth: Hanoi
Admission decision number: 2019/QĐ-ĐHNN, dated 31/12/2014
Changes during the training period:

2-year extension of the thesis defense deadline

Official thesis title: AN INVESTIGATION INTO THE COGNITIVE VALIDITY OF THE SPEAKING SECTION OF THE VIETNAMESE STANDARDIZED TEST OF ENGLISH PROFICIENCY (VSTEP.3-5)
Major: English language teaching methodology
Code: 9140231.01
Supervisor 1: Professor Nguyen Hoa

Supervisor 2: Professor Fred Davidson

Summary of the new findings of the thesis:

With the application of Weir's socio-cognitive framework for language test development and validation, the cognitive processes that VSTEP.3-5 test takers are expected to experience, and actually experience, under test and non-test conditions were investigated to provide validity evidence for the VSTEP.3-5 speaking section. The validity evidence covers three areas: the cognitive processes that the test is intended to elicit, the calibration of cognitive demands across the different levels of the test, and the similarity between the processes under test and non-test conditions.

Firstly, the cognitive processes that test takers may encounter when taking the VSTEP.3-5 have been addressed at both the development and the administration stages. Cognitive validity evidence was established for the VSTEP.3-5 speaking section based on the model of speech production developed by Levelt (1989, 1999) and applied by Weir (2005) in the socio-cognitive model of test development and validation. The model comprises six major phases of processing: conceptualization, grammatical encoding, phonological encoding, phonetic encoding, articulation, and self-monitoring. Although Weir's socio-cognitive model was not the theoretical framework applied when the VSTEP.3-5 speaking section was developed, the cognitive processes of the model provide a good framework for a cognitive validity study of the test. All the processes described in the model could be found in the VSTEP.3-5 speaking section, showing the effort of the VSTEP.3-5 development team to address the cognitive demands that test takers may encounter when taking the test. In addition, applying the model to establish cognitive validity evidence for the speaking section revealed several issues:

The use of colloquialisms is mentioned only at the C2 level of the CEFR and the CEFR-VN; however, colloquialisms appear in the descriptors of bands 9 and 10 of the VSTEP.3-5 speaking rating scale.
Pauses and hesitation appear in the descriptors of almost all bands of the VSTEP.3-5 rating scale; however, no explanation is given of how such pauses and hesitation differ across the bands.
The cognitive load on the interlocutor and the assessor when interviewing and assessing test takers is described in the CEFR and hence in the CEFR-VN, the theoretical frameworks applied when designing the VSTEP.3-5 speaking test; however, this load is not mentioned in the specifications or the examiner training manuals.

Secondly, the descriptors of the VSTEP.3-5 speaking rating scale are properly ordered by increasing difficulty from band 1 to band 10, except for the following descriptors:

Some bands of the rating scale need closer attention, namely Vocabulary bands 4, 5, 9, 10 and Fluency bands 6 and 7. No significant difference was found among the logits of the other descriptors of the same bands across the different criteria.
A significant difference was found between the speaking scores and the overall scores of C1-level students. Bands 9 and 10 of all speaking criteria should be investigated further, and oral examiners should be informed of this pattern in the speaking scores.
Certain descriptors of the VSTEP.3-5 speaking rating scale stand out, among them Grammar bands 3, 4, 6, 7; Vocabulary bands 4, 5, 9, 10; Discourse Management bands 3, 9, 10; Pronunciation bands 7, 8; and Fluency bands 6, 7. These descriptors have difficulty levels that place them in unusual positions compared with the other descriptors of the same bands.

On the whole, descriptors that are adjacent to each other and correspond to the same proficiency level are of relatively similar difficulty. However, a careful study of the descriptors shows that it is difficult to see the difference between the performances that correspond to adjacent bands of the same criterion. Another issue is that the descriptors of Vocabulary bands 9 and 10 require the ability to use idiomatic expressions and colloquialisms, which the CEFR and the CEFR-VN do not mention for the corresponding level. Furthermore, bands 9 and 10 of all criteria seem to be set at a higher level of cognitive demand than all the other bands. Some bands, such as those for Vocabulary, describe performance that is not part of the test construct, a feature not observed in the other bands.

Thirdly, regarding the cognitive processes under VSTEP.3-5 speaking test and non-test conditions, the results of the survey questionnaires and stimulated recall interviews showed that:

Test takers experienced all stages of cognitive processing when performing under the test condition. All five stages were observed: conceptualization, phonological encoding, grammatical encoding, phonetic encoding and articulation, and self-monitoring.
More than half of the VSTEP.3-5 test takers experienced similar speaking processes under test and non-test conditions. All five stages were observed: conceptualization, phonological encoding, grammatical encoding, phonetic encoding and articulation, and self-monitoring.
Those who said that the processes were not the same thought that they would have performed better than they did on the test day. They claimed that they had only limited access to information about the VSTEP.3-5 speaking section, including the speaking rating scale and sample tests. Test takers who prepared better for the test tended to get higher scores than those who did not.
One situation in Part 2 of the test seemed to be more difficult for one group of test takers than for the others. This suggests that the test forms should be piloted more carefully.

Practical applicability:

First, although the cognitive processes that test takers may encounter when taking the VSTEP.3-5 have been addressed at both the development and the administration stages of the test, certain issues have been identified:

The use of colloquialisms is mentioned only at the C2 level of the CEFR and the CEFR-VN; however, colloquialisms appear in the descriptors of bands 9 and 10 of the VSTEP.3-5 speaking rating scale. So that examiners are well aware of this inclusion, the information should be included in the oral examiners' training manual. Another way to deal with these descriptors is to remove them from the rating scale. To do so, the test specifications of the VSTEP.3-5 would have to be amended, which in turn would require amending Decision No. 729/QD-BGDDT dated March 11, 2015, issued by the Ministry of Education and Training of Vietnam.
Pauses and hesitation appear in the descriptors of almost all bands of the VSTEP.3-5 rating scale; however, no explanation is given of how such pauses and hesitation differ across the bands. Further explanation should therefore be included in the oral examiners' training manual so that examiners are not confused by these signals when grading test takers' performances.
The cognitive load on the interlocutor and the assessor when interviewing and assessing test takers is described in the CEFR and hence in the CEFR-VN, the theoretical frameworks applied when designing the VSTEP.3-5 speaking test; however, this load is not mentioned in the specifications or the oral examiners' training manuals. It is highly recommended that the factors affecting the interlocutor's and assessor's cognitive load when interviewing and assessing VSTEP.3-5 test takers be included in the oral examiners' training manual, so that interlocutors/assessors are aware of these issues when interviewing and rating. One particular issue is that interlocutors/assessors are currently given no time for grading between consecutive test takers; allowing around one minute between two consecutive test takers would let them finish grading the test taker who has just performed.
It seems difficult to see the difference between the performances that correspond to adjacent bands of the same criterion of the scale. This may hinder oral examiners from placing test takers in the correct bands of the rating scale. To help examiners place test takers' performances precisely in the correct bands, quantified features of the bands should be included in the oral examiners' training manual, or sample performances in audio and/or video form should be developed for all bands of the rating scale. Another option is to merge the descriptors of bands that correspond to the same level of English proficiency, which would reduce the VSTEP.3-5 rating scale to 5 bands. This would simplify the examiners' grading work and would probably improve the reliability of the ratings.
All descriptors of bands 9 and 10 are significantly more difficult than all descriptors of band 8 of the VSTEP.3-5 rating scale, which probably led to the low percentage of test takers graded at the C1 level compared with the other proficiency levels gauged by the test. To make the speaking scores of these test takers more reliable, examiners should be given these data during training.
Test takers claimed that they had only limited access to information about the VSTEP.3-5 speaking section, including the speaking rating scale and sample tests, and test takers who prepared better for the test tended to get higher scores than those who did not. It is recommended that VSTEP.3-5 administration organizations give test takers access not only to the test format but also to the rating scale and sample test forms, so that prospective test takers understand what they are expected to do at different proficiency levels and can prepare better for the test.
One situation in Part 2 of the VSTEP.3-5 test seemed to be more difficult for one group of test takers than for the others. This suggests that test forms should be piloted more carefully. It is highly recommended that all test forms be developed strictly following the 12 stages of test development issued by the Ministry of Education and Training of Vietnam (Circular 23/2017/TT-BGDĐT), under which each test task should be pretested twice. In addition, so that pretest scores are analyzed properly, especially for the speaking and writing sections, applications such as FACETS and R, which can run the Rating Scale Model and/or the Partial Credit Model, should be used for test score analysis.
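For readers unfamiliar with the Rating Scale Model mentioned above, a short sketch may help show what the "logits" and descriptor difficulties refer to. Under that model, the log-odds of a test taker being rated in band k rather than band k-1 equal the test taker's ability minus the criterion's difficulty minus the band threshold, all in logits; many-facet analyses of the kind FACETS performs can also subtract a rater-severity term. The Python below is a minimal illustration under those assumptions, not the thesis's analysis code; the function names and all numeric values are hypothetical.

```python
import math


def rating_scale_probs(theta, delta, taus, rater_severity=0.0):
    """Probability of each score category under a Rasch rating scale model.

    theta: test taker ability (logits)
    delta: difficulty of the criterion being rated (logits)
    taus: band thresholds tau_1..tau_m shared by all criteria (logits)
    rater_severity: optional rater term, as in a many-facet model (logits)

    Returns m + 1 probabilities for categories 0..m.
    """
    # Category 0 has an exponent of 0; category k adds one step term
    # (theta - delta - rater_severity - tau_k) for each threshold passed.
    exponents = [0.0]
    for tau in taus:
        exponents.append(exponents[-1] + (theta - delta - rater_severity - tau))
    # Subtracting the largest exponent avoids overflow without changing ratios.
    largest = max(exponents)
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected category (0..m) given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical thresholds for illustration only; not estimates from the thesis.
    thresholds = [-1.5, -0.5, 0.5, 1.5]
    for ability in (-1.0, 0.0, 1.0):
        probs = rating_scale_probs(ability, delta=0.0, taus=thresholds)
        print(ability, [round(p, 3) for p in probs], round(expected_score(probs), 2))
```

Raising the ability shifts probability toward higher bands, while a more severe rater shifts it toward lower bands. Under the Partial Credit Model the thresholds would differ from one criterion to another instead of being shared; in practice these parameters are estimated from real ratings by FACETS or an R package rather than supplied by hand.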

Second, the VSTEP.3-5 cognitive validity study shows that Weir's socio-cognitive model provides a comprehensive framework for language test development and validation. Specifically, for validating a certain aspect of a language test such as cognitive validity, Weir's socio-cognitive model is highly applicable, as it not only helps develop a systematic validity argument for a particular language test but also helps identify problems that the test faces at both the development and the administration stages. Thus, it is highly recommended that Weir's socio-cognitive model be applied when a new language test is developed and validated.

Third, Weir's socio-cognitive framework proves highly applicable to language test development and validation, yet when applying the framework to validate a foreign language test like VSTEP.3-5, the specific features of learning English as a foreign language should be studied carefully so that the framework can be applied better. In the case of cognitive validity, the cognitive processes that test takers may go through when taking the test follow Levelt's cognitive processes of spoken production for L1 speakers (1988, 1989) and Weir's adapted cognitive processes of spoken production for L2 speakers (2005), which are almost the same. Nevertheless, investigating the cognitive validity of the VSTEP.3-5 speaking section revealed the need for an updated cognitive model of oral production for foreign language speakers, because a number of the VSTEP.3-5 test takers studied thought in their L1, Vietnamese. This affected their speaking performance to a certain extent in the test condition and probably in non-test conditions as well, especially for test takers at CEFR B1 and lower proficiency levels.

Further research directions:

First, the researcher recommends that Decision No. 729/QD-BGDDT dated March 11, 2015, issued by the Ministry of Education and Training of Vietnam, be amended with regard to the specifications of the VSTEP.3-5 test. To convince researchers and policy makers to make such a change to the Decision, further studies should be conducted to provide stronger evidence of the need to amend the test specifications and of which parts of the specifications should be amended.

Second, this study was conducted with a small sample (288 test takers). To obtain data of higher validity and reliability, similar studies with larger samples are highly recommended.

Third, at present, Weir's socio-cognitive framework proves highly applicable to cognitive validity studies; however, the model of cognitive processing in oral production has been developed only for L1 and L2 speakers. A model that takes into account the features of foreign language acquisition should be developed to provide a more complete basis for establishing the cognitive validity of a foreign language test, especially when the test is designed to assess test takers at CEFR B1 and lower levels of proficiency. Thus, future studies could develop a speaking cognitive processing model for foreign language speakers, either by adapting Levelt's model for L1 speakers or by developing a new model, since Levelt's model of L1 speaking cognitive processing is so far the only model applicable in language test development and validation.

Thesis-related publications:

1. Nguyen Thi Mai Huu (2019). Stimulated recall – A practical data collection technique for cognitive studies in language testing. Journal of Foreign Languages Studies, 58/2019, 38-50.

2. Nguyen Thi Mai Huu (2020). Application of the socio-cognitive framework in language test development and validation. 2020 International graduate research symposium and 10th East Asia Chinese teaching forum, Volume 1, 727-735.

3. Phuong Tran, Hoa Nguyen, Trang Dang, Minh Nguyen, Lan Nguyen, Tuan Huynh, Ha Do, Huu Nguyen, Fred Davidson (2015). A validation study on the newly-developed Vietnam standardized English proficiency test. 37th Language Testing Research Colloquium, 142.

4. Victoria Clark, Nguyen Thi Mai Huu, Jessica Wu, Jamie Dunlea, Richard West (2016). Test centers and standardized testing: Challenges, issues and benefits. 4th British Council New Directions in English language assessment conference, 19-20.

5. Nathan Carr, Quynh Nguyen, Huu Nguyen, Yen Nguyen, Thao Nguyen (2016). Systematic support for a communicative standardized proficiency test in Vietnam. 4th British Council New Directions in English language assessment conference, 26-27.

6. Huu Nguyen (2018). An Investigation into the Cognitive Validity of the Speaking Section of the Vietnam’s Standardized Test of English Proficiency. 40th Language Testing Research Colloquium, 114.

7. Huu Nguyen (2019). An investigation into the cognitive processes reflected in the VSTEP speaking scoring rubrics. 7th British Council New Directions in English language assessment conference, 22.

8. Barry O’Sullivan, Josep Lo Bianco, Huu Nguyen, Mitsuharu Ota, Yoshinori Watanabe (2019). Language Assessment Policy. 7th British Council New Directions in English language assessment conference, 18.