View Issue Details

ID: 16273
Project: Bug reports
Category: Conditions
View Status: public
Last Update: 2020-08-07 16:50
Reporter: aquigar
Assigned To:
Priority: normal
Severity: minor
Status: confirmed
Resolution: open
Product Version: 3.22.15
Summary: 16273: Validation regex including unicode characters fails
Description

I'm testing this simple regex to validate the content of a user response:

/^[A-Z0-9\s]+$/

It matches uppercase letters, digits and whitespace. It seems to be correct and works fine.
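The baseline pattern can be sanity-checked directly in JavaScript (a browser console or Node.js). This is a minimal sketch that exercises only the regex itself, not LimeSurvey's validation pipeline. The sample strings are made up for illustration.

```javascript
// Baseline pattern from the report: uppercase A-Z, digits 0-9 and whitespace only.
const baseline = /^[A-Z0-9\s]+$/;

// "abc" is lowercase, "ÁRBOL" starts with a non-ASCII letter, "A-1" contains a hyphen.
const samples = ["ABC 123", "abc", "ÁRBOL", "A-1"];
const results = samples.map((s) => baseline.test(s));

console.log(results); // [ true, false, false, false ]
```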

But suppose I would like to also allow the Unicode character "á".

I've tested the following regexes without success:

/^[áA-Z0-9\s]+$/

/^[\x00E1A-Z0-9\s]+$/

/^[\x{00E1}A-Z0-9\s]+$/

/^[\u00E1A-Z0-9\s]+$/

0x00E1 is the hexadecimal code point of "á" (U+00E1).

The validation fails in all of the cases above.
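One plausible reason the escape forms behave differently is how a JavaScript regex engine, which handles the client-side check according to the later discussion, parses them. The sketch below runs each pattern from the report against a raw, unencoded "á" in plain JavaScript. It does not reproduce LimeSurvey's handling of the response, which may deliver the value HTML-encoded.

```javascript
// The character under test: U+00E1 LATIN SMALL LETTER A WITH ACUTE.
const aAcute = "\u00E1";
console.log(aAcute.codePointAt(0).toString(16)); // e1

// The four patterns from the report, as JavaScript regex literals (no flags).
const patterns = {
  literal: /^[áA-Z0-9\s]+$/,       // literal "á" inside the class
  hex2: /^[\x00E1A-Z0-9\s]+$/,     // \x takes exactly two hex digits: \x00 (NUL), then "E" and "1"
  braces: /^[\x{00E1}A-Z0-9\s]+$/, // no \x{...} in JS without the u flag: "x", "{", "0", "E", "1", "}" are literals
  unicode: /^[\u00E1A-Z0-9\s]+$/,  // \u00E1 is a valid JS escape for "á"
};

const results = Object.fromEntries(
  Object.entries(patterns).map(([name, re]) => [name, re.test(aAcute)])
);
console.log(results); // { literal: true, hex2: false, braces: false, unicode: true }
```

In plain JavaScript, only the literal form and `\u00E1` denote "á" here. `\x{00E1}` is PCRE (PHP) syntax, not JavaScript syntax.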

Steps To Reproduce

Using a test survey, apply one of these validation regexes to any response field and test it.

Tags: No tags attached.
Complete LimeSurvey version number (& build): 3.22.15+200505
I will donate to the project if issue is resolved: No
Browser:
Database & DB-Version: mysql Ver 15.1 Distrib 10.1.24-MariaDB, for Linux (x86_64) using readline 5.1
Server OS (if known): Red Hat Enterprise Linux Server release 7.3 (Maipo)
Webserver software & version (if known):
PHP Version: 7.1.5

Relationships

related to 16531 (confirmed): Validation regex including unicode characters fails

Activities

cdorin (manager), 2020-05-14 14:30, note ~57786

Forum info: https://forums.limesurvey.org/forum/design-issues/121205-validation-regex-including-unicode-characters

gabrieljenik (developer), 2020-07-21 21:17, note ~59028

OK, so what if we use decodeHtml or htmlspecialchars_decode to HTML-decode the "within" parameter in the LEMRegexMatch function?
https://github.com/LimeSurvey/LimeSurvey/blob/7092a6f600d2019811b0a722d09475101c37cd85/assets/packages/expressions/em_javascript.js#L847-L858
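The proposal can be illustrated with a small sketch. It assumes the response reaches the client-side check HTML-encoded (for example "&aacute;" or "&#225;" instead of "á"). `decodeHtmlEntities` and `regexMatchDecoded` are hypothetical names written for this example, not LimeSurvey functions, and the decoder only handles numeric entities plus a few named ones. In a browser, `DOMParser` or a `textarea` element is the usual way to decode arbitrary entities.

```javascript
// Hypothetical minimal decoder: numeric entities plus a few named ones.
const NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
  aacute: "\u00E1", Aacute: "\u00C1",
};

function decodeHtmlEntities(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Unknown named entities are left as-is.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Hypothetical wrapper: decode the subject ("within") first, then run the regex.
function regexMatchDecoded(regex, within) {
  return regex.test(decodeHtmlEntities(within));
}

const pattern = /^[áA-Z0-9\s]+$/;
const encoded = "&aacute;BC 123"; // how "áBC 123" might look if HTML-encoded

console.log(pattern.test(encoded));                    // false: "&", lowercase letters and ";" are not in the class
console.log(regexMatchDecoded(pattern, encoded));      // true
console.log(decodeHtmlEntities("&#225;&#xE1;&amp;")); // áá&
```

Decoding only the subject leaves the pattern untouched. The trade-off is that a response that literally contains the text "&aacute;" would also be decoded before matching.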

Comments?

gabrieljenik (developer), 2020-07-22 17:25, note ~59049

@DenisChenu What do you think?

DenisChenu (developer), 2020-07-22 17:38, note ~59050

Are you sure it works?

My opinion is:
We use preg_match in PHP: https://github.com/LimeSurvey/LimeSurvey/blob/7092a6f600d2019811b0a722d09475101c37cd85/application/helpers/expressions/em_core_helper.php#L3050-L3058
Why do we use different options in JS?

Why not use https://locutus.io/php/pcre/preg_match/ directly?
And for Unicode: https://stackoverflow.com/a/12897222/2239406
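On the Unicode point, one option is to translate PCRE escapes into their JavaScript equivalents before building the RegExp. The sketch below is a hypothetical helper (`pcreToJsRegExp` is not part of LimeSurvey or locutus). It strips the `/.../` delimiters, rewrites `\x{HHHH}` to `\u{HHHH}`, and compiles with the JavaScript `u` flag, which is what makes `\u{...}` valid. It deliberately ignores PCRE modifiers and many other syntax differences, and the rewrite is naive: an escaped backslash followed by `x{...}` would be converted too.

```javascript
// Hypothetical helper: convert a "/pattern/flags" PCRE string into a JS RegExp.
// Only delimiters and \x{HHHH} escapes are handled; PCRE modifiers are ignored on purpose.
function pcreToJsRegExp(pcre) {
  const m = /^\/([\s\S]*)\/([a-zA-Z]*)$/.exec(pcre);
  if (!m) throw new Error("expected /pattern/flags");
  // \x{00E1} (PCRE) becomes \u{00E1} (JS, requires the u flag).
  const body = m[1].replace(/\\x\{([0-9a-fA-F]+)\}/g, "\\u{$1}");
  return new RegExp(body, "u");
}

// Backslashes are doubled because this is a JS string literal.
const re = pcreToJsRegExp("/^[\\x{00E1}A-Z0-9\\s]+$/");
console.log(re.source);              // ^[\u{00E1}A-Z0-9\s]+$
console.log(re.test("\u00E1BC 123")); // true
console.log(re.test("abc"));          // false
```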

Maybe start by removing the gimy modifiers …
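To show why the modifiers matter, this sketch assumes the client-side code may build its RegExp with some of the g, i, m, y flags. The exact handling in LEMRegexMatch is not reproduced here. Each flag changes the result compared with the plain pattern, which is what preg_match sees when called without modifiers. The g and y flags also make `RegExp.prototype.test` stateful through `lastIndex`, which a single preg_match call never is.

```javascript
const pattern = /^[A-Z0-9\s]+$/;
const subject = "ABC 123";

// g: test() stores lastIndex, so reusing the same RegExp object alternates results.
const globalRe = new RegExp(pattern.source, "g");
const gFirst = globalRe.test(subject);  // true, lastIndex is now 7
const gSecond = globalRe.test(subject); // false: the search starts at 7, where ^ cannot match

// y (sticky) matches only at lastIndex, so it shows the same alternation here.
const stickyRe = new RegExp(pattern.source, "y");
const yFirst = stickyRe.test(subject);  // true
const ySecond = stickyRe.test(subject); // false

// i: case-insensitive, so lowercase input passes although the class says A-Z.
const iResult = new RegExp(pattern.source, "i").test("abc"); // true

// m: ^ and $ also match at line boundaries, so one valid line is enough.
const mResult = new RegExp(pattern.source, "m").test("abc\nABC"); // true
const plainResult = pattern.test("abc\nABC");                      // false

console.log({ gFirst, gSecond, yFirst, ySecond, iResult, mResult, plainResult });
```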

DenisChenu (developer), 2020-07-22 17:39, note ~59051

PS: I'm still waiting for a stable 4.X before creating some of our own functions (for example, a real word counter …)

gabrieljenik (developer), 2020-07-28 02:28, note ~59113

Decoding the HTML before running the regex. The decoded string is similar to what the regex function on the PHP side receives.

PR: https://github.com/LimeSurvey/LimeSurvey/pull/1499

sushmanadendla (manager), 2020-08-06 14:36, note ~59342

I tested the issue before pulling the PR: the issue exists. I then tested it after pulling the PR. My findings are below.
The scenario fails in the following cases:
/^[\x00E1A-Z0-9\s]+$/

/^[\x{00E1}A-Z0-9\s]+$/

/^[\u00E1A-Z0-9\s]+$/

Screenshot 1: scenarios with no Unicode character and with the Unicode character "á".
Screenshot 2: scenarios where the validation works with "á" but fails for the patterns above.

Please refer to the attachments for more details.

16273_BeforePR.png (202,377 bytes)
16273_AfterPR.png (159,161 bytes)

sushmanadendla (manager), 2020-08-07 16:50, note ~59374

Actually, the patterns mentioned above were wrong. I tried the following instead:
^[\u00E1A-Z0-9\s]+$

^[\xE1A-Z0-9\s]+$

These work as expected.
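The two working patterns can also be checked in JavaScript. This sketch assumes the field content is compiled without the surrounding slashes through the `RegExp` constructor. That is an assumption about how the pattern reaches the browser, not something confirmed in this issue. Backslashes must be doubled inside a JavaScript string literal.

```javascript
// Patterns as typed in the validation field (no /.../ delimiters).
const typedPatterns = ["^[\\u00E1A-Z0-9\\s]+$", "^[\\xE1A-Z0-9\\s]+$"];

// \u00E1 (four hex digits) and \xE1 (exactly two) both denote U+00E1 in JavaScript.
const results = typedPatterns.map((source) => {
  const re = new RegExp(source);
  return [re.test("\u00E1BC 123"), re.test("abc")];
});

console.log(results); // [ [ true, false ], [ true, false ] ]
```

Whether the delimited `/.../` form is accepted depends on how the field is processed on the PHP and JS sides, which this sketch does not model.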

Issue History

Date Modified Username Field Change
2020-05-14 13:19 aquigar New Issue
2020-05-14 14:29 cdorin Priority none => normal
2020-05-14 14:29 cdorin Status new => confirmed
2020-05-14 14:30 cdorin Note Added: 57786
2020-07-21 21:17 gabrieljenik Note Added: 59028
2020-07-22 17:25 gabrieljenik Note Added: 59049
2020-07-22 17:38 DenisChenu Note Added: 59050
2020-07-22 17:39 DenisChenu Note Added: 59051
2020-07-28 02:20 gabrieljenik Issue cloned: 16531
2020-07-28 02:20 gabrieljenik Relationship added related to 16531
2020-07-28 02:28 gabrieljenik Note Added: 59113
2020-08-06 14:36 sushmanadendla Note Added: 59342
2020-08-06 14:36 sushmanadendla File Added: 16273_BeforePR.png
2020-08-06 14:36 sushmanadendla File Added: 16273_AfterPR.png
2020-08-07 16:50 sushmanadendla Note Added: 59374