View Issue Details

This bug affects 1 person(s).
ID: 16273
Project: Bug reports
Category: Conditions
View Status: public
Last Update: 2020-08-12 11:11
Reporter: aquigar
Assigned To: gabrieljenik
Priority: normal
Severity: minor
Status: closed
Resolution: fixed
Product Version: 3.22.15
Summary: 16273: Validation regex including unicode characters fails
Description

I'm testing this simple regex to validate the content of a user response:

/^[A-Z0-9\s]+$/

It matches capital letters, digits and whitespace. It seems to be correct and works fine.

But suppose I would like to include the Unicode character "á".

I've tested the following regexes without success:

/^[áA-Z0-9\s]+$/

/^[\x00E1A-Z0-9\s]+$/

/^[\x{00E1}A-Z0-9\s]+$/

/^[\u00E1A-Z0-9\s]+$/

0x00E1 is the hexadecimal code point of "á".

The validation test fails in all of the cases above.

Steps To Reproduce

Using a test survey, apply this validation regex to any response field and test...

Tags: No tags attached.
Bug heat: 12
Complete LimeSurvey version number (& build): 3.22.15+200505
I will donate to the project if issue is resolved: No
Browser:
Database type & version: mysql Ver 15.1 Distrib 10.1.24-MariaDB, for Linux (x86_64) using readline 5.1
Server OS (if known): Red Hat Enterprise Linux Server release 7.3 (Maipo)
Webserver software & version (if known):
PHP Version: 7.1.5

Relationships

related to 16531 (closed, gabrieljenik): Validation regex including unicode characters fails

Users monitoring this issue

vkuzmin

Activities

cdorin

2020-05-14 14:30

reporter   ~57786

Forum info: https://forums.limesurvey.org/forum/design-issues/121205-validation-regex-including-unicode-characters

gabrieljenik

2020-07-21 21:17

manager   ~59028

Last edited: 2020-08-10 16:36

OK, so what if we use decodeHtml or htmlspecialchars_decode to HTML-decode the "within" parameter in the LEMRegexMatch function?
https://github.com/LimeSurvey/LimeSurvey/blob/7092a6f600d2019811b0a722d09475101c37cd85/assets/packages/expressions/em_javascript.js#L847-L858

Comments?

gabrieljenik

2020-07-22 17:25

manager   ~59049

Last edited: 2020-08-10 16:36

@DenisChenu What do you think?

DenisChenu

2020-07-22 17:38

developer   ~59050

Last edited: 2020-08-10 16:36

Are you sure it works?

My opinion is:
We use preg_match with PHP: https://github.com/LimeSurvey/LimeSurvey/blob/7092a6f600d2019811b0a722d09475101c37cd85/application/helpers/expressions/em_core_helper.php#L3050-L3058
Why do we use a different option in JS?

Why not use https://locutus.io/php/pcre/preg_match/ directly …
And for Unicode: https://stackoverflow.com/a/12897222/2239406

Maybe start by removing the gimy modifiers …
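
To illustrate why extra modifiers can make the JavaScript check diverge from a plain `preg_match` call, here is a minimal sketch in plain JavaScript. It is not LimeSurvey code: the `compare` helper and the sample inputs are made up for illustration, and how the PHP side treats a given survey pattern depends on the modifiers the survey author wrote, which is assumed here to be none.

```javascript
// Same pattern source, compiled with and without extra modifiers.
const source = "^[A-Z0-9]+$";
const strict = new RegExp(source);      // no modifiers
const withI = new RegExp(source, "i");  // case-insensitive
const withM = new RegExp(source, "m");  // ^ and $ match at line breaks

// Hypothetical helper: run one value through all three variants.
function compare(value) {
  return {
    strict: strict.test(value),
    i: withI.test(value),
    m: withM.test(value),
  };
}

console.log(compare("abc"));       // lowercase input
console.log(compare("bad!\nOK"));  // only the second line is valid
```

With the `i` modifier the lowercase input is accepted, and with the `m` modifier a multi-line value is accepted as soon as one of its lines matches, so a pattern that rejects a value without modifiers may accept it with them.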

DenisChenu

2020-07-22 17:39

developer   ~59051

Last edited: 2020-08-10 16:36

PS: I'm still waiting for a stable 4.X before creating some functions of our own (for example, a real word counter …)

gabrieljenik

2020-07-28 02:28

manager   ~59113

Last edited: 2020-08-10 16:36

Decoding HTML before running the regex. This (a decoded string) is similar to what the PHP-side regex function gets.

PR: https://github.com/LimeSurvey/LimeSurvey/pull/1499
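
The idea of decoding before matching can be sketched as follows. This is only an illustration, not the code from the PR: `decodeHtmlEntities`, `regexMatchDecoded` and the small `NAMED_ENTITIES` table are hypothetical names, and the sketch assumes the pattern arrives as a string with `/` delimiters and that the value arrives HTML-encoded.

```javascript
// Hypothetical, deliberately tiny entity table; a real decoder would cover far more names.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", aacute: "\u00E1" };

// Decode numeric (&#225; / &#xE1;) and a few named (&aacute;) entities.
function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (whole, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : whole;
  });
}

// Hypothetical stand-in for a regex check that decodes "within" first.
function regexMatchDecoded(pattern, within) {
  const lastSlash = pattern.lastIndexOf("/");
  const source = pattern.slice(1, lastSlash);   // text between the delimiters
  const flags = pattern.slice(lastSlash + 1);   // modifiers after the last "/"
  return new RegExp(source, flags).test(decodeHtmlEntities(String(within)));
}

console.log(regexMatchDecoded("/^[\u00E1A-Z0-9\\s]+$/", "CAF&aacute; 1"));
```

Without the decoding step the same value contains `&`, `;` and lowercase letters, none of which are in the character class, so the match would fail even though the user typed only allowed characters.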

user225042

2020-08-06 14:36

  ~59342

Last edited: 2020-08-10 16:36

I tested the issue before pulling the PR: the issue exists. I tested it again after pulling the PR; my findings are below.
The scenario fails in the following cases:
/^[\x00E1A-Z0-9\s]+$/

/^[\x{00E1}A-Z0-9\s]+$/

/^[\u00E1A-Z0-9\s]+$/

Screenshot 1: includes scenarios with no Unicode character and with the Unicode character "á"
Screenshot 2: includes scenarios where it works with the Unicode character "á" but fails for the patterns above

Please refer to the attachments for more details.

16273_BeforePR.png (202,377 bytes)
16273_AfterPR.png (159,161 bytes)

user225042

2020-08-07 16:50

  ~59374

Last edited: 2020-08-10 16:36

Actually, the codes mentioned above were wrong. I tried the following instead:
^[\u00E1A-Z0-9\s]+$

^[\xE1A-Z0-9\s]+$

These work as expected.
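
For reference, the sketch below shows how a JavaScript engine reads each escape spelling mentioned in this issue. It is plain JavaScript rather than LimeSurvey code; the `candidates` table, the `testAll` helper and the sample value are made up for illustration, and the patterns are compiled without the `u` flag.

```javascript
// Sample answer: capital letters, the character "á" (U+00E1), a space and a digit.
const sample = "AB\u00E1 1";

// Escape spellings from this issue, written as RegExp source strings.
const candidates = {
  literal: "^[\u00E1A-Z0-9\\s]+$",              // the character itself
  backslashU: "^[\\u00E1A-Z0-9\\s]+$",          // \u00E1: four hex digits
  backslashX2: "^[\\xE1A-Z0-9\\s]+$",           // \xE1: exactly two hex digits
  backslashX4: "^[\\x00E1A-Z0-9\\s]+$",         // read as \x00, then "E" and "1"
  backslashXBraces: "^[\\x{00E1}A-Z0-9\\s]+$",  // PCRE syntax; read as literal "x", "{", ...
};

// Hypothetical helper: test one value against every candidate pattern.
function testAll(value) {
  const results = {};
  for (const [name, source] of Object.entries(candidates)) {
    results[name] = new RegExp(source).test(value);
  }
  return results;
}

console.log(testAll(sample));
```

The literal, `\u00E1` and `\xE1` forms accept the sample, while `\x00E1` and `\x{00E1}` do not, because JavaScript's `\x` escape takes exactly two hex digits and the braced form belongs to PCRE.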

gabrieljenik

2020-08-10 16:11

manager   ~59394

Last edited: 2020-08-10 16:36

Fix committed to 3.x-LTS branch: http://bugs.limesurvey.org/plugin.php?page=Source/view&id=30366

lime_release_bot

2020-08-12 11:11

administrator   ~59437

Fixed in Release 3.23.0+200813

Related Changesets

LimeSurvey: 3.x-LTS f35f77b0

2020-08-10 17:45

gabrieljenik

Committer: GitHub


Fixed issue 16273 - Validation regex including unicode characters fails (#1499)

Decoding html before running regex. This (a decode string) is similar to what the PHP side regex function gets.
Affected Issues
16273
mod - assets/scripts/expressions/em_javascript.js

Issue History

Date Modified Username Field Change
2020-05-14 13:19 aquigar New Issue
2020-05-14 14:29 cdorin Priority none => normal
2020-05-14 14:29 cdorin Status new => confirmed
2020-05-14 14:29 cdorin Zoho Sprints => |Yes|
2020-05-14 14:30 swendrich Zoho Sprints ID => 14469000000091001
2020-05-14 14:30 cdorin Note Added: 57786
2020-07-21 21:17 gabrieljenik Note Added: 59028
2020-07-22 17:25 gabrieljenik Note Added: 59049
2020-07-22 17:38 DenisChenu Note Added: 59050
2020-07-22 17:39 DenisChenu Note Added: 59051
2020-07-23 10:08 vkuzmin Issue Monitored: vkuzmin
2020-07-28 02:20 gabrieljenik Issue cloned: 16531
2020-07-28 02:20 gabrieljenik Relationship added related to 16531
2020-07-28 02:28 gabrieljenik Note Added: 59113
2020-08-06 14:36 user225042 Note Added: 59342
2020-08-06 14:36 user225042 File Added: 16273_BeforePR.png
2020-08-06 14:36 user225042 File Added: 16273_AfterPR.png
2020-08-07 16:50 user225042 Note Added: 59374
2020-08-10 16:11 gabrieljenik Changeset attached => LimeSurvey 3.x-LTS f35f77b0
2020-08-10 16:11 gabrieljenik Note Added: 59394
2020-08-10 16:11 gabrieljenik Assigned To => gabrieljenik
2020-08-10 16:11 gabrieljenik Resolution open => fixed
2020-08-10 16:36 swendrich Zoho Sprints Yes => |Yes|
2020-08-10 16:36 swendrich Status confirmed => resolved
2020-08-12 11:11 lime_release_bot Zoho Sprints Yes => |Yes|
2020-08-12 11:11 lime_release_bot Note Added: 59437
2020-08-12 11:11 lime_release_bot Status resolved => closed
2021-08-03 04:11 guest Bug heat 10 => 12