View Issue Details

This bug affects 1 person(s).
 12
IDProjectCategoryView StatusLast Update
05988Bug reportsConditionspublic2012-08-04 14:51
ReporterLise94 Assigned Toc_schmitz  
PrioritynormalSeverityminor 
Status closedResolutionfixed 
Product Version1.92+ 
Fixed in Version1.92+ 
Summary05988: Problem displaying words with French accent within Expression Manager
Description

Hi,

I want to use expression manager to display a title question depending of a previous answer. No problem to do that,

but when an accent is in the expression manager, then, the question is not displayed properly and the respondent can see é instead of é :

this problem is only seen in the respondent screen, and not in the admin part.

see file attached for an exemple

TagsNo tags attached.
Attached Files
ls2_specialchars.lss (32,968 bytes)
Bug heat12
Complete LimeSurvey version number (& build)120330
I will donate to the project if issue is resolvedNo
BrowserFirefox
Database type & versionmysql
Server OS (if known)Windows
Webserver software & version (if known)?
PHP Version5

Users monitoring this issue

DenisChenu, Lise94

Activities

Lise94

Lise94

2012-04-05 19:58

reporter   ~18236

the respondent see & e a c u t e ; instead of é
(I have to put spaces between the letter, otherwise, you can't see the "word" that is really displayed)

TMSWhite

TMSWhite

2012-04-07 16:32

reporter   ~18247

c_schmitz - the value in the database has the "wrong" value - it contains "Pourquoi {if((boisson.code=='Y'),'aimez-vous le thé','non')} ?"

EM uses htmlspecialchars_decode() when processing database values. Should it be using html_entities_decode(), or should the database be storing non-entity-encoded values?

c_schmitz

c_schmitz

2012-04-07 17:08

administrator   ~18250

The database should always store the pure UTF-8 encoded non-entity-encoded value.

TMSWhite

TMSWhite

2012-04-07 17:09

reporter   ~18251

OK, so EM is working properly, and somehow the wrong value is being inserted into the database.

DenisChenu

DenisChenu

2012-04-07 17:48

developer   ~18253

Maybe something with database.

I try: importing lss file, and look and save question text.

  • Using : no HTML editor, corrected éto é : Work fine.
  • Activate inline HTML editor : work fine : in db have é, not é
  • Using popup HTML editor, open the HTML editor, don't corrected ( lot of'), save : OK in db.

Then i test with:
Import lss file, direct testing : Not OK.
in db:
aimez-vous le th& eacute ;?
Pourquoi {if((boisson.code=='Y'),'aimez-vous le th& eacute ;','non')} ?
BUT !
LEM change & to & amp;

Did LEM have to accept utf8 encoding AND html encoding ? Then don't replace & to & amp;

DenisChenu

DenisChenu

2012-04-07 18:05

developer   ~18254

Javascript is good:

LEMif(LEManyNA('boisson.code'),'',(LEMif((LEMval('boisson.code') == 'Y'), 'aimez-vous le th& eacute;', 'non')))); (without & amp;)

It's a javascript issue , see some example :
http://www.naveen.com.au/javascript/jquery/encode-or-decode-html-entities-with-jquery/289

c_schmitz

c_schmitz

2012-04-07 18:38

administrator   ~18257

Last edited: 2012-04-07 18:41

TMSWhite: I am sorry, but when I meant 'should' it is not meant in an absolute way. As long as it is valid HTML also entity-encoded is allowed, preferred is non-encoded though.

TMSWhite

TMSWhite

2012-04-07 18:39

reporter   ~18258

One core question is how to properly protect EM from cross-site scripting attacks.

I used htmlspecialchars, which takes care of '>','<','"',"'", and '&'. Sounds like we may not need to escape '&'.

However the main issue is that the database is getting the wrong value stored. If you fix the database contents, you'll see the following generated:

LEMif(LEManyNA('boisson.code'),'',(LEMif((LEMval('boisson.code') == 'Y'), 'aimez-vous le thé', 'non'))));

TMSWhite

TMSWhite

2012-04-07 18:48

reporter   ~18260

Carsten-

We could have EM process all content through html_entities_decode(), but that is potentially risky. It will work if the original is encoded through html_entities_encode(), but if the original is only processed through html_specialchars(), we might get the wrong result.

It would be nice to ensure we're consistent in the database and always encode/decode with the same function.

c_schmitz

c_schmitz

2012-04-07 19:08

administrator   ~18261

I don't understand why we might get the wrong result?

AFAIK htmlentities_decode does not care if encoding with html_specialchars or html_entities_encode was done. If we assume that always valid HTML is supposed to be used it should be fine.

DenisChenu

DenisChenu

2012-04-07 19:14

developer   ~18262

Last edited: 2012-04-07 19:18

Carsten: it's javascript functionality.

If you make :
< div id="myelement" >< /div >
< script >
document.getElementById("myelement").innerHTML="&";
< /script >

You get:
< div id="myelement" >& amp ;< /div >
and not
< div id="myelement" >&< /div >

EDIT : mlaybe try with https://developer.mozilla.org/fr/JavaScript/R%C3%A9f%C3%A9rence_JavaScript/R%C3%A9f%C3%A9rence_JavaScript/Fonctions_globales/decodeURIComponent

TMSWhite

TMSWhite

2012-04-07 19:30

reporter   ~18263

Denis - if so, where in the source code is the problem? Somewhere before insertion into the database, I presume.

Carsten-

You are right - I thought there might be a way to do injection attacks by first encoding with html_spacialchars() and then decoding with html_entities_decode(), but I can't create a working test case of that.

TMSWhite

TMSWhite

2012-04-07 21:41

reporter   ~18266

Last edited: 2012-04-07 21:48

Carsten-

I just tried a dozen permutations of changing some or all of the htmlspecialchars_decode() within EM to html_entity_decode(). All made it worse.

I also remember spending nearly 20 hours last year trying to get the right balance of specialchars and entity encode and decode.

I think we should insist that the database be htmlspecialchars() encoded and not html_entities() encoded. Trying to mix and match the two types of encoding is a big mess.

So, the core fix would be to ensure that entity-encoded data doesn't get into the database.

DenisChenu

DenisChenu

2012-04-08 12:06

developer   ~18278

I try to have html_entities in db, the onky way was to import the lss file and don't change the question text (don't save).

To fix, the only way seem, in em_javascript.js:
<pre>
/* Display number with comma as radix separator, if needed
/
function LEMfixnum(value)
{
var newval = String(value);
if (parseFloat(newval) != value) {
-- return htmlspecialchars(value); // unchanged
++ return value; // really unchanged
}
if (LEMradix===',') {
newval = newval.split('.').join(',');
return newval;
}
return value;
}
</pre>

TMSWhite

TMSWhite

2012-04-09 15:10

reporter   ~18284

I think we need a group decision on this since it affects how we handle cross-site scripting protection.

This we want to automatically do:
(1) Ensure that if a person enters a script into a text entry box, they can't use EM to substitute that script elsewhere (so anything entered in a text entry box should be processed through htmlspecialchars()

Things I'm not sure about:
(1) If people enter markup (like a script) into an equation variable, should it be possible to insert that markup into something else via EM? Perhaps, since the XSS protections tend to protect against accidental creation of markup or scripts via the editor
(2) Similarly,should people be able to write {if(true,'<br />','')} and have EM insert markup via normal substitution?
(3) If someone enters an answer that contains markup, should the database store the markup as entered, or should it first be processed through htmlspecialchars()?

c_schmitz

c_schmitz

2012-04-09 15:43

administrator   ~18285

Last edited: 2012-04-09 15:45

1) [...]be processed through htmlspecialchars() on output, yes

1.) Think so, yes. It should obey the XSS global filter setting in the admin, though. So if someone tries to save an equation like and the according filter setting is activated it should be filtered accordinlgy.

2.) People? Who is that?

3.) Store as entered. Consider raw data always unsafe to be displayed, but safe to store (assuming that that SQL-injection safe storing methods are used).

TMSWhite

TMSWhite

2012-04-09 15:48

reporter   ~18286

For 1 & 2, people = survey authors.

c_schmitz

c_schmitz

2012-04-09 16:12

administrator   ~18289

Last edited: 2012-04-09 16:12

Then 2.) Yes, they should be able to do that.

TMSWhite

TMSWhite

2012-04-09 16:41

reporter   ~18292

OK. That isn't how EM currently handles this, so I (or someone else) will need to look through all of the existing calls to htmlspecialchars() within EM and make the needed adjustments.

In general, seems like:
(1) static strings withing expressions should be output without processing through htmlspecialchars()
(2) string content outside of expressions (e.g. text of questions and answers) should be output without processing through htmlspecialchars()
(3) variable values from free-text fields (e.g. a subset of .code, .value, and .shown, including "comment" and "other" fields) should be processed through htmlspecialchars().
(4) tool tips should use htmlentitites() without double encoding.

c_schmitz

c_schmitz

2012-04-09 17:08

administrator   ~18295

That proposal sounds fine to me.

TMSWhite

TMSWhite

2012-04-11 19:13

reporter   ~18321

attached sample survey for testing special characters. It is possible it won't import correctly. It uses the following text repeatedly:

This <question> has "special chars" including '&'; foreign chars like ßüöäÜÖħéèàçù£, and entities like < > é " ' £ é ò ô

However, that text gets converted at many points, especially during editing. So, consistent handling of entities may be more pervasive than just EM.

TMSWhite

TMSWhite

2012-04-11 19:15

reporter   ~18322

Last edited: 2012-04-11 19:15

Hmm, even Mantis does conversion - the section after "and entities like" spells out the actual entities:

Here they are with the ampersand replaced by a tilde so that they don't get converted:

and entities like ~lt; ~gt; ~eacute; ~quot; ~apos; ~pound; ~eacute; ~ograve; ~ocirc;

c_schmitz

c_schmitz

2012-04-16 09:37

administrator   ~18361

Can you point out where it is still going 'wrong' beside EM?

TMSWhite

TMSWhite

2012-04-16 17:21

reporter   ~18379

The short answer is that the problem appears to be isolated to EM, but looking at the code, I'm worried it may be more pervasive (which might explain why it took so long to get it "right" in EM in the first place).

When I do a regular expressions search of (/html(specialchars|_entity|entities)/ of 1.92 codebase (searching all .php and .js source, I get 542 matches across 78 files.

Among them, we have several versions of these JavaScript functions:
(1) em_javascript.js and admin_core.js and browse.js have incompatible versions of htmlspecialchars()
(2) em_javascript.js and translation.js have incompatible versions of html_entity_decode()

We should probably move the phpjs.org functions out of em_javascript.js and maintain them separately. At present, I didn't want to touch this since I don't know whether standardizing those functions would break anything.

Also, I'm not clear on some basics, such as, "how should strings be natively stored within":
(1) database
(2) $_SESSION[SGQA]
(3) $fieldmap
(4) other?

And what is the proper way map from one to the other?
(1) database<=>$_SESSION for values
(2) database<=>question "object" for question text, help, relevance, etc. (e.g. database=>$fieldmap)
(3) question object=>HTML nodes
(4) question object=>HTML attributes
(5) $_SESSION values=>HTML input elements
etc.

Furthermore, if we do encoding, should we be doing double encoding and double decoding, or not. For example, "&" => "&amp;"?

My recommendation would be to have someone create succinct coding guidance documentation to indicate the proper way to manage special characters and entities within LS. That way we can (perhaps slowly) look through those 542 matches and make sure they are all correct and consistent.

c_schmitz

c_schmitz

2012-04-17 09:45

administrator   ~18402

Last edited: 2012-04-17 09:45

As mentioned earlier:

Strings should always be stored raw, be it in database, fieldmap or session.

On display:
Strings entered by survey participants should always be htmlspecialchars-encoded (as they are considere unsafe)
Strings entered by survey admin in the administration should not be encoded and displayed as they are (as they are considered to be safe or XSS-filtered)

c_schmitz

c_schmitz

2012-04-20 09:24

administrator   ~18437

Fix committed to master branch: http://bugs.limesurvey.org/plugin.php?page=Source/view&amp;id=8200

c_schmitz

c_schmitz

2012-04-20 09:26

administrator   ~18438

Fix committed to Yii branch: http://bugs.limesurvey.org/plugin.php?page=Source/view&amp;id=8201

c_schmitz

c_schmitz

2012-05-01 11:56

administrator   ~18519

New 1.92+ build released

Related Changesets

LimeSurvey: master acf57bc2

2012-04-20 00:24:07

c_schmitz

Details Diff
Fixed issue 05988: Problem displaying words with French accent within Expression Manager Affected Issues
05988
mod - scripts/em_javascript.js Diff File

LimeSurvey: Yii 1cdf15df

2012-04-20 00:26:17

c_schmitz

Details Diff
Fixed issue 05988: Problem displaying words with French accent within Expression Manager Affected Issues
05988
mod - scripts/expressions/em_javascript.js Diff File

Issue History

Date Modified Username Field Change
2012-04-05 19:53 Lise94 New Issue
2012-04-05 19:53 Lise94 Status new => assigned
2012-04-05 19:53 Lise94 Assigned To => lemeur
2012-04-05 19:53 Lise94 File Added: limesurvey_survey_44141.lss
2012-04-05 19:55 Lise94 Issue Monitored: Lise94
2012-04-05 19:58 Lise94 Note Added: 18236
2012-04-05 21:11 c_schmitz Assigned To lemeur => c_schmitz
2012-04-07 12:55 DenisChenu Issue Monitored: DenisChenu
2012-04-07 16:32 TMSWhite Note Added: 18247
2012-04-07 17:08 c_schmitz Note Added: 18250
2012-04-07 17:09 TMSWhite Note Added: 18251
2012-04-07 17:48 DenisChenu Note Added: 18253
2012-04-07 18:05 DenisChenu Note Added: 18254
2012-04-07 18:38 c_schmitz Note Added: 18257
2012-04-07 18:39 TMSWhite Note Added: 18258
2012-04-07 18:41 c_schmitz Note Edited: 18257
2012-04-07 18:48 TMSWhite Note Added: 18260
2012-04-07 19:08 c_schmitz Note Added: 18261
2012-04-07 19:14 DenisChenu Note Added: 18262
2012-04-07 19:18 DenisChenu Note Edited: 18262
2012-04-07 19:30 TMSWhite Note Added: 18263
2012-04-07 21:41 TMSWhite Note Added: 18266
2012-04-07 21:48 TMSWhite Note Edited: 18266
2012-04-08 12:06 DenisChenu Note Added: 18278
2012-04-09 15:10 TMSWhite Note Added: 18284
2012-04-09 15:43 c_schmitz Note Added: 18285
2012-04-09 15:45 c_schmitz Note Edited: 18285
2012-04-09 15:48 TMSWhite Note Added: 18286
2012-04-09 16:12 c_schmitz Note Added: 18289
2012-04-09 16:12 c_schmitz Note Edited: 18289
2012-04-09 16:41 TMSWhite Note Added: 18292
2012-04-09 17:08 c_schmitz Note Added: 18295
2012-04-11 19:10 TMSWhite File Added: ls2_specialchars.lss
2012-04-11 19:13 TMSWhite Note Added: 18321
2012-04-11 19:15 TMSWhite Note Added: 18322
2012-04-11 19:15 TMSWhite Note Edited: 18322
2012-04-16 09:37 c_schmitz Note Added: 18361
2012-04-16 17:21 TMSWhite Note Added: 18379
2012-04-17 09:45 c_schmitz Note Added: 18402
2012-04-17 09:45 c_schmitz Note Edited: 18402
2012-04-20 09:23 c_schmitz Status assigned => resolved
2012-04-20 09:23 c_schmitz Fixed in Version => 1.92+
2012-04-20 09:23 c_schmitz Resolution open => fixed
2012-04-20 09:24 c_schmitz Changeset attached => LimeSurvey master acf57bc2
2012-04-20 09:24 c_schmitz Note Added: 18437
2012-04-20 09:26 c_schmitz Changeset attached => LimeSurvey Yii 1cdf15df
2012-04-20 09:26 c_schmitz Note Added: 18438
2012-05-01 11:56 c_schmitz Note Added: 18519
2012-05-01 11:56 c_schmitz Status resolved => closed
2021-08-03 06:11 guest Bug heat 8 => 12