View Issue Details

ID: 09611
Project: Bug reports
Category: Import/Export
View Status: public
Last Update: 2015-05-08 09:16
Reporter: nwinter
Assigned To: mfaber
Priority: normal
Severity: minor
Status: closed
Resolution: fixed
Product Version: 2.05+
Fixed in Version: 2.05+
Summary: 09611: Stata XML Export fails when output file > ~15MB
Description

When exporting to Stata XML, the export fails when the exported file is greater than about 15MB. I've experimented with several surveys; by limiting the range of observations exported the export succeeds. The maximum number of observations that can be exported successfully appears to vary from survey to survey, and does not appear to correspond to a specific output file size. For the tests I've done, the failures begin when the exported file gets bigger than about 15 MB.

Steps To Reproduce

Create a survey, populate with thousands of responses, export to Stata XML.

Tags: No tags attached.
Attached Files
STATAxmlWriter.php (30,509 bytes)
Bug heat: 12
Complete LimeSurvey version number (& build): 150413
I will donate to the project if issue is resolved: No
Browser: several
Database type & version: 178
Server OS (if known): x86_64-redhat-linux-gnu
Webserver software & version (if known): Apache/2.2.17
PHP Version: 5.3.8

Users monitoring this issue

mfaber

Activities

mfaber

2015-04-21 14:26

reporter   ~32031

nwinter: Unfortunately I do not have data to test this with, but your problems could be due to PHP or SQL memory settings. Are there any error messages in LS's debug mode?
Can you check whether the same problem is present on other machines (maybe on the LS test server, or on a machine running a different setup, e.g. Windows Server and MS SQL)?
Can you also check whether the SPSS export runs OK with large datasets?

Alternatively, can you provide me with an (anonymized) dataset so I can run some tests myself?

nwinter

2015-04-21 21:01

reporter   ~32034

Last edited: 2015-04-21 21:02

Ah, progress! When I turn on debugging, the console says:

Resource interpreted as Document but transferred with MIME type application/download: "http://SITEURL/index.php/admin/export/sa/exportresults/surveyid/631256".

and a small file is downloaded. The downloaded file has the following contents:

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 4 bytes) in /web.pri/SITEURL/application/core/plugins/ExportSTATAxml/STATAxmlWriter.php on line 439

In other tests the line number varies, of course; in my second test with a slightly different dataset it was 394, for example.

So it seems (I think) like PHP is running out of memory. Is there a way the plugin can/should be changed to limit its memory use? As it stands, the ultimate XML file it is creating to export would be 16 or 17MB, so if it is using >200MB I wonder if there is some inefficiency somewhere?

aesteban

2015-04-22 00:36

developer   ~32035

A memory leak in the Stata export generation?

mfaber

2015-04-22 15:55

reporter   ~32039

There are a lot of conversions going on, and the data is held multiple times in different arrays, so the plugin might be quite memory intensive with large datasets. Of course I cannot say that there is no room for improvement in the plugin. ;)
But I'd just adjust the PHP settings for now and see how it goes. If it's really a memory leak (memory is not freed after the plugin has finished its job successfully), I am happy to look into it.
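
For anyone following along, adjusting the PHP settings could look roughly like the sketch below. It assumes the host permits changing memory_limit at runtime via ini_set(); the '512M' value is an arbitrary illustration, not a figure recommended anywhere in this thread. The same setting can also be changed in php.ini.

```php
<?php
// Show the limit currently in effect. The fatal error quoted above
// reports 268435456 bytes, which is 256M.
$before = ini_get('memory_limit');
echo "memory_limit before: {$before}\n";

// Raise the limit for this request only. '512M' is an arbitrary example value.
// ini_set() returns the old value on success and false on failure.
if (ini_set('memory_limit', '512M') === false) {
    echo "could not change memory_limit (the host may forbid it)\n";
}

echo "memory_limit now: " . ini_get('memory_limit') . "\n";
```

Raising the limit only postpones the failure for larger datasets, which is why the rest of the discussion looks at the plugin's own memory use.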

nwinter

2015-04-22 20:09

reporter   ~32040

I don't think it's a memory leak. However, the function updateCustomresponsemap() uses a lot of memory along the way. One quick change I found that reduced that overhead considerably is to iterate through the responses with a for loop and a counter variable, rather than with foreach. I replaced lines 368-369 from this:

foreach ($this->customResponsemap as $iRespId => $aResponses)
{

to this:

$keys = array_keys($this->customResponsemap);
for ($counter=0; $counter<count($keys); $counter++)
{
$iRespId = $keys[$counter];
$aResponses = $this->customResponsemap[$iRespId];

It seemed to work and to reduce considerably the peak memory usage of the export routine. However, I don't trust my knowledge of PHP enough to be sure this isn't having some side effect (though in a couple of examples the output datasets were the same). I also don't have any experience with GitHub, so even if I trusted my change I don't know how to post it there. But maybe if this seems wise it could be incorporated?
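
One way to check for side effects is to run both loop styles over the same input and compare the results. The following is only a sketch: buildMap(), recodeWithForeach() and recodeWithFor() are hypothetical names, the array is a small stand-in for $this->customResponsemap, and the loop body (casting one answer to an integer and writing it back) is an assumption about what the real function does.

```php
<?php
// Hypothetical stand-in for $this->customResponsemap: response id => answers.
function buildMap($n)
{
    $map = array();
    for ($i = 1; $i <= $n; $i++) {
        $map[$i] = array('Q1' => (string) $i, 'Q2' => 'text ' . $i);
    }
    return $map;
}

// Variant 1: foreach by value, writing each row back into the iterated array.
function recodeWithForeach(array $map)
{
    foreach ($map as $iRespId => $aResponses) {
        $aResponses['Q1'] = (int) $aResponses['Q1'];
        $map[$iRespId] = $aResponses;
    }
    return $map;
}

// Variant 2: indexed for loop over the keys, as proposed in the note above.
// count() is evaluated once, outside the loop condition.
function recodeWithFor(array $map)
{
    $keys = array_keys($map);
    $total = count($keys);
    for ($counter = 0; $counter < $total; $counter++) {
        $iRespId = $keys[$counter];
        $aResponses = $map[$iRespId];
        $aResponses['Q1'] = (int) $aResponses['Q1'];
        $map[$iRespId] = $aResponses;
    }
    return $map;
}

// Strict comparison checks keys, order, values and types.
$same = recodeWithForeach(buildMap(1000)) === recodeWithFor(buildMap(1000));
echo $same ? "outputs identical\n" : "outputs differ\n";
```

This only confirms that the two loops produce the same data for this input; it says nothing about memory use, which depends on the PHP version and on what the real loop body does.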

mfaber

2015-04-22 21:29

reporter   ~32041

That's interesting, thanks for investigating. Using your favorite search engine, you can actually find a lot of reports of memory problems when using foreach loops over large arrays.
E.g.: http://thomas.gouverneur.name/2011/05/20110509php5-memory-leak-with-object-arrays/
I would have expected such problems to be solved in PHP versions greater than 2.0ß. Oh well... ;)

Your solution seems fine and I am happy to make the changes for you. Maybe I should replace other foreach loops in the process. Can you tell me how you checked the memory usage while the plugin ran?

nwinter

2015-04-22 22:07

reporter   ~32042

Thanks--this is great!

The short answer is, very laboriously. I made a copy of the plugin, commented out the "header" and actual output code, and added lots of lines like

echo "memory at point 1: ".memory_get_usage();

or

if ($counter%200==0) { echo "memory after ".$counter." records: ".memory_get_usage(); }

and so on...
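
The same kind of instrumentation can be wrapped in a small helper so the checkpoints are less laborious to add and remove. This is a sketch only: memoryCheckpoint() is a hypothetical function, not part of the plugin, and the loop is a dummy workload standing in for the export. memory_get_usage() and memory_get_peak_usage() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of the plugin): format a labelled memory reading.
// Reports both the current usage and the peak reached so far in this request.
function memoryCheckpoint($label)
{
    return sprintf(
        "%s: current %.1f MB, peak %.1f MB\n",
        $label,
        memory_get_usage() / 1048576,
        memory_get_peak_usage() / 1048576
    );
}

echo memoryCheckpoint('start');

// Dummy workload standing in for the response loop.
$rows = array();
for ($counter = 1; $counter <= 1000; $counter++) {
    $rows[] = str_repeat('x', 1000);
    if ($counter % 200 == 0) {
        echo memoryCheckpoint("after {$counter} records");
    }
}
```

As in the approach described above, the download headers and the real output have to be disabled while the checkpoints are echoed, otherwise the readings end up inside the exported file.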

mfaber

2015-04-22 23:15

reporter   ~32043

Thanks!
I checked most of the foreach loops in my plugin as to memory use and most are quite ok. So you found the memory hog already. :)

Could you check if adding a "&" before the "$aResponses" works ok for you?

So replacing
foreach ($this->customResponsemap as $iRespId => $aResponses)
{

by
foreach ($this->customResponsemap as $iRespId => &$aResponses)
{
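
A self-contained sketch of the by-reference variant follows. The array contents and the loop body are made up for illustration. The unset() after the loop is not mentioned in this thread; it is the usual precaution with reference loops, because the loop variable stays bound to the last element after the loop ends.

```php
<?php
// Small stand-in for $this->customResponsemap.
$customResponsemap = array(
    1 => array('Q1' => '1'),
    2 => array('Q1' => '2'),
);

foreach ($customResponsemap as $iRespId => &$aResponses) {
    // Writes go straight to the original element; no write-back is needed.
    $aResponses['Q1'] = (int) $aResponses['Q1'];
}
// Break the reference. Without this, a later assignment to $aResponses
// would silently overwrite the last element of the array.
unset($aResponses);

var_dump($customResponsemap[2]['Q1']);
```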

nwinter

2015-04-22 23:55

reporter   ~32044

I was wondering about that approach. On my system it maxes out just below the counter approach--the difference is probably the memory taken up by the $keys array.

Would it make sense to do some of the other loops as references too?

Thank you!

(One off-topic question: I made some custom code in my production version of the Stata writer to recode optional-other responses from '-oth-' to a numeric value, which allows otherwise-all-numeric responses to be labelled. That might be of broader interest--should I post that as a feature request, or is there some other way to communicate that sort of thing? Or should I figure out GitHub?)

aesteban

2015-04-23 00:33

developer   ~32045

-- About your off-topic question --

Perhaps Carsten, Denis or Sam are the appropriate people to answer this question. Obviously creating a pull request on GitHub is the best way, but sometimes we merge patches suggested in these bug reports, so feel free to create a new feature issue and paste your patch in diff format. Don't forget to describe how to test the feature or improvement.

DenisChenu

2015-04-23 12:16

developer   ~32047

Hi,

If you are asking me, I say: move the whole export to an external core plugin ;)
I think it's best if we have a lighter core system.

And actually, I don't have any idea about some plugins. Stata is an example: I don't have any advice on Stata ;).

mfaber

2015-04-23 14:51

reporter   ~32048

Denis: no problem, it already IS a core plugin ;)

DenisChenu

2015-04-23 14:53

developer   ~32049

Yes, I know... and I say we can move it to an "external plugin not distributed by default with LS core", but downloadable through a clean "download plugin" system.

For LS3 or later ;)

mfaber

2015-04-24 18:44

reporter   ~32055

nwinter: could you check the attached version of the plugin? Memory use should be considerably less now. Thanks!

nwinter

2015-04-24 19:23

reporter   ~32056

mfaber: that one isn't working for me at all. Even with a small dataset, I get a "This webpage is not available: ERR_INVALID_RESPONSE" error message.

mfaber

2015-04-26 13:59

reporter   ~32061

That's strange. I downloaded the attached file again and put it into another test installation. Tested on different sets of responses and it runs without problems here.

nwinter

2015-04-26 22:57

reporter   ~32064

I explored some more. The problem seems to be with the max() functions in two lines:

$aStatatypelist[$this->headersSGQA[$iVarid]]['type'] = max($iDatatype, $aStatatypelist[$this->headersSGQA[$iVarid]]['type']);

and

$aStatatypelist[$this->headersSGQA[$iVarid]]['format'] = max($iStringlength, $aStatatypelist[$this->headersSGQA[$iVarid]]['format']);

The function seems to choke when one of the arguments is not yet set; i.e., on the first time through. I wrapped each of those lines with a check on whether the array value is set, and then things work perfectly. Surely this isn't the most elegant approach, but it worked and did demonstrate that on my system at least that is the problem:

            if (isset($aStatatypelist[$this->headersSGQA[$iVarid]]['type']))
            {
                $aStatatypelist[$this->headersSGQA[$iVarid]]['type'] = max($iDatatype, $aStatatypelist[$this->headersSGQA[$iVarid]]['type']);
            }
            else
            {
                $aStatatypelist[$this->headersSGQA[$iVarid]]['type'] = $iDatatype;
            }
            // if datatype is a string, set needed stringlength
            if ($iDatatype==7)
            {
                if (isset($aStatatypelist[$this->headersSGQA[$iVarid]]['format']))
                {
                    $aStatatypelist[$this->headersSGQA[$iVarid]]['format'] = max($iStringlength, $aStatatypelist[$this->headersSGQA[$iVarid]]['format']);
                }
                else
                {
                    $aStatatypelist[$this->headersSGQA[$iVarid]]['format'] = $iStringlength;
                }
            }
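
The repeated isset() checks could also be folded into one small helper. This is a sketch, not what was committed: maxOrInit() is a hypothetical function, the key 'Q1' is a made-up variable name, and the candidate values 3 and 5 are arbitrary (7 is the string type code used in the check above).

```php
<?php
// Hypothetical helper: return the larger of $candidate and the stored value,
// or $candidate alone when nothing has been stored yet for that key and field.
function maxOrInit(array $list, $key, $field, $candidate)
{
    return isset($list[$key][$field])
        ? max($candidate, $list[$key][$field])
        : $candidate;
}

$aStatatypelist = array();

// First pass: nothing stored yet, so the candidate is taken as-is.
$aStatatypelist['Q1']['type'] = maxOrInit($aStatatypelist, 'Q1', 'type', 3);
// Later passes keep the running maximum.
$aStatatypelist['Q1']['type'] = maxOrInit($aStatatypelist, 'Q1', 'type', 7);
$aStatatypelist['Q1']['type'] = maxOrInit($aStatatypelist, 'Q1', 'type', 5);

var_dump($aStatatypelist['Q1']['type']);
```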

mfaber

2015-05-02 22:20

reporter   ~32084

Fix committed to master branch: http://bugs.limesurvey.org/plugin.php?page=Source/view&id=15125

mfaber

2015-05-02 22:44

reporter   ~32085

Fix committed to 2.06 branch: http://bugs.limesurvey.org/plugin.php?page=Source/view&id=15126

c_schmitz

2015-05-08 09:16

administrator   ~32117

2.05+ Build 150508 released

Related Changesets

LimeSurvey: master 1f1f3cc8

2015-05-02 22:19

mfaber

Committer: mfaber


Fixed issue 09611: High memory use of STATA export plugin

Dev: refactored plugin to get rid of some memory intensive arrays.
Affected Issues
09611
mod - application/core/plugins/ExportSTATAxml/STATAxmlWriter.php

LimeSurvey: 2.06 4b0d6170

2015-05-02 22:19

mfaber

Committer: mfaber


Fixed issue 09611: High memory use of STATA export plugin

Dev: refactored plugin to get rid of some memory intensive arrays.
Affected Issues
09611
mod - application/core/plugins/ExportSTATAxml/STATAxmlWriter.php

Issue History

Date Modified Username Field Change
2015-04-20 23:30 nwinter New Issue
2015-04-21 14:26 mfaber Note Added: 32031
2015-04-21 14:26 mfaber Issue Monitored: mfaber
2015-04-21 21:01 nwinter Note Added: 32034
2015-04-21 21:02 nwinter Note Edited: 32034
2015-04-22 00:36 aesteban Note Added: 32035
2015-04-22 15:55 mfaber Note Added: 32039
2015-04-22 20:09 nwinter Note Added: 32040
2015-04-22 21:29 mfaber Note Added: 32041
2015-04-22 21:30 mfaber Assigned To => mfaber
2015-04-22 21:30 mfaber Status new => assigned
2015-04-22 22:07 nwinter Note Added: 32042
2015-04-22 23:15 mfaber Note Added: 32043
2015-04-22 23:55 nwinter Note Added: 32044
2015-04-23 00:33 aesteban Note Added: 32045
2015-04-23 12:16 DenisChenu Note Added: 32047
2015-04-23 14:51 mfaber Note Added: 32048
2015-04-23 14:53 DenisChenu Note Added: 32049
2015-04-24 18:43 mfaber File Added: STATAxmlWriter.php
2015-04-24 18:44 mfaber Note Added: 32055
2015-04-24 19:23 nwinter Note Added: 32056
2015-04-26 13:59 mfaber Note Added: 32061
2015-04-26 22:57 nwinter Note Added: 32064
2015-05-02 22:20 mfaber Changeset attached => LimeSurvey master 1f1f3cc8
2015-05-02 22:20 mfaber Note Added: 32084
2015-05-02 22:20 mfaber Resolution open => fixed
2015-05-02 22:44 mfaber Changeset attached => LimeSurvey 2.06 4b0d6170
2015-05-02 22:44 mfaber Note Added: 32085
2015-05-02 22:48 mfaber Status assigned => resolved
2015-05-02 22:48 mfaber Fixed in Version => 2.05+
2015-05-08 09:16 c_schmitz Note Added: 32117
2015-05-08 09:16 c_schmitz Status resolved => closed
2021-08-02 20:50 guest Bug heat 10 => 12