LimeSurvey issue tracker
Registration

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
05289Development Otherpublic2011-06-19 22:172012-03-29 22:46
ReporterTMSWhite 
Assigned To 
PrioritynormalSeverityminor 
StatusnewResolutionopen 
Product Version1.91 
Target VersionFixed in Version 
Summary05289: Support more than 1000 questions+answers (database columns) in a survey
DescriptionAccording to the documentation, LimeSurvey can hold at most 1000 questions & answers per survey. A more generic data model could let LimeSurvey support 1000s of questions and answers. I have used such a generic model to support surveys with 3500+ questions and 10,000+ answers.

A brief description of the data model, plus Entity Relationship diagrams can be found at issue 05106.

Given the Google 2011 Summer of Code project to redesign the database API, it is timely to consider alternate (or even parallel) database architectures.

FYI, the system I used took exactly that strategy - everything was written to both generic Entity-Attribute-Value, and specific (like LimeSurvey's current approach) database tables. There were no significant performance limitations.

TagsNo tags attached.
Attached Filespng file icon LS-EAV-DataModel.png [^] (490,551 bytes) 2012-01-16 16:46

- Relationships
related to 05106confirmed User patches Possible way to address add/remove/edit of questions in deployed surveys 

-  Notes
User avatar (16802)
TMSWhite (reporter)
2012-01-16 16:45

Now that EM integration is completed in 1.92 and Yii, this would be a relatively easy addition. EM already controls collecting all of the values that need to be updated, and keeps track of which questions were asked (so it can validate them upon page turn or submission), so it could be extended to support a vertical, Entity Attribute Value-type database design. Here are the steps in the task.

(A) Admin menu (General, Notification & data management):
- add $DatabaseStyle option, which choices {horizontal, vertical-EAV, both}

(B) On activation, always create survey_SID table with the core columns
-id
-submitdate
-lastpage
-startlanguage
-token
-datestamp
-startdate

(C)if $DatabaseStyle is horizontal or both, add SGQA columns

(D) if $DatabaseStyle is vertical-EAV or both, create tables similar to that described in LS-EAV-DataModel.png attachment. LS doesn't need as many columns, but in general, here is the idea. We can change the names, but this data model is known to work:
(1) data_elements stores one row per SGQA code per survey instance. It keeps track of the current value for all data. At survey-start-time, X rows are added to data_elements, one per SGQA code, all NULL
(2) item_usages - this stores data about questions asked during the survey. At survey start, it has no content. Each time a page is generated, LS would insert rows into item_usages - one row per SGQA on the generated page. Then, if the subject answers the questions, the values are updated when responses are submitted prior to navigating to the next page. This table includes important meta-data including:
(a) starting answer values and codes,
(b) finishing answer values and codes (so can detect that someone has changed an answer),
(c) questionAsAsked - this is the fully tailored question in the current language - this helps you debug cases where you didn't ask what you expected,
(d) time, and
(e) visit_num - so you know how many times this question was asked
(3) page_usages - this stores usage timing data - such as
(a) how long server took to generate process requests and generate next page
(b) how long page took to render
(c) network latency
(d) how much time the user spent on the page

(E) Modify EM data saving functions to save into either or both of the $DatabaseStyle. With each page flip,
(1) Update survey_SID table so know $lastpage
(2) Update changed values in data_elements table
(3) Insert values into item_usages table
(4) Insert row onto page_usage table

(F) Modify database export to transpose the data_elements table to horizontal format for export of SPSS, R, etc. data

(G) Optionally add performance monitoring reports that show how survey performance changes with load or time.
User avatar (18123)
tacman1123 (reporter)
2012-03-29 20:45

"At survey-start-time, X rows are added to data_elements, one per SGQA code, all NULL". I don't think that's necessary. If a row is missing, it means it hasn't been answered yet.

I'd like to propose that we add a group_sequence to the data_elements. That would allow for repeatable groups or subquestions. In the flat/horizontal database structure, you'd have to have a column for up the the maximum number of items, but you wouldn't have that restriction in the vertical structure.

So your data might look like this (skipping survey_SID and timing data)

child|1|name|Peter
child|1|gender|M
child|2|name|Paul
child|2|gender|M
child|2|name|Mary
child|3|gender|F

The group_code (or prefix) would be "child", it would contain two questions (name and gender), be and tagged as repeatable. In the UI, you'd see the question in a group, but instead of "Next>", you'd see "Add Another>", which would repeat that group (and update the internal sequence).

The point here is to NOT generate fields like child1_age, child2_age, which then requires parsing out field names to map back to anything. Certainly you'd have the option of exporting data as such, but internally that's a very difficult way to keep the repeatable data.
User avatar (18124)
TMSWhite (reporter)
2012-03-29 22:05

Tac-

The reason for "At survey-start-time, X rows are added to data_elements, one per SGQA code, all NULL" is so that we can easily get the current value for all data elements. That way, there is a single SQL INSERT statement when the survey starts, and everything else is an UPDATE. However, as long as we do a good job of maintaining state, LS should know which variables have not been set, so could do INSERTs for those and UPDATEs for the rest.

That would be essential to support your proposal for repeating groups.

Since EM needs to know the names of all declared variables, we might need a different naming syntax for such variables. As you know, we already use a dot notation for question attributes. Perhaps we could use a tilde syntax for repeating groups - like <group>~<repeat#>~<varname>.<suffix>, like child~1~name.shown - but this will probably require more thought.
User avatar (18125)
tacman1123 (reporter)
2012-03-29 22:28

I think the idea of a group code is key in that setup. There will be other uses for that code as well, in EM.

I think the EM should also handle _relative_ rules, so that within a repeatable group the rule doesn't need to spell out the number, e.g.

the condition for child~~had_hpv_vaccine is child~~gender == 'F'
the condition for child~~hpv_vaccine_date is child~~had_hpv_vaccdine == 'Y'

These condition would be relative to the group they're already in (and therefore the child~~ isn't necessary, but something would be needed to say "look up a question from within this group with my same sequence number".

In MySQL, you can do a single insert/update by doing an INSERT ... ON DUPLICATE KEY UPDATE (http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html [^]), not sure how that would translate to the other databases.
User avatar (18126)
TMSWhite (reporter)
2012-03-29 22:36

Other things people might want from such repeating groups include:
(1) comparison of current group field to prior group - e.g. if you're supposed to enter ages of children in descending order
(2) functions across all entries for a variable in a group - e.g. if your group has you list products you purchased and their cost, you might want an easy way to sum the costs, get the max or min value of them, etc.

This sounds like it might be an extension of the "this" functionality proposed elsewhere.

Another option is to let EM support array processing, but that is a much bigger lift.
User avatar (18127)
tacman1123 (reporter)
2012-03-29 22:46

Brainstrorming: I wonder if we'd be able to have a "grid group", which would have an interface similar to labels/subquestions (code, assessment value, label text), but for any group with questions that could fit in a single line.

So your repeatable group would be displayed as a grid, with + - up/down icons, e.g.

Name | Age | Status
[ ] | [ ] | ()In School ()Not in School () Home Schooled. + - ^ v

The radio could also be a dropdown, the age would be a number, etc.

Right now, the grids are either all numeric or all text, no validation, rarely what we need. The alternative is to set up a group and repeat it, but that leads to a lot of questions.

The horizontal layout of a group would even allow us do to conditional questions, disabling some columns if they're not relevant.

- Issue History
Date Modified Username Field Change
2011-06-19 22:17 TMSWhite New Issue
2011-06-19 22:18 TMSWhite Relationship added related to 05106
2012-01-16 16:45 TMSWhite Note Added: 16802
2012-01-16 16:46 TMSWhite File Added: LS-EAV-DataModel.png
2012-01-16 17:22 TMSWhite Project User patches => Development
2012-03-29 20:45 tacman1123 Note Added: 18123
2012-03-29 22:05 TMSWhite Note Added: 18124
2012-03-29 22:28 tacman1123 Note Added: 18125
2012-03-29 22:36 TMSWhite Note Added: 18126
2012-03-29 22:46 tacman1123 Note Added: 18127


Copyright © 2000 - 2014 MantisBT Team
Powered by Mantis Bugtracker