View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|05289||Feature requests||_ Unknown||public||2011-06-19 22:17||2021-03-08 20:08|
|Summary||05289: Redesign response storage to support more questions+answers (database columns) in a survey|
According to the documentation, LimeSurvey can hold at most 1000 questions & answers per survey. A more generic data model could let LimeSurvey support 1000s of questions and answers. I have used such a generic model to support surveys with 3500+ questions and 10,000+ answers.
A brief description of the data model, plus Entity Relationship diagrams can be found at issue 05106.
Given the Google 2011 Summer of Code project to redesign the database API, it is timely to consider alternate (or even parallel) database architectures.
FYI, the system I used took exactly that strategy - everything was written to both generic Entity-Attribute-Value, and specific (like LimeSurvey's current approach) database tables. There were no significant performance limitations.
|Tags||No tags attached.|
|has duplicate||09620||closed||c_schmitz||proposed fix to the 1600 column issue|
|has duplicate||07745||closed||c_schmitz||Increase database limit by joining tables|
|related to||07411||acknowledged||Warn survey authors about survey size|
|related to||15021||closed||c_schmitz||Use VIEW to cirumvent the database column limit|
|related to||07593||closed||c_schmitz||Increase the allowed length of the answer code|
Now that EM integration is completed in 1.92 and Yii, this would be a relatively easy addition. EM already controls collecting all of the values that need to be updated, and keeps track of which questions were asked (so it can validate them upon page turn or submission), so it could be extended to support a vertical, Entity Attribute Value-type database design. Here are the steps in the task.
(A) Admin menu (General, Notification & data management):
(B) On activation, always create survey_SID table with the core columns
(C)if $DatabaseStyle is horizontal or both, add SGQA columns
(D) if $DatabaseStyle is vertical-EAV or both, create tables similar to that described in LS-EAV-DataModel.png attachment. LS doesn't need as many columns, but in general, here is the idea. We can change the names, but this data model is known to work:
(E) Modify EM data saving functions to save into either or both of the $DatabaseStyle. With each page flip,
(F) Modify database export to transpose the data_elements table to horizontal format for export of SPSS, R, etc. data
(G) Optionally add performance monitoring reports that show how survey performance changes with load or time.
LS-EAV-DataModel.png (490,551 bytes)
"At survey-start-time, X rows are added to data_elements, one per SGQA code, all NULL". I don't think that's necessary. If a row is missing, it means it hasn't been answered yet.
I'd like to propose that we add a group_sequence to the data_elements. That would allow for repeatable groups or subquestions. In the flat/horizontal database structure, you'd have to have a column for up the the maximum number of items, but you wouldn't have that restriction in the vertical structure.
So your data might look like this (skipping survey_SID and timing data)
The group_code (or prefix) would be "child", it would contain two questions (name and gender), be and tagged as repeatable. In the UI, you'd see the question in a group, but instead of "Next>", you'd see "Add Another>", which would repeat that group (and update the internal sequence).
The point here is to NOT generate fields like child1_age, child2_age, which then requires parsing out field names to map back to anything. Certainly you'd have the option of exporting data as such, but internally that's a very difficult way to keep the repeatable data.
The reason for "At survey-start-time, X rows are added to data_elements, one per SGQA code, all NULL" is so that we can easily get the current value for all data elements. That way, there is a single SQL INSERT statement when the survey starts, and everything else is an UPDATE. However, as long as we do a good job of maintaining state, LS should know which variables have not been set, so could do INSERTs for those and UPDATEs for the rest.
That would be essential to support your proposal for repeating groups.
Since EM needs to know the names of all declared variables, we might need a different naming syntax for such variables. As you know, we already use a dot notation for question attributes. Perhaps we could use a tilde syntax for repeating groups - like <group>~<repeat#>~<varname>.<suffix>, like child~1~name.shown - but this will probably require more thought.
I think the idea of a group code is key in that setup. There will be other uses for that code as well, in EM.
I think the EM should also handle relative rules, so that within a repeatable group the rule doesn't need to spell out the number, e.g.
the condition for child
These condition would be relative to the group they're already in (and therefore the child~~ isn't necessary, but something would be needed to say "look up a question from within this group with my same sequence number".
In MySQL, you can do a single insert/update by doing an INSERT ... ON DUPLICATE KEY UPDATE (http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html), not sure how that would translate to the other databases.
Other things people might want from such repeating groups include:
This sounds like it might be an extension of the "this" functionality proposed elsewhere.
Another option is to let EM support array processing, but that is a much bigger lift.
Brainstrorming: I wonder if we'd be able to have a "grid group", which would have an interface similar to labels/subquestions (code, assessment value, label text), but for any group with questions that could fit in a single line.
So your repeatable group would be displayed as a grid, with + - up/down icons, e.g.
Name | Age | Status
The radio could also be a dropdown, the age would be a number, etc.
Right now, the grids are either all numeric or all text, no validation, rarely what we need. The alternative is to set up a group and repeat it, but that leads to a lot of questions.
The horizontal layout of a group would even allow us do to conditional questions, disabling some columns if they're not relevant.
|2011-06-19 22:17||TMSWhite||New Issue|
|2012-01-16 16:45||TMSWhite||Note Added: 16802|
|2012-01-16 16:46||TMSWhite||File Added: LS-EAV-DataModel.png|
|2012-01-16 17:22||TMSWhite||Project||@2@ => Development|
|2012-03-29 20:45||tacman1123||Note Added: 18123|
|2012-03-29 22:05||TMSWhite||Note Added: 18124|
|2012-03-29 22:28||tacman1123||Note Added: 18125|
|2012-03-29 22:36||TMSWhite||Note Added: 18126|
|2012-03-29 22:46||tacman1123||Note Added: 18127|
|2021-03-07 20:39||c_schmitz||Project||Development => Feature requests|
|2021-03-07 20:39||c_schmitz||Category||Other => Other issues|
|2021-03-07 20:40||c_schmitz||Severity||@50@ => feature|
|2021-03-07 20:40||c_schmitz||Category||Other issues => _ Unknown|
|2021-03-07 20:40||c_schmitz||Product Version||1.91 =>|
|2021-03-07 20:40||c_schmitz||Summary||Support more than 1000 questions+answers (database columns) in a survey => Redesign response storage to support more questions+answers (database columns) in a survey|
|2021-03-07 20:43||c_schmitz||Relationship added||has duplicate 09620|
|2021-03-07 20:43||c_schmitz||Assigned To||=> c_schmitz|
|2021-03-07 20:43||c_schmitz||Status||new => acknowledged|
|2021-03-07 20:43||c_schmitz||Relationship added||related to 07411|
|2021-03-07 21:08||c_schmitz||Relationship added||related to 15021|
|2021-03-07 21:10||c_schmitz||Relationship added||related to 07593|
|2021-03-07 21:11||c_schmitz||Relationship added||has duplicate 07745|