Efficient design in main study

Postby hmtcsa26 » Fri Jun 20, 2025 11:11 pm

Dear Michiel,

Here is my labelled experiment: three labelled alternatives and an opt-out option. The three alternatives, optA, optB, and optC, represent different new methods. A forced choice question is displayed if a respondent selects the opt-out option. In the pilot study I dummy coded all attributes and set all priors to zero. I have now obtained estimated priors from the pilot study, but I have some questions about the Ngene design for the main study:

1. When calculating the relative importance of each attribute, I exclude the alternative-specific constant (ASC) from the calculation. Is this correct?

2. For dummy coded attributes, TIME [0,1,2,3] has three dummy coded coefficients ([-0.417|-0.129|-0.234]) in optA, and the base level's value of zero is also included in the relative importance calculation. That is, the calculation considers the utility contributions 0, -0.417, -0.129, and -0.234, so TIME makes a maximum difference of 0.417 in utility. Is this correct?

3. I calculate the relative importance of each attribute separately for optA, optB, and optC. Is this approach correct, or should the attributes across all three alternatives (optA, optB, and optC) be combined to calculate a unified measure of relative importance?

4. I set 10 Bayesian priors, and the remaining parameters use fixed (local) priors. The syntax is ;eff = 2*unforced(mnl,d,mean) + forced(mnl,d,mean). Is it correct to use “mean” here? Also, only 24 forced-choice observations are included out of a total of 474 observations from 50 respondents. Given this, is the weight of 2 on the unforced choice model reasonable?

5. In the Ngene syntax below, the unforced choice model does not include the scenario variables in the utility function of the “none” alternative. In the forced choice model, the scenario variables are likewise not included in the utility function of the “optC” alternative. However, during model estimation I include the scenario variables in the utility functions of optA, optB, and optC. Is this setting correct?

6. Since optA, optB, and optC represent new methods, I am uncertain whether the attribute parameters should have positive or negative signs. For attributes with unexpected signs, how can I determine whether these reflect true preferences or result from an error in the design?

7. The S-estimate is 113735.262468 here, but my budget can only afford 550 respondents. How can I deal with this sample size issue?

8. The Bayesian mean D-error is 0.87 for the unforced model and 0.81 for the forced model. Are these values acceptable? I have currently set 10 Bayesian priors. Should I increase the number of Bayesian priors to better account for parameter uncertainty?

9. In the Ngene unforced and forced choice models, the priors for all attributes are the same in both models. Is this correct?

Thank you in advance for your time and guidance.

Best regards,
Olivia

Re: Efficient design in main study

Postby Michiel Bliemer » Sat Jun 21, 2025 6:15 pm

Oooh that is a lot of questions.

1. You can include constants; the constants refer to the labels of the alternatives, which you can consider as a separate attribute (ASCs are mathematically the same as dummy coded coefficients for a label attribute).

2. Yes; a small worked example appears at the end of this post.

3. If you have alternative-specific coefficients, I would compute it per alternative. The attribute importance measure was originally designed for unlabelled alternatives, but you can apply it to labelled alternatives separately.

4. Yes. You can also use 'median' instead of 'mean' if you believe that your distributions are too wide and extreme draws mess up the mean. You can look at the individual Bayesian draws on the design output screen and see whether there are any outliers in the D-error; if so, 'median' would be a safer choice (illustrated at the end of this post).

5. You would do the same as you did with the constants. You put a constant in optA, optB, and none, so when none disappears in the forced choice you still end up with the correct number of constants. You can do the same for the scenario variable: include it in optA, optB, and none, and keep the utility functions identical across the forced and unforced choice models. This is also needed later in model estimation if you want to pool the data and estimate a single model. So simply move the scenario variable from optC to none.

6. Well, you don't know. It is unlikely to be due to design error. Note that for labelled experiments you need to randomise the order of the alternatives across respondents to be able to account for left-to-right bias (respondents are often more likely to select the left alternative over the right alternative). You would give one respondent the order optA, optB, optC, none, while another respondent would see none, optB, optC, optA, for example. You need to do such randomisation within the survey instrument, or you can simply create multiple versions of the choice experiment, each with a different ordering (a small sketch appears at the end of this post). In model estimation you add a dummy variable that indicates the position in which each alternative was shown, which allows you to disentangle the left-to-right position effect from the alternative-specific constants. If you choose not to randomise then that is fine, as long as you are aware that this effect may exist and will be absorbed into your constants.

7. You cannot guarantee that all coefficients will be statistically significant, especially since you are estimating so many. The large overall sample size estimate is driven by coefficients that are very close to negligible, such as the -0.0646 in b3, which is almost zero. Sample size estimates also rely on good priors, so if your priors are not very reliable then your sample size estimates are not reliable either. Look at the individual sample size estimates (Sp-estimates in Ngene) rather than the overall estimate, so you can see the required sample size for each parameter (sketched at the end of this post).

8. D-errors cannot be interpreted easily as they are study and model specific, but they look reasonable to me. For 10 Bayesian priors I would use at least 1000 Sobol draws (a small sketch of generating such draws appears at the end of this post). If you increase the number of Bayesian priors you will also need to increase the number of draws exponentially, which is why it is best to limit the number of Bayesian priors to 10 or so; calculations may otherwise become unstable.

9. That is fine, but you can also estimate two separate models and use different priors for the forced and unforced model based on pilot study data. You may not have enough data to reliably estimate both models separately, so in that case a joint model with the same parameters may be preferred.

10. It looks good to me, well done :)
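
On questions 2 and 3: a minimal Python sketch of the range-based importance calculation described above, assuming dummy-coded attributes where the base level contributes zero. The TIME priors are the ones quoted in the question; the COST attribute and its priors are hypothetical, added only to give something to compare against.

```python
# Range-based relative importance for dummy-coded attributes within one
# alternative. The base level always contributes 0, so it is part of the range.
attributes = {
    "TIME": [0.0, -0.417, -0.129, -0.234],  # base level + three dummy coefficients
    "COST": [0.0, -0.210, -0.305],          # hypothetical attribute for comparison
}

ranges = {name: max(b) - min(b) for name, b in attributes.items()}
total = sum(ranges.values())

for name, r in ranges.items():
    print(f"{name}: range = {r:.3f}, importance = {100 * r / total:.1f}%")
# TIME: range = 0.417, importance = 57.8%
# COST: range = 0.305, importance = 42.2%
```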
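
On question 4: a tiny illustration of why one extreme Bayesian draw can make 'median' safer than 'mean'; the D-error values below are invented.

```python
import numpy as np

# Hypothetical D-errors evaluated at five Bayesian draws; the last draw is
# the kind of outlier you might spot on the design output screen.
d_errors = np.array([0.82, 0.85, 0.88, 0.84, 9.50])

print(f"mean   = {d_errors.mean():.2f}")      # 2.58, dominated by the outlier
print(f"median = {np.median(d_errors):.2f}")  # 0.85, barely affected
```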
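
On question 6: a minimal sketch of randomising the left-to-right order per respondent and recording each alternative's position, so position dummies can be added in estimation. The function and variable names are hypothetical.

```python
import random

alternatives = ["optA", "optB", "optC", "none"]

def assign_order(respondent_id: int) -> dict:
    """Return the position (1 = leftmost) of each alternative for one
    respondent, stored with the data so position dummies can be estimated."""
    rng = random.Random(respondent_id)  # reproducible per respondent
    order = rng.sample(alternatives, k=len(alternatives))
    return {alt: pos + 1 for pos, alt in enumerate(order)}

print(assign_order(respondent_id=42))
# e.g. {'optB': 1, 'none': 2, 'optA': 3, 'optC': 4}
```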
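
On question 7: the Sp-estimate for a parameter is essentially the number of respondents needed for its t-ratio to reach 1.96. A sketch with an assumed single-respondent standard error (in practice this comes from the design's AVC matrix) shows why a near-zero prior such as -0.0646 blows the estimate up.

```python
def sp_estimate(beta: float, se_one: float, t_crit: float = 1.96) -> float:
    """Respondents needed for beta to reach significance at the 5% level.
    The standard error shrinks with sqrt(N), so solve
    t_crit = |beta| / (se_one / sqrt(N)) for N."""
    return (t_crit * se_one / beta) ** 2

# se_one = 1.0 is an assumption purely for illustration.
print(f"{sp_estimate(beta=-0.417, se_one=1.0):.0f}")   # prints 22
print(f"{sp_estimate(beta=-0.0646, se_one=1.0):.0f}")  # prints 920
```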
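
On question 8: for completeness, a small sketch of generating quasi-random Sobol draws with SciPy. Ngene produces these internally, so this only illustrates the scale involved for 10 Bayesian priors.

```python
from scipy.stats import qmc

# 2**10 = 1024 Sobol points in [0, 1)^10, one dimension per Bayesian prior;
# each point would then be transformed to the corresponding prior distribution.
sampler = qmc.Sobol(d=10, scramble=True, seed=1)
draws = sampler.random_base2(m=10)
print(draws.shape)  # (1024, 10)
```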

Michiel

Re: Efficient design in main study

Postby hmtcsa26 » Sun Jun 22, 2025 12:35 pm

Dear Michiel,

Many thanks for your thorough and insightful response. It has been very helpful. I’m sorry to bother you again, but I still have two follow-up questions:

1. For a Ngene joint model with forced and unforced choices, is the total number of Bayesian priors equal to the sum of the priors used in the forced choice model and the unforced choice model? Or is it only equal to the number of Bayesian priors in the unforced choice model?

2. In the Ngene pilot study syntax, I set up an unforced choice model with four alternatives: optA, optB, optC, and none. The scenario variables were included in optA, optB, and optC in the pilot study. Based on your suggestion, I should move the scenario variables from optC to the none alternative in the Ngene main study design of the joint model. In this case, given my pilot study design, can I simply reassign the scenario variables currently in optC to the none alternative while keeping their values unchanged, and then include them in the utility function of the none alternative instead of optC during model estimation?

Thank you very much for your kind support.

Best regards,
Olivia

Re: Efficient design in main study

Postby Michiel Bliemer » Sun Jun 22, 2025 11:59 pm

1. Each model has its own Bayesian priors and its own Bayesian efficiency, so you do not add the numbers of Bayesian priors together. The number of draws is the same across both models; your unforced choice model has the larger number of Bayesian priors, so that is the count that matters.

2. Yes. It does not matter where you put the scenario variable in the pilot study estimations; you can move it from optC to none and re-estimate the model with it there. It does not change the efficiency of your pilot design.

Michiel

Re: Efficient design in main study

Postby Michiel Bliemer » Wed Jun 25, 2025 4:16 pm

Btw, in my earlier response regarding including constants when determining attribute importance, you would only consider the ASCs when all other attributes are generic across the alternatives, i.e. the alternatives only vary due to "brand" or label type. If the attributes are alternative-specific, or if the coefficients are alternative-specific, then you can indeed ignore the constants when determining attribute importance WITHIN each alternative.

Michiel

Re: Efficient design in main study

Postby hmtcsa26 » Thu Jun 26, 2025 5:42 pm

Dear Michiel,

Thank you very much for your guidance! I’m so sorry, but I still have two follow-up questions:

1. I plan to use the estimated parameters from the pilot study as the starting values for model estimation in the main study. Is this approach correct?

2. In a labelled experiment, I set all parameters to be alternative-specific in the main study design. How should I determine whether the parameters should be specified as generic or alternative-specific during main study model estimation? Can software such as Biogeme or Excel be used to make this decision?

Thank you again for your support. I truly appreciate your help.

Best regards,
Olivia

Re: Efficient design in main study

Postby Michiel Bliemer » Thu Jun 26, 2025 8:10 pm

1. For the MNL model you generally use zeros as starting values; starting values do not matter here because the MNL log-likelihood is globally concave, so any starting values will result in the same parameter estimates. For mixed logit and latent class models, the starting values are typically the estimates of the MNL model. There is never a need to use parameters from the pilot study; you only use those parameters to set priors for generating an efficient design.

2. You do this with hypothesis testing. The null hypothesis to compare parameters a and b is H0: a = b, which is the same as H0: a - b = 0. To test it, you can perform a t-test, where t = (a-b)/se(a-b). The standard error of a-b, denoted here as se(a-b), can be determined with the Delta method. The formula is easy, namely se(a-b) = sqrt[var(a) + var(b) - 2*cov(a,b)]. Biogeme produces the parameter estimates, giving you a and b, and the variance-covariance matrix, giving you var(a), var(b), and cov(a,b). Estimation software such as Apollo has a specific function that applies the Delta method for such tests; I am not sure whether Biogeme has the same, but the test is of course easy to implement in Excel, and it is sketched below. If you reject the null hypothesis, you should treat the coefficients as alternative-specific; if not, you may consider treating them as generic.
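
A minimal Python sketch of that test; the estimates, variances, and covariance below are invented, and in practice come from the estimation software's output and variance-covariance matrix.

```python
import math

# Hypothetical estimates of the same attribute's coefficient in two
# alternatives, plus their (co)variances from the estimation output.
a, b = -0.417, -0.350
var_a, var_b, cov_ab = 0.0120, 0.0150, 0.0040

se_diff = math.sqrt(var_a + var_b - 2 * cov_ab)  # Delta method
t = (a - b) / se_diff
print(f"t = {t:.2f}")  # t = -0.49

# |t| > 1.96: reject H0 (a = b) at the 5% level, keep them alternative-specific.
# Otherwise: a generic (pooled) coefficient may be acceptable, as here.
```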

For model estimation questions it would be best to consult the Biogeme forum.

Michiel

Re: Efficient design in main study

Postby hmtcsa26 » Fri Jun 27, 2025 11:52 am

Dear Michiel,

Thank you very much for your insightful guidance. :D

Best regards,
Olivia