Pilot study using an efficient design

Pilot study using an efficient design

Postby hmtcsa26 » Sun May 18, 2025 12:06 pm

Dear Ngene team,

I am new to DCEs and plan to conduct a pilot study using a D-efficient design. Specifically, it will be a labelled experiment with three labelled alternatives and an opt-out option. The three alternatives, optA, optB, and optC, represent different new methods. I have read the manual and forum posts but still have the following questions:

1. As shown in the syntax below, I have dummy-coded all attributes (both qualitative and quantitative) to avoid undesired attribute-level comparisons, resulting in a relatively high D-error of 0.9. However, when I do not dummy-code the quantitative attributes, the total number of parameters is reduced. So I set rows = 24 and blocks = 3, achieving a D-error of 0.29. Could you suggest which coding method is more appropriate?

2. In this labelled experiment, all parameters are alternative-specific. COST, TIME, and ATT1 are generic attributes applied to all alternatives, but they have different parameters for optA, optB, and optC, as these represent different new methods. Is this setup correct? I am unsure whether the parameters should be generic or alternative-specific.

3. I expect the opt-out alternative to represent respondents’ preference not to choose any of the new methods (i.e., optA, optB, and optC) and maintain the current status. As mentioned in previous posts, “If they select the opt-out in the unforced choice, you can follow up with a forced choice where only optA, optB, and optC are available.” Can I simply include an opt-out alternative, and if respondents select it, not require them to make a follow-up choice among optA, optB, and optC?

4. I am not sure which interaction effects should be added to the utility function during the design phase. Is it feasible to exclude interaction effects from the utility function during the design phase and include them later during the model estimation phase? Additionally, can interaction terms in the Ngene utility function include demographic variables? For example, can income interact with attribute A?

5. Is it feasible to conduct willingness-to-pay (WTP) calculations based on this syntax design? Do you have any other suggestions for calculating WTP?

6. Is it feasible to include a scenario variable during the design phase but exclude it during the model estimation phase? Would this impact the accuracy of the model? If a respondent is unfamiliar with the scenario context, would it affect the model results? For example, if a respondent is a student and the scenario variable represents a working environment, would this lack of familiarity impact the results? Alternatively, would it be better to make the scenario variable a multiple-choice question, asking respondents about their current working environment?

7. Can this efficient design be used to estimate a hybrid choice model with latent variables? If so, do the latent variables need to be included in the Ngene utility function?

I’m sorry for asking so many questions, and I truly appreciate your help.

Thank you so much!

Olivia
Last edited by hmtcsa26 on Tue Jun 24, 2025 7:33 pm, edited 1 time in total.
hmtcsa26
 
Posts: 10
Joined: Tue May 13, 2025 11:24 am

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Wed May 21, 2025 10:25 pm

1. The D-error is not comparable between different models; changing the coding changes the model and therefore changes the D-error. When using dummy coding, the D-error will automatically go up, but this does not mean that the design is worse. In your case, both options are fine. If there are undesirable attribute level combinations when you don't use dummy coding, then simply dummy code the attributes. I would probably dummy code all attributes when using uninformative priors, so what you have done seems fine.
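As a minimal sketch of what dummy coding does (the attribute, its levels, and the base level below are illustrative assumptions, not taken from the design in this thread):

```python
# Dummy coding a 3-level attribute: the base level is dropped and each
# remaining level gets its own 0/1 indicator, so level effects are
# estimated freely instead of being forced onto a linear scale.
levels = ["low", "medium", "high"]  # illustrative attribute levels
base = "low"                        # reference level (coefficient fixed to 0)

def dummy_code(value, levels=levels, base=base):
    """Return one 0/1 indicator per non-base level."""
    return {lvl: int(value == lvl) for lvl in levels if lvl != base}

print(dummy_code("medium"))  # {'medium': 1, 'high': 0}
print(dummy_code("low"))     # {'medium': 0, 'high': 0} -> the base level
```

With dummy coding, each non-base level receives its own prior, which is why the number of parameters, and with it the D-error, goes up.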

2. Yes, this is correct. You can specify the same attribute across alternatives but use an alternative-specific coefficient.

3. Yes, in that case you would only collect the unforced choice, which is fine. Just be aware that if a lot of people select the opt-out option, you will not get much information for estimating your coefficients, which is why people often include a forced choice as a backup.

4. Not always. You may not be able to estimate all relevant interaction effects if you did not account for them during the design phase. To increase the likelihood of being able to estimate interaction effects without explicitly adding them to the utility function at the design phase, it is best to increase the number of rows. This increases variety in your data, which allows you to estimate more interaction effects. You would include interactions with socio-demographics in the model estimation stage; it is not common to include them in the experimental design unless you want to create different designs for different population segments (e.g. low-, medium-, and high-income-specific designs).
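As a sketch of what such an estimation-stage interaction with a socio-demographic variable looks like inside a utility function (the coefficient values and the income indicator are purely illustrative assumptions):

```python
# Illustrative coefficients (assumed values, not estimates from this thread).
b_att = -0.5         # main effect of attribute A
b_att_income = 0.1   # interaction: shift in the effect for high-income group

def utility_attA(att_level, income_high):
    """Attribute A's utility contribution; income_high is a 0/1 covariate.

    The interaction term lets the attribute's effect differ by income group;
    it is estimated from the data rather than built into the design.
    """
    return b_att * att_level + b_att_income * att_level * income_high

print(round(utility_attA(2, 1), 2))  # -0.8 for high-income respondents
print(round(utility_attA(2, 0), 2))  # -1.0 for everyone else
```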

5. Yes, as long as you have a cost coefficient in your model you will be able to estimate WTP. You can estimate a different coefficient per labelled alternative, as you have done, but people often choose a generic cost coefficient because economists argue that a dollar is a dollar. Having the same cost coefficient across labelled alternatives makes it easier to compare WTP across alternatives, but you could estimate it alternative-specific if you prefer. With your script you will be able to do both in model estimation.
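As a minimal sketch of the WTP calculation itself (the coefficient values are assumed for illustration only):

```python
# Illustrative MNL estimates; both coefficients are negative because more
# time or more cost lowers utility.
b_time = -0.08   # utility change per extra minute
b_cost = -0.04   # utility change per extra dollar

# WTP for a one-unit change in an attribute is the ratio of its coefficient
# to the cost coefficient: here, dollars per minute of time saved.
wtp_time = b_time / b_cost
print(wtp_time)  # 2.0 dollars per minute
```

With alternative-specific cost coefficients you would compute one such ratio per labelled alternative.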

6. For any variable, including attributes and scenario variables, you can of course omit the variable in the estimation phase if its coefficient does not turn out to be statistically significant. If it is statistically significant but you nevertheless omit it, then the scenario variable becomes confounded with the error term and you introduce more unexplained noise. But of course you can estimate the model with or without the scenario variable; that would work fine.

7. Ngene cannot optimise for a hybrid choice model as this is computationally infeasible, but you should have no problem adding latent variables later during the model estimation phase with a design that was optimised for the multinomial logit model.

8. I do not see any issues with it; it looks pretty good. Note that the - and + priors do not have any effect in a labelled choice experiment, as they are only used for dominance checks in an unlabelled experiment. So in your case you could omit them or replace them with a 0; that would have the same effect.

Michiel
Michiel Bliemer
 
Posts: 2019
Joined: Tue Mar 31, 2009 4:13 pm

Re: Pilot study using an efficient design

Postby hmtcsa26 » Thu May 22, 2025 10:02 pm

Dear Michiel,

Thank you for your detailed and helpful explanations! I’m sorry to bother you again, but I still have a few follow-up questions:

1. In the pilot study, all attributes were dummy coded. After obtaining the actual prior values from this pilot study, do I still need to use dummy coding for all (quantitative and qualitative) attributes in the main study?

2. In the model estimation stage of the pilot study, is it necessary to include (1) interactions with socio-demographic variables and (2) latent variables in the utility functions in order to obtain accurate prior values for the Ngene main study design?

3. With the above script, for the generic attribute COST specified with alternative-specific coefficients, you mentioned that I can estimate it either with a generic coefficient or with alternative-specific coefficients, and that I can try both in model estimation. Should I therefore estimate the model twice (once with a generic coefficient and once with alternative-specific coefficients for COST) and compare their performance? If the model with a generic coefficient fits better, can I then specify COST with a generic coefficient across labelled alternatives and use it as a prior for the Ngene main study design? In this case, should I use goodness-of-fit measures (e.g., p-values, McFadden's R-squared, and AIC) to compare the two models? Do I also need to apply the same approach to the other generic attributes (TIME and ATT1), or just to COST?

Thank you in advance for your time and guidance.

Olivia

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Thu May 22, 2025 11:28 pm

1. No, in many cases with informative priors the issue of undesirable attribute level combinations disappears and you simply use linear relationships for the numerical attributes when generating the design for the main study. But not always.

2. I would not include interactions with socio-demographic variables at the pilot phase because your sample size is relatively small and each respondent only gives you one data point for such a covariate. Hybrid choice models with latent variables are very data hungry; you will generally not be able to estimate such models in the pilot phase. In almost all cases, you stick to the simple multinomial logit model and only consider the attributes in the generation of your design.

3. You would usually conduct statistical testing to assess whether any two coefficients are statistically different. You would test the hypothesis H0: b1 = b2, which is equivalent to H0: b1 - b2 = 0. To test this hypothesis, you need the standard error of b1 - b2, which you can easily obtain via the Delta method; it depends on the standard errors of b1 and b2 as well as the cov(b1,b2) that comes out of model estimation. Estimation software like Apollo can apply the Delta method automatically for you; otherwise you need to apply a simple formula in a spreadsheet. You can do this test for cost, but indeed also for other attributes such as time. But please be mindful that hypothesis testing after the pilot phase may easily fail to reject the null hypothesis because your sample size is small and your standard errors are still large. It may be safest to keep the coefficients alternative-specific, especially those for time if they relate, for example, to different modes of transport, since travel time for walking, train, and car is experienced very differently (physically walking, driving, or reading a newspaper on the train). Then you can test whether they are generic or not when you estimate models after your main data collection.
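The spreadsheet formula mentioned above can be sketched as follows; all numbers are assumed illustrative pilot estimates, not values from this thread:

```python
import math

# Hypothetical pilot estimates for two alternative-specific cost coefficients.
b1, se1 = -0.042, 0.015   # estimate and standard error for one alternative
b2, se2 = -0.061, 0.018   # estimate and standard error for another
cov12 = 0.00009           # cov(b1, b2) from the estimated covariance matrix

# Delta method for the difference d = b1 - b2:
#   var(d) = var(b1) + var(b2) - 2 * cov(b1, b2)
diff = b1 - b2
se_diff = math.sqrt(se1**2 + se2**2 - 2 * cov12)
t_stat = diff / se_diff

# |t| < 1.96 -> cannot reject H0: b1 = b2 at the 5% level; with small pilot
# samples the standard errors are large, so this happens easily.
print(round(t_stat, 2))  # 0.99
```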

Michiel

Re: Pilot study using an efficient design

Postby hmtcsa26 » Fri May 23, 2025 12:05 am

Dear Michiel,

Thank you very much for your prompt and valuable feedback.

Olivia

Re: Pilot study using an efficient design

Postby hmtcsa26 » Thu Jun 05, 2025 11:00 pm

Dear Michiel,

Thank you again for your earlier replies. I still have a few questions below:

1. I included an opt-out alternative in this labelled experiment. If respondents select “None,” they are presented with a follow-up forced choice among “optA,” “optB,” and “optC.” For model estimation, should I still define a utility function for the “None” alternative? I plan to replace the “None” responses with the corresponding forced-choice answers during the model estimation phase—so only “optA,” “optB,” and “optC” will be included in the model estimation. In this case, one of these three alternatives (“optA,” “optB,” and “optC”) must be set as the reference alternative, with its alternative-specific constant (ASC) fixed to zero. Given this setup, how can I estimate ASCs for all three alternatives (“optA,” “optB,” and “optC”) for the Ngene main study design? Could you advise on how to address this issue?

2. During model estimation for the pilot study, I only include the ASCs, scenario variables, and attributes as specified in the Ngene pilot design, and do not need to include the error term in the utility function. Is this correct?

3. When estimating the model using pilot study data, if (1) the prior parameters show unexpected signs (e.g., theoretically positive but estimated as negative), and (2) some attributes appear statistically insignificant, how should I address these issues to design the main study in Ngene?

4. In the Ngene pilot study, all prior parameters are set to zero by default. During the model estimation phase, the initial default values of all parameters (including ASCs, attributes, and scenario variables) should also be set to zero accordingly. Is this correct?

Thank you so much in advance.

Olivia

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Fri Jun 06, 2025 6:03 pm

1. If you captured a forced and unforced choice, you can estimate separate models for forced and unforced, but it is more common to estimate a joint model. Below are the utility functions you would define for each of these:

Only forced, where you need to normalise one of the constants to zero (you cannot estimate all three constants, which is not a problem since only relative differences between utilities matter):
U(opt1) = b1 + ...
U(opt2) = b2 + ...
U(opt3) = ...

Only unforced, where you can estimate all three constants:
U(opt1) = b1 + ...
U(opt2) = b2 + ...
U(opt3) = b3 + ...
U(optout) = 0

Joint model with forced and unforced choices, whereby you normalise the constant of one of the three options and make the opt-out alternative available or unavailable during estimation (easily done in estimation software) depending on whether the observation is a forced or an unforced choice:
U(opt1) = b1 + ...
U(opt2) = b2 + ...
U(opt3) = ...
U(optout) = boptout + ...
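To illustrate how the availability switch works in the joint model, here is a minimal sketch of logit probabilities with an availability mask (the utility values are assumed; estimation packages such as Apollo or Biogeme handle this through their own availability settings):

```python
import math

def mnl_probs(utilities, available):
    """MNL choice probabilities over the available alternatives only."""
    exps = [math.exp(u) if a else 0.0 for u, a in zip(utilities, available)]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative utilities for U(opt1), U(opt2), U(opt3), U(optout).
V = [0.5, 0.2, 0.0, -0.3]

unforced = mnl_probs(V, [True, True, True, True])   # opt-out on the menu
forced = mnl_probs(V, [True, True, True, False])    # opt-out removed

print(forced[3])  # 0.0 -> the opt-out gets zero probability in forced choices
```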

2. In Ngene and in model estimation software (Biogeme, Apollo, Nlogit, etc.) you do not add the error term to the utility function, as this is done automatically within the software. So yes, you only specify the constants, scenario variables, and attributes. Ngene automatically adds the Gumbel (EV1) distributed error terms.

3. If parameters show an unexpected sign, you would check your design for any correlations that may explain that outcome. For example, if a high comfort level mostly appears together with a high price level and respondents mainly look at the comfort attribute, then it may seem that they prefer a higher price over a lower price, and hence the price coefficient could become positive. In that case you may need to change the design. Parameters that are not statistically significant often occur after a pilot study, and that is not an issue. If they have the expected sign, then the estimate is still the best guess you have, and you can account for the unreliability of the estimate by using Bayesian priors.
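The kind of design check described above can be sketched as a simple correlation between two attribute columns (the toy design matrix below is made up for illustration):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length attribute columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Toy design columns: comfort level always moves with price.
comfort = [1, 2, 3, 1, 2, 3]
price = [10, 20, 30, 10, 20, 30]

print(round(pearson(comfort, price), 2))  # 1.0 -> attributes are confounded
```

A correlation near 1 between comfort and price makes it impossible to separate their effects, which is how a price coefficient can pick up a comfort preference and turn positive.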

4. Yes, but there is no relationship between the priors used in Ngene and the initial values of the parameters during model estimation. It is correct that the start values of the parameters are almost always set to 0 when estimating your first (MNL) model. The starting values for an MNL model do not matter since the log-likelihood function is concave and you will always find the same parameter estimates. The design could have been generated with non-zero priors or with zero priors; this does not influence the estimation process.

Michiel

Re: Pilot study using an efficient design

Postby hmtcsa26 » Fri Jun 06, 2025 10:07 pm

Dear Michiel,

Thanks for your detailed explanations! I’m sorry, but I still have a few follow-up questions:

1.1 In the Ngene pilot study syntax above, I set the constant for the opt-out alternative to zero and omitted its utility function. In this case, can I still estimate a joint model, since a constant for the opt-out alternative (boptout) is required in your utility specification of a joint model? Or should I revise the pilot study syntax accordingly? Additionally, when estimating the joint model, should scenario variables and attributes also be included in the utility function for the opt-out alternative?

1.2 For the joint model, I should input the following estimated constants into Ngene for the main study design: b1 for opt1, b2 for opt2, 0 for opt3 (as the reference), and boptout for the opt-out alternative. Is this correct?

1.3 Additionally, does the choice of estimating only the forced choice model, only the unforced model, or the joint model affect the validity of model estimation? Can I estimate only the forced choice model for the pilot study, and then estimate a joint model using the main study data?

2.1 Do I need to check whether each attribute and scenario variable independently shows the expected or unexpected sign? I am uncertain about the expected direction of some attributes. For example, for the travel purpose attribute, which has three levels (business, entertainment, and others), I am not sure whether the effect should be positive or negative. In such cases, how should I address this issue?

2.2 If one or two parameters show unexpected signs, how can I change the design? For example, if a higher cost is highly correlated with a higher level of comfort, I may need to introduce a combination such as high cost and low comfort. How should I incorporate such levels manually, or does Ngene offer any settings to help address this issue?

Thank you very much for your kind support.

Best regards,
Olivia

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Sat Jun 07, 2025 3:23 am

1.1 It does not matter where you put the constants in the utility functions; the behavioural model remains the same and the generated design is not affected. So you can keep it as you had, and you can move the constant later if you like.

1.2 Correct. That means you have two constants when you don't have the opt-out, and three constants when you do. You can also use opt1 or opt2 as the reference with constant 0, but in a joint model with a forced choice the constant of the opt-out cannot be normalised to zero, because then you would have three constants for the three alternatives.

1.3 If you estimate only the forced choice model, then you are missing the prior value for one constant needed to also optimise the design for the unforced choice. So in that case you can only optimise for estimating the forced choice model. Of course you will still be able to estimate the unforced choice model and the joint model, but you will lose some efficiency. Please refer to Script 7.5 in the Ngene manual, which shows how to optimise the design for estimating both the forced and unforced choice models.

2.1/2.2 The design is often driven by the priors. If you use zero priors, then this is unlikely to happen. If you use non-zero priors, then you will need to check whether the priors are properly chosen, or whether one of the attributes dominates the choice. You could also choose to dummy code all attributes during the design phase, which usually creates more variation in the trade-offs between attributes. There are many things you could do, but there is not much point discussing them now, because the right fix depends on the study and on whether it is labelled or unlabelled, and in your case you have not even encountered this issue yet. If you encounter the issue and do not know what to do, feel free to post a new topic.

Michiel

Re: Pilot study using an efficient design

Postby hmtcsa26 » Sat Jun 07, 2025 10:55 pm

Dear Michiel,

Thank you very much for your valuable insights.

Best regards,
Olivia

