Pilot study using an efficient design

This forum is for posts that specifically focus on the online (web-based) version of Ngene.



Postby hmtcsa26 » Sun May 18, 2025 12:06 pm

Dear Ngene team,

I am new to DCE and plan to conduct a pilot study using a D-efficient design. Specifically, it will be a labelled experiment with three labelled alternatives and an opt-out option. The three alternatives, optA, optB, and optC, represent different new methods. I have read the manual and forum posts but still have the following questions:

1. As shown in the syntax below, I have dummy-coded all attributes (both qualitative and quantitative) to avoid undesired attribute-level comparisons, resulting in a relatively high D-error of 0.9. When I do not dummy-code the quantitative attributes, the total number of parameters is reduced, so I set rows = 24 and blocks = 3 and obtained a D-error of 0.29. Could you suggest which coding method is more appropriate?

2. In this labelled experiment, all parameters are alternative-specific. COST, TIME, and ATT1 are generic attributes applied to all alternatives, but they have different parameters for optA, optB, and optC, as these represent different new methods. Is this setup correct? I am unsure whether the parameters should be generic or alternative-specific.

3. I expect the opt-out alternative to represent respondents’ preference not to choose any of the new methods (i.e., optA, optB, and optC) and maintain the current status. As mentioned in previous posts, “If they select the opt-out in the unforced choice, you can follow up with a forced choice where only optA, optB, and optC are available.” Can I simply include an opt-out alternative, and if respondents select it, not require them to make a follow-up choice among optA, optB, and optC?

4. I am not sure which interaction effects should be added to the utility function during the design phase. Is it feasible to exclude interaction effects from the utility function during the design phase and include them later during the model estimation phase? Additionally, can interaction terms in the Ngene utility function include demographic variables? For example, can income interact with attribute A?

5. Is it feasible to conduct willingness-to-pay (WTP) calculations based on this design syntax? Do you have any other suggestions for calculating WTP?

6. Is it feasible to include a scenario variable during the design phase but exclude it during the model estimation phase? Would this impact the accuracy of the model? If a respondent is unfamiliar with the scenario context, would it affect the model results? For example, if a respondent is a student and the scenario variable represents a working environment, would this lack of familiarity impact the results? Alternatively, would it be better to make the scenario variable a multiple-choice question, asking respondents about their current working environment?

7. Can this efficient design be used to estimate a hybrid choice model with latent variables? If so, do the latent variables need to be included in the Ngene utility function?

8. Is the syntax for the pilot study correct?


The syntax for the pilot study is as follows:

design ? labelled experiment
;alts = optA, optB, optC, None
;rows = 36
;block = 4
;eff = (mnl,d)
;con
;cond:
? COST in optA must be higher than or equal to COST in optB and optC
if(optA.COST = 7, optB.COST = [1,3,5,7] and optC.COST = [1,3,5,7]),
if(optA.COST = 5, optB.COST = [1,3,5] and optC.COST = [1,3,5]),
if(optA.COST = 3, optB.COST = [1,3] and optC.COST = [1,3]),
if(optA.COST = 1, optB.COST = [1] and optC.COST = [1])
;model:
U(optA) = con_a
+ a1.dummy[-] * COST[1,3,5,7]
+ a2.dummy[-] * TIME[0,1,2,3]
+ a3.dummy[+] * ATT1[0.3,0.5,0.7,0.9]
+ a4.dummy * ATT2[1,2,0]
+ a5.dummy * SCENARIO[1,2,0] ? scenario variable
/
U(optB)= con_b
+ b1.dummy[-] * COST
+ b2.dummy[-] * TIME
+ b3.dummy[+] * ATT1
+ b4.dummy[-] * ATT3[0,250,500]
+ b5.dummy * ATT4[1,2,0]
+ b6.dummy * SCENARIO[SCENARIO]
/
U(optC) = con_c
+ c1.dummy[-] * COST
+ c2.dummy[-] * TIME
+ c3.dummy[+] * ATT1
+ c4.dummy[-] * ATT3
+ c5.dummy * ATT5[1,2,3,0]
+ c6.dummy * SCENARIO[SCENARIO]
? quantitative or numerical attributes: COST, TIME, ATT1, ATT3
? qualitative attributes: ATT2, ATT4, ATT5, SCENARIO
$
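
The four `if()` rules in `;cond` can be sanity-checked outside Ngene. The Python sketch below assumes the intended rule is the one stated in the `?` comment (COST in optB and optC is never higher than COST in optA). It only enumerates which COST combinations that rule admits and does not reproduce how Ngene itself applies conditions.

```python
from itertools import product

COST_LEVELS = [1, 3, 5, 7]  # COST levels from the syntax


def satisfies_cost_rule(cost_a, cost_b, cost_c):
    """Rule from the ;cond comment: COST in optA is at least COST in optB and in optC."""
    return cost_b <= cost_a and cost_c <= cost_a


# Every (optA, optB, optC) COST combination that the rule admits.
allowed = [combo for combo in product(COST_LEVELS, repeat=3) if satisfies_cost_rule(*combo)]


def allowed_for(cost_a):
    """COST levels that optB may take for a given optA COST (optC is symmetric)."""
    return sorted({b for a, b, _ in allowed if a == cost_a})


for cost_a in reversed(COST_LEVELS):
    print(f"if optA.COST = {cost_a}: optB.COST and optC.COST in {allowed_for(cost_a)}")
print(len(allowed), "of", len(COST_LEVELS) ** 3, "COST combinations satisfy the rule")
```

Each printed line should match one of the `if()` rules above. The `optA.COST = 7` rule admits every level, so on its own it never excludes a choice task.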

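To see why dummy coding raises the number of parameters (question 1), here is a rough Python count for the syntax above. It assumes that `.dummy` coding uses the last listed level as the reference level, that a dummy-coded attribute with L levels adds L - 1 coefficients while a linearly coded one adds a single coefficient, and that con_a, con_b, and con_c are the only constants, with the opt-out as the reference alternative. `dummy_code` and `count_parameters` are illustrative helpers, not Ngene functions.

```python
# Levels copied from the syntax above.
COST = [1, 3, 5, 7]
TIME = [0, 1, 2, 3]
ATT1 = [0.3, 0.5, 0.7, 0.9]
ATT3 = [0, 250, 500]
SCENARIO = [1, 2, 0]

ATTRIBUTE_LEVELS = {
    "optA": {"COST": COST, "TIME": TIME, "ATT1": ATT1, "ATT2": [1, 2, 0], "SCENARIO": SCENARIO},
    "optB": {"COST": COST, "TIME": TIME, "ATT1": ATT1, "ATT3": ATT3, "ATT4": [1, 2, 0], "SCENARIO": SCENARIO},
    "optC": {"COST": COST, "TIME": TIME, "ATT1": ATT1, "ATT3": ATT3, "ATT5": [1, 2, 3, 0], "SCENARIO": SCENARIO},
}
NUMERICAL = {"COST", "TIME", "ATT1", "ATT3"}


def dummy_code(value, levels):
    """Indicator columns for `value`; the last level is assumed to be the reference (all zeros)."""
    if value not in levels:
        raise ValueError(f"{value!r} is not one of {levels!r}")
    return [1 if value == level else 0 for level in levels[:-1]]


def count_parameters(dummy_code_numerical):
    """Constants plus attribute coefficients for optA, optB and optC (the opt-out has none)."""
    total = 0
    for attributes in ATTRIBUTE_LEVELS.values():
        total += 1  # alternative-specific constant (con_a, con_b, con_c)
        for name, levels in attributes.items():
            if name in NUMERICAL and not dummy_code_numerical:
                total += 1  # single linear coefficient
            else:
                total += len(levels) - 1  # one coefficient per non-reference level
    return total


print(dummy_code(3, COST))  # COST = 3 coded against the reference level 7
print("all attributes dummy coded:", count_parameters(True))
print("numerical attributes linear:", count_parameters(False))
```

Under these assumptions the fully dummy-coded specification has 47 parameters, and coding COST, TIME, ATT1, and ATT3 linearly reduces this to 27.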

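The `[+]` and `[-]` signs in the syntax are sign priors of the kind that, as the reply below explains, only matter for dominance checks in unlabelled experiments. The Python sketch below shows the general idea of such a check; it is not Ngene's implementation, and the signs and attribute values are hypothetical. In a labelled experiment, optA, optB, and optC have their own constants and coefficients, so an alternative that looks better on every attribute need not have the higher utility, which is why the check is not meaningful there.

```python
# Hypothetical sign priors: -1 means "less is better", +1 means "more is better".
SIGNS = {"COST": -1, "TIME": -1, "ATT1": +1}


def dominates(alt_a, alt_b, signs=SIGNS):
    """True if alt_a is at least as good as alt_b on every signed attribute and strictly better on one."""
    gains = [signs[name] * (alt_a[name] - alt_b[name]) for name in signs]
    return all(g >= 0 for g in gains) and any(g > 0 for g in gains)


A = {"COST": 3, "TIME": 1, "ATT1": 0.7}
B = {"COST": 5, "TIME": 2, "ATT1": 0.5}
print(dominates(A, B), dominates(B, A))
```
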
I’m sorry for asking so many questions, and I truly appreciate your help.

Thank you so much!

Olivia
hmtcsa26
 
Posts: 3
Joined: Tue May 13, 2025 11:24 am

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Wed May 21, 2025 10:25 pm

1. The D-error is not comparable between different models; changing the coding changes the model and therefore changes the D-error. When using dummy coding, the D-error will automatically go up, but this does not mean that it is a worse design. In your case, both options are fine. If there are undesirable attribute level combinations when you don't use dummy coding, then simply dummy code the attributes. I would probably dummy code all attributes when using uninformative priors, so what you have done seems fine.

2. Yes, this is correct. You can specify the same attribute across alternatives but use an alternative-specific coefficient for each.

3. Yes, in that case you would only collect the unforced choice, which is fine. Just be aware that if many people select the opt-out option, you will not get much information for estimating your coefficients; this is why people often include a forced choice as a backup.

4. Not always. You may not be able to estimate all relevant interaction effects if you did not account for them during the design phase. To increase the likelihood of being able to estimate interaction effects without explicitly adding them to the utility function at the design phase, it is best to increase the number of rows. This increases the variety in your data, which allows you to estimate more interaction effects. You would include interactions with socio-demographics in the model estimation stage; it is not common to include them in the experimental design unless you want to create different designs for different population segments (e.g. low-, medium-, and high-income specific designs).

5. Yes, as long as you have a cost coefficient in your model you will be able to estimate WTP. You can estimate a different coefficient per labelled alternative, as you have done, but people often choose a generic coefficient for cost because economists argue that a dollar is a dollar. Having the same cost coefficient across labelled alternatives makes it easier to compare WTP across alternatives. But you could estimate it alternative-specific if you prefer. With your script you will be able to do both in model estimation.

6. For any variable, including attributes and scenario variables, of course you can omit the variable in the estimation phase if the coefficient does not turn out to be statistically significant. If it is statistically significant, but you nevertheless omit it, then the error term becomes confounded with the scenario variable and therefore you introduce more unexplained noise. But of course you can estimate the model with or without the scenario variable, that would work fine.

7. Ngene cannot optimise for a hybrid choice model as this is computationally infeasible, but you should have no problem adding latent variables later during the model estimation phase with a design that was optimised for the multinomial logit model.

8. I do not see any issues with it; it looks pretty good. Note that the - and + priors do not have any effect in a labelled choice experiment, as they are only used for dominance checks in an unlabelled experiment. So in your case you could omit them or replace them with a 0, which would have the same effect.
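
As a toy illustration of point 1, the Python sketch below computes the MNL D-error, det(Ω)^(1/K) with Ω the inverse of the Fisher information for a single respondent, for the same three hypothetical two-alternative choice tasks coded linearly (K = 1) and dummy coded (K = 2), using zero priors. It uses only the standard library and is not Ngene's code; the levels and tasks are made up. Because K and the meaning of the parameters differ, the two numbers describe different models and do not say that one design is better.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def mnl_information(tasks, beta):
    """Fisher information of an MNL model for one respondent answering every task once."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    for alternatives in tasks:
        utilities = [sum(b * x for b, x in zip(beta, alt)) for alt in alternatives]
        top = max(utilities)
        weights = [math.exp(u - top) for u in utilities]
        total = sum(weights)
        probs = [w / total for w in weights]
        mean = [sum(p * alt[i] for p, alt in zip(probs, alternatives)) for i in range(k)]
        for p, alt in zip(probs, alternatives):
            dev = [alt[i] - mean[i] for i in range(k)]
            for i in range(k):
                for j in range(k):
                    info[i][j] += p * dev[i] * dev[j]
    return info


def d_error(tasks, beta):
    """det(inverse information)^(1/K) = det(information)^(-1/K); infinite if not identified."""
    det_info = determinant(mnl_information(tasks, beta))
    return math.inf if det_info <= 0 else det_info ** (-1.0 / len(beta))


LEVELS = [0, 1, 2]                # hypothetical levels of one attribute
TASKS = [(0, 1), (1, 2), (0, 2)]  # hypothetical two-alternative choice tasks


def linear(x):
    return [x]


def dummy(x):
    return [1 if x == level else 0 for level in LEVELS[:-1]]  # last level is the reference


def code_tasks(coder):
    return [[coder(x) for x in task] for task in TASKS]


linear_d = d_error(code_tasks(linear), beta=[0.0])
dummy_d = d_error(code_tasks(dummy), beta=[0.0, 0.0])
print(f"linear coding (K = 1): D-error = {linear_d:.3f}")
print(f"dummy coding  (K = 2): D-error = {dummy_d:.3f}")
```

Under these assumptions the dummy-coded version reports the larger D-error even though the choice tasks are identical.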

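Points 2, 3, and 5 can be illustrated together. The Python sketch below uses hypothetical coefficients (not estimates) in which COST and TIME appear in all three labelled alternatives with alternative-specific coefficients. It normalises the opt-out utility to zero, computes MNL probabilities for the unforced choice and for an optional forced follow-up without the opt-out, and computes marginal WTP as minus the attribute coefficient divided by the cost coefficient. The WTP part assumes COST enters the estimated model linearly; a dummy-coded cost has no single cost coefficient to divide by, but the coding used in estimation does not have to match the coding used to generate the design.

```python
import math

# Hypothetical coefficients for illustration only; they are not estimates.
COEFFICIENTS = {
    "optA": {"const": 0.5, "COST": -0.30, "TIME": -0.20},
    "optB": {"const": 0.2, "COST": -0.15, "TIME": -0.25},
    "optC": {"const": 0.0, "COST": -0.40, "TIME": -0.10},
}
GENERIC_COST = -0.25  # hypothetical single cost coefficient shared by all alternatives


def utility(alt, attributes, coefficients=COEFFICIENTS):
    """Observed utility: the alternative's constant plus its own coefficient times each attribute."""
    coefs = coefficients[alt]
    return coefs["const"] + sum(coefs[name] * value for name, value in attributes.items())


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities for a dict {alternative: utility}."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    exp_u = {alt: math.exp(u - top) for alt, u in utilities.items()}
    total = sum(exp_u.values())
    return {alt: e / total for alt, e in exp_u.items()}


def wtp(beta_attribute, beta_cost):
    """Change in COST that offsets a one-unit increase in the attribute: -beta_attribute / beta_cost."""
    if beta_cost == 0:
        raise ValueError("the cost coefficient must be non-zero")
    return -beta_attribute / beta_cost


# One hypothetical choice task.
task = {"optA": {"COST": 5, "TIME": 1}, "optB": {"COST": 3, "TIME": 2}, "optC": {"COST": 1, "TIME": 3}}
utilities = {alt: utility(alt, attrs) for alt, attrs in task.items()}
utilities["None"] = 0.0  # opt-out utility normalised to zero

unforced = mnl_probabilities(utilities)
forced = mnl_probabilities({alt: u for alt, u in utilities.items() if alt != "None"})

wtp_time = {alt: wtp(c["TIME"], c["COST"]) for alt, c in COEFFICIENTS.items()}
wtp_time_generic = {alt: wtp(c["TIME"], GENERIC_COST) for alt, c in COEFFICIENTS.items()}

print("unforced:", {alt: round(p, 3) for alt, p in unforced.items()})
print("forced:  ", {alt: round(p, 3) for alt, p in forced.items()})
print("WTP for TIME, alternative-specific cost:", {a: round(v, 3) for a, v in wtp_time.items()})
print("WTP for TIME, generic cost:            ", {a: round(v, 3) for a, v in wtp_time_generic.items()})
```

A negative WTP for TIME means that a one-unit increase in TIME has to be offset by a lower COST. With a generic cost coefficient, differences in WTP across alternatives come only from the TIME coefficients, which is the comparability argument made in point 5.
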
Michiel
Michiel Bliemer
 
Posts: 1996
Joined: Tue Mar 31, 2009 4:13 pm

Re: Pilot study using an efficient design

Postby hmtcsa26 » Thu May 22, 2025 10:02 pm

Dear Michiel,

Thank you for your detailed and helpful explanations! I’m sorry to bother you again, but I still have a few follow-up questions:

1. In the pilot study, all attributes were dummy coded. After obtaining the actual prior values from this pilot study, do I still need to use dummy coding for all (quantitative and qualitative) attributes in the main study?

2. In the model estimation stage of the pilot study, is it necessary to include (1) interactions with socio-demographic variables and (2) latent variables in the utility functions in order to obtain accurate prior values for Ngene main study design?

3. With the above script, for the generic attribute COST specified with alternative-specific coefficient, you mentioned that I can estimate it either using a generic coefficient or alternative-specific coefficients, and that I can try both in the model estimation. So should I estimate the model twice—once with generic coefficients and once with alternative-specific coefficients for COST—and compare their performance? If the model with generic coefficients provides a better fit, then I can specify COST with a generic coefficient across labelled alternatives and use it as a prior for Ngene main study design? In this case, should I use goodness-of-fit indices (e.g., p-values, McFadden’s R-squared, and AIC) to compare the two models? Do I also need to apply the same approach to the other generic attributes (TIME and ATT1), or just to COST?
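
For the comparison in question 3, a likelihood ratio test is one standard option for nested models; it is a different technique from the Delta-method test described in the reply below. Restricting three alternative-specific COST coefficients to one generic coefficient imposes two restrictions, so the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x/2). AIC is shown as well since the question mentions it. The log-likelihoods and parameter counts are hypothetical.

```python
import math


def lr_test_two_restrictions(ll_restricted, ll_unrestricted):
    """Likelihood ratio test for two restrictions, e.g. three COST coefficients forced to be equal."""
    statistic = max(0.0, -2.0 * (ll_restricted - ll_unrestricted))
    # Chi-square survival function with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


def aic(log_likelihood, n_parameters):
    """Akaike information criterion; lower is better."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical final log-likelihoods from the same pilot data.
ll_specific, k_specific = -412.3, 27  # alternative-specific COST coefficients
ll_generic, k_generic = -414.1, 25    # one generic COST coefficient

stat, p = lr_test_two_restrictions(ll_generic, ll_specific)
print(f"LR = {stat:.2f}, p = {p:.3f}")
print(f"AIC specific = {aic(ll_specific, k_specific):.1f}, AIC generic = {aic(ll_generic, k_generic):.1f}")
```

With a small pilot sample, failing to reject equality is weak evidence that the coefficients are in fact equal.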

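Relating to question 2 and to point 4 of the first reply: a socio-demographic interaction is typically built at the estimation stage from the collected data, leaving the design columns unchanged. The Python sketch below uses hypothetical long-format rows and a hypothetical coded INCOME variable. It creates a COST-by-INCOME column and shows how the marginal effect of COST then varies with income. Because INCOME takes a single value per respondent, a small pilot sample provides little variation in it.

```python
# Hypothetical long-format rows: COST comes from the design, INCOME (a coded income band) from the survey.
rows = [
    {"respondent": 1, "alt": "optA", "COST": 5, "INCOME": 2},
    {"respondent": 1, "alt": "optB", "COST": 3, "INCOME": 2},
    {"respondent": 2, "alt": "optA", "COST": 5, "INCOME": 4},
    {"respondent": 2, "alt": "optB", "COST": 3, "INCOME": 4},
]


def add_interaction(data, attribute, covariate):
    """Append an attribute-by-covariate column to every row; the design columns are not modified."""
    name = f"{attribute}_x_{covariate}"
    for row in data:
        row[name] = row[attribute] * row[covariate]
    return name


def marginal_cost_effect(income, beta_cost, beta_cost_x_income):
    """Utility change from one extra unit of COST when utility has beta_cost*COST + beta_cost_x_income*COST*INCOME."""
    return beta_cost + beta_cost_x_income * income


column = add_interaction(rows, "COST", "INCOME")
print([row[column] for row in rows])
# Hypothetical coefficients: cost sensitivity weakens as income rises.
for income in (2, 4):
    print(income, marginal_cost_effect(income, beta_cost=-0.4, beta_cost_x_income=0.05))
```
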
Thank you in advance for your time and guidance.

Olivia

Re: Pilot study using an efficient design

Postby Michiel Bliemer » Thu May 22, 2025 11:28 pm

1. No, in many cases with informative priors the issue of undesirable attribute level combinations disappears and you simply use linear relationships for the numerical attributes when generating the design for the main study. But not always.

2. I would not include interactions with socio-demographic variables at the pilot phase, because your sample size is relatively small and each respondent only gives you one data point for such a covariate. Hybrid choice models with latent variables are very data hungry; you will generally not be able to estimate such models in the pilot phase. In almost all cases, you stick to the simple multinomial logit model and only consider the attributes when generating your design.

3. You would usually conduct a statistical test to assess whether two coefficients are statistically different. You would test the hypothesis H0: b1 = b2, which is equivalent to H0: b1 - b2 = 0. To test this hypothesis, you need the standard error of b1 - b2, which you can easily obtain via the Delta method; it depends on the standard errors of b1 and b2 as well as on cov(b1,b2), which comes out of model estimation. Estimation software like Apollo can apply the Delta method automatically for you; otherwise you need to apply a simple formula in a spreadsheet. You can do this test for cost, but indeed also for other attributes such as time. But please be mindful that hypothesis testing after the pilot phase may easily fail to reject the null hypothesis, because your sample size is small and your standard errors are still large. It may be safest to keep them all alternative-specific, especially the ones for time if they, for example, relate to different modes of transport, since travel time for walking, train, and car is experienced very differently (physically walking, driving, or reading a newspaper in the train). Then you can test whether they are generic or not when you estimate models after your main data collection.
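
The test in point 3 can be computed by hand. For g(b1, b2) = b1 - b2 the gradient is (1, -1), so the Delta method gives var(b1 - b2) = var(b1) + var(b2) - 2 cov(b1, b2), which is exact here because g is linear. The Python sketch below applies this and reports a two-sided p-value from the standard normal distribution. The coefficients, standard errors, and covariance are hypothetical; in practice they come from the estimation output.

```python
import math


def wald_test_equal(b1, se1, b2, se2, cov12):
    """Test H0: b1 = b2 using the Delta method for g(b1, b2) = b1 - b2."""
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("var(b1 - b2) must be positive; check the covariance")
    se_diff = math.sqrt(var_diff)
    z = (b1 - b2) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal p-value
    return b1 - b2, se_diff, z, p_value


# Hypothetical estimates: COST coefficients of optA (b1) and optB (b2).
diff, se_diff, z, p = wald_test_equal(b1=-0.30, se1=0.10, b2=-0.15, se2=0.12, cov12=0.004)
print(f"b1 - b2 = {diff:.3f}, se = {se_diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

With these hypothetical numbers the difference is not significant at the 5% level.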

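On point 1: if the main design codes a numerical attribute linearly, it needs one prior for that attribute. The most direct route is probably to re-estimate the pilot data with that attribute coded linearly. As a quick check of whether a linear form is reasonable, the Python sketch below (an illustrative approach, not one prescribed in this thread) fits a least-squares line through hypothetical dummy-coded pilot estimates, with the reference level fixed at 0, and reports the slope and the largest deviation from the line.

```python
def linear_prior_from_dummies(levels, dummy_estimates):
    """Least-squares slope through dummy-coded estimates of a numerical attribute.

    `dummy_estimates` holds one estimate per level except the last, which is the
    reference level and is fixed at 0. Returns (slope, largest deviation from the line).
    """
    ys = list(dummy_estimates) + [0.0]
    if len(ys) != len(levels):
        raise ValueError("expected one estimate per non-reference level")
    n = len(levels)
    mean_x = sum(levels) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in levels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(levels, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    max_dev = max(abs(y - (intercept + slope * x)) for x, y in zip(levels, ys))
    return slope, max_dev


# Hypothetical pilot estimates for COST = 1, 3, 5 relative to the reference level COST = 7.
slope, max_dev = linear_prior_from_dummies([1, 3, 5, 7], [1.2, 0.8, 0.4])
print(f"linear COST prior ~ {slope:.3f}, largest deviation {max_dev:.3f}")
```

Here the points lie on a line with slope -0.2 per unit of COST. A large deviation would suggest that the effect is not linear, in which case keeping dummy coding for that attribute may be preferable.
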
Michiel

Re: Pilot study using an efficient design

Postby hmtcsa26 » Fri May 23, 2025 12:05 am

Dear Michiel,

Thank you very much for your prompt and valuable feedback.

Olivia

