Hello!
I am analysing the pilot data from a DCE to obtain priors for the experimental design.
The DCE contains 3 alternatives: program A, program B and no screening. I have a categorical variable 'conditions covered', which I have dummy coded: (1) lots of conditions and (0) fewer conditions (see below). If lots of conditions is shown, I enter a 1 in the 'Cond' column; if fewer conditions is shown, I enter a 0. For the no screening option (alt 3), I have put a 0 in all columns.
ID SET CSET Alt CHOICE Cond
4 1 3 1 1 1
4 1 3 2 0 0
4 1 3 3 0 0
4 2 3 1 1 1
4 2 3 2 0 0
4 2 3 3 0 0
4 3 3 1 1 0
4 3 3 2 0 1
Is this the correct approach? Or will there be confusion because the 0s indicate different things, i.e. fewer conditions in the programs and nothing at all in the no screening option? Alternatively, should I code conditions covered using 3 levels: (2) lots of conditions, (1) fewer conditions and (0) no conditions (only present in the no screening option)?
No screening option and dummy coding
Moderators: Andrew Collins, Michiel Bliemer, johnr
Re: No screening option and dummy coding
Or should I be coding the no screening attribute values as -999?
Re: No screening option and dummy coding
Hi
No, the original data format is correct. Treating it as -999 in Nlogit will get the same result (if you use it correctly, which I will explain below). If you think about the utility functions, you have
u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = 0
Your concern is, if Con = 0, this collapses to
u(A) = ... + B1*0 + ... = ... + 0 + ...
u(B) = ... + B1*0 + ... = ... + 0 + ...
u(C) = 0
such that the base level of the dummy code contributes the same value (zero) as the utility of the no screening alternative. This has been a key argument for years: many have stated that the base level of a dummy-coded variable is perfectly confounded with the constants in the model, which led to arguments about effects coding versus dummy coding. We now know better: the two really are the same, just scaled differently, so it doesn't really matter which coding structure you choose to use. That aside, B1 represents the effect of moving from condition = 0 to condition = 1 in utility functions A and B. The base condition is confounded with something else, which is okay: B1 tells us that, relative to condition = 0, the impact of moving to condition = 1 is B1. This is what you want.
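To make the point about dummy versus effects coding concrete, here is a minimal Python sketch (not Nlogit syntax). It assumes a single constant shared by programs A and B and made-up parameter values; ASC, B1, ASC_E and B1_E are hypothetical names and numbers for illustration, not estimates from the pilot data. It only shows that the two codings can be rescaled into one another and then give identical choice probabilities.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit choice probabilities for one choice task."""
    exps = [math.exp(v) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for illustration only (assumptions, not pilot estimates).
ASC = 0.5  # assumed constant shared by programs A and B
B1 = 0.8   # assumed effect of 'lots of conditions' under dummy coding

def dummy_utilities(cond_a, cond_b):
    # cond is 1 for lots of conditions, 0 for fewer conditions.
    # The no screening alternative has its utility fixed at 0.
    return [ASC + B1 * cond_a, ASC + B1 * cond_b, 0.0]

# Equivalent effects-coded parameters: the slope is halved and the
# constant absorbs the other half, so every utility is unchanged.
B1_E = B1 / 2
ASC_E = ASC + B1 / 2

def effects_utilities(cond_a, cond_b):
    # Effects coding uses +1 for lots of conditions, -1 for fewer conditions.
    e_a = 1 if cond_a == 1 else -1
    e_b = 1 if cond_b == 1 else -1
    return [ASC_E + B1_E * e_a, ASC_E + B1_E * e_b, 0.0]

# Task where A shows lots of conditions and B shows fewer conditions.
p_dummy = mnl_probs(dummy_utilities(1, 0))
p_effects = mnl_probs(effects_utilities(1, 0))
print(p_dummy)
print(p_effects)
```

In both versions the utility difference between a program with lots of conditions and one with fewer conditions is B1, and the comparison with no screening is carried by the constant.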
So why is -999 in Nlogit the same thing? Because -999 is treated as a missing value, and an observation that contains it is dropped. Hence, if you had
u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = B1 * -999
it would drop the entire observation from your analysis. Given that alternative C is in every choice task in your data, it would drop all the data and not estimate a model at all. Hence, you would drop the variable from utility function C and end up with
u(A) = ... + B1*Con + ...
u(B) = ... + B1*Con + ...
u(C) = 0
which funnily enough is what we started with. Hence, my opening statement, Nlogit will get the same result (if you use it correctly - that is, not include that variable in utility function C).
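As a rough illustration of why the two data formats give the same result, the Python sketch below computes a logit log-likelihood in which the Cond value of alternative 3 is never used. This is not Nlogit and does not reproduce its missing-value handling; the function name, the shared constant asc and the parameter values are assumptions made for the example. The rows are the first two choice tasks from the data in the original post.

```python
import math

def log_likelihood(tasks, asc, b1):
    """Logit log-likelihood where u(C) = 0 and Cond enters only u(A) and u(B)."""
    ll = 0.0
    for task in tasks:
        utilities = []
        for alt, choice, cond in task:
            if alt == 3:
                v = 0.0  # no screening: Cond is not part of this utility
            else:
                v = asc + b1 * cond
            utilities.append(v)
        log_denom = math.log(sum(math.exp(v) for v in utilities))
        for (alt, choice, cond), v in zip(task, utilities):
            if choice == 1:
                ll += v - log_denom
    return ll

# Rows are (Alt, CHOICE, Cond), copied from the first two tasks in the post.
tasks_zero = [
    [(1, 1, 1), (2, 0, 0), (3, 0, 0)],
    [(1, 1, 1), (2, 0, 0), (3, 0, 0)],
]

# Same data, but with -999 entered as Cond for the no screening row.
tasks_missing = [
    [(alt, choice, -999 if alt == 3 else cond) for alt, choice, cond in task]
    for task in tasks_zero
]

# asc and b1 are hypothetical values chosen only for illustration.
ll_zero = log_likelihood(tasks_zero, asc=0.5, b1=0.8)
ll_missing = log_likelihood(tasks_missing, asc=0.5, b1=0.8)
print(ll_zero, ll_missing)
```

Because utility function C never references Cond, whatever is stored in that cell cannot change the likelihood.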
John
Re: No screening option and dummy coding
For those of you estimating choice models using Nlogit, Apollo, or Biogeme, you may want to consider posting questions on user groups dedicated to these software tools:
Nlogit: https://www.limdep.com/listserver/
Apollo: http://www.apollochoicemodelling.com/forum/
Biogeme: https://groups.google.com/g/biogeme?pli=1
Note that different software packages require different data formats.