The four most widely used commands for descriptive statistics of your data are describe, summarize, codebook and tabulate. To demonstrate each of them, we'll use the sysuse command to open a pre-installed dataset: the Life Expectancy, 1998 dataset. This cross-sectional dataset contains variables for region, country, average % population growth, life expectancy at birth, GNP per capita and % of population with access to safe water.

The describe command provides the number of observations and variables, along with the storage type, display format and any variable label assigned to each variable.

The summarize command (abbreviated sum) is a bit more useful. It provides the number of observations, mean, standard deviation, min and max for each numerical variable. Summarize can also be used with specified variables. Adding the , detail option additionally provides the percentiles (where you can identify the median), variance, skewness and kurtosis.

The codebook command provides the data type, percentiles, mean, min/max and standard deviation. It additionally lets you know how many missing and unique values there are for the specified variable.

The tabulate command is useful for creating frequency tables. It is most useful for categorical/factor variables. Another useful application of the tabulate command involves adding the , summarize() option. This creates a table that breaks down the mean, standard deviation and frequency of a continuous variable "by" some categorical variable. There are many other applications of the tabulate command, such as creating two-way, three-way, four-way ... n-way crosstabs. For more methods of generating descriptive statistics, see this Stata Resource.

A dummy variable is a variable that takes on the values 1 or 0, where 1 means some condition is true (such as an age cutoff at 18) and 0 means it is false. Always tab your newly created variable to see if the expected number of 0s and 1s were assigned. These are just a few recommended ways of creating dummy variables. For other methods, see Stata's help resource.

When combining datasets, you'll first need to identify what kind of combination you want to accomplish:

Vertical combination, which uses the append command. You want to use this when you are adding observations from one file to another file with the same variables. For instance, if you have one dataset containing variables for country and GDP per capita for the year 2004 but need to add observations for 2005, then this would be a candidate for the append command.

Horizontal combination, which uses the merge command. We're going to look at two cases of horizontal combination: one-to-one (merge 1:1) and many-to-one (merge m:1).
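
The commands discussed in this guide can be sketched as a single do-file run against the pre-installed lifeexp dataset. This is a minimal illustration, not the original author's code: the variable names (region, country, lexp, gnppc) are those shipped with lifeexp, while the cutoff of 75 years, the new variable names (high_lexp, region_, region_mean_lexp) and the temporary files are assumptions made up for the example. The append and merge steps split the dataset apart and recombine it only so that the sketch needs no outside files.

```stata
* Open the pre-installed Life Expectancy, 1998 dataset
sysuse lifeexp, clear

* --- Descriptive statistics ---
describe                          // observations, variables, storage types, labels
summarize                         // obs, mean, sd, min, max for numeric variables
summarize lexp gnppc              // same, restricted to the listed variables
summarize lexp, detail            // adds percentiles, variance, skewness, kurtosis
codebook region                   // type, range, unique and missing values
tabulate region                   // one-way frequency table
tabulate region, summarize(lexp)  // mean, sd, freq of lexp by region

* --- Dummy variables ---
* The cutoff of 75 is an arbitrary value chosen for illustration
generate byte high_lexp = (lexp >= 75) if !missing(lexp)
tabulate high_lexp, missing       // check the counts of 0s and 1s

* One dummy per category of region (region_1, region_2, ...)
tabulate region, generate(region_)

* --- Combining datasets (temporary files are hypothetical stand-ins) ---
tempfile part_a part_b gnp regional

* Vertical combination: split the observations in two, then append
preserve
keep if region == 1
save `part_a'
restore
preserve
keep if region != 1
save `part_b'
restore
use `part_a', clear
append using `part_b'

* Horizontal combination, one-to-one: each country appears once in both files
isid country                      // stops with an error if country is not unique
preserve
keep country gnppc
save `gnp'
restore
drop gnppc
merge 1:1 country using `gnp'
tabulate _merge                   // check how many observations matched
drop _merge

* Horizontal combination, many-to-one: many countries share one region row
preserve
collapse (mean) region_mean_lexp = lexp, by(region)
save `regional'
restore
merge m:1 region using `regional'
tabulate _merge
```

Because the local macros created by tempfile only exist for the duration of one run, the block should be executed in one go from a do-file rather than pasted line by line. With real data you would replace the temporary files with your own .dta files and use the identifying variable that both files share as the merge key.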