The key to using Stata effectively is understanding its fundamental syntax, which applies to the vast majority of Stata commands. A Stata command is usually a verb, like use, generate, replace, summarize, or regress. Most commands can be abbreviated, the exception being those that can destroy data, so the commands above are usually written use, gen, replace, sum, and reg. Some commands have subcommands, like label variable and label values.

Stata normally has exactly one data set in memory, and commands act on that data set. If a command is followed by a variable list or varlist (i.e. the names of one or more variables), the command will only act on those variables. If a command is followed by the word if and a logical condition, the command will only act on those observations where the condition is true. If a command is followed by a comma, then anything that comes after the comma is interpreted as one or more options that change how the command runs. If an option needs additional information, like a number or the name of a variable, that information goes in parentheses after the option.

These syntax elements always go in the same order: command [varlist] [if condition] [, options]. The square brackets indicate that most of these elements are optional.

For example, summarize followed by the varlist x and the condition that y equals 1 will summarize (i.e. produce summary statistics for) the variable x, but only for those observations where y is 1. Likewise, tabulate followed by the varlist y will tabulate (i.e. produce a frequency table for) the variable y. Adding the sum() option to that command tells Stata to also calculate basic summary statistics for the variable x for the observations in each cell of the table. Because sum() needs to know what variable you want it to act on, the name of the variable goes in parentheses after the option name.

The Stata command to create a variable is generate, usually abbreviated gen, followed by name = expression, where name is the name of the variable to be created and expression is some mathematical expression you write. The expression will typically include some combination of numbers, variables, and mathematical functions. The command to change the value of an existing variable is replace, and it has the same syntax. These commands act on all the observations for a given variable (unless you limit them with an if condition) but one at a time: the new value for a given observation depends on the expression as evaluated for the same observation.
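The syntax elements can be tried out on a data set that ships with Stata. This is only an illustrative sketch: the auto data set and its variables mpg and foreign are stand-ins chosen here, not the workshop's example files.

```stata
* Load a data set shipped with Stata (an assumption: a stand-in for the workshop files)
sysuse auto, clear

* command only: summarize every variable, using all observations
summarize

* command + varlist: act only on mpg
summarize mpg

* command + varlist + if condition: only observations where foreign is 1
summarize mpg if foreign == 1

* command + varlist + option: frequency table of foreign,
* with summary statistics of mpg for the observations in each cell
tabulate foreign, summarize(mpg)

* the same two commands written with abbreviations
sum mpg if foreign == 1
tab foreign, sum(mpg)
```

Note that the if condition uses two equals signs to test for equality, and that the option name comes after the comma with its variable in parentheses.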
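A short sketch of generate and replace, using Stata's bundled auto data set as a stand-in for real data. The new variable names price_k and heavy, and the 3,000-pound cutoff, are made up for illustration.

```stata
* Load a data set shipped with Stata (an assumption: not the workshop files)
sysuse auto, clear

* generate: name = expression, evaluated separately for each observation
gen price_k = price / 1000

* create an indicator that starts at 0 for every observation
gen heavy = 0

* replace: same syntax as generate, limited here by an if condition
replace heavy = 1 if weight > 3000

* check the result with a frequency table
tab heavy
```

Because each observation's new value depends only on that same observation's weight, the replace command changes some rows and leaves the rest at 0.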
Most data sets need to be transformed in some way before they can be analyzed, a process that's come to be known as "data wrangling." Data Wrangling in Stata will introduce you to the key concepts, tools, and skills of data wrangling, implementing them in Stata. You'll learn a lot about Stata from this workshop, but the primary focus is on the tasks you'll need to carry out. If you're new to Stata, we recommend working through our Introduction to Stata before proceeding. We'll start by very briefly reviewing some basic Stata concepts that should be familiar to you, but if they're not, Introduction to Stata will do a much better job of teaching them to you.

To get the most out of Data Wrangling in Stata you need to be an active participant. Open Stata, and type in and run the example code yourself. This will help you retain more and ensure you get all the details right; Stata is always happy to tell you when you're wrong. Do the exercises (some of them are straightforward applications of what you just learned; others will require more creativity). Data wrangling is not something you read and understand: it's a skill you must practice.

The example files for this class can be obtained from within Stata. If the download fails on your computer, try net get dws, from(). The example files are put in your current working directory. If you are comfortable doing so, create a folder for the example files, make that Stata's working directory, and run the download command. If not, we'll talk about how to do all this in Reading in Data and you can get the example files then.
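For readers who are comfortable creating a folder for the example files and making it Stata's working directory, the steps might look like the sketch below. The folder name dws_examples is hypothetical; any name and location will do.

```stata
* Create the folder; capture suppresses the error if it already exists
* (dws_examples is a made-up name used only for illustration)
capture mkdir "dws_examples"

* Make the new folder Stata's working directory
cd "dws_examples"

* Confirm where downloaded files will be saved
pwd
```

Once the working directory is set, files downloaded from within Stata will land in that folder.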