Skip to Main Content

Exploring and Cleaning Data with OpenRefine

Editing Cells

Transformations

  • Many common transformations are built into the interface for easy use.  These are under the column menu.  Select 'Edit cells' then 'Common transforms'.  Included are: 
    • to titlecase
    • to lowercase
    • to date
    • to text
    • to null 
  • Custom transformations are possible as well.  Select 'Edit cells' then 'Transform'.  You will need to use the General Refine Expression Language (GREL).  

Splitting Data

  • You can split columns into more than one column.  For example one column named 'Full name' could be split into the columns  'First name' and 'Last name'.  This can be done if the data in the cell has a separator (such as a comma, dash, or space) or not.  
    • Under the column menu select 'Edit column' then 'Split into several columns'.  Select 'by separator' or 'by field lengths'.  Make sure to select also how to handle the data after splitting.  
    • You can use custom splitting by selecting 'Edit column' then 'Add column based on this column'. You will need to use the General Refine Expression Language (GREL).  Note: you are limited to creating only one new column at a time using this method.  
  • You can split columns into more than one row.  Splitting cells with more than one value in them is a common way to get your data from single rows into multi-row records. For example survey data may have all the responses from a "select all that apply" question in one cell.
    • Under the column menu select 'Edit cells' then 'Split multi-valued cells' and then enter a separator.
    • Take note of how new rows are formed.

Merging/Concatenating 

  • You can join columns by selecting 'Edit column' then 'Join columns'.
  • All the columns currently in your dataset will appear in the pop-up window. You can select or un-select all the columns you want to join, and drag columns to put them in the order you want to join them in. 
  • You can also add a column based on a column if you need to do some transformations.  

Clustering Data

  • Creating a facet on a column is a great way to look for inconsistencies in your data; clustering is a great way to fix those inconsistencies and one of OpenRefine's biggest strengths. 
  • Clustering always requires the user to approve each suggested edit - it will display values it thinks are variations on the same thing, and you can select which version to keep and apply across all the matching cells (or type in your own version).
  • You can start the process in two ways: using the dropdown menu above a column, select 'Edit cells' then 'Cluster and edit' or create a text facet and then press the 'Cluster' button that appears in the facet box.
  • Clustering methods and algorithms are ordered from strict to lax and should be used in order; first selecting: Method: key collision; Keying Function: fingerprint.