Skip to Main Content

Exploring and Cleaning Data with OpenRefine

Sorting and Filtering Data

Conducting a Sort

  • Pick which column you want to sort by and select 'Sort' in the drop down menu by the column header name.  You can do a secondary sort simply by sorting on another column.
  • Once a sort is conducted, a new Sort option menu appears in the top left that allows you to reorder the rows permanently.  If this is not done the sort is temporary and will NOT be saved. 

Filtering

  • Use filters to limit the viewable data to only the rows whose cells contain a precise text string in a particular column (works like the 'Find' function in Excel).
  • You can also invert the filter to show only the rows whose cells do NOT match a text string.
  • To filter: select 'Text filter' in the drop down menu of the column you want to search. The search can be case sensitive or use regular expression language for more advanced searching. For more on regular expressions see the Library Carpentry Lesson: Regular Expressions

Faceting Data

Basics

  • Facets allow you to look for patterns and trends. 
  • Typically, you create a facet on a particular column. 
  • Allows you to filter down to a specific subset for further exploration or manipulation. 
  • Types of facets include: 
    • text
    • numeric
    • timeline
    • scatterplot
    • custom

Using Facets

  • Select 'Facet' in the drop down menu of the column you want to create a facet on and make a selection (Note: if you select a numeric facet when the data is text, you may be given a message that "no numeric value is present"). 
  • When you create a facet, a box will appear to the left of the data table.  From this box you can explore the data, batch edit, include, or exclude data. Note: including or excluding data in a facet does NOT delete the data from the dataset, it just alters what is visible at a time.
    • You can delete all the rows you are viewing (number shown at top of screen) by selecting the drop down menu of the 'All' column, click on 'Edit rows' then 'Remove all matching rows'.
    • You also can star or flag individual rows for manipulation/deletion by individually clicking on the icons in a row or batch edit by selecting 'Edit rows' and either 'Star rows' or 'Flag rows' using the drop down menu in the 'All' column to the far left (this will star or flag all the rows being viewed at this time).
  • If you want to edit one of the groups that appears in the facet, like a misspelled word, simply click the edit button that appears when you hover.  Any edits made will be automatically applied to all rows with that value.
  • Custom facets are possible as well.  Select 'Facet' then one of the custom options.  You will need to use the General Refine Expression Language (GREL).  
  • For more in-depth information see the Exploring Facets page in the OpenRefine User Manual.