Group

Group Admins

  • Profile picture of Stephen Yerka

DINAA

Public Group active 5 years, 4 months ago

DINAA (the Digital Index of North American Archaeology) is a web hosted resource that delivers machine-readable, archaeological site data through direct cooperation and partnership with SHPO site file managers, federal institutions, and THPO representatives. Open Source translational protocols, extensive indexing, and URI minting allow integration of datasets from many different data sources in a single interface. Currently DINAA shares primary data from ~500,000 archaeological sites in North America, and is expected to increase to over one million in 2016-2017.

These data can be accessed with feeds, linked through URIs, or downloaded from Open Context (http://opencontext.org/)

Open Refine Trinomial Parsing

Tagged: 

This topic contains 0 replies, has 1 voice, and was last updated by  Eric Kansa 5 years, 9 months ago.

Viewing 1 post (of 1 total)
  • Author
    Posts
  • #663

    Eric Kansa
    Participant

    On Twitter Jolene asked about parsing randomly formatted trinomials using Open Refine. Here is a recipe that should work in many cases (can’t guarantee all).

    So, start with having a project open in Open Refine and the non-standard trinomials are given in a column called “Trinomial”. The following code will help parse out state, county, and site ID parts of the Trinomial in three new fields:

     

    (1) Getting the Site Number from a Trinomial

    Select the Trinomial field and “Add a new column based on this column”. The Name of this field will be “Parsed Site”. In the Grel expression window put in this:

    toNumber(replace(trim(cells[‘Trinomial’].value), /[^0-9]+/, ‘/’).split(‘/’)[1])

    To explain, a regular expression strips all non-number characters and converts this to 1 slash character (‘/’). So ’44SX202′ becomes ’44/202′. ’44-SX 202′ also becomes ’44/202′, and ’44-SX-00202′ becomes ’44/00202′. The “split” function then slits the ’44/202′ into a list of 2 items [’44’, ‘202’] or [’44’, ‘00202’]. We select the second part of that list to be the site ID (that’s the [1] part of the above expression, since in most programming languages the first item in a list is the 0 index, or [0], and so second is [1]). Finally, to remove any leading zeros (as with the case of ‘00202’), we convert this value from a string of characters to a number with the “toNumber” part of the expression.

     

    (2) Getting the County Code from a Trinomial:

    Select the Trinomial field and “Add a new column based on this column”. The Name of this field will be “Parsed County”. It’s a bit easier to get the county from a Trinomial, since a county code should be the only alphabetic characters in the trinomial.

    toUppercase(cells[‘Trinomial’].value.replace(/[^A-z+]+/, ”))

    To explain, the “replace” function uses a regular expression to delete all alphabet characters. Then the “toUppercase” function converts the result to uppercase characters.

     

    (3) Getting the State Number from a Trinomial

    Select the Trinomial field and “Add a new column based on this column”. The Name of this field will be “Parsed State”. In the Grel expression window put in this:

    toNumber(replace(trim(cells[‘Trinomial’].value), /[^0-9]+/, ‘/’).split(‘/’)[0])

    To explain, this is just like getting the Site Number from a trinomial, except we’re getting the first batch of numbers, not the second. That’s why we have the “[0]” in this expression and not “[1]”.

     

     

     

     

    • This topic was modified 5 years, 9 months ago by  Eric Kansa.
Viewing 1 post (of 1 total)

You must be logged in to reply to this topic.

Close

Account Activated

Your account was activated successfully! You can now log in with the username and password you provided when you signed up.

Close

Account sign-in

Please use the form below to sign-in to your account.

Forgot password?
Close

Recover password

Please enter your username or email address. You will receive a link to create a new password via email.

We've sent you an activation link. Please check your inbox.

Close

Account signup

1 Account Info

2 Personal Info

Registering for this site is easy, just fill in the fields below and we'll get a new account set up for you in no time.

In order to avoid spam, automatic account registration is restricted to emails from the following domains – .edu, .org, .gov. To register with a different email address, please write to digitalarchaeology@matrix.msu.edu to request an account.

Success!

A confirmation link has been emailed to you

Skip to toolbar