Discussion group for the members and faculty of the NEH Funded Institute for Digital Archaeology Method & Practice (http://digitalarchaeology.msu.edu) organized by Michigan State University’s Department of Anthropology and MATRIX: The Center for Digital Humanities and Social Sciences
December 7, 2015 at 3:20 am #290
Hi Nancy – This is what I’m understanding.
– data already exists in XML (85 000 records) that can be exported from the existing cms
– this data would be edited, curated into opencontext
– some sort of front end exhibit that draws on the opencontext data, through oc’s api
– the exhibit would primarily be accessed through mobile
– targeted at scholarly audience & particular society
Right now, this feels perhaps a bit ambitious while still searching for focus. Getting material into Open Context is to the good I think, but in the context of the project, given that you already have a CMS, it might take too much time and energy to get that step accomplished. If you drew back the focus to a single building at Fort Snelling, what would that do to your data? Conceive of all of this as a small test. One building, exposed through the Open Context API (or what options does your CMS have to expose the data?), feeding into a Leaflet-powered map or Bootstrap gallery (e.g. https://blueimp.github.io/Bootstrap-Image-Gallery/)?
December 9, 2015 at 10:31 pm #303
Hi Shawn –
Thank you for your comments! I do plan to start with the data from one building but I can see now I neglected to make that point in my blog post. I originally hoped to start with the Officer’s Quarters. Lack of transcription data for that catalog shifted my focus to one of the Enlisted Men’s Barracks. I have been steadily working on marrying the transcription with the inventory data for the collections from this excavation and now have ~6700 records basically ready for import to OC. One of the main reasons I want to publish the data in OC is to encourage interest at my institution in making this data publicly available.
I have an issue with nested fields in the MHS CMS and am still working on getting the XML export report written the way I want it. I would also like to convert from XML to CSV, so I need to start digging into Open Refine.
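For anyone following along, here is a minimal sketch of one way the XML-to-CSV step could be done outside Open Refine. It is not the MHS export format: the `record` tag, the sample field names, and the dotted `parent.child` column naming are all assumptions made for illustration. It flattens nested fields into single columns, and it would overwrite values if a record repeated the same child tag.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten(element, prefix=""):
    """Flatten nested child elements into 'parent.child' column names."""
    row = {}
    for child in element:
        key = f"{prefix}{child.tag}"
        if len(child):  # the child has nested elements of its own
            row.update(flatten(child, prefix=key + "."))
        else:
            row[key] = (child.text or "").strip()
    return row


def xml_to_csv(xml_text, record_tag="record"):
    """Convert every <record_tag> element (hypothetical name) to one CSV row."""
    root = ET.fromstring(xml_text)
    rows = [flatten(rec) for rec in root.iter(record_tag)]
    # Collect column names in first-seen order so sparse records still line up.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data, only to show the shape of the input.
SAMPLE = """<records>
  <record><catalog>1</catalog><material><primary>cast iron</primary></material></record>
  <record><catalog>2</catalog><material><primary>glass</primary></material><notes>base fragment</notes></record>
</records>"""

if __name__ == "__main__":
    print(xml_to_csv(SAMPLE))
```

Records that lack a field simply get an empty cell, which keeps the columns aligned for a later import.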
My lack of experience with web design makes this feel like the most challenging part of the project. I purposely scoped this as proof-of-concept only for this reason and because any public interpretation of the site would need to go through an extensive approval process. The best case outcome would be adoption of my website for interpretation at the Fort. At a minimum, I want to show what can be done by way of interpretation once the collections data is published.
I met with my supervisor and an interpretive content creator for the Fort. Everyone would like some kind of a map interface, but I’m not sure how well that would work at the scale of a small building. This looks like the smallest Google map tile for the building area: http://bit.ly/1YZbqlf
Data prep has occupied most of my time up to this point so I haven’t turned my mind to the website but the Fort Vancouver ebook http://www.nps.gov/fova/learn/photosmultimedia/upload/FTV-Ebook-Final-v-1-1.pdf seems like a very simple presentation model that could be adapted to less text.
December 10, 2015 at 3:11 pm #308
Hi, Nancy. Your question about mapping such a small area made me think of this project, Yale’s plan (rather than the usual map) of Dura Europos with thumbnails of specific buildings. Perhaps it can provide some food for thought. http://media.artgallery.yale.edu/duraeuropos/dura.html
December 16, 2015 at 11:38 am #331
Sorry for not replying sooner, bad of me as your mentor. Well done on producing your project vision. An ambitious project, and as Shawn has highlighted already, you may have issues getting this all done in time for next year. You do want to use Eric’s amazing platform – can he accommodate getting your data into his system within this timescale? Is he committed elsewhere?
I think Shawn is right about perhaps scaling back to a smaller focus and then iterating up. I mentioned to Jolene in another thread the LEAN project principles that we’re using at the BM, and those again may suit this project – go for build, measure and iterate through your project. Improve as and when you can. Web dev is always ongoing! It is good your resource is structured data – you may be on the route to Linked Data too (don’t scope that in now though!)
Audience – can you try and reach a larger audience, or let serendipity take over and see what people do with your resources?
License – have you considered what you might do with a license for your content?
@Alice – nice example of Dura Europos.
Dan
December 16, 2015 at 8:56 pm #343
Thank you for your comments! Not late at all.
Regarding project scope – I have narrowed in on one building for the MSUDAI project.
I can see your point about Eric’s workload. I have a lot of experience cross-walking metadata from various collection types. To the extent that Open Context is about mapping the data to a central data model, I might be able to take part of the work off of Eric’s shoulders. I know he has not had success with this in the past, but this is one of my strengths.
I hope the audience for my project will be professional archaeologists. I have a partner at the University of Minnesota, Dr. Kathryn Hayes, who plans to work with the data to demonstrate its value in her research once it is published. I also plan to give presentations about my project at professional meetings to make other archaeologists aware of the data. In fact, I have already done one talk at the Midwest Historical Archaeology Conference in Minneapolis in October.
Another important audience is my own institution. I want to demonstrate the value of archaeology data to garner support for additional work with the Fort Snelling and other archaeology data sets. I have written an internal grant for this type of work before – without success. I think having an example in hand will show momentum and outcomes worthy of further support.
I had a look at the LEAN process link you posted and hope to follow that model for the website development. This fall, I met with education and interpretive staff who can help me review and improve as I put it together.
The licensing is an excellent question. I hope to go for a CC0 but really need to get clear on the permissions with my institution. I will take this up with my supervisor at the next opportunity.
-Nancy
January 6, 2016 at 9:35 pm #389
Sorry for my delayed reply. I’ve been making comments in DropBox, and also had some jury duty distractions.
I can’t say how easy or hard it will be to process the XML records until I see some of them. If the records are consistent, I think I’d disagree with Shawn’s suggestion that we do only a subset. The main time and effort will be spent in understanding the XML and setting up an import. After that, it won’t matter much how big the dataset is (if it is clean), since it’ll be a computer doing the work, not any of us.
As far as commitments go, I’m most available for this mid April through mid June, with smaller windows before. Again, I’d have to see what the data actually looks like first.
-Eric
April 20, 2016 at 9:57 pm #662
Hi Eric –
I have my data ready for you to review. I put it into a GitHub repository: hoffmanNBH/ShortBarracksCSV. Let me know what I need to do next to help get this published with OC. If there is any part of the process where I can make your life easier, please let me know.
-Nancy
April 21, 2016 at 5:53 pm #665
I’m taking a look now. Thanks Nancy!
April 21, 2016 at 6:31 pm #666
Ok. I took a look and have the following recommendations.
- Add URIs (additional fields): It’s awesome you are using the Getty AAT vocabulary terms in your data. That will really help with interoperability. Let’s also add the URIs for these different terms so we can create linked data. So for example, we should add a field that provides URIs for the “Materials 1” field. That new field should be “URI – Materials 1”. We will then provide URIs for each of the materials referenced, so “cast iron” will be accompanied by “http://vocab.getty.edu/aat/300011004” in the new “URI – Materials 1” field.
- Dates: Dates are a royal pain when we do international data sharing. I think it is better practice to use a date format like ‘2016-01-01’ (Year, month with leading zero, day with leading zero). These dates are sortable and won’t be confusing to people outside the US. You can (usually) convert your dates using Open Refine without trouble.
- Metric: We should probably convert measurements to the metric system when sharing data, again because data going on the Web reaches a global audience.
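The three recommendations above can be sketched in a few lines of Python, as an alternative to doing the same steps in Open Refine. This is only an illustration under stated assumptions: the source dates are assumed to be in US month/day/year form, the measurements are assumed to be in inches, and the "Date", "Length (in)" and "Length (cm)" field names are hypothetical. Only the "Materials 1" and "URI – Materials 1" field names and the single "cast iron" URI come from this thread.

```python
from datetime import datetime

# The only term-to-URI pair stated in the thread; a real table would need
# one entry per AAT term used in the data.
AAT_URIS = {"cast iron": "http://vocab.getty.edu/aat/300011004"}

CM_PER_INCH = 2.54


def to_iso_date(text, source_format="%m/%d/%Y"):
    """Parse a date (assumed US month/day/year) and return 'YYYY-MM-DD'."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()


def inches_to_cm(inches, places=2):
    """Convert a measurement assumed to be in inches to centimetres."""
    return round(float(inches) * CM_PER_INCH, places)


def enrich(record):
    """Return a copy of one record with a URI field, ISO date, and metric length."""
    out = dict(record)
    material = record.get("Materials 1", "").strip().lower()
    out["URI – Materials 1"] = AAT_URIS.get(material, "")
    if record.get("Date"):  # hypothetical field name
        out["Date"] = to_iso_date(record["Date"])
    if record.get("Length (in)"):  # hypothetical field name
        out["Length (cm)"] = inches_to_cm(record["Length (in)"])
    return out


if __name__ == "__main__":
    sample = {"Materials 1": "cast iron", "Date": "4/21/2016", "Length (in)": "2"}
    print(enrich(sample))
```

Terms missing from the lookup table get an empty URI rather than a guess, so gaps stay visible for manual review.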
Let me know if you need some help! We can use the Getty’s SPARQL endpoint with Open Refine to get the URIs you need.
April 21, 2016 at 7:20 pm #667
Thanks Eric! I figured I should do these things but wanted to wait until you gave the go ahead. I will see if I can figure out the Getty URIs. Should I add the ORCID IDs for the staff names too? The dates make me crazy because we record them in universal format (for sorting purposes) but they come out of the database messed up again. Do I need the hyphens? We don’t normally use them.
April 21, 2016 at 7:27 pm #668
ORCIDS for all would be great!
I loaded your data in Open Refine to test out getting URIs for your Getty AAT terms. I’m “Creating a New Column by Fetching URLs” with the following expression:
"http://vocab.getty.edu/sparql.json?query=select+*+%7B%3Fsubj+gvp%3AprefLabelGVP%2Fxl%3AliteralForm+%22" + escape(value, "url") + "%22%40en%7D&toc=Find_Subject_by_Exact_English_PrefLabel&implicit=true&equivalent=false&_form=%2FqueriesF"
I’ll let you know about the results.
As far as dates go, yes, “2016-01-01” is preferred since it easily translates into date formats that work well on the Web.
April 21, 2016 at 7:36 pm #670
- This reply was modified 5 years, 9 months ago by Eric Kansa.
Great – I really appreciate the URI help!
April 22, 2016 at 12:25 am #671
I had some success getting URIs for your Getty AAT materials using the Getty SPARQL endpoint to request JSON. In Open Refine, I used “Create a New Column by Fetching URLs”. I called my new column “JSON – Material 1” and used the following expression:
"http://vocab.getty.edu/sparql.json?query=select+*+%7B%3Fsubj+gvp%3AprefLabelGVP%2Fxl%3AliteralForm+%27" + escape(value, "url") + "%27%40en%7D"
This took some time (I made 1 request every 0.5 seconds) to get lots of JSON. Then I processed these results to get the URI from the JSON by (from the options on the ‘JSON – Material 1’ column) “Add a new column based on this column”. I called my new column ‘URI – Material 1’ and used the following expression to populate its values:
Seems like it mostly worked.
April 22, 2016 at 8:18 pm #674
Hi Eric –
I checked the Getty’s Open Refine reconciliation service documentation and think you saved me about a week of work figuring out how to modify it to create the JSON from my data. The URI conversion gave me a syntax error but I used the example on the documentation page with a modification based on your code with success: value.parseJson().results.bindings.subj.value
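For readers who want to see the same two steps outside Open Refine, here is a rough Python sketch. It builds the exact-preferred-label query URL used earlier in this thread and then pulls the URI out of the returned JSON, much as the `value.parseJson().results.bindings.subj.value` expression does. The function names are made up for illustration, and the sketch assumes the endpoint returns the standard SPARQL JSON results layout and that the `gvp:` and `xl:` prefixes are predefined there, as the thread's query implies. Unlike the GREL expression, it returns only the first match.

```python
import json
import time
import urllib.parse
import urllib.request

ENDPOINT = "http://vocab.getty.edu/sparql.json"  # endpoint quoted in this thread


def build_query_url(label):
    """Build the exact English preferred-label lookup URL for one term."""
    # Escape backslashes and single quotes so the label stays one SPARQL literal.
    escaped = label.replace("\\", "\\\\").replace("'", "\\'")
    query = "select * {?subj gvp:prefLabelGVP/xl:literalForm '%s'@en}" % escaped
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def extract_uri(json_text):
    """Return the first ?subj URI from a SPARQL JSON result, or None if no match."""
    bindings = json.loads(json_text)["results"]["bindings"]
    return bindings[0]["subj"]["value"] if bindings else None


def fetch_uris(labels, delay=0.5):
    """Look up each distinct label, pausing between requests as in the thread."""
    uris = {}
    for label in sorted(set(labels)):
        with urllib.request.urlopen(build_query_url(label)) as response:
            uris[label] = extract_uri(response.read().decode("utf-8"))
        time.sleep(delay)
    return uris


if __name__ == "__main__":
    print(build_query_url("cast iron"))
```

Looking up each distinct label once, rather than once per record, keeps the number of requests small when many records share the same material.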
This is so gratifying. Thanks!
-Nancy
April 25, 2016 at 3:30 pm #675
Eric – question on the GVP URI generation:
I ran the Materials queries without any problems. In the Descriptor terms however, I realized that I am using Alternate terms rather than Preferred. We have entered them as singular rather than plurals for reporting reasons. I tried changing the query by replacing the “AprefLabelGVP” portion of the string with “AaltLabelGVP.” It looks like it is going to work in the preview, but generates no matches after running. There must be something else I need to change in the query but after spending half a day studying your example and reading the documentation yesterday, I am at a loss. Can you help?