Notes From Nature Talk

Transcription standards

  • HelenBennett57 by HelenBennett57

    Some questions about transcription standards and practice seem to keep coming up. I'm not an authority or employed on this project or anything, but here are the herbarium-flavoured answers to the questions I can remember, with some material added from @joanball and @robgur below, and answers from this thread http://talk.notesfromnature.org/#/boards/BNN0000003/discussions/DNN00003mc until it gets turned into an official FAQ. Anyone got more?

    --- SPECIFIC FIELDS ---

    County: If there are two counties, give the first-named one. Normally it means that something was growing on the boundary.

    County can't be pin-pointed e.g. because the location is only given as a National Park, which spans >1 county: don't give the county.

    Many scientific names: copy only the most recent one.

    Most recent scientific name has the genus abbreviated e.g. "J. marginatus" when there's a previous "Juncus marginatus" on the page: enter the genus name in full i.e. "Juncus marginatus".

    Varieties and subspecies: do record, but as always, omit the scientific author's name. So "Cyperus odoratus var. squarrosus (Britton) Jones, Wipff & Carter" becomes "Cyperus odoratus var. squarrosus". "Echinodorus cordifolius (Linnaeus) Grisebach ssp. cordifolius" becomes "Echinodorus cordifolius ssp. cordifolius".

    Case for scientific name: transcribe as Genus species even if it is written in all capitals (or even all lower-case), e.g. Cyperus odoratus. (From Mr. Kevvy)

    Scientific name blank or can't be figured out: skip the field.

    Cultivated specimen from seeds/cuttings/etc taken elsewhere: the location is the location where it was cultivated, e.g. greenhouse 5.

    Provinces: e.g. COASTAL PLAIN PROVINCE, PIEDMONT PROVINCE: put them in the Location field, type with initial capitals only (Coastal Plain Province, Piedmont Province...)

    P. O.: put this in the Location field.

    Formation: put this in the Location field (discussion here)

    Location information from the label: transcribe this if it doesn't appear elsewhere, i.e. don't duplicate the state or county, but do add non-duplicate information. Beware of museum/collection labels that give the location of the collection - don't transcribe this by accident!

    Floodplains: put these in the Habitat field.

    Powerlines: put these in the Habitat field, e.g. "On powerline right-of-way..."

    Many collectors with no separators e.g. they're on different lines of the label and you want to put some punctuation between them: separate with commas

    "s.n." instead of a collector number: this stands for "sine numero", "without number". Leave the number field blank.

    Multiple collector numbers: transcribe as written.

    Roman numerals in the date: the convention is that Roman numerals are the month (Robgur).

    T28S, R17E, Sec. 2 - or things that look like this. This is Location information (more info from md68135).

    Two dates: enter the earliest date only (this definitely only applies to the Herbarium).

    Dates before 1880: Skip (if the UI has not yet been updated to allow these to be entered). Put a note in the comments.
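
    For anyone who wants to sanity-check their own entries, the scientific-name rules above (omit the author, keep var./ssp., write an abbreviated genus in full, use Genus species capitalisation) can be sketched in a few lines of Python. This is only an illustration and not part of the Notes From Nature tools: the function name, the previous_genus parameter and the list of rank markers are invented for the example, and hybrids or "=" synonyms are not handled.

```python
import re

# Rank markers to keep; this short list is an assumption, not a project standard.
RANK_MARKERS = {"var.", "ssp.", "subsp."}


def clean_scientific_name(label_text, previous_genus=None):
    """Return 'Genus species [rank epithet]' with author names dropped."""
    tokens = label_text.split()
    if len(tokens) < 2:
        # Nothing sensible to normalise; leave it for a human to decide.
        return label_text.strip()

    genus = tokens[0]
    # Expand an abbreviated genus such as "J." when the full name is known.
    if previous_genus and re.fullmatch(r"[A-Za-z]\.", genus):
        if previous_genus[0].lower() == genus[0].lower():
            genus = previous_genus

    parts = [genus.capitalize(), tokens[1].lower()]

    # Keep the first rank marker and the epithet after it; skip author names.
    for i, token in enumerate(tokens[2:-1], start=2):
        if token.lower() in RANK_MARKERS:
            parts += [token.lower(), tokens[i + 1].lower()]
            break
    return " ".join(parts)


print(clean_scientific_name(
    "Cyperus odoratus var. squarrosus (Britton) Jones, Wipff & Carter"))
print(clean_scientific_name("J. marginatus", previous_genus="Juncus"))
```

    With the examples from this thread, the two calls print "Cyperus odoratus var. squarrosus" and "Juncus marginatus". Anything the sketch gets wrong on a real label should of course be typed by hand following the rules above.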

    --- ANY FIELD/WHOLE RECORD ---

    Not in English: transcribe exactly as written. Match label content to transcription fields as far as you can.

    Abbreviations: transcribe exactly as written. For interest only: DBH = diameter at breast height, frs = fruits (uncommon).

    Errors, typos and spelling mistakes: transcribe exactly as written, unless you have looked it up and are absolutely certain of a simple spelling mistake. In this case, you can enter the correct spelling. It improves the chances of a consensus transcription which can then be automatically or manually corrected in the database. Flag these in comments on objects with #error so that #scientists can find them.

    Re-examination, e.g. "Examined as part of a study...": Do not transcribe.

    Personal comments which have nothing to do with the specimen: Do not transcribe.

    Missing spaces between words: insert these (so "3miN Oakland" becomes "3 mi N Oakland", R.Kral becomes "R. Kral").

    Word hyphenated across a line break: remove the hyphen and transcribe as one word. (This does not apply to hyphenated word-pairs; transcribe these as written.)

    Specimen flagged as paratype: in the discussion, add a comment with #paratype
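
    The two mechanical rules above (insert missing spaces, join a word hyphenated across a line break) can be sketched in Python as follows. Treat it as a rough illustration under stated assumptions: the function names are invented, only "mi" and "km" are recognised as units, and the code cannot tell a broken word from a hyphenated word-pair that happens to end a line, so that decision still needs a human.

```python
import re


def fix_spacing(text):
    """Insert the spaces missing from run-together label text."""
    # "3miN" -> "3 miN": a digit directly followed by a unit (assumed: mi, km).
    text = re.sub(r"(\d)(mi|km)(?![a-z])", r"\1 \2", text)
    # "miN" -> "mi N": a unit directly followed by a compass direction.
    text = re.sub(r"\b(mi|km)([NSEW]{1,3})\b", r"\1 \2", text)
    # "R.Kral" -> "R. Kral": an initial directly followed by a surname.
    text = re.sub(r"\b([A-Z]\.)(?=[A-Z][a-z])", r"\1 ", text)
    return text


def join_label_lines(lines):
    """Join label lines, removing a hyphen that splits one word across lines."""
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if text.endswith("-") and line[:1].islower():
            # Assumed to be one word broken at the line end: drop the hyphen.
            text = text[:-1] + line
        elif text:
            text += " " + line
        else:
            text = line
    return text


print(fix_spacing("3miN Oakland"))
print(fix_spacing("R.Kral"))
print(join_label_lines(["Moist wood-", "lands near", "creek"]))
```

    These print "3 mi N Oakland", "R. Kral" and "Moist woodlands near creek". A pair such as "right-of-" followed by "way" would be joined wrongly, which is why hyphenated word-pairs are called out as an exception.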

    --- CONSENSUS TRANSCRIPTIONS ---

    This blog post about deriving consensus transcriptions was interesting, and it sheds more light on the "when is a record 'done'?" question: http://soyouthinkyoucandigitize.wordpress.com/2014/01/14/412/

    Posted

  • riskingraisin by riskingraisin

    Elevation: enter elevation values in the Habitat & Description field (see http://talk.notesfromnature.org/#/boards/BNN0000003/discussions/DNN00000q1)

    Posted

  • riskingraisin by riskingraisin

    PLSS/TRS coordinates: enter in the Location field exactly as written on the label (see http://talk.notesfromnature.org/#/boards/BNN0000003/discussions/DNN0000094 & http://talk.notesfromnature.org/#/boards/BNN0000003/discussions/DNN00001j1)

    PLSS (Public Land Survey System)/TRS (Township-Range-Section) references are coded geographic locations used in a number of western, midwestern, & southern states. They consist of three parts, for example T4N R7W Sec. 36 (Township 4 North, Range 7 West, Section 36). Sometimes additional notes are added, such as "SW 1/4 of NE 1/4 of NW 1/4 Sec. 36," "SW/4 NE/4 NW/4 S36," or "SWQ, NEQ, NWQ, Sec. 36".
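
    To make the three parts easier to spot, here is a small Python sketch that picks the township, range and section out of a label string. It is only meant to show the structure: the pattern and the function name are invented for the example, quarter-section notes are ignored, and the Location field should still be typed exactly as written on the label.

```python
import re

# Hypothetical pattern covering forms like "T4N R7W Sec. 36" and "T28S, R17E, Sec. 2".
PLSS_PATTERN = re.compile(
    r"\bT\.?\s*(?P<township>\d+)\s*(?P<tdir>[NS])[,;]?\s*"
    r"R\.?\s*(?P<range>\d+)\s*(?P<rdir>[EW])"
    r"(?:[,;]?\s*S(?:ec(?:tion)?)?\.?\s*(?P<section>\d+))?",
    re.IGNORECASE,
)


def parse_plss(text):
    """Return the township/range/section parts, or None if nothing matches."""
    match = PLSS_PATTERN.search(text)
    if match is None:
        return None
    section = match.group("section")
    return {
        "township": int(match.group("township")),
        "township_dir": match.group("tdir").upper(),
        "range": int(match.group("range")),
        "range_dir": match.group("rdir").upper(),
        # The section is optional on some labels.
        "section": int(section) if section is not None else None,
    }


print(parse_plss("T4N R7W Sec. 36"))
print(parse_plss("T28S, R17E, Sec. 2"))
```

    The first call gives township 4 N, range 7 W, section 36, matching the worked example above.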

    Posted

  • HelenBennett57 by HelenBennett57

    Some questions, @Austinmast or another scientist please:

    Can we flag errors (typos etc) with something like #error, so that they can be more easily found by a #scientist? (ans., yes)

    Is case a criterion for matching strings? E.g. if you split out a location and a habitat from each other mid-sentence, should you tidy up the split-out text's capitalisation, or copy it exactly as written? (ans., capitalise the first letter)

    Provinces in block capitals: do they need transcribing in screaming block capitals? (ans., capitalise the first letter)

    If the latest determination has two names and an equals, e.g. x [=y], do we just put in the first? http://talk.notesfromnature.org/#/subjects/ANN0003u4r

    If the common name is shown, do we put it in the habitat/description?

    Posted

  • joanball by joanball scientist

    Thanks for starting this thread. I will answer what I can for Calbug, and forward it to some others to see if they can add or disagree with anything. Please post more questions about transcribing if you have them.

    Flag errors: yes, you can use #error and comment about what is in error. We still hope to get a better system, like a checkbox for problem records.

    Case: type it out as it would be in a database (i.e., capitalize only the first letter)

    Interpretation that we would like you to make:
    "3miN. of Oakland" should be "3 mi N. of Oakland"
    Just simple spacing errors

    Interpretation that you should leave to us:
    Don't expand abbreviations in verbatim fields, e.g. leave "Convict Lk." as written (we'll sort that out).

    Posted

  • joanball by joanball scientist

    I'm now working on an FAQ section for transcribing records. So, please post any more questions that you have about transcriptions, or any that you think others might benefit from.

    Thanks again for all of your help and for starting this useful thread!

    Posted

  • tmeconverse by tmeconverse

    I have several comments/suggestions:

    1. Fix the bug where typing "22" in the day section of the date field always goes back to "21". It doesn't do it in the year section, just the day. You will need to check every date with a 21 in that field to verify that it is not really 22.
    2. Make clear where the Province information (such as COASTAL PLAIN PROVINCE) goes--Location or habitat.
    3. Should it be in all caps?
    4. What is the best way to associate the collector number with the collector when there are several? I always put the name closest to the number, or the name that is printed on the card, first in the "collected by" field.
    5. What is the best way to separate collectors' names when they are listed in a column? I use commas.
    6. What should we do with those labels where there is no collector number? Instead of skipping the field, I put s.n. in it to indicate that there is indeed no number.
    7. Have some instructions on how to use the field just to the left of the "Finish Record" tab (the one that shows "1/9" when you start a record and "9/9" when you finish). I find it very useful to be able to go back to previous fields, or to go to the "Scientific name" field when I want to get that information from a part of the label that might be outside the "lasso-able" part of the main label, such as the determination information often stuck above the main one. I can then close that lassoed image, lasso the main label and be back at 1/9.
    8. The "9/9" button is also extremely useful for proof-reading my work before I finish the record.
    9. Make an iPad interface for the project. It would be great to be able to get in a few records while standing in line at Starbucks or at an airport where one doesn't have access to a desktop.
    10. I appreciated the distinction above between silently correcting obvious errors and expanding abbreviations. That would have been helpful several months ago.
    11. Make it possible to use the back button to return to a finished record. I don't know how many times I've noticed something just as the image is leaving my screen and it's too late.
    12. What about inconsistencies on the part of the collectors, such as R. Kral? Half the time he types his name as "R.Kral" (with no space between initial and last name) and half the time he puts R. Kral (with a space). Should this be typed as is, or would it be better to always put the space (or to leave it out)?
    13. What should we do if we don't have a special character on our keyboard, such as the degree symbol? I go to Google, search for "degree symbol", then copy and paste it into the text. Does this cause problems with the database? Would it be better to type out "degree"?
    14. Make it possible for users to rearrange the country list if they so desire to put the United States at the top of their own view of the pick-list. I understand and agree with the need for an international effort not to have a country-specific bias by putting the U.S. at the top of the list, but it would be nice to modify it on the fly in an individual's view of it.
    15. Put defunct counties (such as Warwick County, Virginia or Dade County, FL) in the state's list of counties, but have them go to the correct location (such as Norfolk City or Miami/Dade County) in the database. We wouldn't then have to keep Googling them to see where they go.
    16. The convention used for consolidated city/county mergers is inconsistent. In the Georgia list, for example, Muscogee County is listed, not Columbus/Muscogee County, even though they consolidated governments in 1974. This will be a nightmare to maintain since there are changes all the time. Macon, Georgia merged with its county government at the beginning of 2014. Who is going to maintain this? Why not just leave merged governments with the county name?
    17. One of the FAQs is going to be "How do we handle renamed/obsolete/multiple scientific names?" Do we put them all, only the latest, a mixture? I know the guidance is only the latest, but what if the newest determination includes an = name, or an X? Do we leave that off or is that part of the "latest" name?
    18. If we can have state and county lists for the U.S., why not Departamentos, Provinces, Estados, Condados, oblasts or whatever the governmental subdivisions are for other countries? It would make the database much more consistent than leaving those decisions to the transcribers.

    I commend you, Joan Ball, on undertaking the FAQ. I cannot wait to see it. Let us know how we can help.

    Posted

  • HelenBennett57 by HelenBennett57

    @joanball - thank you!

    @tmeconverse - ditto so much of what you are saying! (1) I keep hitting that too, thought it was just me so it's good to see it isn't. (13) There are symbols to copy and paste here if that helps. http://talk.notesfromnature.org/#/boards/BNN0000001/discussions/DNN00001vl

    Posted

  • Mr._Kevvy by Mr._Kevvy

    Yay! FAQ posted in the Blog.

    From the FAQ: "Spelling mistakes: Transcribe exactly as written, unless you have looked it up and are absolutely certain of a simple spelling mistake. In this case, you can enter the correct spelling." Lol... there go all the #error flagged ones I did. I'll start correcting them then from now on, as I am pretty sure of them being errors (not only from the ITIS-based dictionary but I also recheck the errors from web resources.) I'll still use the #error flag as well.

    Posted

  • tmeconverse by tmeconverse

    The FAQ is great. I would suggest adding something to number 8, (Multiple collectors). Sometimes there is a printed name, with names above and below, but with the collector number next to the printed one. That one is obviously the one claiming that number, and in those cases I put the printed name first on the list. Joan, should I continue doing that?

    Posted

  • joanball by joanball scientist

    I'm not really sure what the collector numbers are, actually. Is that for SERNEC Herbarium specimens?

    Posted

  • md68135 by md68135 scientist in response to tmeconverse's comment.

    Hi @tmeconverse
    Thanks so much for your question. Can you post an example so that we can be sure to give the right advice? Since the (collector) number is recorded separately for SERNEC I want to be sure I understand completely.
    Thanks again for all your efforts and questions!

    Posted

  • tmeconverse by tmeconverse

    I posted an example on the discussion page a couple of days ago that illustrates the issue of how to associate a collector number to only one of several collectors.

    Posted

  • HelenBennett57 by HelenBennett57 in response to joanball's comment.

    Hi @joanball, saw the FAQ, thanks! Collector numbers - yes that's a SERNEC Herbarium thing. I'm guessing that they uniquely identify a specimen collected by a particular collector, e.g. R. Kral 12345 would be his 12345th record.
    @tmeconverse, that sounds like what I've been doing (for what it's worth) - I've assumed that the printed name represents the primary collector, and the rest are friends, family, botany class, colleagues...

    Posted

  • riskingraisin by riskingraisin in response to HelenBennett57's comment.

    Yes, the collector number is unique to the collector. Many collectors simply use consecutive numbers, while a few collectors include the year in the collection number & re-start their numbering every year (for example see here). These numbers are generally assigned when the collection is made, and allow the collector to cross reference the specimen with the relevant entry in a field notebook.

    When there are multiple collectors listed, I have been trying to list the collector that the collector number appears to be affiliated with first in the collector field, even if he or she is not first on the list of collectors given on the label (as someone previously described). I agree that it would be good to have a more definitive way to link the collector number to the appropriate collector.

    Posted

  • tmeconverse by tmeconverse

    That's exactly what I do -- if it seems obvious that one name is associated with the number, even if there are names above that one closest to the number, especially if it is printed, I put it first in my list of collectors.

    Posted

  • tmeconverse by tmeconverse

    I have another question regarding transcription standards. What do you do regarding punctuation when both location and habitat information are included in a sentence? If it said something like "Frequent in the muddy shallows near the marina between 4th and 5th Streets." would you put "Frequent in the muddy shallows" (with or without a period?) in the habitat field and in the location field "near the marina between 4th and 5th Streets", starting it with "Near" or "near"? I could argue either way, but it would be nice to know how it will appear in the final version of the database and try to get the punctuation, capitalization, etc. right to begin with.

    Posted

  • Mr._Kevvy by Mr._Kevvy

    I was going to ask exactly what tmeconverse did regarding that but was waiting for a good example. :^) Here is the best I have so far of the Location and Habitat/Description being interleaved:

    NW Gainesville, moist, rich woods bounded by NW 57th Tce, 61st St., 8th Ave., and 23rd Ave; vine with tendrils at nodes and no spines.

    I put the location elements in bold and the habitat elements in italics. How should these mixed blocks be transcribed? Is case-sensitivity a criterion when choosing matching transcriptions? Punctuation? Thanks.

    Posted

  • HelenBennett57 by HelenBennett57 in response to tmeconverse's comment.

    I'd split it up like this even though "in the muddy shallows" sounds a bit odd and neither are, strictly speaking, sentences:

    Frequent in the muddy shallows.

    Near the marina between 4th and 5th Streets.

    Since the request to capitalise records as they'd appear in a database entry, I've started using capital letters at the start of chunks of text, and generally putting a full stop at the end.

    Mr. Kevvy, I would probably transcribe your example split out the same way and punctuated like this. The ellipses feel a bit pedantic but I guess if enough people don't use them, they'll get 'voted out'.

    NW Gainesville ... bounded by NW 57th Tce, 61st St., 8th Ave., and 23rd Ave.

    Moist, rich woods ... vine with tendrils at nodes and no spines.

    Posted

  • joanball by joanball scientist

    I think you all have the right idea about splitting out the habitat information. But don't add any additional punctuation.

    Thanks!

    Posted

  • wreness by wreness

    I don't know if I feel foolish or not now - I have spent an incredible amount of time over the last many, many months looking up the correct spellings of things and the correct counties, filling in data where it is missing, and researching place names that were so misspelled they were illegible, made no sense and weren't in the database. I also spent tons of time looking up details in the database to find records when it was obvious the record wasn't going to be recorded because it was an utter mess, but I was determined. It seemed, when we started this, that part of the "mission" was to help make this as exact and complete as we could, if we were willing to. But do we just want to type away and put the scribbles down "as is", regardless of whether it's right, wrong, or a mess? If many are transcribing the same record and someone is reading them, wouldn't the person reading them see that there is one (or more) which is more complete, cleaned up, punctuated correctly or amended to be "best"? Or hasn't it mattered? I'm confused.

    Posted

  • Mr._Kevvy by Mr._Kevvy

    As noted earlier, from the FAQ:

    Spelling mistakes: Transcribe exactly as written, unless you have looked it up and are absolutely certain of a simple spelling mistake.
    In this case, you can enter the correct spelling.

    I start a Discuss thread on the objects where I transcribe a corrected spelling, so that it's noted that there is an error and that the transcription is correct. I also include a reference for the correct spelling. For any genus/species (for Herbarium and Macrofungi at least) all that should be required is the ITIS Taxonomic Serial Number, i.e. TSN. For anything else a link to a reputable source would have to be found. I guess it comes down to what is reputable. :^)

    Posted

  • joanball by joanball scientist

    We do want you to look up counties when they are missing. But, since the beginning we have tried to indicate that the locality and collector fields should be entered verbatim. The reason is because of the "consensus" finding process for data checking, see blog post linked below... We have so many records, that it is impossible for us to manually look at each one. However, we've been very grateful to receive information on inaccurate records by NfN users... I've corrected many records this way. So, keep letting us know if you come across mistakes!

    http://blog.notesfromnature.org/2014/01/14/checking-notes-from-nature-data/
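
    As a rough picture of why verbatim entry helps a consensus check, here is a minimal Python sketch of a majority vote over several volunteers' transcriptions of one field. It is not the method described in the blog post: the function name, the 50% threshold and the exact-match comparison (after collapsing extra whitespace) are all assumptions made for the illustration.

```python
from collections import Counter


def consensus(transcriptions, min_share=0.5):
    """Return the most common transcription if enough volunteers agree."""
    # Collapse runs of whitespace and drop blank entries before comparing.
    cleaned = [" ".join(t.split()) for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    value, count = Counter(cleaned).most_common(1)[0]
    # Require strictly more than min_share of the entries to match exactly.
    return value if count / len(cleaned) > min_share else None


verbatim = ["3 mi N Oakland", "3 mi N  Oakland", "Three miles north of Oakland"]
improved = ["3 mi N Oakland", "3 miles N of Oakland", "Three miles north of Oakland"]

print(consensus(verbatim))
print(consensus(improved))
```

    The first list yields "3 mi N Oakland" because two entries agree once spacing is tidied; the second yields None because every volunteer "improved" the text differently and no version wins.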

    Posted

  • Mr._Kevvy by Mr._Kevvy

    Hasn't been any update in this thread in four months, so have been patiently collecting queries. Only nine thus far, which isn't bad for that amount of time, so I will post them in the hopes that they can be answered and the answers added to the Standards.


    1. Although we transcribe only the latest determination if there are multiple, should we also transcribe multiple synonyms in the same determination if they are listed, or just the first? (ie "Cyperus echinatus [=C. ovularis]")

    2. Should we also transcribe multiple collector numbers as written? ie "123 & 4567" (Probably an obvious "yes" but isn't formally in the Standards.)

    3. Are the transcriptions case sensitive, or is it OK to correct or improve capitalization without notifying? (ie the screaming caps of the PROVINCES). I note that it is already approved to correct capitalization in the genus/species scientific name, and splitting/joining sentences to separate location from habitat will necessitate changing case. Disregard this one... already answered here and fortunately it's a "yes"!

    4. Should we transcribe location information that is printed into the template of the label rather than being added? (such as "Plants of the Great Dismal Swamp" or "Flora of Fort..." etc.)

    5. Should we transcribe "Collected as part of a survey..." and other info that doesn't relate to this specimen per se?

    6. Should we transcribe "sheet # of #" or other information indicating that this specimen is part of a set, but again is not just about this one per se?

    7. Should we transcribe re-examination? ie "This specimen was examined as part of a study of..." that occurs years after the original label.

    8. Should we transcribe personal comments that clearly have nothing to do with the specimen? (Thinking Philip E. Hyatt here for some reason. 😄 )

    9. If a word is hyphenated across two lines, do we remove the hyphen and join it? (Not including hyphenated word pairs of course. This is probably also an obvious "yes" but should be in the Standards formally.)

    10. Should we transcribe Habitat/Description (or other specimen-relevant) info in later, separate determinations? (sometimes the person who made it adds a comment with further info about the specimen, ie its condition or maturity.)

    Thanks!

    Posted

  • HelenBennett57 by HelenBennett57 in response to Mr. Kevvy's comment.

    My tuppence-worth:

    1. I haven't been; just been giving the 'main' name.

    2. Agree

    3. Agree

    4. I haven't been doing this unless without it you couldn't tell the location (despite temptation - "Great Dismal Swamp" is just wonderful) - I think AustinMast said a long time back in an object comment, just to transcribe the bits that relate to the data entry fields.

    5. Again, haven't been transcribing 😦 would like to though.

    6. "Sheet x of y" - transcribe and also flag in the comments.

    7. Re-examination: haven't transcribed.

    8. Philip E. Hyatt should have his own database fields 😃 His comments were fun at first but now they're just getting a bit tedious. Or maybe all these things need an 'other random stuff' field?

    9. Agree, de-hyphenate.

    Posted

  • Mr._Kevvy by Mr._Kevvy

    Thanks, Helen. :^)

    1. You're probably right with that as we are supposed to only transcribe the latest, even on the same entry. (Topmost in this case.)

    4. Agreed! The Plants of the Great Dismal Swamp font design is so 1960's... I'd expect the animated Rankin-Bass Gollum to live there.

    6. I will have to start doing that... was there a directive?

    Also added:

    Should we transcribe Habitat/Description (or other specimen-relevant) info in later, separate determinations? (sometimes the person who made it adds a comment with further info about the specimen, ie its condition or maturity.)

    Posted