Notes From Nature Talk

When is a record "done"?

  • El_Lion by El_Lion

    How do you decide when a record is done? I.e., how many transcripts have to match before you can tick the record off the list? Are there more "problematic" fields, like location and habitat, which sometimes hold a lot of text, that are harder to match than others such as number or collection date? And if so, how closely do those "problematic" fields have to match? Is some software used that recognizes "similarity" instead of exact words/spelling?

    Just curious. 😃

    Posted
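As an illustration of what "similarity instead of exact spelling" could look like, here is a minimal sketch using Python's standard `difflib` module. Nothing in this thread says the project uses this approach; the function names and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a ratio between 0 and 1; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def roughly_match(a, b, threshold=0.9):
    """Treat two transcriptions as matching when they are similar enough.

    The threshold is an arbitrary illustrative value, not a project setting.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(roughly_match("Rocky  slope near creek", "rocky slope near creek"))
    print(roughly_match("oak woodland", "desert wash"))
```

Long free-text fields such as habitat would probably need a lower threshold than short fields such as a collection number, but that is a tuning question the thread leaves open.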

  • geckzilla by geckzilla

    I'm also curious about this. I noticed that the Herbarium completeness is inching forward while Calbug seems to be stuck in the same position ever since the new specimens got added. Makes me wonder if something is broken.

    Posted

  • jamac41 by jamac41

    CalBug has gone up 1% since the additions - I suspect it may be simple numbers - Herbarium roughly doubled in size, if I remember correctly, whilst CalBug got five or six times larger. Also, since the progress is calculated by completed records rather than by single transcriptions, even assuming a constant number of transcriptions per day, the rate of change is inversely proportional to the number of records remaining non-completed. Thus, at the moment, a CalBug percent will take notably more than c. 3.5 times as long as a Herbarium percent.

    Posted
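The size effect jamac41 describes can be put into a small worked calculation. This is only a sketch: the collection sizes and the daily transcription rate below are made-up numbers picked to give a 3.5 ratio, not the real figures, and the model ignores the extra slowdown from transcriptions being spread over many incomplete records.

```python
def days_per_percent(total_records, transcriptions_per_record, transcriptions_per_day):
    """Days needed to complete 1% of a collection at a constant daily rate.

    Assumes every transcription lands on a record that still needs it,
    so this is a lower bound on the real time.
    """
    records_per_percent = total_records / 100
    return records_per_percent * transcriptions_per_record / transcriptions_per_day


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    herbarium = days_per_percent(10_000, 10, 1_000)
    calbug = days_per_percent(35_000, 10, 1_000)
    print(herbarium, calbug, calbug / herbarium)
```

With the same daily effort, the larger collection moves one percentage point in proportion to its size, which is why its progress bar appears stuck.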

  • geckzilla by geckzilla

    Thanks for keeping track of the precise numbers, jamac. I figured it was probably something like that but since I hadn't been looking closely enough it got me slightly worried.

    Posted

  • robgur by robgur scientist, admin

    These are great questions, and we hope to give some more complete answers soon. Here is what the science team knows so far. Initially, each image stayed in the "pool" of records to be transcribed until it received 10 transcriptions. We recently lowered that number to 4, which should mean that progress happens faster. It's an open question how many transcriptions are needed to test for 'convergence' or lack thereof. The science teams will be doing a lot of work in the coming weeks to take the output records and reconcile them. There are some great tools, like OpenRefine, that can help with this task. We hope to blog more about that, and if you all have great ideas, we'd really like to hear them.

    Posted
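To make the "pool" rule and the idea of convergence concrete, here is a rough sketch. The threshold of 4 comes from robgur's post; everything else (the function names, the majority-share rule, the 0.5 cutoff) is an assumption for illustration and not the science team's actual reconciliation method.

```python
from collections import Counter

REQUIRED_TRANSCRIPTIONS = 4  # the number mentioned in the post above


def is_retired(transcriptions):
    """A record leaves the pool once it has enough transcriptions."""
    return len(transcriptions) >= REQUIRED_TRANSCRIPTIONS


def consensus(values, min_share=0.5):
    """Return (value, share) for the most common non-empty value.

    The value is None when no answer is given by more than min_share
    of the transcribers, i.e. the transcriptions did not converge.
    """
    cleaned = [" ".join(v.split()) for v in values if v.strip()]
    if not cleaned:
        return None, 0.0
    value, count = Counter(cleaned).most_common(1)[0]
    share = count / len(cleaned)
    return (value if share > min_share else None), share


if __name__ == "__main__":
    dates = ["12 May 1931", "12 May 1931", "12 May 1931", "12 May 1937"]
    print(is_retired(dates), consensus(dates))
```

A record whose fields fail to converge could, as suggested further down the thread, simply be returned to the pool for more transcriptions.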

  • nosenabook by nosenabook

    Cool. My head is stuck in the 1900s, thinking the records required human eyeballs for reconciliation. Software should be a HUGE help. The how for the Herbarium Locations and Habitats still puzzles me. However, not my problem, so I'm good.

    I think you are right, 10 transcriptions is too many/too slow. This job is copying, not making snap judgments (if you don't count reading handwriting). If it can't be reconciled with four, it can always be returned to the pool.

    It might take a few hundred reconciles to feel secure with results. The sooner you can get some sort of end product, the sooner you will know if and how the process needs to be changed!

    Posted

  • ghewson by ghewson in response to robgur's comment.

    Not an idea, but another question, I'm afraid. If there's a typo on the label, how do you pick out, say, one correction out of three verbatim transcriptions? It's not necessarily an obvious typo, either. I've got into the habit of checking plant species names against www.theplantlist.org to check my own typing, but I've found one or two typos on labels that way.

    Posted

  • El_Lion by El_Lion in response to ghewson's comment.

    Maybe one could run the entries against a database like theplantlist or similar in order to find typos? Also, the "outdated" names could be detected that way. I imagine the software sorting the entries into three categories: green = name found in list, current; yellow = name found in list, not current; red = name can't be found in list (most probably due to a typo).

    Posted
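El_Lion's three-colour idea can be sketched in a few lines. The name list here is a tiny hand-made dictionary with placeholder names, not real taxonomy and not a real lookup against theplantlist; how such a list would actually be obtained is left open.

```python
# Hypothetical reference list: name -> True if current, False if outdated.
# The names are placeholders invented for this example.
NAME_LIST = {
    "Exampleus currentus": True,
    "Exampleus outdatus": False,
}


def classify(name, name_list=NAME_LIST):
    """Sort a transcribed name into green, yellow or red.

    green  = found in the list and current
    yellow = found in the list but not current
    red    = not found (possibly a typo, on the label or in the transcription)
    """
    key = " ".join(name.split())  # collapse stray whitespace before lookup
    if key not in name_list:
        return "red"
    return "green" if name_list[key] else "yellow"


if __name__ == "__main__":
    for candidate in ["Exampleus currentus", "Exampleus outdatus", "Exampleus curentus"]:
        print(candidate, "->", classify(candidate))
```

Note that a "red" result cannot tell a transcriber's typo from a typo printed on the label itself, which is exactly the problem ghewson raises.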

  • Mikerollem by Mikerollem in response to robgur's comment.

    Hi

    When I start a bug transcribing session, I always get 45 records done in the top right-hand corner. This goes up by one each time I complete a record, as expected, so when I end the session it's a greater number. However, when I start my next session, it's gone back to 45. I want to be sure this does not imply that the records I've transcribed have been ignored. Please will someone explain and confirm.

    Posted

  • bumishness by bumishness admin in response to Mikerollem's comment.

    Let me reassure you that all of your work is being collected properly.

    That said, there is some bug I can't quite track down with the display of your personal progress. It's been reported a number of times.

    Could you give me a few other bits of information to help me troubleshoot this? What browser/OS are you using? Which collection were you transcribing?

    Posted

  • Mikerollem by Mikerollem

    Hi, I'm using M'soft Internet Explorer 9/Vista. I'm only transcribing the Calbug collection.

    Hope that helps.

    Mike

    Manchester, UK

    Posted

  • Mikerollem by Mikerollem

    Hi

    I'm using Internet Explorer 9/Vista. I'm only transcribing the Calbug collection.

    Hope that helps

    Mike

    Manchester, UK

    Posted

  • Mikerollem by Mikerollem in response to bumishness's comment.

    Hi

    It's IE9/Vista and Calbug.

    Once you've tracked down this 'bug' you can put it in the collection and we'll transcribe it [smiley face].

    Best Wishes

    Mike

    Posted

  • Mikerollem by Mikerollem

    Sorry for multiple replies above - didn't realise they had gone on to next 'page' so couldn't see them and assumed they hadn't been submitted.

    Mike

    Posted

  • SweetBee by SweetBee

    This is an old thread, but every time I log on to transcribe, I'm back to 0 records transcribed! I'm using Internet Explorer 11.

    Posted

  • HelenBennett57 by HelenBennett57

    You said you keep going for one record until there are ten matching transcriptions. Is that per entire record, or per field?

    For example... if two people transcribe the same record, and the two transcriptions match perfectly except that they have different dates, would you accept all the other fields (country, state, county, scientific name etc) as valid matches? Or would you throw out all the other fields transcribed by the person who got the date wrong?

    Posted
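The per-field option HelenBennett57 asks about could look roughly like the sketch below. The thread does not say whether the project reconciles per record or per field, so this is only one possible approach; the field names and the case-insensitive comparison are assumptions.

```python
def reconcile_by_field(transcriptions):
    """Compare transcriptions field by field.

    Returns a dict mapping each field to the agreed value (taken from the
    first transcription), or None when the transcribers disagree. A mismatch
    in one field does not discard the other fields of the same transcription.
    """
    fields = set().union(*(t.keys() for t in transcriptions))
    result = {}
    for field in sorted(fields):
        normalized = {t.get(field, "").strip().lower() for t in transcriptions}
        if len(normalized) == 1:
            result[field] = transcriptions[0].get(field, "").strip()
        else:
            result[field] = None
    return result


if __name__ == "__main__":
    first = {"country": "USA", "county": "Marin", "date": "1931-05-12"}
    second = {"country": "usa", "county": "Marin", "date": "1937-05-12"}
    print(reconcile_by_field([first, second]))
```

In this example only the date is left unresolved; country and county are accepted even though one transcriber got the date wrong.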

  • HelenBennett57 by HelenBennett57 in response to bumishness's comment.

    Hi bumishness, the "personal progress jumps massively" bug could be related to going straight into a URL like this, http://www.notesfromnature.org/#/archives/herbarium/transcribe. Using the same computer and the same browser, transcribing the same collection (herbarium) I've hit the bug when I've gone through /transcribe; and not hit the bug when I go in 'properly' through http://www.notesfromnature.org/.

    Posted

  • Mikerollem by Mikerollem

    Hi

    I've been transcribing Calbugs for quite a while - I've had the first 3 'badges' - I'm considerably overdue for the 'Butterfly Swarm' badge (500 transcriptions).

    While I'm not bothered about the badge, at least the other 3 that I received showed that 100 transcriptions had been recorded by you. The absence of the 'Butterfly Swarm' again worries me that my transcriptions are going into a black hole.

    When I start up it always says I've done 63 transcriptions, though I've done several thousand.

    Perhaps you could let me know the exact number of transcriptions you've received from me.

    Thanks, Mike

    Posted