Notes From Nature Talk

Why the repetitive transcribing?

  • Pop_science by Pop_science

    With 143,000 records still to go, I find it very strange that I have seen several records more than once. It should be very easy to keep a counter of how many times each record has been transcribed and to present the records with the fewest transcriptions. When I see a record for the second time (and I believe I have already seen one three times), I would say the random system of presenting records is not getting the kind of results you want. Multiple transcriptions per record make comparison possible, but not if those transcriptions were done by the same person. Please adapt your software to present me with records that have not been transcribed yet.
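
    A minimal sketch of the counter system suggested above. It assumes a single in-memory queue and a target of four transcriptions per record (the figure mentioned later in this thread). The names `SubjectQueue`, `next_for`, and `record_transcription` are hypothetical and are not part of the Notes from Nature code. The queue serves the least-transcribed record that the user has not already done:

```python
from collections import defaultdict


class SubjectQueue:
    """Hypothetical queue: serve the least-transcribed record a user hasn't done."""

    def __init__(self, record_ids, target=4):
        self.target = target                      # transcriptions wanted per record
        self.counts = {rid: 0 for rid in record_ids}
        self.seen = defaultdict(set)              # user -> ids already transcribed

    def next_for(self, user):
        # Skip finished records and records this user has already transcribed.
        candidates = [
            (count, rid)
            for rid, count in self.counts.items()
            if count < self.target and rid not in self.seen[user]
        ]
        # Lowest count wins; ties are broken by record id.
        return min(candidates)[1] if candidates else None

    def record_transcription(self, user, rid):
        self.counts[rid] += 1
        self.seen[user].add(rid)


q = SubjectQueue(["A", "B", "C"])
q.record_transcription("pop", q.next_for("pop"))  # pop gets "A"
print(q.next_for("pop"), q.next_for("wren"))      # B B
```

    The linear scan keeps the sketch short. A project with hundreds of thousands of records would want an index or a priority queue instead.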

  • wreness by wreness

    I was wondering that today, too, Pop. I know they need the same record done (I think?) 4 times and then compare them, but I know I have done the same records myself at least 5 times now over the last 2 weeks... at least 10 of them. They were unusual, so I remembered them. I even wrote some down just to make sure, and I got those 2 more times, so I know I'm not just wiggin' on too much java and Pringles here.

    I hope someone is checking all this as it's progressing and making sure things are going ok on the server/technical end. And reading the discussion board suggestions.

  • reddder by reddder

    Pop & wreness: I have experienced what you both describe, but I've seen no reason to believe that 'someone is checking all this as it's progressing and making sure things are going ok on the server/technical end. And reading the discussion board suggestions'. If they are aware, they have not deigned to communicate their thoughts to the workforce. We are like the proverbial mushrooms . . . and it gets more annoying as the months pass by. If things can't be improved, just tell us: I think we can handle it. Being ignored is an altogether different kettle of fish.

  • j-walk by j-walk

    Maybe the true purpose of the site is to analyze a participant's reaction to stress. : )

    @ dmbrgn: Did you notice your special mention in the March 21 blog entry?

  • reddder by reddder in response to j-walk's comment.

    Yes, I was quite surprised. They succeeded in scuttling, to a small degree, my complaints about lack of open and mission-driven communications. Nevertheless, I say 'thank you' for the recognition.

  • wreness by wreness

    Where did you get a mention? What blog? Are you famous now? 😃 Kudos! I say you have to buy us all pizza now.

    I just passed 3000 entries and my stress level has gone from "mildly amused" to "sarcastic", so it's not looking good here.

    Have you gone over to Galaxy Zoo? That place is so busy it's incredible. Boards with thousands of posts. Every hour, every day, people of Import answering every question and engaging in discussions, thanking the volunteers. It's like Disneyland over there.

  • j-walk by j-walk in response to wreness's comment.

    http://blog.notesfromnature.org/

  • Hayduke13 by Hayduke13

    I came across the Essig Museum's database (http://essigdb.berkeley.edu/) when looking up a collector. Then I noticed that the record I was transcribing was already in their database and was noted as being added some time in 2012. What gives? Are some records being transcribed again?

    CASENT 8177313; collector EE Ball Jr.

    Maybe this is an anomaly. The next few records I checked didn't seem to be in there.

  • CTidwell3 by CTidwell3

    That's basically what I have seen too. Only some records are in that database. It provides a good baseline for comparing transcriptions by a professional with consensus data from a project like this.

    Similarly, if the quality of data from this project is confirmed to be high, it can help find typos and mistakes there. I know I have found errors in the Essig database: I have seen 10+ records from the same location and collector with the same month and day but different years, where I was pretty sure they should all be the same year.

  • reddder by reddder in response to wreness's comment.

    Well, I've reached the 12,000+ level for Notes from Nature, plus another 30,000 on other projects [weather, Serengeti, asteroids, etc.], and my observations have become far more pointed. I mention the numbers not to blow my own horn, but because I believe in the work and I usually find it interesting. However, the 'Bird Ledger project' finally did me in. I just couldn't keep typing the same info a hundred times. The big joke about that project is that they say you can do a page in 15 min! Well, I can't!!!! I'm back to CalBug. With so many specimens to record, you'd think OZ would want to make things work better by eliminating unnecessary steps/keystrokes. However, I sometimes get the feeling that OZ doesn't care, as long as those w/picks & shovels continue the grind. If anyone 'on the bridge' is listening: give us some new tools so we can get the job done in our lifetimes.

  • wreness by wreness

    WOW dmbrgn, congrats! I think. Amazing. My liege :curtsey:

    I'm not sure what my count is here, as it wasn't recording my counts for a long time; now it's over 3000, but I've done a few thousand on the sea floor, too. I'm nuts about that but am sick of scallops. I loved the ornithology stuff but I agree, 15 minutes? Puh-lease. Gawd, it'd take me an hour or two sometimes. I'd type the thing in a WORD document as I went to keep my place and cut/paste the redundant info back and forth, but when you'd get a page where you had to type a lot, MAN. It was work. I tried transcribing the music pages recently but I'd rather eat dead worms. ZZZZZzzzzzzzzz.

    @Hayduke - I asked that same question of one of the Scientist gods here a while back (I'm not as polite as you are... it was more like "HEY why am I typing all this up when you already have it???fer cry sakes"). I was told that that's part of the point here: they are comparing our data with what they already have to make sure it's accurate (seems kinda redundant but who am I to quibble), and then of course there are all the ones that are NOT entered, that they need to enter.

    In the case where they already have the data listed, then, I'm thinking it would have been a lot more efficient if they just had us confirm it was all OK and then be done with it, instead of triple-quadruple handling something that's already a done deal. I'd rather have known about the database, pulled up a bug, looked to see if it was listed, compared the tag & database, said "yeah it's all ok" or "I added such and such", and then checked it off the Galactic Input List. But then I'm just a Minion 😃

    Picks and shovels. I like the visual. Goes with the Pitchforks and Torches.

  • Mr._Kevvy by Mr._Kevvy

    I question not only presenting us with the same entries repeatedly (I just had one like this; I found I had already commented on it, which prompted me to find this thread) but also the necessity of always having the same number of transcriptions (four) in the first place. It seems like a vast waste of effort if only 1/4 of what we are doing is being kept.

    This could all be alleviated if records with even a single transcription appeared in the approval process, where they could be approved if they were done properly the first time and marked as complete, so that no one else would have to transcribe them again. If not, they would remain in the active set to be randomly assigned to other transcribers until they were completed properly.
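
    The flow proposed above might look like this minimal sketch. `Record`, `submit`, `review`, and `active_set` are hypothetical names. The sketch assumes the reviewer's approve/reject decision comes from a separate approval step:

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    rid: str
    transcriptions: list = field(default_factory=list)
    complete: bool = False


def submit(record, text):
    # One volunteer transcribes; the record then waits for approval.
    record.transcriptions.append(text)


def review(record, approved):
    # Hypothetical approval step: a good first transcription closes the record.
    if not record.transcriptions:
        raise ValueError(f"{record.rid} has no transcription to review")
    if approved:
        record.complete = True
    return record.complete


def active_set(records):
    # Rejected or unreviewed records stay available for random assignment.
    return [r for r in records if not r.complete]


a, b = Record("A"), Record("B")
submit(a, "Quercus alba")
submit(b, "Qercus alba")
review(a, approved=True)
review(b, approved=False)
print([r.rid for r in active_set([a, b])])  # ['B']
```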

    Thinking long-term, if it's actually planned to bring all twelve million SERNEC-member specimens onto this project, a change like this is absolutely essential. This project has been around for about a year and has gathered ~153,000 transcriptions. If 12M SERNEC records are digitized and brought online, that will turn into 48M transcriptions required. At the same rate, this will take over 313 years to complete. 😃
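
    The estimate above can be checked directly. Beyond the post's own figures, the only input is the assumption that the rate of ~153,000 transcriptions per year stays constant. The second call applies the same arithmetic with one transcription per record and ignores whatever validation would cost:

```python
def years_to_finish(records, per_record, transcriptions_per_year):
    """Years needed at a constant transcription rate."""
    return records * per_record / transcriptions_per_year


print(round(years_to_finish(12_000_000, 4, 153_000), 1))  # 313.7
print(round(years_to_finish(12_000_000, 1, 153_000), 1))  # 78.4
```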

  • reddder by reddder in response to Mr. Kevvy's comment.

    Thank you very much for quantifying the problem with which we are confronted. You and others have put forth possible solutions/improvements to our work methods, but the management says we are too poor or too undermanned to enact these changes. Why field such a task, if it is virtually impossible to complete?

  • HelenBennett57 by HelenBennett57 in response to dmbrgn's comment.

    Needed next: the quantification of how much would be saved by some of the simpler UI improvements.

    Mr. Kevvy - the multiple transcriptions are compared automatically like gene sequences and a unified transcription produced as the result. There was a blog post somewhere... could dig it out. I felt much better about redundancy and data quality after reading it.
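
    The linked blog posts describe how the project actually reconciles transcriptions. The sketch below swaps in a much simpler technique and is not the project's method: a per-field majority vote. It assumes each transcription is a dict mapping field names to text, and that at least two matching entries are needed to accept a value:

```python
from collections import Counter


def normalize(value):
    # Ignore case and runs of whitespace when comparing entries.
    return " ".join(value.split()).lower()


def consensus(transcriptions, min_agree=2):
    """Per-field majority vote; a field is None when too few entries agree."""
    fields = sorted({f for t in transcriptions for f in t})
    result = {}
    for f in fields:
        votes = Counter(normalize(t[f]) for t in transcriptions if f in t)
        value, n = votes.most_common(1)[0]
        result[f] = value if n >= min_agree else None
    return result


entries = [
    {"collector": "EE Ball Jr.", "state": "CA"},
    {"collector": "EE  Ball Jr.", "state": "CA"},
    {"collector": "E.E. Ball", "state": "Calif."},
    {"collector": "ee ball jr.", "state": "CA"},
]
print(consensus(entries))  # {'collector': 'ee ball jr.', 'state': 'ca'}
```

    Unlike sequence alignment, this treats entries as either identical or different. A near-miss such as "E.E. Ball" therefore gets no partial credit.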

  • joanball by joanball scientist

    Here are the blog posts on the process of checking transcriptions:

    http://blog.notesfromnature.org/2014/01/14/checking-notes-from-nature-data/

    http://soyouthinkyoucandigitize.wordpress.com/2014/01/14/412/

    Your comments about problems and potential solutions are very valuable to this project. I am forwarding all of this to the steering committee to see what can be done, and will keep you posted. Thank you!

  • robgur by robgur scientist, admin in response to Mr. Kevvy's comment.

    We have the same thought on how to improve the UI and hope we can implement that solution sooner rather than later. Notes from Nature is a labor of love, but there is absolutely no spare capacity from the Zooniverse, really, and the science team doesn't have the resources right now to help. We are hoping to get there sooner rather than later... and we agree that these are great ideas!

  • Mr._Kevvy by Mr._Kevvy

    @ joanball & robgur: Thanks, and it's good to see the project scientists back around here again. :^)

  • reddder by reddder in response to Mr. Kevvy's comment.

    Has anyone responded to your calculation that it will take over 313 years to finish this project under the current modus operandi? By that time, humans may have evolved beyond processing information in this form and all will be moot.

  • Mr._Kevvy by Mr._Kevvy in response to dmbrgn's comment.

    Nary a peep since joanball above indicated it would be passed on. However that at least was something, and more than expected. 😃

  • darryluk by darryluk in response to dmbrgn's comment.

    Cut and paste makes it easy!

  • Mr._Kevvy by Mr._Kevvy

    As well as transcribing here, I also do plenty of Citizen Science with BOINC, which is a platform for distributed computing primarily for scientific research.

    Much like this platform, results need to be gathered from multiple random participants for each parcel of work to prevent bad/fake results and for accuracy. What a BOINC project does to minimize redundancy is to initially send each "work unit" to two participants only. When the results are returned, they are compared. If they match, then this is considered a completion. If they don't, only then is the work sent to a third participant; it will be re-sent as many times as required for concurrence.

    This system has been in place for about a dozen years across several dozen BOINC projects, so it seems to have proven itself efficient and reliable.
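
    The replication scheme described above can be sketched as follows. This is a simplification, not BOINC's actual server code. `validate_work_unit` is a hypothetical name, and `get_result` stands in for sending the work unit to one more participant. The sketch takes "concurrence" to mean two identical results and assumes a cap on how many participants to try:

```python
from collections import Counter


def validate_work_unit(get_result, initial=2, max_replicas=10):
    """Return (agreed_result, participants_used), or (None, n) if no quorum."""
    results = [get_result() for _ in range(initial)]
    while True:
        answer, n = Counter(results).most_common(1)[0]
        if n >= 2:                      # two participants agree: done
            return answer, len(results)
        if len(results) >= max_replicas:
            return None, len(results)   # give up and flag for review
        results.append(get_result())    # only now send to one more participant


print(validate_work_unit(iter(["x", "x"]).__next__))            # ('x', 2)
print(validate_work_unit(iter(["x", "y", "z", "y"]).__next__))  # ('y', 4)
```

    When participants usually agree, most units would finish after two results rather than a fixed number.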

  • reddder by reddder in response to Mr. Kevvy's comment.

    Unless I've been asleep at the switch, I haven't seen any response to your msg about the elephant in the room: 313 YEARS to finish this project at the current rate of activity. Such a statistic really makes the whole project pointless. I still keep plodding away, hoping there will be some fix to the 4-transcription rule for each entry. If Jon Stewart were interested in citizen science projects, I'm sure we'd get some pointed barbs thrown this way. I recently saw the Monty Python reunion show, and when they talked about the success of the British Radio Ballet, I realized what a brilliant skit could be built around a project that is not looking for a cure for cancer in 13 yrs, but is simply hoping to digitize archived specimens of life forms in 300 years.

  • Mr._Kevvy by Mr._Kevvy

    I still plod away too... I'm sure it raised an eyebrow or two and I expect an improvement will eventually appear. Perhaps it isn't planned to bring the entire set of SERNEC-member herbaria in, but if it is, I think one will be required! But, I still enjoy transcribing (I do it with headphones on, and go off into The Zen Zone... it's peaceful.) So I'll keep at it pending an outcome. Hope I didn't scare anyone off. 😄

    Oh, and I think that all the mistakes I am finding and reporting are possibly more valuable than the actual transcriptions (especially with the transcriptions being worth 1/4 their apparent data size)... the output isn't going to be worth anything with thousands of errors in it. So that also keeps me going.

  • reddder by reddder in response to Mr. Kevvy's comment.

    Well, I'm w/you. I'm hooked -- just feel the need, occasionally, to let management know we're still at our posts.

  • robgur by robgur scientist, admin

    We totally agree re: rate of transcription, and one of the highest-priority plans is to move to a "transcribe once and validate" approach, which I think would be much more efficient. The only reason we haven't implemented it is "capacity": we don't have the resources yet to make changes like that to how NFN works. We are very actively pursuing the resources to make such changes, though, and hope you are willing to stick it out while we move from what we all think is a working but still "not quite there" prototype to something much better, and much more engaging and useful.

  • reddder by reddder in response to robgur's comment.

    Your message brings hope & joy. Yes, I'll stay at it in the hope of seeing the promised land.

  • Mr._Kevvy by Mr._Kevvy in response to robgur's comment.

    Yay! Thanks for the confirmation.

  • robgur by robgur scientist, admin in response to dmbrgn's comment.

    We are working hard on a "ditto function" that can hopefully help a bit here for repetitive entries. We know it matters!

  • reddder by reddder in response to robgur's comment.

    Thanks again for your answer. The 'ditto function' will really enable us to up the productivity and get the info 'on the street' a lot sooner.

  • robgur by robgur scientist, admin

    Hoping this can happen really soon --- we may prototype within a week or two.

  • Mr._Kevvy by Mr._Kevvy

    Did we just go to the prototype of "transcribe once and validate"? I got logged out of my session and then the numbers changed drastically for the completions to:

    Total Images: 52,552
    Active Images: 14,166
    Complete Images: 38,386
    73% completion (from 94%)

    If so, yay! And I will be more careful with the transcriptions now, as I hope everyone will, to minimize errors.
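
    A quick check confirms that the three figures quoted above are at least internally consistent:

```python
total, active, complete = 52_552, 14_166, 38_386

assert active + complete == total  # every image is either active or complete
print(f"{complete / total:.0%}")   # 73%
```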

  • HelenBennett57 by HelenBennett57

    Surely some of us would have noticed that we're being expected to do some validation? I'm assuming something will/would show up in the UI and that it would be loudly announced - @Robgur is that right?

  • reddder by reddder

    Why would you assume that anything would be loudly announced? I'm hoping Mr. K is on to something. What is beyond dispute is that the management made radical changes to the program statistics about a week ago w/no explanation whatsoever. Why can't someone take the time to forward an explanation for those who toil in their vineyard?

  • HelenBennett57 by HelenBennett57

    Hehehe, cos I'm still not entirely 100% cynic! ... yet. And also because there was a blog post about the new ditto function in Calbug, so I'm assuming an even bigger change would definitely get a blog post.

  • DZM by DZM admin

    According to the development team here at the Zooniverse, Notes from Nature is still very much operating under a four-transcription model, and the 1+validation model remains a long-term plan.

    Hope that this helps!

  • robgur by robgur scientist, admin in response to dmbrgn's comment.

    Yes, we have been meaning to post about the changes to the numbers. The long and short is that we haven't ever tallied the Ornithology ledger work in Notes from Nature, and we don't yet have a way to do that effortlessly. It seems like a simple fix but it's not, unfortunately. So we manually calculate effort (# of records completed) every couple of weeks, and this increments numbers on the main page. I will sketch a post on this because it deserves further broadcast.

  • robgur by robgur scientist, admin in response to robgur's comment.

    Oh we also changed our collections numbers as well for a variety of reasons. I will fold this all into one blog post! This weekend.

  • Mr._Kevvy by Mr._Kevvy in response to robgur's comment.

    Thanks for the update also to DZM... I'm thinking that it was a coincidence: the completed Herbarium images that were quad-transcribed were removed from the project, and this happened to be 75% of the total, which made it appear that Transcribe Once and Validate was now active.

    Ah well, we'll stick with the 300+ year estimate for a while if that is the case. 😃

  • reddder by reddder

    I'm kind of disappointed as I won't be here in 2314 when the project is completed. Hopefully, our work will be made available before the last item is processed. Mr. K: thanks for the too-good-to-be-true assessment. I enjoy your observations.

  • Mr._Kevvy by Mr._Kevvy in response to dmbrgn's comment.

    Thanks. 😃 I guess we are akin to the ancients who planted trees and vines that grew to tremendous size that they never got to enjoy.
    Back to the vineyard for me!

  • reddder by reddder

    Mr K: I, as well.

  • HelenBennett57 by HelenBennett57

    Yep, I'll pick up my spade again!

  • robgur by robgur scientist, admin in response to Mr. Kevvy's comment.

    Man, I totally agree about the mix of hope and minor ARGH about the scope of the challenge and what we can do to make it better. Transcribe Once and Validate is on the horizon (not just crossed fingers but plans to get to that one)--- one of two or three key priorities in the next 4-6 months.

  • DZM by DZM admin

    For the scientists on this project... if there's anything that I can do to help, any communications difficulties with the developers, etc., on trying to get this issue fixed, please let me know! I'll do all that I can to help. 😃

  • DZM by DZM admin

    So, with regard to repetitions in NfN transcriptions, I had our lead developer look into it. Unfortunately, he didn't have much luck finding any evidence of a problem:

    This is going to be more difficult to track down. I've tried doing a number of classifications, and I can't spot any obvious instances of duplicates appearing in the subject queue.

    We've already completed the library update here on Notes from Nature that is supposed to help eliminate repeats, too...

    So at this point, I think the best thing to do is, if you see a repeat, report it right here. If I can put together a body of several objects that have appeared as repeats, perhaps we can figure out if/why it's still happening.

    Thanks!!
