Hi!

While I really enjoy seeing so many people being accommodating to people with disabilities, I find manually transcribing every image I post very tiring.

I thought I could at least use some sort of AI to help with image transcripts, though that could probably be put to better use by the actual person with the disability.

So that’s the question: should I skip transcribing an image, or let an AI do it?

  • Lumidaub@feddit.org
    ↑ 23 · 28 days ago

    If you can get an AI to produce an actually useful description, that would be extremely interesting. However, AIs don’t know what’s important about an image and will fill up the description with useless information, effectively spam for the person that needs a description.

    Write just a sentence, describe the thing that is important, while keeping in mind why you’re even posting the image, and it’s going to take less time than asking the AI.

      • Lumidaub@feddit.org
        ↑ 11 · 28 days ago

        True, and one sentence written by a human who understands the image is better than twenty sentences by a word prediction machine.

        • HappyFrog@lemmy.blahaj.zone
          ↑ 12 · 28 days ago

          No matter how good human-written descriptions are, people just won’t do them. So having an automated system is much preferable.

          • Rain World: Slugcat Game@lemmy.world
            ↑ 1 · 2 days ago

            people do them lots and lots on parts of the fediverse that are not lemmy! in fact, i think the only reason no lemmers do it is because the ui sucks (the alt text box in a post pops up in a completely unrelated section when you add an image!! alt text must be attached to images)

          • Lumidaub@feddit.org
            ↑ 6 · 28 days ago

            I know what you’re saying but I truly think for most people it’s simply that they’re overthinking it. They think every single thing needs to be in the description, with references explained and sourced and whatnot. That does sound exhausting. And I have written a handful of descriptions like that for pictures where I thought the details were interesting enough to justify the effort. But really, a simple “The thirteenth Doctor and Rose Tyler embracing and deeply kissing” is already very sufficient in most cases (add “standing on an asteroid in front of a field of glittering stars - digital colour painting” if you have the spoons). So imho it’s better to educate them and encourage short, concise descriptions than to give in to the slop.

        • x74sys@programming.dev
          ↑ 7 · edited · 28 days ago

          Yeah, apart from the fact that I imagine that people who need alt text don’t appreciate LLM output. It’s very boring. It’s either extremely technical and ice-cold or so cringe that you have to stop reading. Just what I think.

          At least for me, if I realize that I’m reading an AI blog article or AI generated text in some other form, I don’t read it.

  • x74sys@programming.dev
    ↑ 12 · edited · 28 days ago

    In my opinion, no. It has to be heavily curated. You’re not saving yourself a lot of work if you have to read it word by word (and probably correct stuff) anyway.

    I think just one very short sentence describing what’s on there (it doesn’t have to be detailed) is a lot better than whatever an LLM will give you.

    • Baŝto@discuss.tchncs.de
      ↑ 1 · 20 days ago

      It depends a lot on the image. Multi-panel comics have pretty long alt texts, and AI can make it faster to reproduce the text in the image.

      • x74sys@programming.dev
        ↑ 3 · 20 days ago

        But then you’re primarily extracting text, which you don’t need LLMs for. OCR tools will do the job much more cheaply and effectively.

  • placebo@lemmy.zip
    ↑ 8 · 28 days ago

    AI is great for this. We shouldn’t put people with disabilities at a disadvantage because of the anti-AI hysteria.

  • Petersson@feddit.org
    ↑ 6 · 28 days ago

    To me personally, “AI” is a slur for profit-driven generative bs. The concept it’s based on is great. I love pattern recognition and all the possible use cases for machine learning when it comes to science, material research, …

    tl;dr: Go for it.

  • forestbeasts@pawb.social
    ↑ 5 · 27 days ago

    Do not.

    Please just don’t.

    People (hi I’m people) need what the image IS, what’s important about it, why you included it. Not just what some slop generator shat out about it.

    Better to have nothing, which is at least honest, than to have something that PURPORTS to have meaning but then just, doesn’t.

    – Frost

  • Tamlyn@lemmy.zip
    ↑ 5 · 28 days ago

    A lot of artists don’t want their art to be used by AI. You can’t prevent that if you let AI summarize your images. So don’t use AI for that.

    • Lumidaub@feddit.org
      ↑ 5 · 28 days ago

      Those are different mechanisms. Object recognition doesn’t mean the AI is now trained on the image and can reproduce it (which is btw why AI can still “visually” recognise what’s in an image that has been nightshaded/glazed).

      • Sir. Haxalot@nord.pub
        ↑ 4 · 28 days ago

        This is true, but it’s also important to remember that if you use an AI model hosted by the same party that trains it, they will likely pass any data you input to the training stage, unless you have an enterprise contract regulating training use.

        OP mentioned he will use a self-hosted LLM, though, and in that case there’s no risk of the data being used for training.

        • Lumidaub@feddit.org
          ↑ 2 · 28 days ago

          I mean, if you put any image online that hasn’t been protected/poisoned in some way, you have to (unfortunately) assume it’s in some AI’s training data anyway. If the tradeoff for a useful description (! See my other comments about the lack of usefulness) is that an image is also fed into one more training corpus, that would be worth a thought, imho. If the image is protected/poisoned, I’d indeed encourage this whole hypothetical process, just to further sabotage the data.

    • Gonzako@lemmy.world (OP)
      ↑ 3 · 28 days ago

      I was actually thinking of using a self-hosted LLM for these tasks. I wanna dig again into it and I got access to computers on the cheap

  • Doorknob@lemmy.world
    ↑ 4 · 28 days ago

    By transcribing, do you mean describing what is in a picture, or transcribing text in a picture?

    For the former, I can’t really imagine an image you couldn’t describe for accessibility within a sentence, and for the latter, OCR could do the job equally well.

    I’m not saying this to just push the view that neural networks are no good for anything btw. For translation, for example, or text to speech/speech to text, I genuinely think they’re a revelation, and they need very little compute to perform those functions.

  • qaz@lemmy.world
    ↑ 4 · edited · 26 days ago

    I’d say go ahead but make sure it produces accurate enough results and make sure to add something like [AI Transcribed] in front so people can take the potential for additional errors into consideration when reading it.

    Also, if you’re using an online service make sure you’re using something that doesn’t use it as training data. Many (probably almost all) artists / photographers won’t appreciate that.

  • Rimu@piefed.social
    ↑ 2 · 27 days ago

    If I were blind I’d prefer it if the app just hid all image posts from me. The alt text, when it exists, is going to be trash most of the time anyway.

  • KatherinaReichelt@feddit.org
    ↑ 1 · 26 days ago

    I think that technology can really help us here. OCR on images is mostly solved. If you know what PaddleOCR can do, those people on Mastodon who are whining about others not including an image description for a screenshot seem really annoying. It is possible to do this directly on your computer at no cost, without the need for beefy hardware. So no need to try to force everyone else to include transcriptions for screenshots, no need to attack other people, just do it yourself and enjoy the text on the screenshot.

    This also does kind of apply to AI image descriptions. Try it and put an image into Gemini and ask it to describe it. You will be surprised. AI can totally give you a workable description of an image. The problem here is that those AI tools can get quite expensive when you are using them a lot and that many disabled people do not have much money. So in my opinion it totally is ok to include AI image descriptions.

    I think that there are too many people in the fediverse who do not know the current state of the technology and hate AI for maybe the right reasons, but who are missing out on how it could help them.
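
Two practical suggestions come up repeatedly in the replies: keep alt text to a short sentence, and mark machine-generated text with something like [AI Transcribed] so readers can allow for extra errors. A minimal Python sketch of that idea follows. The names `build_alt_text`, `AI_LABEL` and `max_chars`, and the 200-character default, are assumptions made for illustration; they are not part of Lemmy or any other tool mentioned in the thread.

```python
AI_LABEL = "[AI Transcribed]"  # label wording taken from the thread; the constant name is hypothetical


def build_alt_text(description: str, ai_generated: bool = False, max_chars: int = 200) -> str:
    """Return a short, single-line alt text, labelled if it was machine-generated."""
    # Collapse newlines and repeated whitespace into single spaces.
    text = " ".join(description.split())
    # Trim overly long descriptions at a word boundary.
    # max_chars is an assumed limit, not a platform rule.
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0].rstrip(",;:") + "…"
    # Put the label in front so it is the first thing a reader encounters.
    if ai_generated:
        text = f"{AI_LABEL} {text}"
    return text


if __name__ == "__main__":
    print(build_alt_text("The thirteenth Doctor and Rose Tyler embracing", ai_generated=True))
```

Whatever produces the description, a human or a self-hosted model, it would still need a human read-through before posting; the helper only tidies and labels the text.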