Sunday, October 6, 2013

Extracting text from Photoshop (PSD) images

I needed to grab the text from some Photoshop images and put it into a LaTeX document. But how do you get at that text on Linux? GIMP can't do it, and neither could any other program I tried (they all rasterize the text layers). Fortunately, the text is stored as plain text in the .psd file, so it's relatively easy to get it out of there. However, it's encoded as UTF-16, so digging it out with grep alone is quite painful. Here's a little Python script that finds and decodes text in PSD images (inline version below). You pass it the .psd file and the beginning of the text you want to find, and it gives you the rest.
python psdtextextract.py myimage.psd "This is"
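will print "This is a text layer." if that is indeed the text your layer contains. By default only the first match is printed; pass display-all-matches as a third argument to print every match. To find out how the text begins, open the image in any image viewer, such as Gwenview, and read it off. Quite stupid, I know, but it works ;)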
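To see why plain grep struggles here, it helps to look at the bytes. The following is only a small sketch, assuming the layer text is stored as big-endian UTF-16 (which is what the script below decodes); `utf16be_bytes`, `find_utf16be` and the `sample` buffer are made-up names for illustration, not part of the script.

```python
# Sketch: why an ASCII search misses UTF-16 text, and how an encoded needle finds it.
# Assumption: the text is stored as big-endian UTF-16 without a BOM.

def utf16be_bytes(text):
    """Return the bytes a big-endian UTF-16 text run would contain for `text`."""
    return text.encode("utf-16-be")

def find_utf16be(buf, text):
    """Return the byte offset of `text` in `buf`, or -1 if it is absent."""
    return buf.find(utf16be_bytes(text))

# Hypothetical stand-in for a slice of a .psd file.
sample = b"8BIM" + utf16be_bytes("This is a text layer.") + b"\x00\x00"

print(utf16be_bytes("This"))            # b'\x00T\x00h\x00i\x00s'
print(sample.find(b"This is"))          # -1: the ASCII needle never matches
print(find_utf16be(sample, "This is"))  # 4: the encoded needle does
```

Every ASCII letter is preceded by a NUL byte, so the needle has to be encoded the same way before searching.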

Inline source code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-

from codecs import encode, decode

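def get_next_occurence(text, buf, start):
    # Encode the needle as little-endian UTF-16 without a BOM. The text in
    # the file is big-endian, so the needle matches one byte into the run.
    text = encode(text, "utf-16-le")
    index = buf.find(text, start)
    if index == -1:
        return b"", -1
    # The text run ends at the first pair of NUL bytes.
    end = buf.find(b'\x00\x00', index)
    if end == -1:
        return b"", -1
    # Prepend the high byte that the offset match skipped, so that the chunk
    # is aligned for big-endian decoding.
    chunk = b'\x00' + buf[index:end + 2]
    # Drop backslash bytes (0x5C) before returning the chunk.
    return chunk.replace(b'\x5C', b''), end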

def get_all_occurences(text, buf):
    """Return every decodable text run in buf that starts with text."""
    start = 0
    items = []
    while True:
        found, start = get_next_occurence(text, buf, start)
        if start == -1:
            break  # no further matches
        try:
            decoded = decode(found, "utf-16be")
        except UnicodeDecodeError as e:
            print(" - undecodable match skipped:", e)
            continue
        # Strip the trailing NUL terminator and turn Photoshop's carriage
        # returns into regular newlines.
        items.append(decoded.rstrip('\x00').replace('\r', '\n'))
    return items

if __name__ == '__main__':
    import sys
    if len(sys.argv) not in (3, 4):
        print("Usage: psdtextextract.py file.psd pattern [display-all-matches]")
        sys.exit(1)
    with open(sys.argv[1], 'rb') as f:
        items = get_all_occurences(sys.argv[2], f.read())
    if not items:
        print("No match found.")
        sys.exit(1)
    if "display-all-matches" in sys.argv[3:]:
        for item in items:
            print(item)
    else:
        print(items[0])
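
If you don't know how the text begins, a rough alternative is to list every run of UTF-16 text in the file and pick the right one by eye. The sketch below is not what the script above does; it is a separate idea under the assumption that the text is big-endian UTF-16 and consists of plain ASCII characters (anything else ends the run). The names `find_text_runs` and `min_chars` are made up for illustration.

```python
import re

def find_text_runs(buf, min_chars=4):
    """Return all runs of at least `min_chars` ASCII characters stored as
    big-endian UTF-16 in `buf`, with carriage returns turned into newlines."""
    # Each character is a NUL high byte followed by a printable ASCII byte
    # (or tab, CR, LF). Non-ASCII characters are not matched by this sketch.
    pattern = re.compile(
        (r"(?:\x00[\x20-\x7e\t\r\n]){%d,}" % min_chars).encode("ascii")
    )
    return [
        match.group().decode("utf-16-be").replace("\r", "\n")
        for match in pattern.finditer(buf)
    ]

# Hypothetical stand-in for the contents of a .psd file.
sample = (
    b"8BIM"
    + "This is a text layer.".encode("utf-16-be") + b"\x00\x00"
    + "Second\rlayer".encode("utf-16-be") + b"\x00\x00"
)

for run in find_text_runs(sample):
    print(repr(run))
```

On a real file you would replace `sample` with the result of `open("myimage.psd", "rb").read()`. Expect some noise in the output, since other strings in the file may be stored the same way.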

4 comments:

  1. Thank you very much for this trick, this will make my life WAY easier !! Very nice blog. Cheers !

  2. Works like a charm on Windows too. Thank you!

  3. This is awesome, and it's very simple with GIMP. I usually do everything with .psd files like this: http://www.paintshoppro.com/en/pages/psd-file/ but I tried your method and it works great, and very easy. Thank you for the tip!
