Any PDF pros out there - how to make a PDF searchable

    We have an issue when creating PDF/A files from existing PDF documents.

    We create several documents using MS-Word, Excel, CAD applications, etc. We then create PDFs either with the built-in PDF export/Save As function (e.g. in Word 2013) or with CutePDF Writer, which acts as a PDF 'printer'.

    The PDFs are then merged using CutePDF Professional. The resulting PDF is fully searchable.

    We then use Adobe Acrobat XI to convert the PDF to a PDF/A standard file.

    After doing this, although the file looks fine, the document is not searchable, which defeats the point of creating an archival version of the file. When you copy text from the PDF and paste it into another document, you get garbage:  $
     
     %9%  
     2

         @ . +++
    * := ;6( )-

    If you try to search, nothing is found, because the underlying text is not what is displayed but the characters shown above.

    Does anyone know how to fix this, please?

    I've already asked on Adobe's forums but have had no response in the month since the question was originally posted.

    Thanks.


    [Edit]

    I forgot to mention that if we use an older version of Acrobat (9), the PDFs convert fine, except for one so far!
    Last edited by Blood; 18th May 2016, 12:52.
    A recent poll suggests that 6 out of 7 dwarfs are not happy

  • #2
    Silly question, but if you've got Adobe Acrobat XI, why use CutePDF at all? It sounds like there may be compatibility issues between the various applications. Have you tried doing everything after the Office 'Save As' step with only Acrobat XI? The problem may come from the initial export from the Office apps, or from anything in between. Try to remove as many variables as you can and see whether the end behavior changes. Have you verified that search works when you just do the Office 'Save As' and then open the file in, for example, Adobe Reader DC?
    *RicklesP*
    MSCA (2003/XP), Security+, CCNA

    ** Remember: credit where credit is due, and reputation points as appropriate **



    • #3
      We only have one license for Acrobat.

      We have lots of staff creating these reports, so they use CutePDF to create the PDF versions. The person who uses the computer with Acrobat on it does not have time to do this, which is why we purchased the cheaper CutePDF Professional for the other staff. It has worked fine for many years.

      This issue has only surfaced during the last few months, and it affects only a few PDFs (apologies, that's important info I left out).

      So, the reports are created using CutePDF Professional, and all of them are searchable. When we use Acrobat to convert them to PDF/A, most remain searchable but a few do not.

      Every PDF/A created with Acrobat V11 is now unsearchable, whereas only a few created with V9 are unsearchable.

      I have read that this can be caused by the font not being properly referenced and that the PDF/A process cannot convert it because of a lack of metadata. It is this lack that produces the garbage when copying/pasting etc.
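If the "lack of metadata" refers to each font's character-to-Unicode mapping (the /ToUnicode CMap in a PDF font dictionary), one way to test that theory is to list which fonts lack such a map before and after the PDF/A conversion. This is only a sketch: it assumes the third-party pypdf package, which nobody in this thread mentions, and `audit_pdf` and `fonts_without_tounicode` are hypothetical helper names. A missing map is a heuristic, not proof, since fonts using a standard encoding can still copy and paste correctly without one.

```python
from typing import Any, Dict, List, Tuple


def _resolve(obj: Any) -> Any:
    """Follow a pypdf indirect reference; plain Python objects pass through unchanged."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def fonts_without_tounicode(fonts: Dict[str, Any]) -> List[str]:
    """Return the resource names (e.g. '/F1') of fonts that have no /ToUnicode CMap."""
    missing = [str(name) for name, font in fonts.items()
               if "/ToUnicode" not in _resolve(font)]
    return sorted(missing)


def audit_pdf(path: str) -> List[Tuple[int, str, str]]:
    """Return (page number, resource name, BaseFont) for each page font lacking /ToUnicode.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    Only fonts listed directly in each page's /Resources are checked; fonts used
    only inside form XObjects are not followed in this sketch.
    """
    from pypdf import PdfReader  # lazy import: the helper above needs no dependencies

    report = []
    for page_no, page in enumerate(PdfReader(path).pages, start=1):
        resources = _resolve(page.get("/Resources")) or {}
        fonts = _resolve(resources.get("/Font")) or {}
        for name in fonts_without_tounicode(fonts):
            base_font = _resolve(fonts[name]).get("/BaseFont", "?")
            report.append((page_no, name, str(base_font)))
    return report
```

Running audit_pdf on the merged CutePDF file and on the Acrobat XI PDF/A output, then comparing the two lists, would suggest whether particular fonts lost their mappings during conversion.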



      • #4
        I've just had a quick look at the definition of 'PDF/A' and it includes this: "PDF/A differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding). The ISO requirements for PDF/A file viewers include color management guidelines, support for embedded fonts, and a user interface for reading embedded annotations." If the expected metadata about the fonts is missing, is it because Word never put it in, or because CutePDF doesn't support it? How certain are you that all the necessary prefs are set correctly in the Acrobat XI install you're using? How long ago did you install Acrobat XI vs how long ago did this issue begin to manifest?
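To help answer the "Word or CutePDF?" question for a given file, you could check which fonts are embedded at each stage, since the quoted definition says PDF/A prohibits font linking. This is a sketch assuming the third-party pypdf package; `is_embedded` and `linked_fonts` are hypothetical names. It treats a font as embedded when its descriptor carries a /FontFile, /FontFile2 or /FontFile3 stream, and it only inspects fonts listed directly on each page.

```python
from typing import Any, Dict, Set

# Descriptor keys that hold an embedded font program (Type 1, TrueType, CFF/OpenType).
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def _resolve(obj: Any) -> Any:
    """Follow a pypdf indirect reference; plain Python objects pass through unchanged."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def is_embedded(font: Dict[str, Any]) -> bool:
    """Heuristically decide whether a PDF font dictionary carries its own font program."""
    font = _resolve(font)
    subtype = _resolve(font.get("/Subtype"))
    if subtype == "/Type3":
        return True  # Type 3 glyphs are drawn by procedures stored in the PDF itself
    if subtype == "/Type0":
        # Composite font: the font program hangs off the descendant CIDFont.
        descendants = _resolve(font.get("/DescendantFonts")) or []
        return any(is_embedded(d) for d in descendants)
    descriptor = _resolve(font.get("/FontDescriptor"))
    if descriptor is None:
        return False  # no descriptor at all, e.g. a bare reference to Helvetica
    return any(key in descriptor for key in FONT_FILE_KEYS)


def linked_fonts(path: str) -> Set[str]:
    """Return the BaseFont names of page fonts that are referenced but not embedded.

    Assumption: the third-party pypdf package is installed (pip install pypdf).
    """
    from pypdf import PdfReader  # lazy import: is_embedded() itself needs no dependencies

    names = set()
    for page in PdfReader(path).pages:
        resources = _resolve(page.get("/Resources")) or {}
        fonts = _resolve(resources.get("/Font")) or {}
        for font in fonts.values():
            font = _resolve(font)
            if not is_embedded(font):
                names.add(str(font.get("/BaseFont", "?")))
    return names
```

An empty set from the Word export but a non-empty one from the CutePDF output would point at the PDF 'printer' step rather than at Word.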

        When you say that "...reports are created using CutePDF Professional and all of them are searchable", did you mean searchable by CutePDF, or by Adobe Reader, or by Acrobat XI before the PDF/A creation, or something else? I get that it works at some stage and then stops working; the trick is nailing down where and when it stops. Have you tried an alternate font on any doc which turns out non-searchable the first time? Were any of the V9 non-searchable results using the same fonts as the XI results?
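Nailing down where it stops can be scripted: extract the text of the same report at each stage of the pipeline and check whether a phrase you can see on the page survives. This sketch assumes the third-party pypdf package for extraction; `check_stages` and `pypdf_text` are hypothetical names, and pypdf's extractor only approximates what Acrobat's search actually indexes.

```python
from typing import Callable, Dict


def normalise(text: str) -> str:
    """Collapse whitespace and case so line breaks in extracted text don't hide a match."""
    return " ".join(text.split()).casefold()


def check_stages(stages: Dict[str, str], phrase: str,
                 extract: Callable[[str], str]) -> Dict[str, bool]:
    """Map each pipeline stage label to whether `phrase` survives in its extracted text."""
    target = normalise(phrase)
    return {label: target in normalise(extract(path)) for label, path in stages.items()}


def pypdf_text(path: str) -> str:
    """Extract all page text with the third-party pypdf package (pip install pypdf)."""
    from pypdf import PdfReader  # lazy import: check_stages() itself needs no dependencies
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
```

For example, check_stages({"Word export": "report_word.pdf", "CutePDF merge": "report_merged.pdf", "Acrobat XI PDF/A": "report_pdfa.pdf"}, "a phrase from page 1", pypdf_text), where the file names are placeholders. The first stage that returns False is where the text layer breaks. Passing the extractor in as a parameter means another tool can be swapped in without changing the comparison logic.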

        Hold the presses: I just searched for this kind of behavior and found a few reports that talked about fonts. The security in Adobe was such that the document should have been searchable, but wasn't. Try a non-searchable doc in Acrobat XI: go to 'Tools - Edit document text', then on each page press Ctrl-A (select all) and pick a different font. Then save as PDF/A and try the search again. Report back.
        Last edited by RicklesP; 23rd May 2016, 20:01. Reason: Follow-up guidance.


        • #5
          Thanks a lot, RicklesP. I will give that a go - am bogged down with other things at the moment but will get back to you within a couple of days.


          • #6
            Thanks for the suggestion, but that has not worked. I've been looking into this a bit more and can't understand what is happening. I'm going to have to check everyone's settings: I just converted one of the problematic report documents to PDF, converted that to PDF/A, and it is fine. Having read much more about it, I think this problem stems from the initial print job settings (substitute device font, download as softfont, etc.). I'll report back if I solve it.


            • #7
              We await with bated breath....