House of Fusion
ColdFusion Talk (CF-Talk) mailing list

Extracting text from various file-types

Author:
Robert Rhodes
08/10/2012 06:08 PM

Hello again to all. I need a way to extract text from Word, Excel, plain-text, PDF, and PowerPoint files with ColdFusion as each file is submitted via a form. The output does not have to be pretty or nicely formatted, just plain text that can be stored and searched later. Any ideas?
--RR

Author:
Bruce Sorge
08/10/2012 06:12 PM

Check out the CFFILE tag. That offers this type of functionality.
Bruce

Author:
Robert Rhodes
08/10/2012 06:48 PM

Hi Bruce. Thanks for the reply. I tried that, but no luck. For text files, I got the text just fine. For Word docs, I got the text along with a lot of garbage. PowerPoint, PDF, and Excel files all came out as unreadable garbage. I tried both the "read" and "readbinary" actions, and neither worked. Maybe I am doing something wrong? I am using CF9.
-RR

Author:
Bruce Sorge
08/10/2012 07:07 PM

For Word, did you use action="readbinary" on cffile? For Excel, there is a cfspreadsheet tag that will read a spreadsheet; put a query attribute on it and you can output the result. For PDFs, there is a cfpdf tag that you can use. You will have to get the file type first, then use cfif to tell the page which tag to use for which file. Hope this helps.
Bruce
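The dispatch Bruce describes might be sketched roughly as below. This is only an illustration under assumptions, not tested code from the thread: the form field name "upload", the variable names, and the use of the temp directory are all hypothetical, and it uses cfswitch in place of a chain of cfif tags. It relies on cfpdf's extracttext action, which should be available on CF9. Word and PowerPoint are left unhandled because, as reported earlier in the thread, cffile returns garbage for them.

```cfml
<!--- Assumption: the form has a file field named "upload". --->
<cffile action="upload" filefield="upload"
        destination="#getTempDirectory()#"
        nameconflict="makeunique" result="up">

<cfset path = up.serverDirectory & "/" & up.serverFile>
<cfset extractedText = "">

<!--- Choose an extraction tag based on the uploaded file's extension. --->
<cfswitch expression="#lCase(up.serverFileExt)#">
    <cfcase value="txt">
        <!--- Plain text: a simple read is enough. --->
        <cffile action="read" file="#path#" variable="extractedText">
    </cfcase>
    <cfcase value="pdf">
        <!--- type="string" asks for plain text rather than XML. --->
        <cfpdf action="extracttext" source="#path#"
               type="string" name="extractedText">
    </cfcase>
    <cfcase value="xls,xlsx">
        <!--- Reads a single sheet into a query, then flattens the cells. --->
        <cfspreadsheet action="read" src="#path#" query="sheet">
        <cfloop query="sheet">
            <cfloop list="#sheet.columnList#" index="col">
                <cfset extractedText &= sheet[col][sheet.currentRow] & " ">
            </cfloop>
            <cfset extractedText &= chr(10)>
        </cfloop>
    </cfcase>
    <cfdefaultcase>
        <!--- doc, ppt, and anything else: not handled in this sketch. --->
    </cfdefaultcase>
</cfswitch>
```

After the switch, extractedText holds whatever text was recovered and can be stored for searching. A workbook with several sheets would need one read per sheet.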

Author:
Leigh
08/10/2012 08:17 PM

I do not have the URL handy, but take a look at Raymond Camden's blog. He wrote an entry on extracting text from MS Office documents using POI. For PDF, use cfpdf's extracttext action.
-Leigh
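As a rough idea of what the POI route could look like for a legacy .doc file, here is a minimal sketch. It is not the code from Raymond Camden's blog entry. It assumes the Apache POI jars, including the HWPF classes, are available to ColdFusion's class loader, which may not be true of a stock CF9 install. The file name sample.doc and the variable names are hypothetical.

```cfml
<!--- Assumption: sample.doc sits next to this template. --->
<cfset docPath = expandPath("./sample.doc")>
<cfset inStream = createObject("java", "java.io.FileInputStream").init(docPath)>

<cftry>
    <!--- WordExtractor reads the binary .doc format and returns its text. --->
    <cfset extractor = createObject("java",
            "org.apache.poi.hwpf.extractor.WordExtractor").init(inStream)>
    <cfset wordText = extractor.getText()>
    <cffinally>
        <!--- Always release the file handle, even if extraction fails. --->
        <cfset inStream.close()>
    </cffinally>
</cftry>
```

POI has comparable extractor classes for the other Office formats. Their names and constructors vary by POI version, so check the documentation for the version you have installed.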

