Extract Text from PowerPoint Presentation in C#, VB.NET

PowerPoint presentations are a common working format, but sometimes their content is needed in another format. You may, for example, want to extract the text from a PowerPoint presentation so that it can be reused in applications such as Microsoft Word or WordPad, where plain text takes far less space. Spire.Presentation for .NET provides the slide.GetAllTextFrame() method, which lets you extract text from tables, text boxes, shapes, shape groups, and symbols, so you can extract the text of a whole PowerPoint presentation. This article shows how to extract text from a PowerPoint presentation in C# and VB.NET in the following two parts.

Extract Text from PowerPoint Presentation to WordPad

Extract Text from Whole PowerPoint Presentation

Install Spire.Presentation for .NET

To begin with, you need to add the DLL files included in the Spire.Presentation for .NET package as references in your .NET project. The DLL files can be either downloaded from this link or installed via NuGet.

PM> Install-Package Spire.Presentation

 

Extract Text from PowerPoint Presentation to WordPad

The following are the steps to perform this operation:

•  Create a new instance of Presentation and load the sample PowerPoint file.

•  Initialize a new instance of the StringBuilder class and append the text extracted from the presentation to it.

•  Create a new .txt file and write the extracted text to it.

Full code:

[C#]

using Spire.Presentation;
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Load the sample PowerPoint file
            Presentation presentation = new Presentation("sample.pptx", FileFormat.Pptx2010);
            StringBuilder sb = new StringBuilder();
            //Collect the text of every paragraph in every auto shape
            foreach (ISlide slide in presentation.Slides)
            {
                foreach (IShape shape in slide.Shapes)
                {
                    if (shape is IAutoShape)
                    {
                        foreach (TextParagraph tp in (shape as IAutoShape).TextFrame.Paragraphs)
                        {
                            sb.Append(tp.Text + Environment.NewLine);
                        }
                    }
                }
            }
            //Write the text to a .txt file and open it
            File.WriteAllText("target1.txt", sb.ToString());
            Process.Start("target1.txt");
        }
    }
}

[VB.NET]

Imports Spire.Presentation
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Text
Namespace ExtractText
    Class Program
        Private Shared Sub Main(args As String())
            'Load the sample PowerPoint file
            Dim presentation As New Presentation("sample.pptx", FileFormat.Pptx2010)
            Dim sb As New StringBuilder()
            'Collect the text of every paragraph in every auto shape
            For Each slide As ISlide In presentation.Slides
                For Each shape As IShape In slide.Shapes
                    If TypeOf shape Is IAutoShape Then
                        For Each tp As TextParagraph In TryCast(shape, IAutoShape).TextFrame.Paragraphs
                            sb.Append(tp.Text & Environment.NewLine)
                        Next
                    End If
                Next
            Next
            'Write the text to a .txt file and open it
            File.WriteAllText("target1.txt", sb.ToString())
            Process.Start("target1.txt")
        End Sub
    End Class
End Namespace
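
One caveat for the Process.Start call in the listings above: on .NET Core and .NET 5 or later, ProcessStartInfo.UseShellExecute defaults to false, so passing a document path directly tends to fail with a Win32Exception instead of opening the file. A minimal sketch of a workaround follows, using only the base class library. TextExport, Save, and OpenWithDefaultApp are hypothetical names introduced for illustration; they are not part of Spire.Presentation.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper, not part of Spire.Presentation.
public static class TextExport
{
    // Writes the text to the given path (UTF-8 by default)
    // and returns the absolute path of the written file.
    public static string Save(string path, string text)
    {
        string fullPath = Path.GetFullPath(path);
        File.WriteAllText(fullPath, text);
        return fullPath;
    }

    // Opens the file with the application registered for its extension.
    // UseShellExecute = true asks the operating system shell to resolve
    // the file association instead of treating the path as an executable.
    public static void OpenWithDefaultApp(string fullPath)
    {
        var startInfo = new ProcessStartInfo(fullPath) { UseShellExecute = true };
        Process.Start(startInfo);
    }
}
```

With such a helper, the last two statements of Main could become TextExport.OpenWithDefaultApp(TextExport.Save("target1.txt", sb.ToString()));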
 

The input PowerPoint document:


The output text file:


Extract Text from Whole PowerPoint Presentation

The following are the steps to perform this operation:

•  Create a new instance of Presentation and load the sample PowerPoint file.

•  Instantiate a StringBuilder object.

•  Use the slide.GetAllTextFrame() method to get the text content of each slide and append it to the StringBuilder.

•  Write the extracted text to a .txt file and save it to a local path.

Full code:

[C#]

using Spire.Presentation;
using System;
using System.Collections;
using System.IO;
using System.Text;

namespace ExtractText
{
    class Program
    {
        static void Main(string[] args)
        {
            //Create a PPT document
            Presentation ppt = new Presentation();

            //Load the PPT document
            ppt.LoadFromFile("Blue2.pptx", FileFormat.Pptx2010);

            //Instantiate a StringBuilder object
            StringBuilder sb = new StringBuilder();

            foreach (ISlide slide in ppt.Slides)
            {
                ArrayList arrayList = slide.GetAllTextFrame();
                foreach (string text in arrayList)
                {
                    Console.Write(text);
                    sb.Append(text + Environment.NewLine);
                }
            }

            //Write the extracted text in .txt and save it to a local path
            File.WriteAllText("target.txt", sb.ToString());
        }
    }
}

[VB.NET]

Imports Spire.Presentation
Imports System
Imports System.Collections
Imports System.IO
Imports System.Text

Namespace ExtractText
    Class Program
        Private Shared Sub Main(ByVal args() As String)
            'Create a PPT document
            Dim ppt As Presentation = New Presentation()

            'Load the PPT document
            ppt.LoadFromFile("Blue2.pptx", FileFormat.Pptx2010)

            'Instantiate a StringBuilder object
            Dim sb As StringBuilder = New StringBuilder()

            For Each slide As ISlide In ppt.Slides
                Dim arrayList As ArrayList = slide.GetAllTextFrame()
                For Each text As String In arrayList
                    Console.Write(text)
                    sb.Append(text & Environment.NewLine)
                Next
            Next

            'Write the extracted text in .txt and save it to a local path
            File.WriteAllText("target.txt", sb.ToString())
        End Sub
    End Class
End Namespace
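
The listings above write all text into one continuous file, so slide boundaries are lost. If they matter, the strings can be grouped per slide before writing. The following is a minimal sketch under stated assumptions: SlideTextFormatter and Format are hypothetical names, not part of Spire.Presentation, and the input is assumed to be one sequence of strings per slide, which is what the foreach over the result of slide.GetAllTextFrame() in the listing suggests.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Spire.Presentation.
public static class SlideTextFormatter
{
    // slides: one sequence of extracted strings per slide, in presentation order.
    // Returns a single string in which each slide's text follows a numbered header.
    public static string Format(IEnumerable<IEnumerable<string>> slides)
    {
        var sb = new StringBuilder();
        int slideNumber = 0;
        foreach (IEnumerable<string> texts in slides)
        {
            slideNumber++;
            sb.AppendLine("--- Slide " + slideNumber + " ---");
            foreach (string text in texts)
            {
                // Skip entries that are null, empty, or whitespace only.
                if (string.IsNullOrWhiteSpace(text))
                {
                    continue;
                }
                sb.AppendLine(text.Trim());
            }
        }
        return sb.ToString();
    }
}
```

To feed it, one could collect slide.GetAllTextFrame().Cast<string>().ToList() for each slide into a List<List<string>> (this needs using System.Linq) and pass that list to Format before calling File.WriteAllText.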

 

The input PowerPoint document:

The output text file:


Conclusion:

In this article, we introduced how to extract text from a PowerPoint presentation. Other related functions are also available, such as Extract Text from a Specific Rectangular Area, Extract Image From PDF, and Extract Comments from Word Document and Save in TXT File. If you'd like to learn more, you can visit the Spire.Doc Program Guide Content for .NET to explore more about Spire.Doc for .NET.

