I need a VBA macro that will correctly break a paragraph into sentences.
At first sight, this seems simple:
Dim p As [login to view URL]
Dim i As Integer
Dim s As [login to view URL]
For i = 1 To [login to view URL]
s = [login to view URL](i)
Next
But this does not work correctly, as can be seen by some testing on fairly typical text, such as:
"This is some text. This is the second sentence. Some problems are:"
[maybe then followed by bullet points.]
The number of sentences and other statistics in such a paragraph do not match those shown by the Word 'statistics' tool (when that paragraph is selected).
I need a robust algorithm to do this. It should ideally work fairly efficiently, as I wish to use it on quite large corpora of documents.
The code can be written in Word VBA, Access VBA (my planned target environment), or even C++/C# (in which case I would rewrite my process to export an XML file).
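For illustration only, one possible starting point for the kind of algorithm requested above is a rule-based splitter that works on the plain text of a paragraph. The C++ sketch below makes several assumptions: the function name `splitSentences` is hypothetical, a sentence is taken to end at `.`, `!` or `?` followed by whitespace or the end of the text, and trailing text with no terminator (such as "Some problems are:") is counted as a sentence in its own right. It is not claimed to reproduce the counts shown by Word's statistics tool, and it does not handle abbreviations such as "e.g.".

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not part of any Word API): split plain paragraph
// text into sentences using simple punctuation rules.
// Assumption: a sentence ends at '.', '!' or '?' (plus any closing quotes
// or brackets) when followed by whitespace or the end of the text.
std::vector<std::string> splitSentences(const std::string& text) {
    auto isSpace = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    auto isTerminator = [](char c) { return c == '.' || c == '!' || c == '?'; };
    auto isCloser = [](char c) { return c == '"' || c == '\'' || c == ')'; };

    std::vector<std::string> sentences;
    const std::size_t n = text.size();

    // Skip leading whitespace so the first sentence starts on a real character.
    std::size_t start = 0;
    while (start < n && isSpace(text[start])) ++start;

    std::size_t i = start;
    while (i < n) {
        if (!isTerminator(text[i])) {
            ++i;
            continue;
        }
        // Absorb runs such as "?!" or ".)" into the same sentence.
        std::size_t end = i + 1;
        while (end < n && (isTerminator(text[end]) || isCloser(text[end]))) ++end;

        if (end == n || isSpace(text[end])) {
            // Genuine boundary: emit the sentence and skip inter-sentence spaces.
            sentences.push_back(text.substr(start, end - start));
            start = end;
            while (start < n && isSpace(text[start])) ++start;
            i = start;
        } else {
            // Punctuation inside a token (e.g. "3.14"): not a boundary.
            i = end;
        }
    }

    // Assumption: unterminated trailing text (e.g. a lead-in ending with ':')
    // still counts as one sentence. Trailing whitespace is trimmed.
    if (start < n) {
        std::size_t end = n;
        while (end > start && isSpace(text[end - 1])) --end;
        sentences.push_back(text.substr(start, end - start));
    }
    return sentences;
}
```

Called on the sample text "This is some text. This is the second sentence. Some problems are:", this sketch yields three items, with the unterminated lead-in as the third. The function makes a single pass over the text, which matters when processing large corpora; whether its rules agree with Word's own statistics would still need to be checked.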
This is probably not a large amount of code, once you have seen the problem and a way of solving it. But I am just 'stuck' at the moment.
The payment to be agreed would depend on the efficiency (speed) of an implementation. I already have a very slow approach which works fine (functionally), but takes an unacceptably long time.
## Deliverables
1) Complete and fully-functional working code as complete source code of all work done.
2) Documentation of the source code, and of the Word object model (or Word file structures) used.
3) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) As VBA code under Word or Access (or see above).
4) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
## Platform
Office 2003 under XP (but the OS is not so critical). The main thing is the Word object model.