The code below can extract HTML text from an MHT file:
Sent by FoxWeb Support on 01/24/2007 08:27:22 PM:
By the way, if you index .mht files, your index will contain not only the page text but also other "words" that are included in .mht files (such as MIME headers and encoded resource data), because the Full Text class is not aware of this format and does not know how to filter out all the garbage.
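An .mht file is a MIME multipart/related container, which is why raw MHT content brings headers and encoded data into the index. The following is a minimal Python sketch, not FoxWeb's own code, that uses the standard library `email` and `html.parser` modules to pull the first `text/html` part out of an MHT document and reduce it to plain text suitable for indexing. It assumes the page itself is the first `text/html` part and ignores frames and embedded resources; the names `mht_to_text` and `_TextExtractor` are hypothetical.

```python
import email
import re
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self._skip = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def mht_to_text(mht_source: str) -> str:
    """Return plain text from the first text/html part of an MHT document.

    Assumption: the saved page is the first text/html part; returns "" if none.
    """
    msg = email.message_from_string(mht_source, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes base64 / quoted-printable and applies the charset
            html = part.get_content()
            break
    else:
        return ""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from markup and line breaks
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()
```

The returned string could be stored in a separate memo field and indexed in place of the raw MHT content, so that only page text reaches the index.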
Sent by jason williams on 01/24/2007 03:03:46 PM:
I am saving HTML pages from a blog site to a memo field in a local database. Originally, I saved each HTML page as an .mht file using CDO. For whatever reason, the Full Text tool cannot successfully index the .mht content; at some point it gives a data type error. The workaround I was trying to implement was to save just the page text (i.e., the innerText obtained through automation) and to index and search on that. Index creation works and simple searches work, but as the number of records increases, some searches take forever (though they do eventually finish). Are there memo file size limitations in this tool, or other memory limitations of some kind? When I test the sample DBF, everything seems to work perfectly, which would seem to indicate that the problem lies either with the format I'm using or with its size.
I can send a small zip file to you that has examples of what I'm trying to describe.
Ideally, I would be able to index and search on just the .mht memo field. This would save quite a bit of space, so it would be great if there were a fix for that.