Microsoft's .mht files contain not just HTML code, but also base64-encoded versions of all images, as well as JavaScript, style sheet, and other files referenced by the page.  It's possible that the base64 parts, with their large continuous data blocks, are confusing the Full Text class, but we would have to verify this.  Please send the file you mentioned to support@foxweb.com.
 
By the way, if you index .mht files, your index will contain not only page text, but also other "words" that appear in the files (such as base64 data and script or style sheet code), because the Full Text class is not aware of this format and does not know how to filter out the garbage.
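
As a rough illustration of what filtering out that garbage could look like, here is a minimal sketch in Python (not FoxPro, and not part of FoxWeb or the Full Text class). It assumes the .mht file is a standard MIME multipart message, keeps only the text/html parts, and drops tags, scripts, and style sheets, so that only the page text would be stored in the memo field and indexed. The names `mht_to_text` and `_TextExtractor` are hypothetical helpers made up for this example.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or style code.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def mht_to_text(raw_bytes):
    """Return the page text of an .mht file given as bytes (hypothetical helper)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    texts = []
    for part in msg.walk():
        # Images and other attachments (the base64 blocks) are skipped here.
        if part.get_content_type() == "text/html":
            parser = _TextExtractor()
            parser.feed(part.get_content())  # decoded per the part's encoding
            parser.close()
            texts.append(" ".join(parser.chunks))
    return "\n".join(texts)
```

The result of `mht_to_text(open("page.mht", "rb").read())` (file name assumed) would then be the value to index, while the original .mht data could still be kept in a separate field for display.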
FoxWeb Support Team
support@foxweb.com
Sent by jason williams on 01/24/2007 03:03:46 PM:
I am saving HTML pages from a blog site to a memo field in a local database.  Originally, I saved the HTML file as a .mht file using CDO.  For whatever reason, the full text tool cannot successfully index the .mht file; it gives a data type error at some point.  The workaround I was trying to implement involved saving just the HTML text (ie automation innertext) and indexing and searching on that.  The index creation works and simple searches work, but as the number of records increases, some searches take forever (but do eventually finish).  Are there memo file size limitations to this tool, or other memory limitations of some type?  When I test the sample dbf, everything seems to work perfectly, which would seem to indicate it's a problem either with the format I'm using or with its size.
I can send a small zip file to you that has examples of what I'm trying to describe.
Ideally, I would be able to index and search on just the .mht memo field.  This would save quite a bit of space, so it would be great if there were a fix for that.
Thanks!!