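Tuesday, January 29, 2008
"The X-Files" and "Friends" scripts as CHM
http://public.box.net/bamanzi
I originally registered this account just to upload the clippings I had saved in ScrapBook: on the ScrapBook website I came across ScrapBox.net, an extension for the extension.
The web pages are a bit slow, but uploads and downloads are not too bad.
From now on, non-technical material and larger files will go here. Technical material stays at http://bamanzi.inlsd.org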
Thursday, August 2, 2007
Made a CHM mirror of xulplanet
The first one finished is XULPlanet:
http://bamanzi.inlsd.org/xul/xulplanet.chm

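I wrote a small Python script (html2hhk.py) that collects every XUL element attribute/method and every XPCOM component/interface and converts them into the CHM index. (What the script actually does is read each HTML page's title and keywords meta tag and use them as index keywords; with small changes it could also output a devhelp keyword list.)
This CHM file still has a few problems:
1. There is no table of contents yet. At least the major categories should be listed, but there seems to be no easy way to do this.
2. Every content page has a navigation sidebar on the left. It is useless inside a CHM, so it needs to be stripped in bulk with sed or a similar tool.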
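The html2hhk.py script itself is not included in this post. As a rough sketch of the idea described above (read the title and the keywords meta tag of a page, emit one index entry per keyword), something like the following Python 3 code could work. The class name, the function name and the sample page are my own assumptions for illustration, not the original script.

```python
from html import escape
from html.parser import HTMLParser


class TitleKeywordParser(HTMLParser):
    """Collect the <title> text and the keywords <meta> tag of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def hhk_entries(path, html_text):
    """Return one HHK <LI> entry for the title and for each keyword."""
    parser = TitleKeywordParser()
    parser.feed(html_text)
    parser.close()
    names = [parser.title.strip()] + parser.keywords
    entry = ('<LI> <OBJECT type="text/sitemap">\n'
             '<param name="Name" value="%s">\n'
             '<param name="Local" value="%s">\n'
             '</OBJECT>\n')
    return [entry % (escape(n, quote=True), escape(path, quote=True))
            for n in names if n]
```

The entries would still have to be wrapped in the usual HHK header and footer before HtmlHelp Workshop accepts the file.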
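Tuesday, August 2, 2005
Tracking down the gnochm problems
The index sorting problem is easy to fix: just replace the model of the index TreeView with a sortable one. The patch is shown below.
The incomplete index is not really gnochm's fault: some keywords contain illegal characters that make HTML parsing fail.
The topic and index files inside a CHM are in sitemap format, which is carried in HTML.
gnochm is written in Python, so it naturally uses the HTMLParser class to parse these files. But when it hits an illegal token (for example The "link-selected" signal; note that the quotation marks are not legal here), nothing after it can be read, so many keywords are lost. xchm, by contrast, ignores the error and keeps parsing.
I will look into the failure to jump to anchors tomorrow; I am not sure whether it is a gtkhtml2 problem.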
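To illustrate the quoting problem, here is a small Python 3 sketch that pulls the param values out of a sitemap with a regular expression instead of an HTML parser. This is neither gnochm's code nor the SGMLParser fix; it is only one tolerant alternative, and it assumes one param tag per line. The file name in the usage example is made up.

```python
import re

# Assumes one <param ...> tag per line. The greedy ".*" runs to the LAST
# double quote before ">", so quotes embedded in the value do not cut it short.
PARAM_RE = re.compile(r'<param\s+name="([^"]*)"\s+value="(.*)"\s*/?>',
                      re.IGNORECASE)


def read_params(sitemap_text):
    """Return (name, value) pairs for every <param> line in a sitemap."""
    pairs = []
    for line in sitemap_text.splitlines():
        match = PARAM_RE.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    sample = ('<LI> <OBJECT type="text/sitemap">\n'
              '<param name="Name" value="The "link-selected" signal">\n'
              '<param name="Local" value="example.html#link-selected">\n'
              '</OBJECT>\n')
    for name, value in read_params(sample):
        print(name, "=", value)
```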
--------- 8< ---------------------
[bamanzi@saynomdk ~]$ diff -Nurp /usr/bin/gnochm gnochm
--- /usr/bin/gnochm 2005-03-18 09:27:00.000000000 +0800
+++ gnochm 2005-08-02 23:23:35.000000000 +0800
@@ -811,11 +811,13 @@ class MainApp:
         # Index
         self.imodel = gtk.TreeStore(gobject.TYPE_STRING,
                                     gobject.TYPE_STRING)
+        self.isortmodel = gtk.TreeModelSort(self.imodel)
         self.indexview = self.xml.get_widget('IndexTView')
-        self.indexview.set_model(self.imodel)
+        self.indexview.set_model(self.isortmodel)
         cell2 = gtk.CellRendererText()
         column2 = gtk.TreeViewColumn('Index', cell2, text=0)
         self.indexview.append_column(column2)
+        self.isortmodel.set_sort_column_id(0, gtk.SORT_ASCENDING)
         # Search
         self.smodel = gtk.ListStore(gobject.TYPE_STRING,
                                     gobject.TYPE_STRING)
Comments for post
HTMLParser
nick | 03/08/2005, 13:54
Use SGMLParser then. More fault tolerant.
test with sgml
nick | 03/08/2005, 13:59
$ python /usr/lib/python2.3/sgmllib.py sitemap.html
Shows that no big deal for sgml.
sorted list
nick | 03/08/2005, 14:02
For sorted list, I'd rather sort the list before feeding to the treemodel. Should be faster for big list. Treeview is already slow in itself.
But let the treemodel do the sorting is simpler.
Re: SGMLParser
bamanzi | 03/08/2005, 17:41
Really, SGMLParser works!
Thanks!
Monday, August 1, 2005
xchm is still the best (was Re: CHM viewers summary)
Saturday, July 30, 2005
glib2,gtk2,pygtk2 reference in CHM format
I modified devhelp2chm a little to work around a problem I found when using gnochm to read the CHM files it generates.
I also updated two CHM files generated by it:
- glib2/gtk2: gtk-2.6.8, glib-2.6.5, including FAQ and tutorial (source: packages libgtk2.0-doc, libglib2.0-doc)
- pygtk2 reference: pygtk2ref 2.6.0, plus GtkSourceView, GtkSpell, GnomePrint, GnomePrintUI, GtkMozembed (source: pygtk2 website)
----------------
How to build:
gtk2.chm:
apt-get install libglib2.0-doc libgtk2.0-doc
cd /usr/share/gtk-doc/html/
DIRS="gtk/ gdk/ gdk-pixbuf/ gtk-faq/ gtk-tutorial/ glib/ gobject/"
find $DIRS -name '*.devhelp.gz' | xargs gunzip
for d in $DIRS; do
  (cd $d;
   echo $d
   for f in *.html; do
     sed 's#/usr/share/gtk-doc/html/#../#' $f > $f.tmp
     mv $f.tmp $f
   done)
done
find $DIRS -name '*.devhelp' | xargs ~/bin/devhelp2chm-v2.sh \
  -p gtk2 -T "GTK+ Reference Manual" -t gtk/index.html
...Then use HtmlHelp Workshop to build it.
pygtk2ref.chm:
wget http://www.pygtk.org/dist/pygtk2reference.tbz2
tar jxf pygtk2reference.tbz2
find . -name '*.devhelp' | xargs ~/bin/devhelp2chm-v2.sh \
  -p pygtk2ref -T "PyGTK2 Reference" -t pygtk2reference/index.html
Saturday, June 25, 2005
Debian Reference CHM version (link?)
Created from Debian Reference 06/22/05.
I wrote a simple script to create the HtmlHelp project files.
Saturday, June 18, 2005
Summary: CHM viewers
I recently found a few more, so I might as well summarize them. (Nearly all are based on chmlib.)
| Viewer | Requires | CJK support | Project Description | Comments |
| xchm | wxWidgets, chmlib | fair | "xCHM is a .chm viewer for UNIX (Linux, *BSD, Solaris). Success stories of xCHM on Mac OS X have also been received, and apparently xCHM even works if compiled under the Cygwin environment in Windows." | |
| gnochm | GNOME, python-gnome | good | | |
| chmsee | gtk2, gnome-vfs2, gtkhtml3 | "Simplified Chinese and English encodings only" | "ChmSee is a program for browsing CHM files, but it only supports CHM files in Simplified Chinese and English encodings; other encodings are not supported yet." | Developed by a Chinese author (who forgot to put his name on the home page :0) |
| arCHMage | chmlib, python | good | | Actually it is not a real viewer. It is an HTTP server. You need a web browser to view the pages. |
| kio_chm | KDE3, chmlib | good | | kio_chm is a plugin for KDevelop, but once it is installed you can view CHM files in Konqueror. |
| kchm | chmlib, KDevelop3 (kio_chm), Qt3 | ? | "KCHM provides access to MS .chm files (help files) using Chmlib and Qt and KDE libraries. You can read your favourite ebooks on your Linux box!" | Just a UI front end for kio_chm. UI written in Qt3. |
| kchmnew | KDE | ? | "This is a chm file viewer + corresponding kpart and kio slave for KDE. It based on libchm and libchm++." | |
| kchmviewer | chmlib, Qt3 | good | "kchmviewer is a CHM (Winhelp) files viewer written on Qt/KDE. It can be build as a standalone Qt-based application, or a KDE application. The main point of kchmviewer is compatibility with non-English chm files, including most international charsets." | |
| chmviewer | wxGTK, libmspack | | | Dead project? Seems no longer active. |
| chm_viewer | ? | | | Another chmviewer. Dead project? |
I prefer gnochm, as its UI fits better into the GNOME desktop. For a minimalist who also needs CJK support, xchm and kchmviewer seem to be good choices. If you don't care about the UI, choose archmage.
Where to download CHM books for GNU tools:
- http://lidn.sourceforge.net
- http://htmlhelp.berlios.de/ (CHM books just updated on May 31)
How to:
- How to convert DevHelp books into CHM format (devhelp2chm, written by myself :-)
- How to convert CHM book into DevHelp format
- How to convert Texinfo documents into CHM format
- How to compile CHM file in Linux (with Wine + HHW)
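Saturday, April 30, 2005
Friends scripts, CHM version
The file may not stay up for long, so download it soon if you want it.

It was converted with scripts (see the earlier posts I, II, III) that I wrote as practice while working through the HTML Processing chapter of Dive Into Python. (The original Word document)
P.S. The script from Part II has a problem: I later found that some of the later episodes are not separated by <hr>. Splitting on "End" and "The End" is more accurate, but a few episodes still have to be split by hand. The updated script is in the download area as friends-split2.py.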
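friends-split2.py is not reproduced here. The "End" / "The End" idea can be sketched in Python 3 as below, under the assumption that the text has already been reduced to plain lines (the real script worked on the HTML). The regular expression and the function name are illustrative only.

```python
import re

# A line consisting only of "End" or "The End" (any case, optional trailing
# punctuation) is treated as the end of an episode.
END_MARKER = re.compile(r"^\s*(the\s+)?end[.!]*\s*$", re.IGNORECASE)


def split_episodes(lines):
    """Group lines into episodes, closing each one at an end marker."""
    episodes, current = [], []
    for line in lines:
        current.append(line)
        if END_MARKER.match(line):
            episodes.append(current)
            current = []
    if any(l.strip() for l in current):
        episodes.append(current)  # trailing text without an end marker
    return episodes
```

Episodes that end without either marker end up merged into the next one, which matches the remark above that a few still need to be split by hand.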
Friday, April 8, 2005
Convert CHM contents to normal HTML contents
I have some eBooks in CHM or SRM format that I want to copy to my cellphone. Since it supports neither CHM nor SRM, I chose the PalmDoc (.pdb) format.
Yes, I can convert a set of HTML files into one PDB file. But:
1) Some CHM books don't have a contents page; they rely on the CHM contents pane. Without a contents page, browsing the resulting PDB file is not a happy experience.
2) SRM books can be exported as CHM files, but none of them has a contents page either.
Hence this simple recipe.
I remember that two or three years ago I used to do these things in Perl. Perl's regex support is very powerful. The only problem is that after a few days the script becomes unreadable. :-(
Python is different from Perl. This recipe is so simple, isn't it?
#!/usr/bin/env python
# Python 2 script: sgmllib was removed in Python 3.
from sgmllib import SGMLParser


class SiteMapParser(SGMLParser):
    """Walk a sitemap file (.hhc/.hhk) and report every link found."""

    def reset(self):
        SGMLParser.reset(self)
        # some temp variables
        self.level = 0
        self.link_url = ""
        self.link_title = ""

    def start_ul(self, attrs):
        self.on_section_starts()

    def end_ul(self):
        self.on_section_ends()

    def start_param(self, attrs):
        # expects <param name="Name|Local" value="...">
        if len(attrs) > 1:
            if attrs[0][0] == 'name':
                if attrs[0][1] == 'Name':
                    self.link_title = attrs[1][1]
                elif attrs[0][1] == "Local":
                    self.link_url = attrs[1][1]

    def start_object(self, attrs):
        self.link_title = ""
        self.link_url = ""

    def end_object(self):
        self.on_link_found(self.link_title, self.link_url)

    def on_section_starts(self):
        self.level = self.level + 1

    def on_section_ends(self):
        self.level = self.level - 1

    def on_link_found(self, title, url):
        # you can override this
        if title and url:
            print " " * self.level + "%s [%s]" % (title, url)


class ContentParser(SiteMapParser):
    """ A simple class to convert CHM contents (foo.hhc) to a normal HTML contents """

    def reset(self):
        # reset() is called by the constructor, so the header is printed once
        print "<HTML><HEAD></HEAD><BODY>"
        SiteMapParser.reset(self)

    def on_section_starts(self):
        print "<ul>"

    def on_section_ends(self):
        print "</ul>"

    def on_link_found(self, title, url):
        # skip objects without both a title and a target (e.g. site properties)
        if title and url:
            print '<li><a href="%s">%s</a></li>' % (url, title)


if __name__ == '__main__':
    import sys
    if len(sys.argv) < 2:
        print "Usage: %s foo.hhc" % sys.argv[0]
        sys.exit()
    trans = ContentParser()
    fh = open(sys.argv[1], "r")
    try:
        trans.feed(fh.read())
    except:
        # keep whatever was parsed before the error
        pass
    trans.close()
    fh.close()
    print "</BODY></HTML>"

# vim:expandtab softtabstop=4
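Since sgmllib no longer exists in Python 3, here is a hedged sketch of the same recipe on top of the standard html.parser module. It is a swapped-in parser, not a line-by-line port: it collects the output in a list instead of printing, and the class and function names are my own.

```python
from html import escape
from html.parser import HTMLParser


class HhcToHtml(HTMLParser):
    """Turn HHC sitemap markup into a nested HTML list of links."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.title = self.url = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            self.out.append("<ul>")
        elif tag == "object":
            self.title = self.url = ""
        elif tag == "param":
            name = (attrs.get("name") or "").lower()
            if name == "name":
                self.title = attrs.get("value") or ""
            elif name == "local":
                self.url = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "ul":
            self.out.append("</ul>")
        elif tag == "object" and self.title and self.url:
            self.out.append('<li><a href="%s">%s</a></li>'
                            % (escape(self.url, quote=True), escape(self.title)))


def hhc_to_html(hhc_text):
    """Return a complete HTML contents page for the given HHC text."""
    parser = HhcToHtml()
    parser.feed(hhc_text)
    parser.close()
    return "\n".join(["<html><head></head><body>"] + parser.out
                     + ["</body></html>"])
```

Usage would be along the lines of `print(hhc_to_html(open("foo.hhc").read()))`, with the encoding passed to open() as the book requires.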
Powered by ScribeFire.
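Monday, March 28, 2005
Got hold of an electronic copy of the book Knoppix Hacks
Thursday, March 10, 2005
QuickCHM is a pretty good tool
Had I found it earlier, I would not have needed to write Python scripts to extract the HTML titles and generate the hhp/hhc files myself.
Building the CHM versions of the X Files and Friends scripts would have been much easier too.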
Old posts:
How to convert Friends.doc into a CHM file
Create a CHM file for PyGTK2 Tutorial
Thursday, November 18, 2004
How to convert Friends.doc into a CHM file (1)
1. Open Friends.doc with MS Word and save it as an HTML file (friends.html). I used Office XP, and because of the garbage markup MS Word adds, the output file is about 28 MB!
2. Write a simple script (friends-diet.html) to get rid of the garbage attributes generated by evil MS Word, including 'class', 'style' and 'lang'.
This cuts the size by about 75%!
#!/usr/bin/env python
# Python 2 script: sgmllib was removed in Python 3.
from sgmllib import SGMLParser
import htmlentitydefs
import sys


class FriendsDiet(SGMLParser):
    """Copy friends.htm to friends-thin.html without Word's extra markup."""

    def reset(self):
        self.output = open("friends-thin.html", "w")
        SGMLParser.reset(self)

    def unknown_starttag(self, tag, attrs):
        if tag == 'p':
            # write a bare <p>, dropping all of its attributes
            self.output.write("<p>\n")
        elif tag == 'span' or tag == 'o':
            pass
        elif tag == 'o:SmartTagType' or tag == 'SmartTagType':
            print "Ignore", tag
        else:
            # keep the tag but drop class/style/lang/xmlns* attributes
            strattrs = ""
            for key, value in attrs:
                if (key != 'class' and key != 'style' and key != 'lang'
                        and key[0:5] != 'xmlns'):
                    strattrs = strattrs + ' %s="%s"' % (key, value)
            self.output.write("<%s%s>" % (tag, strattrs))
            if tag == 'body':
                self.output.write('\n')

    def unknown_endtag(self, tag):
        if tag != 'span' and tag != 'p' and tag[0:2] != 'o:':
            self.output.write("</%s>\n" % tag)

    def handle_data(self, text):
        if text.strip() != '':
            self.output.write(text + "\n")

    def handle_charref(self, ref):
        self.output.write("&#%s;" % ref)

    def handle_entityref(self, ref):
        # only known entities get their terminating semicolon back
        semicolon = ""
        if htmlentitydefs.entitydefs.has_key(ref):
            semicolon = ";"
        self.output.write("&%s%s" % (ref, semicolon))


if __name__ == '__main__':
    parser = FriendsDiet()
    #fh=open(sys.argv[1], "r")
    fh = open("friends.htm", "r")
    content = fh.read()
    parser.feed(content)
    fh.close()
    parser.close()
    parser.output.close()
Friday, November 5, 2004
Create a CHM file for PyGTK2 Tutorial
[1] http://www.pygtk.org/dist/pygtk2tutorial.tgz
The main problem is to parse the contents from index.html and write them into an HHC file for MS HtmlHelp Workshop.
You can use this simple script to do this (see below):
pygtk-index2hhc.py pygtk2tut.hhc index.html
P.S.
As for the PyGTK2 Reference [2], you can download the devhelp index file [3] and then use my devhelp2chm.sh [4] to get the .HHP, .HHC and .HHK files.
[2] http://www.pygtk.org/dist/pygtk2reference.tbz2
[3] http://www.moeraki.com/pygtkreference/pygtk2reference.devhelp.gz
[4] http://www.linuxeden.com/forum/blog/resserver.php?blogId=110848&resource=devhelp2chm.sh
#!/usr/bin/env python
# Python 2 script: sgmllib was removed in Python 3.
from sgmllib import SGMLParser
import htmlentitydefs
from chmmaker import HHCWriter
import os


class ContentParser(SGMLParser):
    def __init__(self, outputfile):
        self.hhcwriter = HHCWriter(outputfile)
        self.hhcwriter.print_header()
        SGMLParser.__init__(self)

    def __del__(self):
        self.hhcwriter.print_footer()

    def reset(self):
        SGMLParser.reset(self)
        # some temp variables
        self.level = 0
        self.title = self.url = ""
        self.in_href = False

    def start_dd(self, attrs):
        self.hhcwriter.hhcfile.write("<UL>")

    def end_dd(self):
        self.hhcwriter.hhcfile.write("</UL>")

    def start_a(self, attrs):
        for attr in attrs:
            if attr[0].lower() == 'href':
                self.in_href = True
                self.title = ""
                self.url = attr[1]
                break

    def end_a(self):
        if self.in_href and self.title and self.url:
            self.on_href(self.title.replace('"', ""), self.url)
        self.in_href = False
        self.title = self.url = ""

    def handle_data(self, text):
        self.title += text

    def handle_charref(self, ref):
        if self.in_href:
            self.title += "&#%(ref)s;" % locals()

    def handle_entityref(self, ref):
        if self.in_href:
            self.title += "&%(ref)s" % locals()
            if htmlentitydefs.entitydefs.has_key(ref):
                self.title += ";"

    def on_href(self, title, url):
        target = url
        if url[-3:] == 'fig':
            target = ""
        if title and target:
            self.hhcwriter.add_topic(title, target)


if __name__ == '__main__':
    import sys
    if len(sys.argv) < 3:
        print "Usage: %s hhcfilename index.html" % sys.argv[0]
        sys.exit()
    trans = ContentParser(sys.argv[1])
    fh = open(sys.argv[2], "r")
    try:
        trans.feed(fh.read())
    except:
        pass
    trans.close()
    fh.close()

# vim:expandtab softtabstop=4
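The chmmaker module that provides HHCWriter is not shown in this post. Purely as an illustration of what such a writer has to emit, here is a hypothetical Python 3 stand-in. Only the method and attribute names (print_header, add_topic, print_footer, hhcfile) come from the calls in the script above; the header text is the usual HHC boilerplate, and taking an open file object instead of a file name is my own simplification.

```python
import io
from html import escape

HHC_HEADER = ('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">\n'
              '<HTML>\n<HEAD>\n</HEAD><BODY>\n<UL>\n')
HHC_FOOTER = '</UL>\n</BODY></HTML>\n'


class HHCWriter:
    """Hypothetical stand-in for chmmaker.HHCWriter (original not shown)."""

    def __init__(self, hhcfile):
        # Assumption: an open text file object; the script above passes a name.
        self.hhcfile = hhcfile

    def print_header(self):
        self.hhcfile.write(HHC_HEADER)

    def add_topic(self, title, url):
        # One sitemap object per topic; quotes in values are escaped.
        self.hhcfile.write(
            '<LI> <OBJECT type="text/sitemap">\n'
            '<param name="Name" value="%s">\n'
            '<param name="Local" value="%s">\n'
            '</OBJECT>\n' % (escape(title, quote=True), escape(url, quote=True)))

    def print_footer(self):
        self.hhcfile.write(HHC_FOOTER)


if __name__ == "__main__":
    buf = io.StringIO()
    writer = HHCWriter(buf)
    writer.print_header()
    writer.add_topic("Chapter 1", "ch1.html")
    writer.print_footer()
    print(buf.getvalue())
```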
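Monday, September 20, 2004
Undocumented options when building CHM files
Adding the MSDN menu:
In the window definition, add 0x10000 to the style parameter (the first 0x.... value), e.g. 0x23520 -> 0x33520
Adding the Fonts button:
In the window definition, add 0x100000 to the buttons parameter (the second 0x.... value), e.g. 0x24385e -> 0x34385e
Toolbar buttons without text labels:
In the window definition, add 0x40 to the style parameter (the first 0x.... value), e.g. 0x23520 -> 0x23560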
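Toolbar button flags (entries in parentheses are hidden options, some of them possibly obsolete; at least the current HtmlHelp Workshop does not offer them):
Hide/Show 0x0002
Back 0x0004
Forward 0x0008
Stop 0x0010
Refresh 0x0020
Home 0x0040
(Next) 0x0080 next step, purpose unknown
(Prev) 0x0100 previous step, purpose unknown
(Notepad) 0x0200 notes, seems to have no effect
(Contents) 0x0400 contents, seems to have no effect
Locate 0x0800
Options 0x1000
Print 0x2000
(Index) 0x4000 index, seems to have no effect
(Search) 0x8000 search, seems to have no effect
(History) 0x010000 history, seems to have no effect
(Bookmark) 0x020000 bookmarks, seems to have no effect
Jump1 0x040000
Jump2 0x080000
(Fonts) 0x100000 fonts
(Next) 0x200000 next step, purpose unknown
(Prev) 0x400000 previous step, purpose unknown
The only useful one seems to be the "Fonts" button.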
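The hex arithmetic above is easy to get wrong by hand, so here is a small Python 3 sketch that sets a flag and decodes a buttons value back into names. The table simply repeats the values listed in this post; the "#2" suffixes and the function names are my own labels, not official constant names.

```python
# Button flag values as listed in the post; names are illustrative labels.
BUTTONS = {
    0x000002: "Hide/Show", 0x000004: "Back", 0x000008: "Forward",
    0x000010: "Stop", 0x000020: "Refresh", 0x000040: "Home",
    0x000080: "(Next)", 0x000100: "(Prev)", 0x000200: "(Notepad)",
    0x000400: "(Contents)", 0x000800: "Locate", 0x001000: "Options",
    0x002000: "Print", 0x004000: "(Index)", 0x008000: "(Search)",
    0x010000: "(History)", 0x020000: "(Bookmark)", 0x040000: "Jump1",
    0x080000: "Jump2", 0x100000: "(Fonts)", 0x200000: "(Next) #2",
    0x400000: "(Prev) #2",
}


def add_flag(value, flag):
    """Set a flag bit; OR is used so setting it twice is harmless."""
    return value | flag


def decode_buttons(value):
    """List the names of all button flags set in value, lowest bit first."""
    return [name for bit, name in sorted(BUTTONS.items()) if value & bit]


if __name__ == "__main__":
    print(hex(add_flag(0x24385e, 0x100000)))
    print(decode_buttons(0x34385e))
```

Using a bitwise OR instead of plain addition gives the same result for the examples above, but does not corrupt the value if the flag is already set.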
Labels: chm