Discussion:
Single source strategies / Document conversion?
Michael Kesper
2014-07-17 12:11:50 UTC
Hi all,

I hope some of you have experience with this subject:
A department wants to unify its documentation. What exists today are 500+
MS Word documents, all written using the same template.
What they are looking for: a way to generate different documents from
single sources (user, teacher, support, in-line help for programs) and a
"database" that enables access via search, index and keywords.
This sounds very much like a jack of all trades to me, but maybe some of
you know a feasible approach?

Best wishes
Michael
Paul Hänsch
2014-07-17 12:57:33 UTC
Hi Michael,

LibreOffice provides a headless mode in which it can convert between
different document formats.
Apart from that, .odt as well as .docx files are just zip containers
which you can unpack to work with the XML they contain.
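As a minimal sketch of the headless conversion, the Python script below shells out to LibreOffice once per file. It assumes the LibreOffice binary is reachable as `soffice` on the PATH; the function names and the folder names in the usage line are hypothetical.

```python
import subprocess
from pathlib import Path


def build_soffice_command(source: Path, target_format: str, outdir: Path) -> list[str]:
    """Build a LibreOffice headless conversion command line.

    target_format is passed to --convert-to, e.g. "odt" or "pdf".
    """
    return [
        "soffice",       # assumption: LibreOffice is on PATH under this name
        "--headless",    # run without opening any window
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert_all(sources: list[Path], target_format: str, outdir: Path) -> None:
    """Convert every source document, stopping at the first failing call."""
    outdir.mkdir(parents=True, exist_ok=True)
    for source in sources:
        subprocess.run(build_soffice_command(source, target_format, outdir), check=True)
```

For example, `convert_all(sorted(Path("word-docs").glob("*.docx")), "odt", Path("converted"))` would convert a (hypothetical) `word-docs` folder to .odt; `check=True` makes the loop stop on the first non-zero exit status instead of silently skipping a file.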

Converting the files to plain text allows you to index them for a
standard text search. If you don't want to do this with LibreOffice, you
can also use the tool catdoc (for the older binary .doc format), or, in
the case of .odt, extract the document text from the content.xml file
inside the zip container.
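To illustrate the plain-text route, here is a minimal sketch using only Python's standard library: it reads word/document.xml (.docx) or content.xml (.odt) from the zip container, keeps the paragraph text, and builds a naive in-memory keyword index. The function names are hypothetical, and the extraction is deliberately simplified: special elements such as tabs, footnotes or ODF's `text:s` space markers are not handled.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# XML namespaces of the Word (.docx) and OpenDocument (.odt) text elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def extract_text(source) -> str:
    """Return the text of a .docx or .odt file, one paragraph per line.

    source may be a file path or the raw bytes of the document.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if "word/document.xml" in names:  # .docx: body lives in word/document.xml
            root = ET.fromstring(archive.read("word/document.xml"))
            texts = [
                "".join(t.text or "" for t in p.iter(W_NS + "t"))
                for p in root.iter(W_NS + "p")
            ]
        elif "content.xml" in names:  # .odt: body lives in content.xml
            root = ET.fromstring(archive.read("content.xml"))
            texts = [
                "".join(el.itertext())
                for el in root.iter()
                if el.tag in (TEXT_NS + "p", TEXT_NS + "h")
            ]
        else:
            raise ValueError("neither a .docx nor an .odt container")
    return "\n".join(texts)


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map every lower-cased word to the set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits
```

In use, each file would be passed through `extract_text` once, the results collected into a `{filename: text}` dictionary for `build_index`, and queries answered with `search(index, "teacher guide")`. For 500+ documents a proper search engine would likely replace the dictionary; the sketch only shows the shape of the data.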

Performing XML transformations (XSLT-based or otherwise) on the contained
XML data allows you to convert between different document templates.
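Python's standard library has no XSLT processor (XSLT itself would typically run through xsltproc or lxml's `etree.XSLT`), so the sketch below swaps in a plain ElementTree transformation doing the same kind of job: it maps the paragraph style IDs of an old template to those of a new one inside a .docx's word/document.xml. The style names in `STYLE_MAP` and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the conventional "w:" prefix on output

# Hypothetical mapping from the old template's style IDs to the new template's.
STYLE_MAP = {"OldHeading": "Heading1", "OldBody": "BodyText"}


def rename_paragraph_styles(document_xml: bytes, style_map: dict[str, str]) -> bytes:
    """Rewrite <w:pStyle w:val="..."/> references according to style_map."""
    root = ET.fromstring(document_xml)
    for p_style in root.iter(f"{{{W}}}pStyle"):
        old = p_style.get(f"{{{W}}}val")
        if old in style_map:
            p_style.set(f"{{{W}}}val", style_map[old])
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

One caveat: ElementTree drops namespace declarations it does not use itself, and Word files may refer to some of them in markup-compatibility attributes, so it would be worth checking that repacked files still open in Word before running such a rewrite over the whole collection. That is one reason a real XSLT toolchain may be the safer choice.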

Free Software provides the tools for those tasks, even for MS Word
documents ;-)
--
Paul Hänsch | Webmaster, System-Hacker
Jabber: paul at jabber.fsfe.org | Free Software Foundation Europe
Federico Bruni
2014-07-19 07:50:17 UTC
I suggest Sphinx. As far as I know, it doesn't provide in-line help for programs, but it does have a search function and it's very flexible.
There's also a tool to convert from .odt to reStructuredText:
http://sphinx-doc.org/intro.html#conversion-from-other-systems

It's used by the Python and Django projects (and many others).
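One way Sphinx could serve the "different documents from single sources" requirement is its `only` directive: a passage marked `.. only:: teacher` in the reStructuredText sources is included only when the build defines the tag `teacher` (sphinx-build's `-t` option). The sketch below builds one edition per audience; the audience names, function names and directory layout are assumptions for illustration.

```python
import subprocess
from pathlib import Path

# Hypothetical audience names; each one becomes a Sphinx tag for ".. only::".
AUDIENCES = ["user", "teacher", "support"]


def sphinx_command(source: Path, build_root: Path, audience: str, builder: str = "html") -> list[str]:
    """Command line that builds one audience-specific edition of the docs."""
    return [
        "sphinx-build",
        "-b", builder,   # output format, e.g. "html" or "latex"
        "-t", audience,  # defines the tag tested by ".. only:: <audience>"
        str(source),
        str(build_root / audience),  # separate output dir per audience
    ]


def build_all(source: Path, build_root: Path) -> None:
    """Build every audience edition in turn, stopping on the first failure."""
    for audience in AUDIENCES:
        subprocess.run(sphinx_command(source, build_root, audience), check=True)
```

Giving each audience its own output directory keeps the builds independent, since Sphinx caches its intermediate doctrees inside the output directory by default.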
Hugo Roy
2014-07-21 12:24:47 UTC
I suggest you look at pandoc too.
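A minimal sketch of batch conversion with pandoc, assuming the pandoc binary is on the PATH and the sources are .docx files (pandoc does not read the older binary .doc format; LibreOffice's headless mode could convert those first). reStructuredText is chosen as the target so the output could feed a Sphinx project; the function names and directory names are hypothetical.

```python
import subprocess
from pathlib import Path


def pandoc_command(docx: Path, outdir: Path) -> list[str]:
    """Command line converting one .docx file to reStructuredText."""
    media_dir = outdir / "media" / docx.stem  # embedded images are written here
    return [
        "pandoc",
        "-f", "docx",
        "-t", "rst",
        f"--extract-media={media_dir}",
        "-o", str(outdir / f"{docx.stem}.rst"),
        str(docx),
    ]


def convert_tree(source_dir: Path, outdir: Path) -> None:
    """Convert every .docx file in source_dir, stopping on the first failure."""
    outdir.mkdir(parents=True, exist_ok=True)
    for docx in sorted(source_dir.glob("*.docx")):
        subprocess.run(pandoc_command(docx, outdir), check=True)
```

Extracting media per document keeps images from different source files from overwriting each other when they share names.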
--
Hugo Roy, Free Software Foundation Europe, <www.fsfe.org>
Deputy Coordinator, FSFE Legal Team, <www.fsfe.org/legal>
Coordinator, FSFE French Team, <www.fsfe.org/fr>

Get our monthly newsletter, sign up! <https://l.fsfe.org/nl>