How to split an Excel table in a .doc file into CSV files by bold text?
1 vote, 1 answer, 541 views
I have 777 .doc files, each containing a large Excel table like the one here and in Fig. 1.
For this question, consider only one .doc file.
I want to split the Excel table in the .doc file into CSV files using any Unix programming or scripting language.
I cannot find a way to convert Microsoft file formats into CSV files.
Pseudocode:
1. Extract the Excel table from the .doc file; this step is covered in the thread How to extract many .doc text + tabular elements into CSV by any Unix tool?
2. Split the Excel table (perhaps converting it to CSV at this point) into separate CSV files by this rule:
> *new bolding indicates a new table* i.e. a new CSV file.
3. Apply the implicit columns *Location* (bottom/top) and *Date* (dd.mm.yyyy), which are given in the first two lines of the .doc file, to each separate CSV file. Use the *Time* column (morning/evening/night).
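Steps 2 and 3 could look roughly like the following Python sketch. It assumes step 1 has already produced the table rows as `(text, time, is_bold)` tuples; that row shape, the function name `split_by_bold`, and the sample values in the usage note are hypothetical and not taken from the real files.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape produced by step 1: (cell text, time of day, is the text bold?)
Row = Tuple[str, str, bool]
# Output record: (Name or Event, Date, Location, Time)
Record = Tuple[str, str, str, str]


def split_by_bold(rows: Iterable[Row], date: str, location: str) -> Dict[str, List[Record]]:
    """Start a new table at every bold row and attach Date and Location to each entry."""
    tables: Dict[str, List[Record]] = {}
    current = None
    for text, time, is_bold in rows:
        text = text.strip()
        if not text:
            continue  # skip empty rows
        if is_bold:
            # Rule: new bolding indicates a new table, i.e. a new CSV file
            current = text
            tables.setdefault(current, [])
        elif current is not None:
            # Plain rows belong to the most recent bold heading;
            # plain rows before the first bold heading are dropped in this sketch
            tables[current].append((text, date, location, time))
    return tables
```

For example, calling `split_by_bold(rows, date="23.10.2017", location="top")` on rows where "Assistants" and "General" are bold would return one list of records per heading, each record ordered as Name/Event, Date, Location, Time.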
Target files and their columns under this rule:
1. Assistants.csv - Name, Date, Location, Time
2. Other.Assistants.csv - Name, Date, Location, Time
3. General.csv - Event, Date, Location, Time
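Writing the target files could be sketched as below with Python's standard `csv` module. The `HEADERS` mapping and the function name `write_tables` are hypothetical, and the sketch assumes that the bold headings in the table match the target file names without the `.csv` suffix, which may not hold for the real documents.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence

# Assumed mapping from bold heading to CSV header row (hypothetical names)
HEADERS = {
    "Assistants": ["Name", "Date", "Location", "Time"],
    "Other.Assistants": ["Name", "Date", "Location", "Time"],
    "General": ["Event", "Date", "Location", "Time"],
}


def write_tables(tables: Dict[str, List[Sequence[str]]], out_dir: str) -> List[Path]:
    """Write one CSV file per known table name and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for name, rows in tables.items():
        header = HEADERS.get(name)
        if header is None:
            continue  # headings without a known target file are skipped in this sketch
        path = out / f"{name}.csv"
        # newline="" lets the csv module control line endings itself
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(path)
    return written
```

Each row passed in is expected to already carry its four values in header order, for example `("Meeting", "23.10.2017", "top", "night")` for General.csv.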
Fig. 1 Example of Excel Table in .doc file
OS: Debian 9 (Stretch) Linux and others
Data: .odt file here

Asked by Léo Léopold Hertz 준영
(7138 rep)
Oct 23, 2017, 11:56 AM
Last activity: Oct 27, 2017, 05:41 PM