Unix & Linux Stack Exchange

Q&A for users of Linux, FreeBSD and other Unix-like operating systems

Latest Questions

45 votes

5 answers

159441 views

Converting a UTF-8 file to ASCII (best-effort)

character-encoding text natural-language

I have a file in UTF-8 that contains text in multiple languages. Much of it is people's names. I need to convert it to ASCII, and I need the result to look as decent as possible.

There are many ways to approach converting from a wider encoding to a narrower one. The simplest transformation would be to replace all non-ASCII characters with some placeholder, like '_'. If I know the language the file is written in, there are additional possibilities, like romanization.

What Unix tool or programming language library available on Unix can give me a decent (best-effort) conversion from UTF-8 to ASCII?

Most of the text is in European languages that use the Latin script.
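
One possible best-effort approach, sketched here in Python with only the standard library: decompose each character with Unicode NFKD normalization, drop the combining accents, and fall back to the '_' placeholder mentioned above for anything that still is not ASCII. The `to_ascii` function and the `FALLBACKS` table are illustrative names, not part of any existing tool, and the table is deliberately small and not exhaustive.

```python
import unicodedata

# Hypothetical fallback table for Latin letters that have no Unicode
# decomposition (illustrative only, extend it for the languages you need).
FALLBACKS = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ł": "L", "ł": "l", "Đ": "D", "đ": "d", "Þ": "Th", "þ": "th",
}


def to_ascii(text, placeholder="_"):
    """Return a best-effort ASCII rendering of text."""
    pieces = []
    # NFKD splits accented letters into a base letter plus combining marks.
    for char in unicodedata.normalize("NFKD", text):
        if char.isascii():
            pieces.append(char)
        elif unicodedata.combining(char):
            continue  # drop the accent, keep the base letter already emitted
        elif char in FALLBACKS:
            pieces.append(FALLBACKS[char])
        else:
            pieces.append(placeholder)
    return "".join(pieces)


if __name__ == "__main__":
    print(to_ascii("Škoda José Müller Straße Łódź"))
```

Characters from non-Latin scripts end up as the placeholder in this sketch; real romanization would need language-specific rules.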

user7610 (2188 rep)

Dec 6, 2014, 04:53 PM • Last activity: Apr 18, 2023, 09:38 AM

2 votes

1 answer

449 views

Text prediction in Linux while typing like on Android, iOS and Windows

linux text-processing keyboard autocorrection natural-language

I have noticed the utility of this feature while typing on Android devices (notably with the Gboard virtual keyboard).

The same is available on iOS, on iPhones and iPads.

I don't mean the use of a virtual keyboard on Linux, but the presence of a "suggestion strip" on the screen, as it is called in Gboard, while typing in Linux, regardless of whether the keyboard is virtual or physical.

Is it possible to have this in any text editor, or at least in some of them?

Windows 10 already has this, it seems.

cipricus (1779 rep)

Feb 15, 2021, 02:38 PM • Last activity: Nov 1, 2022, 09:47 PM

5 votes

3 answers

766 views

Frequency of words in non-English language text: how can I merge singular and plural forms etc.?

shell-script text-processing sed portability natural-language

I'm sorting *French* language words in some text files according to *frequency*, with a focus on *insight* rather than statistical significance. The challenge is about preserving accented characters and dealing with the article forms in front of vowels (`l'`, `d'`) in the context of shaping word tokens for sorting.

The topic of the most frequent words in a file takes many shapes (1 | 2 | 3 | 4). So I put together this function using *GNU* utilities:

    compt1 () {
    for i in *.txt; do
    	echo "File: $i"
    	sed -e 's/ /\
    /g'

1. I cannot provide source data, but I can provide this file as an example. The words *heure* and *enfant* in the text illustrate the problem. The former appears twice in the text, including once as "l'heure", and helps validate whether the command works. The latter appears in both singular and plural forms (*enfant*/*enfants*) and would benefit from being merged here.
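
The tokenizing and merging described above could be sketched roughly as follows in Python. This is not the function from the question: the list of elided forms, the `word_frequencies` name and the plural rule (drop a final "s" only when the singular also occurs in the text) are assumptions made for illustration, and the plural rule is deliberately naive.

```python
import re
from collections import Counter

# Elided forms such as l' and d' that should be detached from the next word.
# The list is an assumption and can be extended.
ELISION = re.compile(r"\b(?:l|d|j|n|s|c|m|t|qu)['’]", re.IGNORECASE)
# A word is a run of letters (accented letters included), optionally hyphenated.
WORD = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def tokens(text):
    """Lowercase the text, strip elided articles, and return the word tokens."""
    return WORD.findall(ELISION.sub(" ", text.lower()))


def merge_plural(word, vocabulary):
    """Map a plural to its singular only if the singular occurs in the text."""
    if word.endswith("s") and word[:-1] in vocabulary:
        return word[:-1]
    return word


def word_frequencies(text):
    """Count words with elisions removed and simple plurals merged."""
    words = tokens(text)
    vocabulary = set(words)
    return Counter(merge_plural(word, vocabulary) for word in words)


if __name__ == "__main__":
    sample = "L'heure tourne. Une heure plus tard, l'enfant rejoint les enfants."
    for word, count in word_frequencies(sample).most_common(3):
        print(count, word)
```

Because the regular expressions are Unicode-aware, accented characters are preserved as part of the tokens.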

user44370

Jul 19, 2014, 01:59 PM • Last activity: Jan 22, 2019, 07:58 PM

7 votes

1 answer

1708 views

Is there a Unix command that searches for similar strings, based mostly on how they sound when spoken?

search text pattern-matching natural-language

I have a file of names, and I want to search within it, not caring too much about whether I have spelled the name (that I am searching for) correctly. I know that `grep` has quite a bit of functionality to search for a whole slew of similar strings within a file or stream, but as far as I am aware, it does not have functionality to correct for spelling errors, and even if it did, since these are names of people, they wouldn't be found inside a standard dictionary.

Perhaps I can make my file of names into a special dictionary, and then use some standard spell checking tool?  Of particular importance in this application is the ability to match similarly sounding words.

For example: "jacob" should return "Jakob".  Even better would be if inter-language similarities were also accounted for, so that "miguel" should match "Michael".

Is this something that has been implemented already, or will I have to build my own?
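
To make the idea of matching by sound concrete, here is a minimal sketch of Soundex, a classic phonetic code designed for English surnames, written in plain Python. It is offered as one possible technique, not as what any particular Unix tool does; `sounds_like` is a hypothetical helper name, and non-ASCII letters are simply ignored in this sketch.

```python
def soundex(name):
    """Return the four-character American Soundex code of name."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for letter in letters:
            codes[letter] = digit

    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()
    previous = codes.get(letters[0], "")
    for letter in letters[1:]:
        digit = codes.get(letter, "")
        # Adjacent letters with the same code are encoded only once.
        if digit and digit != previous:
            result += digit
        # h and w do not separate two letters that share a code; vowels do.
        if letter not in "hw":
            previous = digit
    return (result + "000")[:4]


def sounds_like(query, names):
    """Return the names whose Soundex code equals that of query."""
    target = soundex(query)
    return [name for name in names if soundex(name) == target]


if __name__ == "__main__":
    print(sounds_like("jacob", ["Jakob", "Maria", "Jacob"]))
```

With this scheme "miguel" and "Michael" happen to share a code as well, but cross-language matches like that are not guaranteed in general.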

gabkdlly (173 rep)

Jun 14, 2013, 09:20 AM • Last activity: Jan 8, 2019, 07:07 AM

1 votes

1 answer

1077 views

Linux Mint doesn't write Arabic letters

fonts character-encoding natural-language

I installed Arabic fonts for Linux Mint and I can switch between Arabic and English, but it seems that Mint cannot write Arabic letters: for example, when renaming a file or writing in any text editor, nothing appears when I type in Arabic. Note that it can read Arabic letters, and I was writing and reading Arabic in other distributions like Ubuntu or Gnome. What is the problem?

Mark Mamdouh (11 rep)

Oct 18, 2016, 08:18 PM • Last activity: Oct 19, 2016, 08:19 AM

0 votes

1 answer

724 views

Indian languages not available in LibreOffice Impress 5

fonts libreoffice natural-language

I need to use the Indian language Marathi to prepare a presentation, but neither Marathi nor Hindi is available to set as the document language. Probably for that reason, when I try to change the properties (e.g., font size) of the Marathi text I enter, things go wrong: the font won't get bigger even if I try to change it. How can I deal with this?

Ganesh Birajdar (3 rep)

Jul 21, 2016, 05:12 AM • Last activity: Jul 21, 2016, 07:20 PM

4 votes

1 answer

6362 views

How to make this conky (Conky Vision) use other language than English?

linux ubuntu elementary-os conky natural-language

I want to use this conky script: Conky Vision.

But I don't want the days of the week to be displayed in English.

When I change my locale to another language, today's day name is displayed in that language, but the five day names in the lower part of the image are always in English.

I have also changed the system language, but those days are still displayed in English.

What changes should I make to that script so that it follows the language I want?

The conkyrc file has this content:

    # Conky settings #
    background yes
    update_interval 1
    double_buffer yes
    no_buffers yes
    
    # Window specifications #
    gap_x 0
    gap_y 0
    alignment middle_middle
    minimum_size 600 460
    maximum_width 600
    own_window yes
    own_window_type normal
    own_window_transparent yes
    own_window_hints undecorate,sticky,skip_taskbar,skip_pager,below
    own_window_argb_visual yes
    own_window_argb_value 255
    #border_margin 0
    #border_inner_margin 0
    #border_outer_margin 0
    
    # Graphics settings #
    draw_shades no
    draw_outline no 
    draw_borders no
    draw_graph_borders no
    
    # Text settings #
    use_xft yes
    xftalpha 0
    xftfont Raleway:size=10
    
    override_utf8_locale yes
    
    imlib_cache_size 0
    
    # Color scheme #
    default_color FFFFFF
    
    color1 FFFFFF
    
    TEXT
    \
    #-----WOIED-----#
    \
    \
    ${execi 300 curl -s "http://weather.yahooapis.fr/forecastrss?w=615702&u=c " -o ~/.cache/weather.xml}\
    \
    \
    #---Clock+Date---#
    \
    \
    ${font Raleway:weight=Light :size=100}${alignc}${time %H}${alignc}:${alignc}${time %M}
    ${font Raleway:weight=Light:size=32}${voffset -60}${alignc}${time %A %B %d}\
    \
    \
    #---High Temperatures---#
    \
    \
    ${font Raleway:size=20}\
    ${voffset 76}${goto 40}${execi 300 grep "yweather:condition" ~/.cache/weather.xml | grep -o "temp=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*"}°
    ${font Raleway:weight=Light:size=14}\
    ${voffset -28}${goto 160}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "high=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==2'}°\
    ${goto 270}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "high=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==3'}°\
    ${goto 380}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "high=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==4'}°\
    ${goto 490}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "high=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==5'}°\
    \
    \
    #---Low Temperatures---#
    \
    \
    ${font Raleway:weight=Light:size=10}\
    ${voffset 48}${goto 210}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "low=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==2'}°\
    ${goto 320}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "low=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==3'}°\
    ${goto 430}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "low=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==4'}°\
    ${goto 540}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "low=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==5'}°\
    \
    \
    #---Name of the day---#
    \
    \
    ${font Raleway:weight=Light:size=14}\
    ${voffset 30}${goto 60}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "day=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==1' | tr '[a-z]' '[A-Z]'}\
    ${goto 170}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "day=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==2' | tr '[a-z]' '[A-Z]'}\
    ${goto 280}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "day=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==3' | tr '[a-z]' '[A-Z]'}\
    ${goto 390}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "day=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==4' | tr '[a-z]' '[A-Z]'}\
    ${goto 500}${execi 300 grep "yweather:forecast" ~/.cache/weather.xml | grep -o "day=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==5' | tr '[a-z]' '[A-Z]'}\
    \
    \
    #---Weather Icons---#
    \
    \
    ${execi 300 cp -f ~/.conky-vision-icons/$(grep "yweather:condition" ~/.cache/weather.xml | grep -o "code=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*").png ~/.cache/weather-1.png}${image ~/.cache/weather-1.png -p 61,260 -s 32x32}\
    \
    ${execi 300 cp -f ~/.conky-vision-icons/$(grep "yweather:forecast" ~/.cache/weather.xml | grep -o "code=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==2').png ~/.cache/weather-2.png}${image ~/.cache/weather-2.png -p 171,260 -s 32x32}\
    \
    ${execi 300 cp -f ~/.conky-vision-icons/$(grep "yweather:forecast" ~/.cache/weather.xml | grep -o "code=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==3').png ~/.cache/weather-3.png}${image ~/.cache/weather-3.png -p 281,260 -s 32x32}\
    \
    ${execi 300 cp -f ~/.conky-vision-icons/$(grep "yweather:forecast" ~/.cache/weather.xml | grep -o "code=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==4').png ~/.cache/weather-4.png}${image ~/.cache/weather-4.png -p 391,260 -s 32x32}\
    \
    ${execi 300 cp -f ~/.conky-vision-icons/$(grep "yweather:forecast" ~/.cache/weather.xml | grep -o "code=\"[^\"]*\"" | grep -o "\"[^\"]*\"" | grep -o "[^\"]*" | awk 'NR==5').png ~/.cache/weather-5.png}${image ~/.cache/weather-5.png -p 501,260 -s 32x32}${font}${voffset -46}\

It seems related to the file `~/.cache/weather.xml` (more details on that here).

This file contains lines like:

    
    
    
    
    


I guess, as indicated in a comment, that the commands under `#---Name of the day---#` in `.conkyrc` are writing and updating the lines in `~/.cache/weather.xml` posted above (containing the names of the days in English). But as far as I can see, those commands only relate to the `yweather:forecast` entries, which might mean that the day names are written exactly as curl grabs them from the Yahoo Weather English (US) website, and that is why they are in English.
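
If that guess is right, the day names would have to be translated after they are extracted from the feed. A rough sketch of that step in Python, mirroring the `grep ... | awk 'NR==n' | tr` pipeline from the config: the `DAY_NAMES` table, the function names and the sample XML line are all hypothetical, and the English abbreviations are an assumption about what the feed contains.

```python
import re

# Hypothetical lookup table: English abbreviations (assumed feed values)
# mapped to Spanish ones. Replace it with the language you want.
DAY_NAMES = {
    "Mon": "LUN", "Tue": "MAR", "Wed": "MIÉ", "Thu": "JUE",
    "Fri": "VIE", "Sat": "SÁB", "Sun": "DOM",
}

DAY_ATTRIBUTE = re.compile(r'day="([^"]*)"')


def forecast_days(xml_text):
    """Collect the day="..." values from lines mentioning yweather:forecast."""
    days = []
    for line in xml_text.splitlines():
        if "yweather:forecast" in line:
            days.extend(DAY_ATTRIBUTE.findall(line))
    return days


def localized_day(xml_text, position, table=DAY_NAMES):
    """Return the translated name of the position-th forecast day (1-based).

    Unknown names fall back to upper-cased English, like the tr call does.
    """
    day = forecast_days(xml_text)[position - 1]
    return table.get(day, day.upper())


if __name__ == "__main__":
    # Made-up sample line, only shaped like what the grep commands expect.
    sample = '<yweather:forecast day="Thu" code="30"/>'
    print(localized_day(sample, 1))
```

A script like this could in principle replace the grep chain inside each `${execi 300 ...}` call, but that wiring is not shown here.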

What intrigues me, though, is that I first saw this conky on [a Spanish site where everything was in Spanish](http://entornosgnulinux.com/2014/03/02/conky-vision-en-elementary-os-luna/). That PPA no longer works, it seems.

---

**I'm on elementary OS Freya (based on Ubuntu 14.04).**



                                

user32012

Jul 9, 2015, 07:37 PM • Last activity: Apr 25, 2016, 08:05 PM

1 votes

2 answers

6677 views

Where can I find a dictionary file of common words?

natural-language

It's easy to generate a strong password quickly using the system dictionary:

    $ for i in {1..4}; do shuf --head-count=1 /usr/share/dict/words; done
    Amelanchier
    whitecup
    ankhs
    antispasmodics

However, this isn't exactly the easiest list of words to remember. Is there a package or file available for getting either the **N most used words** (for example Simplified English) or a list of words either **ordered by popularity** or with a **popularity index** so I can choose how many to use?
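
Assuming such a popularity-ordered word list exists as a plain text file with one word per line (the file itself is hypothetical here), choosing "how many to use" could look like this sketch in Python. The function names are illustrative, and `secrets.choice` is used because the result is meant to be a password.

```python
import secrets


def top_words(lines, limit):
    """Return the first `limit` non-empty entries of a popularity-ordered list."""
    words = [line.strip() for line in lines if line.strip()]
    return words[:limit]


def passphrase(words, count=4):
    """Pick `count` words independently and uniformly at random."""
    return [secrets.choice(words) for _ in range(count)]


if __name__ == "__main__":
    # Stand-in for the lines of a frequency-ordered word list file.
    ordered = ["the\n", "of\n", "and\n", "to\n", "house\n", "water\n"]
    print(" ".join(passphrase(top_words(ordered, 5))))
```

Restricting the list to the N most common words trades strength for memorability: each word then contributes log2(N) bits of entropy.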
                                

l0b0 (53368 rep)

Mar 24, 2016, 01:09 PM • Last activity: Mar 24, 2016, 11:22 PM

0 votes

1 answer

1224 views

How can I translate in the CLI an English word into a German one?

shell-script command-line dictionary natural-language

I want to write a script that picks a random English word from `/usr/share/dict/words`, translates it into German, displays both of them for a certain amount of time, and repeats the process. I only know the beginning part and do not know how to do a word-to-word translation in the shell:

    watch -n5 sh -c 'cat /usr/share/dict/words | shuf -n1 | .....'
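
One way to fill in the missing step without a network service is a local bilingual word list. The sketch below, in Python, assumes a hypothetical tab-separated file with one `english<TAB>german` pair per line; no such file ships with `/usr/share/dict/words`, and the function names are made up for illustration.

```python
import random


def load_glossary(lines):
    """Parse 'english<TAB>german' lines into a dict, skipping malformed ones."""
    glossary = {}
    for line in lines:
        english, separator, german = line.rstrip("\n").partition("\t")
        if separator and english and german:
            glossary[english] = german
    return glossary


def random_pair(glossary, rng=random):
    """Return a random (english, german) pair from the glossary."""
    english = rng.choice(sorted(glossary))
    return english, glossary[english]


if __name__ == "__main__":
    # Stand-in for the lines of the hypothetical tab-separated word list.
    entries = ["house\tHaus\n", "tree\tBaum\n", "water\tWasser\n"]
    english, german = random_pair(load_glossary(entries))
    print(f"{english} = {german}")
```

Running such a script under `watch -n5` would then show a new pair every five seconds, as in the original command.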

Abdul Al Hazred (27610 rep)

Mar 30, 2015, 01:58 PM • Last activity: Mar 30, 2015, 06:17 PM

Showing page 1 of 9 total questions