
Simple text-browser access to a website protected by CloudFlare: JavaScript problem

**EDIT on 13-11-2022 (DD-MM-YYYY) to clarify things a bit:** I, a human, simply want to read the text content of a website that happens to be protected by CloudFlare. **Yes**, I know that such protection is useful to keep spam bots from doing harm. **BUT I AM A HUMAN**, and I don't even seem to be given the chance to prove my humanity. Reading the website with my text browser is all I want; saving some of the information, as a human could, would be even better. I don't see anything bad, let alone illegal, in simply reading the text content of a website like a civilized human. **Isn't that why websites offer information in the first place?**

Hello Stack Exchange community! After some hours of research and trying different things while coding, I now think my best bet is to ask the Linux and programming pros I expect to find here.

So, **my task is actually very simple**. I want to run a **script** (e.g. a shell script) that visits a certain website and **saves the HTML output** to a text file. The **problem** with the website: it is **protected by CloudFlare**, so ***JavaScript*** is needed, ***which lynx doesn't support***.

So I want to develop a simple solution that uses either Java or Linux tools (e.g. a shell script). It has to be **as lightweight as possible**, and that's where my **headache** starts. I found a list on GitHub that aims to summarize all headless (text) browsers in various programming languages. Most of them, sadly, require about 20 dependencies, which is, in my humble opinion, neither appropriate nor feasible. Also, during my research on Stack Overflow I came across rather similar problems.
Like this solution: https://unix.stackexchange.com/questions/703599/couldnt-download-an-url-using-curl-or-wget-but-it-works-in-browser

So there seems to be a solution that uses curl and passes some command-line parameters, which are then supposed to get past the JavaScript/CloudFlare obstacle. But I'm afraid I can't get it to run properly.

This question also summarizes my problem really well, but sadly none of its answers help me: https://unix.stackexchange.com/questions/703730/command-line-tool-to-use-js-enabled-browser-to-save-web-page

Could someone please give me a tip on where to look next? Important for my little project: as lightweight as possible, and no human user interaction required!

Thank you very much, dear community, for any help! My best regards to you; I am looking forward to hearing from any of you pros :-)
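For reference, the non-JavaScript part of the task (fetch a page with browser-like headers, roughly what the curl approach does, and save the HTML to a file) needs no third-party dependencies in Java 11+, only `java.net.http.HttpClient`. This is a hedged sketch, not a fix: it has no JavaScript engine, so it cannot solve a CloudFlare challenge and can only report that it received one instead of the page content. The class name `SaveHtml`, the header values, the optional cookie argument, and the challenge markers (status `403`/`503`, `Server: cloudflare`, a body containing `challenge-platform`) are all illustrative assumptions, not documented CloudFlare behavior.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Locale;

public class SaveHtml {

    // Placeholder (assumption): replace with the User-Agent your own browser sends.
    static final String USER_AGENT =
            "Mozilla/5.0 (X11; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0";

    // Builds a GET request with browser-like headers, roughly what
    // curl -A '...' -H 'Accept: ...' would send. cookie may be null; if given,
    // it is sent verbatim (hypothetical: a cookie copied from a browser session).
    public static HttpRequest buildRequest(String url, String cookie) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))
                .header("User-Agent", USER_AGENT)
                .header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
                .header("Accept-Language", "en-US,en;q=0.5")
                .GET();
        if (cookie != null && !cookie.isBlank()) {
            builder.header("Cookie", cookie);
        }
        return builder.build();
    }

    // Heuristic only (assumption, markers may change): treat a 403/503 response
    // from a "cloudflare" server whose body references challenge-platform
    // as a challenge page rather than real content.
    public static boolean looksLikeChallenge(int status, String server, String body) {
        boolean blockedStatus = status == 403 || status == 503;
        boolean fromCloudflare = server != null
                && server.toLowerCase(Locale.ROOT).contains("cloudflare");
        boolean challengeMarkup = body != null && body.contains("challenge-platform");
        return blockedStatus && fromCloudflare && challengeMarkup;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: java SaveHtml.java <url> <output-file> [cookie]");
            System.exit(2);
        }
        String cookie = args.length > 2 ? args[2] : null;
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response = client.send(
                buildRequest(args[0], cookie), HttpResponse.BodyHandlers.ofString());

        String server = response.headers().firstValue("server").orElse("");
        if (looksLikeChallenge(response.statusCode(), server, response.body())) {
            // No JavaScript engine here, so the challenge cannot be solved.
            System.err.println("HTTP " + response.statusCode()
                    + ": got a CloudFlare challenge page, not the site content.");
            System.exit(1);
        }
        Files.writeString(Path.of(args[1]), response.body());
        System.out.println("HTTP " + response.statusCode() + ", saved to " + args[1]);
    }
}
```

Usage (hypothetical URL and file name): `java SaveHtml.java https://example.com/ page.html`. If it exits with the challenge message, replaying headers alone is not enough for that site, and a JavaScript-capable (headless) browser would be needed, which is the heavier dependency the question is trying to avoid.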
Asked by Orca37 (1 rep)
Nov 9, 2022, 12:18 PM
Last activity: Nov 13, 2022, 01:30 PM