Simple text browser website access BUT PROTECTED by CloudFlare - JavaScript problem
**EDIT on 13-11-2022 (DD-MM-YYYY) to clarify things a bit:**
I, a human, want to simply read the text contents of a website that happens to be protected by CloudFlare. **Yes**, I know that such protection is useful for preventing spam bots from doing harm.
**BUT I AM A HUMAN** who does not even seem to be given the chance to prove my humanity. Reading a website with my text browser is all I want; saving some information - like humans could do - would be even better.
I do not see anything bad or even illegal in an approach of simply reading text contents of a website, like a civilized human. **Isn't that the reason why websites even offer information in the first place?**
Hello stackexchange community!
After some hours of research and trying different things while coding ... I now think my best bet is to ask some Linux and programming pros like the ones I am going to find on here.
So, **my task is actually very simple**. I want to execute a (e.g. batch) **script**, that visits a certain website and **saves the HTML output** to a text file.
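To make the task concrete, here is a minimal sketch of what I have in mind, assuming `curl` is installed. The function name `save_page` is just something I made up for illustration. On a CloudFlare-protected site this will most likely save the challenge page instead of the real content, which is exactly my problem.

```shell
#!/bin/sh
# Sketch only: fetch a page and save its HTML to a file.
# save_page is a hypothetical helper name, not an existing tool.
# Usage: save_page URL OUTFILE
save_page() {
    if [ "$#" -ne 2 ]; then
        echo "usage: save_page URL OUTFILE" >&2
        return 2
    fi
    # -s  no progress meter
    # -S  still print an error message if the transfer fails
    # -L  follow redirects
    # -o  write the response body to the given file
    # Note: without -f, curl exits with 0 even on an HTTP error such as 403,
    # so the saved file may contain an error or challenge page.
    curl -sS -L -o "$2" "$1"
}
```

It would be called as `save_page "https://example.org/" page.html`, where both the URL and the file name are placeholders.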
**The problem** with the website: it is **protected by CloudFlare**, so ***JavaScript*** is needed, ***which isn't supported by lynx***.
So, I want to develop a simple solution that uses either Java or Linux (e.g. a batch script) in some way. It has to be **as lightweight as possible** - and that's where my **headache** seems to start.
I came across a list on GitHub that aims to summarize all headless (text) browsers in various programming languages. Sadly, most of them require around 20 dependencies, which is - in my humble opinion - neither appropriate nor feasible.
Also, throughout my research on StackOverflow I encountered rather similar problems.
Like this solution: https://unix.stackexchange.com/questions/703599/couldnt-download-an-url-using-curl-or-wget-but-it-works-in-browser
So, there seems to be a solution that uses curl and passes some startup parameters, which are then used to overcome the JavaScript/CloudFlare obstacles.
But, I am afraid, I don't seem to be able to get this code to run properly.
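As far as I understand that approach, the extra parameters are browser-like request headers. The sketch below reflects my assumption of how this would look, not the exact code from the linked answer. The User-Agent string, the function names and the challenge-page markers (`Just a moment`, `cdn-cgi/challenge-platform`, `Checking your browser`) are my own guesses and may not match what CloudFlare currently serves. I am also not sure that headers alone are enough when the site really requires the JavaScript challenge to run.

```shell
# Sketch only: request a page with browser-like headers and check whether
# the saved file looks like a CloudFlare challenge page.

# Assumed User-Agent value; any current browser string could be used instead.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0'

# Usage: fetch_like_browser URL OUTFILE   (hypothetical helper name)
fetch_like_browser() {
    if [ "$#" -ne 2 ]; then
        echo "usage: fetch_like_browser URL OUTFILE" >&2
        return 2
    fi
    # -A            set the User-Agent header
    # -H            add an extra request header
    # --compressed  ask for a compressed response and decode it
    curl -sS -L --compressed \
        -A "$UA" \
        -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
        -H 'Accept-Language: en-US,en;q=0.5' \
        -o "$2" "$1"
}

# Usage: looks_like_challenge FILE   (hypothetical helper name)
# Returns 0 if FILE contains one of the assumed challenge-page markers.
looks_like_challenge() {
    grep -q -i \
        -e 'Just a moment' \
        -e 'cdn-cgi/challenge-platform' \
        -e 'Checking your browser' \
        "$1"
}
```

A script could then run `fetch_like_browser "$url" page.html` and, if `looks_like_challenge page.html` succeeds, report that only the challenge page was saved instead of silently keeping it.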
This question also seems to summarize my problem really well, but sadly, none of its answers are useful to me: https://unix.stackexchange.com/questions/703730/command-line-tool-to-use-js-enabled-browser-to-save-web-page
Could someone please give me a little tip on where to have a look at next?
Important about my little project: as lightweight as possible, no human user interaction required!
Thank you very much, dear community, for helping me in any way possible!
My best regards to you - I am looking forward to hearing from any of you pros :-)
Asked by Orca37
(1 rep)
Nov 9, 2022, 12:18 PM
Last activity: Nov 13, 2022, 01:30 PM