I have searched around for information on web crawlers and how to make them, but found very little for the purpose I have in mind. I want to create a web crawler that goes through a target website's pages and then runs some kind of web search to look for plagiarised copies of that work on other websites.
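A minimal sketch of the first half (collecting the text of every page on the target site), using only Python's standard library. It assumes pages are UTF-8 HTML and that the crawl should stay on the start URL's host. The names `crawl_site`, `LinkAndTextParser`, `fetch_html` and the `fetcher` parameter are hypothetical, chosen for illustration. It does not check robots.txt or rate-limit requests; the standard library's `urllib.robotparser` can help with the former.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects <a href> targets and visible text, skipping <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def fetch_html(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_site(start_url, max_pages=50, fetcher=fetch_html):
    """Breadth-first crawl restricted to start_url's host.

    Returns a dict mapping each visited URL to its extracted plain text.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except Exception:
            continue  # sketch: silently skip pages that fail to download
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered at end of input
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            # Resolve relative links and drop #fragments so /a and /a#top count once.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling `crawl_site("https://example.com/")` would return a `{url: text}` dict for up to 50 pages on that host. The `fetcher` parameter exists so the download step can be swapped out (for example, with a fake in tests, or with a client that adds delays between requests).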
I am aware that this is a broad question, and that crawlers can take a while to build and can be hard to get right, but I am hoping someone can point me in the right direction for building a web crawler (simple or more complex) that starts from one URL, searches other sites for anything that may have been plagiarised from it, and returns the URL of each match along with part (if not all) of the copied text.
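For the comparison step, one common technique (not something the question specifies) is word n-gram "shingling": split both texts into overlapping runs of n words and look for runs they share. The sketch below assumes you already have plain text for the original pages and for candidate pages from other sites, both as `{url: text}` dicts; how the candidates are found (for example, by searching the web for distinctive sentences) is left open. `copied_passages`, `overlap_score`, `report_matches`, `n=5` and `threshold=0.1` are hypothetical names and illustrative values, not tuned settings.

```python
import re


def words(text):
    """Lower-case word tokens; punctuation is ignored."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, n):
    """Set of all n-word sequences (shingles) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(source_text, suspect_text, n=5):
    """Fraction of suspect_text's shingles that also occur in source_text (0.0 to 1.0)."""
    suspect = shingles(words(suspect_text), n)
    if not suspect:
        return 0.0
    return len(suspect & shingles(words(source_text), n)) / len(suspect)


def copied_passages(source_text, suspect_text, n=5):
    """Return runs of suspect_text covered by shingles that also appear in source_text."""
    source = shingles(words(source_text), n)
    tokens = words(suspect_text)
    covered = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in source:
            for j in range(i, i + n):
                covered[j] = True
    # Merge consecutive covered tokens into readable passages.
    passages, run = [], []
    for token, hit in zip(tokens, covered):
        if hit:
            run.append(token)
        elif run:
            passages.append(" ".join(run))
            run = []
    if run:
        passages.append(" ".join(run))
    return passages


def report_matches(original_pages, candidate_pages, n=5, threshold=0.1):
    """Compare every candidate page with every original page.

    Both arguments map URL -> plain text. Returns a list of
    (candidate_url, original_url, copied_passages) for pairs whose
    overlap_score reaches the threshold.
    """
    results = []
    for cand_url, cand_text in candidate_pages.items():
        for orig_url, orig_text in original_pages.items():
            if overlap_score(orig_text, cand_text, n) >= threshold:
                passages = copied_passages(orig_text, cand_text, n)
                results.append((cand_url, orig_url, passages))
    return results
```

Each result gives the URL where the text was found, the original page it appears to come from, and the shared passages, which matches the "return the URL and part of the text" goal. Comparing every pair is quadratic, so for many pages an index from shingle to URL would scale better.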
As I said, I know this is probably not an easy task, and I have no experience building a web crawler. However, I am majoring in computer science and do have some programming knowledge, so if anyone could point me to a tutorial or good information on how to get started, that would be awesome.
Thanks for the help