NB: Although I want to engage in Part 1 of the class assignment, I would nevertheless like to sit in on the classes to learn more about coding.

Part 1 – Define your goal

  • Specific
    • What are you trying to accomplish?
    • This is the mission statement for your goal.
    • What actions will you take? Don’t be afraid to get very detailed.
    • Do you want to make something new, continue to develop something, or improve a skill?

There are a number of projects I would like to accomplish. I will start with a crawler/scraper project. As I have advanced quite a bit in the past 1.5 weeks, more project goals are likely to follow.

I would like to create a tool that scrapes content from a variety of webpages within a given site (and maybe a set of predefined sites), finds the forwarding links on those pages, follows them, and scrapes the newly found pages in turn. The tool would then save the content of each page to a database (a combination of link and image URLs, plus some fields such as the title and the number of views as displayed on the page).
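The crawl-then-scrape loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the function names (toAbsolute, extractLinks, scrapePage, crawl), the record shape, and the maxPages limit are hypothetical, and the regular expressions are a naive stand-in for a proper HTML parser. The number of views is left out because how it appears on the page is site-specific.

```javascript
// Hypothetical sketch; names and record shape are assumptions, not the project's code.

// Resolve a raw href/src against the page URL; return null if it is malformed.
function toAbsolute(raw, baseUrl) {
  try {
    const url = new URL(raw, baseUrl);
    url.hash = ""; // "#section" variants point at the same page
    return url;
  } catch {
    return null;
  }
}

// Collect absolute, same-site links from href="..." attributes (naive regex).
function extractLinks(html, pageUrl) {
  const origin = new URL(pageUrl).origin;
  const links = new Set();
  for (const match of html.matchAll(/href\s*=\s*["']([^"']+)["']/gi)) {
    const url = toAbsolute(match[1], pageUrl);
    if (url && url.origin === origin) links.add(url.href);
  }
  return [...links];
}

// Pull out the fields to store: page URL, title, and image URLs.
function scrapePage(html, pageUrl) {
  const title = (html.match(/<title[^>]*>([^<]*)<\/title>/i) || [])[1] || "";
  const images = [...html.matchAll(/<img[^>]+src\s*=\s*["']([^"']+)["']/gi)]
    .map((m) => toAbsolute(m[1], pageUrl))
    .filter((url) => url !== null)
    .map((url) => url.href);
  return { url: pageUrl, title: title.trim(), images };
}

// Breadth-first crawl: scrape a page, queue its unseen same-site links, repeat.
// fetchHtml is supplied by the caller and must return a promise of the page HTML.
async function crawl(startUrl, fetchHtml, maxPages = 50) {
  const start = new URL(startUrl).href; // normalise so the start page is not revisited
  const queue = [start];
  const seen = new Set(queue);
  const records = [];
  while (queue.length > 0 && records.length < maxPages) {
    const url = queue.shift();
    const html = await fetchHtml(url);
    records.push(scrapePage(html, url));
    for (const link of extractLinks(html, url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return records;
}
```

Passing fetchHtml in as a parameter keeps the loop independent of how pages are downloaded, so it can be tried against a few saved HTML strings before pointing it at a real site. Saving the records to a database would be a separate step after crawl returns.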

I have just finished developing the scraping and crawling functions, which constitute the first series of actions. I now realize that having both functions in the same program (one js file) is proving to be pretty complex. So my secondary actions would be to split the program into an index function and a worker: the index program indexes the links first, and then the worker program scrapes the content. On top of making debugging more visible, this will allow for a more controlled and asynchronous program. I have also been building this program on a Raspberry Pi, and I noticed that if the load is not controlled properly, the Pi heats up and seems to shut off.
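One way the index / worker split might look is sketched below, assuming an in-memory queue for simplicity. The JobQueue class, the runWorker function, and the one-second default delay are all hypothetical choices for illustration; in practice the index and the worker would probably be separate programs sharing a database table. The pause between pages is one simple way to keep the load on the Pi under control.

```javascript
// Hypothetical sketch; class, function names and delay are assumptions.

// The index side fills this queue with URLs; the worker side drains it.
class JobQueue {
  constructor() {
    this.status = new Map(); // url -> "pending" | "done" | "failed"
    this.pending = [];
  }

  // Add a URL once; returns false if it was already indexed.
  enqueue(url) {
    if (this.status.has(url)) return false;
    this.status.set(url, "pending");
    this.pending.push(url);
    return true;
  }

  // Next URL to scrape, or undefined when nothing is left.
  next() {
    return this.pending.shift();
  }

  // Record the outcome so failures are visible when debugging.
  finish(url, ok) {
    this.status.set(url, ok ? "done" : "failed");
  }

  // How many URLs are currently in the given state.
  count(state) {
    return [...this.status.values()].filter((s) => s === state).length;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scrape one URL at a time, pausing between pages to limit CPU and network load.
// scrape is supplied by the caller and must return a promise.
async function runWorker(queue, scrape, delayMs = 1000) {
  let url;
  while ((url = queue.next()) !== undefined) {
    try {
      await scrape(url);
      queue.finish(url, true);
    } catch (err) {
      queue.finish(url, false); // keep going; one bad page should not stop the run
    }
    await sleep(delayMs);
  }
}
```

Because each URL carries a status, it is possible to see at any point which pages are still pending and which failed, which is the extra debugging visibility the split is meant to provide.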

I want to create something that is needed and is an essential element of a larger project (recreating a tree of relationships between content, accounts and authors within the site). Given that my fundamental coding experience is limited, I am looking to use something that already exists out there (either on a site or in a YouTube video, or better, both) and then adapt it to my needs. While it has been very frustrating and time-consuming, with many bugs to debug, it has worked OK so far.

  • Measurable
    • What metrics are you going to use to determine if you’ve met the goal? 
    • What does success for your goal look like? How much? How well?
    • Do you need to set some milestones by considering specific tasks to accomplish? Milestones are a series of steps along the way that when added up, will result in the completion of your main goal.

Metrics: is it done or not? If not done, how much has been completed?

Success: does the program work in its ability to crawl AND scrape this site? Also, how many pages can the program crawl, or how long has it been running?

Later, we can look at quality: how many pages have been accurately crawled? (How many were skipped?) How many related sites have been crawled? (We can go into specifics.)
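These metrics could be computed from a simple log of page outcomes, for example as in the sketch below. The summarizeRun function and the "done" / "skipped" status values are hypothetical; the real program may record outcomes differently.

```javascript
// Hypothetical sketch; function name and status values are assumptions.

// pages: array of { url, status }; startedAt / endedAt: Date objects or millisecond timestamps.
function summarizeRun(pages, startedAt, endedAt) {
  const total = pages.length;
  const crawled = pages.filter((p) => p.status === "done").length;
  const skipped = pages.filter((p) => p.status === "skipped").length;
  return {
    total,
    crawled,
    skipped,
    // Share of known pages that were crawled successfully, as a whole percent.
    percentComplete: total === 0 ? 0 : Math.round((crawled / total) * 100),
    minutesRunning: (endedAt - startedAt) / 60000,
  };
}
```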

Milestones:

  1. Build a worker / index relationship
  2. Make the worker / index relationship work, with each function operating in symbiosis
  3. Assess the quality of the crawling and of the scraping
  4. Assess and implement improvements to both.
  5. Assess which other related sites to include

  • Achievable
    • Is the goal doable? Your goal is meant to inspire motivation, not discouragement!
    • Determine any related obstacles or requirements to help you decide if your goal is realistic.
    • Do you have the necessary tools, skills, and resources? If not, what would it take to attain them?
    • Do you need to update your expectations? 

Yes, I have seen it work. The only unknown is whether my coding skills are good enough to meet this challenge.

Tools: a lot of practice, time, Stack Overflow, Medium… and VS Code of course.

Update expectations: yes, depending on success or failure.

  • Relevant
    • What is the reason for the goal?
    • How does the goal align with your broader goals? 
    • Why is the result important? 
    • What will it help you do next, in the near or distant future?

We have been trying to better understand some of the content hosted on the website. While the aims of the website are good, it can be used for bad purposes by certain actors. The company behind the site has limited staff capacity and therefore little means to remove the content. If successful, this work would serve the purpose of learning and would also help automate the removal after careful research.

  • Time-bound
    • What is the time frame for accomplishing your goal?
    • Setting realistic timing improves your chances of succeeding.
    • Providing time constraints also creates a sense of urgency.

I hope to close this in 2-3 weeks at most so I can focus on another tool. There is no urgency in getting it done, but there is urgency in making the most of the Recode class while also keeping some time during the class to review my thesis project.