Scraping the internet

Alright, so I admit it: I have been a naughty boy lately. Desperate to find customers, one often resorts to conniving means, seedy methods bordering on the immoral. Oh well.

You see, I figured: why pay one of those companies out there on the Internet tons of bucks for email address lists, or pay them even more to broadcast mailings all over the world, when I can do that myself?

I am a pretty smart guy, sometimes. At least that is what I fool myself into believing.

So this is what I did. I wrote a subscription mailing list at GishTeq which allows potential customers to sign up for mailings and newsletters. Later, they can modify their personal settings or opt out altogether by unsubscribing.

Now the trick is getting people to sign up. Why don't I scan the Internet, collect emails and sign them up myself? When they get the first mailings, they can always unsubscribe.

I created another Perl script which can scan certain sites and scrape off the emails. If possible, it can even log in to subscriber lists and automatically scan those web pages as well.

All I do is give the script a site URL, start it up and there it goes. Ten minutes or so later I've got my list formatted as CSV, so that I can even import it into an Excel sheet for future reference.

You probably won't believe me when I tell you that with this technique I have successfully harvested more than one thousand warm leads.

Here's a hint (if you know Perl). Use the LWP::UserAgent module to get the contents of the page and then scan it. If you see anything interesting, scan deeper via the internal links. Poke around and see what else turns up. Is there a user id that is passed around? Try all values of that user id from, say, 1 to 1000 and collect the results.



Sorry, I've been a bad boy. But I want to become famous too.

2 Comments

tisk tisk ... the things people will do!

Yeah, I know. I feel really ashamed about it.
