Yet another method to grab download-disabled slideshows from SlideShare
Yes, I know. Horrible, horrible subject. The thought of stealing jpgs which are publicly viewable... Oh, well.
Standard disclaimer applies: Teaching someone how to steal a book does not make the teacher guilty of theft. If you get in trouble for following these directions, shame on you, not on me.
So, as a proof-of-concept, I was curious as to what SlideShare does to inhibit downloading of presentations. Apparently, all they do is not provide the (original?) PowerPoint document for download [1], [2]. However, if one examines the source of the page, it is fairly easy to determine the filename of each slide image, and then automate a fetch to grab each one.
Requirements
- Web browser or something to retrieve the source of one of the slideshow's pages (well, since you're reading this, I suppose we have this one covered)
- cURL (look for a version compatible with your OS; the cURL project's download page is the place to start)
That's it.
Steps
- Open the page containing any slide in the set you want to download.
- View the source of the page (in Mozilla-based browsers, this is usually accomplished with Ctrl-U).
- Search for "og:image" in the source, and copy the url which follows.
- Note the slide count in the lower left of the presentation.
- Open a terminal (command prompt or window session).
- Navigate to where you would like to save the downloaded images.
- Run the following cURL command (the double quotes keep the shell from trying to interpret the square brackets itself):
curl -O "http://image.slidesharecdn.com/<name-of-presentation-including-numeric-string>-phpapp02/95/slide-[1-n]-<resolution>.jpg"
An illustration
Searching for og:image in the source, we find:
<!-- fb open graph meta tags -->
<meta name="fb_app_id" property="fb:app_id" class="fb_og_meta" content="7890123456" />
<meta name="og_type" property="og:type" class="fb_og_meta" content="slideshare:presentation" />
<meta name="og_url" property="og:url" class="fb_og_meta" content="http://www.slideshare.net/somedirectory/some-presentation" />
<meta name="og_image" property="og:image" class="fb_og_meta" content="http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg" />
The url specified by og_image is:
http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg
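If you would rather not hunt through the source by eye, the lookup can be scripted. What follows is only a rough sketch: the extract_og_image function name and the page_source variable are my own inventions, and it assumes the tag looks like the one in this example (double-quoted attributes, with property="og:image" and content="..." in the same tag). It works on a pasted sample rather than fetching anything.

```shell
# Sample page source, shortened from the example above. In practice this
# would be the saved source of the slideshow page.
page_source='<meta name="og_url" property="og:url" class="fb_og_meta" content="http://www.slideshare.net/somedirectory/some-presentation" /> <meta name="og_image" property="og:image" class="fb_og_meta" content="http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg" />'

# Hypothetical helper: read HTML on standard input, print the og:image URL.
extract_og_image() {
    # 1. Break the input after every '>' so that each tag sits on its own line.
    # 2. Keep only the tag whose property is exactly og:image.
    # 3. Strip everything except the value of the content attribute.
    tr '>' '\n' \
        | grep 'property="og:image"' \
        | sed -E 's/.*content="([^"]*)".*/\1/'
}

og_image_url=$(printf '%s\n' "$page_source" | extract_og_image)
echo "$og_image_url"
# prints: http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg
```

This is text munging, not real HTML parsing, so treat it as a convenience for pages shaped like this one.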
Assume that the slide count is 55 (i.e., on the first slide, the lower left indicates "1/55"). Once in the directory where I want to save the images, I simply tell cURL:
curl -O "http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-[1-55]-1024.jpg"
and cURL will retrieve each jpg in the deck.
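Going from the first slide's url and the slide count to that command is a simple substitution, which might be sketched as follows. The build_range_url helper is hypothetical (my naming), and it assumes the url ends in slide-<number>-<resolution>.jpg as in this example. It only prints the command; nothing is downloaded.

```shell
# Hypothetical helper: turn a single-slide URL into a cURL range URL.
#   $1 = URL of any one slide (e.g. the og:image URL)
#   $2 = slide count read from the page
build_range_url() {
    # Replace the slide number after "slide-" with the range [1-count].
    printf '%s\n' "$1" | sed -E "s/slide-[0-9]+-/slide-[1-$2]-/"
}

range_url=$(build_range_url \
    "http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg" \
    55)

# Show the command that would be run.
echo "curl -O \"$range_url\""
# prints: curl -O "http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-[1-55]-1024.jpg"
```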
How it works
The -O option tells cURL to save the data under the original (remote) filename. Without it, cURL dutifully writes whatever it retrieves to standard output, which for a jpg is of little use.
The [1-55] is a numeric range: cURL requests the url once for each number from 1 through 55, putting that number where the bracketed range sits (between the dashes in this example). The result is the same as running:
curl -O http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg
curl -O http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-2-1024.jpg
curl -O http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-3-1024.jpg
[...]
curl -O http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-55-1024.jpg
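To make the expansion concrete, here is a small sketch that does by hand roughly what the range does: it counts from 1 to the slide count and prints one url per number. The expand_range function and its variables are illustrative names of mine, not anything cURL provides, and the script prints urls instead of fetching them.

```shell
base="http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95"
count=55

# Hypothetical helper: print one slide URL per number from 1 through $2.
#   $1 = URL up to (not including) "/slide-", $2 = slide count
expand_range() {
    n=1
    while [ "$n" -le "$2" ]; do
        printf '%s/slide-%s-1024.jpg\n' "$1" "$n"
        n=$((n + 1))
    done
}

urls=$(expand_range "$base" "$count")

# Show the first three of the 55 URLs.
printf '%s\n' "$urls" | head -n 3
```

Each printed line is a url that cURL would request in turn when given the [1-55] form.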
Frustration with wget
My natural inclination was to use wget for this. However, wget does not support globbing for http (its wildcards only work for ftp), and while I could have generated the urls some other way and fed them to it one after the other, this is a horribly clumsy way of accomplishing the task.
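For comparison, the clumsier wget route might look something like the sketch below: generate the list of urls in the shell and hand it to wget on standard input with -i -. The slide_urls and fetch_with_wget functions are hypothetical names, and the script assumes the seq utility is available (it is on most Linux and BSD-derived systems). As written, it only previews the list; fetch_with_wget is defined but never called.

```shell
# Hypothetical helper: print one URL per slide.
#   $1 = URL prefix up to and including "slide-"
#   $2 = slide count
#   $3 = resolution (e.g. 1024)
slide_urls() {
    for n in $(seq 1 "$2"); do
        printf '%s%s-%s.jpg\n' "$1" "$n" "$3"
    done
}

# Hypothetical wrapper: same arguments as slide_urls.
# "wget -i -" reads the list of URLs to fetch from standard input.
fetch_with_wget() {
    slide_urls "$@" | wget -i -
}

# Preview the list without downloading anything:
slide_urls "http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-" 3 1024
# prints:
# http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-1-1024.jpg
# http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-2-1024.jpg
# http://image.slidesharecdn.com/somepresentation-1234567890-phpapp02/95/slide-3-1024.jpg
```

It works, but it takes a loop and a pipe to do what cURL does with a pair of brackets.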
Apply the concepts
The point of all of this is not to go and rip off every download-disabled presentation on SlideShare, but rather to present a working example of how to use cURL to retrieve sequential filenames via http (or ftp). If you find another good use for this one-liner, please post a comment to let me know.
- Point of fact #1: I don't use PowerPoint, and I absolutely go ballistic when someone emails me one of those disgustingly-huge files which I must then convert to something readable (i.e., pray that it will open in Impress and then allow me to save it to an Impress file - or better, a pdf).
- Point of fact #2: I do not (yet) have an account on SlideShare, which is apparently required to download any presentations from their site.