!Archiving a blog to html files
!! or Mirroring your site for offline browsing
(:linebreaks:)
First, if it's a WordPress blog you're archiving,
log in to the blog's admin and go to Options -> Permalinks (Settings -> Permalinks in newer versions of WordPress).
Make sure the permalink uses %blue%postname%% or %blue%postnum%%, not %blue%?p=postnum%%,
so URLs for posts will look something like:
itp.nyu.edu/~netid/blogname/my-postname
itp.nyu.edu/~netid/blogname/123
itp.nyu.edu/~netid/blogname/2007/09/12/my-postname
itp.nyu.edu/~netid/blogname/2007/09/12/123
and not
itp.nyu.edu/~netid/blogname/?p=123
This way wget will add .html after the post name or number when it creates the flat file, for example:
itp.nyu.edu/~netid/blogname/123.html
itp.nyu.edu/~netid/blogname/2007/09/12/my-postname.html
Otherwise you'd get files called
itp.nyu.edu/~netid/blogname/?p=123.html, which will cause problems.
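If you're not sure which permalink style a blog is using, you can check a sample post URL before running wget. This is a minimal sketch: permalink_style is a hypothetical helper name (not part of WordPress or wget), and it only looks for the ?p=NUMBER query-string pattern described above; it does not contact the blog.

```shell
#!/bin/sh
# permalink_style is a hypothetical helper: it classifies a post URL by
# looking for the ?p=NUMBER query-string pattern that causes problems.
permalink_style() {
  case "$1" in
    *'?p='[0-9]*) echo "query-string" ;;  # e.g. blogname/?p=123: change the permalink setting first
    *)            echo "ok" ;;            # e.g. blogname/my-postname or blogname/123
  esac
}

permalink_style "itp.nyu.edu/~netid/blogname/?p=123"                   # prints: query-string
permalink_style "itp.nyu.edu/~netid/blogname/2007/09/12/my-postname"   # prints: ok
```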
!!To run wget
*ssh to itp.nyu.edu
*Change directory: cd to public_html
*Make a directory to put your files in: mkdir somename
*Change directory: cd to where you'd like to save your message file
Run this command (type it all on one line; it may wrap on screen):
[=nohup wget --mirror -w 2 -p --html-extension --no-parent --convert-links -P /path/to/save/to http://itp.nyu.edu/path/to/blog >out 2>&1 & =]
For example:
[=nohup wget --mirror -w 2 -p --html-extension --no-parent --convert-links -P /home/ndl5/public_html/save_myblog http://itp.nyu.edu/~ndl5/myblog >out 2>&1 & =]
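The steps above can be collected into one small shell function. This is a sketch under assumptions: archive_blog and the DRY_RUN variable are hypothetical names, and the function simply runs the same wget command shown above. Setting DRY_RUN prints the command instead of running it, so you can check it before starting a long download.

```shell
#!/bin/sh
# archive_blog BLOG_URL SAVE_DIR [LOG_FILE]
# Hypothetical wrapper around the wget command above: creates SAVE_DIR,
# then starts wget in the background with its messages going to LOG_FILE.
archive_blog() {
  blog_url=$1
  save_dir=$2
  log_file=${3:-out}            # message file, "out" as in the command above

  mkdir -p "$save_dir" || return 1

  if [ -n "$DRY_RUN" ]; then
    # Only show what would be run.
    echo nohup wget --mirror -w 2 -p --html-extension --no-parent \
      --convert-links -P "$save_dir" "$blog_url" ">$log_file" "2>&1" "&"
    return 0
  fi

  nohup wget --mirror -w 2 -p --html-extension --no-parent \
    --convert-links -P "$save_dir" "$blog_url" >"$log_file" 2>&1 &
  echo "wget started in the background, PID $!"
}

# Example (same download as the example command above):
# archive_blog http://itp.nyu.edu/~ndl5/myblog /home/ndl5/public_html/save_myblog
```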
!!What it all means
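%blue%nohup%%: means "no hangup", so if you log out or get disconnected, wget should keep running
%blue%wget%%: command-line tool for retrieving files using HTTP, HTTPS and FTP
%blue%[=--mirror -w 2 -p --html-extension --no-parent --convert-links -P /path/to/save/to=]%%: wget options (explained below)
%blue%[=http://itp.nyu.edu/path/to/blog=]%%: full URL of the top level of the blog you want to convert to HTML, or of the site you want to mirror
%blue%>out%%: sends messages (standard output) to a file called out
%blue%2>&1%%: sends error messages (standard error, which is where wget writes most of its messages) to the same file (out)
%blue%&%%: at the end, runs the command in the background, so you can do other things
Note: Because it's running in the background, you may need to check on the process from time to time.
*You can type 'tail out' to see the last 10 lines of the message file
*You can look at what files have been created in the directory you're saving to
*You can check that wget is still running, or stop it, by following the instructions on dealing with a [[Help/RunawayProcess | Runaway Process]]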
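To see what the >out 2>&1 & part does without downloading anything, you can try it on a stand-in command. In this sketch, the sh -c command is a hypothetical stand-in for wget: it writes one line to standard output and one to standard error, then pauses. Note the order of the redirections: 2>&1 has to come after >file; written the other way round (2>&1 >file), standard error would still go to the terminal.

```shell
#!/bin/sh
# Stand-in for wget: one line on stdout, one on stderr, then a short pause.
# (nohup runs real commands, not shell functions, hence sh -c.)
log=$(mktemp)

nohup sh -c 'echo "to stdout"; echo "to stderr" >&2; sleep 1' >"$log" 2>&1 &
pid=$!                      # PID of the background job

# While it runs you can keep using the shell, e.g. check it is still alive:
if kill -0 "$pid" 2>/dev/null; then
  echo "still running (PID $pid)"
fi

wait "$pid"                 # in real use you would not wait; this just lets the demo finish
tail -n 10 "$log"           # both lines ended up in the log file, like 'tail out'
```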
'''Explanation of wget options''' - mostly taken from this page:
http://www.devarticles.com/c/a/Web-Services/Website-Mirroring-With-wget/1/
%blue%--mirror%%: Turns on options suitable for mirroring. Wget will recursively follow all links on the site (with no depth limit) and download all necessary files. It also turns on timestamping, so on later runs it only gets files that have changed since the last mirror, which saves download time.
%blue%-w%%: Tells wget to wait or pause between requests, in this case for 2 seconds. This is not necessary, but is the considerate thing to do. It reduces the frequency of requests to the server, thus keeping the load down. If you are in a hurry to get the mirror done, you may eliminate this option.
%blue%-p%%: Causes wget to get all required elements for the page to load correctly. Apparently, the mirror option does not always guarantee that all images and peripheral files will be downloaded, so I add this for good measure.
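%blue%--html-extension%%: If a downloaded page is HTML but its URL doesn't end in .html or .htm, wget adds .html to the saved file's name. This gives pages generated by CGI, ASP or PHP scripts a consistent HTML extension. (In wget 1.12 and later this option is called --adjust-extension, or -E; the old name still works.)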
%blue%--convert-links%%: All links are converted so they will work when you browse locally. Otherwise, relative (or absolute) links would not necessarily load the right pages, and style sheets could break as well.
%blue%-P (prefix folder)%%: The resulting tree will be placed in this folder. This is handy for keeping different copies of the same site, or keeping a browsable copy separate from a mirrored copy.
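%blue%--no-parent%%: The simplest, and often very useful, way of limiting directories: wget won't retrieve links that refer to the hierarchy above the starting directory, i.e. it never ascends to the parent directory. Wget treats a URL without a trailing slash as a file, so end the blog URL with a slash (e.g. http://itp.nyu.edu/~ndl5/myblog/) if you want the download limited to the blog's directory.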
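Most of these short options also have long names, which can make the command easier to read when you come back to it later. The sketch below builds the same option list with the long names from the GNU Wget manual and prints the resulting command without running it; /path/to/save/to and the blog URL are the placeholders used above.

```shell
#!/bin/sh
# Build the option list one option per line so each can carry a comment.
set -- --mirror                                   # = -r -N -l inf --no-remove-listing
set -- "$@" --wait=2                              # same as -w 2
set -- "$@" --page-requisites                     # same as -p
set -- "$@" --html-extension                      # --adjust-extension (-E) in wget 1.12 and later
set -- "$@" --no-parent                           # same as -np
set -- "$@" --convert-links                       # same as -k
set -- "$@" --directory-prefix=/path/to/save/to   # same as -P /path/to/save/to

# Print the command instead of running it; drop "echo" (and add >out 2>&1 &)
# to run it for real. The trailing slash lets --no-parent treat "blog" as a directory.
echo nohup wget "$@" http://itp.nyu.edu/path/to/blog/
```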