How to fix URLs after migrating from WordPress to Blogger

After migrating from WordPress to Blogger (with either the online tool or the converter downloaded from Google Code), the internal URLs inside the blog posts are not migrated: they still point at the old WordPress addresses.
For example:
http://kl2217.wordpress.com/2009/06/09/hello-world/
should be mapped to
http://xyznetwork.blogspot.com/2009/06/hello-world.html

The fix is easy.
* First export the blog: Settings->other->export
* Save a backup copy of the export
* Open the exported XML file with vi
* Replace the URLs using the following regular expression (here kl2217.wordpress.com is the WordPress host and xyznetwork.blogspot.com is the Blogger host):

:%s#kl2217\.wordpress\.com\(/\d\{4}/\d\{2}/\)\d\{2}/\([^/]*\)/#xyznetwork.blogspot.com\1\2.html#g

* Finally, import the modified XML back into Blogger: Settings->other->import
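
The same substitution can also be run outside vi. The following is only a sketch, assuming a sed that supports -E (GNU and BSD sed both do). The function name and the file names in the usage note are made up for illustration; the host names are the ones from the example above.

```shell
#!/bin/sh
# Sketch: the vi substitution expressed as a sed filter.
# It keeps /YYYY/MM/, drops the /DD/ part and appends .html to the post slug.
# The function name is hypothetical; hosts are the example hosts from this post.
rewrite_wordpress_urls() {
  sed -E 's#kl2217\.wordpress\.com(/[0-9]{4}/[0-9]{2}/)[0-9]{2}/([^/]*)/#xyznetwork.blogspot.com\1\2.html#g'
}

# Demonstration on a single URL:
printf '%s\n' 'http://kl2217.wordpress.com/2009/06/09/hello-world/' | rewrite_wordpress_urls
```

On a real export it would be used as a filter, for example: rewrite_wordpress_urls < blog-export.xml > blog-export-fixed.xml. Like the vi version, it assumes every internal link ends with a trailing slash.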


My import went wrong: Blogger has an import cap per account, and after experimenting a few times I could not import any more.

============================================
Looking back, a better way is to link each old URL to its new URL via the post name, then use the Blogger API to change the post links case by case.

First, inventory all the URLs on both WordPress and Blogger.
WordPress: put the archives shortcode on a page (http://en.support.wordpress.com/archives-shortcode/):
[archives]

Blogger: list the posts through the API (https://developers.google.com/blogger/docs/3.0/reference/posts/list#examples)
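
As a rough sketch of the Blogger side, the request URL for one page of the posts list can be built as below. The function name, the blog id and the API key are placeholders, not values from this post; maxResults=50 is an arbitrary choice, and the exact layout of the JSON that comes back should be checked against the API reference before feeding it to any text pipeline.

```shell
#!/bin/sh
# Sketch: build the request URL for one page of the Blogger API v3 posts list.
# blog_id and api_key are placeholders you must supply; page_token is optional
# and comes from the nextPageToken value of the previous response.
blogger_posts_url() {
  blog_id=$1
  api_key=$2
  page_token=${3:-}
  url="https://www.googleapis.com/blogger/v3/blogs/${blog_id}/posts?key=${api_key}&fetchBodies=false&maxResults=50"
  if [ -n "$page_token" ]; then
    url="${url}&pageToken=${page_token}"
  fi
  printf '%s\n' "$url"
}

# Example call (needs network access and a valid key, so it is left commented out):
#   curl -s "$(blogger_posts_url YOUR_BLOG_ID YOUR_API_KEY)" >> blogger-url.txt
```

Repeating the call with each nextPageToken until none is returned would collect every post.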

Then make a URL-to-post-name map for both WordPress and Blogger:
======
https://kl2217.wordpress.com/2015/02/16/list-all-posts/
http://xyznetwork.blogspot.com/2015/02/name-url-pairs.html


curl https://kl2217.wordpress.com/2015/02/16/list-all-posts/ | grep "a href=" |grep kl2217|sed s'/&nbsp;/ /g'|sed s'/&#8211/-/g'|sed s'/&gt;/>/g' |sed s'/&lt;/</g' |sed s'/.*<a//g'|sed s'/<.a>.*//g' | grep -v class= | grep -v target=|sed s'/href=.//g'|sed s'/".*>/xxxxx/g'|sed s'/.>/xxxxx/g'|sed s'/&#8212/--/g'| sed s"/&#8217/'/g"|sed s"/&#8216/'/g"|sed s'/&#8230/.../g'|sed s'/&#8220/"/g'|sed s'/&#8221/"/g'|sed 's/.*http/http/g'|sed s'/xxxxx/ xxxxx /g'>wordpress-url-to-name.txt 

https://kl2217.wordpress.com/2016/04/03/wordpress-url/

Make blogger-url.txt from http://xyznetwork.blogspot.com/2015/02/name-url-pairs.html

cat blogger-url.txt | sed s'/"url": "//g'|sed s'/",//g'|sed s'/\("title".*\)/\1%/g'|sed s'/"title": "/ xxxxx /g'|tr -d '\n'|tr '%' '\n'|sed s'/.*http/http/g'|sed s'/    xxxxx / xxxxx /g' > blogger-url-to-name.txt

https://kl2217.wordpress.com/2016/04/03/xyznetwork-blogspot-com-url-2/
======

Now merge wordpress-url-to-name.txt and blogger-url-to-name.txt by post name:
======

cat blogger-url-to-name.txt | while read line; do var1=$(echo ${line}|cut -d ' ' -f1|sed 's/\//\\\//g'); var2=$(echo ${line}|cut -d ' ' -f3-20); grep "$var2" wordpress-url-to-name.txt | sed s"/$var2/$var1/g"; done;

http://xyznetwork.blogspot.com/2016/04/wordpress-url-to-blogger-url-map.html
======
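
The merge can also be done with awk, which avoids pasting post titles into grep and sed patterns (titles containing / or regex characters can break those). This is only a sketch: the function name is made up, it assumes both files use the "URL xxxxx title" layout produced above, and it matches titles exactly rather than by substring as the loop does.

```shell
#!/bin/sh
# Sketch: join the two maps on the post title.
#   $1 = blogger-url-to-name.txt    lines like "BLOGGER_URL xxxxx Title"
#   $2 = wordpress-url-to-name.txt  lines like "WORDPRESS_URL xxxxx Title"
# Prints "WORDPRESS_URL xxxxx BLOGGER_URL" for every title found in both files.
# The function name is hypothetical.
merge_url_maps() {
  awk -F ' xxxxx ' '
    NR == FNR { blogger[$2] = $1; next }                 # first file: title -> Blogger URL
    ($2 in blogger) { print $1 " xxxxx " blogger[$2] }   # second file: emit matching pairs
  ' "$1" "$2"
}
```

For example: merge_url_maps blogger-url-to-name.txt wordpress-url-to-name.txt > urlmapping.txt. Titles that differ between the two sites, even by trailing whitespace, are silently skipped, so the line count of the result is worth checking.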

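Finally, use the map (urlmapping.txt) to generate sed commands and save the output to script.txt. The two commands below differ only in that the first one rewrites https to http in the map before generating the commands.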
======



cat urlmapping.txt | sed s'/https/http/g'|while read line; do var1=$(echo ${line}|cut -d ' ' -f1|sed 's/\//\\\//g');var2=$(echo ${line}|cut -d ' ' -f3|sed 's/\//\\\//g');  echo sed s"/$var1/$var2/g"|sed "s/sed /sed '/g"|sed "s/g$/g'/g"; done;

cat urlmapping.txt | while read line; do var1=$(echo ${line}|cut -d ' ' -f1|sed 's/\//\\\//g');var2=$(echo ${line}|cut -d ' ' -f3|sed 's/\//\\\//g');  echo sed s"/$var1/$var2/g"|sed "s/sed /sed '/g"|sed "s/g$/g'/g"; done;
======
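
An alternative to generating one sed command per URL is to generate a single sed script and run it with sed -f. The sketch below assumes urlmapping.txt has lines of the form "OLD_URL xxxxx NEW_URL" and that no URL contains # or &; the function name and the output file names are made up.

```shell
#!/bin/sh
# Sketch: turn "OLD_URL xxxxx NEW_URL" lines into a sed script, one
# substitution per line, using # as the delimiter so slashes need no escaping.
# The function name is hypothetical.
make_sed_script() {
  awk '{
    old = $1; new = $3
    gsub(/\./, "\\.", old)          # escape dots so they match literally
    print "s#" old "#" new "#g"
  }' "$1"
}
```

Usage would look like: make_sed_script urlmapping.txt > script.sed, followed by sed -f script.sed blog-export.xml > blog-export-fixed.xml. A single sed -f pass also replaces the need to chain many sed processes together.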

Chain the sed commands from script.txt into a single pipeline for filtering content:
======

sed 's/^sed /|sed /' script.txt | tr -d '\n' > transformer.sh

http://xyznetwork.blogspot.com/2016/04/url-converter.html
======



Some useful shell commands
======
# Strip the numeric _N suffix before .html from each URL in urls.txt, then look the result up in home.txt
sed 's/\(.*\)_[0-9]\{1,\}\(\.html\)/\1\2/' urls.txt | xargs -I {} grep -F -- {} home.txt


rm tmp1.txt;rm tmp2.txt;touch tmp2.txt;cat tmp.txt |cut -d '/' -f6|sed 's/^$/\|/g'|tr '\n' ' '|tr '\|' '\n' >tmp1.txt;cat tmp1.txt|while read line;do echo "sed 's/$(echo ${line}|cut -d ' ' -f1)/$(echo ${line}|cut -d ' ' -f2)/g'">>tmp2.txt; done; cat tmp2.txt |tr '\n' '|'
======