Negative Lookbehinds - Fortune [possible satire]

GordonFreeman	hi
rindolf	Hi GordonFreeman
GordonFreeman	grep -Po '(?<=<a )(?<! href=)(?<= href=["]*)[^">]+' <<< '<a gfasg href=asdf>'
GordonFreeman	grep: lookbehind assertion is not fixed length
rindolf	GordonFreeman: grep is PCRE - it's not Perl.
rindolf	perlbot: pcre
Altreus	GordonFreeman: don't use regex for HTML
perlbot	rindolf: PCRE is not Perl. It lacks several features of Perl regexes. Don't bother asking for help with a PCRE pattern in a Perl channel as the answers will not be relevant. Try #regex, or the channel for your language. See also http://en.wikipedia.org/wiki/PCRE#Differences_from_Perl and LPBD.
GordonFreeman	but this should work i think.
mauke	no, it shouldn't
GordonFreeman	though it fails at the second lookbehind ...
mauke	no, it doesn't
GordonFreeman	and fails at "* too
GordonFreeman	(grep -Po '<a +.* +href="[^" >]+' | grep -Po '(?=<a ).' | grep -Po '(?<= href=)["]*[^" >]+') <<< '<a gfasg href=asdf><a fgfgg="hi> " href="link" >'
GordonFreeman	this works.
mauke	GordonFreeman: dude.
anno	don't paste!
GordonFreeman	hi mauke
apeiron	where's mauke's car?
rindolf	apeiron: :-)
mauke	it's a cdr
Altreus	I watched that the other day
rindolf	pkrumins: what's up?
Altreus	I don't really know why
mauke	GordonFreeman: go to a channel where that is on-topic
GordonFreeman	mauke<< like?
mauke	no idea
Altreus	where on earth is parsing HTML with regexes on topic?
GordonFreeman	ahem ok
Altreus	except ##php lolol
GordonFreeman	well i think one can see it's logical and it works like this
rindolf	GordonFreeman: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454
shorten	rindolf's url is at http://xrl.us/bf4jh6
apeiron	GordonFreeman, also, -P isn't perl.
thrig	Altreus: some special level of hell, between the angry ghosts and the hungry ghosts
rindolf	perlbot: html
apeiron	the grep docs lie to you.
perlbot	rindolf: Don't parse or modify html with regular expressions! See one of HTML::Parser's subclasses: HTML::TokeParser, HTML::TokeParser::Simple, HTML::TreeBuilder(::Xpath)?, HTML::TableExtract etc. If your response begins "that's overkill. i only want to..." you are wrong. http://en.wikipedia.org/wiki/Chomsky_hierarchy and http://xrl.us/bf4jh6 for why not to use regex on HTML
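For comparison, here is a minimal sketch of the parser-based approach the bot recommends. It is written in Python with the standard library's `html.parser` rather than the Perl modules named above, purely as an illustration. The names `HrefCollector` and `extract_hrefs` are hypothetical and do not come from the log.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value is not None:
                self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href values of all <a> tags found in html."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# The input pasted earlier in the channel, including the quoted '>' inside an attribute.
print(extract_hrefs('<a gfasg href=asdf><a fgfgg="hi> " href="link" >'))
```

The tokenizer keeps track of quoted attribute values, so the `>` inside `"hi> "` does not end the tag. That is the case the piped grep commands had to work around by hand.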
LeoNerd	Altreus: Why, surely in #html-parsing-by-regexp
Altreus	if you want perl regex use ack
Altreus	surely
rindolf	LeoNerd: sounds like programmers' hell.
anno	perl regex doesn't support variable-length lookbehind either
Altreus	apeiron: actually it says it's highly experimental and hence not working
Altreus	it could well be Perl and not PCRE when finished :)
Altreus	not that "perl regex" is a defined term, the speed Perl is moving
yrlnry	That's why you should never use Perl's builtin regexes. Just write your own package, it's sure to be more reliable.
rindolf	yrlnry: :-)
talexb	Heh.
LeoNerd	use re::engine::vim;
rindolf	yrlnry++
Altreus	LeoNerd: is it core?
yrlnry	HOP has a nice implementation. It works by generating a list of every string matched by the regex, and looking to see if your target string is in the list.
LeoNerd	I can't help thinking that may not be optimal in terms of CPU or memory usage
talexb	yrlnry, no doubt they have a Cray working on generating the list ..
yrlnry	LeoNerd: Depends; unlike Perl regexes, it has no trouble handling languages higher up the Chomsky hierarchy
yrlnry	It is guaranteed to return the right answer for any recursive language, and guaranteed to return correct 'matched' answers for any recursively enumerable language.
LeoNerd	Oh sure...
LeoNerd	In terms of CS guarantees it's very nice
yrlnry	So if you are in a big hurry to get the wrong answer...
LeoNerd	But I live in the practical pragmatic world
LeoNerd	E.g. Parser::MGC is horribly slow at backtracking and whatnot, but I write parsers in it because those are still fast for "reasonably" sized inputs, parsers are fast to write, and I like having lots of side-effects and dynamic logic -in- Perl
Altreus	Unfortunately my universe doesn't have infinite processing speeds and data storage
anno	a universe with infinite processing speed would have processed you by now
Altreus	and
Altreus	would have processed my grandchildren too
yrlnry	This algorithm doesn't need infinite speed or storage.
yrlnry	It works slowly, but finitely.
Altreus	what
yrlnry	The infinite list is lazily generated and you never have more than one of its elements in memory at any time.
rindolf	yrlnry: is it sorted by length?
yrlnry	You will learn this sort of technique after you have been programming in Perl for eight months or so.
Altreus	how do you know when it doesn't match
Altreus	yrlnry: :D
yrlnry	rindolf: it is sorted by length, and lexicographically among strings of the same length.
rindolf	yrlnry: ah.
yrlnry	Of course, you cannot do the length-sorting thing for arbitrary languages, but for regex languages there is no trouble.
yrlnry	http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
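The enumeration idea described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the HOP implementation: the tuple-based regex representation and the names `of_length`, `enumerate_language` and `matches` are invented for the example. It builds one length class at a time instead of using HOP's lazy streams. Because candidates come out in length order, a non-match can be reported as soon as every string no longer than the target has been tried.

```python
from itertools import count, islice

# Hypothetical mini-representation of a regex as nested tuples:
#   ("lit", "a")    a single literal character
#   ("cat", r, s)   r followed by s
#   ("alt", r, s)   r or s
#   ("star", r)     zero or more repetitions of r


def of_length(node, n):
    """Return the set of strings of exactly length n matched by node."""
    kind = node[0]
    if kind == "lit":
        return {node[1]} if n == 1 else set()
    if kind == "alt":
        return of_length(node[1], n) | of_length(node[2], n)
    if kind == "cat":
        # Split the n characters between the left and right parts in every way.
        return {x + y
                for k in range(n + 1)
                for x in of_length(node[1], k)
                for y in of_length(node[2], n - k)}
    if kind == "star":
        if n == 0:
            return {""}
        # The first repetition takes k >= 1 characters, so the recursion shrinks.
        return {x + y
                for k in range(1, n + 1)
                for x in of_length(node[1], k)
                for y in of_length(node, n - k)}
    raise ValueError("unknown node kind: %r" % (kind,))


def enumerate_language(node, max_length=None):
    """Yield matched strings by length, then lexicographically within a length."""
    lengths = count() if max_length is None else range(max_length + 1)
    for n in lengths:
        yield from sorted(of_length(node, n))


def matches(node, target):
    """Check membership by looking for target in the enumerated list."""
    # Strings longer than the target cannot equal it, so the search is finite.
    return any(s == target for s in enumerate_language(node, len(target)))


# a(b|c)* in the tuple representation
A_THEN_BC = ("cat", ("lit", "a"), ("star", ("alt", ("lit", "b"), ("lit", "c"))))
print(list(islice(enumerate_language(A_THEN_BC), 7)))
print(matches(A_THEN_BC, "acb"), matches(A_THEN_BC, "ba"))
```

As noted in the channel, this is nice in terms of guarantees and poor in terms of CPU time: the number of candidates grows exponentially with the length of the target.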
LeoNerd	Eh..
LeoNerd	I dunno. I just dislike purely RE-based parsing
LeoNerd	I much prefer code doing it
GordonFreeman	why can't perl regexp do variable length lookbehind matching?
Altreus	See originally I ignored you because it sounded like you were talking shit
LeoNerd	Limit of the implementation
Altreus	mainly because it is possible to construct a regex with an infinite range that nevertheless won't match a particular string
anno	GordonFreeman: who knows? looks like it's hard to implement with the given engine
mauke	GordonFreeman: unclear semantics and no one's bothered to write the code
GordonFreeman	i see
Altreus	Plus, there's a fucking lot of Unicode to create strings out of
LeoNerd	It's not "hard" to implement. It's impossible given the algorithm being used
mauke	LeoNerd: why impossible?
yrlnry	LeoNerd: I don't think that's true. It could be done using a recursive call to the regex engine now that that is possible.
GordonFreeman	but lookbehind is cool
LeoNerd	Oooh.. yes.. I suppose it could do that now
GordonFreeman	it's like a reverse regexp that can be excluded
anno	vim re's do it
LeoNerd	vim uses a different type of engine
anno	right
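The fixed-length restriction is not unique to grep -P or Perl. As an analogous illustration only, Python's `re` module also rejects a lookbehind whose width can vary. The sketch below shows that, along with one common workaround: replace the lookbehind with a capture group. The helper names `compiles` and `href_values` are hypothetical, and the pattern is a simplified stand-in for the one pasted at the start of the log. It is not a recommendation to parse HTML this way.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind: every possible match of 'href=' is 5 characters long.
FIXED = r'(?<=href=)\w+'
# Variable-width lookbehind: '"*' can match zero or more quotes.
VARIABLE = r'(?<=href="*)\w+'


def href_values(text):
    """Capture what follows href= instead of asserting it with a lookbehind."""
    # The optional quote is consumed by the match; only the group is returned.
    return re.findall(r'\bhref="?([^" >]+)', text)


print(compiles(FIXED), compiles(VARIABLE))
print(href_values('<a gfasg href=asdf><a fgfgg="hi> " href="link" >'))
```

A capture group avoids the problem because the engine matches the prefix going forwards, so its length never has to be known in advance.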
yrlnry	Altreus: I was talking shit. After eight months you get a license to do that.
mauke	really?
Altreus	yrlnry: but there's a pdf
yrlnry	where's a PDF?
Altreus	17:10 < yrlnry> http://hop.perl.plover.com/book/pdf/06InfiniteStreams.pdf
yrlnry	Yes.
Altreus	I didn't open it or anything
mauke	no one opens PDFs
yrlnry	PDFs are for cowards and Slavs.
Altreus	but it lent enough credence to your words that I decided to believe your spurious claims
Altreus	Actually someone did a test the other day
yrlnry	Oh, does "talking shit" mean "making up nonsense"? Then I was not talking shit.
Altreus	He linked someone to articles supporting his viewpoint and they changed their mind
yrlnry	It is in section 6.5, "regex string generation".
Altreus	but one of the articles was an argument against himself
Altreus	Showing that it is enough to cite your sources to be believed; not many people will actually bother to check them
Altreus	yrlnry: what do you normally think "talking shit" means?
Altreus	are you confusing it with shooting the shit
yrlnry	I'm not sure.
Altreus	are you foreign
yrlnry	Yes.
Altreus	ok then
mauke	hahaha

Channel	#perl
Network	Freenode
Tagline	Negative Lookbehind Regexes for matching HTML
Published	2011-11-24